This is an answer to this SO question: Vba comparing 2 columns of data.
In that question the OP code is very slow (also not working properly but the problem is fixed)
I spent some time to implement a version with decent performance at 1M rows+
Note: I know Activesheet is discouraged (and I avoid it) but the Sub is quite generic (and it can be made even more so); the little utility can be convenient to users for quick checks - simply compare values on current sheet
Description
- Sets up 2 dictionaries (late-bound)
- Sets up the main UsedRange (ur) variable
- Deletes the results columns C and D
Setups 4 arrays
arrAfor the first column to be compared (Col A)arrBfor the second column to be compared (Col B)arrCfor the 1st result column C (from Col A, which shows vals not found in col B)arrDfor the 2nd result column D (from Col B, showing vals not found in col A)- (Each column can be a constant, or Optional Param with default vals)
Load data to dictionaries:
Forloop 1: adds all elements in Col A as keys, in dictionary dColAForloop 2: adds all elements in Col B as keys, in dictionary dColB
- Compare dictionary items
Forloop 3: for each itm in dColA checks their existence in dColB- If an item is not found, it places it in the 3th array (arrC)
Forloop 4: for each itm in dColB checks their existence in dColA- If an item is not found, it places it in the 4th array (arrD)
- At the end, it places both
arr3andarr4back on the range (Col C and D respectively)
Option Explicit Public Sub Compare2Cols() Dim dColA As Object: Set dColA = CreateObject("Scripting.Dictionary") Dim dColB As Object: Set dColB = CreateObject("Scripting.Dictionary") Dim ur As Range: Set ur = ActiveSheet.UsedRange ur.Columns("C:D").Delete Dim arrA As Variant: arrA = ur.Columns("A") Dim arrB As Variant: arrB = ur.Columns("B") Dim arrC As Variant: arrC = ur.Columns("C") Dim arrD As Variant: arrD = ur.Columns("D") Dim itm As Variant For Each itm In arrA dColA(itm) = 0 Next For Each itm In arrB dColB(itm) = 0 Next Dim r As Long For Each itm In dColA r = r + 1 If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2 Next r = 0 For Each itm In dColB r = r + 1 If Not dColA.Exists(itm) Then arrD(r, 1) = itm 'Find Col 2 in 1 Next ur.Columns("C") = arrC ur.Columns("D") = arrD End Sub Test results - dictionaries:
Compare2ColsUsingDictionaries() - Rows: 10,001; Time: 0.047 sec Compare2ColsUsingDictionaries() - Rows: 20,001; Time: 0.109 sec Compare2ColsUsingDictionaries() - Rows: 30,001; Time: 0.156 sec Compare2ColsUsingDictionaries() - Rows: 40,001; Time: 0.234 sec Compare2ColsUsingDictionaries() - Rows: 50,001; Time: 0.296 sec Compare2ColsUsingDictionaries() - Rows: 100,001; Time: 1.232 sec Compare2ColsUsingDictionaries() - Rows: 500,001; Time: 31.934 sec Compare2ColsUsingDictionaries() - Rows: 1,048,576; Time: 126.797 sec Test results - collections (suggested by @juvian, about 4 times faster for large data)
Compare2ColsUsingCollections() - Rows: 10,001; Time: 0.262 sec Compare2ColsUsingCollections() - Rows: 20,001; Time: 0.539 sec Compare2ColsUsingCollections() - Rows: 30,001; Time: 0.813 sec Compare2ColsUsingCollections() - Rows: 40,001; Time: 1.109 sec Compare2ColsUsingCollections() - Rows: 50,001; Time: 1.410 sec Compare2ColsUsingCollections() - Rows: 100,001; Time: 2.945 sec Compare2ColsUsingCollections() - Rows: 500,001; Time: 15.730 sec Compare2ColsUsingCollections() - Rows: 1,048,576; Time: 33.719 sec Image of test results
.
Other versions (combinations of reduced For loops) I've already tried:
.
V2 (combined For 1 and 2) - significant increase in execution time from 2 min to 3
Public Sub Compare2ColsV2() Dim dColA As Object: Set dColA = CreateObject("Scripting.Dictionary") Dim dColB As Object: Set dColB = CreateObject("Scripting.Dictionary") Dim ur As Range: Set ur = ActiveSheet.UsedRange ur.Columns("C:D").Delete Dim arrA As Variant: arrA = ur.Columns("A") Dim arrB As Variant: arrB = ur.Columns("B") Dim arrC As Variant: arrC = ur.Columns("C") Dim arrD As Variant: arrD = ur.Columns("D") Dim itm As Variant, r As Long For r = 1 To UBound(arrA) 'based on UsedRange (same LastRow) dColA(arrA(r, 1)) = 0 dColB(arrB(r, 1)) = 0 Next r = 0 For Each itm In dColA r = r + 1 If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2 Next r = 0 For Each itm In dColB r = r + 1 If Not dColA.Exists(itm) Then arrD(r, 1) = itm 'Find Col 2 in 1 Next ur.Columns("C") = arrC ur.Columns("D") = arrD End Sub .
V3 (combined For 2 and 4) - No improvement (slower by about 3 secs);
Public Sub Compare2ColsV3() Dim dColA As Object: Set dColA = CreateObject("Scripting.Dictionary") Dim dColB As Object: Set dColB = CreateObject("Scripting.Dictionary") Dim ur As Range: Set ur = ActiveSheet.UsedRange ur.Columns("C:D").Delete Dim arrA As Variant: arrA = ur.Columns("A") Dim arrB As Variant: arrB = ur.Columns("B") Dim arrC As Variant: arrC = ur.Columns("C") Dim arrD As Variant: arrD = ur.Columns("D") Dim itm As Variant, r As Long For Each itm In arrA dColA(itm) = 0 Next For Each itm In arrB dColB(itm) = 0 r = r + 1 If Not dColA.Exists(itm) Then arrD(r, 1) = itm 'Find Col 2 in 1 Next r = 0 For Each itm In dColA r = r + 1 If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2 Next ur.Columns("C") = arrC ur.Columns("D") = arrD End Sub Any performance improvements or suggestions are appreciated
PS. I also tested 2 types of loops over variant arrays, using For r = 1 To UBound(arrA) vs. For Each itm In arrA with no other changes - there is no noticeable difference in speed between them



