Using sort
As Ed states in his comment, your sort command is sorting on the third field, when in fact you only have two fields (the : is the field separator). So to fix it, replace 3 with 2 for the key.
However, the original record order of the source file is then lost, because the records are sorted by their key value rather than by the line/record number:
    $ sort -u -t':' -k2,2 test.txt
    1:A
    2:B
    6:C
    5:a
    4:b
    $
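That is probably not what you want. It is easily fixed, though, by piping the output through sort again (the second sort compares whole lines as text, which is sufficient for the single-digit record numbers in this example):
    $ sort -u -t':' -k2,2 test.txt | sort
    1:A
    2:B
    4:b
    5:a
    6:C
    $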
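For a large file the record numbers will run past 9, and a plain text sort would then place a record such as 10:E before 2:B. The following is a minimal sketch of one way around this, assuming every record starts with a numeric line number as in the example. The twelve-line sample data is made up for illustration and is not from the question.

```shell
#!/bin/sh
# Hypothetical sample with two-digit record numbers (not from the question).
input=$(mktemp)
printf '%s\n' 1:A 2:B 3:A 4:b 5:a 6:C 7:B 8:c 9:D 10:E 11:A 12:F > "$input"

# Step 1: keep one record per distinct second field.
# Step 2: restore file order by comparing field 1 as a number (-k1,1n)
#         instead of comparing whole lines as text.
result=$(sort -u -t':' -k2,2 "$input" | sort -t':' -k1,1n)
printf '%s\n' "$result"

rm -f "$input"
```

The only change from the pipeline above is the key on the second sort: -t':' -k1,1n restricts the comparison to the first field and treats it as a number.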
Note: Since you say that you have a large file, you may want to speed things up with the --parallel flag [1]:
sort --parallel=<n> -u -t':' -k2,2 test.txt | sort --parallel=<n>
where <n> is the number of cores that you have available.
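As a sketch of how the core count could be filled in automatically rather than typed by hand, the snippet below asks nproc for it. This assumes GNU coreutils, since both nproc and sort's --parallel option are GNU extensions. The temporary file simply stands in for test.txt.

```shell
#!/bin/sh
# Assumes GNU coreutils: nproc and sort --parallel are not in POSIX.
cores=$(nproc)

# Temporary stand-in for test.txt, holding the sample data from the question.
input=$(mktemp)
printf '%s\n' 1:A 2:B 3:A 4:b 5:a 6:C > "$input"

# Same pipeline as above, with <n> replaced by the detected core count.
result=$(sort --parallel="$cores" -u -t':' -k2,2 "$input" | sort --parallel="$cores")
printf '%s\n' "$result"

rm -f "$input"
```

On a file this small the option makes no measurable difference; any benefit would only show up on large inputs.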
Using awk
Expanding upon your example file, if the original data is in a file called test.txt, like this:
    1:A
    2:B
    3:A
    4:b
    5:a
    6:C
and, again, treating the : as a field separator, then you could use awk [2].
For example this line:
awk 'BEGIN{FS=":"}{if (!seen[$2]++)print $0}' test.txt
Gives the following result:
    $ awk 'BEGIN{FS=":"}{if (!seen[$2]++)print $0}' test.txt
    1:A
    2:B
    4:b
    5:a
    6:C
    $
You can see how this works by printing the value of the condition for each line:
    $ awk 'BEGIN{FS=":"}{print !seen[$2]++}' test.txt
    1
    1
    0
    1
    1
    1
    $
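- First, the field separator is specified with FS=":".
- Second, seen[$2]++ returns the number of times the current second-field value has already been seen and then increments that count, so the negation operator gives a "true" result (1) only for a second-field entry that hasn't yet been seen.
- Finally, print $0 prints the whole record, i.e. the current line.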
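To make the three steps above more concrete, here is a small trace sketch that prints, for every record, the count stored for its key before the increment and the resulting keep/skip decision. The variable names before, keep, and trace are only illustrative, and the sample data is fed in through printf instead of test.txt.

```shell
#!/bin/sh
# Trace the seen[] logic for the sample data from the question.
trace=$(printf '%s\n' 1:A 2:B 3:A 4:b 5:a 6:C |
awk -F':' '{
    before = seen[$2] + 0    # how often this key occurred on earlier lines
    keep   = !seen[$2]++     # 1 on the first occurrence, 0 afterwards
    printf "%s before=%d keep=%d\n", $0, before, keep
}')
printf '%s\n' "$trace"
```

The record 3:A is the only one reported with keep=0, because its key A was already counted when 1:A was read.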
Putting this into a shell script [3] rather than an awk script gives:
    #!/bin/sh
    awk -F':' '
    (!seen[$2]++) {
        print $0
    }
    ' "$1"
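A possible way to try the script out is sketched below. The file name dedup.sh and the temporary working directory are assumptions made for this illustration. The script is run through sh so that it works even where the temporary directory does not allow executables.

```shell
#!/bin/sh
# dedup.sh is a hypothetical name for the script shown above.
workdir=$(mktemp -d)

cat > "$workdir/dedup.sh" <<'EOF'
#!/bin/sh
awk -F':' '
(!seen[$2]++) {
    print $0
}
' "$1"
EOF

# Sample data from the question.
printf '%s\n' 1:A 2:B 3:A 4:b 5:a 6:C > "$workdir/test.txt"

# The file to de-duplicate is passed as the first argument ("$1" in the script).
result=$(sh "$workdir/dedup.sh" "$workdir/test.txt")
printf '%s\n' "$result"

rm -rf "$workdir"
```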
References:
[1] This answer to "How to sort big files?"
[2] This answer to "Keeping unique rows based on information from 2 of three columns"
[3] This answer to "Specify other flags in awk script header"
-k3,3 you're telling sort to sort by the 3rd field when the input has 2 fields.