190

I have a text file that contains a long list of entries (one on each line). Some of these are duplicates, and I would like to know if it is possible (and if so, how) to remove any duplicates. I am interested in doing this from within vi/vim, if possible.


16 Answers

439

If you're OK with sorting your file, you can use:

:sort u 
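Worth noting: :sort u also takes a range and combines with vim's other sort flags (see :help :sort); a few sketches:

:sort u        sort the whole buffer, keeping only the first of identical lines
:5,10sort u    sort and deduplicate lines 5 through 10 only
:sort iu       ignore case both when sorting and when detecting duplicates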

10 Comments

If sorting is unacceptable, use :%!uniq to simply remove duplicate entries without sorting the file.
Once you use the command the whole file changes? How do you go back? I already saved the file by mistake... my bad
Just use Vim's undo command: u
@cryptic0, uniq won't work unless the duplicates are adjacent; on a$b$a$ it does nothing
You can select the lines you want sorted and deduplicated first with V or something similar, then issue the command.
44

Try this:

:%s/^\(.*\)\(\n\1\)\+$/\1/ 

It searches for any line immediately followed by one or more copies of itself, and replaces it with a single copy.

Make a copy of your file though before you try it. It's untested.
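For anyone puzzling over the pattern, here is my own annotation of the pieces (not part of the original answer):

:%s/^\(.*\)\(\n\1\)\+$/\1/
"  ^\(.*\)      capture an entire line
"  \(\n\1\)\+   one or more newline-plus-exact-copy of that line
"  $            end of the last copy
"  \1           replace the whole run with a single copy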

7 Comments

@hop Thanks for testing it for me. I didn't have access to vim at the time.
This highlights all the duplicate lines for me but doesn't delete them; am I missing a step here?
I'm pretty sure this will also highlight a line followed by a line that has the same "prefix" but is longer.
The only issue with this is that if you have multiple duplicates (3 or more of the same line), you have to run it several times until all dups are gone, since it only removes one set of dups per pass.
Another drawback of this: this won't work unless your duplicate lines are already next to each other. Sorting first would be one way of ensuring they're next to each other. At that point, the other answers are probably better.
32

From the command line just do:

sort file | uniq > file.new 
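As an aside, POSIX sort can fold the uniq step into itself with its -u flag, so this should be equivalent:

sort -u file > file.new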

5 Comments

This was very handy for me for a huge file. Thanks!
Couldn't get the accepted answer to work, as :sort u was hanging on my large file. This worked very quickly and perfectly. Thank you!
'uniq' is not recognized as an internal or external command, operable program or batch file.
Yes -- I tried this technique on a 2.3 GB file, and it was shockingly quick.
@hippietrail Are you on a Windows PC? Maybe you can use Cygwin.
15

awk '!x[$0]++' yourfile.txt if you want to preserve the order (i.e., sorting is not acceptable). In order to invoke it from vim, :! can be used.
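To unpack the idiom (my reading, not the answerer's): x[$0] counts how many times each whole line ($0) has been seen, and the ! makes the pattern true only while the count is still zero, i.e. on the first occurrence, which awk then prints by default. Filtering the whole buffer through it from within vim would look like this (the backslash stops vim from expanding ! into the previous command):

:%!awk '\!x[$0]++'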

3 Comments

This is lovely! Not needing to sort is exactly what I was looking for!
what does it do?
This can also be done in perl if it strikes your fancy perl -nle 'print unless $seen{$_}++' yourfile.txt
6

I would combine two of the answers above:

1G       go to head of file
!Gsort   sort the whole file
1G
!Guniq   remove duplicate entries with uniq

If you were interested in seeing how many duplicate lines were removed, use control-G before and after to check on the number of lines present in your buffer.
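If you prefer ex commands to normal-mode filters, the same pipeline can be issued in one line:

:%!sort | uniq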

1 Comment

'uniq' is not recognized as an internal or external command, operable program or batch file.
6
g/^\(.*\)$\n\1/d 

Works for me on Windows. Lines must be sorted first though.
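To guard against the prefix problem pointed out in the comment below, anchoring the back-reference as well should require the whole next line to match; an untested sketch:

:g/^\(.*\)$\n\1$/d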

1 Comment

This will delete a line that is a prefix of the line after it: aaaa followed by aaaabb will erroneously delete aaaa.
5

If you don't want to sort/uniq the entire file, you can select the lines you want to make unique in visual mode and then simply run: :sort u.
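For the record, pressing : while a visual selection is active makes vim pre-fill the range marks, so the command line ends up reading:

:'<,'>sort u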

1 Comment

If you know the line numbers you want sorted and deduplicated, you can prefix the command with the starting and ending line numbers; e.g., to sort+unique lines 5 through 10 the command would be :5,10 sort u
4

Select the lines in visual-line mode (Shift+v), then :!uniq. That'll only catch duplicates which come one after another.

2 Comments

Just to note, this will only work on computers with the uniq program installed, e.g. Linux, Mac, FreeBSD, etc.
This will be the best answer for those who don't need sorting. And if you are a Windows user, consider trying Cygwin or MSYS.
1

Regarding how Uniq can be implemented in VimL, search for Uniq in a plugin I'm maintaining. You'll see various ways to implement it that were given on the Vim mailing list.

Otherwise, :sort u is indeed the way to go.

Comments

1

An alternative method that does not use vi/vim (useful for very large files) is to use sort and uniq from the Linux command line:

sort {file-name} | uniq 
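Since uniq's flags are easy to mix up, a quick contrast (standard POSIX behaviour):

sort file | uniq       # keep one copy of every line
sort file | uniq -d    # print only the lines that had duplicates, once each
sort file | uniq -u    # print only the lines that were never duplicated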

Comments

1

From here this will remove adjacent and non-adjacent duplicates without sorting:

:%!awk '\!a[$0]++' 

This technically uses something outside of vim, but it is called from within vim (and therefore only works where awk is available, e.g. Linux or macOS).

To do this entirely from within vim, you can use a macro and the :norm command to execute it on every line. On Linux this was fast, but on Windows it took an oddly long time. Disabling plugins using vim -u NONE seemed to help somewhat.

qa                    # create macro in register 'a'
y$                    # yank the current line
:.+1,$g/<ctrl-r>0/d   # from the next line to the end of file, delete any pattern that matches
q                     # end of macro
:%norm! @a            # apply macro on every line in file

Note this doesn't remove empty lines, so performing

:g/^$/d 

to remove any blank lines afterwards may be useful.

Comments

0
:%s/^\(.*\)\(\n\1\)\+$/\1/gec 

or

:%s/^\(.*\)\(\n\1\)\+$/\1/ge 

This is my answer for you: it removes multiple duplicate lines while keeping one copy, rather than removing them all. The e flag suppresses the error when no duplicates are found, and the c variant asks for confirmation before each change.

Comments

0

I would use !}uniq, but that only works if there are no blank lines.

To apply it to every line in the file, use: :1,$!uniq.
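Since :1,$ is the same range as %, this can be shortened to:

:%!uniq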

Comments

0

This version only removes repeated lines that are contiguous; that is, it only deletes consecutive duplicate lines. With the given mapping, the function does not mess with blank lines, but if you change the regex to match the start of line (^) it will also remove duplicated blank lines.

" function to delete duplicate lines function! DelDuplicatedLines() while getline(".") == getline(line(".") - 1) exec 'norm! ddk' endwhile while getline(".") == getline(line(".") + 1) exec 'norm! dd' endwhile endfunction nnoremap <Leader>d :g/./call DelDuplicatedLines()<CR> 

Comments

0

This command got me a buffer without any duplicate lines and without sorting, and it shouldn't be very hard to research why it works or how it could work better:

:%!python3.11 -c 'exec("import fileinput\nLINES = []\nfor line in fileinput.input():\n line = line.splitlines()[0]\n if line not in LINES:\n print(line)\n LINES.append(line)\n")' 
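For readability, here is the same logic as a standalone script; a sketch, where dedup.py is my placeholder name and the buffer is filtered through it with :%!python3 dedup.py:

import fileinput

# keep only the first occurrence of each line, preserving order
seen = []
for line in fileinput.input():
    line = line.rstrip("\n")
    if line not in seen:
        print(line)
        seen.append(line)

(A set would be faster than a list for large files; the original one-liner uses a list, so this sketch keeps it.)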

Comments

-1

This worked for me for both .csv and .txt

awk '!seen[$0]++' <filename> > <newFileName>

Explanation: the first part of the command, awk '!seen[$0]++' <filename>, prints only the first occurrence of each row; the second part, i.e. everything after the middle arrow (> <newFileName>), saves that output to the new file.

Comments
