I have a text file that contains a long list of entries (one on each line). Some of these are duplicates, and I would like to know if it is possible (and if so, how) to remove any duplicates. I am interested in doing this from within vi/vim, if possible.
16 Answers
If you're OK with sorting your file, you can use:
:sort u
Comments note that :%!uniq simply removes adjacent duplicate entries without sorting the file, and that the command can also be applied to just a portion of the buffer (select the lines with V or similar, then issue the command).

Try this:
:%s/^\(.*\)\(\n\1\)\+$/\1/

It searches for any line immediately followed by one or more copies of itself, and replaces the whole run with a single copy.
Make a copy of your file though before you try it. It's untested.
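For reference, the same pattern can be exercised outside Vim with Python's re module (the function name dedupe_adjacent is my own, not from the answer); under re.MULTILINE, ^ and $ behave like Vim's line anchors:

```python
import re

def dedupe_adjacent(text):
    # Same idea as :%s/^\(.*\)\(\n\1\)\+$/\1/ -- collapse each run of
    # identical consecutive lines into a single copy.
    return re.sub(r"^(.*)(\n\1)+$", r"\1", text, flags=re.MULTILINE)

print(dedupe_adjacent("apple\napple\nbanana\napple"))  # apple, banana, apple
```

Note that, like the Vim substitute, this only collapses consecutive duplicates; the final "apple" survives because it is not adjacent to the first two.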
From the command line, just do:

sort file | uniq > file.new
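In other words, sort piped through uniq amounts to printing the sorted set of lines. A minimal illustration (the sample data is made up):

```python
# Made-up sample data; sort | uniq is equivalent to the sorted set of lines.
lines = ["banana", "apple", "banana", "cherry", "apple"]
print("\n".join(sorted(set(lines))))  # apple, banana, cherry
```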
Comments report that this worked quickly and perfectly where :sort u was hanging on a large file, but that it fails on Windows ('uniq' is not recognized as an internal or external command, operable program or batch file).

Use awk '!x[$0]++' yourfile.txt if you want to preserve the order (i.e., sorting is not acceptable). To invoke it from Vim, :! can be used.
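The awk idiom keeps only the first occurrence of each line. A rough Python equivalent (the function name unique_keep_order is my own):

```python
def unique_keep_order(lines):
    # Mirror of awk '!x[$0]++': emit a line only the first time it is
    # seen, preserving the original order of first occurrences.
    seen = set()
    out = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            out.append(line)
    return out

print(unique_keep_order(["b", "a", "b", "c", "a"]))  # ['b', 'a', 'c']
```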
I would combine two of the answers above:
- 1G — go to the head of the file
- !Gsort — sort the whole file
- 1G, then !Guniq — remove duplicate entries with uniq

If you are interested in seeing how many duplicate lines were removed, use Ctrl+G before and after to check the number of lines present in your buffer.
A comment notes that this also fails on Windows ('uniq' is not recognized as an internal or external command, operable program or batch file).

g/^\(.*\)$\n\1/d

Works for me on Windows. Lines must be sorted first, though.
A comment warns that aaaa followed by aaaabb will delete aaaa erroneously, because the backreference is not anchored to the end of the second line.

If you don't want to sort/uniq the entire file, you can select the lines you want to make unique in visual mode and then simply: :sort u.
A comment adds that a line range works too: :5,10 sort u.

Select the lines in visual-line mode (Shift+V), then :!uniq. That will only catch duplicates which come one after another.
Regarding how Uniq can be implemented in VimL, search for Uniq in a plugin I'm maintaining. You'll see various ways to implement it that were given on the Vim mailing list.
Otherwise, :sort u is indeed the way to go.
From here, this will remove both adjacent and non-adjacent duplicates without sorting:

:%!awk '\!a[$0]++'

This technically uses something outside of Vim, but it is called from within Vim (and therefore requires awk, which Linux systems have).
To do this entirely from within Vim, you can use a macro and the :norm command to execute it on every line. On Linux this was fast, but on Windows it took an oddly long time; disabling plugins with vim -u NONE seemed to help somewhat.
qa                    " create a macro in register 'a'
y$                    " yank the current line
:.+1,$g/<ctrl-r>0/d   " from the next line to the end of the file,
                      " delete any line that matches the yanked text
q                     " end the macro
:%norm! @a            " apply the macro to every line in the file

Note this doesn't remove empty lines, so running

:g/^$/d

afterwards to remove any blank lines may be useful.
This version only removes repeated lines that are contiguous; that is, it only deletes consecutive repeated lines. With the given mapping, the function does not mess with blank lines. But if you change the :g pattern to match the start of line (^), it will also remove duplicated blank lines.
" function to delete duplicate lines function! DelDuplicatedLines() while getline(".") == getline(line(".") - 1) exec 'norm! ddk' endwhile while getline(".") == getline(line(".") + 1) exec 'norm! dd' endwhile endfunction nnoremap <Leader>d :g/./call DelDuplicatedLines()<CR> Comments
This command got me a buffer without any duplicate lines without sorting, and it shouldn't be very hard to research why it works or how it could work better:
:%!python3.11 -c 'exec("import fileinput\nLINES = []\nfor line in fileinput.input():\n    line = line.splitlines()[0]\n    if line not in LINES:\n        print(line)\n        LINES.append(line)\n")'

The embedded program, written out:

import fileinput
LINES = []
for line in fileinput.input():
    line = line.splitlines()[0]
    if line not in LINES:
        print(line)
        LINES.append(line)
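As for how it could work better: membership tests on a list are O(n), so the script above is quadratic in the number of lines, whereas a set gives O(1) lookups. A sketch under that assumption (the function name filter_unique is mine):

```python
import sys

def filter_unique(lines):
    # Same first-occurrence-wins behaviour as the one-liner above, but a
    # set makes each membership test O(1) instead of O(n) on a list.
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line

if __name__ == "__main__":
    for line in filter_unique(raw.rstrip("\n") for raw in sys.stdin):
        print(line)
```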