Linked Questions
18 questions linked to/from How to remove duplicate lines inside a text file?
0 votes
1 answer
3k views
Delete duplicate entries in a text file [duplicate]
I created a txt file using two requests, one LDAP and one SQL. Results of the two requests are stored in the same txt file. The txt file looks like this: [email protected] [email protected] user3@...
0 votes
1 answer
2k views
How can I remove duplicates from the output of tshark [duplicate]
sudo tshark -i ppp0 'tcp port 80 \ and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)' \ -R'http.request.method == "GET" && http.request.uri contains "/ABC/...
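The usual fix is to pipe the capture output through awk's order-preserving dedupe filter. A minimal sketch, with printf standing in for the tshark pipeline (the GET lines here are hypothetical data):

```shell
# Deduplicate a command's streaming output without sorting:
# awk prints each line only the first time it appears.
printf 'GET /ABC/1\nGET /ABC/2\nGET /ABC/1\n' |
    awk '!seen[$0]++'
```

With the real capture, the same `| awk '!seen[$0]++'` goes at the end of the tshark command line.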
1 vote
2 answers
945 views
Print only one value from duplicates [duplicate]
I have following content in a file. $ cat file.txt code-coverage-api jsch cloudbees-folder apache-httpcomponents-client-4-api apache-httpcomponents-client-4-api jsch apache-httpcomponents-client-...
1 vote
1 answer
1k views
Remove all duplicates from a text file without sort [duplicate]
Simply put, I have a file with lines of text that are unknown to me, something like abaa dddd bbbb cccc abaa aaaa abaa the result I'd like to get is dddd bbbb cccc aaaa where all the duplicates are ...
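Note this asks for something stronger than dedupe: every line that occurs more than once is dropped entirely. A two-pass awk over the same file (sketched here with the sample data from the question) handles it without sorting:

```shell
# Pass 1 (NR==FNR) counts occurrences of each line; pass 2
# prints only lines seen exactly once, preserving input order.
printf 'abaa\ndddd\nbbbb\ncccc\nabaa\naaaa\nabaa\n' > input.txt
awk 'NR==FNR { count[$0]++; next } count[$0]==1' input.txt input.txt
```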
-1 votes
1 answer
682 views
How to remove duplicate lines from file? [duplicate]
I have a file, for example 'a': aaa aaa bbb ccc ccc bbb ddd. After executing uniq a c I get file 'c': aaa bbb ccc bbb ddd. How to delete the duplicate bbb lines?
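The catch is that uniq only collapses adjacent duplicates. To remove every repeat regardless of position while keeping the original order, the standard idiom is awk with a seen-array, shown here on the question's sample data:

```shell
# awk remembers every line it has printed and skips later copies,
# even when they are not adjacent (unlike uniq).
printf 'aaa\naaa\nbbb\nccc\nccc\nbbb\nddd\n' | awk '!seen[$0]++'
```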
288 votes
5 answers
79k views
Why is using a shell loop to process text considered bad practice?
Is using a while loop to process text generally considered bad practice in POSIX shells? As Stéphane Chazelas pointed out, some of the reasons for not using shell loop are conceptual, reliability, ...
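A small contrast makes the point concrete. Both snippets below count non-empty lines in a file (hypothetical data.txt); the shell loop forks no extra processes but is slow per-line and easy to get subtly wrong, while awk does it in one fast, declarative pass:

```shell
printf 'one\n\ntwo\n' > data.txt

# Shell-loop version: correct only with IFS= and read -r,
# and it processes the file one read() call at a time.
n=0
while IFS= read -r line; do
    if [ -n "$line" ]; then n=$((n + 1)); fi
done < data.txt
echo "$n"

# Idiomatic awk version: NF is nonzero for non-empty lines.
awk 'NF { n++ } END { print n }' data.txt
```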
76 votes
4 answers
41k views
How does awk '!a[$0]++' work?
This one-liner removes duplicate lines from text input without pre-sorting. For example: $ cat >f q w e w r $ awk '!a[$0]++' <f q w e r $ The original code I have found on the internets read: ...
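The trick is that a[$0] starts at 0 (false) the first time a line appears, so !a[$0] is true and awk's default action (print the line) fires; the post-increment ++ then bumps the counter so every later copy evaluates to false. The question's own sample input demonstrates it:

```shell
# First occurrence: a[$0] is 0, !0 is true -> line is printed.
# Later occurrences: a[$0] >= 1, negation is false -> skipped.
printf 'q\nw\ne\nw\nr\n' | awk '!a[$0]++'
```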
9 votes
5 answers
5k views
Remove duplicate lines from a file that contains a timestamp
This question/answer has some good solutions for deleting identical lines in a file, but won't work in my case since the otherwise duplicate lines have a timestamp. Is it possible to tell awk to ...
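Yes: key the seen-array on everything except the timestamp field. A minimal sketch, assuming the timestamp is the first whitespace-separated field (the log lines here are hypothetical):

```shell
# Build the dedupe key from the line with field 1 (the timestamp)
# stripped off, then apply the usual !seen[key]++ filter.
printf '10:01 error disk full\n10:02 error disk full\n10:03 ok\n' |
    awk '{ key = $0; sub(/^[^ ]+ /, "", key) } !seen[key]++'
```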
18 votes
1 answer
14k views
How to remove duplicate lines in a large multi-GB textfile?
My question is similar to this question but with a couple of different constraints: I have a large \n delimited wordlist -- one word per line. Size of files range from 2GB to as large as 10GB. I ...
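For files larger than RAM, GNU sort is the usual answer: it performs an external merge sort, spilling to temporary files as needed. A sketch on toy data (the -S buffer size and -T temp directory are GNU extensions; tune both to the machine):

```shell
# sort -u dedupes while sorting; -S caps the in-memory buffer,
# -T picks where the spill files go (needs enough free space).
printf 'banana\napple\nbanana\ncherry\n' > words.txt
sort -u -S 64M -T /tmp words.txt
```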
10 votes
6 answers
9k views
Find duplicated column value in CSV
I'm trying to find duplicate ids in a large CSV file; there is just one record per line, but the condition to find a duplicate will be the first column. <id>,<value>,<date> example....
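When the duplicate test is only the first field, the seen-array idiom works with the key switched from $0 to $1. A sketch with hypothetical records matching the <id>,<value>,<date> layout:

```shell
# Print every record whose first CSV field has already appeared,
# i.e. the second and later occurrences of each id.
printf '1,a,2020\n2,b,2020\n1,c,2021\n' > records.csv
awk -F, 'seen[$1]++' records.csv
```

To list just the duplicated ids themselves, `cut -d, -f1 records.csv | sort | uniq -d` also works.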
4 votes
2 answers
2k views
Find the top 5 (according to number of packets sent) source IP addresses
I am doing an assignment, I'm asked to answer certain questions based on pcap file that I'm given. One of the question is to find the top 5 (according to number of packets sent) source IP addresses. ...
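The extraction step would be something like `tshark -r capture.pcap -T fields -e ip.src` (capture.pcap is a placeholder). The ranking stage is ordinary text processing, sketched here on a hypothetical list of source addresses:

```shell
# Tally each address, sort by count descending, keep the top 5.
printf '10.0.0.1\n10.0.0.2\n10.0.0.1\n10.0.0.3\n10.0.0.1\n10.0.0.2\n' |
    sort | uniq -c | sort -rn | head -n 5
```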
3 votes
3 answers
3k views
awk combine two big files and remove duplicated lines [closed]
I have two files: A.txt - about 90GB B.txt - about 80GB I want to combine two files and remove duplicated lines. How do I do this? If commands other than awk are better for this job, ...
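At 80-90GB each, awk's in-memory array is the wrong tool; GNU sort reads both files at once and dedupes while merging on disk. A sketch on toy stand-ins for A.txt and B.txt:

```shell
# sort -u over both files combines and dedupes in one external
# sort; if both inputs are already sorted, sort -m -u merges
# them without re-sorting.
printf 'alpha\nbeta\n' > A.txt
printf 'beta\ngamma\n' > B.txt
sort -u A.txt B.txt
```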
2 votes
2 answers
2k views
Removing duplicates in a large text list
I've searched around the internet and Stack Exchange for this. Even though there are lots of similar topics, I haven't found a solution yet. So, I have a quite large list (approx. 20GB), which ...
1 vote
1 answer
1k views
How to use awk to print nth column and remove duplicates?
I am using awk below to print the 8th column and remove duplicates in that very column. awk -F "," '{print $8}' filecsv | awk '!NF || !seen[$0]++' How to do it with just one awk instead of running awk twice ...
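The two stages collapse into a single awk program: test and print field 8 directly instead of piping $0 between processes. A sketch on hypothetical CSV rows:

```shell
# One pass: skip records with an empty 8th field, print the
# field the first time each value appears.
printf 'a,b,c,d,e,f,g,x\na,b,c,d,e,f,g,y\na,b,c,d,e,f,g,x\n' |
    awk -F, '$8 != "" && !seen[$8]++ { print $8 }'
```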
1 vote
2 answers
924 views
Append words to wordlist with sort -u, avoiding duplicates
I have 2 txt files: one.txt, with duplicates: yesterday yesterday today today tomorrow tomorrow and the second txt called two.txt, with duplicates: mike mike paul paul tomorrow tomorrow Using the ...