2

My file is like this:

alice, bob
bob, cat
cat, dennis
cat, bob
dennis, alice

I want to remove lines where the same words are repeated in reverse order. In this example, "bob, cat" and "cat, bob" are the same pair, so "cat, bob" should be removed and my output should be:

alice, bob
bob, cat
cat, dennis
dennis, alice

How can I do this?

  • Any restrictions regarding the other lines? I.e. can the fields be resorted and the lines be resorted, too? Commented Aug 4, 2019 at 16:18
  • No such restrictions. Sorting can be done any number of times. Commented Aug 4, 2019 at 16:28

3 Answers

3

You could use a hash that is keyed on the sorted elements:

$ perl -lne 'print unless $h{join ",", sort split /, /, $_}++' file
alice, bob
bob, cat
cat, dennis
dennis, alice

For exactly 2 fields, something like this might suffice:

$ awk -F', ' '!seen[$2 FS $1]; {seen[$0]++}' file
alice, bob
bob, cat
cat, dennis
dennis, alice
  • idk what the perl script does but that awk script will use a lot more memory than necessary, see unix.stackexchange.com/a/533876/133219 for the idiomatic awk approach. Commented Aug 4, 2019 at 22:45
1

The idiomatic awk answer:

$ awk -F', ' '!seen[$1>$2 ? $1 FS $2 : $2 FS $1]++' file
alice, bob
bob, cat
cat, dennis
dennis, alice

The general approach for any number of fields is to sort them and use the sorted list as the index to seen[].
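For illustration, here is a minimal sketch of that general approach in portable awk. It is not part of the original answer: the function name dedupe_unordered is made up, the fields are assumed to be separated by ", ", and a small insertion sort stands in for a built-in sort, which POSIX awk does not have.

```shell
# dedupe_unordered (hypothetical name): print only the first line seen for
# each unordered set of fields. Assumes the fields are separated by ", ".
dedupe_unordered() {
  awk -F', ' '
    {
      # Copy the fields as strings so that comparisons are always textual.
      for (i = 1; i <= NF; i++) f[i] = $i ""
      # Insertion sort of f[1..NF].
      for (i = 2; i <= NF; i++) {
        v = f[i]
        for (j = i - 1; j >= 1 && f[j] > v; j--) f[j + 1] = f[j]
        f[j + 1] = v
      }
      # Join the sorted fields into a single index for seen[].
      key = ""
      for (i = 1; i <= NF; i++) key = key (i > 1 ? FS : "") f[i]
    }
    !seen[key]++   # print the line only the first time its key occurs
  ' "$@"
}
```

Calling dedupe_unordered file should give the same result as the two-field one-liner for the sample input, and it should also treat a line such as "c, a, b" as a repeat of "a, b, c".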

  • Can you please explain the logic? Commented Aug 8, 2019 at 20:28
  • @DeathMetal It creates a common index out of each pair of key fields by putting them in greatest-first order, so A B and B A both become the index B A. Then it just tests whether the given index has been seen before: the first time either A B or B A is encountered in the input, seen["B A"]++ is 0, the 2nd time it's 1, and so on. The ! at the front ensures that the default action of printing the current input line only occurs when seen["B A"]++ is zero, i.e. the first time it's seen in the input. Commented Aug 8, 2019 at 20:52
-1

This sorts the fields within each line, then sorts the whole file and keeps only the unique lines:

# Reads the input file, here named 1. The \n in the sed replacement needs GNU sed.
while read -r line; do
  echo "$line" | tr ' ,' '\n' | sort | tr '\n' ','
done < 1 |
  sed -e 's/^,//' -e 's/,$//' -e 's/,,/\n/g' |
  sort -u