Identify lines with duplicate column 1 entries and swap columns if found

Question

We have the following text file:

172.55.34.48 172.55.33.95
172.55.32.163 172.55.34.48
172.55.32.163 172.55.33.95

What we need to do is swap the two columns whenever the first column contains a value that has already appeared in that column on an earlier line. Each value (IP addresses in this case) occurs at most twice.

In the above example, we need to swap 172.55.33.95 with 172.55.32.163 as 172.55.32.163 was already found on the previous line.

I tried

awk 'prev && ($1 != prev) {print seen[prev]} {seen[$1] = $0; prev = $1} END {print seen[$1]}' /tmp/new.txt

but this only removes lines whose column 1 entry was already found before; it does not swap the columns.

Is it still a "duplicate" if the fields of a line match a previous line on opposite fields? E.g., is 0.0.0.0 1.1.1.1 a duplicate of 1.1.1.1 0.0.0.0? What if there's more than one duplicate; what should happen to those lines?
– kos, Commented Oct 9, 2024 at 11:01
Each IP occurs at most twice; we just don't want duplicates in the first column.
– peng xiao, Commented Oct 9, 2024 at 11:15
After you swap 172.55.32.163 and 172.55.33.95, is 172.55.33.95 then considered to have appeared in the first column or not?
– Ed Morton, Commented Oct 9, 2024 at 12:20

AdminBee · Accepted Answer · 2024-10-09 12:12:28Z

The answer might be as simple as this awk program:

$ awk '{found[$1]++; if (found[$1]>1) {buf=$1; $1=$2; $2=buf}} 1' input.txt
172.55.34.48 172.55.33.95
172.55.32.163 172.55.34.48
172.55.33.95 172.55.32.163

This stores the occurrence count of each column 1 value in an array named found, incrementing the corresponding element on every line. If the count is larger than one, the current value was already encountered and the columns need swapping.

To do so, the column 1 value is stored in a buffer, replaced by the column 2 value, and the column 2 value in turn by the buffered value. In the end, the current line is printed including all modifications (if any), which is the meaning of the seemingly "stray" 1.
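The same logic, spread over several lines with comments, may be easier to follow. This is only a sketch: the wrapper name swap_dups is made up for illustration, and the sample data is piped in rather than read from input.txt.

```shell
# Hypothetical wrapper; reads two-column lines on stdin.
swap_dups() {
  awk '
    {
      found[$1]++              # count how often this column 1 value has appeared
      if (found[$1] > 1) {     # seen before: swap the two columns via a buffer
        buf = $1
        $1 = $2
        $2 = buf
      }
    }
    1                          # always-true pattern: print the (possibly modified) line
  '
}

# Sample data from the question
printf '%s\n' \
  '172.55.34.48 172.55.33.95' \
  '172.55.32.163 172.55.34.48' \
  '172.55.32.163 172.55.33.95' | swap_dups
```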

This simple approach has one drawback: if the new column 1 value (previously in column 2) occurs in column 1 again on a later line, or already occurred there on a previous line, this won't be caught. An improved version, which accounts for occurrences on later lines and omits lines where swapping would also produce a duplicate column 1 entry, could look as follows:

awk '{f[$1]++; if (f[$1]>1) {if (f[$2]>0) {next}; buf=$1; $1=$2; $2=buf; f[$1]++; f[$2]--}} 1' input.txt
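To see where the two versions part ways, it may help to run both on a small input that triggers the drawback. The four lines below are hypothetical (letters stand in for IP addresses) and are not taken from the question.

```shell
# Hypothetical input: letters stand in for IP addresses.
input=$(printf '%s\n' 'a b' 'a c' 'c d' 'd a')

# Simple version: counts only the original column 1 values.
simple=$(printf '%s\n' "$input" |
  awk '{found[$1]++; if (found[$1]>1) {buf=$1; $1=$2; $2=buf}} 1')

# Improved version: also counts swapped-in values and skips lines
# for which neither column is still unused in column 1.
improved=$(printf '%s\n' "$input" |
  awk '{f[$1]++; if (f[$1]>1) {if (f[$2]>0) {next}; buf=$1; $1=$2; $2=buf; f[$1]++; f[$2]--}} 1')

printf 'simple:\n%s\n\nimproved:\n%s\n' "$simple" "$improved"
```

With this input the simple version leaves c in column 1 twice (c a, then c d), whereas the improved version prints a b, c a, d c and drops the last line, because both d and a already occupy column 1 by then.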

Ed Morton · Accepted Answer · 2024-10-09 12:24:50Z

Your requirements aren't clear and your example doesn't cover all possibilities of what you might mean but, using any awk, this will implement one thing you might mean:

$ awk 'seen[$1]++{t=$2; $2=$1; $1=t} 1' file
172.55.34.48 172.55.33.95
172.55.32.163 172.55.34.48
172.55.33.95 172.55.32.163

or maybe you want:

$ awk 'seen[$1]++{t=$2; $2=$1; $1=t; seen[$1]++} 1' file
172.55.34.48 172.55.33.95
172.55.32.163 172.55.34.48
172.55.33.95 172.55.32.163

or:

$ awk 'seen[$1]++{t=$2; $2=$1; $1=t; seen[$2]--; seen[$1]++} 1' file
172.55.34.48 172.55.33.95
172.55.32.163 172.55.34.48
172.55.33.95 172.55.32.163

or:

$ awk '$1==prev{t=$2; $2=$1; $1=t} {prev=$1} 1' file
172.55.34.48 172.55.33.95
172.55.32.163 172.55.34.48
172.55.33.95 172.55.32.163
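All four variants print the same result for the sample in the question, so a different input is needed to see how they differ. The sketch below runs them on a hypothetical four-line input (letters instead of IP addresses), chosen so that the interpretations disagree.

```shell
# Hypothetical input, not from the question.
input=$(printf '%s\n' 'a b' 'a c' 'c d' 'a e')

# Variant 1: only original column 1 values count as seen.
v1=$(printf '%s\n' "$input" | awk 'seen[$1]++{t=$2; $2=$1; $1=t} 1')

# Variant 2: a value swapped into column 1 also counts as seen.
v2=$(printf '%s\n' "$input" | awk 'seen[$1]++{t=$2; $2=$1; $1=t; seen[$1]++} 1')

# Variant 3: as variant 2, but the count of the value moved to column 2 is restored.
v3=$(printf '%s\n' "$input" | awk 'seen[$1]++{t=$2; $2=$1; $1=t; seen[$2]--; seen[$1]++} 1')

# Variant 4: only compares against column 1 of the previous output line.
v4=$(printf '%s\n' "$input" | awk '$1==prev{t=$2; $2=$1; $1=t} {prev=$1} 1')

printf 'variant 1:\n%s\n\nvariant 2:\n%s\n\nvariant 3:\n%s\n\nvariant 4:\n%s\n' \
  "$v1" "$v2" "$v3" "$v4"
```

On this input, variant 1 keeps c d unchanged because it never registered the swapped-in c, variants 2 and 3 both turn that line into d c, and variant 4 leaves the last line as a e because its previous line did not start with a.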

kos · Accepted Answer · 2024-10-10 06:06:46Z

In Perl you could do:

perl -lae 'BEGIN {$, = " "} scalar(grep {$_ eq $F[0]} @buf) ? print(reverse(@F)) : do {print(@F); push(@buf, $F[0])}' input

-l: strips the input record separator (\n) from each line as it is read and appends it again to everything that is printed;
-a: auto-splits each line on whitespace into the array @F (on Perl 5.20 and later, -a also implies -n, the implicit loop over input lines).

The script sets the output field separator ($,) to a single space; then, for each line, it checks whether the first field is already stored in @buf.

If it is, it will reverse the current record and print it; otherwise, it will print the current record and add the first field to @buf.

% perl -lae 'BEGIN {$, = " "} scalar(grep {$_ eq $F[0]} @buf) ? print(reverse(@F)) : do {print(@F); push(@buf, $F[0])}' input
172.55.34.48 172.55.33.95
172.55.32.163 172.55.34.48
172.55.33.95 172.55.32.163
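For comparison with the awk answers, the logic of this Perl script can be sketched in awk as well. This is an approximation that assumes exactly two columns per line (the Perl version reverses all fields); the function name like_perl is made up for illustration.

```shell
# Hypothetical wrapper; assumes exactly two whitespace-separated columns.
like_perl() {
  awk '
    ($1 in seen) { print $2, $1; next }   # first field already remembered: print reversed
    { seen[$1]; print $1, $2 }            # otherwise remember the first field and print as is
  '
}

printf '%s\n' \
  '172.55.34.48 172.55.33.95' \
  '172.55.32.163 172.55.34.48' \
  '172.55.32.163 172.55.33.95' | like_perl
```

As in the Perl script, a first field is remembered only when its line is printed unchanged, so a value that reaches column 1 through a swap is not tracked.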
