match value2 in 2 files if value 1 is exact match

Question

I have 2 files, each containing a list. Column 1 holds userIds and column 2 holds the associated values.

    # cat file1
    e3001 75
    n5244 30
    w1453 500

    # cat file2
    d1128 30
    w1453 515
    n5244 30
    e3001 55

Things to consider.

userIds may not be sorted exactly in both files
Number of userIds may vary in files

REQUIRED

First, a userId from file1:column1 must match a userId in file2:column1.
Next, compare their values in file1:column2 and file2:column2.
Print the userIds whose values differ, plus any userIds that appear in only one file.

OUTPUT:

    e3001 has difference, file1 value: 75 & file2 value: 55
    w1453 has difference, file1 value: 500 & file2 value: 515
    d1128 is only present in filename: file1|file2

A solution as an awk one-liner or a bash loop is welcome.

I'm trying a loop, but it's printing garbage; I guess there's a logic error somewhere.

    #!/usr/bin/env bash

    ## VARIABLES
    FILE1=file1
    FILE2=file2
    USERID1=(`awk -F'\t' '{ print $1 }' ${FILE1}`)
    USERID2=(`awk -F'\t' '{ print $1 }' ${FILE2}`)
    USERDON1=(`awk -F'\t' '{ print $2 }' ${FILE1}`)
    USERDON2=(`awk -F'\t' '{ print $2 }' ${FILE2}`)

    for user in ${USERID1[@]}
    do
        for (( i = 0; i < "${#USERID2[@]}"; i++ ))
        #for user in ${USERID2[@]}
        do
            if [[ ${USERID1[$user]} == ${USERID2[i]} ]]
            then
                echo ${USERID1[$user]} MATCHES BALANCE FROM ${FILE1}: ${USERDON1[$i]} WITH BALANCE FROM ${FILE2}: ${USERDON2[$i]}
            else
                echo ${USERID1[$user]}
            fi
        done
    done

Below is the file copied straight from my Linux box. It's tab-separated, but awk handles tabs too, as far as I know.

    # cat file1
    e3001 55
    n5244 30
    w1453 515

@RudiC Hi Rudic, I updated what I'm trying in code, but short of some logic I guess — Sollosa
– Sollosa, Commented Apr 10, 2022 at 14:51
Don't use all upper case variable names in awk or shell: they can clash with builtin names and they obfuscate your code. See correct-bash-and-shell-script-variable-capitalization — Ed Morton
– Ed Morton, Commented Apr 10, 2022 at 19:08

terdon · Accepted Answer · 2022-04-10 15:59:00Z

Hmmm - your script takes the scenic route, so to speak. How about a simple awk approach? Like

    awk '
    NR==FNR         {ARR[$1] = $2
                     F1 = FILENAME
                     next
                    }
    ($1 in ARR)     {if ($2 != ARR[$1]) print $1 " has difference, " \
                                              F1 " value: " ARR[$1] \
                                              " & " FILENAME " value: " $2
                     delete ARR[$1]
                     next
                    }
                    {print $1 " is only present in filename: " FILENAME
                    }
    END             {for (a in ARR) print a " is only present in filename: " F1
                    }
    ' file[12]
    d1128 is only present in filename: file2
    w1453 has difference, file1 value: 500 & file2 value: 515
    e3001 has difference, file1 value: 75 & file2 value: 55

It reads all of file1 into an array, then, with every line in file2, checks $1 against the array indices, and, if present, prints the difference (or doesn't print if none), and deletes the array element (that delete may be missing in some awk implementations, BTW). If not present, print accordingly. In the END section, all remaining array elements are printed as they exist only in file1.
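To try the steps just described on the question's original sample data, here is a minimal, self-contained sketch. The function name `compare_files`, the array name `seen`, and the temporary directory are illustrative assumptions, not part of the answer; the output is piped through `sort` only because awk's `for (... in ...)` order is unspecified.

```shell
#!/usr/bin/env bash
# Sketch of the two-file awk idiom on the question's sample data.
# compare_files, seen and the temp directory are hypothetical names.

compare_files() {
    # $1 = first file, $2 = second file
    awk '
        NR == FNR { seen[$1] = $2; next }     # first file: remember id -> value
        $1 in seen {                          # id exists in both files
            if ($2 != seen[$1]) {
                print $1 " has difference, " ARGV[1] " value: " seen[$1] \
                      " & " FILENAME " value: " $2
            }
            delete seen[$1]                   # handled, so drop it
            next
        }
        { print $1 " is only present in filename: " FILENAME }
        END { for (id in seen) print id " is only present in filename: " ARGV[1] }
    ' "$1" "$2" | sort                        # sort to get a stable order
}

tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf '%s\t%s\n' e3001 75 n5244 30 w1453 500 > file1
printf '%s\t%s\n' d1128 30 w1453 515 n5244 30 e3001 55 > file2
compare_files file1 file2
```

With those two files it should print the d1128 line for file2 followed by the e3001 (75 vs 55) and w1453 (500 vs 515) differences.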

In my case, it's only printing this line d1128 is only present in filename: file2 — Sollosa
– Sollosa, Commented Apr 10, 2022 at 15:08
What awk version do you use? Did you copy the script verbatim? — RudiC
– RudiC, Commented Apr 10, 2022 at 15:41
Sollosa: With your NEW file1 (that from the third edit), there's NO differences; just the d1128 is missing! That's why my as well as @terdon 's approach are outputting just that line! What if you run it with your initial file1? — RudiC
– RudiC, Commented Apr 10, 2022 at 17:52
I really want to upvote this answer instead of posting my own that's the same approach but your insistence on using all upper case variable names is a real show-stopper. Since multiple people have already suggested you not use all upper case variable names and you keep doing it anyway there's no point commenting about that again so unfortunately I guess all I can do is post my own very similar answer.. — Ed Morton
– Ed Morton, Commented Apr 10, 2022 at 19:28

DanieleGrassini · Accepted Answer · 2022-04-10 16:31:53Z

The comments are self-explanatory:

    awk '
    BEGIN {file1 = ARGV[1]; file2 = ARGV[2]}

    # Load all file1 contents
    NR == FNR {map[$1] = $2; next}

    # If $1 is not in map then this key is unique to file2
    !($1 in map) {uniq[$1]; next}

    # If $1 is in map and the value differs, there is a delta
    # between the two files. Save it.
    $1 in map && map[$1] != $2 {diff[$1] = $2; next}

    # Both files hold the same value for this key.
    {delete map[$1]}

    END {
        # Anything in diff is in both files but
        # with different values
        for ( i in diff )
            print i, "has difference,", file1, "value:", map[i], "&", file2, "value:", diff[i]

        # Anything still in map (and not in diff) is only in file1
        for ( i in map )
            if (!(i in diff))
                print i, "is only present in filename :", file1

        # Anything in uniq is unique to file2
        for ( i in uniq )
            print i, "is only present in filename :", file2
    }
    ' file1 file2

terdon · Accepted Answer · 2022-04-10 22:44:50Z

The shell is a horrible tool for this sort of thing. Also, as a general rule, you should avoid CAPS for your shell variables in your shell scripts. Since, by convention, global environment shell variables are capitalized, this can lead to naming collisions and hard to debug issues. Finally, your script requires reading the file 4 separate times(!) and then processing the data.

With that said, here's another awk approach (frankly, RudiC's is better, but I'd already written this so I'm posting anyway):

    $ awk '{
            if(NR==FNR){
                fn1=FILENAME;
                f1[$1]=$2;
                next
            }
            f2[$1]=$2;
            if($1 in f1){
                if($2 != f1[$1]){
                    printf "%s is different; %s value: %s & %s value: %s\n", \
                           $1,fn1,f1[$1],FILENAME,$2
                }
            }
            else{
                print $1,"is only present in filename:", FILENAME
            }
          }
          END{
            for(id in f1){
                if( !(id in f2) ){
                    print id,"is only present in filename:",fn1
                }
            }
          }' file1 file2
    d1128 is only present in filename: file2
    w1453 is different; file1 value: 500 & file2 value: 515
    e3001 is different; file1 value: 75 & file2 value: 55
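For comparison, if a shell loop is still wanted, reading each file once into an associative array avoids the repeated awk calls criticised above. This is only a sketch: it assumes bash 4 or later, whitespace-separated columns, and userIds without special characters. The names `compare_in_bash`, `first_values` and `second_values` are hypothetical, and awk remains the better fit for this job.

```shell
#!/usr/bin/env bash
# Sketch only: needs bash 4+ for associative arrays.
# Function and variable names are illustrative assumptions.

compare_in_bash() {
    local first=$1 second=$2 id value
    local -A first_values=() second_values=()

    # One pass per file: column 1 is the key, column 2 the value.
    while read -r id value; do
        [[ -n $id ]] || continue
        first_values[$id]=$value
    done < "$first"
    while read -r id value; do
        [[ -n $id ]] || continue
        second_values[$id]=$value
    done < "$second"

    # Ids from the first file: missing from the second, or with a different value.
    for id in "${!first_values[@]}"; do
        if [[ -z ${second_values[$id]+x} ]]; then
            echo "$id is only present in filename: $first"
        elif [[ ${first_values[$id]} != "${second_values[$id]}" ]]; then
            echo "$id has difference, $first value: ${first_values[$id]} & $second value: ${second_values[$id]}"
        fi
    done

    # Ids that exist only in the second file.
    for id in "${!second_values[@]}"; do
        if [[ -z ${first_values[$id]+x} ]]; then
            echo "$id is only present in filename: $second"
        fi
    done
}

tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf '%s\t%s\n' e3001 75 n5244 30 w1453 500 > file1
printf '%s\t%s\n' d1128 30 w1453 515 n5244 30 e3001 55 > file2
compare_in_bash file1 file2 | sort   # array key order is unspecified, so sort
```

The lowercase variable names follow the naming advice given elsewhere in this thread.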

Don't see c[...] used any further after being incremented? — RudiC
– RudiC, Commented Apr 10, 2022 at 15:56
Thanks, @RudiC, that was a remnant from a different approach I'd tried. Fixed now. Sadly, I didn't think of del to clear the element from the array so your approach is much cleverer and more elegant. — terdon
– terdon ♦, Commented Apr 10, 2022 at 16:00
@Sollosa then your files are not as you show. Are these Windows text files, perhaps? Did you ever open them on a Windows machine? Do they have \r characters? Try sed -n '/\r/p' file1 file2, if that prints anything, you need to remove the \r characters. — terdon
– terdon ♦, Commented Apr 10, 2022 at 17:01
@terdon no such characters, even files created on linux, I even tried to add FS as tab in begin section but no luck. I'll copy paste sample in below section of my question from my linux vm. — Sollosa
– Sollosa, Commented Apr 10, 2022 at 17:05
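The carriage-return check suggested in the comments above can be sketched as a pair of small helpers. Stray `\r` characters did not turn out to be the cause in this thread, but when they are present they end up inside the last field and make every value comparison fail. The names `has_cr`, `strip_cr`, `dos_file` and `unix_file` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: detect and strip carriage returns from a text file.
# Helper and file names are illustrative assumptions.

has_cr() {
    # Succeeds if the file contains at least one carriage return.
    grep -q "$(printf '\r')" "$1"
}

strip_cr() {
    # Writes a copy of $1 without carriage returns to $2.
    tr -d '\r' < "$1" > "$2"
}

tmpdir=$(mktemp -d)
# Build a small file with Windows-style line endings to test against.
printf 'e3001\t75\r\nn5244\t30\r\n' > "$tmpdir/dos_file"

if has_cr "$tmpdir/dos_file"; then
    strip_cr "$tmpdir/dos_file" "$tmpdir/unix_file"
fi
```

Writing to a second file rather than editing in place keeps the original available for inspection.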

Ed Morton · Accepted Answer · 2022-04-10 19:32:32Z

Essentially the same solution as posted by RudiC but without the all upper case variable names and with a couple of other minor improvements to clarity:

    $ cat tst.awk
    NR==FNR {
        file1[$1] = $2
        next
    }
    $1 in file1 {
        if ( $2 != file1[$1] ) {
            printf "%s has difference, %s value: %s & %s value: %s\n", $1, ARGV[1], file1[$1], FILENAME, $2
        }
        delete file1[$1]
        next
    }
    { print $1, "is only present in filename:", FILENAME }
    END {
        for ( id in file1 ) {
            print id, "is only present in filename:", ARGV[1]
        }
    }

    $ awk -f tst.awk file1 file2
    d1128 is only present in filename: file2
    w1453 has difference, file1 value: 500 & file2 value: 515
    e3001 has difference, file1 value: 75 & file2 value: 55

αғsнιη · Accepted Answer · 2022-04-10 19:58:44Z

    awk 'function printUniq(Id, fName){
             printf("%s is only present in filename: %s\n", Id, fName)
         }

         { fileName[nxtinput+0]=FILENAME }
         !nxtinput{ Ids[$1]=$2; next }

         ($1 in Ids){
             if($2!=Ids[$1])
                 printf ("%s has difference, %s value: %s & %s value: %s\n",\
                          $1, fileName[0], Ids[$1], fileName[1], $2);
             delete Ids[$1];
             next
         }
         { printUniq($1, fileName[1]) }

    END{ for(id in Ids) printUniq(id, fileName[0]) }' file1 nxtinput=1 file2
