Search a value for one column and retrieve concatenated values from other columns of a file

Question

I have a text file which has 4 columns and data looks like:

    P_ID C_ID Code MSG
    10 12 001 abcd
    20 21 003 jklm
    10 12 002 hijk

Here P_ID, C_ID, Code and MSG are columns.

The search key is column C_ID. If there are multiple rows with the same C_ID value but different Code and MSG values, the final file should contain a single row for that C_ID, with the Code and MSG values joined by commas.

The expected output is:

    P_ID C_ID Code MSG
    10 12 001,002 abcd,hijk
    20 21 003 jklm

The following is the output of the debug awk command suggested in the answer:

    1: NF=4
     $1=[P_ID]
     $2=[C_ID]
     $3=[Code]
     $4=[MSG]
    2: NF=4
     $1=[10]
     $2=[12]
     $3=[001]
     $4=[abcd]
    3: NF=4
     $1=[20]
     $2=[21]
     $3=[003]
     $4=[jklm]
    4: NF=4
     $1=[10]
     $2=[12]
     $3=[002]
     $4=[hijk]

The output of the solution provided in the first answer was:

    P_ID C_ID Code MSG
    10 12 001 abcd
    20 21 003 jklm
    10 12 002 hijk

The awk command that checks the column names and corresponding data works fine; however, the first command provided as an answer does not give the expected result.

What have you tried so far? Also, please update your question with real text files; currently you include images instead of text files.
– thanasisp, Commented Dec 7, 2020 at 6:25
I am not able to put the text in the question in the format and orientation I want, so I have added it as an attachment. I am new to Unix and am just trying to search for a column value and get the corresponding code and message column values for it.
– Ayush, Commented Dec 7, 2020 at 6:43
Source file: P_ID C_ID Code MSG 10 12 001 abcd 20 21 003 jklm 10 12 002 hijk. Target file required: P_ID C_ID Code MSG 10 12 001,002 abcd, hijk 20 21 003 jklm. Here C_ID needs to be searched first, and then we have to look at the values in the Code and MSG columns. If there are different Code and MSG values against the same C_ID, the final file should have a single row for that C_ID with Code and MSG concatenated using a comma.
– Ayush, Commented Dec 7, 2020 at 6:53
(1) I’ve fixed the files for you. (1b) Please tell us what kind of text files they are: tab-separated, fixed-width fields, comma-separated, something else? (2) I don’t recognize the term “diff file”. Just say something like “The output should be:” unless you mean something specific, in which case you should explain it. (3) For that matter, please explain what you want to do. (I mean more than a half sentence and a three-line example.) For example, (3a) what if C_ID is the same but P_ID is different? (3b) what if Code is the same? (3c) what if MSG is the same? … (Cont’d)
– G-Man Says 'Reinstate Monica', Commented Dec 7, 2020 at 7:03
(Cont’d) … Please do not respond in comments; edit your question to make it clearer and more complete.
– G-Man Says 'Reinstate Monica', Commented Dec 7, 2020 at 7:03

thanasisp · Accepted Answer · 2020-12-08 09:06:57Z

This may need to be refined when the question is clarified, but, based on its current state,

    awk '
    BEGIN { unique_vals = 0 }
    NR == 1 { print }                        # pass the header line through
    NR > 1 {
        if (seen[$2] == "") {                # first row for this C_ID
            i = seen[$2] = unique_vals++
            P_ID[i] = $1
            C_ID[i] = $2
            Code[i] = $3
            MSG[i] = $4
        } else {                             # repeated C_ID: append with commas
            i = seen[$2]
            Code[i] = Code[i] "," $3
            MSG[i] = MSG[i] "," $4
        }
    }
    END {
        for (i=0; i<unique_vals; i++) {
            printf "%-15s%-11s%-15s%s\n", P_ID[i], C_ID[i], Code[i], MSG[i]
        }
    }
    ' file

seems to do the job.
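As a quick way to check the grouping logic end to end, here is a minimal, self-contained sketch of the same idea. It assumes whitespace-separated columns and a single header line. The file names `sample.txt` and `result.txt` and the array names are placeholders chosen for illustration, and it prints space-separated fields rather than the fixed-width `printf` layout used above.

```shell
# Recreate the sample input from the question (sample.txt is a placeholder name).
cat > sample.txt <<'EOF'
P_ID C_ID Code MSG
10 12 001 abcd
20 21 003 jklm
10 12 002 hijk
EOF

# Group rows by C_ID ($2), keep first-seen order, join Code and MSG with commas.
awk '
NR == 1 { print; next }        # pass the header through unchanged
!($2 in pid) {                 # first row for this C_ID
    order[++n] = $2
    pid[$2]  = $1
    code[$2] = $3
    msg[$2]  = $4
    next
}
{                              # repeated C_ID: append to the stored values
    code[$2] = code[$2] "," $3
    msg[$2]  = msg[$2]  "," $4
}
END {
    for (i = 1; i <= n; i++) {
        k = order[i]
        print pid[k], k, code[k], msg[k]
    }
}
' sample.txt > result.txt

cat result.txt
# P_ID C_ID Code MSG
# 10 12 001,002 abcd,hijk
# 20 21 003 jklm
```

Testing membership with `in` avoids creating empty array entries as a side effect of the lookup. Like the command above, this keeps the P_ID from the first row seen for each C_ID.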

OK, I assume that you know how to run awk. If you don’t, say so. If you do, run this debug script:

    awk '
    {
        print NR ": NF=" NF
        print " $1=[" $1 "]"
        print " $2=[" $2 "]"
        print " $3=[" $3 "]"
        print " $4=[" $4 "]"
    }
    ' file

on your input file and post the output in your question. (Please use the ``` “code fences”.) Then post another comment here to let me know that you’ve done that.
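For a longer file, a condensed variant of the same check may be easier to read: it reports only the lines whose field count is not 4. This is just a sketch. The file names `check.txt` and `report.txt` are placeholders, and the sample data deliberately includes one short line.

```shell
# Build a small test file whose third line is missing the MSG column.
printf 'P_ID C_ID Code MSG\n10 12 001 abcd\n20 21 003\n' > check.txt

# Print the line number and field count for every line that does not have 4 fields.
awk 'NF != 4 { print NR ": NF=" NF }' check.txt > report.txt

cat report.txt
# 3: NF=3
```

An empty report means every line was split into exactly four fields.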

Hey, I tried it, but it is not working; instead it just shows the file content. Can someone please help?
– Ayush, Commented Dec 7, 2020 at 11:24
@Ayush please check the input file provided in the question. Do you really have an empty second line? Is the example complete? If not, repair it. Then copy and paste the answer and execute it. If you see that it doesn't work as expected, update the question with a detailed explanation of exactly what is printed. (I tested with the current example and it worked.)
– thanasisp, Commented Dec 7, 2020 at 13:05
Good point: my answer assumes that there are two header rows. If the second one isn’t there, it will treat the abcd line as the second header and pass it through unexamined, and therefore it won’t combine it with the hijk line.
– G-Man Says 'Reinstate Monica', Commented Dec 7, 2020 at 13:12
@thanasisp Sorry if there was any confusion with the data. I have updated the question with the output of the awk command that shows column names and values, along with the result from the originally supplied answer. No, I do not have an empty line, and I am removing it to avoid confusion.
– Ayush, Commented Dec 8, 2020 at 8:35
Thanks for the update. I think it's clear now, and the debugging advice was useful.
– thanasisp, Commented Dec 8, 2020 at 9:09
