
Storing a column value while doing awk group by

I have an input file with data as follows:

1484523745 96000 2856 25059 0
1484523745 96000 2856 25150 0
1484523745 4864960 2856 997962 193
1484523745 96000 2856 24923 1
1484523745 280000 2856 61454 12
1484523746 1179968 2856 309430 1
1484523746 4864960 2856 1115576 300
1484523746 96000 2856 25059 0
1484523746 4864960 2856 997962 116
1484523746 96000 2856 25059 0
1484523746 96000 2856 25059 0
1484523746 4864960 2856 1146028 211
1484523746 4864960 2856 1115576 371
1484523746 3184960 2856 875340 1

The requirement is to aggregate columns 4 and 5 for each unique combination of columns 2 and 3, count the occurrences of each combination, and show the result along with the value of column 1 (epoch time) from the first occurrence of that combination. So the output should look like this:

96000 2856 150309 1 6 1484523745
3184960 2856 875340 1 1 1484523746
1179968 2856 309430 1 1 1484523746
280000 2856 61454 12 1 1484523745
4864960 2856 5373104 1191 5 1484523745

This was easily done on my Mac with a one-liner using datamash:

datamash -W --sort -g 2,3 sum 4,5 count 5 first 1 < inputfile
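
Here -W treats runs of whitespace as the field delimiter, --sort sorts the input before grouping, -g 2,3 groups on fields 2 and 3, sum 4,5 sums fields 4 and 5, count 5 counts the rows in each group, and first 1 takes field 1 from the first row of each group.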

However, the Linux production server where the input files reside does not have datamash, and installation access is restricted. (There are thousands of input files, so I can't FTP them to my Mac.) So I am trying to achieve the same with an awk command. I have produced the required result except for printing the value of column 1 for the first occurrence of each unique combination:

awk -F " " '{a[$2" "$3]+=$4; b[$2" "$3]+=$5; c[$2" "$3]++} END{for(i in a)print i, a[i], b[i], c[i]}' inputfile
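
For readability, the same one-liner expanded with comments (functionally equivalent; the explicit -F " " is awk's default whitespace splitting and can be omitted):

awk '
    {
        key = $2 " " $3    # group key: columns 2 and 3
        a[key] += $4       # running sum of column 4
        b[key] += $5       # running sum of column 5
        c[key]++           # occurrence count for this key
    }
    END { for (i in a) print i, a[i], b[i], c[i] }
' inputfile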

Using awk, how can I store the value of column 1 for the first occurrence of each unique combination of columns 2 and 3?
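
For reference, a minimal sketch of one way to do this: keep a fourth array that records column 1 only when the key has not been seen before, by testing the count array before incrementing it:

awk '{
    k = $2 " " $3
    if (!(k in c)) t[k] = $1    # first occurrence of this key: remember epoch time
    a[k] += $4; b[k] += $5; c[k]++
} END { for (i in a) print i, a[i], b[i], c[i], t[i] }' inputfile

Note that for (i in a) visits keys in an unspecified order, so the output lines may not match the datamash --sort ordering; piping the result through sort would restore a deterministic order if that matters.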