muru

Another, more memory-efficient option could be to use paste to get all the relevant lines from each file together:

% paste -d '\n' file*.dat
1 1
3 0
1 3 4
8 9 0
5 9 10 11
3 9 2 4

And then use awk on them:

# cat rowsum-paste.awk
NR > 1 && NF != prevNF {
    for (i = 1; i <= prevNF; i++) { printf "%s ", sum[i]; sum[i] = 0 }
    printf "\n"
}
{ for (i = 1; i <= NF; i++) sum[i] += $i; prevNF = NF }
% (paste -d '\n' file*.dat; echo) | awk -f rowsum-paste.awk
4 1
9 12 4
8 18 12 15

This awk code sums lines until the number of fields changes, then prints and resets the current sums. The extra echo supplies an empty line at the end, which changes the number of fields and triggers the final print; alternatively, the printing code can be duplicated in an END block.
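The END-block variant could look like the following sketch. The script name rowsum-paste-end.awk is my own, and file1.dat/file2.dat recreate sample data consistent with the output shown above:

```shell
# Sample data matching the example output (assumed contents).
printf '1 1\n1 3 4\n5 9 10 11\n' > file1.dat
printf '3 0\n8 9 0\n3 9 2 4\n' > file2.dat

# Same summing logic as rowsum-paste.awk, but the last group of sums is
# flushed in an END block, so no trailing echo is needed.
cat > rowsum-paste-end.awk <<'EOF'
NR > 1 && NF != prevNF {
    for (i = 1; i <= prevNF; i++) { printf "%s ", sum[i]; sum[i] = 0 }
    printf "\n"
}
{ for (i = 1; i <= NF; i++) sum[i] += $i; prevNF = NF }
END {
    # Guard against empty input, where prevNF was never set.
    if (prevNF) {
        for (i = 1; i <= prevNF; i++) printf "%s ", sum[i]
        printf "\n"
    }
}
EOF

paste -d '\n' file*.dat | awk -f rowsum-paste-end.awk
```

This prints the same three rows of sums as the pipeline above, without the extra echo.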




The following awk script can pretty much replace the whole shell script:

# cat rowsum.awk
FNR <= rows { for (i = 1; i <= NF; i++) sum[FNR, i] += $i }
END {
    for (i = 1; i <= rows; i++) {
        # Row i has i + 1 columns (see the note below).
        for (j = 1; j <= i + 1; j++) printf "%s ", sum[i, j]
        printf "\n"
    }
}

Example:

% awk -f rowsum.awk -v rows=2 file*.dat
4 1
9 12 4
% awk -f rowsum.awk -v rows=3 file*.dat
4 1
9 12 4
8 18 12 15

This should be faster than going through all files again and again for each row.

Note: I'm assuming the nth row has n+1 columns. If not, save the number of columns for each row (e.g., cols[FNR]=NF) and use that in the final loop.
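A sketch of that generalization, recording each row's width in cols[FNR] instead of assuming a fixed shape. The script name rowsum-cols.awk is hypothetical, and file1.dat/file2.dat recreate sample data consistent with the example output:

```shell
# Sample data matching the example output (assumed contents).
printf '1 1\n1 3 4\n5 9 10 11\n' > file1.dat
printf '3 0\n8 9 0\n3 9 2 4\n' > file2.dat

cat > rowsum-cols.awk <<'EOF'
FNR <= rows {
    for (i = 1; i <= NF; i++) sum[FNR, i] += $i
    cols[FNR] = NF   # remember this row's column count
}
END {
    for (i = 1; i <= rows; i++) {
        # Use the recorded width rather than assuming i + 1 columns.
        for (j = 1; j <= cols[i]; j++) printf "%s ", sum[i, j]
        printf "\n"
    }
}
EOF

awk -f rowsum-cols.awk -v rows=3 file*.dat
```

This relies on every file having the same shape, so cols[FNR] ends up the same whichever file set it last.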