Timeline for awk based solution for summing the rows of multiple files
Current License: CC BY-SA 4.0
12 events
| when | what | by | license | comment | |
|---|---|---|---|---|---|
| Jul 20, 2022 at 16:17 | review | Suggested edits | | | |
| Jul 23, 2022 at 2:04 | | | | | |
| May 3, 2022 at 10:02 | comment | added | algae | Yes! This is what I had in mind. Just couldn't see past the awk. It is also the fastest of all 4 solutions. | |
| May 3, 2022 at 8:12 | comment | added | muru | @algae I added an alternative that should use much less memory | |
| May 3, 2022 at 8:10 | history | edited | muru | CC BY-SA 4.0 | added 1007 characters in body |
| May 3, 2022 at 6:23 | vote | accept | algae | ||
| May 3, 2022 at 6:21 | comment | added | Kusalananda♦ | @algae FNR is the line number in the current file. The fields of that line have to be added to the corresponding entries in each of the other files. When FNR increments, that just means you have started working on the next line from the same file. | |
| May 3, 2022 at 6:20 | comment | added | algae | @Kusalananda Then I don't understand how the first for loop works. Does arr[FNR,i] point to the FNRth row and the ith column of all the file*.dat? After each FNR increment occurs, you have a line ready to print. Though this would require another loop and not be worth it, I guess. | |
| May 3, 2022 at 6:14 | comment | added | Kusalananda♦ | @algae Since all the fields on each line have to be summed with the corresponding elements in the other files, you can't really print each line as you go, as the sum would not be "done" yet. Also, the amount of memory taken should be no more than the amount needed to store one of the files in RAM. | |
| May 3, 2022 at 6:08 | comment | added | algae | @muru Thanks! Definitely faster than my script, at the cost of quite a bit of memory, which is fine. I hadn't realised that every line of the files supplied is looped over implicitly. Would it be faster or use less memory to print each line as you go? E.g., instead of sum[FNR,i], use just sum[i], print the line, and repeat. | |
| May 3, 2022 at 5:53 | comment | added | algae | @Kusalananda Yes I've removed it. The script should have set ROWS=$(wc -l) or something. | |
| May 3, 2022 at 5:52 | comment | added | Kusalananda♦ | You could probably use `rows == "" \|\| FNR <= rows` as the first condition, or remove it completely. I'm assuming that the ROWS variable in their code is a byproduct of their way of thinking about the problem rather than a necessary part of the solution. | |
| May 3, 2022 at 5:20 | history | answered | muru | CC BY-SA 4.0 |
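
The comments above describe accumulating each field into an array keyed by FNR and the column index, then printing once every file has been read. The answer's actual code is not shown in this timeline, so the following is only a minimal sketch of that idea. The file names, sample data, and the `rows`/`cols` bookkeeping variables are assumptions made for illustration, and the sketch assumes all input files have the same shape.

```shell
#!/bin/sh
# Hypothetical sample inputs: two files of identical shape (2 rows x 3 columns).
tmp=$(mktemp -d)
printf '1 2 3\n4 5 6\n' > "$tmp/file1.dat"
printf '10 20 30\n40 50 60\n' > "$tmp/file2.dat"

result=$(awk '
{
    # FNR restarts at 1 for every input file, so sum[FNR, i] accumulates
    # row FNR, column i across all of the files.
    for (i = 1; i <= NF; i++)
        sum[FNR, i] += $i
    # Track the largest row and column counts seen (illustrative bookkeeping).
    if (FNR > rows) rows = FNR
    if (NF > cols) cols = NF
}
END {
    # The sums are only complete after the last file, so print here.
    for (r = 1; r <= rows; r++) {
        line = sum[r, 1]
        for (c = 2; c <= cols; c++)
            line = line " " sum[r, c]
        print line
    }
}' "$tmp/file1.dat" "$tmp/file2.dat")

printf '%s\n' "$result"
rm -rf "$tmp"
```

As noted in the comments, an approach like this cannot print a row until the last file has been processed, and it holds roughly one file's worth of sums in memory.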