A weak point in my CLI-fu is awk. I could probably solve the following with elaborate scripting, but I'm pretty sure awk is the best tool for the job, and for the life of me I can't figure out the right approach.
Let's say I have a data file like this (Ledger):

    2019/05/31 (MMEX948) Gürmar
        Assets:Cash:Marina               ₺-28,14
        Expenses:Food:Groceries:Meat     ₺28,14
        Assets:Cash:Marina               ₺-28,14
        Expenses:Food:Groceries:Meat     ₺28,14
        Assets:Cash:Marina               ₺-3,45
        Expenses:Food:Groceries:Basic    ₺3,45
        Assets:Cash:Marina               ₺-15,00
        Expenses:Food:Groceries:Produce  ₺15,00

    2019/06/01 (MMEX932) A101
        Assets:Cash:Caleb                $-3.00
        Assets:Cash:Marina               $-2.50
        Expenses:Food:Groceries:Basic    $5.50

    2019/06/01 (MMEX931) Şemikler Pazar Yeri
        Assets:Cash:Marina               ₺-24,00
        Expenses:Food:Groceries:Basic    ₺24,00
        Assets:Cash:Marina               ₺-31,00
        Expenses:Food:Groceries:Meat     ₺31,00
        Assets:Cash:Marina               ₺-65,00
        Expenses:Food:Groceries:Produce  ₺65,00

Each blank-line-separated paragraph is a transaction, each indented line is a posting, and each posting has an account and an amount (separated by at least 2 spaces).
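For what it's worth, the layout just described maps fairly directly onto awk's paragraph mode: setting `RS` to the empty string makes each blank-line-separated transaction one record, and `FS = "\n"` makes each line a field. The following is only a minimal sketch of that parsing step. The function name `list_postings` and the tab-separated output are made up for illustration, and it assumes the first line of every paragraph is the transaction header.

```shell
# Sketch only: list_postings is a hypothetical helper name.
# Prints "transaction-number <TAB> account <TAB> amount" for each posting on stdin.
list_postings() {
    awk '
    BEGIN { RS = ""; FS = "\n"; OFS = "\t" }    # one paragraph per record, one line per field
    {
        for (i = 2; i <= NF; i++) {             # field 1 is assumed to be the header line
            body = $i
            sub(/^[ \t]+/, "", body)            # drop the indentation
            sub(/[ \t]+$/, "", body)            # drop trailing blanks
            if (match(body, /[ ][ ]+/)) {       # first run of two or more spaces
                account = substr(body, 1, RSTART - 1)
                amount  = substr(body, RSTART + RLENGTH)
            } else {                            # posting without an amount
                account = body
                amount  = ""
            }
            print NR, account, amount
        }
    }'
}
```

Splitting on the first run of two or more spaces (rather than on any whitespace) keeps the account in one piece even if its name contains a single space.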
I want two things to happen to this data. I don't care if these happen in the same command or not, it might be easier to do in one pass or two depending on the tool...
- All the postings with negative amounts should be arranged after the postings with positive amounts.
- Any postings with negative amounts and duplicate accounts should be merged. Ideally the amounts would be summed, but that is really complicated because of currency formats and is not necessary because I can regenerate the amount lines. Removing the amount entirely from merged postings is sufficient so long as no more than one unique account gets merged per pass.
The result should look like this:

    2019/05/31 (MMEX948) Gürmar
        Expenses:Food:Groceries:Meat     ₺28,14
        Expenses:Food:Groceries:Meat     ₺28,14
        Expenses:Food:Groceries:Basic    ₺3,45
        Expenses:Food:Groceries:Produce  ₺15,00
        Assets:Cash:Marina

    2019/06/01 (MMEX932) A101
        Expenses:Food:Groceries:Basic    $5.50
        Assets:Cash:Marina               $-2.50
        Assets:Cash:Caleb

    2019/06/01 (MMEX931) Şemikler Pazar Yeri
        Expenses:Food:Groceries:Basic    ₺24,00
        Expenses:Food:Groceries:Meat     ₺31,00
        Expenses:Food:Groceries:Produce  ₺65,00
        Assets:Cash:Marina

Notes that make this a little more complicated than just a scan for duplicates:
- In the first transaction, there are two different accounts that are duplicated. Only one of them should be merged and cleared (it would be possible to merge both, but only one per pass or I won't be able to fix the amounts).
- In the middle transaction there is nothing to merge, but it would be a mistake to blindly clear the amounts from all postings with negative amounts. Since there is no merge, nothing needs to be cleared at all, but one amount could be if that makes it easier to process.
How would I step through this problem in awk? Or if awk isn't the best solution, what is? In most scripting languages (Perl, Python, zsh) I would parse everything, throw it all into a multidimensional array, sort it primarily on a regex match of the amount and secondarily on the account name alphabetically, then iterate over it to output it, always dropping the last amount and merging only the last duplicate (if any).
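As a rough illustration of how the parse, reorder and merge steps might translate to awk without a real sort, here is a minimal sketch using only POSIX awk features. It is one possible approach under stated assumptions, not a definitive answer. The name `reorder_postings` is hypothetical, a posting counts as negative when its amount contains a `-` directly before a digit (as in the sample data), the first line of each paragraph is taken to be the header, and postings keep their input order within the positive and negative groups instead of being sorted alphabetically.

```shell
# Sketch only: reorder_postings is a hypothetical helper name.
# Reads a Ledger-style journal on stdin and writes the rearranged journal to stdout.
reorder_postings() {
    awk '
    BEGIN { RS = ""; FS = "\n" }                # paragraph mode: one transaction per record
    {
        npos = nneg = 0; merging = 0
        split("", seen)                         # portable way to empty an array

        for (i = 2; i <= NF; i++) {             # field 1 is assumed to be the header line
            body = $i
            sub(/^[ \t]+/, "", body)
            sub(/[ \t]+$/, "", body)
            if (match(body, /[ ][ ]+/)) {       # account and amount are 2+ spaces apart
                account = substr(body, 1, RSTART - 1)
                amount  = substr(body, RSTART + RLENGTH)
            } else {
                account = body; amount = ""
            }
            if (amount ~ /-[0-9]/) {            # assumption: sign sits right before the digits
                neg[++nneg] = $i
                negacct[nneg] = account
                if (++seen[account] > 1) {      # repeated negative account: keep the latest one
                    merging = 1
                    target = account
                    indent = $i
                    sub(/[^ \t].*/, "", indent) # keep only the leading whitespace
                }
            } else {
                pos[++npos] = $i                # positive or amount-less posting
            }
        }

        if (NR > 1) print ""                    # blank line between transactions
        print $1
        for (j = 1; j <= npos; j++) print pos[j]
        for (j = 1; j <= nneg; j++)
            if (!(merging && negacct[j] == target)) print neg[j]
        if (merging) print indent target        # merged posting, amount left out
    }'
}
```

With the function defined in the current shell, something like `reorder_postings < journal.ledger` (file name hypothetical) would print the rearranged transactions. At most one negative account per transaction is merged, namely the duplicated one seen last. When nothing is duplicated, as in the middle transaction, the negative postings are only moved to the end and keep their amounts, so that transaction will not match the sample result line for line.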
Note that I did work up a way to parse and merge duplicate transactions in awk the other day:

    awk 'NF { if (/^20/) { if (last != $0) print "\n" $0; last = $0 } else { print $0 } }'

But more complicated awk logic is defying me right now.