Timeline for Remove duplicates in file (without sorting!) leaving the _last_ of the occurrences
Current License: CC BY-SA 4.0
7 events
| When | What | By | License | Comment |
|---|---|---|---|---|
| Jul 8, 2024 at 10:18 | comment added | Kaz | | @Make42 Whether this solution takes more memory due to the parallel arrays depends on the Awk implementation. I can't think of any reason why the `$0` string would not simply be reference counted, so that `line[NR] = $0` increments the refcount to store the value, and `lnum[$0]` increments the refcount to store the key. More memory will be used due to more associative array entries, but not for a wholesale copy of the string data. Awk strings are not mutable, so they are shareable, similarly to strings in JavaScript. |
| Jul 8, 2024 at 10:16 | comment added | Kaz | | @Make42 Awk is very loose compared to many other scripting languages. You can delete things that don't exist, and such. If you simply mention a variable, that causes it to exist. Strings that look like numbers can be subject to arithmetic, ... |
| Jul 8, 2024 at 10:15 | comment added | Kaz | | @Make42 `line[lnum[$0]]` gets deleted. I don't know whether the `delete` operator creates the entry first if it doesn't exist and then deletes it, or whether it does nothing in the non-existence case. Certainly `lnum[$0]` gets created if it does not exist, with an undefined value, but we clobber it with a new value two statements later. |
| Jul 8, 2024 at 10:05 | vote: accept | Make42 | | |
| Jul 8, 2024 at 9:26 | comment added | Make42 | | After some research regarding 3: it seems to me that the entry for `lnum[$0]` is created if it does not exist when `lnum[$0]` is referenced. Likewise for `line[lnum[$0]]`. And afterwards it is deleted with `delete`. Is that correct? |
| Jul 8, 2024 at 8:39 | comment added | Make42 | | A couple of questions: 1. Why is the `tac` approach less portable? 2. So the `tac` approach requires half the memory, because it only needs to keep the lines in memory once, while you (approximately) need to have all lines in memory twice, once for `line` and once for `lnum`. Is that correct? 3. Does this not produce errors when you try to delete `line[lnum[$0]]` but it cannot find the lookup entry? Or is this different than in other programming languages? |
| Jul 2, 2024 at 7:45 | answered | Kaz | CC BY-SA 4.0 | |
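The comments refer to an Awk script that keeps two parallel arrays: `line`, indexed by line number (`line[NR] = $0`), and `lnum`, indexed by line text, with the earlier copy of a repeated line removed via `delete line[lnum[$0]]`. The answer's full script is not part of this timeline, so what follows is only a Python sketch of that idea, assuming it works as the comments describe; the function name `dedup_keep_last` is hypothetical.

```python
from typing import Dict, Iterable, List


def dedup_keep_last(lines: Iterable[str]) -> List[str]:
    """Drop duplicate lines, keeping only the LAST occurrence of each,
    while preserving the original relative order (no sorting)."""
    line: Dict[int, str] = {}  # line number -> text   (awk: line[NR] = $0)
    lnum: Dict[str, int] = {}  # text -> latest line number (awk: lnum[$0])
    for nr, text in enumerate(lines, start=1):
        # awk: delete line[lnum[$0]]
        # lnum.get() returns None for unseen text, and pop(..., None)
        # ignores a missing key, so this never raises.
        line.pop(lnum.get(text), None)
        line[nr] = text   # remember this occurrence by its line number
        lnum[text] = nr   # overwrite the previous line number, if any
    # Keys were inserted in increasing line-number order and only ever
    # deleted afterwards, so dict insertion order (Python 3.7+) already
    # yields the surviving lines in their original order.
    return list(line.values())
```

For example, `dedup_keep_last(["a", "b", "a", "c", "b"])` keeps the last `a`, `c` and `b`, giving `["a", "c", "b"]`. To mirror Awk's `$0`, which excludes the newline, pass lines with newlines stripped, e.g. `sys.stdin.read().splitlines()`. Both dictionaries hold references to string objects rather than copies, which is the Python counterpart of the reference-counting point in Kaz's memory comment.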
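Make42's third question is whether referencing or deleting a missing element is an error. According to the comments above, in Awk it is not: referencing `lnum[$0]` creates the element, and `delete` on a nonexistent element is silent. The Python sketch below only contrasts that with a stricter language. Using `defaultdict(str)` to imitate Awk is an approximation chosen for illustration, since Awk's uninitialized value behaves as both the empty string and zero.

```python
from collections import defaultdict

# Plain dict: reading a missing key raises KeyError.
plain = {}
try:
    plain["missing"]
    read_raised = False
except KeyError:
    read_raised = True

# Plain dict: `del` on a missing key also raises KeyError.
try:
    del plain["missing"]
    del_raised = False
except KeyError:
    del_raised = True

# Awk-like reference (approximation): defaultdict(str) creates the element
# on first reference with an empty value, as referencing lnum[$0] does in awk.
lnum = defaultdict(str)
key = lnum["never seen"]  # "never seen" is now a key whose value is ""

# Awk-like delete: pop() with a default is silent when the key is absent,
# like awk's `delete line[lnum[$0]]` for a line that has not been seen.
line = {1: "first line"}
line.pop(key, None)  # "" is not a key of `line`, so nothing happens
```

The two `try` blocks show the behavior Make42 expected from "other programming languages"; the last two statements show the looser, Awk-style behavior that makes `delete line[lnum[$0]]` safe on the first occurrence of a line.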
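The comments also mention a `tac` approach without showing it. Assuming it is the common idiom `tac file | awk '!seen[$0]++' | tac` (reverse the file, keep the first occurrence of each line, reverse back), the same logic can be sketched in Python as follows; `dedup_keep_last_reversed` is a hypothetical name. On portability, `tac` comes from GNU coreutils and is not a POSIX utility.

```python
from typing import Iterable, List, Set


def dedup_keep_last_reversed(lines: Iterable[str]) -> List[str]:
    """Keep the last occurrence of each line by reversing the input,
    keeping first occurrences, and reversing the result back."""
    seen: Set[str] = set()        # one entry per distinct line (awk: seen[$0])
    kept_reversed: List[str] = []
    for text in reversed(list(lines)):  # first pass, like the first `tac`
        if text not in seen:            # like awk '!seen[$0]++'
            seen.add(text)
            kept_reversed.append(text)
    kept_reversed.reverse()             # like the second `tac`
    return kept_reversed
```

With `["a", "b", "a", "c", "b"]` the reversed input is `b, c, a, b, a`, the first occurrences are `b, c, a`, and reversing again gives `["a", "c", "b"]`. This sketch holds the whole input once plus one `seen` entry per distinct line; how that compares with the two-array approach in practice depends on how the runtime shares string data, as Kaz notes for Awk.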