I have a big text file (>500GB). All the ways I can find (sed, tail, and others) require writing the 500GB of content back to disk. Is there any way to quickly remove the first few lines in place, without writing 500GB to disk?
- There is no way to efficiently remove things from the start of a file. – don_crissti Feb 16, 2017 at 23:44
- Good find, don. I was about to suggest ed, but the other Q covers it. – Jeff Schaller ♦ Feb 16, 2017 at 23:51
- Thank you for pointing that out! How about removing the last line? I see it says removing the last line can be very fast, but it didn't say how. @don_crissti – 1a1a11a Feb 17, 2017 at 0:22
- Well, if you know its size in bytes you can truncate the file. For your actual problem there's also this approach... – don_crissti Feb 17, 2017 at 0:29
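The truncation idea from the last comment can be sketched as follows; this is a minimal illustration using GNU coreutils (`tail`, `wc`, `truncate`) on a small made-up demo file, not the asker's actual data:

```shell
# Demo file standing in for the huge one (path and contents are illustrative).
printf '1\n2\n3\n4\n5\n' > /tmp/demo.txt

# Size in bytes of the last line, including its trailing newline.
last_len=$(tail -n 1 /tmp/demo.txt | wc -c)

# Shrink the file by that many bytes; no data is rewritten --
# the filesystem simply records the smaller size.
truncate -s -"$last_len" /tmp/demo.txt
```

This is why removing the last line can be fast: shrinking a file never moves data, whereas removing the first line changes where every remaining byte lives.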
2 Answers
You can use sed to delete lines in place with the -i option:
$ cat foo.txt
bar
baz
lorem
$ sed -i '1d' foo.txt
$ cat foo.txt
baz
lorem

You can also delete a range of lines; for example, sed -i '1,4d' foo.txt will remove lines 1-4.
EDIT: as don pointed out in the comments, the -i option still creates a copy.
- This will also create a temporary file, write the 500GB minus a few lines to the temporary file, then overwrite the original. – don_crissti Feb 16, 2017 at 23:39
- @don_crissti: does it? It's possible, I'm not 100% familiar with sed's inner workings, but the -i option in the manual says: "edit files in place". I always assumed that meant it would just modify the file without having to create a copy. – edaemon Feb 16, 2017 at 23:42
- As Don says, sed -i ... file is equivalent to sed ... file >tmpfile && mv tmpfile file. Removing lines from a file in place (properly) is not possible, as the length of the file changes. – Kusalananda Feb 16, 2017 at 23:43
- @Kusalananda: huh, okay. Learned something new, I guess. – edaemon Feb 16, 2017 at 23:45
- Thank you for your answer, even though it didn't solve the problem. – 1a1a11a Feb 17, 2017 at 0:18
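For completeness: the closest you can get to an in-place delete with common tools is to shift the remaining bytes toward the start of the same file and then truncate it. This still rewrites the tail of the file (500GB minus a few lines), but it needs no second 500GB copy on disk. A minimal sketch assuming GNU dd (for iflag=skip_bytes) and a small made-up demo file:

```shell
file=/tmp/shift-demo.txt
printf '1\n2\n3\n4\n5\n' > "$file"

skip=2                                     # number of leading lines to drop
offset=$(head -n "$skip" "$file" | wc -c)  # bytes occupied by those lines

# Copy everything after $offset back to the start of the same file.
# The write position always trails the read position, so no data is
# overwritten before it has been read.
dd if="$file" of="$file" bs=1M skip="$offset" iflag=skip_bytes conv=notrunc 2>/dev/null

# Cut off the now-duplicated bytes at the end.
truncate -s -"$offset" "$file"
```

On Linux, fallocate --collapse-range can remove a block-aligned leading region without rewriting any data at all, but it is restricted to certain filesystems (e.g. ext4, XFS) and the removed length must be a multiple of the filesystem block size, so it cannot land exactly on an arbitrary line boundary.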
Use the tail command like this:

# tail -n +<first line to keep> filename

for example:

tail -n +1000 hugefile.txt > hugefile-wo-the-first-1000-lines.txt

And that's all. Note that tail -n +K prints the file starting from line K, so the command above removes the first 999 lines. For more information: https://es.wikipedia.org/wiki/Tail
BTW: don't be fooled if someone tells you this is exactly the opposite of what you want to do; I've tested it:
$ tail -n +3 /tmp/test
3
4
5
$ cat /tmp/test
1
2
3
4
5

- This is exactly what the OP does not want to do. – don_crissti Feb 16, 2017 at 23:20
- This method needs to write 500GB of data to the disk; my question is how to remove the first few lines in place without writing so much data. – 1a1a11a Feb 17, 2017 at 0:16