I have a big text file (>500GB). All the ways I can find (sed, tail, and others) require writing the 500GB of content back to disk. Is there any way to quickly remove the first few lines in place, without writing 500GB to disk?
- There is no way to efficiently remove things from the start of a file. – don_crissti Feb 16, 2017 at 23:44
- Good find, don. I was about to suggest ed, but the other Q covers it. – Jeff Schaller ♦ Feb 16, 2017 at 23:51
- Thank you for pointing that out! How about removing the last line? I see it says removing the last line can be very fast, but it didn't say how. @don_crissti – 1a1a11a Feb 17, 2017 at 0:22
- Well, if you know its size in bytes you can truncate the file. For your actual problem there's also this approach... – don_crissti Feb 17, 2017 at 0:29
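The truncation idea from the last comment can be sketched as follows; this is a minimal illustration using GNU coreutils (`tail`, `wc`, `truncate`) on a small made-up demo file, not the asker's actual data:

```shell
# Demo file standing in for the huge one (path and contents are illustrative).
printf '1\n2\n3\n4\n5\n' > /tmp/demo.txt

# Size in bytes of the last line, including its trailing newline.
last_len=$(tail -n 1 /tmp/demo.txt | wc -c)

# Shrink the file by that many bytes; no data is rewritten --
# the filesystem simply records the smaller size.
truncate -s -"$last_len" /tmp/demo.txt
```

This is why removing the last line can be fast: shrinking a file never moves data, whereas removing the first line changes where every remaining byte lives.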
2 Answers
You can use sed to delete lines in place with the -i option:
$ cat foo.txt
bar
baz
lorem
$ sed -i '1d' foo.txt
$ cat foo.txt
baz
lorem

You can also delete a range of lines; for example, sed -i '1,4d' foo.txt will remove lines 1-4.
EDIT: as don pointed out in the comments, the -i option still creates a copy.
- This will also create a temporary file, write the 500GB minus a few lines to the temporary file, then overwrite the original. – don_crissti Feb 16, 2017 at 23:39
- @don_crissti: does it? It's possible, I'm not 100% familiar with sed's inner workings, but the -i option in the manual says: "edit files in place". I always assumed that meant it would just modify the file without having to create a copy. – edaemon Feb 16, 2017 at 23:42
- As Don says, sed -i ... file is equivalent to sed ... file >tmpfile && mv tmpfile file. Removing lines from a file in place (properly) is not possible, as the length of the file changes. – Kusalananda Feb 16, 2017 at 23:43
- @Kusalananda: huh, okay. Learned something new, I guess. – edaemon Feb 16, 2017 at 23:45
- Thank you for your answer, even though it didn't solve the problem. – 1a1a11a Feb 17, 2017 at 0:18
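For completeness: the closest you can get to an in-place delete with common tools is to shift the remaining bytes toward the start of the same file and then truncate it. This still rewrites the tail of the file (500GB minus a few lines), but it needs no second 500GB copy on disk. A minimal sketch assuming GNU dd (for iflag=skip_bytes) and a small made-up demo file:

```shell
file=/tmp/shift-demo.txt
printf '1\n2\n3\n4\n5\n' > "$file"

skip=2                                     # number of leading lines to drop
offset=$(head -n "$skip" "$file" | wc -c)  # bytes occupied by those lines

# Copy everything after $offset back to the start of the same file.
# The write position always trails the read position, so no data is
# overwritten before it has been read.
dd if="$file" of="$file" bs=1M skip="$offset" iflag=skip_bytes conv=notrunc 2>/dev/null

# Cut off the now-duplicated bytes at the end.
truncate -s -"$offset" "$file"
```

On Linux, fallocate --collapse-range can remove a block-aligned leading region without rewriting any data at all, but it is restricted to certain filesystems (e.g. ext4, XFS) and the removed length must be a multiple of the filesystem block size, so it cannot land exactly on an arbitrary line boundary.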
Use the tail command like this:

# tail -n +<first line to keep> filename

for example:

tail -n +1000 hugefile.txt > hugefile-wo-the-first-1000-lines.txt

And that's all. Note that tail -n +K prints the file starting from line K, so the command above removes the first 999 lines. For more information: https://es.wikipedia.org/wiki/Tail
BTW: don't be fooled if someone tells you this is exactly the opposite of what you want to do; I've tested it:
$ tail -n +3 /tmp/test
3
4
5
$ cat /tmp/test
1
2
3
4
5

- This is exactly what the OP does not want to do. – don_crissti Feb 16, 2017 at 23:20
- This method needs to write 500GB of data to the disk; my question is how to remove the first few lines in place without writing so much data. – 1a1a11a Feb 17, 2017 at 0:16