What's the best way to take a segment out of a text file?

Question

What's a good way of extracting, say, lines 20-45 out of a huge text file? Non-interactively, of course!

dkagedal · Accepted Answer · 2015-07-21 09:52:32Z

13

Even simpler:

sed -n '20,45p;45q' < textfile

The -n flag disables the default output. The address "20,45" selects lines 20 to 45, inclusive, and the "p" command prints each of them. The "45q" command quits after line 45 has been printed, so sed does not read the rest of the file.
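For reuse with other ranges, the same command can be wrapped in a small shell function. This is only a sketch: the name print_lines and its argument order are made up for illustration and are not part of sed.

```shell
# print_lines FIRST LAST FILE  (hypothetical helper, not a standard utility)
# Prints lines FIRST..LAST of FILE and stops reading once LAST has been printed.
print_lines() {
    first=$1 last=$2 file=$3
    # -n suppresses the default output; "first,last p" prints the range;
    # "last q" quits so the rest of a huge file is never read.
    sed -n "${first},${last}p;${last}q" < "$file"
}

# Demo on a generated 100-line file whose lines are their own line numbers.
demo=$(mktemp)
seq 1 100 > "$demo"
print_lines 20 45 "$demo"
rm -f "$demo"
```

If LAST is beyond the end of the file, sed simply prints up to the last line and stops at end of input.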

edited Jul 21, 2015 at 9:52

user79743

answered Sep 15, 2010 at 16:46

dkagedal


1

ok ok, I edited it to say 20,45 :-)

– dkagedal, Commented Sep 15, 2010 at 23:20

Removing the q command (everything starting from the ;) improved performance for me when extracting the single line 26995107 from a 27169334-line file.

– Ruslan, Commented Apr 16, 2019 at 11:32


Stefan · Accepted Answer · 2010-09-15 22:08:03Z

11

You could try:

cat textfile | head -n 45 | tail -n 26

or

cat textfile | awk "20 <= NR && NR <= 45"

Update:

As Mahomedalid pointed out, cat is not necessary and a bit redundant, but it does make for a clean, readable command.

If cat does bother you, a better solution would be:

<textfile awk "20 <= NR && NR <= 45"
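The 26 passed to tail is the length of the inclusive range, 45 - 20 + 1. Here is a minimal sketch of both approaches with that arithmetic spelled out, assuming a POSIX shell; the variable names and the generated sample file are illustrative only.

```shell
first=20
last=45
count=$((last - first + 1))   # 26 lines in the inclusive range 20..45

sample=$(mktemp)
seq 1 100 > "$sample"         # sample file whose lines are their own numbers

# head keeps lines 1..last, tail then keeps the final $count of those.
by_head_tail=$(head -n "$last" < "$sample" | tail -n "$count")

# awk prints every record whose number NR lies inside the range.
by_awk=$(awk -v first="$first" -v last="$last" 'first <= NR && NR <= last' < "$sample")

[ "$by_head_tail" = "$by_awk" ] && echo "both pipelines select the same lines"
rm -f "$sample"
```

Note that the head/tail form assumes the file has at least $last lines. On a shorter file, tail would return lines from before the requested range, whereas the awk form would just print fewer lines.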

edited Sep 15, 2010 at 22:08

answered Sep 15, 2010 at 13:25

Stefan


4

awk NR==20,NR==45 textfile works too, and reads easily.

– ephemient, Commented Sep 16, 2010 at 1:47

I like the use of stdin more; it has some global consistency with the rest of *nix.

– Stefan, Commented Sep 16, 2010 at 8:52

2

Reading from command line arguments has consistency with other UNIX utilities too, and my main point was to demonstrate awk's , range operator.

– ephemient, Commented Sep 17, 2010 at 18:10

lol, I meant @adam. But yes, I like your suggestion.

– Stefan, Commented Nov 3, 2010 at 4:03

1

I think @ephemient's answer is the best one here. Otherwise, the commands are rather cryptic.

– Léo Léopold Hertz 준영, Commented Sep 11, 2015 at 12:52


Community · Accepted Answer · 2017-04-13 12:36:37Z

This is not an answer, but I can't post it as a comment.

Another (very fast) way to do it was suggested by mikeserv here:

{ head -n 19 >/dev/null; head -n 26; } <infile

Using the same test file as here and the same procedure, here are some benchmarks (extracting lines 1000020-1000045):

mikeserv:

{ head -n 1000019 >/dev/null; head -n 26; } <iplist

real 0m0.059s

Stefan:

head iplist -n 1000045 | tail -n 26

real 0m0.054s

These are by far the fastest solutions, and for a single pass the difference between them is negligible (I tried different ranges: a couple of lines, millions of lines, etc.).

Doing it without the pipe might offer a significant advantage, however, to an application which needed to seek over multiple ranges of lines in similar fashion, like:

for pass in 0 1 2 3 4 5 6 7 8 9
do  printf "pass#$pass:\t"
    head -n99 >&3; head -n1
done <<1000LINES 3>/dev/null
$(seq 1000)
1000LINES

...which prints...

pass#0: 100
pass#1: 200
pass#2: 300
pass#3: 400
pass#4: 500
pass#5: 600
pass#6: 700
pass#7: 800
pass#8: 900
pass#9: 1000

...and only reads the file through the one time.
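The same idea extends to several ranges of a file in one pass. The sketch below assumes the input is a regular (seekable) file redirected to the group, and that head leaves the file offset just past the last line it printed, as GNU head does. On a pipe, or with a head that reads ahead without seeking back, lines could be lost. The function name, sample file and ranges are made up for illustration.

```shell
# two_ranges FILE  (hypothetical helper)
# Prints lines 20..45 and 100..102 of FILE in a single pass.
two_ranges() {
    # Every head in the group shares one open file description, so each
    # one starts reading where the previous one stopped.
    {
        head -n 19 > /dev/null    # skip lines 1..19
        head -n 26                # print lines 20..45
        head -n 54 > /dev/null    # skip lines 46..99
        head -n 3                 # print lines 100..102
    } < "$1"
}

data=$(mktemp)
seq 1 1000 > "$data"              # sample file whose lines are their own numbers
two_ranges "$data"
rm -f "$data"
```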

The other sed/awk/perl solutions read the whole file, and since this is about huge files, they're not very efficient. I threw in some alternatives that exit or quit after the last line in the specified range:

Stefan:

awk "1000020 <= NR && NR <= 1000045" iplist

real 0m2.448s

vs.

awk "NR >= 1000020;NR==1000045{exit}" iplist

real 0m0.243s

dkagedal (sed):

sed -n 1000020,1000045p iplist

real 0m0.947s

vs.

sed '1,1000019d;1000045q' iplist

real 0m0.143s

Steven D:

perl -ne 'print if 1000020..1000045' iplist

real 0m2.041s

vs.

perl -ne 'print if $. >= 1000020; exit if $. >= 1000045;' iplist

real 0m0.369s
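Before trusting timings like these, it is worth confirming that each early-exit variant prints exactly the same lines as its full-scan counterpart. A small check along the following lines could be used. The generated file and the smaller line numbers are stand-ins for iplist, and only the sed and awk pairs are covered.

```shell
f=$(mktemp)
seq 1 5000 > "$f"     # stand-in for the benchmark file
first=2020
last=2045

# Full-scan versions: every line of the file is read.
sed_full=$(sed -n "${first},${last}p" "$f")
awk_full=$(awk -v a="$first" -v b="$last" 'a <= NR && NR <= b' "$f")

# Early-exit versions: stop as soon as line $last has been handled.
sed_quit=$(sed "1,$((first - 1))d;${last}q" "$f")
awk_exit=$(awk -v a="$first" -v b="$last" 'NR >= a; NR == b { exit }' "$f")

[ "$sed_full" = "$sed_quit" ] && echo "sed variants agree"
[ "$awk_full" = "$awk_exit" ] && echo "awk variants agree"
[ "$sed_full" = "$awk_full" ] && echo "sed and awk agree"
rm -f "$f"
```

To time an individual variant, prefix it with the shell's time keyword and redirect its output to /dev/null so that terminal output does not dominate the measurement.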

+1 I think this is the best answer here! It would be nice to know how long awk NR==1000020,NR==1000045 textfile takes on your system.
– Léo Léopold Hertz 준영, Commented Sep 11, 2015 at 13:02

user1606 · Accepted Answer · 2010-09-16 04:33:38Z

3

ruby -ne 'print if 20 .. 45' file

answered Sep 16, 2010 at 4:33

user1606


1

a fellow rubyist, you get my vote sir

– Stefan, Commented Sep 16, 2010 at 16:36

1

While we're at it, why not python -c 'import fileinput, sys; [sys.stdout.write(line) for nr, line in enumerate(fileinput.input()) if 19 <= nr <= 44]' too? :-P This is something that Ruby, modeled after Perl, inspired by awk/sed, can do easily.

– ephemient, Commented Sep 17, 2010 at 18:21


Steven D · Accepted Answer · 2011-03-05 05:21:24Z

2

Since sed and awk were already taken, here is a perl solution:

perl -nle "print if ($. > 19 && $. < 46)" < textfile

Or, as pointed out in the comments:

perl -ne 'print if 20..45' textfile

edited Mar 5, 2011 at 5:21

answered Sep 15, 2010 at 19:46

Steven D


3

What's with all those extra characters? No need to strip and re-add newlines, flip-flop assumes comparison to line number, and diamond operator runs through arguments if provided. perl -ne'print if 20..45' textfile

– ephemient, Commented Sep 15, 2010 at 21:09

1

Nice. -nle is a bit of a reflex I suppose, as for the rest, I have no excuse save ignorance.

– Steven D, Commented Sep 15, 2010 at 21:17

