15

What's a good way of extracting, say, lines 20-45 out of a huge text file? Non-interactively, of course!


5 Answers

13

Even simpler:

sed -n '20,45p;45q' < textfile 

The -n flag disables the default output. The address "20,45" selects lines 20 to 45, inclusive, and the "p" command prints each of them. The "45q" command quits once line 45 has been printed, so sed does not read the rest of the file.
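If the line numbers vary, the same sed program can be wrapped in a small shell function. This is only a sketch: the name print_range and the sample input from seq are assumptions for illustration, not part of the answer above.

```shell
#!/bin/sh
# print_range FIRST LAST: hypothetical helper that reads stdin and
# prints lines FIRST..LAST inclusive.
print_range() {
    # -n suppresses the default output, "FIRST,LASTp" prints the range,
    # and "LASTq" stops reading once line LAST has been processed.
    sed -n "${1},${2}p;${2}q"
}

# Sample input: the numbers 1..100, one per line.
seq 100 | print_range 20 45
```

Double quotes are needed here so the shell expands the positional parameters inside the sed program.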

2
  • 1
    ok ok, I edited it to say 20,45 :-) Commented Sep 15, 2010 at 23:20
  • Removing the q command (everything starting from ;) improved performance for me when extracting a single line, 26995107, from a 27169334-line file. Commented Apr 16, 2019 at 11:32
11

you could try:

cat textfile | head -n 45 | tail -n 26 

or

cat textfile | awk "20 <= NR && NR <= 45" 

update:

As Mahomedalid pointed out, cat is not necessary and a bit redundant, but it does make for a clean, readable command.

If cat does bother you, a better solution would be:

<textfile awk "20 <= NR && NR <= 45" 
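The 26 passed to tail is last - first + 1 (45 - 20 + 1), which is easy to get off by one. Here is a rough sketch that computes the count and hands the bounds to awk as variables. The function names range_head_tail and range_awk are made up for this example.

```shell
#!/bin/sh
# Hypothetical helpers; both read stdin and print lines FIRST..LAST inclusive.

# head keeps lines 1..LAST, then tail keeps the final LAST-FIRST+1 of those.
range_head_tail() {
    head -n "$2" | tail -n "$(( $2 - $1 + 1 ))"
}

# awk -v passes the bounds in as awk variables, so the awk program can
# stay in single quotes and the shell never expands anything inside it.
range_awk() {
    awk -v first="$1" -v last="$2" 'first <= NR && NR <= last'
}

# Sample input: the numbers 1..100, one per line.
seq 100 | range_head_tail 20 45
seq 100 | range_awk 20 45
```

One difference to keep in mind: if the input has fewer than LAST lines, the head/tail variant prints the last LAST-FIRST+1 lines of whatever is there, while the awk variant only prints lines whose numbers really fall in the range.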
5
  • 4
    awk NR==20,NR==45 textfile works too, and reads easily. Commented Sep 16, 2010 at 1:47
  • I like the use of stdin more; it has some global consistency with the rest of *nix. Commented Sep 16, 2010 at 8:52
  • 2
    Reading from command line arguments has consistency with other UNIX utilities too, and my main point was to demonstrate awk's , range operator. Commented Sep 17, 2010 at 18:10
  • lol, i meant @adam. but yes, I like your suggestion Commented Nov 3, 2010 at 4:03
  • 1
    I think @ephemient's answer is the best one here. Otherwise, the commands are rather cryptic. Commented Sep 11, 2015 at 12:52
7

This is not an answer, but I can't post it as a comment.

Another (very fast) way to do it was suggested by mikeserv here:

{ head -n 19 >/dev/null; head -n 26; } <infile 

Using the same test file as here and the same procedure, here are some benchmarks (extracting lines 1000020-1000045):

mikeserv:

{ head -n 1000019 >/dev/null; head -n 26; } <iplist
real 0m0.059s

Stefan:

head iplist -n 1000045 | tail -n 26
real 0m0.054s

These are by far the fastest solutions, and the differences between them are negligible for a single pass (I tried different ranges: a couple of lines, millions of lines, etc.).

Doing it without the pipe might offer a significant advantage, however, to an application which needed to seek over multiple ranges of lines in similar fashion, like:

for pass in 0 1 2 3 4 5 6 7 8 9
do printf "pass#$pass:\t"
   head -n99 >&3; head -n1
done <<1000LINES 3>/dev/null
$(seq 1000)
1000LINES

...which prints...

pass#0: 100
pass#1: 200
pass#2: 300
pass#3: 400
pass#4: 500
pass#5: 600
pass#6: 700
pass#7: 800
pass#8: 900
pass#9: 1000

...and only reads the file through the one time.
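To make the "several ranges, one pass" idea concrete, here is a hedged sketch with two ranges pulled from a regular file. The helper name two_ranges is hypothetical. It assumes head leaves the file offset just past the last line it consumed, which is what POSIX asks for on seekable input and what GNU head does on a regular file; it should not be expected to work when the input is a pipe.

```shell
#!/bin/sh
# two_ranges FILE A B C D: hypothetical helper that prints lines A..B and
# C..D (ascending, non-overlapping) while reading FILE only once.
# Assumption: each head leaves the shared file offset right after the
# last line it consumed (true for seekable input such as a regular file).
two_ranges() {
    {
        head -n "$(( $2 - 1 ))" >/dev/null        # skip lines 1..A-1
        head -n "$(( $3 - $2 + 1 ))"              # print lines A..B
        head -n "$(( $4 - $3 - 1 ))" >/dev/null   # skip lines B+1..C-1
        head -n "$(( $5 - $4 + 1 ))"              # print lines C..D
    } <"$1"
}

# Sample data: the numbers 1..1000 in a temporary file.
tmp=$(mktemp)
seq 1000 >"$tmp"
two_ranges "$tmp" 20 22 500 501
rm -f "$tmp"
```

All four head processes share the one open file opened by the redirection on the braces, so each picks up where the previous one stopped.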


The other sed/awk/perl solutions read the whole file, and since this is about huge files, they're not very efficient. I threw in some alternatives that exit or quit after the last line in the specified range:

Stefan:

awk "1000020 <= NR && NR <= 1000045" iplist
real 0m2.448s

vs.

awk "NR >= 1000020;NR==1000045{exit}" iplist
real 0m0.243s

dkagedal (sed):

sed -n 1000020,1000045p iplist
real 0m0.947s

vs.

sed '1,1000019d;1000045q' iplist
real 0m0.143s

Steven D:

perl -ne 'print if 1000020..1000045' iplist
real 0m2.041s

vs.

perl -ne 'print if $. >= 1000020; exit if $. >= 1000045;' iplist
real 0m0.369s
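Before swapping a full-scan command for an early-exit one, it may be worth checking that both print exactly the same lines. The sketch below does that for the sed and awk pairs on a small generated file. The 2000-line sample and the variable names are assumptions for illustration, and it only checks output equality, not timing.

```shell
#!/bin/sh
# Sample data: a hypothetical 2000-line file standing in for "iplist".
tmp=$(mktemp)
seq 2000 >"$tmp"

first=1020
last=1045

# Full-scan variants: correct, but they keep reading to end of file.
sed_full=$(sed -n "${first},${last}p" "$tmp")
awk_full=$(awk -v a="$first" -v b="$last" 'a <= NR && NR <= b' "$tmp")

# Early-exit variants: stop as soon as line $last has been handled.
sed_quit=$(sed "1,$(( first - 1 ))d;${last}q" "$tmp")
awk_quit=$(awk -v a="$first" -v b="$last" 'NR >= a; NR == b { exit }' "$tmp")

rm -f "$tmp"

# Report only if all four variants agree.
if [ "$sed_full" = "$sed_quit" ] && [ "$awk_full" = "$awk_quit" ] &&
   [ "$sed_full" = "$awk_full" ]; then
    echo "all variants print the same lines"
fi
```

To compare speed as well, prefix each command with time and run it against the real file, as in the benchmarks above.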
1
  • +1 I think this is the best answer here! It would be nice to know how long awk NR==1000020,NR==1000045 textfile takes on your system. Commented Sep 11, 2015 at 13:02
3
ruby -ne 'print if 20 .. 45' file 
2
  • 1
    a fellow rubyist, you get my vote sir Commented Sep 16, 2010 at 16:36
  • 1
    While we're at it, why not python -c 'import fileinput, sys; [sys.stdout.write(line) for nr, line in enumerate(fileinput.input()) if 19 <= nr <= 44]' too? :-P This is something that Ruby, modeled after Perl, inspired by awk/sed, can do easily. Commented Sep 17, 2010 at 18:21
2

Since sed and awk were already taken, here is a perl solution:

perl -nle "print if ($. > 19 && $. < 46)" < textfile 

Or, as pointed out in the comments:

perl -ne 'print if 20..45' textfile 
2
  • 3
    What's with all those extra characters? No need to strip and re-add newlines, flip-flop assumes comparison to line number, and diamond operator runs through arguments if provided. perl -ne'print if 20..45' textfile Commented Sep 15, 2010 at 21:09
  • 1
    Nice. -nle is a bit of a reflex I suppose, as for the rest, I have no excuse save ignorance. Commented Sep 15, 2010 at 21:17
