What's a good way of extracting, say, lines 20-45 out of a huge text file? Non-interactively, of course!
5 Answers
Even simpler:
sed -n '20,45p;45q' < textfile
The -n flag disables the default output. The "20,45" addresses lines 20 to 45, inclusive. The "p" command prints the current line. The "45q" quits once line 45 has been printed, so sed does not read the rest of the file.
- ok ok, I edited it to say 20,45 :-) – dkagedal, commented Sep 15, 2010 at 23:20
- Removing the q command (everything starting from ;) improved performance for me when extracting single line 26995107 from a 27169334-line file. – Ruslan, commented Apr 16, 2019 at 11:32
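A minimal sketch of the same sed command with the range passed in as parameters, tried on throwaway input generated by seq. The function name print_range and the sample data are assumptions made for illustration; they are not part of the answer.

```shell
# print_range FIRST LAST: print lines FIRST..LAST of standard input, then quit.
# (print_range is a hypothetical helper name.)
print_range() {
    # -n suppresses default output; "p" prints the range; "q" stops at LAST
    sed -n "${1},${2}p;${2}q"
}

# seq 100 writes the numbers 1..100, one per line, so line N contains N
seq 100 | print_range 20 45
```

Because each line of the sample input holds its own line number, the output makes it easy to confirm that both ends of the range are inclusive.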
You could try:
cat textfile | head -n 45 | tail -n 26
or
cat textfile | awk "20 <= NR && NR <= 45"
Update:
As Mahomedalid pointed out, cat is not necessary and a bit redundant, but it does make for a clean, readable command.
If cat does bother you, a better solution would be:
<textfile awk "20 <= NR && NR <= 45"
- awk NR==20,NR==45 textfile works too, and reads easily. – ephemient, commented Sep 16, 2010 at 1:47
- I like the use of stdin more; it has some global consistency with the rest of *nix. – Stefan, commented Sep 16, 2010 at 8:52
- Reading from command line arguments has consistency with other UNIX utilities too, and my main point was to demonstrate awk's , range operator. – ephemient, commented Sep 17, 2010 at 18:10
- lol, I meant @adam. But yes, I like your suggestion. – Stefan, commented Nov 3, 2010 at 4:03
- I think @ephemient's answer is the best one here. Otherwise, the commands are rather cryptic. – Léo Léopold Hertz 준영, commented Sep 11, 2015 at 12:52
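The condition form and the range form of the awk command can be checked against each other on generated input. This is only a sketch: the helper names by_condition and by_range are invented for illustration, and the awk programs are single-quoted with the bounds passed via -v, which keeps the shell from interpreting anything inside the program.

```shell
# Both helpers read standard input and take FIRST and LAST as arguments.
# (by_condition and by_range are hypothetical names.)

# Condition form: print every line whose number lies within the bounds
by_condition() {
    awk -v first="$1" -v last="$2" 'first <= NR && NR <= last'
}

# Range form: awk's "," operator prints from the first match to the second
by_range() {
    awk -v first="$1" -v last="$2" 'NR == first, NR == last'
}

# Compare the two outputs on 100 numbered lines
if [ "$(seq 100 | by_condition 20 45)" = "$(seq 100 | by_range 20 45)" ]; then
    echo "outputs match"
fi
```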
This is not an answer, but I can't post it as a comment.
Another (very fast) way to do it was suggested by mikeserv here:
{ head -n 19 >/dev/null; head -n 26; } <infile
Using the same test file as here and the same procedure, here are some benchmarks (extracting lines 1000020-1000045):
mikeserv:
{ head -n 1000019 >/dev/null; head -n 26; } <iplist
real 0m0.059s
Stefan:
head iplist -n 1000045 | tail -n 26
real 0m0.054s
These are by far the fastest solutions, and the differences are negligible for a single pass (I tried with different ranges: a couple of lines, millions of lines, etc.).
Doing it without the pipe might offer a significant advantage, however, to an application which needed to seek over multiple ranges of lines in similar fashion, like:
for pass in 0 1 2 3 4 5 6 7 8 9
do printf "pass#$pass:\t"
   head -n99 >&3; head -n1
done <<1000LINES 3>/dev/null
$(seq 1000)
1000LINES
...which prints...
pass#0: 100
pass#1: 200
pass#2: 300
pass#3: 400
pass#4: 500
pass#5: 600
pass#6: 700
pass#7: 800
pass#8: 900
pass#9: 1000
...and only reads the file through the one time.
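The head-based loop relies on each head leaving the shared file offset just past the last line it consumed, which is only possible when the input is seekable. As a hedged alternative for input arriving on a pipe, a single awk process can pick out every 100th line in one pass. Note that this swaps awk in for the head technique; the function name every_nth is made up for illustration.

```shell
# every_nth N: print every N-th line of standard input in a single pass,
# labelled in the same "pass#K:" style as the loop output.
# (every_nth is a hypothetical helper, not something from the thread.)
every_nth() {
    awk -v n="$1" 'NR % n == 0 { printf "pass#%d:\t%s\n", NR / n - 1, $0 }'
}

# 1000 numbered lines on a pipe; lines 100, 200, ..., 1000 are printed
seq 1000 | every_nth 100
```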
The other sed/awk/perl solutions read the whole file and since this is about huge files, they're not very efficient. I threw in some alternatives that exit or quit after the last line in the specified range:
Stefan:
awk "1000020 <= NR && NR <= 1000045" iplist
real 0m2.448s
vs.
awk "NR >= 1000020;NR==1000045{exit}" iplist
real 0m0.243s
dkagedal (sed):
sed -n 1000020,1000045p iplist
real 0m0.947s
vs.
sed '1,1000019d;1000045q' iplist
real 0m0.143s
Steven D:
perl -ne 'print if 1000020..1000045' iplist
real 0m2.041s
vs.
perl -ne 'print if $. >= 1000020; exit if $. >= 1000045;' iplist
real 0m0.369s
- +1 I think this is the best answer here! It would be nice to see how long awk NR==1000020,NR==1000045 textfile takes on your system. – Léo Léopold Hertz 준영, commented Sep 11, 2015 at 13:02
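The faster command in each pair stops at the end of the range instead of scanning the rest of the file. One way to see that early exit in action, as a sketch, is to feed the awk variant an input that never ends. The helper name range_then_exit is an assumption for illustration.

```shell
# range_then_exit FIRST LAST: print lines FIRST..LAST of standard input and
# stop reading as soon as LAST has been printed.
# (range_then_exit is a hypothetical helper name.)
range_then_exit() {
    awk -v first="$1" -v last="$2" 'NR >= first; NR == last { exit }'
}

# `yes` writes "y" lines forever; the pipeline finishes only because awk
# exits at line 45. wc -l then counts the 26 lines that were printed.
yes | range_then_exit 20 45 | wc -l
```

Without the exit rule the same pipeline would never terminate, which is the extreme case of the wasted work the benchmarks measure.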
ruby -ne 'print if 20 .. 45' file
- a fellow rubyist, you get my vote sir – Stefan, commented Sep 16, 2010 at 16:36
- While we're at it, why not python -c 'import fileinput, sys; [sys.stdout.write(line) for nr, line in enumerate(fileinput.input()) if 19 <= nr <= 44]' too? :-P This is something that Ruby, modeled after Perl, inspired by awk/sed, can do easily. – ephemient, commented Sep 17, 2010 at 18:21
Since sed and awk were already taken, here is a perl solution:
perl -nle "print if ($. > 19 && $. < 46)" < textfile
Or, as pointed out in the comments:
perl -ne 'print if 20..45' textfile
- What's with all those extra characters? No need to strip and re-add newlines, flip-flop assumes comparison to line number, and diamond operator runs through arguments if provided. perl -ne'print if 20..45' textfile – ephemient, commented Sep 15, 2010 at 21:09
- Nice. -nle is a bit of a reflex I suppose; as for the rest, I have no excuse save ignorance. – Steven D, commented Sep 15, 2010 at 21:17