I suggest the sed solution, but for the sake of completeness, here is an awk equivalent:

awk 'NR >= 57890000 && NR <= 57890010' /path/to/file 

To stop reading the file after the last requested line:

awk 'NR < 57890000 { next } { print } NR == 57890010 { exit }' /path/to/file 
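
For comparison, here is roughly what the sed solution referred to above and the tail | head combination (the fastest approach in the benchmark below) look like for the same line range. This is a sketch rather than part of the original answer; /path/to/file is a placeholder:

# Print lines 57890000-57890010, then quit so sed does not read the rest of the file
sed -n '57890000,57890010p;57890010q' /path/to/file

# Skip to line 57890000, then take the next 11 lines (the range is inclusive);
# when head exits, the pipe closes and tail stops reading early as well
tail -n +57890000 /path/to/file | head -n 11

Both stop reading once the last requested line has been produced, and they avoid awk's per-record processing overhead, which is why they come out ahead in the timings below.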

Speed test (here on macOS, YMMV on other systems):

  • 100,000,000-line file generated by seq 100000000 > test.in
  • Reading lines 50,000,000-50,000,010
  • Tests in no particular order
  • real time, in seconds, as reported by bash's builtin time
  run 1   run 2   run 3   command
  4.373   4.418   4.395   tail -n+50000000 test.in | head -n10
  5.210   5.179   6.181   sed -n '50000000,50000010p;57890010q' test.in
  5.525   5.475   5.488   head -n50000010 test.in | tail -n10
  8.497   8.352   8.438   sed -n '50000000,50000010p' test.in
 22.826  23.154  23.195   tail -n50000001 test.in | head -n10
 25.694  25.908  27.638   ed -s test.in <<<"50000000,50000010p"
 31.348  28.140  30.574   awk 'NR<57890000{next}1;NR==57890010{exit}' test.in
 51.359  50.919  51.127   awk 'NR >= 57890000 && NR <= 57890010' test.in

These are by no means precise benchmarks, but the difference is clear and repeatable enough* to give a good sense of the relative speed of each of these commands.

*: Except between sed -n p;q and head|tail, which seem to be essentially the same.
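
If you want to repeat the comparison on your own machine, a minimal harness along these lines should do it. This sketch is not the original test script: the bench helper is my addition, only a few of the commands from the table are listed, and the sed quit address has been aligned with the printed range:

#!/bin/bash
# Generate the 100,000,000-line test file (takes a while and several hundred MB of disk).
seq 100000000 > test.in

# Run a command three times under bash's builtin time; the command's output is
# discarded so only the time spent reading the file matters.
bench() {
  for run in 1 2 3; do
    time eval "$1" > /dev/null
  done
}

bench "tail -n+50000000 test.in | head -n10"
bench "head -n50000010 test.in | tail -n10"
bench "sed -n '50000000,50000010p;50000010q' test.in"
bench "awk 'NR >= 50000000 && NR <= 50000010' test.in"

The remaining entries from the table follow the same pattern; only the command string passed to bench changes.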
