Revisions to cat line X to line Y on a huge file

missing quotes

edited Nov 14, 2019 at 16:59

If you want lines X to Y inclusive (starting the numbering at 1), use

tail -n "+$X" /path/to/file | head -n "$((Y-X+1))"

tail reads and discards the first X-1 lines (there's no way around that), then reads and prints the following lines. head reads and prints the requested number of lines, then exits. Once head has exited, tail's next write to the pipe raises SIGPIPE and kills it, so tail won't have read more than a buffer's worth of input past line Y (the pipe buffer plus tail's own read buffer, typically a few kilobytes up to a few tens of kilobytes), however large the file is.
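
A minimal sketch of the same pipeline wrapped in a POSIX sh function, assuming you want to reuse it with some argument checking; the name print_lines and its validation rules are illustrative, not part of the original answer.

```shell
# print_lines X Y FILE: print lines X to Y (inclusive, numbered from 1) of FILE.
# Hypothetical helper; it only wraps the tail | head pipeline above.
print_lines() {
    first=$1 last=$2 file=$3
    # Both bounds must be plain non-negative integers.
    for n in "$first" "$last"; do
        case $n in
            ''|*[!0-9]*)
                echo "print_lines: X and Y must be integers" >&2
                return 2 ;;
        esac
    done
    # Require 1 <= X <= Y.
    if [ "$first" -lt 1 ] || [ "$last" -lt "$first" ]; then
        echo "print_lines: need 1 <= X <= Y" >&2
        return 2
    fi
    # tail discards the first X-1 lines; head stops after Y-X+1 lines,
    # after which tail dies of SIGPIPE on its next write.
    tail -n "+$first" "$file" | head -n "$((last - first + 1))"
}
```

For example, after seq 100 >/tmp/demo, running print_lines 5 8 /tmp/demo prints 5, 6, 7 and 8 on separate lines. The function uses plain (global) variables because POSIX sh has no local.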

Alternatively, as gorkypl suggested, use sed:

sed -n -e "$X,$Y p" -e "$Y q" /path/to/file
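
For comparison, a sketch of the sed variant behind the same interface; print_lines_sed is a hypothetical name. The second expression is what keeps sed from scanning the rest of the file: it quits right after printing line Y.

```shell
# print_lines_sed X Y FILE: print lines X to Y (inclusive) of FILE with sed.
# Hypothetical helper; argument checking is omitted for brevity.
print_lines_sed() {
    # -n      : do not print lines by default
    # "X,Y p" : print every line in the range X..Y
    # "Y q"   : quit after line Y instead of reading to end of file
    sed -n -e "$1,$2 p" -e "$2 q" "$3"
}
```

The early q also makes it terminate on unbounded input: yes | sed -n -e '2,3 p' -e '3 q' prints two lines of y and exits.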

The sed solution is significantly slower, though (at least with GNU and BusyBox utilities; sed might be more competitive if you extract a large part of the file on an OS where piping is slow and sed is fast). Here are quick benchmarks: the data was generated by seq 100000000 >/tmp/a, the environment is Linux/amd64, /tmp is tmpfs, and the machine was otherwise idle and not swapping.

real (s)  user (s)  sys (s)  command
    0.47      0.32     0.12  </tmp/a tail -n +50000001 | head -n 10  #GNU
    0.86      0.64     0.21  </tmp/a tail -n +50000001 | head -n 10  #BusyBox
    3.57      3.41     0.14  sed -n -e '50000000,50000010 p' -e '50000010q' /tmp/a  #GNU
   11.91     11.68     0.14  sed -n -e '50000000,50000010 p' -e '50000010q' /tmp/a  #BusyBox
    1.04      0.60     0.46  </tmp/a tail -n +50000001 | head -n 40000001 >/dev/null  #GNU
    7.12      6.58     0.55  </tmp/a tail -n +50000001 | head -n 40000001 >/dev/null  #BusyBox
    9.95      9.54     0.28  sed -n -e '50000000,90000000 p' -e '90000000q' /tmp/a >/dev/null  #GNU
   23.76     23.13     0.31  sed -n -e '50000000,90000000 p' -e '90000000q' /tmp/a >/dev/null  #BusyBox

If you know the byte range you want, you can extract it faster by skipping directly to the start position; for lines, you have to read from the beginning and count newlines. To extract blocks x (inclusive) to y (exclusive), numbered from 0, with a block size of b bytes:

dd bs="$b" skip="$x" count="$((y-x))" </path/to/file
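
A small sketch of the block extraction, assuming a throwaway 16-byte file and a 4-byte block size picked only for illustration; extract_blocks is a hypothetical name. It relies on dd's skip= operand, which skips blocks of the input (whereas seek= skips blocks of the output).

```shell
# extract_blocks B X Y FILE: copy blocks X (inclusive) to Y (exclusive),
# numbered from 0, of B bytes each, from FILE to standard output.
# Hypothetical helper name.
extract_blocks() {
    bs=$1 first=$2 end=$3 file=$4
    # skip=X passes over the first X input blocks (dd can usually lseek
    # past them on a regular file); count=Y-X limits the copy.
    # 2>/dev/null hides dd's transfer statistics, and also its errors.
    dd bs="$bs" skip="$first" count="$((end - first))" <"$file" 2>/dev/null
}
```

With a file containing AAAABBBBCCCCDDDD, extract_blocks 4 1 3 file prints BBBBCCCC, i.e. blocks 1 and 2.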

replaced http://unix.stackexchange.com/ with https://unix.stackexchange.com/

edited Apr 13, 2017 at 12:36

Community Bot

added benchmarks for tail|head vs sed

edited Sep 8, 2012 at 11:29

Gilles 'SO- stop being evil'

answered Sep 7, 2012 at 1:39

Gilles 'SO- stop being evil'
