
Given a file with m lines, how do I get the n-th line? (m can sometimes be smaller than n.) I have tried:

    method1: sed -ne '10p' file.txt
    method2: sed -ne '10p' <file.txt
    method3: sed -ne '10{p;q;}' file.txt
    method4: awk 'NR==10' file.txt

on LeetCode's https://leetcode.com/problems/tenth-line/. method1 beats the others, and I don't know why. I would expect method3 to be faster.

Are there faster ways?

Updates:

Following @skwllsp's suggestion, I ran some commands. The results are:

    instructions   command
     428,160,537   perf stat sed -ne '10p' file.txt
     427,426,310   perf stat sed -ne '10p' <file.txt
       1,033,730   perf stat sed -ne '10{p;q;}' file.txt
       1,111,502   perf stat awk 'NR == 10 { print ; exit ;} ' file.txt

(method4 has been changed according to @Archemar's answer)

and

         777,525   perf stat tail -n +10 file.txt | head -n 1

which is far fewer instructions than method1.
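Another pipeline worth trying (my own suggestion, not one of the measured methods above) reverses the order, cutting the file down with head first:

```shell
# Print the 10th line: head exits after emitting 10 lines,
# so the rest of the file is never read; tail keeps the last of those.
head -n 10 file.txt | tail -n 1
```

One caveat: when the file has fewer than 10 lines, this prints the file's last line instead of nothing, unlike sed -ne '10{p;q;}', which prints nothing.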

  • The link you have provided requires a subscription to see the content. :/ Commented Apr 26, 2016 at 4:27
  • @sjsam: Sorry, I pasted the results-page URL by mistake. I have changed it to the right one. Commented Apr 26, 2016 at 4:31
  • If you feed sed a gigantic file, method 3 will likely stay roughly constant and method 1 will get progressively slower. How big was the file you tested? Commented Apr 26, 2016 at 4:33
  • @Wildcard: There are 7 tests. It's a black box to me. The first test case has only 10 lines. Commented Apr 26, 2016 at 4:43
  • @frams: I believe the first and second methods make no difference. I have just started a thread here. You may want to stay updated with it. Commented Apr 26, 2016 at 5:25

2 Answers


Let's measure how many instructions each method executes. I created my own test file with seq 2000000 > 2000000.txt and want to find out which method is the fastest.


    $ perf stat sed -ne '10p' 2000000.txt
    10

     Performance counter stats for 'sed -ne 10p 2000000.txt':

            203.877247 task-clock                #    0.991 CPUs utilized
                     5 context-switches          #    0.025 K/sec
                     3 cpu-migrations            #    0.015 K/sec
                   214 page-faults               #    0.001 M/sec
           405,075,423 cycles                    #    1.987 GHz             [50.20%]
       <not supported> stalled-cycles-frontend
       <not supported> stalled-cycles-backend
           838,221,677 instructions              #    2.07  insns per cycle [75.20%]
           203,113,013 branches                  #  996.251 M/sec           [74.99%]
               766,918 branch-misses             #    0.38% of all branches [75.16%]

           0.205683270 seconds time elapsed

So the first method: 838,221,677 instructions.


    $ perf stat sed -ne '10{p;q;}' 2000000.txt
    10

     Performance counter stats for 'sed -ne 10{p;q;} 2000000.txt':

              1.211558 task-clock                #    0.145 CPUs utilized
                     2 context-switches          #    0.002 M/sec
                     0 cpu-migrations            #    0.000 K/sec
                   213 page-faults               #    0.176 M/sec
             1,633,950 cycles                    #    1.349 GHz             [23.73%]
       <not supported> stalled-cycles-frontend
       <not supported> stalled-cycles-backend
               824,789 instructions              #    0.50  insns per cycle
               164,935 branches                  #  136.135 M/sec
                11,751 branch-misses             #    7.12% of all branches [83.24%]

           0.008374725 seconds time elapsed

So the third method: 824,789 instructions. That is much better than the first method.


The improved fourth method:

    $ perf stat awk 'NR == 10 { print ; exit ;} ' 2000000.txt
    10

     Performance counter stats for 'awk NR == 10 { print ; exit ;}  2000000.txt':

              1.357354 task-clock                #    0.162 CPUs utilized
                     2 context-switches          #    0.001 M/sec
                     0 cpu-migrations            #    0.000 K/sec
                   282 page-faults               #    0.208 M/sec
             1,777,749 cycles                    #    1.310 GHz             [11.54%]
       <not supported> stalled-cycles-frontend
       <not supported> stalled-cycles-backend
               919,636 instructions              #    0.52  insns per cycle
               185,695 branches                  #  136.807 M/sec
                11,218 branch-misses             #    6.04% of all branches [91.64%]

           0.008375258 seconds time elapsed

A little worse than the previous measurement, but still about as efficient as the third method.


You might repeat the same tests with your own file and see which method is the best.
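The four measurements can be wrapped in one script for convenience. This is a sketch; it assumes bash, GNU sed/awk, and perf on your PATH, and it regenerates the test file with seq:

```shell
#!/bin/bash
# Build the test file, then run perf stat on each method in turn.
seq 2000000 > 2000000.txt

perf stat sed -ne '10p' 2000000.txt                   # method 1: scans the whole file
perf stat sed -ne '10p' < 2000000.txt                 # method 2: same, input via redirection
perf stat sed -ne '10{p;q;}' 2000000.txt              # method 3: quits after line 10
perf stat awk 'NR == 10 { print; exit }' 2000000.txt  # method 4: awk with early exit
```

Each run should print the same line (10); the interesting differences are in the counter stats.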


A measurement for the second method:

    $ perf stat sed -ne '10p' <2000000.txt
    10

     Performance counter stats for 'sed -ne 10p':

            203.278584 task-clock                #    0.998 CPUs utilized
                     1 context-switches          #    0.005 K/sec
                     3 cpu-migrations            #    0.015 K/sec
                   213 page-faults               #    0.001 M/sec
           403,941,976 cycles                    #    1.987 GHz             [49.84%]
       <not supported> stalled-cycles-frontend
       <not supported> stalled-cycles-backend
           835,372,994 instructions              #    2.07  insns per cycle [74.92%]
           203,327,145 branches                  # 1000.239 M/sec           [74.90%]
               773,067 branch-misses             #    0.38% of all branches [75.35%]

           0.203714402 seconds time elapsed

It is as bad as the first method.

  • Please include the results of the second method too.. Commented Apr 26, 2016 at 7:04
  • @sjsam: I updated the question too. It seems that command <file.txt and command file.txt are different. Is <file.txt faster? Commented Apr 26, 2016 at 7:21
  • @skwllsp: Nice! There should be a measurement. I'll check it out. Commented Apr 26, 2016 at 7:24

For awk:

 awk 'NR == 10 { print ; exit ;} ' file.txt 

I think perl is faster; there was a similar question here about a year ago.
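A perl equivalent with the same early-exit idea would look like this (a sketch on my part; I have not benchmarked it against the methods above):

```shell
# $. is perl's current input line number; print line 10 and stop reading.
perl -ne 'if ($. == 10) { print; exit }' file.txt
```

Like the sed '10{p;q;}' and awk 'NR == 10 { print; exit }' variants, it never reads past line 10.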

see also

  1. cat line X to line Y on a huge file
  2. How can I get a specific line from a file?
  • It does save a lot of time for large files. Commented Apr 26, 2016 at 6:15
