
I am currently using a C++ program with a Python wrapper to manipulate a large (15 GB) text file line by line. Effectively, it reads a line from input.txt, processes it, then outputs the result to output.txt. I am using the straightforward loop here (inp being opened as input.txt, out being opened as output.txt):

    for line in inp:
        result = operate(line)
        out.write(result)

However, because of issues in the C++ program, it has some failure rate, which causes the loop to stop after about ten million iterations. This leaves me with an output file covering only about 10% of the input.

Since I have no means of fixing the original program, I thought about simply restarting it where it stopped. I counted the lines of output.txt, created another file called output2.txt, and ran the following code:

    k = 0
    for line in inp:
        if k < 12123253:
            k += 1
        else:
            result = operate(line)
            out2.write(result)
            k += 1

However, while counting the lines finished in under a minute, this method takes many hours just to reach the designated line.

Why is this method so inefficient, and is there a faster one? I am on a Windows PC with plenty of computing power (72 GB RAM, good processors), using Python 2.7.


2 Answers


I suggest you use itertools:

    import itertools

    with open(inp) as f:
        result = itertools.islice(f, start_line, None)
        for i in result:
            # do something with this line
            pass
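Applied to the question's setup, a minimal sketch might look like this (assuming the operate function and file names from the question, and that the 12123253 already-written lines should be skipped):

    import itertools

    SKIP = 12123253  # lines already present in output.txt

    with open('input.txt') as inp, open('output2.txt', 'w') as out2:
        for line in itertools.islice(inp, SKIP, None):
            out2.write(operate(line))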

1 Comment

This will still read through the whole file up to the point of interest.
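For intuition, the skipping that islice does is roughly equivalent to the sketch below (not the actual itertools implementation): every skipped line is still read from disk, so the savings come only from avoiding the per-line Python-level counting.

    def skip_lines(f, n):
        # each skipped line is still consumed from disk; only the
        # Python-level counter bookkeeping of the original loop is avoided
        for _ in xrange(n):
            f.readline()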

You may use file.seek and file.tell. Below is some sample (pseudo)code:

    def serializebreakpoint(pos):
        pass

    def deserializebreakpoint():
        '''return -1 if there is actually no break point'''
        pass

    def process(inp):
        # Use readline() rather than "for line in inp": iterating a file
        # in Python 2 uses a read-ahead buffer, so tell() would report a
        # position beyond the line actually being processed.
        pos = inp.tell()
        while True:
            line = inp.readline()
            if not line:
                break
            try:
                result = operate(line)
                pos = inp.tell()
            except Exception:
                serializebreakpoint(pos)
                raise

    def processEntry(pathtoinput):
        bp = deserializebreakpoint()
        with open(pathtoinput, 'r') as inp:
            if bp > -1:
                inp.seek(bp)
            process(inp)
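As a usage note, here is one minimal way the two stubs could be filled in, assuming the offset is persisted in a small side file (checkpoint.txt is a hypothetical name, not part of the original answer):

    import os

    CHECKPOINT = 'checkpoint.txt'  # hypothetical side file holding the input byte offset

    def serializebreakpoint(pos):
        # persist the byte offset of the first unprocessed line
        with open(CHECKPOINT, 'w') as f:
            f.write(str(pos))

    def deserializebreakpoint():
        # return -1 if there is actually no break point yet
        if not os.path.exists(CHECKPOINT):
            return -1
        with open(CHECKPOINT) as f:
            return int(f.read())

On a later run, processEntry then seeks directly to the saved byte offset instead of re-reading the skipped prefix, which is what makes this approach fast on a 15 GB file.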
