4

I would like to use itertools.islice(self._f, 0, 100, None) to read in a file piece by piece (in blocks of 100 lines) as follows:

    f = open('test.dat', 'r')
    while (some condition I look for):
        f = open(fileName, 'r')
        x = itertools.islice(f, 0, 100, None)
        doSomethingWithX(x)

My problem is, I do not know how long the file is and I am looking for a condition to stop the while loop when the end of the file is reached. But I cannot figure out how it is done.

EDIT: OK, I see the difficulty. Maybe I should reformulate the question for the case where the itertools.islice call is encapsulated in a class, like this:

    class reader:
        def __init__(self):
            self._f = open('test.dat', 'r')

        def getNext(self):
            return itertools.islice(self._f, 0, 100, None)

    R = reader()
    while (some condition I look for):
        x = R.getNext()
        doSomethingWithX(x)
2
  • Could you please include in your question, whether you are trying to iterate over lines or bytes? Commented Aug 16, 2015 at 20:56
  • I have edited my code to fetch 100 lines per iteration. Is that what you were looking for? Commented Aug 16, 2015 at 21:31

3 Answers

4

If you don't mind getting list slices, you can use iter:

    with open(filename, 'r') as f:
        for x in iter(lambda: list(itertools.islice(f, 100)), []):
            doSomethingWithX(x)

Not sure which file you are using, as you have f = .. twice and have self._f in there too.

Using your edited code:

    import itertools

    class reader:
        def __init__(self):
            self._f = open('out.csv', 'r')

        def getNext(self):
            return itertools.islice(self._f, 100)

    R = reader()
    for x in iter(lambda: list(R.getNext()), []):
        print(x)

Using a test file with the following contents, and your class code with itertools.islice(self._f, 2):

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10

outputs:

    In [15]: R = reader()

    In [16]: import itertools

    In [17]: for x in iter(lambda: list(R.getNext()),[]):
       ....:     print(x)
       ....:
    ['1\r\n', '2\r\n']
    ['3\r\n', '4\r\n']
    ['5\r\n', '6\r\n']
    ['7\r\n', '8\r\n']
    ['9\r\n', '10']
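To see what happens when the number of lines is not a multiple of the chunk size, here is a minimal sketch that uses an in-memory file instead of a file on disk. The helper name `chunk_sizes` and the 150-line input are made up for illustration; only `iter`, `itertools.islice` and `io.StringIO` come from the standard library.

```python
import io
import itertools

def chunk_sizes(f, n=100):
    """Return the length of each chunk read from the file-like object f."""
    # iter(callable, sentinel) keeps calling the lambda until it returns [].
    return [len(chunk) for chunk in iter(lambda: list(itertools.islice(f, n)), [])]

# 150 numbered lines held in memory (a stand-in for a real file)
fake_file = io.StringIO("".join(f"{i}\n" for i in range(150)))
sizes = chunk_sizes(fake_file)
print(sizes)  # [100, 50]
```

The last chunk is simply shorter; the loop only stops once a call returns an empty list.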

11 Comments

@andi, the answer still applies: this will take 100 lines at a time until you have exhausted the iterator, i.e. reached the end of the file. The only issue is whether you want a list or what exactly you want to do with x.
Maybe I am too stu*** to see it :( I will give it a shot, but it might take a while :D. Thanks already for your time.
The [] passed to iter is a sentinel value. We call list on the islice object, so when we get to the end there will be an empty list and the loop ends. You will get 100 lines in a list each iteration until the end.
but what if the file has 150 lines?
I do not get it to work in the class, sorry. Would you be able to include it in the class? Sorry for the troubles.
0

So what I was looking for was something like this:

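    import itertools
    import os

    import numpy as np

    class reader:
        def __init__(self, chunkSizes=100):
            # Binary mode, so that tell() stays usable while the file is iterated
            self._f = open('test.dat', 'rb')
            self._chunkSizes = chunkSizes  # lines per chunk; 100 is an assumed default
            self._f.seek(0, os.SEEK_END)  # find EOF
            self._EOF = self._f.tell()
            self._f.seek(0)  # go back to beginning

        def getNext(self):
            if self._f.tell() != self._EOF:
                chunk = itertools.islice(self._f, self._chunkSizes)
                return np.genfromtxt(chunk, dtype=np.float64)
            return None  # end of file reached

    R = reader()
    x = R.getNext()
    while x is not None:
        doSomethingWithX(x)
        x = R.getNext()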
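The same stop condition can probably be had without tracking the EOF position at all, by letting an empty chunk signal the end of the file. The following is only a sketch under that assumption: `ChunkReader`, `get_next` and `chunk_size` are hypothetical names, and an `io.StringIO` object stands in for the real file.

```python
import io
import itertools

class ChunkReader:
    """Hypothetical variant of the reader class that yields lists of lines."""

    def __init__(self, f, chunk_size=100):
        self._f = f                    # any iterable of lines, e.g. an open file
        self._chunk_size = chunk_size

    def get_next(self):
        # An empty list means the underlying file is exhausted.
        return list(itertools.islice(self._f, self._chunk_size))

    def __iter__(self):
        # Keep calling get_next() until it returns the sentinel [].
        return iter(self.get_next, [])

reader = ChunkReader(io.StringIO("1\n2\n3\n4\n5\n"), chunk_size=2)
chunks = list(reader)
print(chunks)  # [['1\n', '2\n'], ['3\n', '4\n'], ['5\n']]
```

Each list of lines could then be handed to a parser of your choice inside the loop, so the caller never needs to compare file positions.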


-1

You can use the readline method to easily process the chunks of 100 lines. Do as follows:

    def read_chunks(f, chunks=100):
        while True:
            # readline() returns '' once the end of the file is reached
            block = [f.readline() for i in range(chunks)]
            block = list(filter(None, block))
            if not block:
                break  # nothing left, so do not yield an empty block
            yield block

    with open("filename") as f:
        for lines in read_chunks(f):
            print(len(lines), lines)
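The `filter(None, block)` step relies on `readline()` returning an empty string only at the end of the file, while a blank line inside the file still comes back as `'\n'`. A small sketch with made-up in-memory data illustrates the difference:

```python
import io

f = io.StringIO("a\n\nb\n")  # the second line is blank
lines = [f.readline() for _ in range(5)]
print(lines)  # ['a\n', '\n', 'b\n', '', '']

# filter(None, ...) drops only the falsy '' entries produced past EOF
kept = list(filter(None, lines))
print(kept)   # ['a\n', '\n', 'b\n']
```

So blank lines in the data are kept, and only the padding produced after the end of the file is removed.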
