Is it possible to determine how many lines exist in a file without per-line iteration? [duplicate]

Question

Possible Duplicate:
How to get line count cheaply in Python?

Good day. I have the code below, which reads a file line by line and increments a counter.

    def __set_quantity_filled_lines_in_file(self):
        count = 0
        with open(self.filename, 'r') as f:
            for line in f:
                count += 1
        return count

My question is: is there a way to determine how many lines of text the current file contains without iterating over it line by line?

Thanks!

4 upvotes for the "duplicate" comment, but only 3 close votes?
– glglgl, Commented May 12, 2012 at 8:55
@glglgl: I don't know if it's the reason, but new users get the ability to vote on comments before they get the ability to vote to close.
– Steve Jessop, Commented May 12, 2012 at 10:43

Li-aung Yip · Accepted Answer · 2012-05-12 08:53:15Z

In general it's not possible to do better than reading every character in the file and counting newline characters.
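A minimal sketch of what "reading every character and counting newlines" can look like without a per-line loop: the file is read in fixed-size binary chunks and `bytes.count` tallies the `\n` bytes in each one. The function name `count_newlines` and the 1 MiB default chunk size are illustrative choices, not something from the question. The work is still O(n) in the file size; only the memory use stays bounded.

```python
def count_newlines(path, chunk_size=1 << 20):
    """Count b'\\n' bytes in a file, reading fixed-size binary chunks."""
    count = 0
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            count += chunk.count(b"\n")
    return count
```

Note that this counts newline bytes, so a final line without a trailing `\n` is not included in the result.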

It may be possible if you know details about the internal structure of the file. For example, if the file is 1024 kB long and every line is exactly 1 kB long, you can deduce that the file contains 1024 lines.
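For the fixed-length case, the arithmetic can be sketched as below. This assumes every line, terminator included, occupies exactly `line_length` bytes; the helper name `count_fixed_length_lines` is made up for illustration. Only the file size is consulted, so the content is never read.

```python
import os


def count_fixed_length_lines(path, line_length):
    """Derive the line count from the file size alone.

    Assumes every line, including its terminator, occupies exactly
    line_length bytes; the file content is never read.
    """
    size = os.path.getsize(path)
    if size % line_length:
        raise ValueError("file size is not a multiple of the line length")
    return size // line_length
```

With the numbers from the example above, a file of 1024 * 1024 bytes and `line_length=1024` gives 1024 lines.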

I have a different number of characters in each line, but thank you for your answer!

user845279 · Accepted Answer · 2012-05-12 08:30:24Z

I'm not sure whether Python has such a function (I highly doubt it), but it would essentially require reading the whole file anyway. A newline is signified by the \n character (the actual terminator is system dependent), so there is no way to know how many of them a file contains without going through the whole file.

Jay M · Accepted Answer · 2012-05-12 08:48:01Z

1

You could use the readlines() file method; this is probably the easiest way.

If you want to be different, you could use the read() member function to get the entire file and count the CR, LF, CRLF, and LFCR character combinations with the collections.Counter class.
However, you will have to deal with the various ways of terminating lines.
Something like:

    import collections
    import re

    # Read the whole file as bytes so line endings are not translated.
    with open("myfile", "rb") as f:
        d = f.read()

    # Counter(d) alone tallies single bytes and never sees two-byte pairs,
    # so split out the terminators first; two-byte forms are tried first.
    c = collections.Counter(re.findall(rb"\r\n|\n\r|\r|\n", d))
    nlines = sum(c.values())

answered May 12, 2012 at 8:48

Jay M


5 Comments

Dmitry Zagorulkin Over a year ago

I'm not interested in the easiest way; I'm looking for the fastest and most scalable way to perform this action.

Jay M Over a year ago

Assuming your files are always smaller than 2 GB, the fastest and most scalable way is going to be to do it in C. Create a Python extension in C which just counts lines from a buffer in memory.

anatoly techtonik Over a year ago

'\n\r' will be treated as two lines on most platforms, no?

anatoly techtonik Over a year ago

@JasonMorgan, nah - this approach doesn't work - stackoverflow.com/questions/29695861/…

Jay M Over a year ago

@techtonik I already stated that, if required, you would have to handle multiple platforms in my answer. Thanks for the link to the other question related to this.
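Short of writing a C extension, the "count lines from a buffer in memory" idea from the comments can be approximated in pure Python with the standard `mmap` module, whose `find` method does the scanning in C. This is only a sketch: the name `count_lines_mmap` is illustrative, it counts `\n` bytes only, and whether it beats plain chunked reads depends on the platform and the file.

```python
import mmap


def count_lines_mmap(path):
    """Count b'\\n' bytes by scanning a read-only memory map of the file."""
    count = 0
    with open(path, "rb") as f:
        # mmap cannot map an empty file, so handle that case up front.
        if f.seek(0, 2) == 0:
            return 0
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(b"\n")
            while pos != -1:
                count += 1
                pos = mm.find(b"\n", pos + 1)
    return count
```

The whole file is still scanned once; the mapping just lets the operating system page the data in instead of copying it into a Python bytes object.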

ThiefMaster · Accepted Answer · 2012-05-12 08:55:00Z

1

No, such information can only be retrieved by iterating over the whole file's content (or by reading the whole file into memory, but unless you know for sure that the files will always be small, don't even think about doing this).

Even if you do not loop over the file contents, the functions you call do. For example, len(f.readlines()) will read the whole file into a list just to count the number of elements. That's horribly inefficient since you don't need to store the file contents at all.
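To make the memory difference concrete, here is a small sketch comparing the two approaches; both function names are made up for illustration. The first lets each line be discarded as soon as it is counted, the second is the `readlines()` variant that holds every line in a list first.

```python
def count_lines_streaming(path):
    """Count lines without keeping them: each line is discarded once counted."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def count_lines_readlines(path):
    """Same result, but builds a list holding every line before counting."""
    with open(path, "rb") as f:
        return len(f.readlines())
```

Both read the entire file and return the same number; they differ only in peak memory use.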

edited May 12, 2012 at 8:55

answered May 12, 2012 at 8:27

ThiefMaster


6 Comments

Jay M Over a year ago

I think other posts here have proved this statement untrue. Iteration is not the only way.

Li-aung Yip Over a year ago

@JasonMorgan - are you saying you know how to count the occurrences of \r\n in a file in less than O(n) time? If so, please provide details.

glglgl Over a year ago

@JasonMorgan What else does e.g. your Counter() do other than iterate over the file's content? And what does your f.read() do other than read the whole file content, using an unnecessary amount of memory?

ThiefMaster Over a year ago

@JasonMorgan: I was not talking about the code but about what actually happens. len(f.readlines()) does it without iterating manually, but the whole file is read into a list and then thrown away after determining its length. So it's a waste of memory (although that only applies for a rather short time).

Dmitry Zagorulkin Over a year ago

Thank you, Jason. I think I will store this information (written by another process) in a few bytes of the file. When I need to know how many lines of text the file contains, I will read those bytes.
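One way the idea in the last comment could look, purely as a sketch: the writer keeps a running line count in a fixed-size header at the start of the file, and readers fetch only that header. The 8-byte little-endian layout and the names `create_counted_file`, `append_line`, and `line_count` are assumptions for illustration, not anything specified in the thread.

```python
import struct

# Hypothetical layout: the first 8 bytes hold the line count (little-endian).
HEADER = struct.Struct("<Q")


def create_counted_file(path):
    """Create an empty file whose header records a line count of zero."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(0))


def append_line(path, text):
    """Append one line after the existing data and bump the stored count."""
    with open(path, "r+b") as f:
        (count,) = HEADER.unpack(f.read(HEADER.size))
        f.seek(0, 2)  # jump to the end of the file
        f.write(text.encode("utf-8") + b"\n")
        f.seek(0)  # back to the header
        f.write(HEADER.pack(count + 1))


def line_count(path):
    """Read the count from the header without touching the line data."""
    with open(path, "rb") as f:
        (count,) = HEADER.unpack(f.read(HEADER.size))
    return count
```

This makes the lookup constant-time, at the cost of a custom file format. The sketch assumes a single writer and that `text` contains no newline; concurrent writers would need some form of locking.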


Schuh · Accepted Answer · 2012-05-12 08:35:05Z

0

This gives the answer, but it reads the whole file and stores the lines in a list:

 len(f.readlines())

answered May 12, 2012 at 8:35

Schuh


1 Comment

glglgl Over a year ago

And thus it needs an unnecessary amount of memory.
