4

Possible Duplicate:
How to get line count cheaply in Python?

Good day. I have some code below that reads a file line by line and increments a counter.

    def __set_quantity_filled_lines_in_file(self):
        count = 0
        with open(self.filename, 'r') as f:
            for line in f:
                count += 1
        return count

My question is: is there a way to determine how many lines of text the current file contains without iterating over every line?

Thanks!

6
  • Thank you, Paolo, it's the same question. Commented May 12, 2012 at 8:41
  • A better way is to use buffers for the lines read. Commented May 12, 2012 at 8:43
  • This question is also related. Commented May 12, 2012 at 8:48
  • 4 upvotes for the "duplicate" comment, but only 3 close votes? Commented May 12, 2012 at 8:55
  • 2
    @glglgl: I don't know if it's the reason, but new users get the ability to vote on comments before they get the ability to vote to close. Commented May 12, 2012 at 10:43

5 Answers

5

In general it's not possible to do better than reading every character in the file and counting newline characters.
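A minimal sketch of that idea, assuming lines end with "\n" and that reading in 1 MiB chunks is acceptable. The function name count_newlines and the chunk_size parameter are made up for illustration:

```python
def count_newlines(path, chunk_size=1024 * 1024):
    # Count newline bytes while holding only one chunk in memory at a time.
    count = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty result means end of file
                break
            count += chunk.count(b"\n")
    return count
```

This still touches every byte, so it is O(n) in the file size, but memory use stays bounded. A final line that lacks a trailing newline is not counted.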

It may be possible if you know details about the internal structure of the file. For example, if the file is 1024kB long, and every line is 1kB in length, then you can deduce there are 1024 lines in the file.
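For the fixed-length case, a rough sketch could look like the following. It assumes every line, terminator included, occupies exactly line_length bytes; the function name count_fixed_length_lines is hypothetical:

```python
import os


def count_fixed_length_lines(path, line_length):
    # Derive the line count from the file size alone, without reading the file.
    size = os.path.getsize(path)
    if size % line_length != 0:
        # The fixed-length assumption does not hold for this file.
        raise ValueError("file size is not a multiple of the line length")
    return size // line_length
```

With the numbers above, a 1024 kB file with 1 kB lines gives 1024. If the size check fails, fall back to counting newlines.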


1 Comment

I have a different number of characters in each line, but thank you for your answer!
3

I'm not sure whether Python has such a function; I highly doubt it, and in any case it would essentially require reading the whole file. A line ending is signified by the \n character (the exact sequence is system dependent, for example \r\n on Windows), so there is no way to know how many of those exist in a file without going through the whole file.


1

You could use the readlines() file method; this is probably the easiest way.

If you want to be different, you could use the read() member function to get the entire file and count the CR, LF, CRLF, and LFCR terminators. Note that collections.Counter applied to the raw data tallies single characters only, so it cannot count the two-character terminators; a regular expression that tries the two-character forms first can.
However, you will have to deal with the various ways of terminating lines.
Something like:

    import re

    # Read raw bytes so that no newline translation takes place.
    with open("myfile", "rb") as f:
        d = f.read()

    # Try the two-byte terminators first so CRLF and LFCR each count once.
    nlines = len(re.findall(br"\r\n|\n\r|\r|\n", d))

5 Comments

I'm not interested in the easiest way; I'm looking for a scalable way and the fastest way to perform this action.
Assuming your files are always less than 2G, the fastest and most scalable way is going to be to do it in C. Create a Python extension in C which just counts lines from a buffer in memory.
'\n\r' will be treated as two lines on most platforms, no?
@JasonMorgan, nah - this approach doesn't work - stackoverflow.com/questions/29695861/…
@techtonik I already stated that, if required, you would have to handle multiple platforms in my answer. Thanks for the link to the other question related to this.
1

No, such information can only be retrieved by iterating over the whole file's content (or by reading the whole file into memory, but unless you know for sure that the files will always be small, don't even think about doing that).

Even if you do not loop over the file contents, the functions you call do. For example, len(f.readlines()) will read the whole file into a list just to count the number of elements. That's horribly inefficient since you don't need to store the file contents at all.
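As a sketch of counting without keeping the contents around, a generator expression can consume the file one line at a time. The name count_lines_streaming is invented for this example, and binary mode is assumed so that lines are split on "\n" only:

```python
def count_lines_streaming(path):
    # Iterate over the file object directly; each line is discarded once counted.
    with open(path, "rb") as f:
        return sum(1 for _ in f)
```

This is still a full pass over the file, just like the explicit loop in the question, but only one line is held in memory at any time. Unlike counting newline characters, it also counts a final line that has no trailing newline.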

6 Comments

I think other posts here have proved this statement untrue. Iteration is not the only way.
@JasonMorgan - are you saying you know how to count the occurrences of \r\n in a file in less than O(n) time? If so, please provide details.
@JasonMorgan What else does e.g. your Counter() do other than iterate over the file's content? And what else does your f.read() do other than read the whole file content, needing an unnecessary amount of memory?
@JasonMorgan: I was not talking about the code but about what actually happens. len(f.readlines()) does it without iterating manually, but the whole file is read into a list and then thrown away after determining its length. So it's a waste of memory (although that only applies for a rather short time).
Thank you, Jason. I think I will write the line count (from another process) into several bytes in the file. When I need to know how many lines of text are in the file, I will read those bytes.
0

This gives the answer, but it reads the whole file and stores the lines in a list:

 len(f.readlines()) 

1 Comment

And thus it needs an unnecessary amount of memory.
