Python - Counting Words In A Text File

Question

I'm new to Python and am working on a program that counts the occurrences of each word in a simple text file. The name of the text file is passed on the command line, so I have included sys to read the command-line arguments. The code is below:

    import sys

    count={}
    with open(sys.argv[1],'r') as f:
        for line in f:
            for word in line.split():
                if word not in count:
                    count[word] = 1
                else:
                    count[word] += 1
    print(word,count[word])
    file.close()

count is a dictionary to store the words and the number of times they occur. I want to be able to print out each word and the number of times it occurs, starting from most occurrences to least occurrences.

I'd like to know if I'm on the right track, and if I'm using sys properly. Thank you!!

Looks good and reasonably Pythonic. Deal with the newline at the end of each line, though: the last character will be '\n', which will mess up your counts. You'll want to use for word in line[:-1].split(): or something similar.
– Gareth Davidson, Commented Sep 11, 2014 at 3:05
@Gaz Davidson: line.split() will clean up all the whitespace.
– PM 2Ring, Commented Sep 11, 2014 at 3:30
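The point about split() in the comment above can be checked directly. This is a minimal sketch using a made-up sample line (not data from the question) that compares split() with no arguments against splitting on an explicit space:

```python
# Sample line as it would come from iterating over a file: note the trailing newline.
line = "the quick brown fox\n"

# split() with no arguments splits on any run of whitespace and discards
# leading and trailing whitespace, so the newline never reaches the words.
words = line.split()
print(words)

# Splitting on an explicit single space behaves differently: the newline
# stays attached to the last word.
words_on_space = line.split(" ")
print(words_on_space)
```

So the newline only becomes a problem if the line is split on a specific separator rather than on whitespace in general.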
You might like using re.findall(r'\w+', ...) to chunk things into words, since it keys off more than just whitespace as delimiters; see this example from the Python docs.
– reteptilian, Commented Nov 4, 2015 at 20:03
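A rough sketch of the re.findall suggestion, assuming a made-up sample string. Lower-casing the text first is an extra choice made here, not something the comment asks for:

```python
import re
from collections import Counter

# Hypothetical sample text with punctuation and mixed case.
text = "Hello, world! Hello again; the world says hello."

# Whitespace splitting leaves punctuation glued to the words ("Hello,", "world!").
by_whitespace = text.split()

# \w+ matches runs of letters, digits and underscores, so punctuation is dropped.
# Lower-casing first makes "Hello" and "hello" count as the same word.
by_regex = re.findall(r"\w+", text.lower())

counts = Counter(by_regex)
print(counts.most_common(2))
```

Note that \w+ also splits contractions such as "don't" into two tokens, which may or may not be what you want.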

Brian Larsen · Accepted Answer · 2014-09-11 03:17:29Z

What you did looks fine to me. One could also use collections.Counter (assuming you are on Python 2.7 or newer), which does the counting for you and gives a bit more information, such as the number of occurrences of each word. My solution would look like this; some improvement is probably possible.

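    import sys
    from collections import Counter

    c = Counter()
    with open(sys.argv[1], 'r') as f:
        for line in f:
            # update() expects an iterable of words; passing a single string
            # would count its individual characters instead
            c.update(line.split())

    # most_common() yields (word, count) pairs, highest count first
    for word, n in c.most_common():
        print(word, n)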

tripleee · Accepted Answer · 2014-09-11 03:40:31Z

Your final print doesn't have a loop, so it will just print the count for the last word you read, which still remains as the value of word.
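As a sketch of that fix, the counting and the printing can be kept as two separate loops. The function name count_words and the sample lines are made up for illustration; in the real program the lines would come from the opened file:

```python
def count_words(lines):
    """Return a dict mapping each whitespace-separated word to its count."""
    count = {}
    for line in lines:
        for word in line.split():
            if word not in count:
                count[word] = 1
            else:
                count[word] += 1
    return count

# Hypothetical sample input standing in for the lines of the file.
sample = ["a b a\n", "b a c\n"]
count = count_words(sample)

# Printing happens in its own loop, after all the counting is finished.
for word in count:
    print(word, count[word])
```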

Also, with a with context manager, you don't need to close() the file handle.

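Finally, a comment suggested removing the final newline from each line before you split. That is harmless but not required: split() with no arguments already discards all whitespace, including the trailing newline.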

For a simple program like this, it's probably not worth the trouble, but you might want to look at defaultdict from the collections module to avoid the special case for initializing a new key in the dictionary.
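A minimal sketch of the defaultdict idea, again using a made-up sample line instead of a real file:

```python
from collections import defaultdict

# Hypothetical sample input standing in for the lines of the file.
sample = ["to be or not to be\n"]

# defaultdict(int) supplies 0 for any missing key, so the
# "if word not in count" branch is no longer needed.
count = defaultdict(int)
for line in sample:
    for word in line.split():
        count[word] += 1

for word, n in count.items():
    print(word, n)
```

One thing to keep in mind: merely reading a missing key, as in count['xyz'], inserts it with a value of 0, so use the in operator when you only want to test for membership.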

PM 2Ring · Accepted Answer · 2014-09-11 04:02:12Z

I just noticed a typo: you open the file as f but you close it as file. As tripleee said, you shouldn't close files that you open in a with statement. Also, it's bad practice to use the names of builtin functions, like file or list, for your own identifiers. Sometimes it works, but sometimes it causes nasty bugs. And it's confusing for people who read your code; a syntax highlighting editor can help avoid this little problem.

To print the data in your count dict in descending order of count you can do something like this:

    # sorted() works on dict items in both Python 2 and Python 3
    items = sorted(count.items(), key=lambda kv: kv[1], reverse=True)
    print('\n'.join('%s: %d' % (k, v) for k, v in items))

See the Python Library Reference for more details on the list.sort() method and other handy dict methods.

roelofs · Accepted Answer · 2017-11-13 03:42:59Z

I did this using the re library. The program below computes the average number of words per line in a text file, so it has to find both the number of words and the number of lines.

    import re

    # this program gets the average number of words per line
    def main():
        # get name of file
        filename = input('Enter a filename:')
        try:
            # open the file
            infile = open(filename, 'r')
            # read file contents
            contents = infile.read()
            # close the file
            infile.close()
        except IOError:
            print('An error occurred when trying to read')
            print('the file', filename)
            return
        # count newlines and words; max() avoids dividing by zero
        # when the file contains no newline at all
        line = max(len(re.findall(r'\n', contents)), 1)
        count = len(re.findall(r'\w+', contents))
        average = count // line
        # display file contents
        print(contents)
        print('there is an average of', average, 'words per line')

    # call main
    main()
