
I'm new to Python and am working on a program that counts how many times each word occurs in a simple text file. The program is run from the command line with the text file's name as an argument, so my code reads the file name from the command-line arguments using sys. The code is below:

```python
import sys

count = {}
with open(sys.argv[1], 'r') as f:
    for line in f:
        for word in line.split():
            if word not in count:
                count[word] = 1
            else:
                count[word] += 1
print(word, count[word])
file.close()
```

count is a dictionary that stores each word and the number of times it occurs. I want to print each word with its count, sorted from the most frequent word to the least frequent.

I'd like to know if I'm on the right track, and if I'm using sys properly. Thank you!!

  • Looks good and reasonably Pythonic. Deal with the newline on the end of each line though, the last character will be '\n' which will mess up your counts. You'll want to use for word in line[:-1].split(): or something. Commented Sep 11, 2014 at 3:05
  • @Gaz Davidson: line.split() will clean up all the whitespace. Commented Sep 11, 2014 at 3:30
  • You might like using re.findall(r'\w+', ...) to chunk things into words since it keys off more than just whitespace as delimiters ... see this example from the python docs Commented Nov 4, 2015 at 20:03
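To illustrate the exchange in the comments above: str.split() with no arguments splits on any run of whitespace and ignores leading and trailing whitespace, so the trailing '\n' never ends up inside a word, while re.findall(r'\w+', ...) additionally drops punctuation. This is a small sketch; the sample line is made up for illustration.

```python
import re

line = "The cat, the hat.\n"  # hypothetical sample line

# split() with no argument splits on runs of whitespace and ignores
# leading/trailing whitespace, so the trailing '\n' is discarded
by_whitespace = line.split()

# \w+ matches runs of letters, digits and underscores, so punctuation
# such as ',' and '.' is dropped as well
by_regex = re.findall(r'\w+', line)

print(by_whitespace)  # ['The', 'cat,', 'the', 'hat.']
print(by_regex)       # ['The', 'cat', 'the', 'hat']
```

Neither approach changes case, so 'The' and 'the' are still counted as different words unless you lower-case the text first.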

4 Answers


What you did looks fine to me. You could also use collections.Counter (available in Python 2.7 and newer), which does the counting for you and can list the words from most to least common. My solution would look like this; there is probably room for improvement.

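```python
import sys
from collections import Counter

c = Counter()
with open(sys.argv[1], 'r') as f:
    for line in f:
        # Pass the list of words: update() with a single string
        # would count that string's characters instead
        c.update(line.split())

# most_common() yields (word, count) pairs, most frequent first
for word, n in c.most_common():
    print(word, n)
```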
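A pitfall worth showing here: Counter.update() iterates over whatever it is given, so passing one word string at a time counts the word's characters rather than the word. This is a small demo on a made-up sentence.

```python
from collections import Counter

words = "to be or not to be".split()

# Passing each word string individually counts characters
by_char = Counter()
for word in words:
    by_char.update(word)

# Passing the list of words counts whole words
by_word = Counter()
by_word.update(words)

print(by_char.most_common(2))  # [('o', 4), ('t', 3)]
print(by_word.most_common(2))
```

If you prefer updating one word at a time, c[word] += 1 also works, because a Counter returns 0 for a missing key.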

Your final print isn't inside a loop, so it only prints the count for the last word you read, which is still the value of word after the loops finish.

Also, when you open the file in a with statement, it is closed automatically when the block ends, so you don't need to call close() on the file handle.
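A quick way to see this for yourself is to check the file object's closed attribute inside and after the with block. This sketch writes a throwaway temporary file purely for illustration.

```python
import os
import tempfile

# Create a throwaway file to read (for illustration only)
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as out:
    out.write('hello world\n')

with open(path, 'r') as f:
    first_line = f.readline()
    closed_inside = f.closed   # still open inside the block

closed_after = f.closed        # closed automatically when the block exits

os.remove(path)
print(closed_inside, closed_after)  # False True
```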

Finally, a comment suggested removing the trailing newline from each line before you split, but that isn't necessary: split() with no arguments treats the newline as whitespace and discards it.

For a simple program like this it's probably not worth the trouble, but you might want to look at defaultdict from the collections module to avoid special-casing the first occurrence of a new key in the dictionary.
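Putting these points together, here is a minimal sketch using defaultdict(int). The function names count_words and main are hypothetical, and the len(argv) check is an addition not present in the question's code; it prints a usage message instead of failing with an IndexError when no file name is given.

```python
import sys
from collections import defaultdict


def count_words(lines):
    """Return a dict mapping each whitespace-separated word to its count."""
    # defaultdict(int) supplies 0 for a missing key, so no if/else is needed
    counts = defaultdict(int)
    for line in lines:
        for word in line.split():
            counts[word] += 1
    return counts


def main(argv):
    # Check the arguments before indexing argv[1] (assumed usage: prog FILE)
    if len(argv) != 2:
        print('usage: %s FILE' % argv[0], file=sys.stderr)
        return 1
    with open(argv[1], 'r') as f:
        counts = count_words(f)
    # Print words from most to least frequent
    for word, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        print(word, n)
    return 0
```

In a script you would end with sys.exit(main(sys.argv)), typically under an if __name__ == '__main__': guard, so the exit status reflects whether the arguments were valid.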


I just noticed a typo: you open the file as f but close it as file. As tripleee said, you don't need to close a file that you opened in a with statement. Also, it's bad practice to use the names of builtins, like list (or file, which was a builtin in Python 2), for your own identifiers. Sometimes it works, but sometimes it causes nasty bugs, and it confuses people who read your code; a syntax-highlighting editor can help you avoid this little problem.
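Here is a small demonstration of the kind of bug that shadowing a builtin causes, using list as the example. It assumes the code runs at module level, where deleting the global name makes the builtin visible again.

```python
# Rebinding a builtin name hides the builtin for the rest of the module
list = ['spam', 'eggs']

try:
    letters = list('abc')      # now tries to call the list object, not the builtin
except TypeError as exc:
    error_message = str(exc)   # "'list' object is not callable"
    print('TypeError:', error_message)

# Deleting the module-level name makes the builtin visible again
del list
letters = list('abc')
print(letters)  # ['a', 'b', 'c']
```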

To print the data in your count dict in descending order of count, you can do something like this:

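```python
# In Python 3, dict.items() returns a view, so copy it into a list to sort it
items = list(count.items())
# Sort the (word, count) pairs by count, largest first
items.sort(key=lambda kv: kv[1], reverse=True)
print('\n'.join('%s: %d' % (k, v) for k, v in items))
```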

See the Python Library Reference for more details on the list.sort() method and other handy dict methods.
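One refinement you might consider: because list.sort() is stable, words with equal counts stay in whatever order they were first seen. If you want a deterministic order, a sketch like the following sorts by descending count and breaks ties alphabetically; the counts dict here is made up for illustration.

```python
count = {'the': 3, 'cat': 1, 'hat': 1, 'sat': 2}  # hypothetical counts

# Negating the count lets a single ascending sort order by count
# (largest first) and then by word (alphabetically) for ties
items = sorted(count.items(), key=lambda kv: (-kv[1], kv[0]))

for word, n in items:
    print('%s: %d' % (word, n))
```

The negation trick only works for numeric keys; for mixed directions on non-numeric keys you can instead sort twice, relying on stability.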


I did something similar using the re module. My program computes the average number of words per line in a text file, so you would need to adapt it to count the individual words.

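```python
import re

# This program computes the average number of words per line of a file.
def main():
    # Get the name of the file
    filename = input('Enter a filename: ')
    try:
        # Read the whole file; the with statement closes it automatically
        with open(filename, 'r') as infile:
            contents = infile.read()
    except IOError:
        print('An error occurred when trying to read the file', filename)
        return

    # splitlines() also counts a last line that has no trailing newline
    lines = len(contents.splitlines())
    # \w+ matches runs of letters, digits and underscores
    count = len(re.findall(r'\w+', contents))

    # Display the file contents
    print(contents)
    if lines == 0:
        print('The file is empty')
    else:
        average = count // lines  # integer average, rounded down
        print('There is an average of', average, 'words per line')

# Call main only when run as a script
if __name__ == '__main__':
    main()
```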
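To adapt the same re.findall idea to the original question, a minimal sketch could feed the extracted words into a Counter. The function name word_frequencies and the sample text are hypothetical, and lower-casing is an assumption about what should count as "the same word".

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count words in text, ignoring case and punctuation."""
    # Lower-casing first makes 'The' and 'the' the same word
    return Counter(re.findall(r'\w+', text.lower()))


sample = "The cat sat.\nThe cat ran!\n"  # hypothetical input
for word, n in word_frequencies(sample).most_common():
    print(word, n)
```

Note that \w+ splits contractions, so "don't" is counted as "don" and "t"; adjust the pattern if that matters for your text.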
