3

I am trying to print the top 10 most frequent words using the following code. However, it's not working. Any idea how to fix it?

    def reducer_count_words(self, word, counts):
        # send all (num_occurrences, word) pairs to the same reducer.
        # num_occurrences is so we can easily use Python's max() function.
        yield None, (sum(counts), word)

    # discard the key; it is just None
    def reducer_find_max_10_words(self, _, word_count_pairs):
        # each item of word_count_pairs is (count, word),
        # so yielding one results in key=counts, value=word
        tmp = sorted(word_count_pairs)[0:10]
        yield tmp
4
  • @Veedrac: more similar to this question: stackoverflow.com/questions/3121979/… Commented May 28, 2014 at 19:45
  • @Leftium I strongly disagree with your interpretation of the question. Also, how the hell did "its not working. Any idea on how to fix it?" get upvotes? Commented May 28, 2014 at 19:49
  • @Veedrac: my interpretation is based on the question title and the asker's responses to other answers. Commented May 28, 2014 at 19:58
  • @Leftium I stick by my opinion, but I don't really care about a question of this quality anyway. Commented May 28, 2014 at 20:00

3 Answers

2

Use collections.Counter and its most_common method:

    >>> from collections import Counter
    >>> my_words = 'a a foo bar foo'
    >>> Counter(my_words.split()).most_common()
    [('a', 2), ('foo', 2), ('bar', 1)]

2 Comments

I am using this command in my code but seeing this error: unhashable type 'list'. If I want to use this format, it seems like I cannot use most_common().
Hmm. That exact code works on my machine.
1

Use collections.Counter.most_common()

Example:

most_common([n])

Return a list of the n most common elements and their counts from the most common to the least. If n is not specified, most_common() returns all elements in the counter. Elements with equal counts are ordered arbitrarily:

    >>> from collections import Counter
    >>> Counter('abracadabra').most_common(3)
    [('a', 5), ('r', 2), ('b', 2)]

2 Comments

I am using this command in my code but seeing this error: unhashable type 'list'. If I want to use this format, it seems like I cannot use most_common().
Build the Counter from the list of words, not from the (word, count) tuples, and then call most_common() on it.
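To illustrate the comment above: Counter needs hashable items, so a flat list of strings works, while a list of lists raises the "unhashable type" error. The following is only a sketch with invented sample data (`words` and `pairs_as_lists` are hypothetical names, not from the question). One plausible cause of the error is that the (count, word) pairs reach the reducer as lists rather than tuples, but the question does not confirm this.

```python
from collections import Counter

# Hypothetical sample data, invented for illustration.
words = ["a", "a", "foo", "bar", "foo", "foo"]

# A flat list of strings can be counted, because strings are hashable.
top_two = Counter(words).most_common(2)
print(top_two)  # [('foo', 3), ('a', 2)]

# A list of [count, word] lists cannot be counted, because lists are not hashable.
pairs_as_lists = [[3, "foo"], [2, "a"], [1, "bar"]]
try:
    Counter(pairs_as_lists)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # the message mentions an unhashable 'list'

# If the counts already exist, a Counter can be built from a word -> count mapping.
top_two_from_pairs = Counter(
    {word: count for count, word in pairs_as_lists}
).most_common(2)
print(top_two_from_pairs)  # [('foo', 3), ('a', 2)]
```

The last step is one way to keep using most_common() when the data is already in (count, word) form.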
0
tmp = sorted(word_count_pairs, key=lambda pair: pair[0], reverse=True)[0:10] 

Explanation:

  • The key parameter of sorted() allows you to run a function on each element before comparison.
  • lambda pair: pair[0] is a function that extracts the count (the first element) from each pair in word_count_pairs.
  • reverse=True sorts in descending order instead of the default ascending order.
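The points above can be tried outside the reducer. This is a minimal sketch, assuming each pair is a (count, word) tuple as in the question; the helper name `top_n_by_count` and the sample data are made up for illustration. It also shows why a plain `sorted(...)[0:10]` returns the least frequent words rather than the most frequent ones.

```python
def top_n_by_count(word_count_pairs, n=10):
    """Return the n pairs with the highest counts, highest count first."""
    return sorted(word_count_pairs, key=lambda pair: pair[0], reverse=True)[:n]


# Hypothetical (count, word) pairs, invented for illustration.
sample_pairs = [(2, "bar"), (7, "foo"), (1, "baz"), (5, "qux")]

# Descending sort by count: the most frequent words come first.
most_frequent = top_n_by_count(sample_pairs, n=2)
print(most_frequent)  # [(7, 'foo'), (5, 'qux')]

# Default ascending sort, as in the question: the least frequent words come first.
least_frequent = sorted(sample_pairs)[0:2]
print(least_frequent)  # [(1, 'baz'), (2, 'bar')]
```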

Aside: if you have many different words, sorting the entire list just to find the top ten is inefficient, and more efficient algorithms exist. The most_common() method mentioned in the other answers is one option: in CPython, most_common(n) uses heapq.nlargest when n is given, which avoids a full sort.
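One such alternative is the standard library's heapq.nlargest, which keeps only the n best candidates while scanning the input instead of sorting everything. The sketch below assumes the same (count, word) pair layout as the question; the function name `top_n_with_heap` and the sample data are hypothetical.

```python
import heapq


def top_n_with_heap(word_count_pairs, n=10):
    """Return the n pairs with the highest counts without sorting the whole input."""
    return heapq.nlargest(n, word_count_pairs, key=lambda pair: pair[0])


# Hypothetical (count, word) pairs, invented for illustration.
sample_pairs = [(2, "bar"), (7, "foo"), (1, "baz"), (5, "qux"), (3, "quux")]

top_three = top_n_with_heap(sample_pairs, n=3)
print(top_three)  # [(7, 'foo'), (5, 'qux'), (3, 'quux')]
```

For m pairs, this takes roughly O(m log n) time rather than the O(m log m) of a full sort, which only matters when the number of distinct words is large.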
