1
\$\begingroup\$

The following snippet will return a list containing the words in all of the works of Shakespeare:

>>>def shakespeare_tokens(path = 'shakespeare.txt', url = 'http://inst.eecs.berkeley.edu/~cs61a/fa11/shakespeare.txt'): """Return the words of Shakespeare's plays as a list""" import os from urllib.request import urlopen if os.path.exists(path): return open('shakespeare.txt', encoding='ascii').read().split() else: shakespeare = urlopen(url) return shakespeare.read().decode(encoding='ascii').split() >>>tokens = shakespeare_tokens() 

Problem 1:

Here's the idea: We start with some word - we'll use "The" as an example. Then we look through all of the texts of Shakespeare and for every instance of "The" we record the word that follows "The" and add it to a list, known as the successors of "The". Now suppose we've done this for every word Shakespeare has used, ever.

Let's go back to "The". Now, we randomly choose a word from this list, say "cat". Then we look up the successors of "cat" and randomly choose a word from that list, and we continue this process. This eventually will terminate in a period (".") and we will have generated a Shakespearean sentence!

The object that we'll be looking things up in is called a 'successor table', although really it's just a dictionary. The keys in this dictionary are words, and the values are lists of successors to those words.

Here's an incomplete definition of the build_successors_table function. The input is a list of words (corresponding to a Shakespearean text), and the output is a successors table. (By default, the first word is a successor to '.').

>>> table = build_successors_table(tokens)

Problem 2:

Let's generate some sentences Suppose we're given a starting word. We can look up this word in our table to find its list of successors, and then randomly select a word from this list to be the next word in the sentence. Then we just repeat until we reach some ending punctuation. (Note: to randomly select from a list, first make sure you import the Python random library with import random and then use the expression random.choice(my_list)) This might not be a bad time to play around with adding strings together as well.

>>> construct_sent('The', table)

Below is the solution for above 2 problems:

#Shakespeare and Dictionaries def build_successors_table(tokens): table = {} prev = '.' for word in tokens: if prev in table: table[prev].append(word) else: table[prev] = [word] prev = word return table def construct_sent(word, table): import random result = '' while word not in ['.', '!', '?']: result = result + word word = random.choice(table[word]) return result + word 
\$\endgroup\$

1 Answer 1

6
\$\begingroup\$

Python's standard library includes a class that would simplify your first snippet: collections.defaultdict. This class allows you to set a default for missing keys, so if you make it a list you can append without having to check if it already exists.

You could also simplify the iteration using zip; see e.g. "Iterate a list as pair (current, next) in Python".

Using that would make it:

from collections import defaultdict def build_successors_table(tokens): table = defaultdict(list) for prev, word in zip(['.']+tokens, tokens): table[prev].append(word) return table 

For the second snippet note that, per the style guide, all imports should happen at the start of your script - there's no benefit to trying to import random every time the function is called.

Continually appending strings is relatively inefficient, as a new string object is created each time. Instead, it's typical to create a list of words and then use ' '.join(words) to combine them back together.

It is more efficient to test membership (x in y) on a set (O(1)) than a list (O(n)).

import random def construct_sent(word, table): words = [] while word not in {'.', '!', '?'}: words.append(word) word = random.choice(table[word]) return ' '.join(words) + word 

Both functions would benefit from docstrings explaining what they do and how to use them (e.g. what the parameters are expected to be).

\$\endgroup\$
4
  • \$\begingroup\$ is this test driven programming? because we have test cases designed before filling code. \$\endgroup\$ Commented Jun 22, 2015 at 15:27
  • \$\begingroup\$ ...what? What do you mean "is this test driven programming"? Where are the test cases, then? \$\endgroup\$ Commented Jun 22, 2015 at 15:27
  • \$\begingroup\$ I missed out pasting test cases which are part of docstrings \$\endgroup\$ Commented Jun 22, 2015 at 15:29
  • 2
    \$\begingroup\$ @overexchange nor do you mention TDD anywhere in the question. We aren't mind readers. Also, does it really matter how I got to the code in my answer. \$\endgroup\$ Commented Jun 22, 2015 at 15:30

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.