Jason C
  • 6.5k
  • 2
  • 22
  • 33

Java

Pulls the intro sentence from a random Wikipedia article:

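    import java.io.InputStream;
    import java.net.URL;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;

    public class RandomSentence {
        public static void main(String[] args) throws Exception {
            String sentence;
            do {
                Document doc;
                // Special:Random redirects to a random article; close the stream once parsed.
                try (InputStream in = new URL("https://en.wikipedia.org/wiki/Special:Random").openStream()) {
                    doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
                }
                // The first <p> element is treated as the article intro.
                String intro = doc.getElementsByTagName("p").item(0).getTextContent();
                sentence = intro
                    .replaceAll("\\([^(]*\\) *", "")   // drop parenthesized text
                    .replaceAll("\\[[^\\[]*\\]", "")   // drop citation markers such as [1]
                    .split("\\. +[A-Z0-9]")[0];        // keep text before the first sentence boundary
                // Retry on disambiguation-style intros (ending in ":") or on too-short results.
            } while (sentence.endsWith(":") || sentence.length() < 30);
            System.out.println(sentence + ".");
        }
    }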

Sometimes you get unlucky; I try to minimize this by setting a minimum sentence length and filtering out sentences that end with ":" (all disambiguation pages start that way). Sentence boundaries are a period followed by whitespace followed by a number or capital letter.
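To make the boundary rule and the retry check concrete, here is a minimal offline sketch. The class name, the `shouldRetry` helper, and the sample strings are made up for illustration; only the split pattern comes from the program above, and the check assumes either condition alone is enough to retry.

```java
public class SentenceBoundaryDemo {
    // Same boundary pattern as the answer: a period, whitespace,
    // then a capital letter or a digit.
    static String firstSentence(String text) {
        return text.split("\\. +[A-Z0-9]")[0];
    }

    // Hypothetical helper: true when the candidate should be thrown away
    // and another random article fetched.
    static boolean shouldRetry(String sentence) {
        return sentence.endsWith(":") || sentence.length() < 30;
    }

    public static void main(String[] args) {
        // Made-up sample intros, not real Wikipedia text.
        String normal = "Example Town is a fictional village used in this sketch. It has 3 shops.";
        String disambiguation = "Mercury may refer to:";
        String abbreviation = "Dr. Smith was a chemist.";

        System.out.println(firstSentence(normal));
        // prints: Example Town is a fictional village used in this sketch
        System.out.println(shouldRetry(firstSentence(normal)));          // prints: false
        System.out.println(shouldRetry(firstSentence(disambiguation)));  // prints: true

        // An unlucky case: the abbreviation looks like a sentence boundary.
        System.out.println(firstSentence(abbreviation));                 // prints: Dr
        System.out.println(shouldRetry(firstSentence(abbreviation)));    // prints: true
    }
}
```

The last case shows why the length check helps: an abbreviation such as "Dr." is cut at the false boundary, and the short fragment is rejected.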

I also filter out text in parentheses (the result is still a valid sentence) to try to remove some periods that aren't sentence boundaries. I filter out square brackets to remove source citation numbers. Example (5 runs):

  • Marisa Anderson is a psychic consultant and medium from Scarsdale, New York who specializes in remote viewing to sense and locate missing persons from a distance.
  • Santo Stefano di Magra is a comune in the Province of La Spezia in the Italian region Liguria, located about 80 km southeast of Genoa and about 11 km northeast of La Spezia.
  • Idle Cure was an arena rock band from Long Beach, California.
  • Starotitarovskaya is a stanitsa in Temryuksky District of Krasnodar Krai, Russia.
  • Douglas Geers is an American composer.
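The parenthesis and citation filtering described above can be tried offline with a small sketch. The class name, the `strip` helper, and the sample text are invented for illustration; the three patterns are the ones used in the program.

```java
public class StripAsidesDemo {
    // Removes parenthesized text (plus trailing spaces) and bracketed citation markers.
    static String strip(String text) {
        return text
            .replaceAll("\\([^(]*\\) *", "")
            .replaceAll("\\[[^\\[]*\\]", "");
    }

    // Same sentence-boundary split as the answer.
    static String firstSentence(String text) {
        return text.split("\\. +[A-Z0-9]")[0];
    }

    public static void main(String[] args) {
        // Made-up sample intro, not real Wikipedia text.
        String raw = "Example Town (est. 1900) is a fictional village.[1] It has 3 shops.";

        // Without stripping, the period inside the parentheses is taken as a boundary.
        System.out.println(firstSentence(raw));
        // prints: Example Town (est

        System.out.println(strip(raw));
        // prints: Example Town is a fictional village. It has 3 shops.

        System.out.println(firstSentence(strip(raw)));
        // prints: Example Town is a fictional village
    }
}
```

Note that the citation marker has to go as well: with "[1]" left in place, the period before it is no longer followed by whitespace, so the boundary pattern would not match there.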

If you notice any grammar issues, please edit the associated Wikipedia page -- it's an open source word list. ;)
