deleted 8 characters in body
Jason C

Java

Pulls the intro sentence from a random Wikipedia article:

    import java.io.InputStream;
    import java.net.URL;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;

    public class RandomSentence {
        public static void main(String[] args) throws Exception {
            String sentence;
            do {
                InputStream in = new URL("https://en.wikipedia.org/wiki/Special:Random").openStream();
                Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
                String intro = doc.getElementsByTagName("p").item(0).getTextContent();
                sentence = intro.replaceAll("\\([^(]*\\) *", "")
                                .replaceAll("\\[[^\\[]*\\]", "")
                                .split("\\.( +[A-Z0-9]|$)")[0];
            } while (sentence.endsWith(":") || sentence.length() < 30 || sentence.contains("?"));
            System.out.println(sentence + ".");
        }
    }

Sometimes you get unlucky; I try to minimize this by setting a minimum sentence length and filtering out sentences that end with ":" (all disambiguation pages start that way) or contain a "?" (many articles seem to mark unresolved information with question marks). A sentence boundary is a period followed by spaces and then a digit or capital letter, or a period at the very end of the text.
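To make the boundary rule and the retry filters easier to poke at without hitting the network, here is a minimal sketch that pulls them into two helper methods. The class name `SentenceFilterSketch` and the method names `firstSentence` and `shouldRetry` are hypothetical, introduced only for illustration; the regular expression and the three conditions are copied from the answer's code.

```java
class SentenceFilterSketch {
    // Same boundary pattern as the answer: a period followed by spaces and a
    // capital letter or digit, or a period at the end of the text.
    static String firstSentence(String intro) {
        return intro.split("\\.( +[A-Z0-9]|$)")[0];
    }

    // Mirrors the do/while condition: true means "fetch another article".
    static boolean shouldRetry(String sentence) {
        return sentence.endsWith(":")      // disambiguation-style intro
            || sentence.length() < 30      // too short, often a bad split
            || sentence.contains("?");     // unresolved information
    }

    public static void main(String[] args) {
        String good = firstSentence("Douglas Geers is an American composer. He teaches.");
        System.out.println(good + " -> retry? " + shouldRetry(good));

        // An abbreviation fools the boundary rule; the length check catches it.
        String bad = firstSentence("St. Louis is a city in Missouri.");
        System.out.println(bad + " -> retry? " + shouldRetry(bad));
    }
}
```

The second call shows why the minimum length matters: "St. Louis" is split right after "St", and the two-character result is rejected by the length check rather than printed.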

I also filter out text in parentheses (the result is still a valid sentence) to try to remove some periods that aren't sentence boundaries. I strip text in square brackets to remove source citation numbers. Examples:

  • Idle Cure was an arena rock band from Long Beach, California.
  • Self-focusing is a non-linear optical process induced by the change in refractive index of materials exposed to intense electromagnetic radiation.
  • TB10Cs4H3 is a member of the H/ACA-like class of non-coding RNA molecule that guide the sites of modification of uridines to pseudouridines of substrate RNAs.
  • The Six-headed Wild Ram in Sumerian mythology was one of the Heroes slain by Ninurta, patron god of Lagash, in ancient Iraq.
  • Sugar daddy is a slang term for a man who offers to support a typically younger woman or man after establishing a relationship that is usually sexual.
  • Old Bethel United Methodist Church is located at 222 Calhoun St., Charleston, South Carolina.
  • Douglas Geers is an American composer.

If you notice any grammar issues, well, that's your fault for not being a diligent Wikipedia editor! ;-)
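The parenthesis and bracket stripping described above can be tried on its own. This is a small sketch, assuming a hypothetical helper named `strip` in a hypothetical class `StripAsidesSketch`; the two regular expressions are the ones from the answer, and the input string is made up for the demonstration.

```java
class StripAsidesSketch {
    static String strip(String intro) {
        return intro
            // Drop a parenthesised aside together with any spaces after it.
            .replaceAll("\\([^(]*\\) *", "")
            // Drop bracketed citation markers such as [1].
            .replaceAll("\\[[^\\[]*\\]", "");
    }

    public static void main(String[] args) {
        // Made-up input: the aside contains a period that is not a boundary.
        System.out.println(strip("Idle Cure (est. 1984) was an arena rock band.[1]"));
        // prints "Idle Cure was an arena rock band."
    }
}
```

Removing the aside before splitting matters because abbreviations inside parentheses would otherwise risk being treated as sentence boundaries.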

deleted 57 characters in body
Jason C

Java

Pulls the intro sentence from a random Wikipedia article:

    import java.io.InputStream;
    import java.net.URL;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;

    public class RandomSentence {
        public static void main(String[] args) throws Exception {
            String sentence;
            do {
                InputStream in = new URL("https://en.wikipedia.org/wiki/Special:Random").openStream();
                Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
                String intro = doc.getElementsByTagName("p").item(0).getTextContent();
                sentence = intro.replaceAll("\\([^(]*\\) *", "")
                                .replaceAll("\\[[^\\[]*\\]", "")
                                .split("\\.( +[A-Z0-9]|$)")[0];
            } while (sentence.endsWith(":") || sentence.length() < 30
                    || sentence.contains("?") || sentence.contains("/"));
            System.out.println(sentence + ".");
        }
    }

Sometimes you get unlucky; I try to minimize this by setting a minimum sentence length and filtering out sentences that end with ":" (all disambiguation pages start that way) or contain a "?" (many articles seem to mark unresolved information with question marks) or a "/" (see comments below). A sentence boundary is a period followed by spaces and then a digit or capital letter, or a period at the very end of the text.

I also filter out text in parentheses (the result is still a valid sentence) to try to remove some periods that aren't sentence boundaries. I strip text in square brackets to remove source citation numbers. Example (5 runs):

  • Idle Cure was an arena rock band from Long Beach, California.
  • Self-focusing is a non-linear optical process induced by the change in refractive index of materials exposed to intense electromagnetic radiation.
  • TB10Cs4H3 is a member of the H/ACA-like class of non-coding RNA molecule that guide the sites of modification of uridines to pseudouridines of substrate RNAs.
  • The Six-headed Wild Ram in Sumerian mythology was one of the Heroes slain by Ninurta, patron god of Lagash, in ancient Iraq.
  • Sugar daddy is a slang term for a man who offers to support a typically younger woman or man after establishing a relationship that is usually sexual.
  • Old Bethel United Methodist Church is located at 222 Calhoun St., Charleston, South Carolina.
  • Douglas Geers is an American composer.

If you notice any grammar issues, well, that's your fault for not being a diligent Wikipedia editor! ;-)

Syntax coloring
Victor Stafusa

added 27 characters in body
Jason C

Rearrange for humor effect
Jason C

added 117 characters in body
Jason C

Correct 'split' to avoid leaving periods at the end of single-sentence intros; added a great sentence that is completely incomprehensible to me.
Jason C

Removed example output from a crappy wiki article that I just nominated for deletion
Jason C

edited body
Jason C

deleted 8 characters in body
Jason C

Jason C