
I need to create a regex that finds all the sentences containing a specific word (or regex).

For example, if I have the following text:

Harrison Ford is working on a new Film. The film is yet to be released

The film has a gud star cast. Most paid actor is Harrison Ford in the film.

Here, if I want to get all the sentences that contain the word Harrison, how should I go about it? The regex should return the following matches:

  • Harrison Ford is working on a new Film.
  • Most paid actor is Harrison Ford in the film.

A sentence begins and ends at a newline character or a full stop; the first sentence of a paragraph also begins at the start of the paragraph.

I used the following regex:

.*?((\n|.|^\\s*).*?\\b(Harrison)\\b.*?[.\n]).* 

But I am unable to split the text into sentences: the match I get runs from the start of the text up to the first Harrison Ford.

Please let me know any suggestions you may have.

  • How would a full stop be treated as ending the sentence in "Most paid actor is Mr. Harrison Ford in the film."? Commented Dec 20, 2015 at 12:59
  • Yeah, this is solved. Please look at Dukefirehawk's solution, and for a more generic version, please look at my comment on that answer. Commented Dec 24, 2015 at 18:04

3 Answers


If you can guarantee that a newline character or a full stop always ends a sentence, and never appears anywhere else, then I suggest you first split the text into sentences and then search each one:

// Split on a full stop or on one or more line breaks (\R needs Java 8 or later).
String[] sentences = text.split("\\.|\\R+");
for (String se : sentences) {
    if (se.indexOf("Harrison") != -1) {
        System.out.println(se.trim());
    }
}

Output:

Harrison Ford is working on a new Film
Most paid actor is Harrison Ford in the film
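The split-then-search idea can be wrapped into a small runnable program. This is a sketch assuming the question's sample text; the class name `SplitThenSearch` and the helper `sentencesContaining` are hypothetical. As a variation on the answer, it tests each sentence with the word-boundary pattern `\bHarrison\b` instead of `indexOf`, so a longer word such as "Harrisonburg" would not count as a hit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SplitThenSearch {

    // Hypothetical helper: split on full stops or runs of line breaks,
    // then keep every sentence that contains `word` as a whole word.
    static List<String> sentencesContaining(String text, String word) {
        // Pattern.quote makes the word literal; \b requires word boundaries on both sides.
        Pattern wordPattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        List<String> result = new ArrayList<>();
        for (String sentence : text.split("\\.|\\R+")) {
            if (wordPattern.matcher(sentence).find()) {
                result.add(sentence.trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample text from the question (no full stop after "released").
        String text = "Harrison Ford is working on a new Film. The film is yet to be released\n"
                + "\n"
                + "The film has a gud star cast. Most paid actor is Harrison Ford in the film.";
        for (String s : sentencesContaining(text, "Harrison")) {
            System.out.println(s);
        }
    }
}
```

With the sample text this prints the same two sentences as the answer's snippet; the only behavioral difference is the whole-word check.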

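For Java, the following code should do the trick:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String data = "Harrison Ford is working on a new Film\n The film is yet to be released. "
        + "The film has a gud star cast. "
        + "Most paid actor is Harrison Ford in the film.";
// Treat line breaks as sentence ends by turning them into full stops.
String tmpData = data.replace('\n', '.');
// Group 1: a run of word characters and whitespace containing "Harrison", ended by a full stop.
Pattern myPattern = Pattern.compile("([\\w\\s]*Harrison[\\w\\s]*)\\.");
Matcher m = myPattern.matcher(tmpData);
while (m.find()) {
    System.out.println("Result: " + m.group(1));
}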

1 Comment

Thanks @Dukefirehawk, your suggestion helped me sort out the regex. I made some more modifications to handle the cases where a sentence can contain any character except a full stop or a newline, and where the last line need not end with a full stop or a newline. This is the regex I arrived at: (?i)([^\\.\n]*?\\b(Harrison)(Ford)?\\b.*?)(\\.|\n|$) It works even if the sentence contains '@' or '!', and the last line does not need to end in '.' or a newline.
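To see the regex from that comment in action, here is a minimal sketch that runs it in a `Matcher.find()` loop over a variant of the sample text whose last sentence has no trailing full stop or newline. The class name `FinalRegexDemo` and the helper `findSentences` are hypothetical, and the `trim()` call is an addition: a match that starts right after a full stop keeps the leading space.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FinalRegexDemo {

    // The regex from the comment, unchanged. Text before the word may not contain
    // '.' or a newline; the lazy .*? after it stops at the first '.', newline or end of input.
    static final Pattern SENTENCE_WITH_HARRISON =
            Pattern.compile("(?i)([^\\.\n]*?\\b(Harrison)(Ford)?\\b.*?)(\\.|\n|$)");

    // Hypothetical helper: collect group 1 of every match, trimmed.
    static List<String> findSentences(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = SENTENCE_WITH_HARRISON.matcher(text);
        while (m.find()) {
            result.add(m.group(1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // The last sentence deliberately has no trailing full stop or newline.
        String text = "Harrison Ford is working on a new Film. The film is yet to be released\n"
                + "The film has a gud star cast. Most paid actor is Harrison Ford in the film";
        for (String s : findSentences(text)) {
            System.out.println(s);
        }
    }
}
```

Because of the `(?i)` flag, a lowercase "harrison" is matched as well.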

You should use the global flag to match all occurrences in a string. Then use this regex to find all sentences containing "Harrison":

(?:[\w][^.]+)?Harrison[^.]+ 


See a demo here.

1 Comment

Java (as the question is tagged) doesn't have the concept of a "global" flag.
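In Java, the usual equivalent of a global match is to call `Matcher.find()` in a loop; each call resumes where the previous match ended and returns the next non-overlapping match. A minimal sketch using this answer's regex on the question's sample text (the class name `FindAllSentences` and method `findAll` are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindAllSentences {

    // The regex from this answer, written as a Java string literal.
    static final Pattern PATTERN = Pattern.compile("(?:[\\w][^.]+)?Harrison[^.]+");

    // No "global" flag in Java: repeated find() calls walk through every
    // non-overlapping match, each search resuming after the previous match.
    static List<String> findAll(String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = PATTERN.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "Harrison Ford is working on a new Film. The film is yet to be released\n"
                + "\n"
                + "The film has a gud star cast. Most paid actor is Harrison Ford in the film.";
        findAll(text).forEach(System.out::println);
    }
}
```

One thing worth knowing about this pattern: `[^.]` also matches line breaks, so only the full stop acts as a sentence boundary. An unterminated line directly before a sentence containing "Harrison" would be pulled into the same match.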
