1

I have a String and I would like to process every single word in it. For example:

"That's a good question" 

I need to process each word individually:

That, s, a, good, question 

I don't need to store the words; I only need to read each one.

I was testing this solution:

    String s = "That's a good question";
    String[] words = s.split("\\s+");
    for (int i = 0; i < words.length; i++) {
        words[i] = words[i].replaceAll("[^\\w]", "");
    }

but I don't know what regular expression I need in order to split "That's" into two different words.

1
  • 1
    I think you're looking for a linguistic algorithm; for example, the phrase "Mother's house" does not mean the same as "Mother is house". However, if for your purposes every word ending in "'s" can be treated as the word followed by "is", then you can replace every "'s" with " is" and then perform the split. Commented Nov 26, 2017 at 19:27

4 Answers

1

If I understood you correctly, this is what you're looking for: replace String[] words = s.split("\\s+"); with String[] words = s.split("[\\s']");.
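As a rough sketch of that suggestion, the snippet below splits on whitespace and apostrophes. The class and method names are made up for illustration, and the `+` after the character class is my own addition (an assumption, not part of the answer) so that consecutive delimiters do not produce empty strings.

```java
import java.util.Arrays;

public class SplitOnApostrophe {
    // Splits on runs of whitespace and/or apostrophes.
    // The trailing '+' is outside the brackets, so here it is a quantifier.
    static String[] splitWords(String s) {
        return s.split("[\\s']+");
    }

    public static void main(String[] args) {
        String[] words = splitWords("That's a good question");
        System.out.println(Arrays.toString(words)); // [That, s, a, good, question]
    }
}
```

Note that a string starting with a delimiter would still yield an empty first element, so input with leading whitespace may need a trim() first.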


3 Comments

There is no difference between the two options
@Sam now there is
+ inside [...] isn't the "one or more" quantifier; it is a plain literal there, so you may want to remove it.
1

Are you completely sure you need to consider that's as two words? (viz. that is)

Ordinarily, I believe that's is counted as one word in English.

But if your perspective on the requirements is correct, you have a (moderately) difficult problem: I don't think there is any (reasonable) regex that can distinguish between something like that's (contraction of that and is) and something like steve's (possessive).

AFAIK you will have to write something yourself.

Suggestion: take a look at this list of English language contractions. You could use it to make an enumeration of the things you need to handle in a special way.

Basic Example

    enum Contraction {
        AINT("ain't", "is not"),
        ARENT("aren't", "are not"),
        // Many, many in between...
        YOUVE("you've", "you have");

        private final String oneWord;
        private final String twoWords;

        private Contraction(String oneWord, String twoWords) {
            this.oneWord = oneWord;
            this.twoWords = twoWords;
        }

        public String getOneWord() { return oneWord; }

        public String getTwoWords() { return twoWords; }
    }

    String s = "That's a good question".toLowerCase();
    for (Contraction c : Contraction.values()) {
        // replace() treats both arguments as literals, so no regex escaping is needed
        s = s.replace(c.getOneWord(), c.getTwoWords());
    }
    String[] words = s.split("\\s+");
    // And so forth...

NOTE: This example handles case sensitivity by converting the entire input to lower case, so the elements in the enum will match. If that doesn't work for you, you may need to handle it in another way.

I'm not clear on what you need to do with the words once you have them, so I left that part out.
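If lower-casing the whole input is not acceptable, one possible alternative is to match each contraction case-insensitively and leave the rest of the text alone. This is only a sketch under my own assumptions: the class name, the tiny three-entry table, and the choice to emit the expansion in lower case are all hypothetical, and a real table would need the full list of contractions.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContractionExpander {
    // Hypothetical, deliberately tiny table; not a complete list.
    private static final Map<String, String> CONTRACTIONS = new LinkedHashMap<>();
    static {
        CONTRACTIONS.put("that's", "that is");
        CONTRACTIONS.put("aren't", "are not");
        CONTRACTIONS.put("you've", "you have");
    }

    // Replaces each known contraction, ignoring case, with its expansion.
    static String expand(String s) {
        for (Map.Entry<String, String> e : CONTRACTIONS.entrySet()) {
            Pattern p = Pattern.compile(
                    "\\b" + Pattern.quote(e.getKey()) + "\\b",
                    Pattern.CASE_INSENSITIVE);
            s = p.matcher(s).replaceAll(Matcher.quoteReplacement(e.getValue()));
        }
        return s;
    }

    // Expands contractions, then splits on whitespace.
    static String[] words(String s) {
        return expand(s).split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(words("That's a GOOD question")));
        // [that, is, a, GOOD, question]
        System.out.println(Arrays.toString(words("Steve's house")));
        // [Steve's, house]
    }
}
```

Because only listed contractions are expanded, a possessive such as "Steve's" is left untouched, which is the distinction a single regex cannot make.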


0

If you're looking for a regex that matches the apostrophe, this one matches any whole string that contains an apostrophe or a double quote:

.*["'].* 

and this one matches the quote character itself:

["'] 
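For what it's worth, here is one way these two patterns might be used from Java. The class and method names are invented for this sketch; note that the double quote has to be escaped inside a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ApostropheMatch {
    // Matches a single double quote or apostrophe.
    static final Pattern QUOTE = Pattern.compile("[\"']");

    // True if the string contains at least one quote character.
    // String.matches() requires the pattern to match the whole string.
    static boolean containsQuote(String s) {
        return s.matches(".*[\"'].*");
    }

    // Index of the first quote character, or -1 if there is none.
    static int firstQuoteIndex(String s) {
        Matcher m = QUOTE.matcher(s);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(containsQuote("That's"));   // true
        System.out.println(containsQuote("good"));     // false
        System.out.println(firstQuoteIndex("That's")); // 4
    }
}
```

On their own these patterns only locate the apostrophe; splitting the word still needs split() or a replacement step.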


0

This should work: replace 's with the second word before running the string through the split method. Strings are immutable, so the result of the replacement has to be assigned back.

    s = s.replace("'s", " is");
    String[] words = s.split("\\s+");

This turns That's into the two words "That" and "is", if that's what you're looking to do.
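A minimal runnable sketch of this replace-then-split idea follows; the class and method names are hypothetical. It also shows the caveat raised in the comment on the question: a possessive gets expanded as well.

```java
import java.util.Arrays;

public class ReplaceThenSplit {
    // Expands every "'s" to " is" first, then splits on whitespace.
    static String[] words(String s) {
        String expanded = s.replace("'s", " is");
        return expanded.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(words("That's a good question")));
        // [That, is, a, good, question]
        System.out.println(Arrays.toString(words("Mother's house")));
        // [Mother, is, house]
    }
}
```

Since the replacement happens before the split, "That" and "is" end up as separate array elements.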

2 Comments

Wrong: this solution will return ["word is", "word2", ....], but he needs ["word", "is", "word2", ....].
Before running it through the split method**
