0

I have a string like:

Hello how how how are are you you?

I love cookies cookies, apples and pancakes pancakes.

I want the output to be:

Hello how are you?

I love cookies, apples and pancakes.

So far I have written:

    String[] s = input.split(" ");
    String prev = s[0];
    String ans = prev + " ";
    for (int i = 1; i < s.length; i++) {
        if (!prev.equals(s[i])) {
            prev = s[i];
            ans += prev + " ";
        }
    }
    System.out.println(ans);

The output I get is:

Hello how are you you?

I love cookies cookies, apples and pancakes pancakes.

I need some help with the logic for handling punctuation such as , . ! ? ...

7
  • @JBNizet I find your comment rude. The author of the post said he needs help with the logic, meaning he already knows that they are not the same, and since he already knows what causes the problem, suggesting that he debug isn't going to solve it. Commented Mar 16, 2019 at 14:39
  • @JBNizet yes, I know cookies is not equal to cookies,. I need help with the logic so that my program treats them as the same and keeps the one with the punctuation. Commented Mar 16, 2019 at 14:39
  • Possible duplicate of How can I eliminate duplicate words from String in Java? Commented Mar 16, 2019 at 14:40
  • 1
    What you need is called Tokenization. Commented Mar 16, 2019 at 14:40
  • @TiiJ7, there was a wrong usage of formatting — a code quote style for the text quote. Commented Mar 16, 2019 at 14:45

4 Answers

4

You can use a regex to do this for you. Sample code:

    String regex = "\\b(\\w+)\\b\\s*(?=.*\\b\\1\\b)";
    input = input.replaceAll(regex, "");
  1. \b Matches a word boundary position between a word character and non-word character or position (start / end of string).
  2. \w Matches any word character (alphanumeric & underscore).
  3. \b Matches a word boundary position between a word character and non-word character or position (start / end of string).
  4. \s Matches any whitespace character (spaces, tabs, line breaks).
  5. * Match 0 or more of the preceding token.
  6. (?= Positive lookahead: asserts that the group can be matched after the main expression without including it in the result.
  7. . Matches any character except line breaks.
  8. \1 Matches the results of capture group #1 in step 2.

Note: It is important to use word boundaries here to avoid matching partial words.

Here's a link to a regex demo and explanation: RegexDemo
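The lookahead in the pattern above removes a word whenever the same word occurs anywhere later on the line, so non-adjacent repeats (for example the first "the" in "the cat and the dog") are removed as well. If only back-to-back repeats should be collapsed, a variant along the following lines may be closer to the question. This is a sketch, not the pattern from the answer: the class name AdjacentDuplicates, the method collapse and the regex itself are assumptions made for illustration.

```java
import java.util.regex.Pattern;

public class AdjacentDuplicates {
    // Illustrative pattern (an assumption, not the answer's regex):
    // a word followed one or more times by whitespace and the same whole word.
    private static final Pattern ADJACENT_REPEAT =
            Pattern.compile("\\b(\\w+)(?:\\s+\\1\\b)+");

    // Replaces each run of adjacent identical words with a single copy.
    static String collapse(String input) {
        return ADJACENT_REPEAT.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        // prints: Hello how are you?
        System.out.println(collapse("Hello how how how are are you you?"));
        // prints: I love cookies, apples and pancakes.
        System.out.println(collapse("I love cookies cookies, apples and pancakes pancakes."));
    }
}
```

Because the repeat has to follow directly after whitespace, punctuation attached to the last copy is kept, and a contraction such as can't can't is left untouched rather than mangled. The comparison is case-sensitive, so "The the" is not collapsed.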


4 Comments

Can you please explain the regex?
What the - how are we supposed to understand that?
I added a link to a demo and explanation; you can modify it there to see and compare results.
Might want to note that as currently written this won't work if contractions are allowed in the input. An input of can't can't will result in 'can't
2

You can use java.util.StringTokenizer to tokenize the words. Make sure to set the delimiters to split the words. In your case they are spaces, commas and full stops. This can help you to split the words without the punctuation marks. Then you can compare the previous token with the current and if they are equal you can ignore it.

You can try this code snippet:

    // StringUtils is presumably from Apache Commons Lang;
    // String.join("", duplicateRemovedTokenList) is a standard-library equivalent.
    String s = "I love cookies cookies, apples and pancakes pancakes.";
    // The third argument makes the tokenizer return the delimiters as tokens too.
    StringTokenizer tokenizer = new StringTokenizer(s, " ,.", true);
    List<String> duplicateRemovedTokenList = new LinkedList<>();
    String prevToken = null;
    while (tokenizer.hasMoreTokens()) {
        String currentToken = tokenizer.nextToken();
        if (currentToken.equals(" ")) {
            // Spaces are always kept.
            duplicateRemovedTokenList.add(currentToken);
            continue;
        }
        if (!currentToken.equals(prevToken)) {
            duplicateRemovedTokenList.add(currentToken);
            prevToken = currentToken;
        }
    }
    String duplicateRemovedString = StringUtils.join(duplicateRemovedTokenList, "");
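The snippet above always keeps the space in front of a skipped word, which leaves a stray space before the following punctuation. One possible way around that is to buffer the delimiters seen since the last word and discard them together with the repeated word. The following is a minimal sketch under stated assumptions: the class name TokenizerDedup, the method dedupe and the delimiter set " ,.!?" are chosen for illustration, and repeats separated by punctuation (such as "cookies, cookies") are deliberately left alone.

```java
import java.util.StringTokenizer;

public class TokenizerDedup {
    // Assumed delimiter set: space plus the punctuation named in the question.
    private static final String DELIMS = " ,.!?";

    static String dedupe(String s) {
        StringTokenizer tokenizer = new StringTokenizer(s, DELIMS, true);
        StringBuilder out = new StringBuilder();
        StringBuilder pending = new StringBuilder(); // delimiters seen since the last word
        String prevWord = null;
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            boolean isDelimiter = token.length() == 1 && DELIMS.indexOf(token.charAt(0)) >= 0;
            if (isDelimiter) {
                pending.append(token);
                continue;
            }
            boolean onlySpacesBetween = pending.toString().trim().isEmpty();
            if (token.equals(prevWord) && onlySpacesBetween) {
                pending.setLength(0); // drop the repeated word and the gap before it
                continue;
            }
            out.append(pending).append(token);
            pending.setLength(0);
            prevWord = token;
        }
        return out.append(pending).toString(); // trailing punctuation, if any
    }

    public static void main(String[] args) {
        // prints: I love cookies, apples and pancakes.
        System.out.println(dedupe("I love cookies cookies, apples and pancakes pancakes."));
    }
}
```

Only the standard library is used here, so no join helper is needed.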

1 Comment

This has a few problems: it adds extra spaces and doesn't work with inputs like "I love cookies, cookies, apples and pancakes pancakes." (note the extra comma after the first cookies).
2

You should use a secondary variable to store your words without the punctuation.

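    String[] s = input.split(" ");
    String ans = "";
    for (int i = 0; i < s.length - 1; i++) {
        // Compare the words with the punctuation stripped.
        String currentAux = s[i].replaceAll("[,.!?]", "");
        String nextAux = s[i + 1].replaceAll("[,.!?]", "");
        if (nextAux.equals(currentAux)) {
            continue; // skip this copy; the last copy keeps its punctuation
        }
        ans += " " + s[i];
    }
    ans += " " + s[s.length - 1];
    // trim() removes the leading space added before the first word.
    System.out.println(ans.trim());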
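The character class [,.!?] covers only the marks listed in the question, so a word followed by ; or : would not be recognized as a repeat. A hedged variant of the same idea can use the predefined \p{Punct} class for the comparison and a StringJoiner to avoid the leading space. The class name PunctuationAwareDedup and the method dedupe are hypothetical, and the sketch assumes words are separated by single spaces.

```java
import java.util.StringJoiner;

public class PunctuationAwareDedup {
    static String dedupe(String input) {
        String[] words = input.split(" ");
        StringJoiner out = new StringJoiner(" ");
        for (int i = 0; i < words.length; i++) {
            // Strip ASCII punctuation only for the comparison, not for the output.
            String current = words[i].replaceAll("\\p{Punct}", "");
            String next = i + 1 < words.length
                    ? words[i + 1].replaceAll("\\p{Punct}", "")
                    : null;
            if (current.equals(next)) {
                continue; // keep only the last copy, which carries the punctuation
            }
            out.add(words[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: I had a huge meal; however, I am already hungry again.
        System.out.println(dedupe("I had a huge meal meal; however, I I am already hungry again again."));
    }
}
```

Since \p{Punct} also matches the apostrophe, can't and cant compare as equal in this sketch; whether that is acceptable depends on the input.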

4 Comments

For "Hello how how how are are you you?" it returns Hello how are you. ? is missing
@SandeepRanjan try it again now :)
This is a good answer, but I think you should add colon and semicolon in your calls to replaceAll because sentences like "I had a huge meal meal; however, I I am already hungry again again." will not be handled correctly - meal will appear twice.
@D.B. we can add a lot of symbols; the ones I used are the ones requested at the end of the question.
0

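If you are looking for a one-liner, here is a Java 8 based solution. Note that distinct() drops every later occurrence of a token, not only adjacent repeats, and that splitting on spaces leaves you and you? as different tokens.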

Stream.of(input.split(" ")).distinct().reduce((a, b) -> a + " " + b).orElse("") 
