10

How do you split a string of words and retain whitespaces?

Here is the code:

String words[] = s.split(" "); 

String s contains: hello  world (with two spaces between the words)

After the code runs, words[] contains: "hello" "" "world"

Ideally, there should be no empty string in the middle. Instead, the array should contain both whitespace characters: words[] should be: "hello" " " " " "world"

How do I get it to have this result?

4
  • Why don't you want to trim your string? Commented Jul 7, 2015 at 15:32
  • Because I need to keep what the user inputs verbatim. It's part of the specs. Commented Jul 7, 2015 at 15:34
  • 2
    String.split removes the delimiter you provide to it (the space in this case). If you want a different behavior, you'd have to implement a variant of split yourself. Commented Jul 7, 2015 at 15:34
  • @Chronio, the existing API can support it. Why reinvent the wheel? Commented Jul 7, 2015 at 15:37

4 Answers

16

You could use lookahead/lookbehind assertions:

String[] words = "hello world".split("((?<=\\s+)|(?=\\s+))"); 

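where (?<=\\s+) is a lookbehind and (?=\\s+) is a lookahead. Both are zero-width assertions, so the split happens at every position next to whitespace without consuming any characters.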
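To see what the lookaround split returns for the input from the question, here is a minimal sketch. The class and method names are made up for illustration. It uses a single \\s inside each assertion, which marks the same positions (the assertion only needs to look at the adjacent character) and avoids an unbounded lookbehind, which some regex engines restrict.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class LookaroundSplitDemo {
    // Zero-width split points: just after or just before a whitespace character.
    private static final Pattern AROUND_WHITESPACE =
            Pattern.compile("(?<=\\s)|(?=\\s)");

    // Splits the input so that every whitespace character becomes its own element.
    static String[] splitKeepingWhitespace(String s) {
        return AROUND_WHITESPACE.split(s);
    }

    public static void main(String[] args) {
        String[] words = splitKeepingWhitespace("hello  world"); // two spaces
        System.out.println(words.length);
        System.out.println(Arrays.toString(words));
    }
}
```

On Java 8 or later this should give four elements for that input: "hello", two single-space strings, and "world".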

10

If you can tolerate both white spaces together in one string, you can do

String[] words = s.split("\\b"); 

Then words contains ("hello", "  ", "world"), with both spaces kept together in the middle element.
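A small sketch of the \\b split, assuming Java 8 or later (older versions may produce a leading empty string for the zero-width match at index 0). The class and method names are invented for the example.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of the original answer.
class WordBoundarySplitDemo {
    // Splits at every word boundary, so each run of non-word characters
    // (here, whitespace) stays together as a single element.
    static String[] splitAtWordBoundaries(String s) {
        return s.split("\\b");
    }

    public static void main(String[] args) {
        String input = "hello  world"; // two spaces
        String[] parts = splitAtWordBoundaries(input);
        System.out.println(Arrays.toString(parts));
        // The split consumes no characters, so joining restores the input.
        System.out.println(String.join("", parts).equals(input));
    }
}
```

One thing to keep in mind: \\b also matches around punctuation, so a word like don't comes back as "don", "'", "t". If the input is not just letters and whitespace, the lookaround versions may be the safer choice.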

3 Comments

+1 because in my case, this also would have been acceptable, because I was ultimately trying to reverse each word's characters, but leave the words in order and keep the same whitespace in between each word.
OMG - this just made my entire day. Have been struggling to split words and retain spaces in between. So many workarounds that didn't work 100% - and then I see this - and everything falls into place!
This is the best!! +1
4

s.split("((?<= )|(?= ))"); is one way.

Technically, the regular expression is using lookahead and lookbehind. The single space after each = is the delimiter.
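Because the delimiter here is a literal space, tabs and other whitespace are not treated as split points. The sketch below (names are made up for illustration) compares the literal-space pattern with a \\s variant on an input that contains a tab.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of the original answer.
class SpaceVersusWhitespaceDemo {
    // Splits only around the literal space character.
    static String[] splitAroundSpaces(String s) {
        return s.split("(?<= )|(?= )");
    }

    // Splits around any whitespace character (space, tab, newline, ...).
    static String[] splitAroundWhitespace(String s) {
        return s.split("(?<=\\s)|(?=\\s)");
    }

    public static void main(String[] args) {
        String input = "a\tb c"; // a, TAB, b, SPACE, c
        // The tab stays attached to its neighbours in the first result.
        System.out.println(Arrays.toString(splitAroundSpaces(input)));
        // The tab becomes its own element in the second result.
        System.out.println(Arrays.toString(splitAroundWhitespace(input)));
    }
}
```

Which one fits depends on whether the spec counts only spaces or all whitespace as separators.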

1

You could do something like this:

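// Requires: import java.util.LinkedList; import java.util.List;
List<String> result = new LinkedList<>();
int rangeStart = 0; // start index of the current word
for (int i = 0; i < s.length(); ++i) {
    if (Character.isWhitespace(s.charAt(i))) {
        // Emit the word that ended just before this whitespace, if any.
        if (rangeStart < i) {
            result.add(s.substring(rangeStart, i));
        }
        // Emit the whitespace character as its own element.
        result.add(Character.toString(s.charAt(i)));
        rangeStart = i + 1;
    }
}
// Emit the final word if the string does not end in whitespace.
if (rangeStart < s.length()) {
    result.add(s.substring(rangeStart));
}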

Yeah, no regexes, sue me. This way you can see how it works more easily.

1 Comment

I tested this out, and I can attest that it works.
