601

What regex pattern would I need to pass to java.lang.String.split() to split a String into an array of substrings, using all whitespace characters (' ', '\t', '\n', etc.) as delimiters?

13 Answers

1017

Something along the lines of

myString.split("\\s+"); 

This treats any run of consecutive whitespace characters as a single delimiter.

So if I have the string:

"Hello[space character][tab character]World" 

This should yield the strings "Hello" and "World", with no empty string for the gap between the [space] and the [tab].

As VonC pointed out, the backslash must be escaped, because the Java compiler first processes escape sequences in the string literal and only then passes the result to the regex engine. What you want is the literal "\s", which means you need to write "\\s". It can get a bit confusing.

The \\s is equivalent to [ \\t\\n\\x0B\\f\\r].
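A minimal sketch of the behaviour described above. The class and method names (WhitespaceSplitDemo, splitOnWhitespace) are made up for illustration, and the helper's handling of blank input is an assumption about what a caller would usually want, not something String.split() does on its own.

```java
import java.util.Arrays;

class WhitespaceSplitDemo {
    // Hypothetical helper: split on runs of whitespace, ignoring
    // leading and trailing whitespace and returning no tokens for blank input.
    static String[] splitOnWhitespace(String input) {
        String trimmed = input.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        // A space followed by a tab counts as one delimiter.
        System.out.println(Arrays.toString("Hello \tWorld".split("\\s+"))); // [Hello, World]

        // Leading whitespace produces one empty leading element.
        System.out.println(Arrays.toString(" a b c".split("\\s+")));        // [, a, b, c]

        // Trimming first avoids that empty element.
        System.out.println(Arrays.toString(splitOnWhitespace(" a b c")));   // [a, b, c]
    }
}
```

If leading whitespace is possible in your input, trimming before the split is usually the simplest fix.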

3 Comments

Note that you need to trim() first: trim().split("\\s++") - otherwise, e.g. splitting ` a b c` will emit an empty string first.
Why did you use four backslashes near the end of your answer? i.e. "\\\\s"?
"".trim().split("\\s+") - splitting an empty string gives you an array of length 1. "term".trim().split("\\s+") also gives you a length of 1.
92

In most regex dialects there is a set of convenient shorthand character classes you can use for this kind of thing. These are good ones to remember:

\w - Matches any word character.

\W - Matches any nonword character.

\s - Matches any white-space character.

\S - Matches anything but white-space characters.

\d - Matches any digit.

\D - Matches anything except digits.

A search for "Regex Cheatsheets" should reward you with a whole lot of useful summaries.
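As a rough illustration of the classes listed above in Java (where each backslash has to be doubled inside a string literal), here is a small sketch that counts how many characters of a sample string each class matches. The class name ShorthandClassDemo, the helper countMatches and the sample text are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ShorthandClassDemo {
    // Hypothetical helper: count non-overlapping matches of regex in input.
    static int countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String sample = "Room 42,\tfloor 7"; // 16 characters in total

        System.out.println("\\w: " + countMatches("\\w", sample)); // 12 word characters
        System.out.println("\\W: " + countMatches("\\W", sample)); // 4 non-word characters
        System.out.println("\\s: " + countMatches("\\s", sample)); // 3 whitespace characters
        System.out.println("\\S: " + countMatches("\\S", sample)); // 13 non-whitespace characters
        System.out.println("\\d: " + countMatches("\\d", sample)); // 3 digits
        System.out.println("\\D: " + countMatches("\\D", sample)); // 13 non-digits
    }
}
```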

67

To get this working in JavaScript, I had to do the following:

myString.split(/\s+/g) 

1 Comment

This question is about Java
37

"\\s+" should do the trick

2 Comments

Why the + at the end?
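On the role of the +: it makes a run of consecutive whitespace characters count as one delimiter instead of several. A small sketch of the difference, with a made-up class name (PlusQuantifierDemo) and sample string:

```java
import java.util.Arrays;

class PlusQuantifierDemo {
    // Each single whitespace character is its own delimiter.
    static String[] splitEach(String input) {
        return input.split("\\s");
    }

    // A whole run of whitespace is one delimiter.
    static String[] splitRuns(String input) {
        return input.split("\\s+");
    }

    public static void main(String[] args) {
        String sample = "a \tb"; // 'a', space, tab, 'b'

        // Without +, the space and the tab are separate delimiters,
        // so an empty string appears between them.
        System.out.println(Arrays.toString(splitEach(sample))); // [a, , b]

        // With +, the space and the tab together form a single delimiter.
        System.out.println(Arrays.toString(splitRuns(sample))); // [a, b]
    }
}
```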
13

You may also have a Unicode non-breaking space (U+00A0), which \\s does not match by default:

String[] elements = s.split("[\\s\\xA0]+"); // also split on the Unicode non-breaking space
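A short sketch of why the extra \\xA0 matters, assuming the default regex flags. The class name NbspSplitDemo, the helper splitIncludingNbsp and the sample string are invented for the example.

```java
import java.util.Arrays;

class NbspSplitDemo {
    // Hypothetical helper: split on ASCII whitespace and the non-breaking space.
    static String[] splitIncludingNbsp(String input) {
        return input.split("[\\s\\xA0]+");
    }

    public static void main(String[] args) {
        String sample = "price\u00A0100 EUR"; // non-breaking space between "price" and "100"

        // Plain \s does not match U+00A0, so "price" and "100" stay in one token.
        System.out.println("\\s+ token count: " + sample.split("\\s+").length); // 2

        // Adding \xA0 to the character class splits there as well.
        System.out.println(Arrays.toString(splitIncludingNbsp(sample))); // [price, 100, EUR]
    }
}
```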

1 Comment

Same here. I found this character in a response from ElasticSearch while I was trying to update the index aliases. A plain \\s+ did not have the desired effect.
10

Apache Commons Lang has a method to split a string with whitespace characters as delimiters:

StringUtils.split("abc def") 

http://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#split(java.lang.String)

This might be easier to use than a regex pattern.

10
String string = "Ram is going to school";
String[] arrayOfString = string.split("\\s+");

2

To split a string with any Unicode whitespace, you need to use

s.split("(?U)\\s+")
         ^^^^

The (?U) inline embedded flag is the equivalent of Pattern.UNICODE_CHARACTER_CLASS; it makes the \s shorthand character class match any Unicode whitespace character.

If you want to split with whitespace and keep the whitespaces in the resulting array, use

s.split("(?U)(?<=\\s)(?=\\S)|(?<=\\S)(?=\\s)") 

Java demo:

String s = "Hello\t World\u00A0»";
System.out.println(Arrays.toString(s.split("(?U)\\s+")));
// => [Hello, World, »]
System.out.println(Arrays.toString(s.split("(?U)(?<=\\s)(?=\\S)|(?<=\\S)(?=\\s)")));
// => [Hello, , World, , »] (the second and fourth elements are the whitespace runs)

1

Since it is a regular expression, and I'm assuming you also don't want non-alphanumeric characters such as commas and dots that may be surrounded by blanks (e.g. "one , two" should give [one][two]), it should be:

myString.split("[\\s\\W]+")

1

You can split a string on line breaks with the following statement:

String[] textStr = yourString.split("\\r?\\n");

You can split a string on whitespace with the following statement:

String[] textStr = yourString.split("\\s+");
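The two patterns can be combined, for example to break text into lines first and then into words per line. This is only a sketch; the class name LineSplitDemo, the helper splitLines and the sample text are assumptions made for the example.

```java
import java.util.Arrays;

class LineSplitDemo {
    // Hypothetical helper: split on Unix (\n) or Windows (\r\n) line breaks.
    static String[] splitLines(String text) {
        return text.split("\\r?\\n");
    }

    public static void main(String[] args) {
        String text = "first line\r\nsecond  line\nthird";

        for (String line : splitLines(text)) {
            // Within each line, split on runs of whitespace.
            System.out.println(Arrays.toString(line.split("\\s+")));
        }
        // Prints:
        // [first, line]
        // [second, line]
        // [third]
    }
}
```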

1
String str = "Hello World";
String[] res = str.split("\\s+");

0

Alternatively:

myString.split("\\p{Space}+") 

This behaves like "\\s+" but is perhaps clearer.

SonarLint and other static code analysis tools will actually raise a warning for using either \\s+ or \\p{Space} without (?U) or Pattern.UNICODE_CHARACTER_CLASS. SonarSource states:

When using POSIX classes, Unicode support should be enabled by either passing Pattern.UNICODE_CHARACTER_CLASS as a flag to Pattern.compile or by using (?U) inside the regex.

So it would be:

Pattern.compile("(?U)\\p{Space}+"); 

or

Pattern.compile("\\p{Space}+", Pattern.UNICODE_CHARACTER_CLASS); 
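For the actual split, the compiled Pattern can be reused through Pattern.split(). A minimal sketch, in which the class name UnicodeSpaceSplitter and the sample string are made up for illustration:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class UnicodeSpaceSplitter {
    // Compiled once and reused; the flag makes \p{Space} Unicode-aware.
    private static final Pattern UNICODE_SPACE =
            Pattern.compile("\\p{Space}+", Pattern.UNICODE_CHARACTER_CLASS);

    static String[] split(String input) {
        return UNICODE_SPACE.split(input);
    }

    public static void main(String[] args) {
        String sample = "Hello\u00A0World again"; // non-breaking space after "Hello"

        // Without the flag, \p{Space} does not match U+00A0.
        System.out.println(Arrays.toString(sample.split("\\p{Space}+"))); // two tokens

        // With the flag, the non-breaking space is a delimiter too.
        System.out.println(Arrays.toString(split(sample))); // [Hello, World, again]
    }
}
```

Compiling the pattern once is also a reasonable choice when the same split runs many times, since String.split() would otherwise process the regex on each call.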

2 Comments

Pardon me, but I could not find, on the two links you provided, an explicit statement that \p{Space} will match all Unicode whitespace characters. What am I missing?
@Abra You are correct, I've edited my post.
-2

Study this code. Good luck.

import java.util.Scanner;

class Demo {
    public static void main(String[] args) {
        Scanner input = new Scanner(System.in);
        System.out.print("Input String : ");
        String s1 = input.nextLine();
        // Split on runs of whitespace, including the non-breaking space (U+00A0)
        String[] tokens = s1.split("[\\s\\xA0]+");
        System.out.println(tokens.length);
        for (String s : tokens) {
            System.out.println(s);
        }
    }
}

1 Comment

Can you please detail your answer?
