How to split a string with any whitespace chars as delimiters

Question

What regex pattern would I need to pass to java.lang.String.split() to split a String into an array of substrings using all whitespace characters (' ', '\t', '\n', etc.) as delimiters?

rogerdpack · Accepted Answer · 2020-11-12 17:39:06Z

Something along the lines of

myString.split("\\s+");

This treats each run of consecutive whitespace characters as a single delimiter.

So if I have the string:

"Hello[space character][tab character]World"

This should yield the strings "Hello" and "World" and omit the empty string between the [space] and the [tab].

As VonC pointed out, the backslash must be escaped, because Java first processes escape sequences in the string literal and only then passes the result to the regex engine. What you want is the literal \s, which means you need to write "\\s". It can get a bit confusing.

The \\s is equivalent to [ \\t\\n\\x0B\\f\\r].
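As a minimal sketch of the answer above, the following splits a string whose words are separated by a space followed by a tab. The class and method names (WhitespaceSplitDemo, splitOnWhitespace) are made up for illustration.

```java
import java.util.Arrays;

class WhitespaceSplitDemo {
    // Splits on runs of one or more whitespace characters ([ \t\n\x0B\f\r]).
    static String[] splitOnWhitespace(String text) {
        return text.split("\\s+");
    }

    public static void main(String[] args) {
        // A space followed by a tab counts as one delimiter, not two.
        String[] parts = splitOnWhitespace("Hello \tWorld");
        System.out.println(Arrays.toString(parts)); // prints [Hello, World]
    }
}
```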

Note that you need to trim() first: trim().split("\\s++") - otherwise, e.g. splitting ` a b c` will emit an empty string first.
Why did you use four backslashes near the end of your answer? ie. "\\\\s"?
"".trim().split("\\s+") - empty string split gives you a length of 1. "term".trim().split("\\s+") - gives you also a length of 1.
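The edge cases raised in these comments can be checked with a short sketch. The tokens helper below is hypothetical, not part of any answer here; it trims first and returns an empty array for blank input, which is one possible way to avoid the length-1 result for an empty string.

```java
import java.util.Arrays;

class TrimBeforeSplitDemo {
    // Hypothetical helper: trims first and returns an empty array for blank input.
    static String[] tokens(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        // Leading whitespace produces a leading empty string.
        System.out.println(Arrays.toString(" a b c".split("\\s+"))); // prints [, a, b, c]
        // Splitting an empty string yields an array holding that one empty string.
        System.out.println("".trim().split("\\s+").length);          // prints 1
        System.out.println(Arrays.toString(tokens(" a b c")));       // prints [a, b, c]
        System.out.println(tokens("   ").length);                    // prints 0
    }
}
```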

Amit Joki · Accepted Answer · 2015-04-12 05:19:08Z

In most regex dialects there is a set of convenient shorthand character classes you can use for this kind of thing - these are good ones to remember:

\w - Matches any word character.

\W - Matches any nonword character.

\s - Matches any white-space character.

\S - Matches anything but white-space characters.

\d - Matches any digit.

\D - Matches anything except digits.
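To see what each shorthand in the list above matches in Java, here is a small sketch that replaces every matching character with '#'. The mask helper and the sample input are assumptions made for illustration only.

```java
class ShorthandClassDemo {
    // Hypothetical helper: replaces every character matched by the regex with '#'.
    static String mask(String regex, String input) {
        return input.replaceAll(regex, "#");
    }

    public static void main(String[] args) {
        String input = "a1 b2";
        System.out.println(mask("\\w", input)); // prints ## ##
        System.out.println(mask("\\W", input)); // prints a1#b2
        System.out.println(mask("\\s", input)); // prints a1#b2
        System.out.println(mask("\\S", input)); // prints ## ##
        System.out.println(mask("\\d", input)); // prints a# b#
        System.out.println(mask("\\D", input)); // prints #1##2
    }
}
```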

A search for "Regex Cheatsheets" should reward you with a whole lot of useful summaries.

Useful link : docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/…
Read Pattern class JavaDoc: docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

Andy Thomas · Accepted Answer · 2016-07-29 16:17:25Z

To get this working in Javascript, I had to do the following:

myString.split(/\s+/g)

edited Jul 29, 2016 at 16:17

Andy Thomas

answered Mar 1, 2012 at 22:18

Mike Manard

1 Comment

O-9 Jan 29 at 12:52

This question is about Java

VonC · Accepted Answer · 2008-10-22 11:29:25Z

"\\s+" should do the trick

answered Oct 22, 2008 at 11:29

VonC

2 Comments

Floella Over a year ago

Why the + at the end?

VonC Over a year ago

@Anarelle it repeats the whitespace match at least once, and as many times as possible: see https://regex101.com/r/dT7wG9/1 or http://rick.measham.id.au/paste/explain.pl?regex=\s%2B or http://regexper.com/#^s%2B or http://www.myezapp.com/apps/dev/regexp/show.ws?regex=\s+&env=env_java
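The effect of the + can be seen by splitting the same input with and without it. This is only an illustrative sketch; the class name and the sample string are made up.

```java
import java.util.Arrays;

class PlusQuantifierDemo {
    public static void main(String[] args) {
        String text = "a  b"; // two spaces between the letters

        // Without +, each whitespace character is its own delimiter,
        // so an empty string appears between the two spaces.
        System.out.println(Arrays.toString(text.split("\\s")));  // prints [a, , b]

        // With +, the whole run of whitespace is one delimiter.
        System.out.println(Arrays.toString(text.split("\\s+"))); // prints [a, b]
    }
}
```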

jake_astub · Accepted Answer · 2014-09-09 03:29:23Z

You may also have a Unicode non-breaking space (U+00A0), which \\s does not match by default:

String[] elements = s.split("[\\s\\xA0]+"); // also split on the Unicode non-breaking space

answered Sep 9, 2014 at 3:29

jake_astub

1 Comment

Investigator Over a year ago

Indeed me too. I found this character at a response from ElasticSearch while I was trying to update the index aliases. The simple \\s+ did not have the desired effect.
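A short sketch of the difference described here, using a made-up input that contains one non-breaking space and one ordinary space:

```java
import java.util.Arrays;

class NonBreakingSpaceDemo {
    public static void main(String[] args) {
        // "one" and "two" are separated by U+00A0, "two" and "three" by a plain space.
        String text = "one\u00A0two three";

        // Plain \s does not match U+00A0, so "one" and "two" stay glued together.
        System.out.println(text.split("\\s+").length);                      // prints 2

        // Adding \xA0 to the character class splits on it as well.
        System.out.println(Arrays.toString(text.split("[\\s\\xA0]+")));     // prints [one, two, three]
    }
}
```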

Felix Scheffer · Accepted Answer · 2013-12-01 17:10:18Z

Apache Commons Lang has a method to split a string with whitespace characters as delimiters:

StringUtils.split("abc def")

http://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#split(java.lang.String)

This might be easier to use than a regex pattern.

Arrow · Accepted Answer · 2016-03-31 18:54:35Z

String string = "Ram is going to school";
String[] arrayOfString = string.split("\\s+");

edited Mar 31, 2016 at 18:54

answered Mar 31, 2016 at 18:14

Arrow

Wiktor Stribiżew · Accepted Answer · 2020-08-19 08:06:52Z

To split a string with any Unicode whitespace, you need to use

s.split("(?U)\\s+")
         ^^^^

The (?U) inline embedded flag is the equivalent of Pattern.UNICODE_CHARACTER_CLASS; it makes the \s shorthand character class match any character from the Unicode whitespace category.

If you want to split with whitespace and keep the whitespaces in the resulting array, use

s.split("(?U)(?<=\\s)(?=\\S)|(?<=\\S)(?=\\s)")

See the regex demo. See Java demo:

String s = "Hello\t World\u00A0»";
System.out.println(Arrays.toString(s.split("(?U)\\s+"))); // => [Hello, World, »]
System.out.println(Arrays.toString(s.split("(?U)(?<=\\s)(?=\\S)|(?<=\\S)(?=\\s)"))); // => [Hello, , World, , »]

Ria · Accepted Answer · 2014-02-17 08:34:56Z

Since it is a regular expression, and I'm assuming you also want to drop non-alphanumeric characters such as commas and dots that may be surrounded by blanks (e.g. "one , two" should give [one][two]), it should be:

myString.split("[\\s\\W]+")

RajeshVijayakumar · Accepted Answer · 2014-09-01 13:40:50Z

You can split a string by line break using the following statement:

String textStr[] = yourString.split("\\r?\\n");

You can split a string by whitespace using the following statement:

String textStr[] = yourString.split("\\s+");
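The two patterns can be combined, for example to split text into lines first and then each line into words. This is only a sketch; the class name and the sample text are invented.

```java
import java.util.Arrays;

class LinesThenWordsDemo {
    public static void main(String[] args) {
        // Sample text mixing Windows (\r\n) and Unix (\n) line endings.
        String text = "first line\r\nsecond\tline\nthird";

        // \r?\n matches both kinds of line break.
        String[] lines = text.split("\\r?\\n");
        System.out.println(lines.length); // prints 3

        for (String line : lines) {
            // Within a line, split on any run of whitespace.
            System.out.println(Arrays.toString(line.split("\\s+")));
        }
    }
}
```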

Skywalker · Accepted Answer · 2016-02-02 11:36:08Z

String str = "Hello World";
String res[] = str.split("\\s+");

edited Feb 2, 2016 at 11:36

Skywalker

answered Apr 12, 2015 at 4:04

Olivia Liao

Kenny · Accepted Answer · 2024-04-08 17:50:42Z

Alternatively:

myString.split("\\p{Space}+")

This performs similarly to "\\s+" but is perhaps more clear.

SonarLint and other static code analysis tools will actually raise a warning for using either \\s+ or \\p{Space} without (?U) or Pattern.UNICODE_CHARACTER_CLASS. SonarSource states:

When using POSIX classes, Unicode support should be enabled by either passing Pattern.UNICODE_CHARACTER_CLASS as a flag to Pattern.compile or by using (?U) inside the regex.

So it would be:

Pattern.compile("(?U)\\p{Space}+");

or

Pattern.compile("\\p{Space}+", Pattern.UNICODE_CHARACTER_CLASS);
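A compiled Pattern can then be used for the split itself. The sketch below compares the flagged pattern with the unflagged one on an input containing a non-breaking space; the class, field and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class UnicodeWhitespaceSplitter {
    // Compiled once and reused; the flag makes \p{Space} Unicode-aware.
    private static final Pattern UNICODE_WS =
            Pattern.compile("\\p{Space}+", Pattern.UNICODE_CHARACTER_CLASS);

    // Same pattern without the flag, for comparison only.
    private static final Pattern ASCII_WS = Pattern.compile("\\p{Space}+");

    static String[] split(String text) {
        return UNICODE_WS.split(text);
    }

    public static void main(String[] args) {
        String text = "Hello\t World\u00A0end";
        System.out.println(Arrays.toString(split(text))); // prints [Hello, World, end]
        // Without the flag, the non-breaking space is not a delimiter.
        System.out.println(ASCII_WS.split(text).length);  // prints 2
    }
}
```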

Sources:

Java SonarLint rule
Pattern Javadocs (where \p{Space} is documented)

Pardon me, but I could not find, on the two links you provided, an explicit statement that \p{Space} will match all Unicode whitespace characters. What am I missing?

Risith Ravisara · Accepted Answer · 2016-10-24 14:08:00Z

Study this code. Good luck.

import java.util.*;

class Demo {
    public static void main(String args[]) {
        Scanner input = new Scanner(System.in);
        System.out.print("Input String : ");
        String s1 = input.nextLine();
        // Split on runs of whitespace, including the non-breaking space (U+00A0).
        String[] tokens = s1.split("[\\s\\xA0]+");
        System.out.println(tokens.length);
        for (String s : tokens) {
            System.out.println(s);
        }
    }
}
