5

How can I split the following string into an array?

That's the code

into

array
0 That
1 s
2 the
3 code

I tried something like this

String str = "That's the code";
String[] strs = str.split("\\'");
for (String sstr : strs) {
    System.out.println(sstr);
}

But the output is

That
s the code
5
  • Why not use a space when splitting, and take care of special characters like ' as well? Commented Dec 22, 2013 at 9:43
  • Also, when we say split a string into words, we usually mean splitting That's the code into That's, the and code. Commented Dec 22, 2013 at 9:45
  • That's the code is equivalent to That is the code. I'm comparing sentences. Commented Dec 22, 2013 at 9:47
  • @herohuyongtao: why would we mean that? That's is two words: That and the contraction of is to s. Commented Dec 22, 2013 at 9:48
  • @JBNizet It depends on what you mean by words. You are right when comparing sentences, where What's is treated as What is. :) Commented Dec 22, 2013 at 9:49

8 Answers

19

To specifically split on white space and the apostrophe:

public class Split {
    public static void main(String[] args) {
        String[] tokens = "That's the code".split("[\\s']");
        for (String s : tokens) {
            System.out.println(s);
        }
    }
}

Or, to split on any non-word character:

public class Split {
    public static void main(String[] args) {
        String[] tokens = "That's the code".split("[\\W]");
        for (String s : tokens) {
            System.out.println(s);
        }
    }
}

3 Comments

What is the difference between [\\W] and [\\s']?
\\W matches a non-word character, that is, any character other than a-z, A-Z, 0-9 and _ (underscore). \\s matches a whitespace character: tabs, spaces, line breaks, etc. If I added something in parentheses () to the string, [\\W] would split on each parenthesis, whereas [\\s'] would not.
Do you know the runtime complexity of this method?
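
To make the difference between [\\W] and [\\s'] raised in the comments above concrete, here is a minimal sketch. The class name, method names and the sample input with parentheses are made up for illustration; only String.split and the two patterns come from the answer.

```java
import java.util.Arrays;

public class SplitCompare {
    // Splits only on a whitespace character or an apostrophe.
    static String[] splitOnSpaceOrApostrophe(String text) {
        return text.split("[\\s']");
    }

    // Splits on any single character that is not a-z, A-Z, 0-9 or underscore.
    static String[] splitOnNonWord(String text) {
        return text.split("[\\W]");
    }

    public static void main(String[] args) {
        String text = "That's the (sample) code"; // hypothetical input

        // Parentheses are kept: [That, s, the, (sample), code]
        System.out.println(Arrays.toString(splitOnSpaceOrApostrophe(text)));

        // Parentheses act as separators: [That, s, the, , sample, , code]
        System.out.println(Arrays.toString(splitOnNonWord(text)));
    }
}
```

Note the empty strings in the second result: a space followed by a parenthesis is two separators in a row, and a pattern without a quantifier splits on each one individually.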
6

The best solution I've found for splitting into words when your string contains accented letters is:

String[] listeMots = phrase.split("\\P{L}+"); 

For instance, if your String is

String phrase = "Salut mon homme, comment ça va aujourd'hui? Ce sera Noël puis Pâques bientôt."; 

Then you will get the following words (enclosed in quotes and comma-separated for clarity):

"Salut", "mon", "homme", "comment", "ça", "va", "aujourd", "hui", "Ce", "sera", "Noël", "puis", "Pâques", "bientôt". 

Hope this helps!
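
As a rough illustration of why \\P{L}+ helps with accented text, the sketch below compares it with \\W+ on a short fragment of the phrase above. The class and method names are hypothetical, and the accented letters are written as Unicode escapes so the example does not depend on the source file encoding. It relies on the fact that \W is ASCII-based by default in java.util.regex.

```java
import java.util.Arrays;

public class AccentSplit {
    // \P{L} matches any character that is NOT a Unicode letter.
    static String[] splitOnNonLetters(String text) {
        return text.split("\\P{L}+");
    }

    // By default \W treats only a-z, A-Z, 0-9 and _ as word characters,
    // so accented letters count as separators here.
    static String[] splitOnNonWord(String text) {
        return text.split("\\W+");
    }

    public static void main(String[] args) {
        // "Noël puis Pâques" written with escapes for ë and â
        String text = "No\u00EBl puis P\u00E2ques";

        // Three words, accents preserved
        System.out.println(Arrays.toString(splitOnNonLetters(text)));

        // Five fragments: the words are cut at the accented letters
        System.out.println(Arrays.toString(splitOnNonWord(text)));
    }
}
```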


4

You can split on non-word characters:

String str = "That's the code";
String[] splitted = str.split("[\\W]");

For your input, output will be:

That s the code 


1

You can split on a regex that matches either of two kinds of character, a single quote or whitespace:

String[] strs = str.split("['\\s]"); 


1

You should first replace the ' with " " (a blank space) using str.replaceAll("'", " "), keeping the returned string since Java strings are immutable, and then split that result on the blank space separator using split(" "). Alternatively, you could use a regular expression to split on ' OR space.
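
The replace-then-split approach described above could look roughly like this. It is only a sketch: the class name, method name and the intermediate variable are made up for illustration.

```java
import java.util.Arrays;

public class ReplaceThenSplit {
    static String[] splitWords(String text) {
        // replaceAll returns a new string; the original is left unchanged.
        String normalized = text.replaceAll("'", " ");
        // Every apostrophe is now a space, so one separator is enough.
        return normalized.split(" ");
    }

    public static void main(String[] args) {
        // prints [That, s, the, code]
        System.out.println(Arrays.toString(splitWords("That's the code")));
    }
}
```

One caveat with this approach: if the input contains two separators in a row, split(" ") returns an empty string between them.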


1

If you want to split on non-alphabetic characters:

String str = "That's the code";
String[] strs = str.split("\\P{Alpha}+");
for (String sstr : strs) {
    System.out.println(sstr);
}

\P{Alpha} matches any non-alphabetic character. Alpha is one of the POSIX character classes, which are very useful; you can read more about them in this link. The + indicates that we should split on any continuous run of such characters.

The output will be:

That
s
the
code
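
To show what the + contributes, the sketch below runs the same pattern with and without it on an input where two separators (a comma and a space) sit next to each other. The class name, method names and sample input are assumptions for illustration.

```java
import java.util.Arrays;

public class PlusQuantifier {
    // Without +, every single non-alphabetic character is its own separator.
    static String[] withoutPlus(String text) {
        return text.split("\\P{Alpha}");
    }

    // With +, a run of non-alphabetic characters counts as one separator.
    static String[] withPlus(String text) {
        return text.split("\\P{Alpha}+");
    }

    public static void main(String[] args) {
        String text = "That's the code, really"; // hypothetical input

        // [That, s, the, code, , really]  (empty string between ',' and ' ')
        System.out.println(Arrays.toString(withoutPlus(text)));

        // [That, s, the, code, really]
        System.out.println(Arrays.toString(withPlus(text)));
    }
}
```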

1 Comment

+1 for Unicode version but this code may be not very clear for someone new to regex so you probably should expand your answer a little.
0

You can use OR (|) in a regular expression:

public static void main(String[] args) {
    String str = "That's the code";
    String[] strs = str.split("'|\\s");
    for (String sstr : strs) {
        System.out.println(sstr);
    }
}

The string will be split on a single quote (') or a whitespace character. The single quote doesn't need to be escaped. The output would be:

run:
That
s
the
code
BUILD SUCCESSFUL (total time: 0 seconds)


0

split uses a regex, and in a regex ' is not a special character, so you don't need to escape it with \. To represent whitespace you can use \s (which in a Java string literal needs to be written as "\\s"). Also, to match any one of a set of characters you can use the "OR" operator |, as in a|b|c|d, or just use a character class [abcd], which matches exactly the same as (a|b|c|d).

To make things simple you can use

String[] strs = str.split("'| "); 

or

String[] strs = str.split("'|\\s");//to include all whitespaces 

or

String[] strs = str.split("['\\s]");//equivalent of "'|\\s" 
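
The three variants above can be compared side by side. This is only a sketch with made-up class and method names; the sample input contains a tab to show where the plain-space version differs from the \\s versions.

```java
import java.util.Arrays;

public class AlternationVsClass {
    // Apostrophe or a literal space only.
    static String[] quoteOrSpace(String text) {
        return text.split("'| ");
    }

    // Apostrophe or any whitespace character, written as an alternation.
    static String[] withAlternation(String text) {
        return text.split("'|\\s");
    }

    // The same set of separators, written as a character class.
    static String[] withCharacterClass(String text) {
        return text.split("['\\s]");
    }

    public static void main(String[] args) {
        String text = "That's\tthe code"; // hypothetical input with a tab

        // The tab is not a separator here, so "s\tthe" stays in one piece.
        System.out.println(quoteOrSpace(text).length); // prints 3

        // Both \\s versions give [That, s, the, code]
        System.out.println(Arrays.toString(withAlternation(text)));
        System.out.println(Arrays.toString(withCharacterClass(text)));
    }
}
```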
