I would like to know how to split up a large string into a series of smaller strings or words. For example:
I want to walk my dog.
I want to have a string: "I", another string:"want", etc.
How would I do this?
Use the split() method. For example:

    String s = "I want to walk my dog";
    String[] arr = s.split(" ");
    for (String ss : arr) {
        System.out.println(ss);
    }

As a more general solution (but ASCII only!), to treat any other separators between words (like commas and semicolons) as delimiters, I suggest:
    String s = "I want to walk my dog, cat, and tarantula; maybe even my tortoise.";
    String[] words = s.split("\\W+");

The regex means that the delimiters will be anything that is not a word character [\W], in groups of at least one [+]. Because [+] is greedy, it will take, for instance, ';' and ' ' together as one delimiter.
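One thing to watch with this approach: if the text starts with a delimiter, split() returns an empty string as the first element (trailing empty strings are dropped, leading ones are not). Here is a minimal sketch of one way to deal with that. The class and the splitWords helper are hypothetical names used only for illustration.

```java
import java.util.Arrays;

class SplitWordsDemo {
    // Hypothetical helper: splits on runs of non-word characters and
    // drops the empty element that a leading delimiter produces.
    static String[] splitWords(String text) {
        return Arrays.stream(text.split("\\W+"))
                .filter(word -> !word.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String s = "...I want to walk my dog, cat, and tarantula;";
        // Plain split(): the first element is "" because s starts with "..."
        System.out.println(Arrays.toString(s.split("\\W+")));
        // Filtered version: only the words remain
        System.out.println(Arrays.toString(splitWords(s)));
    }
}
```

Filtering is only one option; trimming the leading delimiters before splitting would work as well.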
A regex can also be used to split words.
\w can be used to match word characters ([A-Za-z0-9_]), so that punctuation is removed from the results:
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String s = "I want to walk my dog, and why not?";
    Pattern pattern = Pattern.compile("\\w+");
    Matcher matcher = pattern.matcher(s);
    while (matcher.find()) {
        System.out.println(matcher.group());
    }

Outputs:

    I
    want
    to
    walk
    my
    dog
    and
    why
    not

See the Java API documentation for Pattern.
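If you need the words in a collection rather than printed, the same Matcher loop can fill a list. This is just a sketch; the class name and the findWords helper are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordMatcherDemo {
    // Compile once and reuse; \w+ matches runs of [A-Za-z0-9_]
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Hypothetical helper: collects every match into a list
    static List<String> findWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [I, want, to, walk, my, dog, and, why, not]
        System.out.println(findWords("I want to walk my dog, and why not?"));
    }
}
```

Because this matches the words instead of splitting on delimiters, there are never empty strings in the result, even when the text starts with punctuation.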
If your phrase contains accented characters, see my other answer:

    String[] listeMots = phrase.split("\\P{L}+");

Note that فنّى will be split into two words, because the combining mark it contains is not a letter (\p{L}).

Yet another method, using StringTokenizer:
    String s = "I want to walk my dog";
    StringTokenizer tokenizer = new StringTokenizer(s);
    while (tokenizer.hasMoreTokens()) {
        System.out.println(tokenizer.nextToken());
    }

StringTokenizer looks for consecutive tokens in the string and returns them one by one.

To treat anything between words as a separator (here, everything except lower-case and upper-case letters), we can do:
    String mystring = "hi, there,hi Leo";
    String[] arr = mystring.split("[^a-zA-Z]+");
    for (int i = 0; i < arr.length; i++) {
        System.out.println(arr[i]);
    }

Here the regex means that the separators will be anything that is not an upper- or lower-case letter [^a-zA-Z], in groups of at least one [+].
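Keep in mind that [^a-zA-Z] only knows ASCII letters, so accented words get cut apart. Below is a rough comparison with the \P{L} pattern mentioned earlier, plus a variant that also keeps combining marks (\p{M}) inside words. The class and method names are invented for this sketch, and the non-ASCII characters are written as \u escapes so the source file encoding does not matter.

```java
import java.util.Arrays;

class UnicodeSplitDemo {
    // ASCII only: any character outside a-z / A-Z is a separator
    static String[] splitAscii(String text) {
        return text.split("[^a-zA-Z]+");
    }

    // Unicode letters: anything that is not in category L is a separator
    static String[] splitLetters(String text) {
        return text.split("\\P{L}+");
    }

    // Unicode letters plus combining marks (category M) stay inside words
    static String[] splitLettersAndMarks(String text) {
        return text.split("[^\\p{L}\\p{M}]+");
    }

    public static void main(String[] args) {
        String french = "Un \u00e9l\u00e9phant rose";      // "Un elephant rose" with accented e
        String arabic = "\u0641\u0646\u0651\u0649";        // the word with a shadda (combining mark)

        System.out.println(Arrays.toString(splitAscii(french)));   // accented word is cut apart
        System.out.println(Arrays.toString(splitLetters(french))); // accented word stays whole
        System.out.println(splitLetters(arabic).length);           // 2: split at the mark
        System.out.println(splitLettersAndMarks(arabic).length);   // 1: mark kept in the word
    }
}
```

Whether marks should count as part of a word depends on your text; for languages written with combining marks it usually makes sense to include them.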
Use split():

    String[] words = stringInstance.split(" ");

Using the Java Stream API:
    String sentence = "I want to walk my dog.";
    Arrays.stream(sentence.split(" ")).forEach(System.out::println);

Output:

    I
    want
    to
    walk
    my
    dog.

Or, to strip the period:

    String sentence2 = "I want to walk my dog.";
    Arrays.stream(sentence2.split(" "))
          .map(str -> str.replace(".", ""))
          .forEach(System.out::println);

Output:

    I
    want
    to
    walk
    my
    dog

Another option is to split on anything that is not a letter:

    String[] str = s.split("[^a-zA-Z]+");
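The stream and regex approaches above can also be combined with Pattern.splitAsStream (available since Java 8), which avoids building the intermediate array. A small sketch, with the class name and the words helper being hypothetical:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class StreamSplitDemo {
    // Same delimiter as above: runs of anything that is not an ASCII letter
    private static final Pattern NON_LETTERS = Pattern.compile("[^a-zA-Z]+");

    // Hypothetical helper: streams the pieces and drops empty ones
    static List<String> words(String sentence) {
        return NON_LETTERS.splitAsStream(sentence)
                .filter(word -> !word.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [I, want, to, walk, my, dog]
        System.out.println(words("I want to walk my dog."));
    }
}
```

The filter step is there for input that starts with punctuation or whitespace, where the first piece would otherwise be an empty string.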