In the line
words = line.split("\\s+");
you split by a regex, which is much slower than splitting on a single character (about 5 times slower on my machine). See also: "Java split String performances".
If the words are always separated by exactly one space, the fix is simple:
words = line.split(" ");
Just replace your split with this line and your code will run faster.
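To check the difference on your own machine, here is a rough timing sketch. It is not a rigorous benchmark (a harness such as JMH would be needed for that), and the class name `SplitTiming`, the sample line and the round count are all made up for illustration. The likely reason for the gap is that, in OpenJDK, `String.split` has a fast path for a single-character delimiter that is not a regex metacharacter, so no `Pattern` has to be compiled.

```java
import java.util.Arrays;

class SplitTiming {
    // Runs line.split(delimiter) `rounds` times and returns the elapsed nanoseconds.
    static long time(String line, String delimiter, int rounds) {
        long start = System.nanoTime();
        int sink = 0; // accumulate a result so the loop body is actually used
        for (int i = 0; i < rounds; i++) {
            sink += line.split(delimiter).length;
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) {
            System.out.println(sink); // never true here; only consumes sink
        }
        return elapsed;
    }

    public static void main(String[] args) {
        String line = "1 2 3 4 5 6 7 8 9 10"; // hypothetical sample input

        // With exactly one space between words, both calls return the same tokens.
        System.out.println(Arrays.equals(line.split(" "), line.split("\\s+")));

        int rounds = 200_000; // arbitrary; large enough to get past JIT warm-up
        time(line, " ", rounds);    // warm-up run
        time(line, "\\s+", rounds); // warm-up run
        System.out.println("single char: " + time(line, " ", rounds) + " ns");
        System.out.println("regex:       " + time(line, "\\s+", rounds) + " ns");
    }
}
```

The absolute numbers will vary between JVMs and machines, so treat the ratio as indicative only.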
If words can be separated by several spaces, add this line after the loop:
text.remove("");
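and still replace your regex split with the single-character split. Splitting on one space produces an empty string between consecutive spaces, and this call removes it from the set.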
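The multiple-space case can be sketched as follows. The class and method names (`MultiSpaceSplit`, `uniqueWords`) are hypothetical; only `split(" ")` and `text.remove("")` come from the answer itself.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class MultiSpaceSplit {
    // Collects the distinct words of a line, tolerating runs of spaces.
    static Set<String> uniqueWords(String line) {
        Set<String> text = new TreeSet<>();
        for (String word : line.split(" ")) {
            text.add(word); // may add "" when two spaces are adjacent
        }
        text.remove(""); // drop the empty token, if any
        return text;
    }

    public static void main(String[] args) {
        // Two spaces between 1 and 2 yield an empty token in the middle.
        System.out.println(Arrays.toString("1  2".split(" "))); // prints [1, , 2]
        System.out.println(uniqueWords("1  2 1 1"));             // prints [1, 2]
    }
}
```

Because the words go into a set, a single `remove("")` after the loop is enough no matter how many empty tokens the split produced.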
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.Set;
import java.util.TreeSet;

public class Test {
    public static void main(String[] args) throws IOException {
        // string contains 1, 2 and two spaces between 1 and 2. text size should be 2
        String txt = "1  2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n" +
                "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n" +
                "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n" +
                "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n" +
                "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n" +
                "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1";
        InputStream inpstr = new ByteArrayInputStream(txt.getBytes());
        BufferedReader read = new BufferedReader(new InputStreamReader(inpstr));
        Set<String> text = new TreeSet<>();
        String[] words;
        String line;
        long startTime = System.nanoTime();
        while ((line = read.readLine()) != null) {
            //words = line.split("\\s+"); -- runs 5 times slower
            words = line.split(" ");
            for (int i = 0; i < words.length; i++) {
                text.add(words[i]);
            }
        }
        text.remove(""); // add only if words can be separated with multiple spaces
        long endTime = System.nanoTime();
        // prints the elapsed time in nanoseconds, then the number of distinct words
        System.out.println((endTime - startTime) + " " + text.size());
    }
}
You can also replace your for loop with
text.addAll(Arrays.asList(words));
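As a small sketch of that replacement, the following compares the index loop with `addAll`. The class and method names (`AddAllDemo`, `withLoop`, `withAddAll`) are invented for the example; both variants should end up with the same set.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class AddAllDemo {
    // Original style: add each word through an index loop.
    static Set<String> withLoop(String line) {
        String[] words = line.split(" ");
        Set<String> text = new TreeSet<>();
        for (int i = 0; i < words.length; i++) {
            text.add(words[i]);
        }
        return text;
    }

    // Suggested style: wrap the array as a list and add it in one call.
    static Set<String> withAddAll(String line) {
        String[] words = line.split(" ");
        Set<String> text = new TreeSet<>();
        text.addAll(Arrays.asList(words));
        return text;
    }

    public static void main(String[] args) {
        String line = "b a b c";
        System.out.println(withLoop(line));   // prints [a, b, c]
        System.out.println(withAddAll(line)); // prints [a, b, c]
    }
}
```

`Arrays.asList` only wraps the array rather than copying it, so this is mainly a readability change, not a performance one.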
Scanner class instead of BufferedReader?
Don't put the "words[0].length() > 0" condition in the loop guard, as this stops the loop from adding anything if the string starts with a space, even if there are words after it. Put that check in a conditional inside the loop instead. (And just use a for-each loop; there's no need to faff with array indices.)
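To illustrate the loop-guard point, here is a minimal sketch. The question's original loop is not shown here, so `withGuard` is only an assumed reconstruction of it, and all names (`LoopGuardDemo`, `withGuard`, `withInnerCheck`) are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

class LoopGuardDemo {
    // Assumed shape of the problematic loop: the emptiness check sits in the guard,
    // so an empty first token ends the loop before anything is added.
    static Set<String> withGuard(String line) {
        String[] words = line.split(" ");
        Set<String> text = new TreeSet<>();
        for (int i = 0; i < words.length && words[0].length() > 0; i++) {
            text.add(words[i]);
        }
        return text;
    }

    // Suggested shape: a for-each loop with the check inside the body,
    // so empty tokens are skipped and the remaining words are still added.
    static Set<String> withInnerCheck(String line) {
        Set<String> text = new TreeSet<>();
        for (String word : line.split(" ")) {
            if (!word.isEmpty()) {
                text.add(word);
            }
        }
        return text;
    }

    public static void main(String[] args) {
        String line = " a b"; // leading space makes the first token empty
        System.out.println(withGuard(line));      // prints []
        System.out.println(withInnerCheck(line)); // prints [a, b]
    }
}
```

With the check inside the loop, the separate `text.remove("")` call is no longer needed, since empty tokens never reach the set.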