8

Java 8 has a way to create a Stream from the lines of a file; forEach then steps through the lines one at a time. I have a text file with the following format:

    bunch of lines with text
    $$$$
    bunch of lines with text
    $$$$

I need to get each set of lines that goes before $$$$ into a single element in the Stream.

In other words, I need a Stream of Strings. Each string contains the content that goes before $$$$.

What is the best way (with minimum overhead) to do this?

7
  • Take a look at this question: stackoverflow.com/questions/32290278/… or also this one: stackoverflow.com/questions/20746429/… Commented Oct 10, 2016 at 7:00
  • It does not answer my question.. Commented Oct 10, 2016 at 7:21
  • Does it have to use Streams? Commented Oct 10, 2016 at 7:56
  • Yes. There is a way to do this by creating a spliterator from an iterator. I want to avoid that. Commented Oct 10, 2016 at 7:59
  • you need to create a custom predicate Commented Oct 10, 2016 at 8:53

5 Answers

2

I couldn't come up with a solution that processes the lines lazily. I'm not sure if this is possible.

My solution produces an ArrayList. If you have to use a Stream, simply call stream() on it.

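    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.BiConsumer;
    import java.util.stream.Stream;

    public class DelimitedFile {

        public static void main(String[] args) throws IOException {
            List<String> lines = lines(Paths.get("delimited.txt"), "$$$$");
            for (int i = 0; i < lines.size(); i++) {
                System.out.printf("%d:%n%s%n", i, lines.get(i));
            }
        }

        public static List<String> lines(Path path, String delimiter) throws IOException {
            // try-with-resources closes the underlying file when collecting is done
            try (Stream<String> stream = Files.lines(path)) {
                // the accumulator is stateful, so this only works on a sequential stream
                return stream.collect(ArrayList::new, new BiConsumer<ArrayList<String>, String>() {

                    // true when the next non-delimiter line starts a new chunk
                    boolean add = true;

                    @Override
                    public void accept(ArrayList<String> lines, String line) {
                        if (delimiter.equals(line)) {
                            add = true;
                        } else {
                            if (add) {
                                lines.add(line);
                                add = false;
                            } else {
                                // append the line to the chunk currently being built
                                int i = lines.size() - 1;
                                lines.set(i, lines.get(i) + '\n' + line);
                            }
                        }
                    }
                }, ArrayList::addAll);
            }
        }
    }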

File content:

    bunch of lines with text
    bunch of lines with text2
    bunch of lines with text3
    $$$$
    2bunch of lines with text
    2bunch of lines with text2
    $$$$
    3bunch of lines with text
    3bunch of lines with text2
    3bunch of lines with text3
    3bunch of lines with text4
    $$$$

Output:

    0:
    bunch of lines with text
    bunch of lines with text2
    bunch of lines with text3
    1:
    2bunch of lines with text
    2bunch of lines with text2
    2:
    3bunch of lines with text
    3bunch of lines with text2
    3bunch of lines with text3
    3bunch of lines with text4

Edit:

I've finally come up with a solution which lazily generates the Stream:

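    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Iterator;
    import java.util.NoSuchElementException;
    import java.util.Spliterator;
    import java.util.Spliterators;
    import java.util.stream.Stream;
    import java.util.stream.StreamSupport;

    public static Stream<String> lines(Path path, String delimiter) throws IOException {
        Stream<String> lines = Files.lines(path);
        Iterator<String> iterator = lines.iterator();
        return StreamSupport.stream(Spliterators.spliteratorUnknownSize(new Iterator<String>() {

            // first line of the next chunk, or null if it has not been read yet
            String nextLine;

            @Override
            public boolean hasNext() {
                if (nextLine != null) {
                    return true;
                }
                // skip delimiter lines until a chunk starts
                while (iterator.hasNext()) {
                    String line = iterator.next();
                    if (!delimiter.equals(line)) {
                        nextLine = line;
                        return true;
                    }
                }
                lines.close();
                return false;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                StringBuilder sb = new StringBuilder(nextLine);
                nextLine = null;
                // collect lines up to the next delimiter or the end of the file
                while (iterator.hasNext()) {
                    String line = iterator.next();
                    if (delimiter.equals(line)) {
                        break;
                    }
                    sb.append('\n').append(line);
                }
                return sb.toString();
            }
        }, Spliterator.ORDERED | Spliterator.NONNULL | Spliterator.IMMUTABLE), false)
                // also release the file if the stream is closed before it is exhausted
                .onClose(lines::close);
    }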

This is, coincidentally, very similar to the implementation of BufferedReader.lines() (which Files.lines(Path) uses internally). It may incur less overhead to skip both of these methods and use Files.newBufferedReader(Path) and BufferedReader.readLine() directly.
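As a rough sketch of that last suggestion, the variant below reads with BufferedReader.readLine() directly and exposes the chunks through a small spliterator. The class name DelimitedChunks and the method name chunks are made up for illustration, and the claim of lower overhead has not been measured here. Like the iterator version, it skips empty chunks.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper class; the names are assumptions for illustration.
final class DelimitedChunks {

    /** Lazily streams the blocks of text separated by lines equal to {@code delimiter}. */
    static Stream<String> chunks(Path path, String delimiter) throws IOException {
        BufferedReader reader = Files.newBufferedReader(path);
        Spliterator<String> spliterator = new Spliterators.AbstractSpliterator<String>(
                Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL) {
            @Override
            public boolean tryAdvance(Consumer<? super String> action) {
                try {
                    StringBuilder sb = null; // stays null until a chunk has started
                    String line;
                    while ((line = reader.readLine()) != null) {
                        if (delimiter.equals(line)) {
                            if (sb != null) {
                                break;       // delimiter ends the current chunk
                            }
                            continue;        // leading or repeated delimiter: skip it
                        }
                        if (sb == null) {
                            sb = new StringBuilder(line);
                        } else {
                            sb.append('\n').append(line);
                        }
                    }
                    if (sb == null) {
                        return false;        // end of file, nothing left to emit
                    }
                    action.accept(sb.toString());
                    return true;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        };
        // The caller should close the stream (try-with-resources) to release the file.
        return StreamSupport.stream(spliterator, false).onClose(() -> {
            try {
                reader.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }
}
```

It would be used inside a try-with-resources block, for example try (Stream<String> s = DelimitedChunks.chunks(path, "$$$$")) { s.forEach(System.out::println); }.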


1 Comment

This works. It is similar to what I mentioned in my fourth comment under the question. Can you please delete the ArrayList-based answer and include the best-performing version of your second code, so that I can accept your answer?
2

You can use a Scanner as an iterator and create the stream from it:

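    import java.util.Scanner;
    import java.util.Spliterator;
    import java.util.Spliterators;
    import java.util.regex.Pattern;
    import java.util.stream.Stream;
    import java.util.stream.StreamSupport;

    private static Stream<String> recordStreamOf(Readable source) {
        Scanner scanner = new Scanner(source);
        // useDelimiter takes a regular expression, so the literal $$$$ must be quoted
        scanner.useDelimiter(Pattern.quote("$$$$"));
        return StreamSupport
                .stream(Spliterators.spliteratorUnknownSize(scanner,
                        Spliterator.ORDERED | Spliterator.NONNULL), false)
                .onClose(scanner::close);
    }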

This will preserve the newlines in the chunks for further filtering or splitting.
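One detail worth keeping in mind is that Scanner.useDelimiter(String) treats its argument as a regular expression, so the literal $$$$ has to be quoted. The sketch below shows one way the preserved newlines might be post-processed, assuming the surrounding line breaks should be trimmed and blank chunks dropped. The names ScannerChunks and trimmedRecords are hypothetical. Note that a quoted $$$$ also splits on a $$$$ that appears in the middle of a line.

```java
import java.util.Scanner;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.regex.Pattern;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper class; the names are assumptions for illustration.
final class ScannerChunks {

    /** Streams the text between $$$$ markers, without leading or trailing whitespace. */
    static Stream<String> trimmedRecords(Readable source) {
        Scanner scanner = new Scanner(source);
        // Pattern.quote turns the regex metacharacters into a literal match
        scanner.useDelimiter(Pattern.quote("$$$$"));
        return StreamSupport
                .stream(Spliterators.spliteratorUnknownSize(scanner,
                        Spliterator.ORDERED | Spliterator.NONNULL), false)
                .map(String::trim)                  // drop the line breaks around each chunk
                .filter(chunk -> !chunk.isEmpty())  // e.g. the remainder after the last marker
                .onClose(scanner::close);
    }
}
```

Any Readable works as the source, for instance a StringReader in a test or the reader returned by Files.newBufferedReader(path).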


0

A similar, shorter answer already exists, but the following is type-safe and needs no extra state:

    Path path = Paths.get("... .txt");
    try {
        List<StringBuilder> glist = Files.lines(path, StandardCharsets.UTF_8)
                .collect(() -> new ArrayList<StringBuilder>(),
                        (list, line) -> {
                            if (list.isEmpty()
                                    || list.get(list.size() - 1).toString().endsWith("$$$$\n")) {
                                list.add(new StringBuilder());
                            }
                            list.get(list.size() - 1).append(line).append('\n');
                        },
                        (list1, list2) -> {
                            if (!list1.isEmpty()
                                    && !list1.get(list1.size() - 1).toString().endsWith("$$$$\n")
                                    && !list2.isEmpty()) {
                                // Merge last of list1 and first of list2:
                                list1.get(list1.size() - 1).append(list2.remove(0).toString());
                            }
                            list1.addAll(list2);
                        });
        glist.forEach(sb -> System.out.printf("------------------%n%s%n", sb));
    } catch (IOException ex) {
        Logger.getLogger(App.class.getName()).log(Level.SEVERE, null, ex);
    }

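Instead of .endsWith("$$$$\n"), which also matches a line that merely ends in $$$$, it would be better to check that the delimiter occupies a whole line. String.matches would have to match the entire chunk, so use a Matcher with find(), where sb is the StringBuilder being tested:

    Pattern.compile("(^|\n)\\$\\$\\$\\$\n\\z").matcher(sb).find()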


0

Here is a solution based on this previous work:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Spliterator;
    import java.util.Spliterators;
    import java.util.function.Consumer;
    import java.util.function.Predicate;
    import java.util.stream.Stream;
    import java.util.stream.StreamSupport;

    public class ChunkSpliterator extends Spliterators.AbstractSpliterator<List<String>> {
        private final Spliterator<String> source;
        private final Predicate<String> delimiter;
        private final Consumer<String> getChunk;
        private List<String> current;

        ChunkSpliterator(Spliterator<String> lineSpliterator, Predicate<String> mark) {
            super(lineSpliterator.estimateSize(), ORDERED|NONNULL);
            source=lineSpliterator;
            delimiter=mark;
            // appends each incoming line to the chunk being built
            getChunk=s -> {
                if(current==null) current=new ArrayList<>();
                current.add(s);
            };
        }

        public boolean tryAdvance(Consumer<? super List<String>> action) {
            // pull lines until the last one read is a delimiter
            while(current==null || !delimiter.test(current.get(current.size()-1)))
                if(!source.tryAdvance(getChunk))
                    return lastChunk(action);
            // drop the delimiter line itself before handing over the chunk
            current.remove(current.size()-1);
            action.accept(current);
            current=null;
            return true;
        }

        // emits whatever is left when the source ends without a final delimiter
        private boolean lastChunk(Consumer<? super List<String>> action) {
            if(current==null) return false;
            action.accept(current);
            current=null;
            return true;
        }

        public static Stream<List<String>> toChunks(
                Stream<String> lines, Predicate<String> splitAt, boolean parallel) {
            return StreamSupport.stream(
                new ChunkSpliterator(lines.spliterator(), splitAt), parallel);
        }
    }

which you can use like

    try(Stream<String> lines=Files.lines(pathToYourFile)) {
        ChunkSpliterator.toChunks(
            lines, Pattern.compile("^\\Q$$$$\\E$").asPredicate(), false)
        /* chain your stream operations, e.g.
        .forEach(s -> { s.forEach(System.out::print); System.out.println(); })
        */;
    }


-1

You could try

    List<String> list = new ArrayList<>();
    try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
        list = stream
                .filter(line -> !line.equals("$$$$"))
                .collect(Collectors.toList());
    } catch (IOException e) {
        e.printStackTrace();
    }

5 Comments

That does not combine the lines between the "$$$$" lines into a single element. Rather, it removes these delimiters, leaving you clueless afterwards.
I realised it afterwards but I'm not able to remove my answer. You can concatenate the lines and split with $$$$.
@Isukthar you should be able to remove it by using the link at the bottom left of your answer
I got the message "An error has occurred - please retry your request." when I click on delete.
It might be a temporary issue, otherwise don't hesitate to ask for help on Meta Stack Overflow.
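The idea from the comments above (concatenate the lines, then split on $$$$) could look roughly like the sketch below. It reads the whole file into memory, so it is not lazy, and it assumes UTF-8 content and that blank chunks can be dropped. The names JoinAndSplit and chunks are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical helper class; the names are assumptions for illustration.
final class JoinAndSplit {

    // A line consisting only of $$$$, together with its trailing line break if present.
    private static final Pattern DELIMITER_LINE = Pattern.compile("(?m)^\\Q$$$$\\E$\\R?");

    /** Reads the whole file eagerly and streams the text found between delimiter lines. */
    static Stream<String> chunks(Path path) throws IOException {
        String content = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        return DELIMITER_LINE.splitAsStream(content)
                .map(String::trim)                  // remove the line break before each delimiter
                .filter(chunk -> !chunk.isEmpty()); // ignore empty pieces between delimiters
    }
}
```

Because the pattern is anchored to whole lines, a $$$$ that appears in the middle of a line does not split a chunk.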
