2

There are multiple questions about streams, but I didn't find any for this use case in Java.

I have a huge stream of objects, Stream<A> (about 1 million objects). The stream comes from a file.

    enum Status { Running, Queued, Completed }
    class A { Status status; String name; }

I want to split Stream<A> into three streams without using any collect operation, because collecting loads everything into memory.

I am getting a StackOverflowError because I call Stream.concat many times here.

Stream.concat has a known problem, described in the Javadoc: "Implementation Note: Use caution when constructing streams from repeated concatenation. Accessing an element of a deeply concatenated stream can result in deep call chains, or even StackOverflowException."

    Map<Status, Stream<String>> splitStream = new HashMap<>();
    streamA.forEach(aObj -> {
        Stream<String> statusBasedStream = splitStream.getOrDefault(aObj.status, Stream.empty());
        // Each call wraps the previous stream once more, so the nesting depth grows with every element.
        splitStream.put(aObj.status, Stream.concat(statusBasedStream, Stream.of(aObj.name)));
    });
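For comparison, here is a minimal sketch of the same grouping with one Stream.Builder per status instead of repeated Stream.concat. The class name StatusSplitter, the method split, and the constructor on A are made up for illustration. It assumes a sequential source stream. It produces flat streams, so the deep call chain does not arise, but it still buffers every name in memory and does not make the split lazy.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StatusSplitter {
    enum Status { Running, Queued, Completed }

    // Stand-in for the question's class A (the constructor is an assumption).
    static class A {
        final Status status;
        final String name;
        A(Status status, String name) { this.status = status; this.name = name; }
    }

    // Buffers names into one Stream.Builder per status, then builds a flat stream from each.
    // Assumes a sequential source; all names are held in memory until the streams are consumed.
    static Map<Status, Stream<String>> split(Stream<A> source) {
        Map<Status, Stream.Builder<String>> builders = new EnumMap<>(Status.class);
        for (Status s : Status.values()) {
            builders.put(s, Stream.builder());
        }
        source.forEach(a -> builders.get(a.status).add(a.name));

        Map<Status, Stream<String>> result = new EnumMap<>(Status.class);
        builders.forEach((status, builder) -> result.put(status, builder.build()));
        return result;
    }

    public static void main(String[] args) {
        Stream<A> streamA = Stream.of(
                new A(Status.Queued, "job-1"),
                new A(Status.Running, "job-2"),
                new A(Status.Queued, "job-3"));
        Map<Status, Stream<String>> parts = split(streamA);
        System.out.println(parts.get(Status.Queued).collect(Collectors.toList()));
    }
}
```

Whether this is acceptable depends on whether a million names fit comfortably on the heap, which is usually the case.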

There are a few custom stream implementations on GitHub that handle concatenation, but I want to solve this with the standard library.

If the data were smaller, I would have taken the list approach mentioned here (Split stream into substreams with N elements).

11
  • 1
    "Collect statement loads everything into memory." Sure, but putting everything into a map also loads everything into memory. There's no laziness there. Commented May 5, 2020 at 0:38
  • 4
    This feels like an XY problem. What are you really trying to achieve? Commented May 5, 2020 at 1:17
  • 1
    What are you going to do with the almost 1 million names that are status Completed, if not load them into memory? Commented May 5, 2020 at 1:19
  • 2
    @VishwaramSankaran What is your issue with having 6 open files? That is nothing, and it is what you need to do for this. Commented May 5, 2020 at 1:50
  • 3
    A million strings are not that impressive. The fact that you didn’t even realize that all strings are in memory when you collect them into Stream.Builder instances, indicates that there is no actual heap memory problem with them. You didn’t say where this Stream<A> does come from. Further, it’s not clear what problems you get with Stream.concat, when you talk about splitting, the very opposite of what concat does. Commented May 5, 2020 at 8:55

1 Answer

1

This is not an exact solution to the problem, but if you know the index ranges (that is, the data is ordered by status and you know how many elements each status has), a combination of Stream.skip() and Stream.limit() can help. Below is the dummy code I tried:

    int queuedNumbers = 100;
    int runningNumbers = 200;
    // A stream can be consumed only once, so each substream needs a fresh source stream.
    Supplier<Stream<Object>> all = () -> Stream.of(); // placeholder source
    Stream<Object> queued = all.get().limit(queuedNumbers);
    Stream<Object> running = all.get().skip(queuedNumbers).limit(runningNumbers);
    Stream<Object> completed = all.get().skip(queuedNumbers + runningNumbers);

Hope it would be of some help.
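If the index ranges are not known, a similar idea is to make one lazy pass over the source per status and filter. The following is only a sketch under assumptions: the source can be reopened (for example by re-reading the file), which is modeled here with a Supplier, and the names StatusFilter and namesWithStatus are hypothetical. The list in main only stands in for the real source.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StatusFilter {
    enum Status { Running, Queued, Completed }

    // Stand-in for the question's class A (the constructor is an assumption).
    static class A {
        final Status status;
        final String name;
        A(Status status, String name) { this.status = status; this.name = name; }
    }

    // Opens a fresh stream from the supplier and lazily keeps only the names with the wanted status.
    static Stream<String> namesWithStatus(Supplier<Stream<A>> source, Status wanted) {
        return source.get()
                .filter(a -> a.status == wanted)
                .map(a -> a.name);
    }

    public static void main(String[] args) {
        List<A> rows = Arrays.asList(
                new A(Status.Queued, "job-1"),
                new A(Status.Running, "job-2"),
                new A(Status.Completed, "job-3"));
        // In the real case the supplier would reopen the file instead of reading a list.
        Supplier<Stream<A>> source = rows::stream;
        for (Status s : Status.values()) {
            System.out.println(s + ": " + namesWithStatus(source, s).collect(Collectors.toList()));
        }
    }
}
```

The trade-off is that the source is read three times, but no Stream.concat is involved and nothing is buffered until a terminal operation asks for it.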
