
I have something like:

 List<Data> dataList = steps.stream()
         .flatMap(step -> step.getPartialDataList().stream())
         .collect(Collectors.toList());

So I'm combining the partial lists from every step into dataList.

My problem is that building dataList might cause an OutOfMemoryError. Any suggestions on how I can batch dataList and save the batches to the database?

My primitive idea is:

 List<Data> dataList = new ArrayList<>();
 for (Step step : steps) {
     List<Data> partialDataList = step.getPartialDataList();
     if (dataList.size() + partialDataList.size() > MAXIMUM_SIZE) {
         saveIntoDb(dataList);
         dataList = new ArrayList<>();
     }
     dataList.addAll(partialDataList); // keep the current partial list instead of dropping it
 }
 if (!dataList.isEmpty()) {
     saveIntoDb(dataList); // flush the remainder
 }

PS: I know there is this post, but the difference is that I might not be able to store the whole data set in memory.

Later edit: the getPartialDataList method is more like createPartialDataList(); it creates the data on the fly rather than returning an in-memory list.

  • Are you performing any other operations on the data before saving the batches to the DB? Commented Nov 19, 2019 at 16:52
  • Might? Have you tried? Commented Nov 19, 2019 at 16:57
  • What would also be relevant: what data size are we talking about for the List<Data>? How is getPartialDataList implemented? Does it keep a cursor over reads from the data? Commented Nov 19, 2019 at 17:00
  • @vphilipnyc no, just saving it Commented Nov 19, 2019 at 21:38
  • @Eugene, might, yes. I'm trying to figure out all corner cases from a prod env. Commented Nov 19, 2019 at 21:38

1 Answer


If your concern is OutOfMemoryError, you probably shouldn't create additional intermediate data structures like lists or streams before saving to the database.

Since Step.getPartialDataList() already returns a List<Data>, the data is already in memory, unless you have your own List implementation. You just need to use a JDBC batch insert:

 PreparedStatement ps = c.prepareStatement("INSERT INTO data VALUES (?, ?, ...)");
 for (Step step : steps) {
     for (Data data : step.getPartialDataList()) {
         ps.setString(1, ...);
         ps.setString(2, ...);
         ...
         ps.addBatch(); // queue the row; nothing is sent to the database yet
     }
 }
 ps.executeBatch(); // send all queued rows at once

There is no need to chunk the data into smaller batches prematurely with dataList. First see what your database and JDBC driver support before optimizing.
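That said, if the driver's batch buffer itself ever becomes a memory concern, a common refinement is to flush the batch every N rows. A minimal sketch, assuming a two-column table; BATCH_SIZE and the getField1()/getField2() accessors are illustrative, not from the original code:

 import java.sql.Connection;
 import java.sql.PreparedStatement;
 import java.sql.SQLException;
 import java.util.List;

 static void insertInChunks(Connection connection, List<Step> steps) throws SQLException {
     final int BATCH_SIZE = 1_000; // tuning knob, database/driver dependent
     try (PreparedStatement ps = connection.prepareStatement("INSERT INTO data VALUES (?, ?)")) {
         int pending = 0;
         for (Step step : steps) {
             for (Data data : step.getPartialDataList()) {
                 ps.setString(1, data.getField1());
                 ps.setString(2, data.getField2());
                 ps.addBatch();
                 if (++pending == BATCH_SIZE) {
                     ps.executeBatch(); // send the accumulated rows, freeing the driver's buffer
                     pending = 0;
                 }
             }
         }
         if (pending > 0) {
             ps.executeBatch(); // flush the remainder
         }
     }
 }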

Do note that for most databases the right way to insert a large amount of data is an external utility and not JDBC, e.g. PostgreSQL has COPY.
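For PostgreSQL specifically, COPY is also exposed through the JDBC driver's CopyManager API, so you can stream rows to the server without an intermediate file. A minimal sketch, assuming the org.postgresql driver is on the classpath; the table layout and the getCol1()/getCol2() accessors are hypothetical:

 import java.nio.charset.StandardCharsets;
 import org.postgresql.copy.CopyIn;
 import org.postgresql.copy.CopyManager;
 import org.postgresql.core.BaseConnection;

 // Rows are written straight to the server, so no full list is ever built in memory.
 CopyManager copyManager = new CopyManager(connection.unwrap(BaseConnection.class));
 CopyIn copyIn = copyManager.copyIn("COPY data (col1, col2) FROM STDIN (FORMAT csv)");
 try {
     for (Step step : steps) {
         for (Data d : step.getPartialDataList()) {
             byte[] row = (d.getCol1() + "," + d.getCol2() + "\n").getBytes(StandardCharsets.UTF_8);
             copyIn.writeToCopy(row, 0, row.length);
         }
     }
     copyIn.endCopy();
 } catch (Exception e) {
     copyIn.cancelCopy(); // abort the COPY on failure
     throw e;
 }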


3 Comments

Best case would have been to do a bulk insert with all the data. But in a prod env there might be a small chance to get an OutOfMemoryError. In step.getPartialDataList() I will create data on the fly; maybe the naming here is not the best. But surely the entire data is not in memory.
@UnguruBulan I'd argue this answer is still the right one for your use case. You can't process less data at a time than the result of one getPartialDataList().
I want to process more data than the result of getPartialDataList().
