I recently needed to write a batch job that reads a table with millions of rows. The table has about 12 columns and the job only performs reads. Because I needed all fields, I decided to use persistence objects.

I used only the most basic code to achieve that, with no tweaks. JPA was quite annoying because it forced me to do custom paging with firstResult and maxResults. The approximate code is linked below, if you are interested. There is nothing else to it besides the default XML files etc.

The JPA code: http://codeviewer.org/view/code:297e
The Hibernate code: http://codeviewer.org/view/code:297f
The JDBC code: same as above, but with "d" on the end (sorry I can only post 2 links)

The throughput I measured was roughly as follows (read operations only):

JPA: Per 5 seconds: 1.000 || Per Minute: 12.000 || Per Hour: 720.000
Hibernate: Per 5 seconds: 20.000 || Per Minute: 240.000 || Per Hour: 14.400.000
JDBC: Per 5 seconds: 50.000-80.000 || Per Minute: 600.000-960.000 || Per Hour: 36.000.000-57.600.000

I can't explain it, but the JPA result is ridiculous. It can only be a big bad joke. The funny thing is that it started at the same speed as the Hibernate code, but after about 30.000 records it became slower and slower until it stabilized at 1.000 read operations per 5 seconds. It reached that point after approximately 100.000 records. But honestly... that speed is useless.

Why is that so? Please explain it to me. I really don't know what I'm doing wrong, but I also think it shouldn't be that slow, even with default settings. By comparison, the Hibernate and JDBC speeds are acceptable and stay stable the whole time.

  • Provider is org.hibernate.ejb.HibernatePersistence Commented Sep 5, 2012 at 8:08
  • Have you tried a much bigger page size? 20 is very small, so the JPA code executes a whole lot of queries. Also avoid flushing the EM: since it's a read-only operation there's nothing to flush, but flushing makes Hibernate check the dirtiness of all the cached entities. Commented Sep 5, 2012 at 8:34

1 Answer

With Hibernate you get good performance by using a single query with scrollable results. Unfortunately, this is not currently possible in JPA, so you must execute one query per result page.

So you are doing it right, but your page size is set to only 20 results. That is very small, so your code issues a very large number of queries. Try a larger size, for example 10000 results, and performance will probably improve. Even so, I don't think you will get numbers close to the Hibernate ones.
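To make the effect of the page size concrete, here is a minimal sketch that only counts how many page queries a paging loop issues. It does not use JPA at all: `PagedReadSketch`, `readAll` and `fakeTable` are hypothetical names, and the in-memory `fakeTable` stands in for a real paged query, which in JPA would be something like `query.setFirstResult(offset).setMaxResults(pageSize).getResultList()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

public class PagedReadSketch {

    /** Result of one batch run: rows processed and page queries issued. */
    static final class Stats {
        final long rows;
        final long queries;

        Stats(long rows, long queries) {
            this.rows = rows;
            this.queries = queries;
        }
    }

    /**
     * Reads everything page by page. fetchPage takes (offset, pageSize) and
     * is a stand-in (assumption) for the real paged JPA query.
     */
    static <T> Stats readAll(BiFunction<Integer, Integer, List<T>> fetchPage, int pageSize) {
        long rows = 0;
        long queries = 0;
        int offset = 0;
        while (true) {
            List<T> page = fetchPage.apply(offset, pageSize);
            queries++;
            rows += page.size();
            // A short or empty page means there are no more rows to read.
            if (page.size() < pageSize) {
                break;
            }
            offset += pageSize;
        }
        return new Stats(rows, queries);
    }

    /** Hypothetical in-memory "table" of totalRows integers, sliced like a paged query. */
    static BiFunction<Integer, Integer, List<Integer>> fakeTable(int totalRows) {
        return (offset, limit) -> {
            List<Integer> page = new ArrayList<>();
            int end = Math.min(offset + limit, totalRows);
            for (int i = offset; i < end; i++) {
                page.add(i);
            }
            return page;
        };
    }

    public static void main(String[] args) {
        int totalRows = 1_000_000;
        for (int pageSize : new int[] {20, 10_000}) {
            Stats s = readAll(fakeTable(totalRows), pageSize);
            System.out.println("pageSize=" + pageSize
                    + " rows=" + s.rows + " queries=" + s.queries);
        }
    }
}
```

For one million rows this loop issues 50,001 queries with a page size of 20 but only 101 with a page size of 10000 (the last query in each case returns an empty page). Each real query adds round-trip and parsing overhead, which is why the small page size hurts so much.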

1 Comment

This was it! I just tried the 10.000 as pageSize and it works practically as fast as the Hibernate batch. I was just used to the 20 or 50 pagesize, because most tutorials used such low values. My bad. Thanks, you saved my day :)
