I use Spring Data, Spring Boot, and Hibernate as the JPA provider, and I want to improve bulk insert performance.

I refer to this link to use batch processing:

http://docs.jboss.org/hibernate/orm/4.1/manual/en-US/html/ch15.html

This is my code and my application.properties for the insert batching experiment.

My service:

    @Value("${spring.jpa.properties.hibernate.jdbc.batch_size}")
    private int batchSize;

    @PersistenceContext
    private EntityManager em;

    @Override
    @Transactional(propagation = Propagation.REQUIRED)
    public SampleInfoJson getSampleInfoByCode(String code) {
        // SampleInfo newSampleInfo = new SampleInfo();
        // newSampleInfo.setId(5L);
        // newSampleInfo.setCode("SMP-5");
        // newSampleInfo.setSerialNumber(10L);
        // sampleInfoDao.save(newSampleInfo);

        log.info("starting... inserting...");

        for (int i = 1; i <= 5000; i++) {
            SampleInfo newSampleInfo = new SampleInfo();
            // Long id = (long)i + 4;
            // newSampleInfo.setId(id);
            newSampleInfo.setCode("SMPN-" + i);
            newSampleInfo.setSerialNumber(10L + i);
            // sampleInfoDao.save(newSampleInfo);
            em.persist(newSampleInfo);

            if (i % batchSize == 0) {
                log.info("flushing...");
                em.flush();
                em.clear();
            }
        }

The part of application.properties related to batching:

    spring.jpa.properties.hibernate.jdbc.batch_size=100
    spring.jpa.properties.hibernate.cache.use_second_level_cache=false
    spring.jpa.properties.hibernate.order_inserts=true
    spring.jpa.properties.hibernate.order_updates=true

Entity class:

    @Entity
    @Table(name = "sample_info")
    public class SampleInfo implements Serializable {

        private Long id;
        private String code;
        private Long serialNumber;

        @Id
        @GeneratedValue(
            strategy = GenerationType.SEQUENCE,
            generator = "sample_info_seq_gen"
        )
        @SequenceGenerator(
            name = "sample_info_seq_gen",
            sequenceName = "sample_info_seq",
            allocationSize = 1
        )
        @Column(name = "id")
        public Long getId() {
            return id;
        }

        public void setId(Long id) {
            this.id = id;
        }

        @Column(name = "code", nullable = false)
        public String getCode() {
            return code;
        }

        public void setCode(String code) {
            this.code = code;
        }

        @Column(name = "serial_number")
        public Long getSerialNumber() {
            return serialNumber;
        }

        public void setSerialNumber(Long serialNumber) {
            this.serialNumber = serialNumber;
        }
    }

Running the service above to batch insert 5000 rows took 30 to 35 seconds to complete, but if I comment out these lines:

    if (i % batchSize == 0) {
        log.info("flushing...");
        em.flush();
        em.clear();
    }

inserting 5000 rows took only 5 to 7 seconds, faster than batch mode.

Why is it slower when using batch mode?

1 Answer

That's because the EntityManager doesn't write data to the database immediately; the data is written when flush() is called. When you comment out those lines, the EntityManager flushes according to the flush-mode setting. By calling flush() directly, you tell the EntityManager to execute the queries against the database right away.
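To make the deferred-write idea concrete, here is a minimal toy model in plain Java. It is only a sketch: `ToyPersistenceContext` and its methods are hypothetical names invented for illustration, and the class does not reproduce Hibernate's actual implementation or timing. It only shows that `persist` queues work and that `flush` is what pushes the queued rows out.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a persistence context. Hypothetical; NOT Hibernate's implementation. */
final class ToyPersistenceContext {
    private final List<String> pending = new ArrayList<>();  // queued, not yet written
    private final List<String> database = new ArrayList<>(); // rows that were "written"
    private int flushCount = 0;

    /** Only queues the row; nothing reaches the "database" yet. */
    void persist(String row) {
        pending.add(row);
    }

    /** Writes every queued row and empties the queue. */
    void flush() {
        database.addAll(pending);
        pending.clear();
        flushCount++;
    }

    int pendingSize() { return pending.size(); }
    int databaseSize() { return database.size(); }
    int flushCount() { return flushCount; }

    /** Mirrors the loop in the question: optionally flush every batchSize rows. */
    static ToyPersistenceContext run(int total, int batchSize, boolean explicitFlush) {
        ToyPersistenceContext ctx = new ToyPersistenceContext();
        for (int i = 1; i <= total; i++) {
            ctx.persist("SMPN-" + i);
            if (explicitFlush && i % batchSize == 0) {
                ctx.flush();
            }
        }
        return ctx; // the caller decides when the final, commit-like flush happens
    }

    public static void main(String[] args) {
        ToyPersistenceContext batched = run(5000, 100, true);
        System.out.println("explicit flushes: " + batched.flushCount());

        ToyPersistenceContext deferred = run(5000, 100, false);
        System.out.println("rows still pending: " + deferred.pendingSize());
        deferred.flush(); // stands in for the flush that happens at commit
        System.out.println("rows written after final flush: " + deferred.databaseSize());
    }
}
```

In this model the version with the explicit check flushes 50 times inside the loop, while the version without it leaves all 5000 rows queued until a single final flush.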

4 Comments

So the result I got is normal then? Using batching will always be slower than not using batching, right?
The result is normal. Batching is needed when your entities can overflow Hibernate's session-level cache, but that takes a lot of entities (ten times more than in your example). If you don't know the size of the entity array, you can add batching but increase batchSize to 10,000 or so.
"EntityManager flushes data when transactions are committed": I think this statement is wrong. The default Hibernate flush mode is AUTO, and you cannot guarantee when the flush happens. If the flush mode is explicitly set to COMMIT, then yes, you are right.
jit, you are right; I tried to say the same in the last sentence. I will correct the answer.
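The trade-off behind the batchSize suggestion in the comments can be sketched with a little arithmetic. The snippet below is an illustration only: `BatchSizeEffect` and `simulate` are hypothetical names, and the code counts loop events rather than measuring anything in Hibernate. It reports how many times the `i % batchSize == 0` branch fires and the largest number of entities held in the persistence context between `em.clear()` calls.

```java
/** Hypothetical helper that replays the question's loop with counters only. */
final class BatchSizeEffect {

    /** Returns {explicitFlushes, peakManagedEntities} for the given loop settings. */
    static int[] simulate(int total, int batchSize) {
        int flushes = 0;
        int managed = 0;
        int peak = 0;
        for (int i = 1; i <= total; i++) {
            managed++;                      // em.persist(...) adds one managed entity
            peak = Math.max(peak, managed);
            if (i % batchSize == 0) {       // same condition as in the question
                flushes++;                  // em.flush()
                managed = 0;                // em.clear() detaches everything
            }
        }
        return new int[] {flushes, peak};
    }

    public static void main(String[] args) {
        int[] small = simulate(5000, 100);
        System.out.println("batchSize 100:   flushes=" + small[0] + ", peak=" + small[1]);
        int[] large = simulate(5000, 10_000);
        System.out.println("batchSize 10000: flushes=" + large[0] + ", peak=" + large[1]);
    }
}
```

With 5000 rows, a batchSize of 100 gives 50 explicit flushes and at most 100 managed entities at a time, while a batchSize of 10,000 never triggers the branch, so all 5000 entities stay managed until the transaction ends.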
