
I'm using a chunk step with a reader and writer. I am reading data from Hive with a chunk size of 50,000 and inserting into MySQL with the same 50,000 commit interval.

@Bean
public JdbcBatchItemWriter<Identity> writer(DataSource mysqlDataSource) {
    return new JdbcBatchItemWriterBuilder<Identity>()
            .itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
            .sql(insertSql)
            .dataSource(mysqlDataSource)
            .build();
}
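
For reference, the step is wired roughly like this; the hiveReader bean and the step name here are placeholders, not the actual configuration:

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.context.annotation.Bean;

@Bean
public Step loadStep(StepBuilderFactory stepBuilderFactory,
                     ItemReader<Identity> hiveReader,
                     JdbcBatchItemWriter<Identity> writer) {
    return stepBuilderFactory.get("loadStep")
            // each chunk of 50,000 items is read, written, and committed together
            .<Identity, Identity>chunk(50000)
            .reader(hiveReader)
            .writer(writer)
            .build();
}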

When I start the data load, the inserts into MySQL commit very slowly: 100,000 records take more than an hour to load, while the same loader writing to Gemfire loads 5 million records in 30 minutes.

It seems like it inserts records one by one instead of in a batch, as it loads 1,500 records, then 4,000, and so on. Has anyone faced the same issue?

  • Please share the SQL statement insertSql that you are using. Commented Feb 10, 2020 at 4:48
  • I have only 8 columns and the SQL is simple, yet it inserts records one by one: INSERT INTO table_name (column1, column2, column3, ...) VALUES (value1, value2, value3, ...); Commented Feb 10, 2020 at 5:24
  • seems like it inserts one by one instead of batch: The JdbcBatchItemWriter does not insert items one by one, it inserts them with a JDBC batch update in a single transaction: github.com/spring-projects/spring-batch/blob/…. Similar to the answer by @Binu, try to write a custom SqlParameterSourceProvider that does not use reflection (a sketch follows below) and see if it improves performance. Commented Feb 10, 2020 at 7:38
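
A minimal sketch of such a reflection-free provider, assuming insertSql uses named parameters matching the keys below and hypothetical Identity getters:

import org.springframework.batch.item.database.ItemSqlParameterSourceProvider;
import org.springframework.jdbc.core.namedparam.MapSqlParameterSource;
import org.springframework.jdbc.core.namedparam.SqlParameterSource;

public class IdentitySqlParameterSourceProvider implements ItemSqlParameterSourceProvider<Identity> {

    @Override
    public SqlParameterSource createSqlParameterSource(Identity item) {
        // Bind each parameter explicitly instead of discovering it via reflection;
        // the keys must match the named parameters in insertSql (e.g. :column1)
        return new MapSqlParameterSource()
                .addValue("column1", item.getMxx())
                .addValue("column2", item.getMyx())
                .addValue("column3", item.getMxt());
    }
}

It plugs into the writer builder via .itemSqlParameterSourceProvider(new IdentitySqlParameterSourceProvider()) in place of the BeanPropertyItemSqlParameterSourceProvider.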

2 Answers


Since you are using BeanPropertyItemSqlParameterSourceProvider, a lot of reflection is involved in setting the variables on the prepared statement, and this increases the time taken per chunk.

If speed is your top priority, try implementing your own ItemWriter as given below and use a prepared statement batch to execute the updates.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.List;
import javax.sql.DataSource;
import org.springframework.batch.item.ItemWriter;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class CustomWriter implements ItemWriter<Identity> {

    // Your SQL statement here
    private static final String SQL = "INSERT INTO table_name (column1, column2, column3, column4) VALUES (?, ?, ?, ?)";

    @Autowired
    private DataSource dataSource;

    @Override
    public void write(List<? extends Identity> list) throws Exception {
        // try-with-resources closes the connection and statement even on failure
        try (Connection connection = dataSource.getConnection();
             PreparedStatement preparedStatement = connection.prepareStatement(SQL)) {
            for (Identity identity : list) {
                // Set the variables explicitly, without reflection
                preparedStatement.setInt(1, identity.getMxx());
                preparedStatement.setString(2, identity.getMyx());
                preparedStatement.setString(3, identity.getMxt());
                preparedStatement.setInt(4, identity.getMxz());
                // Add the row to the batch
                preparedStatement.addBatch();
            }
            // Execute the whole chunk as a single JDBC batch
            int[] count = preparedStatement.executeBatch();
        }
    }
}

Note: This is rough code, so exception handling is not fully fleshed out and the placeholder column names and getters need to be adapted to your model. You can work on the same. I think this will improve your writing speed very much.


1 Comment

Many thanks for this. The only modification I found necessary was that the implemented method takes a Chunk, not a List, e.g. public void write(Chunk<? extends Foo> chunk) throws Exception { ... }
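
For reference, a sketch of the answer's write method adapted to that Spring Batch 5 signature, using the same hypothetical getters; Chunk is iterable, so the loop body is unchanged:

import org.springframework.batch.item.Chunk;

@Override
public void write(Chunk<? extends Identity> chunk) throws Exception {
    try (Connection connection = dataSource.getConnection();
         PreparedStatement preparedStatement = connection.prepareStatement(SQL)) {
        // Chunk implements Iterable, so the same per-item binding applies
        for (Identity identity : chunk) {
            preparedStatement.setInt(1, identity.getMxx());
            preparedStatement.setString(2, identity.getMyx());
            preparedStatement.setString(3, identity.getMxt());
            preparedStatement.setInt(4, identity.getMxz());
            preparedStatement.addBatch();
        }
        preparedStatement.executeBatch();
    }
}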

Try adding ";useBulkCopyForBatchInsert=true" to your connection URL.

Connection con = DriverManager.getConnection(connectionUrl + ";useBulkCopyForBatchInsert=true");

Source: https://learn.microsoft.com/en-us/sql/connect/jdbc/use-bulk-copy-api-batch-insert-operation?view=sql-server-ver15
