
I want to insert 700 million rows into a table that is defined in the following way.

CREATE TABLE KeywordIndex (id INT PRIMARY KEY AUTO_INCREMENT, keyValue VARCHAR(45) NOT NULL, postings LONGTEXT NOT NULL); 

To insert data into the table, I first check whether the keyValue already exists; if it does, I update postings by concatenating the new value onto the old one. Otherwise, I insert the data as a new row. Also, if the size of postings exceeds the column limit, I write the overflow as an extension of that keyValue's postings in a new row. In my implementation, inserting 70,294 entries took 12 hours!

(I am not a database expert, so the code I've written could be based on wrong foundations. Please help me understand my mistakes. :) )

I read this page, but I could not find a solution to my problem.

Here is the code I wrote to do this process:

public void writeTermIndex(HashMap<String, ArrayList<TermPosting>> finalInvertedLists) {
    try {
        for (String key : finalInvertedLists.keySet()) {
            int exist = ExistTerm("KeywordIndex", key);
            ArrayList<TermPosting> currentTermPostings = finalInvertedLists.get(key);
            if (exist > 0) {
                // Key already indexed: fetch the old postings and append the new ones.
                String postings = null;
                String query = "select postings from KeywordIndex where keyValue=?";
                PreparedStatement preparedStmt = conn.prepareStatement(query);
                preparedStmt.setString(1, key);
                ResultSet rs = preparedStmt.executeQuery();
                if (rs.next())
                    postings = rs.getString("postings");
                rs.close();            // close on every path, not only the update branch
                preparedStmt.close();
                postings = postings + convertTermPostingsToString(currentTermPostings);
                if (getByteSize(postings) > 65530)
                    insertUpdatePostingList("KeywordIndex", key, postings);
                else
                    updatePosting("KeywordIndex", key, postings);
            } else {
                // New key: insert it as a fresh row.
                String postings = convertTermPostingsToString(currentTermPostings);
                if (getByteSize(postings) > 65530)
                    insertPostingList("KeywordIndex", key, postings);
                else
                    insetToHashmap("KeywordIndex", key, postings);
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
  • Exactly how is this code not working? Commented Nov 12, 2015 at 15:29
  • Consider a bulk load, then call a procedure to combine records between the tables, then drop the bulk-loaded table. The individual calls back and forth between your application and the database are going to be SLOW: there is overhead per packet, per database connection, and per call. IMO you are better off bulk loading and letting the database "merge" the sets in a procedure, eliminating the traffic and overhead. The procedure is just two statements: insert all non-existing records, update existing records. It is so much faster all in the DB. Commented Nov 12, 2015 at 15:37
  • @MarcB It works, but slowly. Commented Nov 12, 2015 at 15:47
  • @xQbert Could you please explain more? What do you mean by combining records between tables? I only have one table, and I connect to my database only once. Commented Nov 12, 2015 at 15:52
  • The bulk load takes the data from the source file containing the 700 million rows and loads it into a second table. Once the database has both tables, the queries to combine the two into one become simple. Add an index on the keys of the loaded table. Then insert into the original table from the loaded table where the key from the loaded table is not in the original table, and update the original table from the loaded table, setting the new value = old value + new value. Then drop the loaded table (see the sketch after this comment thread). The logic to combine is done in the database; the overhead of processing each row between app server and DB server is just too slow. Commented Nov 12, 2015 at 15:55
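A rough JDBC sketch of that staging-table approach, assuming MySQL. The staging table name, the file path, and the tab-separated keyValue/postings dump are illustrative assumptions; the joins also presume an index on KeywordIndex.keyValue (not in the original schema) and at most one staged row per keyValue.

try (Statement st = conn.createStatement()) {
    // 1. Create an indexed staging table and bulk-load the raw pairs into it
    //    (hypothetical file path; requires LOCAL INFILE to be enabled).
    st.execute("CREATE TABLE KeywordStage (keyValue VARCHAR(45) NOT NULL,"
             + " postings LONGTEXT NOT NULL, INDEX (keyValue))");
    st.execute("LOAD DATA LOCAL INFILE '/tmp/postings.tsv' INTO TABLE KeywordStage");

    // 2. Update keys that already exist by concatenating the staged postings.
    //    Updating before inserting avoids concatenating a freshly inserted row onto itself.
    st.executeUpdate("UPDATE KeywordIndex k JOIN KeywordStage s ON k.keyValue = s.keyValue"
                   + " SET k.postings = CONCAT(k.postings, s.postings)");

    // 3. Insert the keys that are not in the original table yet.
    st.executeUpdate("INSERT INTO KeywordIndex (keyValue, postings)"
                   + " SELECT s.keyValue, s.postings FROM KeywordStage s"
                   + " LEFT JOIN KeywordIndex k ON k.keyValue = s.keyValue"
                   + " WHERE k.id IS NULL");

    // 4. Drop the staging table.
    st.execute("DROP TABLE KeywordStage");
}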

1 Answer


You should think about using executeBatch() for the inserts (I'm not talking about the load part of your request). Depending on your database, performance can change a lot; see the benchmark at the end of this page. (I once tested it with an Oracle database.)

Something like:

PreparedStatement statement = null;
try {
    statement = getConnection().prepareStatement(insertQuery);
    for (/* each row to insert */) {
        statement.clearParameters();
        statement.setString(1, "Hi");
        statement.addBatch();       // queue the row instead of sending it immediately
    }
    statement.executeBatch();       // send all queued rows in one round trip
} catch (SQLException se) {
    // Handle exception
} finally {
    // Close everything
}
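Note that with MySQL's Connector/J driver, batched inserts are usually only rewritten into a single multi-row INSERT on the wire when rewriteBatchedStatements=true is set on the JDBC URL; otherwise the driver may still send one statement per row. Disabling auto-commit and committing once per batch also tends to help.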

1 Comment

Could you explain more? You said to use executeBatch in the insertion part, but I only insert after checking whether the keyValue exists or not. Could you please explain in more detail?
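One way to fold that existence check into the batch itself, assuming MySQL, is INSERT ... ON DUPLICATE KEY UPDATE: each key then costs one batched statement instead of a SELECT plus an INSERT or UPDATE. The UNIQUE index, the chunk size, and the CONCAT update rule below are assumptions, not part of the original schema. A minimal sketch reusing the question's conn and helper methods:

// Assumes: ALTER TABLE KeywordIndex ADD UNIQUE INDEX (keyValue);
String upsert = "INSERT INTO KeywordIndex (keyValue, postings) VALUES (?, ?)"
              + " ON DUPLICATE KEY UPDATE postings = CONCAT(postings, VALUES(postings))";
conn.setAutoCommit(false);              // commit once per batch, not once per row
try (PreparedStatement ps = conn.prepareStatement(upsert)) {
    int count = 0;
    for (Map.Entry<String, ArrayList<TermPosting>> e : finalInvertedLists.entrySet()) {
        ps.setString(1, e.getKey());
        ps.setString(2, convertTermPostingsToString(e.getValue()));
        ps.addBatch();
        if (++count % 1000 == 0)
            ps.executeBatch();          // flush in chunks to bound driver memory
    }
    ps.executeBatch();                  // flush the remainder
    conn.commit();
}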
