menu

Friday, October 10, 2014

Using Batch Statements In Java

Introduction:

     Using batch statements is closely related with prepared statement usage, and it is also has huge performance benefit. If you need to insert,update or delete more than one tables or one tables with more than one times in the same connection you should use batching. By using batching , you will go to database only one time.If you use more than one tables in a batch statements it is called heterogen batch, otherwise it is called homogen batch. In heterogen batch you have to use addBatch(String sql) overloaded method of  statement object and you cannot use prepared statement. Below is an example of homogen batching.

String CONN_STRING = "jdbc:mysql://localhost:3306/DB_NAME?user=USER&password=PWD";
String UPDATE_SQL = "UPDATE User SET age=? WHERE name=?";
     Connection conn = null;
     PreparedStatement pstmt = null;
     try {
              Integer[] ages = new Integer[] { 5, 10, 15};
              String[] names = new String[] { “xxx”, “yyy”, “zzz”};
              Class.forName("com.mysql.jdbc.Driver");
              conn = DriverManager.getConnection(CONN_STRING);
              conn.setAutoCommit(false);
              pstmt = conn.prepareStatement(UPDATE_SQL);
              for (int i = 0; i < ages.length; i++) {
                    pstmt.setInt(1, ages[i]);
                    pstmt.setString(2, names [i]);
                    pstmt.addBatch();
                    }
              pstmt.executeBatch();
              conn.commit();
             } catch (SQLException e) {
                      try {
                              if (conn != null && !conn.isClosed())
                                    conn.rollback();
                                    // TODO logging...
                             } catch (SQLException e1) {
                                      // TODO logging...
                             }
            } catch (Exception e) {
                        try {
                                if (conn != null && !conn.isClosed())
                                    conn.rollback();
                                    // TODO logging...
                                } catch (SQLException e1) {
                                      // TODO logging...
                                }
             } finally {
                          try {
                                  if (pstmt != null)
                                         pstmt.close();
                                  if (conn != null && !conn.isClosed())
                                         conn.close();
                            } catch (SQLException e) {
                                      // TODO logging...
                            }
             }                

Some important points of using batch statements are;
  • You can get the number of rows that were effected by each SQL statement from the array that   the executeBatch invocation returns. This number does not include rows that were affected by triggers or by referential integrity enforcement.
  • Even if the autocommit mode is on, all the data will be committed to database after executing   executeBatch.
  • When you use batching, you go to the database only for one time. However the database engine doesn’t  necessarily execute all queries in one process.For example in DB2 you have to enable multi-row insert property to execute all batched queries in one process.In this case the executeBatch will not return the effected rows, instead return Statement.SUCCESS_NO_INFO (-2).This will give better  performance since the database engine doesn’t need to return the effected rows, and the whole batch  will be executed in one process.
  • If you have heavy insert or update statements, you can do the insert or update to memory, and later can write to database by an asynchron background thread.

Using single sql instead of batch:

     You can use single sql for insert operation instead of batch statement. In this way you do multi insert with one sql statement, and again using prepared statement, you can pass your parameters.Single sql statement executed about 10 times faster than batch statements in my machine, since the database engine execute it in a single process like multi-row insert enabled DB2.(mentioned above in Batch statements)
For update or delete operations you can use IN keyword to use a single sql for multi operation.
      
Oracle example:
INSERT ALL
  INTO test (id, name) VALUES (1, 'xxx')
  INTO test (id, name) VALUES (2, 'yyy')
  INTO test (id, name) VALUES (3, 'zzz')
SELECT * FROM dual;
Db2,Mysql, Mssql, Postgre Sql example:
INSERT INTO test (id, name) VALUES (1, 'xxx'), (2, 'yyy'), (3, 'zzz');

Note: In mysql you can use rewriteBatchedStatements=true property in connection string. It creates a single sql statement for all batch data, and send to database to execute in one process. That means its faster than standart batching.However, its slower than single sql statement approach because of the cost of creating single sql statement from all batch data. You can also tune max_allowed_packet property pf mysql to limit number of rows that can be executed in a batch statement. Depending on the processor and database engine, number of rows that can be executed in a batch may vary and you may have to limit max number of rows in a batch.

Conclusion

     Batch statements are crucial in performance related applications, since it reduces the database calls, and improve response times. You can also choose to run single sql statements instead of batching to get faster results. Be careful about that, batch statements are not necessarily executed in a single process in database side, that is you may need to enable database specific properties to enable multi-row operations in one process.

No comments:

Post a Comment