Tuesday, March 8, 2011

Quick tip on hibernate batch operation

You can find the details of what batch operation with respect to DB is and related Hibernate guide at http://docs.jboss.org/hibernate/core/3.3/reference/en/html/batch.html . Any time you need to do bulk inserts/updates, it's better to do in batch to get a better performance. Recently I was developing a small app and followed Hibernate guide to make sure that my configurations were correct. Even after trying several things, I kept seeing following in log generated by Hibernate:

Hibernate: insert into Member (name, id) values (?, ?)
Hibernate: insert into Member (name, id) values (?, ?)
Hibernate: insert into Member (name, id) values (?, ?)
Hibernate: insert into Member (name, id) values (?, ?)
Hibernate: insert into Member (name, id) values (?, ?) 


But batch operation is supposed to generate statements like


insert into Member (name, id) values ('name0', 1),('name1', 2),('name2', 3),('name3', 4),('name4', 5),('name5', 6),('name6', 7),('name7', 8),('name8', 9),('name9', 10),('name10', 11),('name11', 12),('name12', 13),('name13', 14),('name14',
15),('name15', 16),('name16', 17),('name17', 18),('name18', 19),('name19', 20)


It turns out that even when batch operation is in progress, hibernate log shows individual operation. If you don't know this, you might be up for a ride figuring out what the heck is going on. The best way to make sure is to turn on the query logs in DB itself. For MySQL5 installed on Fedora, you do it by adding log parameter at /etc/my.conf

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
log=/var/log/mysqld.log


You should see statements like following at /var/log/mysqld.log :

2 Query insert into Member (name, id) values ('name0', 1),('name1', 2),('name2', 3),('name3', 4),('name4', 5),('name5', 6),('name6', 7),('name7', 8),('name8', 9),('name9', 10),('name10', 11),('name11', 12),('name12', 13),('name13', 14),('name14',
15),('name15', 16),('name16', 17),('name17', 18),('name18', 19),('name19', 20)
2 Query    commit


The complete test project is at http://anonsvn.jboss.org/repos/qa/prabhat/batch-insert . The relevant hibernate configuration is at http://anonsvn.jboss.org/repos/qa/prabhat/batch-insert/src/main/resources/hibernate.cfg.xml . If you are letting hibernate generate the primary key then don't use GenerationType.Identity in your  model otherwise batch insert will not take place.

12 comments:

Omar Vázquez Mayorga. said...

Good explanation. Do you know if this is possible with JPA.

cheers.

Prabhat Jha said...

Yes, it should be. The best way to find out is to give it a shot. Attribute names might be different though.

Unknown said...

Thanks for the code example. Let me highlight the incredibly relevant ?rewriteBatchedStatements=true on the connection string (jdbc:mysql://localhost:3306/batch?rewriteBatchedStatements=true).

Aleksey Sundukovskiy said...

I know I'm late to the party but I have a very simple question. hibernate.jdbc.batch_size if provided as configuration parameter to hibernate should control the number of objects cached in first-level cache by flushing the object to the database. Then why do we need an explicit flush ?

while (numRecordsProcessed < hibernateBatchSize && it.hasNext())
{
Member wsBean = it.next();
session.save(wsBean);
//log.info("Webservices record " + wsBean + " saved.");
numRecordsProcessed++;
}
session.flush();

Prabhat Jha said...

I believe by explicit flush, you are forcing objects to be cleaned from memory just in case. You could try without flush and see what happens...you got the source to play with. :-)

Anonymous said...

If the objects being persisted have child objects, do these optimizations have any affect?

For example, if I have a House object that contains a list of Room objects and my hibernate mappings are configured such that the Rooms get persisted when I persist the House, does the batching configuration have any impact or does it only have benefits when persisted simple entities without relationships to other tables?

Roger Pack said...

How do we know if it's working or not then? Does yours say anything like

DEBUG: org.hibernate.jdbc.AbstractBatcher - Executing batch size: 2

?

Prabhat Jha said...

@Roger I have mentioned in the blog how to verify if it's really working. For mysql, you can see the update/insert statmenets in mysql log.

roger said...

yeah I'm on oracle...

Anonymous said...

I have the problem that the statements are not executed in batch (verified by mysql-log). The log says the following:

[...]
166 [main] DEBUG org.hibernate.cfg.Configuration - hibernate.jdbc.batch_size=20
166 [main] DEBUG org.hibernate.cfg.Configuration - hibernate.order_inserts=true
166 [main] DEBUG org.hibernate.cfg.Configuration - hibernate.order_updates=true
[...]
166 [main] DEBUG org.hibernate.cfg.Configuration - hibernate.dialect=org.hibernate.dialect.MySQL5InnoDBDialect
[...]
[ 20 single inserts instead of one (verified by mysql log) ]
1014 [main] DEBUG org.hibernate.jdbc.AbstractBatcher - Executing batch size: 20
1021 [main] DEBUG org.hibernate.jdbc.AbstractBatcher - no batched statements to execute
[...]

any idea why?

Prabhat Jha said...

Please take a good look at your configuration, DB connection string as well as how batch size is actually used in code at http://anonsvn.jboss.org/repos/qa/prabhat/batch-insert/src/main/java/org/eejot/batchinsert/DBService.java

It's possible that things are less complicated or configuration is different with newer hibernate version.

Roger Pack said...

@anonymous if you have hibernate generate id's for inserts then it does them serially maybe?