[castor-dev] Performance aspect of refactoring of SQL engine

Discussion:

Ralf Joachim

2009-05-14 22:47:02 UTC

Hi all,

as you may already know we have a GSoC student working at refactoring
SQL engine of Castor. While planing next improvement steps I had also
been thinking about possibilities to improve performance of execution of
SQL statements. I got one idea on how to improve performance but this
would require quite some refactoring work. As I do not know how big the
improvement could be, I like to hear your opinoins if it is worth to
spend time in this refactoring.

Current codebase of Castor works as follows:

When a JDO configuration is loaded or used for the first time it creates
SQL strings with parameters for every entity of the configiguration and
every standard database operation executed for this entities. Standard
operations in this context are those that get executed by a call to
create(), load(), update() or remove() methods on a Database instance.

When one of the operations from above get called Castor every time
prepares a java.sql.PreparedStatement, binds the required parameters and
executes the prepared statement.

Looking at best practis for prepared statements it is said that a
application should prepare a statement only once per connection as you
could reuse the prepared statement to bind different parameters and
execute it multiple times. As I expect that some database drivers and
about all database servers do such optimisations themself, my question
is, how big would be the performance improvement we could expect when
reusing prepared statements.

As we need to prepare statements for every connection in any case, the
same prepared statement can only be reused within a single transaction.
According to this an application that uses an own transaction would not
gain any performance improvement. In the worst case we could get a very
small performance penulty by the code needed to handle creation and
closing of prepared statements in this case.

On the other hand are there quite some applications that execute
hundreds or thoutheds of operations within one transaction. If all those
operations lead to the execution of the same SQL statement with
different parameters we should get a performance improvement according
to the best practis mentioned above.

Do anyone have experience with this? Any opinions?

Regards
Ralf
--
Syscon Ingenieurbüro für Meß- und Datentechnik GmbH
Ralf Joachim
Raiffeisenstraße 11
72127 Kusterdingen
Germany

Tel. +49 7071 3690 52
Mobil: +49 173 9630135
Fax +49 7071 3690 98

Internet: www.syscon.eu
E-Mail: ***@syscon.eu

Sitz der Gesellschaft: D-72127 Kusterdingen
Registereintrag: Amtsgericht Stuttgart, HRB 382295
Geschäftsleitung: Jens Joachim, Ralf Joachim

---------------------------------------------------------------------
To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email

Udai Gupta

2009-05-15 12:03:42 UTC

Permalink

Hi,

In my personal experience with JDBC I know that for reusing Prepared
statements really gives a huge performance improvement as the JDBC.

Suppose we have "INSERT INTO <TABLE> VALUES (PARAMETER1,PARAMETER2)"
and we want to execute it for like thousand times in a transaction.

Now with simple Statement every time the Query will be
validated(optimized internally) (1000 times) by the SQL compiler of DB
and then data will be inserted to table.

On the other hand with PreparedStatement the Query will the
validated(optimized) only for the first time and for remaining 999
only parameters will be passed to Query(no validation and optimization
overhead for other 999)

Rest is just math.

There is one more thing with PreparedStatement, excute() and excuteBatch().
excute() send parameters per query over the network (JDBC to DB server)
while excuteBatch() sends the Batch(1000 times at ones) of parameters
over networks.
So using Batch gives high performance improvement if the network
latency is costly(which mostly is the case as webserver and database
server are separated these days).

I have personally ran the tests to check these things.

Regards,
Udai Gupta

Post by Ralf Joachim
Hi all,
as you may already know we have a GSoC student working at refactoring
SQL engine of Castor. While planing next improvement steps I had also
been thinking about possibilities to improve performance of execution of
SQL statements. I got one idea on how to improve performance but this
would require quite some refactoring work. As I do not know how big the
improvement could be, I like to hear your opinoins if it is worth to
spend time in this refactoring.
When a JDO configuration is loaded or used for the first time it creates
SQL strings with parameters for every entity of the configiguration and
every standard database operation executed for this entities. Standard
operations in this context are those that get executed by a call to
create(), load(), update() or remove() methods on a Database instance.
When one of the operations from above get called Castor every time
prepares a java.sql.PreparedStatement, binds the required parameters and
executes the prepared statement.
Looking at best practis for prepared statements it is said that a
application should prepare a statement only once per connection as you
could reuse the prepared statement to bind different parameters and
execute it multiple times. As I expect that some database drivers and
about all database servers do such optimisations themself, my question
is, how big would be the performance improvement we could expect when
reusing prepared statements.
As we need to prepare statements for every connection in any case, the
same prepared statement can only be reused within a single transaction.
According to this an application that uses an own transaction would not
gain any performance improvement. In the worst case we could get a very
small performance penulty by the code needed to handle creation and
closing of prepared statements in this case.
On the other hand are there quite some applications that execute
hundreds or thoutheds of operations within one transaction. If all those
operations lead to the execution of the same SQL statement with
different parameters we should get a performance improvement according
to the best practis mentioned above.
Do anyone have experience with this? Any opinions?
Regards
Ralf
--
Syscon Ingenieurbüro für Meß- und Datentechnik GmbH
Ralf Joachim
Raiffeisenstraße 11
72127 Kusterdingen
Germany
Tel. +49 7071 3690 52
Mobil: +49 173 9630135
Fax +49 7071 3690 98
Internet: www.syscon.eu
Sitz der Gesellschaft: D-72127 Kusterdingen
Registereintrag: Amtsgericht Stuttgart, HRB 382295
Geschäftsleitung: Jens Joachim, Ralf Joachim
---------------------------------------------------------------------
http://xircles.codehaus.org/manage_email

---------------------------------------------------------------------
To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email

Ralf Joachim

2009-05-16 13:26:28 UTC

Permalink

Thanks for sharing your opinion Udai.

I guess it will be best to create a small test to get an impression on
how big the improvement would be. Will create a task for Ahmad to test this.

Regards
Ralf

Post by Udai Gupta
Hi,
In my personal experience with JDBC I know that for reusing Prepared
statements really gives a huge performance improvement as the JDBC.
Suppose we have "INSERT INTO <TABLE> VALUES (PARAMETER1,PARAMETER2)"
and we want to execute it for like thousand times in a transaction.
Now with simple Statement every time the Query will be
validated(optimized internally) (1000 times) by the SQL compiler of DB
and then data will be inserted to table.
On the other hand with PreparedStatement the Query will the
validated(optimized) only for the first time and for remaining 999
only parameters will be passed to Query(no validation and optimization
overhead for other 999)
Rest is just math.
There is one more thing with PreparedStatement, excute() and excuteBatch().
excute() send parameters per query over the network (JDBC to DB server)
while excuteBatch() sends the Batch(1000 times at ones) of parameters
over networks.
So using Batch gives high performance improvement if the network
latency is costly(which mostly is the case as webserver and database
server are separated these days).
I have personally ran the tests to check these things.
Regards,
Udai Gupta

---------------------------------------------------------------------
http://xircles.codehaus.org/manage_email

--
Syscon Ingenieurbüro für Meß- und Datentechnik GmbH
Ralf Joachim
Raiffeisenstraße 11
72127 Kusterdingen
Germany

Tel. +49 7071 3690 52
Mobil: +49 173 9630135
Fax +49 7071 3690 98

Internet: www.syscon.eu
E-Mail: ***@syscon.eu

Sitz der Gesellschaft: D-72127 Kusterdingen
Registereintrag: Amtsgericht Stuttgart, HRB 382295
Geschäftsleitung: Jens Joachim, Ralf Joachim

---------------------------------------------------------------------
To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email

Udai Gupta

2009-05-16 14:27:46 UTC

Permalink

Hi Ralf,

Yes creating test is an option to measure (to get an idea) the
improvement because I also did some similar when I came to know about
the performance improvement using PrepareStatment and its Batch
feature. It was a Live project and I used simple Statements with long
transactions everywhere and I was bit busy with other things that time
(or you can say lazy :p ), so I wanted to make sure if the improvement
is enough rewarding to spend that much time. It was like a miracle for
me to see the performance improvement in the transactions (49 x 900 x
6 quires at a time) that time and more when I found about Batch. So I
decided to investigate more about PrepareStatement and these things I
found.

1. 1000 queries, 1(first) registered(validated, optimized)and
remaining 999 will follow the first one (no validation, no
optimization).

Now, this validation(syntax) and query optimization(internal engine
process) varies from engine to engine. In whole this is more of an
mathematical (probably Relational Calculus and Query Optimization)
analysis to give the answer how much improvement we will get because
people have variables like
var1 - Query (Sub variables - Complexity, Where, SubQuery, Parameters etc)
var2 - No. of queries per transactions
var3 - others

So the cost of var1 will be involved for every query while using
Statement, but var1 will cost only once if we use PrepareStatement. So
here what give us the improvement is var1 mostly and improvement
depends on the cost of the var1.

2. excuteBatch(), It's a network latency issue (depends on the DB and
Application server network) . In my case it was huge as the
Application server and DB server was at different physical location.

3. It is recommended by OWASP (owasp.org) that PreparedStatment is
secure from SQL injections because the query registers first with the
DB and the later parameter get into it by using bind, so one can't
extend it to modify the query.

eg: query = "SELECT * FROM <MYTABLE> WHERE id = "+p;

Now the Parameter p is coming from the request.getParameter which if
not check someone could a vulnerability.
if I send p = 123; DROP <MYTABLE>

But this PreparedStatement this doesn't occur even if you be careless
with parameters(but careful with query).

I might be repeating myself but I think that test is a good idea, and
I would like to finally add that you can see as much as improvement as
you want by increasing the cost of var1 and var2 in Point 1 and for
excuteBatch() by Network Latency.

Cheers,
Udai Gupta

Post by Ralf Joachim
Thanks for sharing your opinion Udai.
I guess it will be best to create a small test to get an impression on
how big the improvement would be. Will create a task for Ahmad to test this.
Regards
Ralf

---------------------------------------------------------------------
http://xircles.codehaus.org/manage_email

--
Syscon Ingenieurbüro für Meß- und Datentechnik GmbH
Ralf Joachim
Raiffeisenstraße 11
72127 Kusterdingen
Germany
Tel. +49 7071 3690 52
Mobil: +49 173 9630135
Fax +49 7071 3690 98
Internet: www.syscon.eu
Sitz der Gesellschaft: D-72127 Kusterdingen
Registereintrag: Amtsgericht Stuttgart, HRB 382295
Geschäftsleitung: Jens Joachim, Ralf Joachim
---------------------------------------------------------------------
http://xircles.codehaus.org/manage_email

---------------------------------------------------------------------
To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email

Werner Guttmann

2009-05-16 20:08:14 UTC

Permalink

Hi Ralf,

most good JDBC drivers these days support prepared statement pooling.
With that feature enabled, you will get most benefits for free. In other
words, there's non need (in my view) to deal with this at the code level.

Regards
Werner

---------------------------------------------------------------------
To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email

Jon Wilmoth

2009-05-16 20:54:55 UTC

Permalink

I agree. This is easily configured for a Tomcat connection pool (sample config below). I think there's bigger performance related fish to fry with JDO. For example http://jira.codehaus.org/browse/CASTOR-1409 cause one to load all relationships on a record when trying to save in order to prevent the relationship from being lost.

<Resource name="@DATABASE-POOL-JNDI-NAME@"
 auth="Container"
 type="javax.sql.DataSource"
 url="@DATABASE_URL@"
 driverClassName="@DATABASE_DRIVER@"
 username="@DATABASE_USER@"
 password="@DATABASE_PASSWORD@"
 testOnBorrow="true"
 validationQuery="@DATABASE_HEALTH_SQL@"
 maxActive="@DATABASE_POOL_MAX_CONNECTIONS@"
 maxIdle="@DATABASE_POOL_MIN_CONNECTIONS@"
 maxWait="10000"
 poolingStatements="true"/>

p.s. My schedule hasn't allowed me to follow the Castor project for several months now, but I'm hoping to soon!

________________________________
From: Werner Guttmann <***@codehaus.org>
To: ***@castor.codehaus.org
Sent: Saturday, May 16, 2009 1:08:14 PM
Subject: Re: [castor-dev] Performance aspect of refactoring of SQL engine

Hi Ralf,

most good JDBC drivers these days support prepared statement pooling.
With that feature enabled, you will get most benefits for free. In other
words, there's non need (in my view) to deal with this at the code level.

Regards
Werner

---------------------------------------------------------------------
To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email

Udai Gupta

2009-05-17 01:24:54 UTC

Permalink

Hi All,

I think pooling gives performance improvement due to maintaining the
physical connection by reducing the access time to database
(authentication, creation on socket) which applies for
PrepareStatement and Statement.

While the PrepareStatement gives Performance improvement because of
one time registration of query in a transaction which I think Ralf
asked.

But yes I agree that Pooling gives huge performance improvement is
used wisely because i have seen cases where Pooling can cost resources
by making DB slow if not configured correctly (and these configuration
varies from application to application).

Correct me If I am wrong.

I do not get how PrepareStatement pooling works, I think pooling works
regardless of what kind of Statement you are using.

Regards,
Udai Gupta

---------------------------------------------------------------------
http://xircles.codehaus.org/manage_email

---------------------------------------------------------------------
To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email

Ralf Joachim

2009-05-19 09:19:05 UTC

Permalink

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
<title></title>
</head>
<body bgcolor="#ffffff" text="#000000">
Thanks for your opinion Werner and Jon. 
 
My feeling also says that we may not gain much by reusing of prepared
statements. The problem is that such a feeling can be wrong and that
you can find various best practise for JDBC telling to reuse prepared
statements. I therefore thought it may be a good idea to test things
before taking a decision. 
 
Does one of you know a source that compares the different possibilities
to execute statements with JDBC? 
 
Regards 
Ralf 
 
Jon Wilmoth schrieb:
<blockquote cite="mid:***@web35605.mail.mud.yahoo.com"
type="cite">
<style type="text/css"></style>
<div
style="font-family: times new roman,new york,times,serif; font-size: 12pt;">
<div>I agree.  This is easily configured for a Tomcat connection pool
(sample config below).  I think there's bigger performance related fish
to fry with JDO.  For example <a moz-do-not-send="true"
href="http://jira.codehaus.org/browse/CASTOR-1409">http://jira.codehaus.org/browse/CASTOR-1409</a> cause
one to load all relationships on a record when trying to save in order
to prevent the relationship from being lost.</div>
<div
style="font-size: 12pt; font-family: times new roman,new york,times,serif;"> </div>
<div
style="font-size: 12pt; font-family: times new roman,new york,times,serif;"><Resource name="@DATABASE-POOL-JNDI-NAME@" 
              auth="Container" 
              type="javax.sql.DataSource" 
              url="@DATABASE_URL@" 
              driverClassName="@DATABASE_DRIVER@" 
              username="@DATABASE_USER@" 
              password="@DATABASE_PASSWORD@" 
              testOnBorrow="true" 
              validationQuery="@DATABASE_HEALTH_SQL@" 
              maxActive="@DATABASE_POOL_MAX_CONNECTIONS@" 
              maxIdle="@DATABASE_POOL_MIN_CONNECTIONS@" 
              maxWait="10000" 
              poolingStatements="true"/> </div>
<div
style="font-size: 12pt; font-family: times new roman,new york,times,serif;"> 
p.s. My schedule hasn't allowed me to follow the Castor project for
several months now, but I'm hoping to soon!</div>
<div
style="font-size: 12pt; font-family: times new roman,new york,times,serif;"> </div>
<div style="font-size: 13px; font-family: arial,helvetica,sans-serif;">
<hr size="1">From:
Werner Guttmann <a class="moz-txt-link-rfc2396E" href="mailto:***@codehaus.org"><***@codehaus.org></a> 
To:
<a class="moz-txt-link-abbreviated" href="mailto:***@castor.codehaus.org">***@castor.codehaus.org</a> 
Sent: Saturday, May
16, 2009 1:08:14 PM 
Subject: Re:
[castor-dev] Performance aspect of refactoring of SQL engine 
 
Hi Ralf, 
 
most good JDBC drivers these days support prepared statement pooling. 
With that feature enabled, you will get most benefits for free. In other 
words, there's non need (in my view) to deal with this at the code
level. 
 
Regards 
Werner 
 
Ralf Joachim wrote: 
> Hi all, 
> 
> as you may already know we have a GSoC student working at
refactoring 
> SQL engine of Castor. While planing next improvement steps I had
also 
> been thinking about possibilities to improve performance of
execution of 
> SQL statements. I got one idea on how to improve performance but
this 
> would require quite some refactoring work. As I do not know how
big the 
> improvement could be, I like to hear your opinoins if it is worth
to 
> spend time in this refactoring. 
> 
> 
> Current codebase of Castor works as follows: 
> 
> When a JDO configuration is loaded or used for the first time it
creates 
> SQL strings with parameters for every entity of the
configiguration and 
> every standard database operation executed for this entities.
Standard 
> operations in this context are those that get executed by a call to 
> create(), load(), update() or remove() methods on a Database
instance. 
> 
> When one of the operations from above get called Castor every time 
> prepares a java.sql.PreparedStatement, binds the required
parameters and 
> executes the prepared statement. 
> 
> 
> Looking at best practis for prepared statements it is said that a 
> application should prepare a statement only once per connection as
you 
> could reuse the prepared statement to bind different parameters and 
> execute it multiple times. As I expect that some database drivers
and 
> about all database servers do such optimisations themself, my
question 
> is, how big would be the performance improvement we could expect
when 
> reusing prepared statements. 
> 
> As we need to prepare statements for every connection in any case,
the 
> same prepared statement can only be reused within a single
transaction. 
> According to this an application that uses an own transaction
would not 
> gain any performance improvement. In the worst case we could get a
very 
> small performance penulty by the code needed to handle creation and 
> closing of prepared statements in this case. 
> 
> On the other hand are there quite some applications that execute 
> hundreds or thoutheds of operations within one transaction. If all
those 
> operations lead to the execution of the same SQL statement with 
> different parameters we should get a performance improvement
according 
> to the best practis mentioned above. 
> 
> 
> Do anyone have experience with this? Any opinions? 
> 
> Regards 
> Ralf 
> 
 
--------------------------------------------------------------------- 
To unsubscribe from this list, please visit: 
 
    <a class="moz-txt-link-freetext" href="http://xircles.codehaus.org/manage_email">http://xircles.codehaus.org/manage_email</a> 
 
 
</div>
</div>
</blockquote>
 
<pre class="moz-signature" cols="72">--

Syscon Ingenieurbüro für Meß- und Datentechnik GmbH
Ralf Joachim
Raiffeisenstraße 11
72127 Kusterdingen
Germany

Tel. +49 7071 3690 52
Mobil: +49 173 9630135
Fax +49 7071 3690 98

Internet: <a class="moz-txt-link-abbreviated" href="http://www.syscon.eu">www.syscon.eu</a>
E-Mail: <a class="moz-txt-link-abbreviated" href="mailto:***@syscon.eu">***@syscon.eu</a>

Sitz der Gesellschaft: D-72127 Kusterdingen
Registereintrag: Amtsgericht Stuttgart, HRB 382295
Geschäftsleitung: Jens Joachim, Ralf Joachim
</pre>
</body>
</html>

---------------------------------------------------------------------
To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email

Werner Guttmann

2009-05-19 09:34:42 UTC

Permalink

Hi Ralf,

Post by Ralf Joachim
Thanks for your opinion Werner and Jon.
My feeling also says that we may not gain much by reusing of prepared
statements. The problem is that such a feeling can be wrong and that you
can find various best practise for JDBC telling to reuse prepared
statements. I therefore thought it may be a good idea to test things
before taking a decision.

Sure. I am just trying to share what I know.

Post by Ralf Joachim
Does one of you know a source that compares the different possibilities
to execute statements with JDBC?

Not really, to be honest.

Post by Ralf Joachim
Regards
Ralf

---------------------------------------------------------------------
http://xircles.codehaus.org/manage_email

--
Syscon Ingenieurbüro für Meß- und Datentechnik GmbH
Ralf Joachim
Raiffeisenstraße 11
72127 Kusterdingen
Germany
Tel. +49 7071 3690 52
Mobil: +49 173 9630135
Fax +49 7071 3690 98
Internet: www.syscon.eu
Sitz der Gesellschaft: D-72127 Kusterdingen
Registereintrag: Amtsgericht Stuttgart, HRB 382295
Geschäftsleitung: Jens Joachim, Ralf Joachim
--------------------------------------------------------------------- To
http://xircles.codehaus.org/manage_email

---------------------------------------------------------------------
To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email