[castor-dev] GSoC 2010 introduction

Dennis Butterstein

2010-04-28 14:22:53 UTC

Hi folks,
I'll take the chance to introduce myself to the community.

My name is Dennis Butterstein and I'll be contributing to Castor as
part of Google Summer of Code over the next few months. I hope to be
able to combine my work on Castor with my master's thesis, which I'll
start writing soon.

Roughly speaking, my task will be to refactor the loading of entities
from the database so that it becomes more maintainable, more extensible
and clearer.

For those of you interested in the details, I've added a more detailed
description taken from my application for Google Summer of Code (the
rest of you can ignore this part =) ):

I have started to implement some refactorings to get to know the
current codebase and the classes I will need in order to start working
on GSoC at full force right on time. As stated in Jira issue
castor-2888, some refactorings are needed before the loading
strategies themselves can be tackled. So the first step will be to
adapt the KeyGenerator implementations to use the new CastorConnection,
which wraps the PersistenceFactory in use as well as the connection
(java.sql.Connection). This makes it possible to use CastorStatement
for SQLStatementInsert as well. SQLStatementDelete, SQLStatementUpdate
and SQLStatementInsert will then be constructed in a very similar way,
which results in a cleaner codebase.

At the moment the SQLStatement classes construct not only the SQL
strings but also the parameter map. To decouple the order of columns
in select statements from the order of columns in the result set, they
would have to construct a map of return values as well.

The visitor pattern could help to separate these steps, namely the
construction of the SQL string, the parameter map and the map of
results. Another point is that the visitor pattern provides more
flexibility in constructing query strings (e.g. a specific visitor
could be used for each database).
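
To illustrate the idea, here is a minimal sketch of how two visitors
could walk the same select object, one building the SQL string and one
building the parameter map. All type names in it (QueryVisitorSketch,
SelectSketch, ConditionSketch and so on) are hypothetical and made up
for this example; they are not existing Castor classes, and the real
class hierarchy may well look different.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified query model. None of these types are Castor classes.
interface QueryVisitorSketch {
    void visit(SelectSketch select);
    void visit(ConditionSketch condition);
}

final class ConditionSketch {
    final String column;

    ConditionSketch(String column) {
        this.column = column;
    }

    void accept(QueryVisitorSketch visitor) {
        visitor.visit(this);
    }
}

final class SelectSketch {
    final String table;
    final List<String> columns;
    final List<ConditionSketch> conditions;

    SelectSketch(String table, List<String> columns, List<ConditionSketch> conditions) {
        this.table = table;
        this.columns = columns;
        this.conditions = conditions;
    }

    void accept(QueryVisitorSketch visitor) {
        visitor.visit(this);
    }
}

// First concern: build the SQL string only.
final class SqlStringVisitor implements QueryVisitorSketch {
    private final StringBuilder sql = new StringBuilder();

    @Override
    public void visit(SelectSketch select) {
        sql.append("SELECT ").append(String.join(", ", select.columns));
        sql.append(" FROM ").append(select.table);
        String separator = " WHERE ";
        for (ConditionSketch condition : select.conditions) {
            sql.append(separator);
            condition.accept(this);
            separator = " AND ";
        }
    }

    @Override
    public void visit(ConditionSketch condition) {
        sql.append(condition.column).append(" = ?");
    }

    String sql() {
        return sql.toString();
    }
}

// Second concern: map each bound column to its 1-based placeholder position.
// Assumes every column is bound at most once.
final class ParameterMapVisitor implements QueryVisitorSketch {
    private final Map<String, Integer> parameters = new LinkedHashMap<>();

    @Override
    public void visit(SelectSketch select) {
        for (ConditionSketch condition : select.conditions) {
            condition.accept(this);
        }
    }

    @Override
    public void visit(ConditionSketch condition) {
        parameters.put(condition.column, parameters.size() + 1);
    }

    Map<String, Integer> parameters() {
        return parameters;
    }
}

final class VisitorSketchDemo {
    public static void main(String[] args) {
        SelectSketch select = new SelectSketch("person",
                Arrays.asList("id", "name"),
                Arrays.asList(new ConditionSketch("id"), new ConditionSketch("name")));

        SqlStringVisitor sqlVisitor = new SqlStringVisitor();
        ParameterMapVisitor parameterVisitor = new ParameterMapVisitor();
        select.accept(sqlVisitor);
        select.accept(parameterVisitor);

        System.out.println(sqlVisitor.sql());
        System.out.println(parameterVisitor.parameters());
    }
}
```

A database-specific variant would then just be another implementation
of the visitor interface, while the select object itself stays
untouched.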

So I think that, based on these changes, it will be possible to start
using the visitor pattern to build the SQL query strings and the
parameter map first. This will serve as a reference implementation to
identify and resolve possible problems.

The subsequent task will be to integrate the visitor pattern into the
current codebase, start using it and test it (not least to check that
all existing functionality has been preserved).

Then it will be time to add new functionality. The current select
class hierarchy has to be extended to support joins and ordering. By
mapping the columns of the select block to names that are used to
access values of the result set, the sequence of columns and the
access to values can be decoupled. Subsequently this functionality has
to be integrated into the visitor pattern.
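
As a rough sketch of what decoupling by name could look like: a small
helper remembers the position of each selected column, so that calling
code asks for a value by name and never hard-codes an index. The class
ColumnIndexSketch and its methods are hypothetical names chosen for
this example, not part of Castor; only java.sql.ResultSet and its
getObject(int) method are real JDBC API.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a Castor class. Assumes unique column names.
final class ColumnIndexSketch {
    private final Map<String, Integer> indexByName = new LinkedHashMap<>();

    ColumnIndexSketch(List<String> selectColumns) {
        for (int i = 0; i < selectColumns.size(); i++) {
            // JDBC column indices are 1-based.
            indexByName.put(selectColumns.get(i), i + 1);
        }
    }

    // Returns the result set position of the named column.
    int indexOf(String name) {
        Integer index = indexByName.get(name);
        if (index == null) {
            throw new IllegalArgumentException("Unknown column: " + name);
        }
        return index;
    }

    // Reads a value by name; the caller never sees the column order.
    Object read(ResultSet resultSet, String name) throws SQLException {
        return resultSet.getObject(indexOf(name));
    }
}
```

If the select block is later reordered or extended (for example by a
join), only the list passed to the constructor changes; code that reads
values by name stays the same.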

At this point SQLStatementLoad can be refactored to use the select
class hierarchy to build the query string, execute the statement and
extract columns from the result set. Until now SQLStatementLoad has
done these tasks on its own.

After that we can use the select class hierarchy for
SQLStatementQuery as well. First ParseTreeWalker, OQLQueryImpl and
QueryResults (and some other classes) will have to be adapted to
support the new class hierarchy. As a first step this will be done for
OQL queries only. Whether to refactor ParseTreeWalker or to use the
parser created during GSoC 2008 will have to be evaluated in due time.

SQL pass-through queries will follow, but for them we will first have
to evaluate how to get results and bind parameters.

Once these things are done, it should be much easier to adapt loading
strategies. Based on benchmarks (like the ones in
cpaptf/src/site/resources/results/) obtained from a reference
machine, I'll try to enhance the loading strategies step by step. I am
thinking about making some comparisons with other similar projects
(e.g. Hibernate) if suitable benchmarks exist. We'll have to see
whether it will be possible to implement an automated decision
strategy that chooses the most efficient loading strategy. Another
option: we could make the loading strategy configurable, as the
developers should know enough about their project to be able to
estimate the dimensions of its relations.
My work will not include the implementation of any loading strategy
or similar. It will only evaluate the possibilities and present
benchmark results that can point out the direction for future work.

Right now I have started refactoring SQLStatementUpdate to use CastorConnection.

Well, all that remains to be said is that I'm happy to get this chance
and I'm looking forward to working with you.

Here's to successful cooperation!