Query execution speed-up: The investigation

From phenoscape
Revision as of 21:13, 30 January 2009 by Crk18 (Software)

Several factors can affect the performance of queries executed against the Phenoscape database. They fall into a few major categories:

  • Hardware: This has to do mainly with the database server configuration
  • RDBMS
  • Database schema
  • Querying strategy

The last category has been discussed at some length on the Queries page. This page presents the findings in the other three categories and proposes possible steps to improve performance in each.

Hardware

The Phenoscape database is hosted on a development server running CentOS 5.2 x86_64. The Phenoscape application is hosted on a development web server that runs JBoss (version 4.0.5) on Mac OS X 10.4. At present (Jan 30, 2009), the size of the database is approximately 687 MB.

Software

The querying module for the Phenoscape API has been implemented primarily in Java (version 1.5). Much of the code has been inherited from the BBOP group [1]. The BBOP team developed an extensible Shard interface to encapsulate a variety of data repositories, ranging from conventional relational databases to RDF triple stores. The Shard interface and the classes that implement it, such as AbstractShard and OBDSQLShard, define methods that retrieve data from the repositories by executing queries written in SQL or in more XML-based formats. For the RDBMS-based implementation of the Shard interface, stored procedures have also been developed both to populate the database and to query it.
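The layering described above (an interface, an abstract base class, and concrete backends) can be illustrated with a minimal sketch. This is not the BBOP code: only the names Shard and AbstractShard come from this page, while the methods getNodesAnnotatedWith and countNodesAnnotatedWith, the InMemoryShard class, and the term identifiers are hypothetical and chosen so that the example runs without a database. A relational backend such as OBDSQLShard would presumably implement the same interface by issuing SQL instead.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, reduced version of the Shard idea: a query method that is
// independent of how the data is stored. The method name is an assumption.
interface Shard {
    Collection<String> getNodesAnnotatedWith(String termId);
}

// Logic shared by all backends lives in the abstract base class.
abstract class AbstractShard implements Shard {
    // Hypothetical convenience method built on top of the interface method.
    public int countNodesAnnotatedWith(String termId) {
        return getNodesAnnotatedWith(termId).size();
    }
}

// In-memory backend, used here only so the sketch runs without a database.
class InMemoryShard extends AbstractShard {
    private final Map<String, List<String>> annotations =
            new HashMap<String, List<String>>();

    // Records that the node with the given label is annotated with termId.
    public void annotate(String termId, String nodeLabel) {
        List<String> nodes = annotations.get(termId);
        if (nodes == null) {
            nodes = new ArrayList<String>();
            annotations.put(termId, nodes);
        }
        nodes.add(nodeLabel);
    }

    public Collection<String> getNodesAnnotatedWith(String termId) {
        List<String> nodes = annotations.get(termId);
        if (nodes == null) {
            return Collections.<String>emptyList();
        }
        return Collections.unmodifiableList(nodes);
    }
}

class ShardSketch {
    public static void main(String[] args) {
        InMemoryShard shard = new InMemoryShard();
        shard.annotate("TERM:1", "node A");
        shard.annotate("TERM:1", "node B");

        // Callers program against the interface, not a specific backend.
        Shard generic = shard;
        System.out.println(generic.getNodesAnnotatedWith("TERM:1"));
        System.out.println(shard.countNodesAnnotatedWith("TERM:2"));
    }
}
```

Because callers depend only on the interface, a backend can in principle be swapped (for example, relational database versus triple store) without changing the calling code, which also makes it possible to compare the query performance of different backends behind the same calls.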

RDBMS

The Phenoscape data is stored in a database managed by the PostgreSQL RDBMS (version 8.3.3).

Database Schema