Bigdata
For querying across the entire Knowledgebase dataset, Phenoscape is using the Bigdata RDF triplestore. We selected Bigdata for several reasons:
- Top SPARQL query performance among open-source triplestores
- Support for SPARQL 1.1 query language. This is required for aggregates such as "COUNT". "Property paths" also provide basic transitivity reasoning at query time.
- Embedded full-text index available within SPARQL queries.
- Concise bounded description mode for SPARQL DESCRIBE queries. When blank nodes are included in a DESCRIBE result, this recursively describes them until the graph terminates in named nodes in all directions. This is useful for grabbing the necessary and sufficient RDF graph needed to reconstruct OWL class expressions.
Querying with OWL semantics
While OWL can be stored as RDF, thinking at the OWL axiom level can be quite different from thinking at the RDF triple level. While OWL matches up nicely with RDF and SPARQL when dealing with object property assertions for individuals, querying using complex class descriptions is much more straightforward using DL queries to an OWL reasoner rather than via SPARQL, which is designed for matching triple patterns. Embedding SPARQL patterns matching pieces of the OWL-to-RDF serialization is error prone and will likely provide incomplete results. It is best to consider that triples using predicates from the OWL namespace are a private implementation detail (e.g. owl:onProperty, owl:someValuesFrom, etc.).