An Introduction to OBO Ontologies

OBO-Edit operates on ontologies that conform to the OBO ontology format. The complete specification of the OBO ontology format is available at http://www.godatabase.org/dev/doc/obo_format_spec.html.

OBO ontologies are similar to the ontologies specified by a description logic language like OWL or DAML+OIL, albeit simpler. However, OBO ontologies are designed for the needs of the biological community, and thus have some unique (and sometimes unexpected) qualities. The OBO format provides the ability to track a large amount of meta-data, and includes mechanisms for some basic history auditing. The OBO format does not include all the features of OWL or DAML+OIL, and sometimes uses different semantics for features that seem, on the surface, to be identical to features of a description logic.

This section of the user's guide will give an overview of what sort of ontology the OBO format can represent.

Classes

Classes (also called "terms" in this guide) model types of objects in the real world. Note that classes model types, not instances. For example, the Gene Ontology class "mitochondrion" models the class of all mitochondria, not any particular mitochondrion in some particular cell.

Classes may have relationships to each other. These relationships may be of one of the pre-defined relationship types in OBO (such as is_a, the relationship type that indicates that a class or relation is a sub-type of another class or relation) or may be user-defined.

If a class A has an is_a relationship to another class B, A is a "subclass" of B. This means that all instances of type A are also instances of type B, and any object of type A implicitly inherits all the characteristics of an object of type B.

For example, a sneaker is_a shoe. That means that any particular sneaker in the real world (i.e. any instance of the class sneaker) is also a shoe, by definition. All sneakers have the characteristics of a general shoe, as well as some special characteristics that only sneakers have. These special characteristics are what differentiate sneakers from other shoes and make the category "sneaker" meaningful.
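
The subclass rule can be sketched in a few lines of Python. This is only an illustration, not a description of how OBO-Edit stores ontologies: the IS_A table, the extra class "footwear", and the helper functions are all hypothetical names introduced for the example.

```python
# Hypothetical toy ontology: each class maps to its direct is_a parents.
IS_A = {
    "sneaker": ["shoe"],
    "shoe": ["footwear"],   # "footwear" is an invented class for illustration
    "footwear": [],
}

def ancestors(cls, is_a=IS_A):
    """Return every class reachable from cls by following is_a links."""
    found = set()
    stack = list(is_a.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(is_a.get(parent, []))
    return found

def is_instance_of(instance_class, cls, is_a=IS_A):
    """An instance of a class is also an instance of all its superclasses."""
    return instance_class == cls or cls in ancestors(instance_class, is_a)

print(is_instance_of("sneaker", "shoe"))   # True
print(is_instance_of("shoe", "sneaker"))   # False
```

The second call returns False because is_a only runs from the subclass to the superclass: every sneaker is a shoe, but not every shoe is a sneaker.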

OBO classes are analogous to OWL or DAML+OIL classes.

Relations

Relations (also called "relationship types" in this guide, or "properties" in other description logic languages) model types of relationships between entities.

Any given relation can be applied in two ways: at the class level or at the instance level. Class-level relations relate two classes, and instance-level relations relate two instances. For every class-level application of a relation, there is at least one possible instance-level application of that relation.

Class-level relations can be thought of as describing required instance-level relationships that must exist when a class is instantiated. For example, consider the class-level relationship finger part_of hand. This means that for any particular finger in the real world (e.g. John's left index finger) there must be a particular hand in the real world (e.g. John's left hand) that the finger is part of. That relationship between particulars (johns_left_index_finger part_of johns_left_hand) is an instance-level relationship.
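
One way to picture this is as a check over instance data. The sketch below assumes a toy representation (instances as a dictionary, instance-level relationships as triples) and an invented instance "stray_finger"; none of these names come from OBO or OBO-Edit, and for brevity the check ignores subclasses.

```python
# Hypothetical instance data: instance name -> class it instantiates.
INSTANCE_OF = {
    "johns_left_index_finger": "finger",
    "johns_left_hand": "hand",
    "stray_finger": "finger",   # invented instance with no hand
}

# Instance-level relationships as (subject, relation, object) triples.
INSTANCE_LINKS = {
    ("johns_left_index_finger", "part_of", "johns_left_hand"),
}

def violations(class_link, instance_of=INSTANCE_OF, links=INSTANCE_LINKS):
    """List instances lacking the instance-level link a class-level link requires."""
    subject_cls, relation, object_cls = class_link
    missing = []
    for inst, cls in sorted(instance_of.items()):
        if cls != subject_cls:
            continue
        satisfied = any(
            s == inst and r == relation and instance_of.get(o) == object_cls
            for (s, r, o) in links
        )
        if not satisfied:
            missing.append(inst)
    return missing

print(violations(("finger", "part_of", "hand")))  # ['stray_finger']
```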

Attributes of Relations

Directionality

Relations apply in a single direction (unless they are marked "symmetrical"; see Symmetry below). Consider the relationship leaf part_of plant. This means that every leaf is part_of some plant. It DOES NOT SAY ANYTHING about whether all plants have leaves, because the relationship only makes claims in one direction.

To say that all plants have leaves, we would need another relationship, plant has_part leaf. (See "Built-in Relations" below for information about how the relations has_part and part_of relate to each other.)

Symmetry

A relation is symmetrical if it applies in both directions. If P is a symmetrical relation, and A has relationship P to B, then B also has relationship P to A. A symmetrical relation could be thought of as having an inverse_of relationship to itself.

"Equals" and "is next to" are everyday examples of symmetrical relations. If A equals B, B equals A. If A is next to B, B is next to A.
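
As a rough sketch, symmetry can be treated as a rule that adds the reverse of every link whose relation is marked symmetrical. The triple representation, the relation name next_to, and the function name below are assumptions made for the example.

```python
def symmetric_closure(links, symmetric_relations):
    """Add the reverse link for every link whose relation is symmetrical."""
    closed = set(links)
    for (a, rel, b) in links:
        if rel in symmetric_relations:
            closed.add((b, rel, a))
    return closed

links = {("A", "next_to", "B"), ("leaf", "part_of", "plant")}
closed = symmetric_closure(links, symmetric_relations={"next_to"})

print(("B", "next_to", "A") in closed)         # True
print(("plant", "part_of", "leaf") in closed)  # False
```

The part_of link is left alone because part_of is not marked symmetrical, so it still makes a claim in one direction only.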

Transitivity

A relation is transitive if relationships of this type remain true across chains of links. If P is a transitive relation, and A has relationship P to B, and B has relationship P to C, then by definition A has relationship P to C.

"Is part of" and "is bigger than" are everyday examples of transitive relations. If A is part of B and B is part of C, A is part of C. If A is bigger than B and B is bigger than C, A is bigger than C.
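
The following sketch shows what a reasoner might infer from a transitive relation. It uses a naive fixed-point loop that is adequate only for small examples; the triple representation and the class "arm" are assumptions for illustration.

```python
def transitive_closure(links, relation):
    """Infer all (subject, object) pairs of `relation` implied by chains of it."""
    closed = {(a, b) for (a, rel, b) in links if rel == relation}
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closed):
            for (c, d) in list(closed):
                if b == c and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed

links = {
    ("finger", "part_of", "hand"),
    ("hand", "part_of", "arm"),
}
print(sorted(transitive_closure(links, "part_of")))
# [('finger', 'arm'), ('finger', 'hand'), ('hand', 'arm')]
```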

Cyclicity

If a relation is cyclic, it is legal to create a cycle of links of that relationship type. Note that a cycle of a given relation P may contain relationship types other than P; the cycle may include is_a links or sub-relations of P.

"develops_from" is an everyday example of a relation that may be cyclic. An instance of A may develop from an instance of B, and later an instance of B may develop from the instance of A. Cyclic relationships are often ones that involve some sense of change over time.
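
A tool could use the cyclic flag to decide whether a cycle is an error. The sketch below is a simplified assumption of how such a check might look: it only follows links of a single relation, whereas a real check would also have to follow is_a links and sub-relations as noted above.

```python
def has_cycle(links, relation):
    """Return True if links of `relation` form a cycle (depth-first search)."""
    graph = {}
    for (a, rel, b) in links:
        if rel == relation:
            graph.setdefault(a, []).append(b)

    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:       # reached a node already on the current path
            return True
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, []))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))

links = {
    ("A", "develops_from", "B"),
    ("B", "develops_from", "A"),
    ("finger", "part_of", "hand"),
}
cyclic_relations = {"develops_from"}   # relations marked cyclic (assumed)

for rel in ("develops_from", "part_of"):
    illegal = has_cycle(links, rel) and rel not in cyclic_relations
    print(rel, "illegal cycle" if illegal else "ok")
```

Here develops_from contains a cycle but is marked cyclic, so the cycle is accepted; part_of contains no cycle at all.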

Domain & Range

The domain and range of a relation imply certain is_a relationships for terms that have a relationship of a given type, or are the target of a relationship of a given type. If a relation P has domain D, any term with a relationship of type P to any other term is by definition a subclass of D. If a relation P has range R, any term that is the target of a relationship of type P is by definition a subclass of R. This definition of domain and range is identical to that used by OWL.

To illustrate with a concrete example: let's say we define a relation has_pet. The has_pet relationship has the domain "person", because anything that has a pet must be a person, and the range "animal", because anything that is the target of a has_pet relationship must be an animal. If we see the relationship "K-9 cop" has_pet "Doberman", we know that "K-9 cop" must have an is_a relationship to "person", and "Doberman" must have an is_a relationship to "animal".
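
The has_pet example can be written out as a small inference step. The table of domains and ranges and the function name are hypothetical; the point is only that domain and range produce new is_a links instead of rejecting the relationship.

```python
# Hypothetical declarations: relation -> (domain, range).
DOMAIN_RANGE = {"has_pet": ("person", "animal")}

def inferred_is_a(links, domain_range=DOMAIN_RANGE):
    """Return the is_a links implied by each relation's domain and range."""
    implied = set()
    for (subject, rel, target) in links:
        if rel in domain_range:
            domain, rng = domain_range[rel]
            implied.add((subject, "is_a", domain))
            implied.add((target, "is_a", rng))
    return implied

links = {("K-9 cop", "has_pet", "Doberman")}
print(sorted(inferred_is_a(links)))
# [('Doberman', 'is_a', 'animal'), ('K-9 cop', 'is_a', 'person')]
```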

Note that this is a very different understanding of domain and range than is usually seen in programming languages or in frame-based reasoning systems. In those systems, domain and range are used as a means to verify the correctness of relationships; domain and range are used to make sure a relation is only used in contexts that make sense. In OBO (and OWL, and other DL-languages), domain and range are used to infer additional information about classes and instances.

This does not mean that our notion of domain and range can't be used for verification. If terms are properly marked with disjoint_from relationships, an "improperly used" relation will usually imply that a term has an is_a relationship to two disjoint classes, and a reasoner will complain.
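
To make the verification idea concrete, here is a minimal sketch of the complaint a reasoner might raise. The terms "rock" and "mineral" are invented, the function is hypothetical, and only direct is_a parents are examined; a real reasoner would also consider inherited parents.

```python
def disjoint_conflicts(is_a_links, disjoint_pairs):
    """Find terms that are directly a subclass of two disjoint classes."""
    parents = {}
    for (child, parent) in is_a_links:
        parents.setdefault(child, set()).add(parent)
    conflicts = []
    for term, supers in sorted(parents.items()):
        for (x, y) in disjoint_pairs:
            if x in supers and y in supers:
                conflicts.append((term, x, y))
    return conflicts

# "rock" is asserted to be a mineral. Using it as the subject of a relation
# whose domain is "person" implies a second is_a link, to "person".
is_a_links = {("rock", "mineral"), ("rock", "person")}
disjoint_pairs = [("mineral", "person")]

print(disjoint_conflicts(is_a_links, disjoint_pairs))
# [('rock', 'mineral', 'person')]
```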

Relationships between Relations

Relations can have relationships to other relations. Normally, relations should only have is_a or inverse_of relationships to other relations. However, the OBO format does not prevent users from relating relations to each other using user-defined relations.

Like classes, relations can be sub-typed. If a relation A has an is_a relationship to another relation B, A is a sub-relation of B. Any class-level or instance-level relationship of type A is also a relationship of type B.
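
A short sketch of this rule follows. The relation has_dog, declared here as a sub-relation of has_pet, is invented for the example, as are the table and function names.

```python
# Hypothetical relation hierarchy: relation -> its direct is_a parents.
RELATION_IS_A = {"has_dog": ["has_pet"], "has_pet": []}

def expand_relation(link, relation_is_a=RELATION_IS_A):
    """Return the link plus the same link restated with every super-relation."""
    subject, rel, target = link
    result = {link}
    stack = list(relation_is_a.get(rel, []))
    while stack:
        parent = stack.pop()
        if (subject, parent, target) not in result:
            result.add((subject, parent, target))
            stack.extend(relation_is_a.get(parent, []))
    return result

print(sorted(expand_relation(("K-9 cop", "has_dog", "Doberman"))))
# [('K-9 cop', 'has_dog', 'Doberman'), ('K-9 cop', 'has_pet', 'Doberman')]
```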

Built-in Relations

The OBO format allows users to define any number of new relations. In addition, it provides a small number of pre-defined relations that are present in all ontologies and cannot be modified: is_a, disjoint_from, inverse_of, and union_of.

The built-in relations are defined as follows:

Instances

Instances represent concrete entities that instantiate (or realize) an abstract class. I (John Day-Richter, author of OBO-Edit) am an instance of "Computer Programmer" (and a number of other classes).

OBO has the ability to represent instances, but OBO-Edit does not at present do anything with that information. Look for some interesting applications of this in the near future.

Identifiers

Every Class, Relation and Instance in OBO-Edit has a unique identifier. In OBO format, an identifier is an alphanumeric string of the form <idspace_name>:<identifier>, where <idspace_name> is a short header string (usually identifying the ontology or organization that originated the id) and <identifier> is a string of arbitrary length.

In OBO 1.2, the "idspace" header tag provides a means to map the short idspace names to a URI. This mechanism allows OBO ids to be converted easily to languages that require more descriptive identifiers, and provides a way to avoid namespace clashes. See the OBO 1.2 specification section 2.1 for more information on header tags that help manage ids.
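
The identifier layout and the idea of expanding an idspace name can be sketched as follows. This is not the parsing code of any real tool: the function names are hypothetical, the URI prefix is a placeholder, and simple concatenation of prefix and identifier is an assumption about how the expansion might work.

```python
def split_id(obo_id):
    """Split '<idspace_name>:<identifier>' at the first colon."""
    idspace, sep, local = obo_id.partition(":")
    if not sep or not idspace or not local:
        raise ValueError(f"not an OBO-style id: {obo_id!r}")
    return idspace, local

def to_uri(obo_id, idspaces):
    """Expand an id to a URI using a mapping of idspace names to URI prefixes."""
    idspace, local = split_id(obo_id)
    return idspaces[idspace] + local

# Assumed mapping; the prefix is a placeholder, not a registered URI.
idspaces = {"GO": "http://example.org/GO#"}

print(split_id("GO:0005739"))          # ('GO', '0005739')
print(to_uri("GO:0005739", idspaces))  # http://example.org/GO#0005739
```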

Obsoletes

OBO ontologies are designed for users who intend to iteratively develop their ontologies. Therefore, the OBO format provides a means for tracking the ids of terms that have been "deleted".

When a term or relation or instance is "deleted", the term does not vanish. Instead, it is marked "obsolete". Obsolete terms are not editable (by most tools).

Often, a term is obsoleted because it is replaced with a more accurate term. OBO 1.2 provides a mechanism for automatically assigning replacements for obsolete terms. See Assigning Replacement Terms for more information.
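
Because obsolete terms are kept and not removed, a tool can follow replacements to find the term that should be used today. The sketch below assumes a toy record layout with the keys "obsolete" and "replaced_by" and made-up ids in an "EX" idspace; it is not the OBO-Edit data model.

```python
# Hypothetical records: id -> {"obsolete": bool, "replaced_by": id or None}
TERMS = {
    "EX:001": {"obsolete": True, "replaced_by": "EX:002"},
    "EX:002": {"obsolete": True, "replaced_by": "EX:003"},
    "EX:003": {"obsolete": False, "replaced_by": None},
    "EX:004": {"obsolete": True, "replaced_by": None},
}

def current_term(term_id, terms=TERMS):
    """Follow replacement links until a non-obsolete term is reached.

    Returns None if the chain ends at an obsolete term with no replacement
    (or loops back on itself).
    """
    seen = set()
    while term_id is not None and term_id not in seen:
        seen.add(term_id)
        record = terms[term_id]
        if not record["obsolete"]:
            return term_id
        term_id = record["replaced_by"]
    return None

print(current_term("EX:001"))  # EX:003
print(current_term("EX:004"))  # None
```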

Meta-Data

Terms, relations and instances support a number of different types of meta-data:

Dbxrefs

Dbxref is short for "database cross-reference". Dbxrefs are used to indicate an analogous item in another database or ontology, OR to give a literature reference for a term definition. A dbxref is formatted like an OBO identifier (<database_name>:<identifier>) to specify a particular object in a database or ontology.

Definition

Objects can be given a lengthy text definition. Object definitions in OBO must also have at least one dbxref that describes the source of the definition.

Synonyms

Often a term will have several names that are commonly used. Synonyms specify alternate names for a term.

Comments

Comments are text annotations about an object in an ontology. Comments can contain any free text information required.

Categories

An ontology may contain any number of user-defined "categories". For example, the Gene Ontology defines a number of categories like "terms of interest to plant biologists" and "terms of interest to insect biologists". A term may belong to any number of user-defined categories.

Namespaces

OBO files are designed to be easily merged and separated. Most tools that use OBO files can load many OBO files at once. If several ontologies have been loaded together and saved into a single file, it would be impossible to know which terms came from which file unless the origin of each term is indicated somehow. Namespaces solve this problem by indicating a "logical ontology" to which every term, relation, instance OR relationship belongs. In other words, each entity is tagged with a namespace that indicates which ontology it is part of.

Namespaces are user-definable. Every ontology object belongs to a single namespace. When terms from many ontologies have been loaded together, namespaces are used to break the merged ontology back into separate files.
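
Splitting a merged ontology by namespace amounts to a simple grouping step. In this sketch the objects are plain dictionaries with "id" and "namespace" keys, and the ids and namespace names are made up; a real tool would write each group out to its own file.

```python
from collections import defaultdict

def split_by_namespace(objects):
    """Group ontology object ids by the single namespace each belongs to."""
    groups = defaultdict(list)
    for obj in objects:
        groups[obj["namespace"]].append(obj["id"])
    return dict(groups)

# Hypothetical merged ontology.
merged = [
    {"id": "EX:001", "namespace": "anatomy"},
    {"id": "EX:002", "namespace": "process"},
    {"id": "EX:003", "namespace": "anatomy"},
]

print(split_by_namespace(merged))
# {'anatomy': ['EX:001', 'EX:003'], 'process': ['EX:002']}
```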