Expressivity versus Computability

Since the 1980s, it has been recognized that there is a natural trade-off between the expressivity of a representation used for an ontology and its computability. In an illuminating paper, Haimowitz et al. [82] evaluated the adequacy of a formal description language, NIKL, for representing medical concepts used in a physiological diagnostic program. The authors listed the following desiderata for a knowledge representation language:

• The system should have precise semantics, either based on the semantics of first order predicate calculus, or having at least comparable precision.

• The knowledge representation should automatically provide logical inferences. This is what sets a knowledge representation apart from a conventional database. It can answer questions that go beyond what was explicitly told to the system.

• It should include some form of taxonomical inheritance of characteristics by more specialized concepts from more general ones. Variations on the inheritance mechanism include whether it is a pure tree (only one parent or generalization for each concept) or a more general graph (multiple potential generalizations of each concept). For example, an ontology could represent the fact that a protein can be both part of a signal transduction pathway and a transcription factor. Some inheritance mechanisms allow for exceptions and others do not (e.g., although most RNA molecules code for proteins, some do not).

• The inference mechanism should be sound and complete; i.e., it should derive no false conclusions from true knowledge and guarantee that all true conclusions within the class automatically promised will in fact be made.

• Whatever automatic inference is provided by the representation system itself should be efficient.

• Because of the requirements of soundness, completeness, and efficiency, the expressive power (the kinds of things that can be said with it and inferred in the knowledge representation) should be limited to avoid the representation of undecidable or "NP-hard" concepts.
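The inheritance desideratum above can be made concrete with a small sketch. The following is only an illustration under stated assumptions, not the mechanism used by NIKL or by any bio-ontology: the concept names, the `PARENTS` and `ASSERTED` tables, and the `lookup` function are all hypothetical. It shows a graph (rather than tree) taxonomy in which one concept has two generalizations, and a simple "nearest assertion wins" rule for exceptions. That rule is a simplification; real systems need a principled way to resolve conflicts between multiple parents.

```python
from collections import deque

# Hypothetical toy taxonomy: each concept lists its direct generalizations.
# "protein_X" has two parents, so the taxonomy is a graph, not a pure tree.
PARENTS = {
    "molecule": [],
    "protein": ["molecule"],
    "signal_transduction_component": ["protein"],
    "transcription_factor": ["protein"],
    "protein_X": ["signal_transduction_component", "transcription_factor"],
    "RNA": ["molecule"],
    "tRNA": ["RNA"],
}

# Characteristics asserted directly on a concept. An assertion on a more
# specialized concept acts as an exception to the inherited default.
ASSERTED = {
    "protein": {"made_of": "amino acids"},
    "RNA": {"codes_for_protein": True},
    "tRNA": {"codes_for_protein": False},
}


def ancestors(concept):
    """Return every generalization of `concept`, nearest first (breadth-first)."""
    seen, order, queue = set(), [], deque(PARENTS[concept])
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            order.append(current)
            queue.extend(PARENTS[current])
    return order


def lookup(concept, prop):
    """Return the value of `prop`, taken from the concept itself or from
    the nearest generalization that asserts it; None if nobody asserts it."""
    for candidate in [concept] + ancestors(concept):
        if prop in ASSERTED.get(candidate, {}):
            return ASSERTED[candidate][prop]
    return None
```

Here `lookup("protein_X", "made_of")` answers a question that was never explicitly told to the system about `protein_X`, which is the sense in which a knowledge representation goes beyond a conventional database.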

These desiderata apply to all existing bio-ontologies, so the limitations that Haimowitz et al. found in their evaluation are likely to recur in the construction of bio-ontologies. Among the troublesome aspects of applying NIKL to the medical domain, Haimowitz et al. found the representation inadequate for capturing spatial relationships, so that, e.g., the transitivity of the part-whole or containment relationships was not automatically preserved. Furthermore, causation has a variety of flavors (e.g., a gene may be permissive for a particular process, or a gene product may be necessary but not sufficient for that process, or a gene product might be necessary and sufficient), and these could not be represented easily or accurately in NIKL. Nor were the transitive relationships that would be expected of a causal representation inferred automatically. Similarly, in the temporal representation of the different phases of a process, the dependencies between overlapping phases were not adequately captured. These limitations would certainly create problems for representing the large numbers of cellular processes germane to functional genomics.
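To illustrate what "automatically preserving" transitivity of a part-whole relationship would mean, here is a minimal sketch that computes the transitive closure of a set of asserted part-of pairs. The `transitive_closure` function and the `PART_OF` facts are illustrative assumptions, not part of NIKL or of any system discussed here, and the naive fixed-point loop is chosen for clarity rather than efficiency.

```python
def transitive_closure(pairs):
    """Return the smallest transitive relation containing `pairs`.

    `pairs` is a set of (part, whole) tuples. The loop keeps adding
    (a, d) whenever (a, b) and (b, d) are present, until nothing changes.
    """
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Illustrative, explicitly asserted part-of facts.
PART_OF = {
    ("nucleolus", "nucleus"),
    ("nucleus", "cell"),
    ("cell", "tissue"),
}

# Facts that follow by transitivity but were never stated explicitly.
inferred = transitive_closure(PART_OF) - PART_OF
```

A representation without this kind of built-in inference leaves relationships such as `("nucleolus", "cell")` to be asserted by hand, which is the gap the paragraph above describes.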

Subsequently, several special-purpose representations were developed to create ontologies that met the desiderata listed above. These specialized knowledge representations pertained to temporal reasoning, spatial reasoning, or causality, and a few of them had limited capabilities across all of these aspects. None of the current bio-ontologies is represented in a language as rich as those produced by these efforts dating back to the 1980s. The implication is that, for the foreseeable future, the expressivity that can reasonably be expected from bio-ontological representation languages will be limited, and we will not be able to express all the kinds of relationships that a biologist wishes to assert. Nonetheless, for the reasons noted at the beginning of this section, the utility of having even a relatively inexpressive set of attributes for concepts associated with genes and gene interactions would be immense. The alternative is manual review of the unstructured literature, which is unlikely to scale without further advances in natural language-based information retrieval techniques. As we have stated before, notwithstanding their limitations, these bio-ontologies are likely to grow in size and in use. Indeed, several commercial efforts have now implemented ontologies with very low ontological commitment. One example is Proteome, in which very simple ontologies have been populated by hundreds of knowledge workers to supply the biotechnology industry with annotations for the genome. Similar efforts are under way at other gene discovery groups and companies in this area, such as Celera, Curagen, and Rosetta.

A comparison of the various ontologies mentioned in this section is shown in Table 5.1.

Table 5.1: Overview of representative ontology technologies. "Consumer report" of instances of some of the dominant ontologies used to represent genomic knowledge. Some of the instances are general-purpose ontology building tools (e.g., Ontolingua) and others are the sole instance of a particular representational technology (e.g., KEGG).

| Ontology | Defines a nomenclature for genomics/bioinformatics | Specialization | Expressivity | Interoperability / Integration | Human expertise required |
| --- | --- | --- | --- | --- | --- |
| Ontolingua | Multipurpose tool; no nomenclature included or recommended. User community can define the nomenclature of choice. | Support of multiple inheritance allows representation of some of the most complex dependencies in genomics. | Broad and richly typed representation with several add-on packages for specialized domains (e.g., temporal and spatial reasoning). Can assert disjointness, negation, etc. This expressivity unsurprisingly is coupled with minimal reasoning support. | Several tools from within the Artificial Intelligence community are interoperable. Less so with standard WWW tools. | Considerable knowledge acquisition required for proper use. |
| GO | Specifies a controlled vocabulary for GO descriptions. | Limited specialization without formal inheritance mechanisms. | Limited expressivity (no causal transitivity, temporal ordering, assertion of disjoint subconcepts). | Tightly integrated into the suite of databases and tools of the NCBI. | Minimal ontological commitment and restricted expressivity enable easy adoption and use. |
| KEGG | Specifies standard nomenclature for each entity (e.g., EC for enzymes). | Very limited form of specialization without inheritance. | Limited expressivity (no assertion of disjoint subconcepts). However, the extensive use of binary relations allows primitive inference of causal transitivity and representation of pathways. | Integrated with other pathway representations (e.g., WIT[a]). | Minimal ontological commitment and restricted expressivity enable easy adoption and use. |
| OML/CKML | Multipurpose tool; no nomenclature included or recommended. User community can define the nomenclature of choice. | Supports multiple inheritance and automated subsumption, i.e., inference that the logical constraints defined in the term of concept C' logically imply those of the more general concept C. | Missing some of Ontolingua's constructs (e.g., collections) but relatively high expressivity. | Uses XML as syntax and therefore easily interoperates at least at the syntactic level with several e-commerce/WWW tools. As yet, though, few instances of integration with other tools. | Significant experience required but XML syntax brings it quicker to wider community of developers. However, still a work in progress. |

[a]http://www.wit.mcs.anl.gov/WIT/
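The automated subsumption mentioned for OML/CKML in Table 5.1 can be sketched in a deliberately simplified form. This is an assumption-laden illustration, not the OML/CKML algorithm: each concept is modeled as a plain set of constraints, and a concept C is taken to subsume C' when every constraint of C also appears in C'. The concept names, the constraint tuples, and the `subsumes` and `classify` functions are hypothetical.

```python
def subsumes(general, specific):
    """True if every constraint of `general` also holds for `specific`.

    Concepts are modeled as frozensets of (attribute, value) constraints,
    so logical implication is approximated by the subset test.
    """
    return general <= specific


def classify(concepts):
    """Return sorted (specific, general) pairs inferred from the definitions."""
    pairs = []
    for specific, s_constraints in concepts.items():
        for general, g_constraints in concepts.items():
            if specific != general and subsumes(g_constraints, s_constraints):
                pairs.append((specific, general))
    return sorted(pairs)


# Hypothetical concept definitions; no is-a links are asserted by hand.
CONCEPTS = {
    "protein": frozenset({("made_of", "amino acids")}),
    "enzyme": frozenset({("made_of", "amino acids"),
                         ("has_function", "catalysis")}),
    "kinase": frozenset({("made_of", "amino acids"),
                         ("has_function", "catalysis"),
                         ("transfers", "phosphate group")}),
    "transcription_factor": frozenset({("made_of", "amino acids"),
                                       ("binds", "DNA")}),
}

hierarchy = classify(CONCEPTS)
```

The point of the sketch is that the generalization hierarchy is derived from the concept definitions rather than entered manually. Richer constraint languages make this test more powerful but also more expensive, which is the trade-off this section describes.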
