Exploiting the explicit and implicit ontologies of the biomedical literature

Instead of devising new ontologies specifically for use in functional genomics, others have taken advantage of the existing representations, with very low ontological commitment, that are used to maintain the biomedical literature. The terminology structure of MEDLINE, the Medical Subject Headings (MeSH), is an outgrowth of the original paper-based Index Medicus. As illustrated in figure 5.5 below, MEDLINE's keywords are organized into major headings within a few fixed hierarchies or trees (e.g., physical findings, diseases, chemicals). This shallow taxonomic structure gives MeSH limited expressive capabilities. Nevertheless, MeSH has the immense benefit of having been carefully curated by extremely well-trained librarians over decades, and it therefore represents an invaluable repository of annotations of biomedical knowledge. It includes approximately 20,000 major headings (and many more synonyms corresponding to the concepts represented by these headings) and 105,000 chemical names. Each article indexed in MEDLINE (and therefore accessible via PUBMED[4]) has on average 10 MeSH annotations.
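To make the structure described above concrete, the following is a minimal sketch in Python of a MeSH-like vocabulary: headings placed in a fixed tree, synonyms that resolve to a heading, and articles annotated with headings. Every name, tree code, and article identifier here is hypothetical and chosen only for illustration; none is taken from MeSH or MEDLINE, and the sketch is not how either system is implemented.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Heading:
    name: str
    tree_code: str  # toy position in a fixed hierarchy, e.g. "C.1.2"
    synonyms: set[str] = field(default_factory=set)


# A toy vocabulary (hypothetical headings and codes, not real MeSH data).
HEADINGS = [
    Heading("Diseases", "C"),
    Heading("Neoplasms", "C.1", {"Tumors", "Cancer"}),
    Heading("Breast Neoplasms", "C.1.2", {"Breast Cancer"}),
    Heading("Chemicals", "D"),
]
BY_NAME = {h.name: h for h in HEADINGS}

# Map every heading name and synonym (case-insensitive) to its heading.
LOOKUP = {
    term.lower(): h for h in HEADINGS for term in {h.name, *h.synonyms}
}

# Toy articles, each annotated with a set of heading names.
ARTICLES = {
    "article-1": {"Breast Neoplasms"},
    "article-2": {"Neoplasms", "Chemicals"},
    "article-3": {"Chemicals"},
}


def is_under(child: Heading, ancestor: Heading) -> bool:
    """True if child is the ancestor itself or lies below it in the tree."""
    return (
        child.tree_code == ancestor.tree_code
        or child.tree_code.startswith(ancestor.tree_code + ".")
    )


def search(term: str) -> list[str]:
    """Return articles annotated with the heading for term or a descendant."""
    target = LOOKUP[term.lower()]
    return sorted(
        article_id
        for article_id, names in ARTICLES.items()
        if any(is_under(BY_NAME[n], target) for n in names)
    )


print(search("cancer"))     # resolves the synonym to "Neoplasms"
print(search("Chemicals"))
```

The sketch also shows the limitation noted above: the only relation the hierarchy can express is that one heading sits below another, so a query can broaden or narrow along a tree but cannot state how, for example, a chemical relates to a disease.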

