MeSH Tree Structures 2001

    o Bacterial Infections and Mycoses [C01] +
    o Urologic and Male Genital Diseases [C12] +
    o Female Genital Diseases and Pregnancy Complications [C13] +
    o Hemic and Lymphatic Diseases [C15] +
    o Neonatal Diseases and Abnormalities [C16] +
    o Skin and Connective Tissue Diseases [C17] +
    o Nutritional and Metabolic Diseases [C18] +
    o Disorders of Environmental Origin [C21] +
    o Pathological Conditions, Signs and Symptoms [C23] +
5. [+] Analytical, Diagnostic and Therapeutic Techniques and Equipment [E]
9. [+] Anthropology, Education, Sociology and Social Phenomena [I]
10. [+] Technology and Food and Beverages [J]

Figure 5.5: The MeSH terminology hierarchy. A portion of the MeSH terminology hierarchy starting at the top level and showing the first level of detail at the disease subhierarchy. The entire hierarchy may be browsed at

The MeSH nomenclature can be exploited in functional genomics. For example, the work of Dan Masys et al. in the HAPI [127] project processes microarray data and clusters from functional genomics analyses. The genes are looked up, and the corresponding MeSH terms are found in databases maintained by the NCBI (see the nomenclature section 5.5 below). The MEDLINE structure associated with each publication is then used to categorize the genes found in the clusters, as shown in figure 5.6.[5] That is, in addition to finding the publications corresponding to each gene, the MeSH hierarchy permits the identification of broader functional and pathological categories. For the clusters of genes characterizing acute myelogenous leukemia and acute lymphocytic leukemia [78], this analysis shows that the genes predictive of these leukemias are also known to be involved in polycystic kidney disease, inherited immunodeficiencies, and multiple sclerosis. Many of these relationships could be obtained only through the MeSH hierarchy, not by searching directly with the gene names alone.
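The roll-up from specific MeSH terms to broader categories can be sketched as follows. This is a minimal illustration, not the HAPI implementation: the gene names and tree numbers in `GENE_TREE_NUMBERS` are hypothetical and have not been checked against MeSH. It relies only on the fact that MeSH tree numbers are dot-separated paths, so dropping trailing segments yields the broader nodes.

```python
from collections import defaultdict

# Hypothetical gene -> MeSH tree number annotations (illustrative values only,
# not taken from HAPI or from any NCBI database).
GENE_TREE_NUMBERS = {
    "GENE_A": ["C04.557.337.428", "C20.673.815"],
    "GENE_B": ["C20.673.815"],
    "GENE_C": ["C10.114.375"],
}


def ancestors(tree_number):
    """Return the tree number followed by every broader node above it.

    Dropping trailing dot-separated segments walks up the hierarchy,
    e.g. C20.673.815 -> C20.673 -> C20.
    """
    parts = tree_number.split(".")
    return [".".join(parts[:i]) for i in range(len(parts), 0, -1)]


def roll_up(gene_to_tree_numbers):
    """Map each hierarchy node to the set of genes annotated at or below it."""
    node_to_genes = defaultdict(set)
    for gene, tree_numbers in gene_to_tree_numbers.items():
        for tree_number in tree_numbers:
            for node in ancestors(tree_number):
                node_to_genes[node].add(gene)
    return dict(node_to_genes)


if __name__ == "__main__":
    # Print each node with the number of genes that reach it.
    for node, genes in sorted(roll_up(GENE_TREE_NUMBERS).items()):
        print(node, len(genes), sorted(genes))
```

Two genes annotated with different specific terms can share a broad category in this way, which is the kind of relationship a search on gene names alone would miss.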

Diseases Associated with ALL-predictive genes

Leukemia, Lymphocytic, Acute (1) {<.001}
Precancerous Conditions (1) {<.001}
Autoimmune Diseases of the Nervous System (1) {<.001}
Demyelinating Autoimmune Diseases, CNS (1) {<.001}
Demyelinating Autoimmune Diseases, CNS (1) {<.001}
Female Genital Diseases and Pregnancy Complications
Hemic and Lymphatic Diseases (1) {>.6}
Neonatal Diseases and Abnormalities (3) {>.3}
Severe Combined Immunodeficiency (1) {<.001}
Autoimmune Diseases of the Nervous System (1) {<.001}
Demyelinating Autoimmune Diseases, CNS (1) {<.001}
Immunologic Deficiency Syndromes (3) {<.001}
Common Variable Immunodeficiency (1) {<.001}
Severe Combined Immunodeficiency (1) {<.001}
Pathological Conditions, Signs and Symptoms (1) {>.3}

Figure 5.6: Transforming a cluster of genes into an annotated structure automatically using MEDLINE. Summary of concept hierarchy matches with the MeSH hierarchy terms for genes described by Golub et al. [78] that fell into the acute lymphoblastic leukemia (ALL) cluster.

Alternatively, the textual content of the biomedical literature itself has an implicit (if muddy) ontological structure that can be exploited. That this can be done without solving the natural language challenge is illustrated by the work of Altman and Raychaudhuri [6]. These authors represented each publication as a word vector built from its title and abstract, with each element encoding the frequency of a word in that article. Using these vectors, they applied the same clustering techniques[6] used to measure the proximity of gene expression patterns to determine how the publications clustered. In the future, combinations of simple ontologies and statistical analysis of publication content may allow efficient mining of the immense biomedical literature by genomicists seeking the meaning of the gene or protein patterns they have observed.
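The word-vector idea can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' method: the texts in `DOCS` are invented, cosine similarity stands in for whatever proximity measure was actually used, and the grouping step is a simple threshold-based single-linkage pass chosen for brevity.

```python
import math
import re
from collections import Counter

# Hypothetical titles/abstracts, invented for illustration.
DOCS = {
    "doc1": "kinase signaling in leukemia cells",
    "doc2": "leukemia cells and kinase inhibitors",
    "doc3": "kidney cyst formation in polycystic kidney disease",
}


def word_vector(text):
    """Count word occurrences in a lowercased title or abstract."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two word-count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(docs, threshold=0.5):
    """Group documents linked by similarity >= threshold (single linkage)."""
    vectors = {name: word_vector(text) for name, text in docs.items()}
    clusters = []
    for name in sorted(vectors):
        # Find every existing cluster containing a sufficiently similar document.
        linked = [
            c for c in clusters
            if any(cosine(vectors[name], vectors[m]) >= threshold for m in c)
        ]
        # Merge those clusters together with the new document.
        merged = {name}.union(*linked)
        clusters = [c for c in clusters if c not in linked] + [merged]
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    print(cluster(DOCS))
```

Here each word count plays the role that an expression value plays in gene clustering, so any routine that clusters expression profiles could in principle be applied to these vectors instead.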


[5] A similar functionality is also provided by the PubGene program at

[6] With the word frequency used in the same way as a gene expression value.
