The U.S. National Center for Biotechnology Information,17 or NCBI, was created by the U.S. Congress in 1988 as a division of the National Library of Medicine of the National Institutes of Health. The prescience of the architects of the NCBI and its resources is illustrated by the fact that NCBI resources, including GenBank,18 PubMed,19 and BLAST,20-22 are perhaps the most widely known informatics tools for biologists and are used effectively by a wide variety of scientists and educators. Information relating the biologic objects listed above can be accessed through an intuitive search engine, Entrez, that allows users to identify subsets of entries matching search criteria. NCBI developed a programmer's toolkit and tools for remote data access before the emergence of the World Wide Web. NCBI declared its data model and described data using a formal exchange syntax (ASN.1) before the development of XML.

We do not have the space to cover NCBI tools comprehensively, so we focus instead on a few key tools and an example that integrates the use of several of them. Comprehensive and regularly updated documentation about NCBI resources is available online in the NCBI Handbook.23

Keyword-based searches. Entrez provides a consistent interface for querying a wide variety of NCBI databases, including publications, sequences, structures, genomes, genes, inherited disorders, and protein domains. The search box supports an unusually intelligent parser based on a controlled vocabulary (Medical Subject Heading, or MeSH24). For example, a naive query string such as "gene expression tumor treatment fibroblast 2004" is interpreted as ((((("gene expression"[MeSH Terms] OR gene expression[Text Word]) AND ((tumor[Text Word] OR tumour[Text Word]) OR "neoplasms"[MeSH Terms])) AND (("therapy"[MeSH Subheading] OR "therapeutics"[MeSH Terms]) OR treatment[Text Word])) AND ("fibroblasts"[MeSH Terms] OR fibroblast[Text Word])) AND 2004).25 The resulting query is sufficiently precise to retrieve a small number of publications, including, for example, a paper26 on the use of microarrays for predicting the clinical course of several common carcinomas. Selecting the "Details" button shows explicitly how the query is interpreted. A "Preview/Index" button supports the construction of precise Boolean queries based on the NCBI data model and controlled vocabularies. Further "Links" options add features such as hypertext links from words or phrases in PubMed abstracts to related textbook entries.
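The expanded query above has a regular shape: each concept becomes a parenthesized OR-group of tagged synonyms, and the groups are joined with AND. The following Python sketch rebuilds a query of that shape and encodes it as a request URL for NCBI's E-utilities ESearch service. It is only an illustration under stated assumptions: the helper names (or_group, build_query, esearch_url) are hypothetical, the grouping is flatter than the nesting Entrez itself reports, and the code builds the URL without sending any request.

```python
from urllib.parse import urlencode

# Address of the E-utilities ESearch endpoint (programmatic access to Entrez).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def or_group(terms):
    """Join already-tagged terms with OR inside one pair of parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(concepts):
    """AND together one OR-group per concept (hypothetical helper)."""
    return " AND ".join(or_group(terms) for terms in concepts)


def esearch_url(term, db="pubmed", retmax=20):
    """Return an ESearch URL for the given query; nothing is fetched here."""
    params = {"db": db, "term": term, "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


# One list of tagged synonyms per concept, copied from the expansion in the text.
concepts = [
    ['"gene expression"[MeSH Terms]', "gene expression[Text Word]"],
    ["tumor[Text Word]", "tumour[Text Word]", '"neoplasms"[MeSH Terms]'],
    ['"therapy"[MeSH Subheading]', '"therapeutics"[MeSH Terms]',
     "treatment[Text Word]"],
    ['"fibroblasts"[MeSH Terms]', "fibroblast[Text Word]"],
]

query = build_query(concepts) + " AND 2004"
url = esearch_url(query)

print(query)
print(url)
```

Writing the tags out explicitly, rather than relying on automatic expansion, makes a search reproducible: the same string can be pasted into the search box or sent programmatically.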

Sequence-based searches. The BLAST family of tools20-22,27 allows certain types of objects (nucleotide sequences, including genomes; protein sequences; and position-specific models of proteins or protein domains) to be searched based on sequence features in addition to annotation. The BLAST tools have proven useful because of their ease of use, speed, and reliable statistics. These tools return not only matches to the query but also expectation values, which estimate the number of matches of equal or greater similarity that would have been obtained under a null hypothesis (no related sequences are in the database). Thus, users need not rely on expert knowledge to judge the relevance of a match and can rely on common knowledge from probability and statistics.
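To make the expectation value concrete, the sketch below uses the standard Karlin-Altschul form, E = K·m·n·exp(−λS), where S is the raw alignment score, m the query length, n the database length, and λ and K are parameters that depend on the scoring system. The function names, the parameter values, and the lengths are illustrative assumptions, not values taken from the text, and actual BLAST implementations apply further corrections (for example, to the sequence lengths) that are omitted here.

```python
import math


def expect_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance matches scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score, lam, k):
    """Normalized score in bits: (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_from_bits(bits, query_len, db_len):
    """Same expectation value, computed from the bit score: m * n * 2^-bits."""
    return query_len * db_len * 2.0 ** (-bits)


# Illustrative parameter values only (assumed, of the order reported for
# ungapped protein scoring systems); real searches use values fitted to
# the chosen scoring matrix and gap penalties.
LAMBDA, K = 0.318, 0.134
QUERY_LEN, DB_LEN = 300, 50_000_000  # hypothetical lengths in residues

for score in (60, 80, 100):
    e = expect_value(score, QUERY_LEN, DB_LEN, LAMBDA, K)
    bits = bit_score(score, LAMBDA, K)
    print(f"raw score {score}: {bits:.1f} bits, E = {e:.3g}")
```

Two properties follow directly from the formula: E grows in proportion to the size of the database searched, and it falls exponentially as the score rises. An E well below 1 therefore indicates a match unlikely to arise by chance in a database of that size.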

Example. What is Li Fraumeni syndrome?

• Select the OMIM link from the menu bar. OMIM refers to the Online Mendelian Inheritance in Man,28 a manually curated catalog of genes and genetic disorders. Enter "Li Fraumeni" in the search box and execute the search.

• Follow the link to "Li Fraumeni syndrome LFS". This page includes a wide-ranging text description of Li Fraumeni syndrome; for example, it indicates that LFS is a familial cancer syndrome of diverse tumors caused by certain mutations in the TP53 gene.

• Follow the link to the "TP53 gene," which indicates, among other useful information, that the p53 protein binds to and activates the expression of genes that inhibit growth or invasion.

• Follow the "Allelic variants" link, which indicates that an Arg248 to Trp mutation causes Li Fraumeni syndrome.

• Follow the link to "Cho et al"29 to find an abstract of the paper describing the solution of the p53 structure. Selecting "Links/Books" adds hyperlinks from terms in the abstract to textbook entries describing the linked terms.

• Select "Links/Structure" and follow the link to 1TSR. If you have a structure viewer installed, selecting "View 3D structure" allows you to see that interactions of Arg248 with the DNA binding site would be disrupted by an Arg248 to Trp mutation.

In this example, NCBI resources took us from the name of a syndrome to an atomic-level hypothesis about the cause of the syndrome with a very small number of operations. The infrastructure developed by NCBI provides outstanding capacity for browsing sets of publications, sequences, structures, genes, and syndromes by traversing relationships between them.
