A central intellectual and technological asset to functional genomics has been GenBank (see section 5.5.1) and related genomic and protein databases. Their standardized data models have allowed research laboratories throughout the world to rapidly populate them with the very latest information. In turn, these databases are freely available throughout the world via the Internet and have seeded, accelerated, and inspired thousands of research projects.
In contrast, there are few, if any, consequential shared national clinical databases. Specifically, patient data in one information system can only rarely be transferred to another system in clinical practice. Within a single healthcare system, annotations such as billing diagnoses will suffer from distortions imposed in an attempt to maximize reimbursement for care. Despite decades of research and development of clinical record systems, they remain problematic. If properly constructed and maintained, electronic medical records could provide an invaluable set of phenotypic annotations with which to bring clinical relevance to genomic data.
The marked contrast between genomic and clinical care databases is deceptive. The Human Genome Project has benefited from the elegant simplicity of the genetic code. Consequently, there are relatively few items that GenBank requires to be submitted for an entry to be a valid (and useful) component of its database. The clinical care of human beings is a far more complex process, requiring at a minimum a detailed record of the history of multiple clinical interventions and outcomes, relevant life history, and clinical measurements that span several modalities, from serum chemistry to brain imaging. It is not surprising that the data model required to capture all this information is extremely complex, as is evidenced by the Health Level Seven (HL7) Reference Information Model. It is a remarkable tribute to the persistence of the individuals involved in these clinical data model standardization efforts that they have been able to arrive at a reasonably adequate standardized representation of not only the aforementioned descriptors but also much of the process and business of clinical care.
This challenge relates back to our prior discussion of the complexity of microarray expression data, which is much more akin to that of clinical laboratory data than to that of DNA sequences. Microarray measurements reflect the particular artifactual qualities of the measurement system used, in contrast to the (at least theoretical) invariance of sequence with sequencing technique. It is therefore not surprising that even prior to encompassing the entirety of clinical annotation, the genomics community has faltered in developing shared and standardized data models where the simplicity of the genome no longer dominates. However, the complexity of microarray data models is easily dwarfed by the complexity of clinical data, and the effort to generate a standard human phenotypic data model for genomics would similarly demand orders of magnitude more effort than has been invested in data modeling by functional genomicists to date. It is therefore somewhat surprising that several efforts to develop clinical phenotype data models have been proposed without much reference to, or attempt to co-opt, the efforts of the clinical informatics community.
A more efficient mechanism might be for functional genomicists and bioinformaticians to appropriate the relevant segment of an existing clinical data model, such as HL7.
This might require that members of the bioinformatics community participate in an HL7 committee in order to fully accommodate the annotation needs of functional genomics. This would be a far smaller cost than having to duplicate the existing massive efforts of HL7.
As we described in section 5.4.