There is a good deal of enthusiasm at present about the emerging discipline of proteomics. The promise of proteomics is that we will be able to measure the concentrations of the proteins present in a particular cellular system, in a fashion as comprehensive and parallel as RNA microarray measurements. In a very abstract sense, for the purely computational bioinformatician, the field of proteomics does not harbor any particular novelty, in that all it provides is another 100,000 variables, or more, to describe the state of cellular processes. From that perspective, a proteomics data set reduces simply to another array that has distinguishing noise characteristics (i.e., sources of biological and measurement variability as described in section 3.2.5) and is just as amenable to the clustering and classification techniques of chapter 4 as any set of microarray expression measurements.
Less abstractly, however, proteomics offers a set of insights that are quite different from those of expression microarrays. The assumption underlying expression microarray measurements is that by capturing the patterns of expression, we will capture the basic regulatory rhythms of the cell. Although this assumption may hold at times, and has served remarkably well in helping biologists elucidate some fundamental biology and classify clinical phenomena, there are persuasive reasons why it should not always hold. First, we know that most of the effector molecules in cellular metabolism are proteins. To the extent that the timing of protein synthesis and the half-lives of proteins are not closely coupled to RNA expression, the assumption that RNA levels are representative does not hold. As outlined in section 1.5, this assumption fails in many instances. Nonetheless, in proteomics, we will be bedeviled by a new set of assumptions that will be equally problematic and challenging.
Assuming that we have highly reproducible and compact systems for assessing the concentrations of tens of thousands of proteins, we will be faced with the following challenges:
• Similar concentrations do not imply co-regulation. Given that proteins have hugely different half-lives, even within a single cell (e.g., a structural protein in a bone osteoblast and a parathyroid hormone receptor in the same cell), the concentrations of protein molecules in a cell may only remotely reflect joint regulation. This problem also haunts the analysis of RNA expression microarray data, because mRNA stability and degradation rates vary widely.
• Conversely, repeatedly different concentrations of two proteins do not rule out co-regulation. At any given sampling time, the two proteins could have quite variable concentrations and different mutual relationships. Yet there is nothing about this to preclude important functional interactions between these proteins.
• Localization heterogeneity. Unlike transcription of genes, which occurs within the nucleus, protein activity has very distinct and heterogeneous functional significance in different cellular compartments. Understanding protein function and regulation from proteomic data will therefore require detailed localization of proteins to organelles and their subcompartments in order to be meaningful. This problem also exists with RNA expression microarray data, because of the differing biological implications of RNA concentrations measured before splicing, after splicing in the cytoplasm, and during translation.
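The first two challenges can be illustrated with a minimal sketch. It assumes a deliberately simple first-order model, dP/dt = k_syn · mRNA(t) − k_deg · P, which is not a model taken from this book; the pulse shape, rate constants, half-lives, and time units are all hypothetical values chosen only for illustration. Two proteins are driven by the same mRNA pulse (and so are co-regulated by construction) but differ in half-life.

```python
import math

def simulate_protein(mrna, k_syn, half_life, dt):
    """Euler integration of dP/dt = k_syn * mRNA(t) - k_deg * P, with P(0) = 0."""
    k_deg = math.log(2) / half_life  # first-order degradation constant
    p, trace = 0.0, []
    for m in mrna:
        p += dt * (k_syn * m - k_deg * p)
        trace.append(p)
    return trace

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical setup: 1000 samples, 0.1 time units apart; the shared mRNA
# is "on" only between time 10 and time 30.
dt = 0.1
mrna = [1.0 if 100 <= i < 300 else 0.0 for i in range(1000)]

# Same driver, same synthesis rate, different (assumed) half-lives.
short_lived = simulate_protein(mrna, k_syn=1.0, half_life=1.0, dt=dt)
long_lived = simulate_protein(mrna, k_syn=1.0, half_life=50.0, dt=dt)

corr_short = pearson(mrna, short_lived)       # protein closely tracks its mRNA
corr_long = pearson(mrna, long_lived)         # protein largely decoupled from its mRNA
corr_pair = pearson(short_lived, long_lived)  # co-regulated, yet weakly correlated

print(f"mRNA vs short-lived protein: {corr_short:.2f}")
print(f"mRNA vs long-lived protein:  {corr_long:.2f}")
print(f"short- vs long-lived:        {corr_pair:.2f}")
```

Under these assumptions, the short-lived protein follows the mRNA pulse closely, whereas the long-lived protein accumulates slowly and persists long after the pulse has ended, so the two co-regulated proteins show little correlation with each other at the sampled time points.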
These challenges of proteomics will eventually be addressed by novel ways of looking at protein activity over time and in different spatial locations. At present, however, the basic mechanisms for cheaply and reliably obtaining large numbers of parallel measurements of protein activity have yet to be worked out and industrialized, so these developments are not likely to occur on a large scale for at least 1 or 2 years. When these challenges have been resolved, arrays of proteomic data will indeed be amenable to the same techniques of analysis as described in this book for RNA expression. For thoughtful and comprehensive insights into the challenges of proteomics and data analytic techniques, we refer the reader to the following papers [25,69,141] and websites: http://www.expasy.ch/, http://www.hip.harvard.edu/.
There are several exceptions to this rule, such as mature red blood cells, which lack nuclei and therefore the organism's genome, and gametes (spermatozoa or ova), which have half the usual complement of DNA. However, for the purposes of this overview, the above generalization will suffice.
These pairings are present in all DNA and are the most thermodynamically stable of all possible pairings of nucleotides, which accounts for the high specificity with which complementary strands of nucleotide polymers bind to each other.
That portion of the entire DNA molecule that is transcribed into RNA is called the coding region.
Transcription involves unwinding a DNA molecule so that the particular gene that is to be transcribed is sufficiently exposed to the transcriptional machinery, notably RNA polymerase.
Not all RNA codes for proteins, however. In fact, only 4% of total RNA is made of coding RNA. Of the noncoding RNA, ribosomal RNA (rRNA) and transfer RNA (tRNA) are used in various components of the protein translational apparatus mentioned below, and are not themselves translated into proteins. Eukaryotes also contain small nuclear RNA (snRNA), which is part of the splicing apparatus (see below); small nucleolar RNA (snoRNA), which is involved in methylation of rRNA; and small cytoplasmic RNA (scRNA), which can play a role in the expression of specific genes.
[14] In fact, some cells can use the ratio of one alternative splicing to another to govern cellular behavior.
[15] Modifications to the RNA include the addition of a cap at the 5′ end and a tail made of repeated adenine nucleotides (the poly-A tail).
A complex containing hundreds of proteins and special-function RNA molecules.
[17] There are notable exceptions: the code for the naturally occurring amino acid selenocysteine is identical to that for a stop codon, except for a particular nucleotide sequence further downstream.
One way to circumvent this problem is with multiple serial measurements of all proteins and then to subject them to a dynamics analysis.