There is a central underlying assumption in all gene-clustering techniques for expression analysis. Simply put, the assumption is that genes that appear to be expressed in similar patterns are in fact mechanistically related. The corollary to this assumption is that although genes may distantly affect the function of other gene products, they fall into groups of more tightly regulated mechanisms. For instance, the genes that govern chromosome function or meiosis may be more tightly linked to each other than they are to the genes involved with another function, such as apoptosis. This has been the basis of our collective experience in biological investigations over the last century: that there are groups of proteins which interact with one another more closely than with others. Often they have been organized into pathways such as glycolysis, the Krebs cycle, and other metabolic pathways in which the gene products called enzymes have to work in concert. Other, more obvious functional clusters are those of structural proteins which have to come together in a conserved and reproducible fashion in order to serve their purpose, whether they are the components of the ribosomal unit or the histoproteins which are essential to maintenance of the structure of chromatin. On this basis, if we can find genes whose expression patterns approximate one another, we can tentatively infer that they are functionally clustered together, that is, that they have related functions.
Several important caveats are worth noting here. First of all, it remains unclear just how discrete the functional groupings of genes are in the cellular apparatus. It may be that individual gene products have so many different roles under different circumstances that several of them play essential roles in significantly different functions. The second caveat is that the term "functionally related" is itself ill-specified. If the pattern of expression of one gene is similar to that of another, it could signify all kinds of relationships, ranging from "two genes having gene products that physically interact," to "one gene encoding a transcriptional factor for the other gene," to "two genes having different functions but similar promoter sequences," to "two genes both with promoter sequences bound by repressors which are knocked off when a nuclear receptor is activated, even though the two genes have widely disparate functions." Of course there is a level of abstraction at which all genes are functionally related in their role of keeping the cell alive and producing whatever components are needed for the rest of the organism. But below this level of abstraction, there are many alternative and, by their nature, sloppy definitions of clustering. Therefore, we should be somewhat wary of the claim that similarity in expression corresponds to similarity in function. Nonetheless, it is a useful starting point for many analyses of a genome whose function remains by and large unknown at this time. Additionally, as we discuss in the chapter on dissimilarity measures (chapter 3), the question of what constitutes a similar expression pattern is itself poorly defined, or at least has multiple alternative definitions. For example, similarity could mean having similar patterns of change over time. It could mean similar absolute levels of expression at any given point in time, or it could mean perfectly opposite but well-choreographed patterns of expression.
Just which dissimilarity measure is chosen for looking at patterns of expression will influence the kind of functional clusters that we expect.
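To make the dependence on the dissimilarity measure concrete, the following is a minimal sketch in Python. It is an illustration only, not the method used elsewhere in this book: the three expression profiles are invented for the example, and the function names are hypothetical. It compares three common choices, namely Euclidean distance (sensitive to absolute levels), one minus the Pearson correlation (sensitive to the shape of change over time), and one minus the absolute Pearson correlation (which also treats perfectly opposite patterns as similar).

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        # A perfectly flat profile has no pattern of change to correlate.
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (norm_x * norm_y)


def euclidean(x, y):
    """Small when absolute expression levels are close at every time point."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def correlation_dissimilarity(x, y):
    """Small when profiles rise and fall together, whatever their levels."""
    return 1.0 - pearson(x, y)


def absolute_correlation_dissimilarity(x, y):
    """Small when profiles move together OR in perfect opposition."""
    return 1.0 - abs(pearson(x, y))


# Invented profiles over four time points (assumed values, illustration only).
profiles = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [11.0, 12.0, 13.0, 14.0],  # same shape as gene_a, higher level
    "gene_c": [4.0, 3.0, 2.0, 1.0],      # mirror image of gene_a, similar level
}

measures = {
    "euclidean": euclidean,
    "1 - r": correlation_dissimilarity,
    "1 - |r|": absolute_correlation_dissimilarity,
}

for name, measure in measures.items():
    d_ab = measure(profiles["gene_a"], profiles["gene_b"])
    d_ac = measure(profiles["gene_a"], profiles["gene_c"])
    print(f"{name:>10}: d(a, b) = {d_ab:.3f}   d(a, c) = {d_ac:.3f}")
```

Under Euclidean distance, gene_a sits closer to gene_c, because their absolute levels overlap. Under one minus the correlation, gene_a sits closer to gene_b, because the two rise in step. Under one minus the absolute correlation, both gene_b and gene_c are essentially indistinguishable from gene_a. The same three genes would therefore be grouped differently depending on which notion of similarity is adopted.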