Some laboratories criticize data mining or clustering as a "fishing expedition," believing that the analytic technique starts with absolutely no hypothesis. Other laboratories apply every existing clustering method to their microarray data sets, hoping to find some "significant genes or clusters." The ideal analytic method lies between these two approaches. Different clustering methods are useful for different purposes, and one should not blindly apply every clustering algorithm without knowing which method is best for answering particular kinds of questions. The analyst should start the discovery process with a hypothesis; this hypothesis need not be limited to a particular gene or set of genes, but it should ask a specific question of the data. At this point, it may be useful to list some potential questions one could start with:

• What uncategorized genes have an expression pattern similar to these genes that are well characterized?

• How different is the pattern of expression of gene X from other genes?

• What genes closely share a pattern of expression with gene X?

• What category of function might gene X belong to?

• What are all the pairs of genes that closely share patterns of expression?

• Are there subtypes of disease X discernible by tissue gene expression?

• What tissue is this sample tissue closest to?

• What are the distinct patterns of gene expression?

• Which genes have a pattern that may have been a result of the influence of gene X?

• Which genes have a pattern that caused the expression pattern of gene X?

• What are all the gene-gene interactions present among these tissue samples?

• Which genes best differentiate these two groups of tissues?

• Which gene-gene interactions best differentiate these two groups of tissue samples?

As we shall see as we progress through this chapter, different algorithms are better suited to answering some of these questions than others.
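To make concrete how one of the questions listed above translates into a computation, consider "What genes closely share a pattern of expression with gene X?" The following is only a minimal sketch, assuming that similarity is measured by the Pearson correlation coefficient between expression profiles; other measures could equally be chosen. The function name `genes_most_similar_to`, the gene names, and the expression values are all hypothetical and serve only as illustration.

```python
import numpy as np

def genes_most_similar_to(expr, gene_names, query, top_k=3):
    """Rank genes by Pearson correlation with the query gene's profile.

    expr: 2-D array, one row per gene, one column per sample.
    Assumption: Pearson correlation is the chosen similarity measure.
    """
    idx = gene_names.index(query)
    # np.corrcoef treats each row as one variable (here, one gene),
    # so row idx holds the correlation of the query with every gene.
    corr = np.corrcoef(expr)[idx]
    order = np.argsort(-corr)  # indices sorted by descending correlation
    ranked = [(gene_names[i], float(corr[i])) for i in order if i != idx]
    return ranked[:top_k]

# Hypothetical toy data: 4 genes measured across 5 samples.
gene_names = ["geneX", "geneA", "geneB", "geneC"]
expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],   # geneX (the query)
    [2.1, 3.9, 6.2, 8.0, 9.9],   # geneA: rises with geneX
    [5.0, 4.1, 2.9, 2.0, 1.1],   # geneB: falls as geneX rises
    [3.0, 1.0, 4.0, 1.0, 5.0],   # geneC: only loosely related
])

result = genes_most_similar_to(expr, gene_names, "geneX", top_k=3)
for name, r in result:
    print(f"{name}: r = {r:.3f}")
```

In this toy example geneA ranks first and geneB last, because geneB is strongly anticorrelated with the query. Whether anticorrelated genes should count as "sharing a pattern" is itself part of the question the analyst must pose; ranking by the absolute value of the correlation would be the alternative choice.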

