Selecting Software

Any high-throughput functional genomics investigation requires an armamentarium of analytic tools. Even a large laboratory with several bioinformaticians creating customized tools will still have to integrate these tools with the best-of-breed tools available commercially or through academic licenses. More often, there is insufficient time or personnel to develop any software locally, so external software tool kits must be obtained. The decision of how to acquire these is inevitably constrained by the local environment, local resources, the budget, and the need for customizability of these tool kits. It is well outside the scope of this book to address this process in full. Nonetheless, we provide here an abbreviated framework with which to consider acquisitions of bioinformatics software for use in functional genomics.

Two broad categories of function must be served to provide the required analytic services: a collaborative data-sharing environment and a set of robust analytic tools. The former allows a research group to share common materials such as microarray data sets, annotations of these data sets, publications, and workflow tools, so that the progress of any particular study can be monitored. The latter provides for the extraction of clinical or biological knowledge from genomic data sets.

Primary concerns in the first category, collaborative software platforms, will be

1. the scalability of the software to satisfy large numbers of local and remote users;

2. the security of the site to protect both the intellectual property of the data and the privacy of any patients whose data may be stored in the database.

A highly nontrivial aspect of the data-sharing or collaborative software tool kit is maintaining a data model for the microarrays that can encompass all the measurements likely to be obtained through the available microarray platforms (see section 5.4). Similarly, the collaborative software should have a sufficiently detailed and flexible phenotypic data model to allow structured annotations using controlled vocabularies. These annotations support the phenotypic-genotypic correlations that are an essential component of a successful functional genomics pipeline.
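The kind of data model described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class names, the field names, and the tiny vocabulary are hypothetical and are not taken from section 5.4 or from any particular product. The point is only that phenotype annotations are checked against a controlled vocabulary, so that samples can later be selected reliably by annotation.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabulary: attribute name -> permitted terms.
CONTROLLED_VOCABULARY = {
    "tissue": {"liver", "muscle", "brain"},
    "disease_state": {"normal", "tumor"},
}


@dataclass
class Measurement:
    probe_id: str     # platform-specific probe or spot identifier
    intensity: float  # reported expression value
    platform: str     # free-text platform label (illustrative only)


@dataclass
class Sample:
    sample_id: str
    phenotype: dict = field(default_factory=dict)
    measurements: list = field(default_factory=list)

    def annotate(self, attribute: str, term: str) -> None:
        """Record a phenotype term, rejecting anything outside the vocabulary."""
        allowed = CONTROLLED_VOCABULARY.get(attribute)
        if allowed is None:
            raise KeyError(f"unknown attribute: {attribute}")
        if term not in allowed:
            raise ValueError(f"{term!r} is not a permitted term for {attribute}")
        self.phenotype[attribute] = term


def samples_with(samples, attribute, term):
    """Select samples by structured annotation."""
    return [s for s in samples if s.phenotype.get(attribute) == term]
```

Because a misspelled term such as "livr" is rejected when it is entered, a later query for all liver samples does not silently miss records, which is what makes correlating phenotype with expression data practical.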

The second category of bioinformatics software comprises the analytic tools that implement the techniques described in the measurement techniques chapter (chapter 3) and the data-mining chapter (chapter 4), such as fold difference, self-organizing maps, dendrograms, support vector machines, relevance networks, k-means clustering, matrix incision trees, gene shaving, and others. They also provide means of visualizing analytic results and their relationships, and of linking these with the published literature, annotations, and relevant biomedical databases.
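As a small illustration of the simplest technique in that list, fold difference can be sketched as follows. This is a sketch under stated assumptions rather than the definition used in chapter 3: it takes the log2 ratio of mean intensities, and the `floor` value and the twofold `threshold` are hypothetical choices made here for the example.

```python
import math


def log2_fold_difference(condition_a, condition_b, floor=1.0):
    """Return log2(mean_a / mean_b) for each gene present in both conditions.

    Each argument maps a gene name to a list of replicate intensities.
    Means below `floor` are raised to it so the ratio stays defined
    (an assumption of this sketch, not a standard).
    """
    result = {}
    for gene in condition_a.keys() & condition_b.keys():
        mean_a = max(sum(condition_a[gene]) / len(condition_a[gene]), floor)
        mean_b = max(sum(condition_b[gene]) / len(condition_b[gene]), floor)
        result[gene] = math.log2(mean_a / mean_b)
    return result


def genes_exceeding(fold, threshold=1.0):
    """List genes whose absolute log2 fold difference meets the threshold."""
    return sorted(g for g, v in fold.items() if abs(v) >= threshold)


treated = {"geneA": [400.0, 400.0], "geneB": [100.0, 110.0]}
control = {"geneA": [100.0, 100.0], "geneB": [100.0, 100.0]}
fold = log2_fold_difference(treated, control)
changed = genes_exceeding(fold)  # genes at least twofold up or down
```

With these made-up intensities, only geneA passes the twofold cutoff. The analytic packages discussed here wrap calculations of this kind, along with the more elaborate clustering and classification methods, behind a graphical interface.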

In the late 1990s, software packages fell cleanly into one of these two categories. Subsequently, to capture additional market share, the analytic tool vendors added collaborative data-sharing capabilities, and the collaborative data-sharing software likewise became increasingly equipped with a suite of the analytic tools commonly used in investigations reported in the literature. At present, however, few of the solutions are comprehensive: the suite of analytic tools provided by the collaborative software vendors is limited or limiting in its analytic capabilities, and the collaborative, data-sharing, and warehousing facilities of the analytic tools are embryonic and of unproven scalability. Some of the best software packages are free; they have often been developed with federal funding by the best, brightest, and most public-minded of bioinformaticians. As a result, price is not a reliable guide to quality, and an investigator may not always get what he or she pays for. Nonetheless, support for installation, debugging, and technical advice will be very limited with any of these free packages. Also, publicly available software packages typically require some knowledge of operating systems to install them correctly on the computational hardware platform. This is particularly true of Unix- and Linux-based platforms, which are typically the most stable and reliable but also require the most expertise in technical support and management.

Some of the more notable free analysis solutions are listed and coarsely taxonomized in table 7.1. Note that many of these are available for free only to academic users, and charges may apply for commercial users. Also, refer to table 5.2 for free data models and databases.

Table 7.1: List of freely available analysis software for microarray data

| Tool name | Description | URL |
|---|---|---|
| AMADA | Dendrograms, principal components analysis | http://www.web.hku.hk/~xxia/software/AMADA.htm |
| ANOVA | Matlab software programs for microarray data | http://www.jax.org/research/churchill/software/anova/ |
| Array Viewer | Differential gene expression | http://www.tigr.org/softlab |
| BRB array tools | Excel software, provides scatterplots, dendrograms, and class prediction | http://www.linus.nci.nih.gov/BRB-ArrayTools.html |
| CAGED | Clustering by expression dynamics | http://www.kebab.tch.harvard.edu/caged |
| Cleaver | k-means clustering, principal components analysis, and classification | http://www.classify.stanford.edu/ |
| Cluster and TreeView | Dendrograms, self-organizing maps, k-means clustering, principal components analysis | http://www.rana.lbl.gov/EisenSoftware.htm |
| Cyber-T | Differential gene expression | http://www.genomics.biochem.uci.edu/genex/cybert |
| dChip | Differential gene expression | http://www.dchip.org/ |
| Equalizer | Microarray normalization | http://www.organogenesis.ucsd.edu/TheEqualizer.htm |
| Expression Profiler | Dendrograms, with many dissimilarity measures | http://www.ep.ebi.ac.uk/ |
| GeneCluster | Self-organizing maps | http://www-genome.wi.mit.edu/MPR/GeneCluster/GeneCluster.html |
| J-Express | Dendrograms, self-organizing maps, principal components analysis, k-means clustering | http://www.ii.uib.no/~bjarted/jexpress/main.html |
| MAExplorer | Differential gene expression, scatterplots, k-means clustering, dendrograms | http://www-lecb.ncifcrf.gov/MAExplorer |
| Multiple Experiment Viewer | Many normalization, clustering, dissimilarity measures, and graphical options | http://www.tigr.org/softlab |
| RelNet | Relevance networks | http://www.book.chip.org/ |
| SAM | Excel software, compares expression to clinical parameters | http://www-stat.stanford.edu/~tibs/SAM/index.html |
| ScanAlyze | Microarray spot detection | http://www.rana.lbl.gov/EisenSoftware.htm |
| SpotFinder | Microarray spot detection | http://www.tigr.org/softlab |
| XCluster | Dendrograms, self-organizing maps, k-means clustering | http://www.genome-www.stanford.edu/~sherlock/cluster.html |

Recent reviews of commercially available software tools in this domain include [29, 16, 35].

