
Artificial zipcode sequences are used to keep track of the vast number of different SNPs.

In practice, large numbers of SNP analyses are often run in parallel. One way to sort these out is to use so-called Zipcode sequences attached to the primers. Each SNP is allocated a different Zipcode sequence that can be specifically bound by its complementary sequence, or cZipcode. The cZipcode sequence is attached to a solid support or a polystyrene bead. Different cZipcode sequences may be attached to color-coded beads that are later separated by a FACS (fluorescence-activated cell sorter; see Ch. 21), or attached to a solid surface to form an array.
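As a rough illustration of the Zipcode/cZipcode pairing described above, the following Python sketch derives each cZipcode as the reverse complement of its Zipcode and builds a lookup table from cZipcode back to SNP. The SNP names and tag sequences are made up for the example, and the sketch assumes that "complementary" means the reverse complement needed for base pairing; it is not the design procedure of any particular assay.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def czipcode(zipcode):
    """Return the reverse complement of a Zipcode, i.e. its cZipcode."""
    return zipcode.upper().translate(COMPLEMENT)[::-1]


def build_capture_table(zipcodes):
    """Map each cZipcode (on a bead or array spot) back to its SNP name."""
    return {czipcode(seq): snp for snp, seq in zipcodes.items()}


# Hypothetical Zipcode tags, one per SNP (invented for illustration).
zipcodes = {
    "SNP_A": "GATTACAGGCTTAACG",
    "SNP_B": "CCGTAGTTAGCATCGA",
}

capture_table = build_capture_table(zipcodes)
for capture_seq, snp in capture_table.items():
    print(snp, "is captured by", capture_seq)
```

Because every SNP gets its own tag, identifying which bead or array position captured a product is enough to identify the SNP, regardless of the sequence around the SNP itself.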

Personal Genomics

Taking full advantage of pharmacogenomics requires knowledge of individual DNA sequence differences, at least for those genes directly relevant to the type of clinical treatment proposed. At present, specific genes can be sequenced on a need-to-know basis. However, it has been suggested that everyone should get their complete genome sequenced individually.

The three factors involved are time, technology and cost. The human genome project took about 10 years, cost $3,000 million and has given a consensus sequence based on 10 different people. Perlegen Sciences in California has used microarray-based sequencing to provide individual genome sequences to approximately 25 people (as of August 2002) at a cost of $1.5 million each. It is believed that miniaturization combined with high-throughput technology could perhaps reduce the cost to $50,000 per person in a year or two. Other, more futuristic projections, based on trends rather than present technology, suggest that within 10 years customers will be able to buy their own genome sequences for the same price as a flat-screen TV.

So what will your personal DNA sequence reveal? We know a reasonable amount about hereditary defects due to single genes (such as cystic fibrosis or sickle cell anemia). However, the genetic factors involved in conditions such as heart disease, obesity, cancer, life expectancy and mental disorders are more complex and due to multiple interacting genes, many of which have yet to be identified. Even though interpretation will be a problem, it will doubtless be more economical to get your whole genome sequenced than to pay for lots of individual tests for each gene whose effects are understood.

Once everyone takes their own DNA sequence home to analyze on their personal computer, we will presumably see an orgy of self-diagnosis. Will we also see an outbreak of numerology, with people claiming to find messages from aliens encoded in their genomes?

Exons can be experimentally isolated and identified by using their flanking splice sites.

Gene Identification by Exon Trapping

In eukaryotes, the actual coding sequences account for only a minority of the DNA. Given a large stretch of DNA sequence, how are the genes identified? Although computer algorithms exist to analyze sequences, the method known as exon trapping allows the experimental isolation of coding sequences. This method relies on the fact that exons are flanked by splice recognition sites that are used during RNA processing to splice out the introns (see Ch. 12 for details of splicing). Introns can be spliced using an in vitro system; therefore, a length of DNA containing the splice recognition site can be identified. Consequently, exon trapping can be used even if the sequence of the DNA is unknown, although in this case we will not know the relative order in the original DNA of the exons that are isolated.

exon trapping: Experimental procedure for isolating exons by using their flanking splice recognition sites

During exon trapping, the DNA to be analyzed must first be cloned into a special vector that can replicate both in E. coli and in suitable animal cells. The vector carries an artificial mini-gene consisting of just two exons and an intervening intron, together with a promoter and poly(A) tail recognition site (Fig. 24.24). The intron contains a multiple cloning site for cloning lengths of unknown DNA. The pSPL vectors, as they are called, use a simian virus 40 (SV40) origin of replication as well as an SV40 promoter and tail site for the mini-gene. These vectors can replicate in modified monkey cells (COS cells) that contain a defective SV40 genome integrated into a host chromosome.

DNA containing the exons to be trapped is cut into segments using an appropriate restriction enzyme. These segments are inserted into the multiple cloning site within the intron on the pSPL vector (Fig. 24.25). The plasmid is then transformed into the COS monkey cells. The mini-gene will be expressed and the RNA primary transcript will be spliced. If an extra exon was cloned into the middle of the mini-gene, it will be present in the mRNA, which will therefore be longer. To isolate the trapped exon, the mRNA is converted to cDNA and then PCR is used to amplify the region containing the trapped exon. This technique will have to be used in conjunction with sequence analysis to identify all the different exons within the human genome.
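The logic of the procedure above can be sketched in a few lines of Python. This is a toy model under strong assumptions: the sequences are invented (they are not those of a real pSPL vector), and each segment is simply labeled "exon" or "intron" in advance, whereas the cell recognizes real splice sites. The sketch only shows why a trapped exon makes the spliced mRNA longer and how the extra sequence can be read off between the two vector exons.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    kind: str  # "exon" or "intron" (pre-labeled; a simplifying assumption)
    seq: str


# Toy mini-gene: two vector exons around an intron that holds the cloning site.
VECTOR_EXON_1 = Segment("exon", "ATGGCC")
INTRON_LEFT = Segment("intron", "GTAAGT")
INTRON_RIGHT = Segment("intron", "TTTCAG")
VECTOR_EXON_2 = Segment("exon", "GGATAA")


def clone_into_intron(fragment):
    """Insert a cloned fragment (list of Segments) at the multiple cloning site."""
    return [VECTOR_EXON_1, INTRON_LEFT, *fragment, INTRON_RIGHT, VECTOR_EXON_2]


def splice(transcript):
    """Join the exons and discard the introns, giving the mature mRNA."""
    return "".join(seg.seq for seg in transcript if seg.kind == "exon")


def trapped_sequence(mrna):
    """Return whatever lies between the two vector exons in the mRNA."""
    start = len(VECTOR_EXON_1.seq)
    end = len(mrna) - len(VECTOR_EXON_2.seq)
    return mrna[start:end]


# Empty vector versus a vector carrying a fragment with one complete exon.
empty_mrna = splice(clone_into_intron([]))
fragment = [
    Segment("intron", "CTTTAG"),
    Segment("exon", "CCGATTGCA"),
    Segment("intron", "GTGAGA"),
]
trap_mrna = splice(clone_into_intron(fragment))

print(len(empty_mrna), len(trap_mrna))  # the second mRNA is longer
print(trapped_sequence(trap_mrna))      # the trapped exon
```

In the laboratory the length difference is detected by PCR on the cDNA using primers in the two vector exons; here it is just a difference in string length.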

FIGURE 24.24 Exon Trapping Vector

The pSPL vector is used to identify exons within regions of suspected coding DNA. The vector has both bacterial and eukaryotic origins of replication so that it can be grown in both E. coli and animal cells. The multiple cloning site is within an intron sequence that is flanked by two exons. This region of the vector can be transcribed into RNA because it contains eukaryotic promoter and eukaryotic poly(A) tail sequences.

Labels in the figure: multiple cloning site, eukaryotic promoter, tail recognition site, bacterial origin of replication, eukaryotic origin of replication.

A major update to the human genome in 2004 resulted in the number of estimated human genes dropping from around 35,000 to around 25,000. Build 34 of the human genome contains 22,287 protein-encoding genes with an average of 1.5 alternative transcripts and 10 exons per gene. The total coding sequence is 34 Mbp or 1.2% of the euchromatin (i.e., the genome excluding the highly condensed and unsequenced regions, especially in the centromeres and telomeres).
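The figures quoted above imply a few further numbers that are easy to check. The short Python calculation below uses only the values given in the text; the derived quantities are back-of-the-envelope estimates, and the per-exon figure in particular is rough, since it assumes every exon contributes coding sequence.

```python
# Values quoted in the text for build 34 of the human genome.
genes = 22_287            # protein-encoding genes
exons_per_gene = 10       # average
coding_bp = 34e6          # total coding sequence, 34 Mbp
coding_fraction = 0.012   # coding sequence as a share of the euchromatin (1.2%)

# Derived estimates (not stated in the text).
euchromatin_bp = coding_bp / coding_fraction   # about 2.8 billion bp
total_exons = genes * exons_per_gene           # 222,870 exons
coding_per_gene = coding_bp / genes            # roughly 1.5 kbp per gene
coding_per_exon = coding_bp / total_exons      # roughly 150 bp per exon

print(f"Euchromatin: {euchromatin_bp / 1e9:.2f} Gbp")
print(f"Total exons: {total_exons:,}")
print(f"Coding sequence per gene: {coding_per_gene:.0f} bp")
print(f"Coding sequence per exon: {coding_per_exon:.0f} bp")
```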

Vast amounts of genetic data are now becoming available. Computer analysis of this data has essentially created a new field of enquiry.

Bioinformatics and Computer Analysis

The field of bioinformatics deals with the computerized analysis of large amounts of sequence data. A variety of websites are now available for online searching and manipulation of sequences (Table 24.03).

Both in molecular biology and other areas, vast amounts of information are accumulating in computer data banks. Data mining is the use of computer programs to find useful information by filtering or sifting through the data. Hence, intelligent software designed for data mining is sometimes known as "siftware". Genome mining is the application of this approach to genomic data banks. There are several stages to genome mining:

1. Selection of the data of interest.

2. Preprocessing or "data cleansing". Unnecessary information is removed to avoid slowing or clogging the analysis.

3. Transformation of the data into a format convenient for analysis.

4. Extraction of patterns and relationships from the data.

5. Interpretation and evaluation.
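The five stages listed above can be sketched as a tiny Python pipeline. Everything here is hypothetical: the records, the field names and the choice of "pattern" (short subsequences shared by all selected sequences) are invented to give each stage something concrete to do, and real genome mining tools are far more elaborate.

```python
from collections import Counter


def select(records, organism):
    """Stage 1: keep only the records of interest."""
    return [r for r in records if r["organism"] == organism]


def cleanse(records):
    """Stage 2: strip whitespace, normalize case, drop ambiguous sequences."""
    cleaned = []
    for r in records:
        seq = "".join(r["seq"].split()).upper()
        if seq and set(seq) <= set("ACGT"):
            cleaned.append({**r, "seq": seq})
    return cleaned


def transform(records, k=3):
    """Stage 3: convert each sequence into a table of k-mer counts."""
    return {
        r["id"]: Counter(r["seq"][i:i + k] for i in range(len(r["seq"]) - k + 1))
        for r in records
    }


def extract(kmer_tables):
    """Stage 4: find the k-mers that occur in every sequence."""
    shared = None
    for counts in kmer_tables.values():
        shared = set(counts) if shared is None else shared & set(counts)
    return shared or set()


def interpret(shared):
    """Stage 5: present the result in a stable, readable order."""
    return sorted(shared)


def mine(records, organism):
    return interpret(extract(transform(cleanse(select(records, organism)))))


# Invented example records.
records = [
    {"id": "s1", "organism": "human", "seq": "atg gcc TATA"},
    {"id": "s2", "organism": "human", "seq": "TATAAGCC"},
    {"id": "s3", "organism": "human", "seq": "ATGNNN"},    # dropped in stage 2
    {"id": "s4", "organism": "mouse", "seq": "TATATATA"},  # dropped in stage 1
]

print(mine(records, "human"))
```

Keeping each stage as a separate function mirrors the list: any stage can be replaced (for example, a different pattern search in stage 4) without touching the others.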

bioinformatics: The computerized analysis of large amounts of biological sequence data

data mining: The use of computer analysis to find useful information by filtering or sifting through large amounts of data

genome mining: The use of computer analysis to find useful information by filtering or sifting through large amounts of biological sequence data

