17.4 Beyond Reference Strains: Towards a Second-Generation Virogenomics?

As part of the organization of the increasing number of viral genomic nucleotide sequences, most data banks choose one or a few reference sequences to represent a virus species. Other, related sequences may become "genome neighbors" of the reference sequence [10]. Therefore, the basis for comparative genomics that we have outlined in Section 17.3 implies a set of reference sequences, on the understanding that "neighbor" sequences cannot obscure the main findings based on phylogenetic relationships or searches for domains, motifs, open reading frames, and so forth. Most relevant for evolutionary studies, reference viral proteins are appropriate for incorporation in clusters of related viral proteins that have been constructed or are in preparation (e.g., http://www.ncbi.nlm.nih.gov/genomes/VIRUSES/vog.html) [10].

Although the reference-sequence approach is appropriate for several purposes related to general biological evolution, a point that is often ignored is the extreme biological relevance of minimal genetic changes in virus biology. A wealth of evidence (for reviews, see Refs. [2, 4, 6]) indicates that one or a few nucleotide substitutions may be sufficient to produce a relevant biological alteration in the virus. A phylogenetic tree in which the tips of the branches are mutant clouds rather than defined sequences illustrates this point (Fig. 17.1). This is particularly true for RNA viruses (or viruses that include an RNA step in their replication cycle). During their replication, RNA viruses mutate at average rates of 10^-4 to 10^- mutations per nucleotide copied [54]. These rates per nucleotide site are about 10^5-fold higher than the rates operating normally during the replication of cellular DNA. This constant mutational input (between 0.1 and 1 mutations per genome are produced every time a template RNA is copied into a daughter RNA or DNA strand) gives rise to highly dynamic mutant distributions termed viral quasispecies. This concept was developed on a theoretical basis by M. Eigen, P. Schuster, and their colleagues to describe primitive RNA (or RNA-like) replicons at the onset of life on Earth [31, 55-57]. The quantitative parameters that underlie the Darwinian principles of genetic variation, competition, and selection have been determined in model experiments on replication of simple RNA templates in vitro [58, 59]. Quasispecies theory has been instrumental to understanding the adaptive dynamics of RNA viruses. Presently, virologists use an extended definition of "quasispecies" to describe dynamic distributions of nonidentical but closely related mutant and recombinant viral genomes that are subjected to a continuous process of genetic variation, competition, and selection, and that act as a unit of selection [60].
This general definition captures theoretical developments that have extended quasispecies to nonequilibrium conditions and regards mutation and recombination as sources of genetic variation (recent reviews of quasispecies theory and its implications for virology are found in Refs. [1, 2, 6, 61]). Genetic variation, together with environmental heterogeneity and bottleneck events, underlies the diversification of viruses within hosts and between hosts. Sequence space, a concept derived from information theory, describes all possible nucleotide sequences available to a genetic system. Virus adaptation can be viewed as the result of movements in sequence space to reach points of replicative competence [31, 56]. Connectivity between points of the sequence space is key to the adaptive potential of a genetic system. The main reason why a virus generally maintains its biological identity in terms of pathogenic potential, host range, etc., is the operation of structural and functional constraints that have been well documented for several viruses [62]. However, occasional deviations from prototypic behavior occur, and such deviations may be relevant to disease manifestations and disease emergence [63, 51].
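The orders of magnitude discussed above can be made concrete with a short calculation. The following Python sketch is illustrative only: the genome length of 10,000 nucleotides is an assumed value, the per-site rate of 10^-4 is one of the figures quoted above, the Poisson treatment of mutations per copy is a simplifying assumption, and all function names are hypothetical.

```python
import math


def expected_mutations_per_copy(rate_per_site: float, genome_length: int) -> float:
    """Average number of mutations introduced each time a genome is copied."""
    return rate_per_site * genome_length


def fraction_unmutated(rate_per_site: float, genome_length: int) -> float:
    """Probability that a copy carries no mutation (Poisson assumption)."""
    return math.exp(-expected_mutations_per_copy(rate_per_site, genome_length))


def sequence_space_size(genome_length: int) -> int:
    """Number of distinct sequences of a given length over four nucleotides."""
    return 4 ** genome_length


def one_step_neighbors(genome_length: int) -> int:
    """Number of sequences that differ from a given one by a single substitution."""
    return 3 * genome_length


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Assumed inputs, for illustration only.
RATE = 1e-4       # mutations per nucleotide copied
LENGTH = 10_000   # nucleotides (hypothetical genome length)

per_copy = expected_mutations_per_copy(RATE, LENGTH)
unmutated = fraction_unmutated(RATE, LENGTH)
neighbors = one_step_neighbors(LENGTH)
# The full sequence space is far too large to enumerate; report its size
# only as a power of ten.
log10_space = LENGTH * math.log10(4)
```

Under these assumptions each copy carries one mutation on average, which falls within the range of 0.1 to 1 mutations per genome stated above. Every genome also has 3L single-substitution neighbors in a space of 4^L sequences, which is one simple way to picture the connectivity of sequence space.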

In reality, all known viruses consist of multitudes of related sequences. Viral quasispecies are typical not only of RNA viruses but also of some DNA viruses [64-66]. A relevant consequence of quasispecies is that the behavior of an individual genome, with a defined nucleotide sequence, may be conditioned by the mutant cloud that surrounds it. This was initially proposed as a result of in silico experiments as a derivation of quasispecies theory (reviewed in Ref. [56]), and confirmed by several observations with viruses in cell culture and in vivo [67-71]. A current working model is that components of the mutant spectra expressing suboptimal viral functions may act collectively as dominant-negative mutants and, therefore, interfere with the replication of fitter genomes within the same replicative ensemble. A concerted action of dominant-negative mutants provides a biochemical basis for modulating effects of mutant spectra, predicted by quasispecies theory [56]. This model is finding increasing experimental support [71-73].

Fig. 17.1 A rooted phylogenetic tree with branches of different lengths that define related viral genomes. Each virus (tip of a branch) includes in reality a cloud of mutants that despite showing close relatedness may nevertheless express widely different biological properties. On the right are listed a number of variant phenotypes frequently seen in viruses.

How are we to reconcile the reality of the population structure of many important viruses, which dictates that relevant biological traits may depend on quantitatively minor changes in the genome, with the general principles of data banks centered around "reference sequences"? Does it make sense to use the "neighbor" sequences to form new data sets or subsections of the more general data sets? For what purpose, and how? These are questions without an easy answer. A computer program named Partition Analysis of Quasispecies (PAQ) was developed to identify natural groupings of nucleotide or amino acid sequences that are very similar, as found in mutant spectra of viral quasispecies. It is a nonhierarchical clustering method that partitions sequences into spherical groups, allowing for overlapping groups to occur [74] (program files: http://www.vetmed.iastate.edu/faculty_staff/Users/carplab/PAQ/main.html). The program assumes that the less distant sequences should be grouped together. A radius is selected and each sequence is used as a center to define spherical clusters. One relevant output of the program is compactness, which characterizes the number of variants surrounding the center of the cluster. Increasingly smaller radii can be selected to search for subgroups. This and other clustering techniques are less dependent on evolutionary models than the phylogenetic methods described in Section 17.3. PAQ has been applied to the analysis of envelope sequences of human immunodeficiency virus isolated from different brain regions of infected patients, and to rev sequences of a sample of equine infectious anemia virus [74].
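The radius-based grouping described above can be sketched in a few lines of Python. This is not the PAQ program or its algorithm as published [74]; it is a minimal illustration of the general idea. It assumes pre-aligned sequences of equal length, uses Hamming distance as the distance measure, and takes "compactness" simply as the count of variants other than the center that fall within the radius. The function names and the toy sequences are hypothetical.

```python
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def spherical_groups(seqs: List[str], radius: int) -> Dict[int, List[int]]:
    """Use every sequence as a center and collect the indices of all
    sequences within `radius` of it. Groups may overlap."""
    return {
        center: [j for j, s in enumerate(seqs) if hamming(seqs[center], s) <= radius]
        for center in range(len(seqs))
    }


def rank_by_compactness(groups: Dict[int, List[int]]) -> List[int]:
    """Order centers by the number of surrounding variants (center excluded),
    largest first. This count is an assumed stand-in for PAQ's compactness."""
    return sorted(groups, key=lambda c: (-(len(groups[c]) - 1), c))


# Toy mutant spectrum (hypothetical data).
spectrum = ["AAAA", "AAAC", "AAGA", "CCCC", "CCCA"]

groups = spherical_groups(spectrum, radius=1)
ranking = rank_by_compactness(groups)

# A smaller radius can then be applied within a group to look for subgroups.
members = [spectrum[i] for i in groups[ranking[0]]]
subgroups = spherical_groups(members, radius=0)
```

In this toy spectrum the first sequence gathers two neighbors at radius 1 and also belongs to the groups centered on each of those neighbors, which shows how overlapping groups arise when every sequence is tried as a center.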

There are some reasons for wanting minority genomes to enter the data banks. An obvious one is that modifications of the host range may relate to minor genetic change independent of the phylogenetic position of the virus. This is more likely to occur with RNA viruses because of their generalized high mutation rates and their quasispecies behavior, which results in adaptability through point mutations [75]. A second reason is the presence of memory genomes in viral quasispecies, documented in cell culture [76-79] and in vivo [80]. Memory genomes are a subset of the genomes found in mutant spectra of viral quasispecies that reflect those genomes that were dominant at an early stage of the evolution of the same viral lineage. Since memory genomes often represent variants with interesting phenotypic traits (at variance with the traits of the dominant populations in which they are immersed), their detection and recording in data banks is highly relevant. Microarray technology suitable for detecting memory genomes in viral quasispecies is summarized in Section 17.5.

Our proposal is that understanding of the complexity of viral populations and the relationships between limited genetic change and modification of phenotypic traits would benefit from second-generation data banks in which mutant viruses can be organized around their respective standard sequences as in the PAQ program [74]. The concept is parallel to cataloguing single nucleotide polymorphisms of the human genome, particularly in relation to genetic disease. Implementation of "second-generation" data banks with point mutations of viral genomes would not, however, be without difficulties. One problem derives from the complex relationships between genotype and phenotype. A current example may serve to illustrate this point. The catalogue of mutations in the HIV-1 genome that are associated with decreased sensitivity to antiretroviral inhibitors (the most frequent being reverse transcriptase and protease inhibitors) has steadily increased with time. It is now evident that combinations of mutations, in a sequence-context-dependent manner, may produce different degrees of resistance to one or several inhibitors. For this reason, a data bank relating point mutations to phenotypes of viruses (such as http://hivdb.stanford.edu/ or other data banks listed in Table 17.1) will have to be periodically updated and will unavoidably suffer from some degree of uncertainty. A second problem may be still more severe: quasispecies behavior may depend on ensembles of mutants (those that constitute mutant spectra), as described previously. Such behavior is often difficult to predict despite knowledge about the types of individual genomes that dominate the mutant spectra. It seems clear that considerable scientific progress will be needed before complex interactions among individual viral genomes that produce a range of phenotypic traits can be incorporated into data sets.
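One possible way to picture a data bank organized around a standard sequence is to store each mutant as its set of substitutions relative to that sequence. The Python sketch below is a hypothetical illustration, not a description of any existing data bank: it assumes aligned sequences of equal length, ignores insertions, deletions, and recombination, and uses invented toy sequences and names. Keying the catalogue by the whole combination of substitutions, rather than by single mutations, is a design choice that reflects the sequence-context dependence noted above.

```python
from typing import Dict, FrozenSet, List, Tuple

# (1-based position, base in the reference, base in the variant)
Substitution = Tuple[int, str, str]


def substitutions(reference: str, variant: str) -> FrozenSet[Substitution]:
    """Describe a variant as its point differences from the reference."""
    if len(reference) != len(variant):
        raise ValueError("variant must be aligned to the reference")
    return frozenset(
        (i + 1, r, v)
        for i, (r, v) in enumerate(zip(reference, variant))
        if r != v
    )


def build_catalog(
    reference: str, variants: Dict[str, str]
) -> Dict[FrozenSet[Substitution], List[str]]:
    """Group variant names by their combination of substitutions."""
    catalog: Dict[FrozenSet[Substitution], List[str]] = {}
    for name, seq in variants.items():
        catalog.setdefault(substitutions(reference, seq), []).append(name)
    return catalog


# Toy data (hypothetical sequences and names).
reference = "ACGUACGU"
variants = {
    "v1": "ACGUACGA",  # one substitution
    "v2": "ACCUACGA",  # same substitution plus a second one
    "v3": "ACGUACGA",  # identical to v1
}

catalog = build_catalog(reference, variants)

# Phenotype annotations would attach to the combination, not to each
# substitution separately. The labels here are placeholders.
phenotypes = {
    frozenset({(8, "U", "A")}): "placeholder phenotype 1",
    frozenset({(3, "G", "C"), (8, "U", "A")}): "placeholder phenotype 2",
}
```

Such a structure records individual genomes only. It does not capture the ensemble effects of mutant spectra, which is the second and more severe difficulty described above.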
