The molecular pathology of hemoglobin

As is the case for most monogenic diseases, the inherited disorders of hemoglobin fall into two major classes. First, there are those that result from a reduced output of one or other globin genes, the thalassemias. Second, there is a wide range of conditions that result from the production of structurally abnormal globin chains; the type of disease depends on how the particular alteration in protein structure interferes with its stability or function. Of course, no biological classification is entirely satisfactory; those which attempt to define the hemoglobin disorders are no exception. There are some structural hemoglobin variants which happen to be synthesized at a reduced rate and hence are associated with a clinical picture similar to thalassemia. And there are other classes of mutations which simply interfere with the normal transition from fetal to adult hemoglobin synthesis, a family of conditions that is given the general title 'hereditary persistence of fetal hemoglobin'. Furthermore, because these diseases are all so common and occur together in particular populations, it is not uncommon for an individual to inherit a gene for one or other form of thalassemia and a structural hemoglobin variant. The rather heterogeneous group of conditions that results from all these different mutations and interactions is summarized in Table 1.1.

Table 1.1 The thalassemias and related disorders.

α Thalassemia
  α0
  α+
    Deletion (-α)
    Non-deletion (αT)
β Thalassemia
  β0
  β+
  Normal Hb A2
  'Silent'
δβ Thalassemia
γ Thalassemia
δ Thalassemia
εγδβ Thalassemia
Hereditary persistence of fetal hemoglobin
  Non-deletion
    Linked to β globin genes
      Gγβ+
    Unlinked to β globin genes


Over recent years, the determination of the molecular pathology of the two common forms of thalassemia, α and β, has provided a remarkable picture of the repertoire of mutations that can underlie human monogenic disease. Similarly, studies of the relationship between structure and function in the structurally abnormal hemoglobins have provided a great deal of information about normal human hemoglobin function.

In the sections that follow we will describe, in outline, the different forms of molecular pathology that underlie these conditions.

The β thalassemias

There are two main classes of β thalassemia: β0 thalassemia, in which there is an absence of β globin chain production, and β+ thalassemia, in which there is a variable reduction in the output of β globin chains. As shown in Figure 1.4, mutations of the β globin genes may cause a reduced output of gene product at the level of transcription or mRNA processing, translation, or through the stability of the globin gene product.

Defective β globin gene transcription

There are a variety of mechanisms that interfere with the normal transcription of the β globin genes. First, the genes may be either completely or partially deleted. Overall, deletions of the β globin genes are not commonly found in patients with β thalassemia, with one exception: a 619 bp deletion involving the 3' end of the gene is found frequently in the Sind populations of India and Pakistan, where it constitutes about 30% of the β thalassemia alleles. Other deletions are extremely rare.

A much more common group of mutations, which results in a moderate decrease in the rate of transcription of the β globin genes, involves single nucleotide substitutions in or near the TATA box at about -30 nucleotides (nt) from the transcription start site, or in the proximal or distal promoter elements at -90 nt and -105 nt. These mutations result in decreased β globin mRNA production, ranging from 10 to 25% of the normal output. Thus, they are usually associated with the mild forms of β+ thalassemia. They are particularly common in African populations, an observation which explains the unusual mildness of β thalassemia in this racial group. One particular mutation, C→T at position -101 nt relative to the β globin gene, causes an extremely mild deficit of β globin mRNA. Indeed, this allele is so mild that it is completely silent in carriers and can only be identified by its interaction with more severe β thalassemia alleles in compound heterozygotes.

Mutations that cause abnormal processing of mRNA

As mentioned earlier, the boundaries between exons and introns are marked by the invariant dinucleotides GT at the donor (5') site and AG at the acceptor (3') site. Mutations (base changes) that affect either of these sites completely abolish normal splicing and produce the phenotype of β0 thalassemia. The transcription of genes carrying these mutations appears to be normal, but there is complete inactivation of splicing at the altered junction.
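The invariant-dinucleotide rule described above can be illustrated with a minimal Python sketch. The intron sequence is invented for illustration and is not real β globin sequence, and the function name is hypothetical; the sketch only checks for GT at the donor end and AG at the acceptor end, and says nothing about the surrounding consensus sequences.

```python
def has_invariant_splice_sites(intron: str) -> bool:
    """Return True if the intron starts with GT (donor) and ends with AG (acceptor)."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


# Toy intron, invented for illustration (not real beta globin sequence).
normal_intron = "GTAAGTCCTTTCTCTCAG"
# Same toy intron with the donor G changed to A (GT -> AT).
donor_mutant = "A" + normal_intron[1:]
# Same toy intron with the acceptor G changed to C (AG -> AC).
acceptor_mutant = normal_intron[:-1] + "C"

for name, seq in [("normal", normal_intron),
                  ("donor mutant", donor_mutant),
                  ("acceptor mutant", acceptor_mutant)]:
    print(name, has_invariant_splice_sites(seq))
```

Under this simple rule either single-base change is enough to make the junction unrecognizable, which corresponds to the complete loss of normal splicing described in the text.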

Fig. 1.4 The mutations of the β globin gene that underlie β thalassemia

The heavy black lines indicate the length of the deletions. The point mutations are designated as follows: PR, promoter; C, CAP site; I, initiation codon; FS, NS, frameshift and nonsense mutations; SPL, splice mutations; Poly A, poly A addition site mutations.

Another family of mutations involves what are called 'splice site consensus sequences'. Although only the GT dinucleotide is invariant at the donor splice site, there is conservation of adjacent nucleotides and a common, or consensus, sequence of these regions can be identified. Mutations within this sequence can reduce the efficiency of splicing to varying degrees because they lead to alternative splicing at the surrounding cryptic sites. For example, mutations of the nucleotide at position 5 of IVS-1 (the first intervening sequence), G→C or T, result in a marked reduction of β chain production and in the phenotype of severe β+ thalassemia. On the other hand, the substitution of C for T at position 6 in IVS-1 leads to only a mild reduction in the output of β chains.

Another mechanism that leads to abnormal splicing involves 'cryptic splice sites'. These are regions of DNA which, if mutated, assume the function of a splice site at an inappropriate region of the mRNA precursor. For example, a variety of mutations activate a cryptic site which spans codons 24-27 of exon 1 of the β globin gene. This site contains a GT dinucleotide, and adjacent substitutions that alter it so that it more closely resembles the consensus donor splice site result in its activation, even though the normal splice site is intact. A mutation at codon 24, GGT→GGA, though it does not alter the amino acid which is normally found in this position in the β globin chain (glycine), allows some splicing to occur at this site instead of the exon-intron boundary. This results in the production of both normal and abnormally spliced β globin mRNA and hence in the clinical phenotype of severe β thalassemia. Interestingly, mutations at codons 19, 26 and 27 result in both reduced production of normal mRNA (due to abnormal splicing) and an amino acid substitution when the mRNA which is spliced normally is translated into protein. The abnormal hemoglobins produced are hemoglobins Malay, E and Knossos, respectively. All these variants are associated with a mild β+ thalassemia-like phenotype. These mutations illustrate how sequence changes in coding rather than intervening sequences influence RNA processing, and underline the importance of competition between potential splice site sequences in generating both normal and abnormal varieties of β globin mRNA.

Cryptic splice sites in introns may also carry mutations that activate them even though the normal splice sites remain intact. A common mutation of this kind in Mediterranean populations involves a base substitution at position 110 in IVS-1. This region contains a sequence similar to a 3' acceptor site, though it lacks the invariant AG dinucleotide. The change of the G to A at position 110 creates this dinucleotide. The result is that about 90% of the RNA transcript splices to this particular site and only 10% to the normal site, again producing the phenotype of severe β+ thalassemia (Figure 1.5). Several other β thalassemia mutations have been described which generate new donor sites within IVS-2 of the β globin gene.
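How a single G→A change can create a new AG dinucleotide inside an intron can be sketched as follows. This is an illustration only: the sequence is an invented toy fragment rather than IVS-1, the mutated position (11) is arbitrary, and the helper names are hypothetical.

```python
from typing import List


def acceptor_candidates(seq: str) -> List[int]:
    """Return the 1-based position of the A in every AG dinucleotide in seq."""
    s = seq.upper()
    return [i + 1 for i in range(len(s) - 1) if s[i:i + 2] == "AG"]


def substitute(seq: str, position: int, new_base: str) -> str:
    """Return seq with the base at the 1-based position replaced by new_base."""
    return seq[:position - 1] + new_base + seq[position:]


# Toy intron fragment, invented for illustration; its only AG is the normal acceptor.
toy_intron = "GTTCTCTATTGGTCTATTTTCCCAG"
# G -> A at toy position 11 turns GG into AG, creating a second candidate site.
mutant = substitute(toy_intron, 11, "A")

print(acceptor_candidates(toy_intron))
print(acceptor_candidates(mutant))
```

The sketch finds one candidate acceptor in the toy intron and two in the mutant. It does not model which site the splicing machinery prefers; the roughly 90% to 10% split quoted above is an experimental observation, not something this code predicts.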

Another family of mutations that interferes with β globin gene processing involves the sequence AAUAAA in the 3' untranslated region, which is the signal for cleavage and polyadenylation of the β globin gene transcript. Somehow, these mutations destabilize the transcript. For example, a T→C substitution in this sequence leads to only one-tenth of the normal amount of β globin mRNA transcript and hence to the phenotype of a moderately severe β+ thalassemia. Another example of a mutation which probably leads to defective processing or function of β globin mRNA is the single base substitution, A→C, in the CAP site. It is not yet understood how this mutation causes a reduced rate of transcription of the β globin gene.

There is another small subset of rare mutations which involve the 3' untranslated region of the β globin gene and are associated with relatively mild forms of β thalassemia. It is thought that these interfere in some way with transcription but the mechanism is unknown.

Fig. 1.5 The generation of a new splice site in an intron as the mechanism for a form of β+ thalassemia

For details see text.

Mutations that result in abnormal translation of β globin mRNA

There are three main classes of mutations of this kind. Base substitutions that change an amino acid codon to a chain termination codon prevent the translation of β globin mRNA and result in the phenotype of β0 thalassemia. Several mutations of this kind have been described; the commonest, involving codon 17, occurs widely throughout Southeast Asia. Similarly, a codon 39 mutation is encountered frequently in the Mediterranean region.

The second class involves the insertion or deletion of one, two or four nucleotides in the coding region of the β globin gene. These disrupt the normal reading frame, cause a frameshift, and hence interfere with the translation of β globin mRNA. The end result is the insertion of anomalous amino acids after the frameshift until a termination codon is reached in the new reading frame. This type of mutation always leads to the phenotype of β0 thalassemia.
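The effect of a frameshift can be made concrete with a short Python sketch that translates a toy coding sequence before and after a deletion. The sequence is invented for illustration (it is not the β globin coding sequence), the helper names are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, built in TCAG order (one-letter amino acids, * = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate from the first base and stop at the first termination codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def delete(seq: str, position: int, length: int = 1) -> str:
    """Remove `length` bases starting at the 1-based position."""
    return seq[:position - 1] + seq[position - 1 + length:]


# Toy coding sequence, invented for illustration: nine codons and a stop.
normal = "ATGGTGCATCTGACTGAAGAGAAATCTTAA"

print(translate(normal))                 # normal product
print(translate(delete(normal, 4, 1)))   # 1-nt deletion: frameshift
print(translate(delete(normal, 4, 3)))   # 3-nt deletion: frame preserved
```

In this toy case the one-nucleotide deletion yields two anomalous residues followed by a premature stop, whereas removing three nucleotides simply drops one amino acid and leaves the rest of the chain unchanged. That contrast is why deletions or insertions that are not a multiple of three are the ones that shift the frame.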

Finally, there are several mutations which involve the β globin gene initiation codon and which, presumably, reduce the efficiency of translation.

Unstable β globin chain variants

Some forms of β thalassemia result from the synthesis of highly unstable β globin chains which are incapable of forming hemoglobin tetramers, and which are rapidly degraded, leading to the phenotype of β0 thalassemia. Indeed, in many of these conditions no abnormal globin chain product can be demonstrated by protein analysis and the molecular pathology has to be interpreted simply on the basis of a derived sequence of the variant β chain obtained by DNA analysis.

Recent studies have provided some interesting insights into how complex clinical phenotypes may result from the synthesis of unstable β globin products. For example, there is a spectrum of disorders that result from mutations in exon 3 which give rise to a moderately severe form of β thalassemia in heterozygotes. It has been found that nonsense or frameshift mutations in exons 1 and 2 are associated with the absence of messenger RNA from the cytoplasm of red cell precursors. This appears to be an adaptive mechanism, called 'nonsense-mediated decay', whereby abnormal messenger RNA of this type is not transported to the cytoplasm, where it would act as a template for the production of truncated gene products. However, in the case of exon 3 mutations, apparently because this process requires the presence of an intact upstream exon, the abnormal messenger RNA is transported into the cytoplasm and hence can act as a template for the production of unstable β globin chains. The latter precipitate in the red cell precursors together with excess α chains to form large inclusion bodies, and hence there is enough globin chain imbalance in heterozygotes to produce a moderately severe degree of anemia.
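The exon-dependent rule described above can be written as a small lookup. This is a simplified sketch, not a predictor: the codon ranges assigned to each exon are approximate assumptions that are not given in the text, and the function names are hypothetical.

```python
# Assumed, approximate codon ranges for the three beta globin exons
# (not stated in the text; used here only to illustrate the rule).
EXON_CODONS = {1: range(1, 31), 2: range(31, 105), 3: range(105, 147)}


def exon_of_codon(codon: int) -> int:
    """Return the exon (1, 2 or 3) assumed to contain the given codon."""
    for exon, codons in EXON_CODONS.items():
        if codon in codons:
            return exon
    raise ValueError(f"codon {codon} is outside the assumed coding region")


def predicted_mrna_fate(codon: int) -> str:
    """Apply the rule in the text to a nonsense or frameshift mutation at this codon."""
    if exon_of_codon(codon) in (1, 2):
        return "mRNA absent from cytoplasm (nonsense-mediated decay)"
    return "mRNA reaches cytoplasm; unstable chain may be synthesized"


for codon in (17, 39, 121):
    print(codon, exon_of_codon(codon), predicted_mrna_fate(codon))
```

Codons 17 and 39 are the nonsense mutations mentioned earlier; codon 121 is used only as an arbitrary exon 3 position.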

The molecular pathology of the α thalassemias

The molecular pathology of the α thalassemias is more complicated than that of the β thalassemias, simply because there are two α globin genes per haploid genome. Thus, the normal α globin genotype can be written αα/αα. As in the case of β thalassemia, there are two major varieties of α thalassemia, α+ and α0 thalassemia. In α+ thalassemia one of the linked α globin genes is lost, either by deletion (-) or mutation (T); the heterozygous genotype can be written -α/αα or αTα/αα. In α0 thalassemia the loss of both α globin genes nearly always results from a deletion; the heterozygous genotype is therefore written --/αα. In populations where specific deletions are particularly common, such as Southeast Asia (SEA) or the Mediterranean region (MED), it is useful to add the appropriate superscript, as follows: --SEA/αα or --MED/αα. It follows that when we speak of an 'α thalassemia gene' what we are really referring to is a haplotype; that is, the state and function of both of the linked α globin genes.
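The genotype notation lends itself to a simple count of intact α genes. The sketch below is only an illustration of the notation: the function names are hypothetical, superscripts are written inline (for example `--SEA`), and a gene written `αT` is treated as non-functional.

```python
def functional_alpha_genes(genotype: str) -> int:
    """Count intact α genes: every 'α' except those marked 'αT' (non-deletion mutant)."""
    return genotype.count("α") - genotype.count("αT")


def haplotype_class(haplotype: str) -> str:
    """Classify one haplotype (one side of the '/') by its number of intact α genes."""
    labels = {2: "normal", 1: "α+ thalassemia", 0: "α0 thalassemia"}
    return labels.get(functional_alpha_genes(haplotype), "other arrangement")


for genotype in ["αα/αα", "-α/αα", "αTα/αα", "--/αα", "--SEA/αα"]:
    classes = [haplotype_class(h) for h in genotype.split("/")]
    print(genotype, functional_alpha_genes(genotype), classes)
```

Classifying each side of the slash separately reflects the point made in the text that an 'α thalassemia gene' is really a haplotype.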

α0 Thalassemia

Three main molecular pathologies, all involving deletions, have been found to underlie the α0 thalassemia phenotype. The majority of cases result from deletions that remove both α globin genes and a varying length of the α globin gene cluster (Figure 1.6). Occasionally, however, the α globin gene cluster is intact but is inactivated by a deletion which involves the major regulatory region HS40, 40 kb upstream from the α globin genes. Finally, the α globin genes may be lost as part of a truncation of the tip of the short arm of chromosome 16.

Fig. 1.6 Some of the deletions that underlie α0 and α+ thalassemia

The heavy red lines indicate the lengths of the deletions. The unshaded regions indicate uncertainty about the precise breakpoints. The three small deletions at the bottom of the figure represent the common α+ thalassemia deletions.

As well as providing us with an understanding of the molecular basis for α0 thalassemia, detailed studies of these deletions have yielded more general information about the mechanisms that underlie this form of molecular pathology. For example, it has been found that the 5' breakpoints of a number of deletions of the α globin gene cluster are located approximately the same distance apart and in the same order along the chromosome as their respective 3' breakpoints; similar findings have been observed in deletions of the β globin gene cluster. These deletions seem to have resulted from illegitimate recombination events which have led to the deletion of an integral number of chromatin loops as they pass through their nuclear attachment points during chromosomal replication. Another long deletion has been characterized in which a new piece of DNA bridges the two breakpoints in the α globin gene cluster. The inserted sequence originates upstream from the α globin gene cluster, where it normally is found in an inverted orientation with respect to that found between the breakpoints of the deletion. Thus it appears to have been incorporated into the junction in a way that reflects its close proximity to the deletion breakpoint region during replication. Other deletions seem to be related to the family of Alu-repeats, simple repeat sequences that are widely dispersed throughout the genome; one deletion appears to have resulted from a simple homologous recombination between two repeats of this kind that are usually 62 kb apart.

A number of forms of α0 thalassemia result from terminal truncations of the short arm of chromosome 16 to a site about 50 kb distal to the α globin genes. The telomeric consensus sequence (TTAGGG)n has been added directly to the site of the break. Since these mutations are stably inherited, it appears that telomeric DNA alone is sufficient to stabilize the ends of broken chromosomes.

The molecular pathology of α+ thalassemia

As mentioned earlier, the α+ thalassemias result from the inactivation of one of the duplicated α globin genes, either by deletion or point mutation.

α+ Thalassemia due to gene deletions

There are two common forms of α+ thalassemia that are due to loss of one or other of the duplicated α globin genes, -α3.7 and -α4.2, where 3.7 and 4.2 indicate the sizes of the deletions in kilobases. The way in which these deletions have been generated reflects the underlying structure of the α globin gene complex (Figure 1.7). Each α gene lies within a region of homology, approximately 4 kb long, probably generated by an ancient duplication event. The homologous regions, which are divided by small inserts, are designated X, Y and Z. The duplicated Z boxes are 3.7 kb apart and the X boxes are 4.2 kb apart. As the result of misalignment and reciprocal crossover between these segments at meiosis, a chromosome is produced with either a single (-α) or triplicated (ααα) α globin gene. As shown in Figure 1.7, if a crossover occurs between homologous Z boxes 3.7 kb of DNA are lost, an event which is described as a rightward deletion, -α3.7. A similar crossover between the two X boxes deletes 4.2 kb, the leftward deletion -α4.2. The corresponding triplicated α gene arrangements are called αααanti3.7 and αααanti4.2. Crossovers at a variety of different points within the Z boxes give rise to deletions with different breakpoints, each still removing 3.7 kb.
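The reciprocal products of a misaligned crossover can be sketched by treating the chromosome as an ordered list of homology boxes. This is a deliberately simplified model under stated assumptions: the box order X2, Y2, Z2, X1, Y1, Z1 and the idea that each Z box carries one α gene are simplifications introduced here, sizes are ignored, and the function names are hypothetical.

```python
from typing import List, Tuple

# Assumed simplified 5' -> 3' layout; each Z box is taken to carry one α gene.
NORMAL = ["X2", "Y2", "Z2", "X1", "Y1", "Z1"]


def unequal_crossover(chrom: List[str], upstream_box: str,
                      downstream_box: str) -> Tuple[List[str], List[str]]:
    """Recombine two copies of chrom misaligned so upstream_box pairs with downstream_box.

    Returns (deletion product, triplication product); hybrid boxes are named 'A/B'.
    """
    i, j = chrom.index(upstream_box), chrom.index(downstream_box)
    deletion = chrom[:i] + [f"{upstream_box}/{downstream_box}"] + chrom[j + 1:]
    triplication = chrom[:j] + [f"{downstream_box}/{upstream_box}"] + chrom[i + 1:]
    return deletion, triplication


def alpha_gene_count(chrom: List[str]) -> int:
    """Count α genes as the number of Z boxes (whole or hybrid)."""
    return sum(1 for box in chrom if box.startswith("Z"))


rightward = unequal_crossover(NORMAL, "Z2", "Z1")   # Z-box crossover
leftward = unequal_crossover(NORMAL, "X2", "X1")    # X-box crossover

for label, (deletion, triplication) in [("Z boxes", rightward), ("X boxes", leftward)]:
    print(label, deletion, alpha_gene_count(deletion),
          triplication, alpha_gene_count(triplication))
```

In both cases the model yields one chromosome with a single α gene and a reciprocal chromosome with three, matching the -α and ααα arrangements described in the text.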

Non-deletion types of α+ thalassemia

These disorders result from single or oligonucleotide mutations of the particular α globin gene. Most of them involve the α2 gene but, since the output from this locus is two to three times greater than that from the α1 gene, this may simply reflect ascertainment bias due to the greater phenotypic effect and, possibly, a greater selective advantage.

Fig. 1.7 Mechanisms of the generation of the common deletion forms of α+ thalassemia

(a) The normal arrangement of the α globin genes, with the regions of homology X, Y and Z.

(b) The rightward crossover that generates the -α3.7 deletion.

(c) The leftward crossover that generates the -α4.2 deletion.

Overall, these mutations interfere with α globin gene function in a similar way to those that affect the β globin genes. They affect the transcription, translation or post-translational stability of the gene product. Since the principles are the same as for β thalassemia, we do not need to describe them in detail with one exception, a mutation which has not been observed in the β globin gene cluster. It turns out that there is a family of mutations that involves the α2 globin gene termination codon, TAA. Each specifically changes this codon so that an amino acid is inserted instead of the chain terminating. This is followed by 'read-through' of α globin mRNA that is not normally translated, until another in-phase termination codon is reached. The result is an elongated α chain with 31 additional residues at the C terminal end. Five hemoglobin variants of this type have been identified. The commonest, hemoglobin Constant Spring, occurs at a high frequency in many parts of Southeast Asia. It is not absolutely clear why the read-through of normally untranslated mRNAs leads to a reduced output from the α2 gene, although there is considerable evidence that it in some way destabilizes the mRNA.
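Read-through of a mutated termination codon can be illustrated by counting how many codons are translated before the first in-phase stop. The sequence below is an invented toy message, not the α2 globin mRNA, so the extension it produces (four residues) is not the 31 residues seen in the real variants; the helper names and the choice of a TAA→CAA change are assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def translated_codon_count(seq: str) -> int:
    """Number of codons translated from the first base before an in-phase stop codon."""
    count = 0
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            break
        count += 1
    return count


# Toy message, invented for illustration: 3 codons, TAA stop, then "untranslated"
# sequence that happens to contain a second in-phase stop (TAG).
normal = "ATGGCTCGT" + "TAA" + "GCTGGAGCC" + "TAG"
# Change the first base of the termination codon (position 10): TAA -> CAA.
mutant = normal[:9] + "C" + normal[10:]

extension = translated_codon_count(mutant) - translated_codon_count(normal)
print(translated_codon_count(normal), translated_codon_count(mutant), extension)
```

The extension consists of the amino acid inserted at the former stop codon plus the residues encoded by the normally untranslated sequence up to the next in-phase stop.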

α Thalassemia/mental retardation syndromes

There is a family of mild forms of α thalassemia which is quite different to that described in the previous section and which is associated with varying degrees of mental retardation. Recent studies indicate that there are two quite different varieties of this condition, one encoded on chromosome 16 (ATR-16) and the other on the X chromosome (ATR-X).

The ATR-16 syndrome is characterized by a relatively mild mental handicap with a variable constellation of facial and skeletal dysmorphisms. These individuals have long deletions that involve the α globin gene cluster and remove at least 1-2 Mb. This condition can arise in several ways, including unbalanced translocation involving chromosome 16, truncation of the tip of chromosome 16, and the loss of the α globin gene cluster and parts of its flanking regions by other mechanisms.

The ATR-X syndrome results from mutations in a gene on the X chromosome, Xq13.1-q21.1. The product of this gene is one of a family of proteins that are involved in chromatin-mediated transcriptional regulation. It is expressed ubiquitously during development and at interphase it is found entirely within the nucleus in association with pericentromeric heterochromatin. In metaphase, it is similarly found close to the centromeres of many chromosomes but, in addition, occurs at the stalks of acrocentric chromosomes, where the ribosomal (r) RNA is located. These locations provide important clues to the potential role of this protein in the establishment and/or maintenance of methylation of the genome. Although it is clear that ATR-X is involved in α globin transcription, it also must be an important player in early fetal development, particularly of the urogenital system and brain. Many different mutations of this gene have been discovered in association with the widespread morphological and developmental abnormalities which characterize the ATR-X syndrome.

α Thalassemia and the myelodysplastic syndrome

Since the first description of the finding of Hb H in the red cells of a patient with leukemia, many examples of this association have been reported. The condition usually is reflected in a mild form of Hb H disease, with typical Hb H inclusions in a proportion of the red cells and varying amounts of Hb H demonstrable by hemoglobin electrophoresis. The hematological findings are usually those of one or other form of the myelodysplastic syndrome. The condition occurs predominantly in males in older age groups. Very recently it has been found that some patients with this condition have mutations involving ATR-X. The relationship of these mutations to the associated myelodysplasia remains to be determined.

Rarer forms of thalassemia and related disorders

There are a variety of other conditions that involve the β globin gene cluster which, although less common than the β thalassemias, provide some important information about mechanisms of molecular pathology and therefore should be mentioned briefly.

The δβ thalassemias

Like the β thalassemias, the δβ thalassemias, which result from defective δ and β chain synthesis, are subdivided into the (δβ)+ and (δβ)0 forms.

The (δβ)+ thalassemias result from unequal crossing over between the δ and β globin gene loci at meiosis with the production of δβ fusion genes. The resulting δβ fusion chain products combine with α chains to form a family of hemoglobin variants called the hemoglobin Lepores, after the family name of the first patient of this kind to be discovered. Because the synthesis of these variants is directed by genes with the 5' sequences of the δ globin genes, which have defective promoters, they are synthesized at a reduced rate and result in the phenotype of a moderately severe form of δβ thalassemia.

The (δβ)0 thalassemias nearly all result from long deletions involving the β globin gene complex. Sometimes they involve the Aγ globin gene and hence the only active locus remaining is the Gγ locus. In other cases the Gγ and Aγ loci are left intact and the deletion simply removes the δ and β globin genes; in these cases both the Gγ and the Aγ globin genes remain functional. For some reason, these long deletions allow persistent expression of the γ globin genes at a relatively high level during adult life, which helps to compensate for the absence of β and δ globin chain production. They are classified according to the kind of fetal hemoglobin that is produced, and hence into two varieties, Gγ(Aγδβ)0 and GγAγ(δβ)0 thalassemia; in line with other forms of thalassemia, they are best described by what is not produced: (Aγδβ)0 and (δβ)0 thalassemia, respectively. Homozygotes produce only fetal hemoglobin, while heterozygotes have a thalassemic blood picture together with about 5-15% hemoglobin F.
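The naming convention, describing a deletion by what is not produced, can be expressed as a small function. This is a sketch of the nomenclature only: the gene list is limited to the four loci discussed in the paragraph, and the function name is hypothetical.

```python
from typing import List, Set, Tuple

# The four loci discussed in the text, in the order used in the names.
GENE_ORDER = ["Gγ", "Aγ", "δ", "β"]


def describe_deletion(deleted: Set[str]) -> Tuple[str, List[str]]:
    """Name a deletion by the genes it removes and list the γ loci left active."""
    missing = [gene for gene in GENE_ORDER if gene in deleted]
    active_gamma = [gene for gene in ("Gγ", "Aγ") if gene not in deleted]
    return "(" + "".join(missing) + ")0 thalassemia", active_gamma


print(describe_deletion({"Aγ", "δ", "β"}))
print(describe_deletion({"δ", "β"}))
```

The first call corresponds to the variety in which only the Gγ locus remains active, the second to the variety in which both γ loci are intact.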

Hereditary persistence of fetal hemoglobin (HPFH)

Genetically determined persistent fetal hemoglobin synthesis in adult life is of no clinical importance except that its genetic determinants can interact with the β thalassemias or structural hemoglobin variants; the resulting high level of Hb F production often ameliorates these conditions. The different forms of HPFH result from either long deletions involving the δβ globin gene cluster, similar to those that cause (δβ)0 thalassemia, or from point mutations that involve the promoters of the Gγ or Aγ globin gene. In the former case there is no β globin chain synthesis and therefore these conditions are classified as (δβ)0 HPFH. In cases in which there are promoter mutations involving the γ globin genes, there is increased γ globin chain production in adult life associated with some β and δ chain synthesis in cis, i.e. directed by the same chromosome, to the HPFH mutations. Thus, depending on whether the point mutations involve the promoter of the Gγ or Aγ globin gene, these conditions are called Gγ β+ HPFH and Aγ β+ HPFH, respectively.

There is another family of HPFH-like disorders in which the genetic determinant is not encoded in the β chain cluster. In one case the determinant is encoded on chromosome 6, although its nature has not yet been determined.

It should be pointed out that all these conditions are very heterogeneous and that many different deletions or point mutations have been discovered that produce the rather similar phenotypes of (δβ)0 or Gγ or Aγ β+ HPFH.

Structural hemoglobin variants

Over 700 structural hemoglobin variants have been described, most of which are of no clinical significance. Only if the underlying mutation interferes with the stability or function of the hemoglobin molecule is there any important clinical accompaniment.

The majority of these variants result from missense mutations; that is, base substitutions which produce a codon change which encodes a different amino acid in the affected globin chain. Rarely, structural variants result from more subtle alterations in the structure of the α or β globin genes. Shortened chains may result from internal deletions of their particular genes, while elongated chains result either from duplications within genes or frameshift mutations which allow the chain termination codon to be read through and additional amino acids to be added to the C terminal end.
