Sequence Analyses

The availability of a family of four proteins in humans allowed us to perform sequence comparisons among them and with the proteins present in other organisms [41]. The comparative analyses showed that, among the human proteins, VPS13A is the most similar to yeast Vps13p. VPS13C is very similar to VPS13A and seems to have originated from a very recent duplication event. The other two human proteins are less conserved: VPS13D shows similarity to VPS13A throughout its sequence, whereas in VPS13B the similarity is restricted to a few localised regions. These results agree with the phylogenetic data, in which VPS13A retains a high similarity to the yeast homologue. This suggests that some selective pressure might be acting on VPS13A, and that this protein has probably retained more of the functions of its yeast counterpart than the other human VPS13 proteins. The other members of the family have diverged faster from the original sequence, and their functions may also have become more different or specialised. VPS13B is probably the result of an ancient duplication and has lost most of its similarity to the other proteins of the family except in some regions.

Figure 1 shows the features found in the human VPS13 proteins. The most conserved regions are the N- and C-termini, but another region, named C2 and present in all four proteins, also shows a good degree of conservation (for alignment data see supplementary Fig. B in [26] and Fig. 3 in [41]). In VPS13A, C and D, this sequence partially overlaps with a region called DUF1162, a domain deduced from comparisons among proteins belonging to the VPS13 family but not detected in VPS13B. These three regions (N, C, and C2) are probably involved in functions common to all four VPS13 proteins, as they have remained conserved throughout their evolution.

Analysis of the VPS13C sequence showed an internal duplication of a 494-aa region, caused by the duplication of 11 exons, which is responsible for the larger size of this protein compared with VPS13A [41]. This finding made it possible to detect that the internal duplication in VPS13C was not a single event, but that the same process had happened on several occasions during the evolution of this protein family. An extra region similar to the two described above was found in VPS13C; two such regions are present in VPS13A and three in VPS13D. The conservation between all these repeated regions (called R1, R2 and R2b, see Fig. 1) varies, but a common core element around 45 residues long, containing the sequence P-X4-P-X13-17-G, was found in all of them and is also present in other, non-human, members of this family [41]. The fact that this smaller element is conserved in all R regions suggests that it may be important for the function and/or the structure of the VPS13 proteins. The exception is VPS13B, where these regions cannot be detected. However, it is possible that this protein has also undergone internal duplication events (which would explain its size) and that the similarities have been lost during its evolution.
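As an illustration of how such a core signature could be searched for in a protein sequence, the following minimal Python sketch reads the motif as Pro, any four residues, Pro, any 13 to 17 residues, Gly. That reading of the spacing is an assumption about the subscripts in the printed motif, and the function name and toy sequence are hypothetical; this is not the procedure used in [41]. Because the pattern is short and degenerate, a hit on its own would not identify an R region, which was defined by similarity over a much longer stretch.

```python
import re

# Assumed reading of the core signature: P, 4 residues, P, 13 to 17 residues, G.
# The lookahead lets candidates that overlap each other be reported;
# only one spacing is reported per start position.
CORE_MOTIF = re.compile(r"(?=(P.{4}P.{13,17}G))")


def find_core_motif(sequence):
    """Return (1-based start position, matched fragment) for each candidate."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in CORE_MOTIF.finditer(seq)]


# Toy sequence invented for illustration; it is not a real VPS13 fragment.
toy = "MK" + "P" + "AAAA" + "P" + "L" * 13 + "G" + "TT"
hits = find_core_motif(toy)
for start, fragment in hits:
    print(start, fragment)
```

Positions are reported 1-based to match the residue numbering used for protein sequences in the text and in Fig. 1.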

As with chorein, database comparisons with VPS13B, -C and -D protein sequences did not predict any known domains or motifs with a high degree of confidence. The two motifs with the highest probability are UBA (Ubiquitin-associated domain) and Ricin-B-lectin (lectin domain of ricin B chain profile), both in VPS13D [41]. In particular, the domains originally described for VPS13B/COH1 [21] were either undetectable or below the default confidence threshold for the software we used. Another question that has not been resolved since chorein was first described is whether this is a family of soluble or membrane proteins. The VPS13B/COH1 protein was described as a multiple transmembrane protein, with ten transmembrane (TM) domains [21]. However, TM prediction showed an inconsistent pattern, detecting from none to 18 TM domains in the same VPS13 protein depending on the program used (see Fig. 1). The lack of a signal peptide in all VPS13 proteins (although this is not always present in transmembrane proteins) and the functional data from their yeast homologue (see above) support the "soluble" option.

Fig. 1 Features of human VPS13 proteins (variant 1A). Areas in common are indicated in the main bar: N, N-terminal region; C, C-terminal region; C2, C-terminal region 2; R (1, 2, 2b), repeated region (black area shows the 45-aa core element, see Sect. 2.3). Small black boxes above the bar indicate TM regions predicted by programs shown on the right; HMMTOP or TMHMM did not predict any TM region. Differences with variant 2 appear below for VPS13B, C, and D (see Table 1). Regions showing an above-threshold score with described motifs appear below the protein bar. Long black boxes above VPS13C protein show the duplicated regions 859-1350 and 1373-1864. Horizontal bars below R regions indicate the fragment of that R region that shows similarity in pairwise comparison with regions (VPS13)A-R1, A-R2, C-R1, C-R2b, C-R2, D-R1, D-R2b and D-R2, respectively. For more details, please see [41]. (Reprinted from [41] with permission from Elsevier.)

One motif that is detected in chorein by computational analysis is the tetratricopeptide repeat (TPR), a structural motif of 34 aa defined by a pattern of small and large hydrophobic residues in which no position is completely invariant. It is present in a wide range of proteins and mediates protein-protein interactions and the assembly of multiprotein complexes [7]. Ten TPR motifs are detected in chorein (see Swiss-Prot entry Q96RL7) but, interestingly, none is detected in the closely related VPS13C protein (nor in VPS13B, VPS13D, or the yeast Vps13p protein), and only six of these motifs are detected in the mouse chorein homologue (Swiss-Prot entry Q5H8C4). These data suggest that the detection of these motifs in the chorein sequence is most probably a coincidence due to the high degeneracy of the TPR consensus sequence.
