Source of Sequences and Multiple Alignment

Forty-four complete or nearly complete eukaryotic phosphatase sequences were gathered from the literature, the NBRF-PIR databank (version 33) and unpublished results (see Figure 1 for references). The sequences were first compared pairwise using the Needleman and Wunsch algorithm [24] and then aligned by automatic multiple alignment [25, 26]. The resulting alignment clearly showed the region common to the phosphatases, but it contained inconsistencies in the long N- and C-terminal extensions that are present in only some members of the family. A central segment of rabbit PP1 was therefore identified, 271 residues long and starting at residue 21 (residues 21 to 291). This segment was used to locate the common catalytic core in the other 43 phosphatase sequences, and the alignment procedure was then repeated on these core sequences. Pairwise standard deviation (S.D.) scores for the core sequences, obtained with the Needleman and Wunsch algorithm [24], were all above 25.0, indicating that automatic multiple alignment of the core regions would yield an accurate alignment within the common secondary structural regions [26]. After multiple alignment, a few residues were trimmed from the N- and C-termini of some of the initially identified core sequences. The resulting multiple alignment is shown in Figure 1.
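The pairwise comparison and S.D. score can be illustrated with a short Python sketch. It assumes the commonly used definition of the S.D. score: the real alignment score minus the mean score obtained after randomly shuffling both sequences, divided by the standard deviation of those shuffled scores. The identity-based scoring values (`MATCH`, `MISMATCH`, `GAP`), the 100 shuffles and the function names are hypothetical choices for illustration; the substitution matrix and gap penalty actually used for Figure 1 are not given in this section.

```python
import random
import statistics

# Hypothetical scoring scheme for illustration only; not the matrix or
# gap penalty used to build the published alignment.
MATCH, MISMATCH, GAP = 1, -1, -2


def nw_score(a: str, b: str) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    # prev holds the DP row for the previous residue of `a`.
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * GAP] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            # Best of: align a[i-1] with b[j-1], gap in b, gap in a.
            curr[j] = max(diag, prev[j] + GAP, curr[j - 1] + GAP)
        prev = curr
    return prev[-1]


def sd_score(a: str, b: str, n_shuffles: int = 100, seed: int = 0) -> float:
    """(real score - mean shuffled score) / SD of shuffled scores (assumed definition)."""
    rng = random.Random(seed)  # seeded so results are reproducible
    real = nw_score(a, b)
    shuffled = []
    for _ in range(n_shuffles):
        # Shuffling keeps amino acid composition but destroys residue order.
        a_perm = list(a)
        rng.shuffle(a_perm)
        b_perm = list(b)
        rng.shuffle(b_perm)
        shuffled.append(nw_score("".join(a_perm), "".join(b_perm)))
    sd = statistics.stdev(shuffled)
    return (real - statistics.mean(shuffled)) / sd if sd > 0 else float("inf")
```

Because the shuffled sequences keep the composition of the originals, a high S.D. score reflects shared residue order rather than similar amino acid content.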

A scan of the NBRF-PIR databank (version 33) was performed with the alignment shown in Figure 1, using an adaptation of the Smith-Waterman [27] local similarity algorithm [28] (program SCANPS; GJB, unpublished). This scan identified a possible match with E. coli diadenosine tetraphosphatase. The bacteriophage phosphatase sequences and the diadenosine tetraphosphatase sequences were then aligned with a subset of the eukaryotic phosphatase sequences, as shown in Figure 2.
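A database scan of this kind can be sketched with the Smith-Waterman algorithm, which reports the best-scoring local alignment and can therefore detect similarity confined to one region, such as a catalytic core. This is a simplified, hypothetical illustration: it scores a single query sequence rather than the Figure 1 alignment that SCANPS used, and the scoring values and the `scan` helper are assumptions, not SCANPS internals.

```python
# Hypothetical scoring scheme; not the one used by SCANPS.
MATCH, MISMATCH, GAP = 2, -1, -2


def sw_score(query: str, target: str) -> int:
    """Best local alignment score (Smith-Waterman, linear gap penalty)."""
    best = 0
    prev = [0] * (len(target) + 1)
    for i in range(1, len(query) + 1):
        curr = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            diag = prev[j - 1] + (MATCH if query[i - 1] == target[j - 1] else MISMATCH)
            # Flooring at 0 lets a local alignment start anywhere.
            curr[j] = max(0, diag, prev[j] + GAP, curr[j - 1] + GAP)
            best = max(best, curr[j])
        prev = curr
    return best


def scan(query: str, database: dict[str, str], top: int = 5) -> list[tuple[str, int]]:
    """Score every database entry against the query and return the best hits."""
    hits = [(name, sw_score(query, seq)) for name, seq in database.items()]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:top]
```

Raw local scores grow with sequence length, so a real scan would normally assess the significance of a hit rather than rank on raw scores alone.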

gjb@bioch.ox.ac.uk