Forty-four complete or nearly complete eukaryotic phosphatase
sequences were gathered from the literature, the NBRF-PIR version 33
databank, and unpublished results (see Figure 1 for references).
The sequences were first compared pairwise using the Needleman and
Wunsch algorithm [24], followed by automatic multiple alignment
[26][25]. The resulting alignment clearly showed the
common region of the phosphatases, but contained
inconsistencies in the long N- and C-terminal extensions that are
seen only in some members of the family. Accordingly, a central
segment was identified in PP1 - (rabbit), starting at residue 21 and
spanning 271 residues. This polypeptide was used to locate the common
catalytic core in the other 43 phosphatase sequences. The alignment
procedure was then repeated on these core sequences. Pairwise
standard deviation (S.D.) scores for the core sequences obtained using
the Needleman and Wunsch [24] algorithm were all above 25.0,
indicating that automatic multiple alignment of the core regions would
yield an accurate alignment within common secondary structural regions
[26]. After multiple alignment, a small number of residues
were trimmed from the N- and C-termini of some of the initially
identified core sequences. The resulting multiple alignment is shown
in Figure 1.
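
The pairwise comparison step can be illustrated with a minimal sketch. It assumes that the S.D. score is the real alignment score expressed in standard deviations above the mean score of shuffled sequence pairs. The match, mismatch and gap values, the number of shuffles, and the function names nw_score and sd_score are illustrative assumptions, not the parameters or substitution matrix used in this work.

```python
import random
import statistics


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The scoring values are placeholders (assumption); a real comparison
    would use an amino acid substitution matrix.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def sd_score(a, b, n_shuffles=100, seed=0):
    """Score of the real pair in S.D. units above the shuffled-pair mean."""
    rng = random.Random(seed)
    real = nw_score(a, b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        sa, sb = list(a), list(b)
        rng.shuffle(sa)  # shuffling preserves length and composition
        rng.shuffle(sb)
        shuffled_scores.append(nw_score("".join(sa), "".join(sb)))
    spread = statistics.stdev(shuffled_scores)
    if spread == 0:
        raise ValueError("shuffled scores have zero variance")
    return (real - statistics.mean(shuffled_scores)) / spread
```

Calling sd_score(seq_a, seq_b) for every pair of core sequences would give the matrix of pairwise scores against which a threshold can be checked.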
The NBRF-PIR version 33 databank was scanned using the alignment shown in Figure 1 and an adaptation of the Smith-Waterman [27] local similarity algorithm [28] (program SCANPS; GJB, unpublished). This scan identified a possible match with E. coli diadenosine tetra-phosphatase. The bacteriophage phosphatase sequences and diadenosine tetra-phosphatase sequences were then aligned with a subset of the eukaryotic phosphatase sequences as shown in Figure 2.