Protein phosphorylation is important in controlling a variety of biological processes: metabolism, cell differentiation and proliferation, gene expression, transport, locomotion and memory [2][1]. The proteins controlled by phosphorylation are extremely varied and include enzymes, membrane receptors, transport proteins, ion pumps and proteins mediating DNA replication, transcription and translation. In eukaryotes, reversible phosphorylation occurs predominantly on serine, threonine and tyrosine residues (reviewed by [3]), although a histone H4 protein histidine kinase has recently been reported [4]. Signal transduction in prokaryotes is also mediated by phosphorylation, on serine, threonine, histidine and aspartate residues [5], although no instances of tyrosine phosphorylation have been reported [6].
Protein kinases and phosphatases catalyse protein phosphorylation and dephosphorylation, respectively. In eukaryotes, most of these enzymes show specificity either for serine and threonine residues or for tyrosine residues. However, a number of dual specificity protein kinases and phosphatases have recently been reported (reviewed by [7]). All protein kinases are related in sequence and therefore belong to a single family that has presumably arisen by gene duplication from a common ancestor.
Serine/threonine protein phosphatases are classified into four major classes
according to substrate specificity, metal ion dependence and sensitivity to
phosphatase inhibitors: PP1, PP2A, PP2B and PP2C (reviewed by [8][2]).
PP1 and PP2A, purified from physiological sources, are active
independent of metal ions, whereas PP2B is Ca2+-calmodulin dependent and
PP2C is Mg2+ stimulated.
The isolation and sequencing of cDNA clones of protein phosphatases from a number of tissues of various species has shed light on the structures of these enzymes and the evolutionary relationships between them. Complementary DNA cloning [10][9], polymerase chain reactions [11] and mutant analysis [12] have allowed the isolation of novel forms of phosphatases belonging to the PP1/PP2A/PP2B family. The primary structures indicate that PP1, PP2A and PP2B share a common catalytic core of approximately 280 residues, which has no relationship to PP2C or to the more recently discovered protein tyrosine phosphatases [10][13].
Techniques for the prediction of protein structure fall into two main
classes. Molecular modelling methods (e.g. see [14]) use
the known three dimensional structure of a homologue as a scaffold on
which to base the prediction. Such predictions can be very reliable
and may help explain differing substrate specificities and suggest the
regions of the protein that would best be modified by site-directed
mutagenesis (e.g. see [15]). In the absence of a structural
homologue, the accuracy of prediction has historically been very low.
Methods to predict the secondary structure (α-helix,
β-strand or
loop) from a single protein sequence give at best 64% accuracy (e.g.
see [16]), and few methods assign degrees of confidence to
each predicted region. However, significant improvements in the
prediction of protein secondary structure have recently been obtained
through the use of multiply aligned sequences
[21][20][19][18][17]. Multiply aligned
sequences can also allow the accurate identification of residues buried in
the protein core [22][20][19].
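The accuracy figures quoted for secondary structure prediction are usually per-residue, three-state measures. As a minimal sketch, assuming that convention applies here and using the hypothetical single-letter labels H (helix), E (strand) and L (loop), the measure can be computed as the fraction of residues whose predicted state matches the observed state. The function name and the two example strings are illustrative only and are not taken from any of the cited methods.

```python
def three_state_accuracy(predicted: str, observed: str) -> float:
    """Return the fraction of residues whose predicted state matches the observed state.

    Both arguments are strings of equal length over the assumed alphabet
    H (helix), E (strand) and L (loop), one character per residue.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not predicted:
        raise ValueError("at least one residue is required")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


# Toy example (invented strings, not real data): 11 of 14 residues agree.
predicted_states = "LHHHHHLLEEEELL"
observed_states = "LLHHHHHLEEEEEL"
accuracy = three_state_accuracy(predicted_states, observed_states)
print(f"{accuracy:.1%}")
```

Note that a per-residue score of this kind says nothing about whether the number and order of helices and strands are correct, which is one reason confidence estimates for each predicted region are valuable.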
In this paper, the common cores of 44 eukaryotic, two bacteriophage and one
bacterial protein phosphatase are analysed by residue conservation and
a novel combination of secondary structure prediction methods. The use of multiple sequence data
allows a more accurate picture of the protein secondary structure to be
determined by accurately defining the positions of insertions and deletions
which almost invariably occur between secondary structural elements
[23], and by
identifying patterns of residue conservation consistent with α-helix and
β-strand. In addition, the multiple alignment characterises invariant
residues across the eukaryotes and between eukaryotes
and bacteriophage. Invariant residues are likely to be important in
catalysis through phosphate binding,
in forming a putative phosphoryl-enzyme intermediate and also in protein
folding. The characterisation of invariant residues allows candidate
residues that perform these functions to be identified and, when coupled with
the secondary structure prediction, provides a
basis for designing site-directed mutagenesis experiments.
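The two kinds of information drawn from the multiple alignment, invariant positions and the locations of insertions and deletions, can be illustrated with a short sketch. It assumes the alignment is supplied as equal-length strings with "-" marking a gap; the function name and the toy sequences are hypothetical and are not phosphatase data. This is a simplified illustration, not the conservation analysis used in this paper.

```python
def summarise_alignment(alignment: list[str]) -> tuple[list[tuple[int, str]], list[int]]:
    """Return (invariant, gapped) for a gapped multiple sequence alignment.

    invariant: (column index, residue) for gap-free columns in which every
               sequence carries the same residue.
    gapped:    indices of columns containing at least one gap character "-".
    Column indices are zero-based.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    invariant = []
    gapped = []
    for i in range(length):
        column = [seq[i] for seq in alignment]
        if "-" in column:
            gapped.append(i)  # candidate insertion/deletion site
        elif len(set(column)) == 1:
            invariant.append((i, column[0]))  # identical in every sequence
    return invariant, gapped


# Toy alignment (invented sequences, for illustration only).
toy_alignment = [
    "MKDH-GLR",
    "MRDHAGLK",
    "MKDH-GIR",
]
invariant_columns, gapped_columns = summarise_alignment(toy_alignment)
print(invariant_columns)  # [(0, 'M'), (2, 'D'), (3, 'H'), (5, 'G')]
print(gapped_columns)     # [4]
```

In this toy case the gapped column would be read as a likely loop position, while the invariant columns would be the first candidates for mutagenesis.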