Bioinformatics - Applications Note
James A. Cuff, Asim S. Siddiqui, Matt Finlay and Geoffrey J. Barton
Laboratory of Molecular Biophysics, Rex Richards Building, South Parks Road, Oxford, OX1 3QU, UK
European Molecular Biology Laboratory Outstation - The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Corresponding Author: G. J. Barton, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Keywords: protein; secondary structure prediction; combination of methods; www server
Summary: An interactive protein secondary structure prediction Internet server is presented. The server allows a single sequence or multiple alignment to be submitted, and returns predictions from six secondary structure prediction algorithms that exploit evolutionary information from multiple sequences. A consensus prediction is also returned which improves the average Q3 accuracy of prediction by 1% to 72.9%. The server simplifies the use of current prediction algorithms and allows conservation patterns important to structure and function to be identified.
When predicting the secondary structure of a protein 'blind', without knowledge of the answer, it is useful to exploit the features of all available prediction algorithms rather than rely on one. Combination of methods has been applied successfully in a number of accurate predictions of protein secondary structure (e.g. see [Edwards & Perkins, 1996, Crawford et al., 1987, Russell et al., 1992, Livingstone & Barton, 1994, Livingstone & Barton, 1996, Russell & Barton, 1993]). Unfortunately, combining prediction methods on a large scale is complicated by the fact that prediction programs have very different input requirements and output formats. In order to perform a recent large-scale comparative analysis of secondary structure prediction algorithms [Cuff & Barton, 1998], we developed flexible software to standardise the input and output requirements of six prediction algorithms. In this Applications Note we describe a development of this work to provide the fully automatic JPred WWW server for multiple secondary structure prediction.
The server accepts two input types: a family of aligned protein sequences or a single protein sequence. If a single sequence is submitted, an automatic process creates a multiple sequence alignment prior to prediction [Cuff & Barton, 1998]. Six different prediction methods (DSC [King & Sternberg, 1996], PHD [Rost & Sander, 1993], NNSSP [Salamov & Solovyev, 1995], PREDATOR [Frishman & Argos, 1997], ZPRED [Zvelebil et al., 1987] and MULPRED (Barton, 1988, unpublished)) are then run, and the results from each method are combined into a simple file format.
The NNSSP, DSC, PREDATOR, MULPRED, ZPRED and PHD methods were chosen as representatives of current state-of-the-art secondary structure prediction methods that exploit the evolutionary information from multiple sequences. Each derives its prediction using a different heuristic, based upon nearest neighbours (NNSSP), jury decision neural networks (PHD), linear discrimination (DSC), consensus single sequence method combination (MULPRED), hydrogen bonding propensities (PREDATOR), or conservation number weighted prediction (ZPRED).
The predictions and corresponding sequence alignment are rendered in coloured HTML, Java [Clamp et al., 1998] and Postscript. The predictions are coloured and aligned with their corresponding family of sequences. Physico-chemical properties, solvent accessibility, prediction reliability and conservation number values [Zvelebil et al., 1987] for each amino acid are included in the output. The original ASCII text data from each of the prediction methods can also be downloaded, together with files such as BLAST results, MSF and HSSP format alignments and pair comparison files.
The text-based MULPRED output is of particular interest as it is a combination of different single sequence prediction methods (GOR [Garnier et al., 1978], Chou-Fasman (1974), Lim (1974), Rose (1978), and Wilmot & Thornton (1988)). While the automatic consensus within the MULPRED program is not as accurate as current methods [Cuff & Barton, 1998], the profile-based output, when combined with the modern algorithms, is helpful for human interpretation of the prediction.
A consensus prediction based upon a simple 'majority wins' combination of NNSSP, DSC, PREDATOR and PHD is provided by the JPred server. If there is a tie, the prediction from PHD is used. In our independent test this approach gave the highest accuracy of all the combinations examined [Cuff & Barton, 1998].
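As an illustration only, the voting rule described above might be sketched in Python as follows. The function name `consensus`, the one-letter state codes (`H` helix, `E` strand, `-` loop) and the representation of each prediction as an equal-length string are assumptions of this sketch and are not taken from the JPred implementation. The sketch also assumes that the most frequent state wins whenever it is unique (for example a 2-1-1 split), and that PHD is consulted only when two states share the top vote count.

```python
from collections import Counter

# The four methods combined in the consensus, as named in the text.
METHODS = ("NNSSP", "DSC", "PREDATOR", "PHD")


def consensus(predictions, tie_breaker="PHD"):
    """Per-residue vote over equal-length prediction strings.

    `predictions` maps a method name to its predicted state string.
    Ties for the top vote count are resolved with the tie-breaker method.
    """
    strings = [predictions[m] for m in METHODS]
    length = len(strings[0])
    if any(len(s) != length for s in strings):
        raise ValueError("all predictions must have the same length")

    result = []
    for i in range(length):
        votes = Counter(s[i] for s in strings)
        ranked = votes.most_common()
        top_state, top_count = ranked[0]
        # States sharing the highest vote count at this position.
        tied = [state for state, count in ranked if count == top_count]
        if len(tied) > 1:
            result.append(predictions[tie_breaker][i])
        else:
            result.append(top_state)
    return "".join(result)


# Hypothetical predictions for a six-residue fragment.
example = {
    "NNSSP":    "HHHEEE",
    "DSC":      "HHEE--",
    "PREDATOR": "HH-EEE",
    "PHD":      "HHHHE-",
}
print(consensus(example))
```

In this made-up example the last position splits two votes for strand against two for loop, so the PHD state is taken there.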
The consensus prediction achieved an average Q3 score of 72.9%, where Q3 is the percentage of residues predicted correctly for the three conformational states: strand, helix and loop. This result is 1% better than PHD (71.9%) on the same data. The segment overlap score [Rost et al., 1994] for the consensus method improves by 0.1%, to 75.4%. These results were obtained on a non-redundant set of 396 protein domain sequences that did not contain sequences similar to the proteins used to train the methods. A full analysis and description of the data sets, similarity cutoffs, accuracies and methods used for this test is given in [Cuff & Barton, 1998].
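For readers unfamiliar with the measure, a minimal Python sketch of Q3 as defined above is given here. The function names `q3` and `mean_q3`, the state codes and the example strings are hypothetical. Taking the "average" as the mean of per-sequence Q3 values is an assumption of this sketch; the exact averaging procedure is described in [Cuff & Barton, 1998].

```python
def q3(predicted, observed):
    """Percentage of residues whose predicted state matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def mean_q3(pairs):
    """Mean of per-sequence Q3 over (predicted, observed) pairs (assumed averaging)."""
    scores = [q3(p, o) for p, o in pairs]
    return sum(scores) / len(scores)


# Hypothetical six-residue example: five of six states agree.
print(round(q3("HHHEE-", "HHHHE-"), 1))
```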
As well as providing a more accurate consensus prediction, the JPred server also permits the different prediction methods to be viewed concurrently with the alignment. This allows for easy interpretation and analysis of the prediction and multiple sequence alignment. Interactive analysis and re-alignment can also be carried out with the Java viewer and editor [Clamp et al., 1998], where one may interactively change the colouring within the alignment to highlight important residues and conserved features.
In summary, JPred provides an automatic and simple to use tool to assist in accurate secondary structure prediction.
We thank Drs B. Rost, D. Frishman, V. Solovyev, R. King and M. Zvelebil for permitting us to include their software in the JPred server and for helpful discussions. This work was supported in part by grants from the Medical Research Council and the Royal Society. JC is an Oxford Centre for Molecular Sciences/MRC student.