

NOTE: This is a preprint. For the final copy, please refer to the final published article:

Cuff J. A. and Barton G. J. Evaluation and improvement of multiple
sequence methods for protein secondary structure prediction,
PROTEINS: Structure, Function and Genetics. 34:508-519 (1999)


Evaluation and improvement of multiple sequence methods for protein secondary structure prediction

James A. Cuff and Geoffrey J. Barton†
Laboratory of Molecular Biophysics,
Rex Richards Building,
South Parks Road,
Oxford, OX1 3QU, UK
and
European Molecular Biology Laboratory Outstation
The European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton,
Cambridge, CB10 1SD, UK

Keywords: protein; secondary structure prediction; combination of methods; benchmarks

†Corresponding author: G. J. Barton, EMBL-European Bioinformatics Institute,
Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

Abstract:

A new dataset of 396 protein domains is developed and used to evaluate the performance of the protein secondary structure prediction algorithms DSC, PHD, NNSSP and PREDATOR. The maximum theoretical Q3 accuracy for a combination of these methods is shown to be 78%. A simple consensus prediction on the 396 domains, with automatically generated multiple sequence alignments, gives an average Q3 prediction accuracy of 72.9%. This is a 1% improvement over PHD, which was the best single method evaluated. Segment Overlap Accuracy (SOV) is 75.4% for the consensus method on the 396 protein set. The secondary structure definition method DSSP defines 8 states, but most authors reduce these to 3 for prediction. Applying the different published 8-to-3 state reduction methods produces variation of over 3% in apparent prediction accuracy. This suggests that care should be taken to compare methods using the same reduction method. Two new sequence datasets (CB513 and CB251) are derived which are suitable for cross-validation of secondary structure prediction methods without artifacts due to internal homology. A fully automatic World Wide Web service that predicts protein secondary structure by a combination of methods is available via http://barton.ebi.ac.uk/
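
As an illustration of why the choice of 8-to-3 state reduction matters, the following minimal Python sketch scores one prediction against the same DSSP string reduced in two different ways, and also shows a per-residue majority-vote consensus. The two mappings, the tie-breaking rule, the function names and the example strings are all assumptions made for illustration; they are not claimed to be the reduction methods or the consensus scheme evaluated in the article.

```python
from collections import Counter

# Two illustrative 8-to-3 reduction schemes (assumed for this sketch,
# not taken from the article). Any DSSP state not listed becomes coil 'C'.
REDUCE_BROAD = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}
REDUCE_STRICT = {"H": "H", "E": "E"}


def reduce_states(dssp, mapping):
    """Map a DSSP 8-state string to a 3-state string (H, E, C)."""
    return "".join(mapping.get(state, "C") for state in dssp)


def q3(predicted, observed):
    """Percentage of residues whose predicted 3-state label is correct."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have equal length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def consensus(predictions, tie_breaker=0):
    """Per-residue majority vote over several 3-state predictions.

    Ties fall back to predictions[tie_breaker]; this rule is a
    hypothetical choice made for the sketch.
    """
    result = []
    for column in zip(*predictions):
        counts = Counter(column).most_common()
        top = counts[0][1]
        winners = [state for state, n in counts if n == top]
        result.append(winners[0] if len(winners) == 1 else column[tie_breaker])
    return "".join(result)


# Made-up example data: one DSSP assignment and one 3-state prediction.
dssp = "-HHHHGGG-EEEEB-TT-"
prediction = "CHHHHHHHCEEEECCCCC"

broad = reduce_states(dssp, REDUCE_BROAD)
strict = reduce_states(dssp, REDUCE_STRICT)
print("broad :", broad, round(q3(prediction, broad), 1))
print("strict:", strict, round(q3(prediction, strict), 1))
```

The same prediction receives a different Q3 score under each mapping, which is the effect the abstract warns about when methods are compared using different reductions.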



 
james@ebi.ac.uk