
Introduction

Methods for predicting protein secondary structure provide information that is useful both in ab initio structure prediction and as additional restraints for fold recognition algorithms [1,2,3,4,5]. Secondary structure predictions may also be used to guide the design of site-directed mutagenesis studies and to locate potentially functionally important residues [6]. For these applications, however, it is essential that the predictions are accurate or, at the very least, that reliability information can be obtained for each residue's predicted secondary structure state. Many approaches have been devised for predicting secondary structure from the protein sequence alone, and they differ in their core algorithms or heuristics. These include simple linear statistics [7,8,9,10,11], physicochemical properties [12], linear discrimination [13], machine learning [14,15], neural networks [16,17,18,19,20,21,22], k-way nearest neighbours [23,24,25,26,27,28], evolutionary trees [29,30], simple residue substitution matrices [31] and combinations of different methods with consensus approaches [32,33,34,35,36]. The most successful methods for protein secondary structure prediction exploit the evolutionary information that is available from protein families [17,25,23,13,6].

In our previous study of algorithms that use multiple sequences as the basis for prediction, neural network prediction methods were found to be the most accurate [34]. However, a detailed comparison of the methods was difficult because each algorithm used a different training set [34]. In the recent CASP (Critical Assessment of Structure Prediction) experiments [37,38], neural network methods also generated the most accurate predictions. Although the sample size was small, the best performance in CASPIII came from a new neural network prediction method, PSIPRED [39]. PSIPRED exploits the ability of PSIBLAST [40] to build alignment profiles that include sequences with more remote similarities than can be found by conventional pairwise sequence searching methods [41].

In this paper, we systematically investigate the effect of presenting alternative representations of the aligned sequences to a new two-level neural network algorithm similar to that applied in PHD [17]. Since combining different prediction methods can improve the average accuracy of prediction [34,28,13], we also investigate the effect of different consensus methods on accuracy.
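
To make the idea of a consensus prediction concrete, the following is a minimal sketch of one simple scheme: a per-residue majority vote over several predictions. It is an illustration only, not the consensus methods evaluated in this paper. The function name `majority_consensus`, the three-state alphabet (H, E, C) and the rule that ties fall back to coil are all assumptions made for the example.

```python
from collections import Counter


def majority_consensus(predictions, tie_state="C"):
    """Per-residue majority vote over equal-length prediction strings.

    Hypothetical helper for illustration. Each string holds one state
    per residue, e.g. H (helix), E (strand), C (coil). A tie for first
    place at a position is resolved to `tie_state` (an assumption).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")

    consensus = []
    # zip(*predictions) walks the predictions one residue position at a time
    for column in zip(*predictions):
        counts = Counter(column).most_common()
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            consensus.append(tie_state)  # no single most frequent state
        else:
            consensus.append(counts[0][0])
    return "".join(consensus)


if __name__ == "__main__":
    # Three made-up predictions for a five-residue sequence
    print(majority_consensus(["HHHEC", "HHCEC", "HECEE"]))
```

A vote of this kind treats every method equally; weighting each method by its per-residue reliability would be a natural variation, but that is beyond this sketch.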


James Cuff
2001-06-29