The prediction methods selected for this work represent the current state of the art, and all use multiple sequences. For completeness, however, the single-sequence methods SIMPA, SOPM and GORIV were also examined. These methods do not have pre-calculated propensity tables, and so we could perform a full jack-knife test with the new datasets. We compare the results for the single-sequence methods only with those obtained for the PHD algorithm, as PHD was the only other method for which we were able to carry out cross-validation. The quoted accuracies for SIMPA, SOPM and GORIV are based on removing helices shorter than 4 residues and strands shorter than 2 residues. For testing these methods, we used method A as the reduction method and also converted G and B states to coil. If reduction method A alone is used, the accuracies of SIMPA, SOPM and GORIV fall by 2-4% from those shown in Table 13.
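The reduction step described above can be illustrated with a minimal sketch. The mapping used here for method A (H and G to helix, E and B to strand, all other states to coil) is an assumption, since this section does not define it. The function names and the example DSSP string are hypothetical and serve only for illustration.

```python
# Assumed "method A" mapping from DSSP states to three states.
# This mapping is not defined in this section; any state not listed becomes coil.
METHOD_A = {"H": "H", "G": "H", "E": "E", "B": "E"}


def reduce_states(dssp, mapping=METHOD_A, to_coil=""):
    """Map a DSSP string to H/E/C; states listed in to_coil are forced to coil."""
    return "".join(
        "C" if state in to_coil else mapping.get(state, "C")
        for state in dssp
    )


def accuracy(predicted, observed):
    """Percentage of residues whose predicted state matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical DSSP assignment, for illustration only.
dssp = "CCHHHHGGGTTEEEEBCC"
method_a_only = reduce_states(dssp)
with_g_b_coil = reduce_states(dssp, to_coil="GB")
```

Scoring the same prediction against `method_a_only` and against `with_g_b_coil` shows how the choice of reduction alone can shift the measured accuracy, because the two observed strings differ wherever the original assignment contained G or B.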
The difference in accuracy between the single-sequence methods we examined and PHD ranges from 6.6% to 23.3%, depending upon the method and database used. Table 13 shows that the GORIV method improves remarkably (11.3%) when the database size is increased from 126 proteins to 396 proteins. This is to be expected, as GORIV no longer uses 'dummy frequencies' and instead relies on a large database to calculate its propensity tables. To examine whether this improvement scaled with database size, we also applied the GORIV method to the CB513 dataset. The accuracy improved by 1.1%, from 64.6% to 65.7%. SOPM achieved only 66.8% on the RS126 protein set, and 64.6% on the CB396 set. The authors of the SOPM method quoted 69%. However, the database used in their study was non-redundant at 50% sequence identity, and so included a number of clear homologues.
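The dependence of a table-based method on database size can be illustrated with a toy jack-knife test. The sketch below is not the GORIV algorithm: as a deliberately simplified stand-in, it assigns each residue the state most often observed for that residue type in the training set. All function names and the three example proteins are hypothetical.

```python
from collections import Counter, defaultdict


def build_table(proteins):
    """Count how often each residue type is observed in each state."""
    table = defaultdict(Counter)
    for sequence, states in proteins:
        for residue, state in zip(sequence, states):
            table[residue][state] += 1
    return table


def predict(sequence, table, default="C"):
    """Toy rule: most frequent state per residue type; unseen residues get default."""
    return "".join(
        table[r].most_common(1)[0][0] if table.get(r) else default
        for r in sequence
    )


def jackknife_accuracy(proteins):
    """Leave each protein out in turn, train on the rest, score the held-out one."""
    correct = total = 0
    for i, (sequence, observed) in enumerate(proteins):
        training = proteins[:i] + proteins[i + 1:]
        table = build_table(training)
        predicted = predict(sequence, table)
        correct += sum(p == o for p, o in zip(predicted, observed))
        total += len(observed)
    return 100.0 * correct / total


# Hypothetical (sequence, three-state assignment) pairs, for illustration only.
proteins = [
    ("AAEEVV", "HHHHEE"),
    ("AEVVGG", "HHEECC"),
    ("GGAAVW", "CCHHEE"),
]
toy_accuracy = jackknife_accuracy(proteins)
```

In this toy set the residue W occurs in only one protein, so no counts are available for it when that protein is held out and the prediction falls back to the default state. A larger database leaves fewer such gaps in the table, which is consistent with the gain reported for the larger datasets.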