
Bibliography

1
D. T. Jones, W. R. Taylor, and J. M. Thornton.
A new approach to protein fold recognition.
Nature, 358:86-89, 1992.

2
B. Rost.
Protein fold recognition by prediction-based threading.
J. Mol. Biol., 270:1-10, 1997.

3
R. B. Russell, R. R. Copley, and G. J. Barton.
Protein fold recognition by mapping predicted secondary structures.
J. Mol. Biol., 259:349-365, 1996.

4
R. B. Russell, M. A. S. Saqi, R. A. Sayle, P. A. Bates, and M. J. E. Sternberg.
Recognition of analogous and homologous protein folds: Analysis of sequence and structure conservation.
J. Mol. Biol., 269:423-439, 1997.

5
D. Fischer and D. Eisenberg.
Protein fold recognition using sequence-derived potentials.
Prot. Sci., 5:947-955, 1996.

6
M. J. J. M. Zvelebil, G. J. Barton, W. R. Taylor, and M. J. E. Sternberg.
Prediction of protein secondary structure and active sites using the alignment of homologous sequences.
J. Mol. Biol., 195:957-961, 1987.

7
J. Garnier, D. J. Osguthorpe, and B. Robson.
Analysis and implications of simple methods for predicting the secondary structure of globular proteins.
J. Mol. Biol., 120:97-120, 1978.

8
P. Y. Chou and G. D. Fasman.
Conformational parameters for amino acids in helical, $\beta$-sheet, and random coil regions calculated from proteins.
Biochem., 13:211-222, 1974.

9
J. Garnier, J. Gibrat, and B. Robson.
GOR method for predicting protein secondary structure from amino acid sequence.
Meth. Enz., 266:540-553, 1996.

10
J. F. Gibrat, J. Garnier, and B. Robson.
Further developments of protein secondary structure prediction using information theory. New parameters and consideration of residue pairs.
J. Mol. Biol., 198:425-443, 1987.

11
K. Nishikawa and T. Noguchi.
Predicting protein secondary structure based on amino acid sequence.
Meth. Enz., 202:31-44, 1995.

12
V. I. Lim.
Algorithms for prediction of $\alpha$ helices and $\beta$ structural regions in globular proteins.
J. Mol. Biol., 88:873-894, 1974.

13
R. D. King and M. J. E. Sternberg.
Identification and application of the concepts important for accurate and reliable protein secondary structure prediction.
Prot. Sci., 5:2298-2310, 1996.

14
R. King and M. J. E. Sternberg.
Machine learning approach for the prediction of protein secondary structure.
J. Mol. Biol., 216:441-457, 1990.

15
S. Muggleton, R. King, and M. J. E. Sternberg.
Protein secondary structure prediction using logic-based machine learning.
Prot. Eng., 5:647-657, 1992.

16
N. Qian and T. J. Sejnowski.
Predicting the secondary structure of globular proteins using neural network models.
J. Mol. Biol., 202:865-884, 1988.

17
B. Rost and C. Sander.
Prediction of protein secondary structure at better than 70% accuracy.
J. Mol. Biol., 232:584-599, 1993.

18
H. L. Holley and M. Karplus.
Protein secondary structure prediction with a neural network.
Proc. Nat. Acad. Sci., 86:152-156, 1989.

19
D. G. Kneller, F. E. Cohen, and R. Langridge.
Improvements in protein secondary structure prediction by an enhanced neural network.
J. Mol. Biol., 1:171-182, 1990.

20
S. K. Riis and A. Krogh.
Improving prediction of protein secondary structure using structured neural networks and multiple sequence alignments.
J. Comput. Biol., 1:163-183, 1996.

21
J. M. Chandonia and M. Karplus.
The importance of larger data sets for protein secondary structure prediction with neural networks.
Protein Sci., 5:768-774, 1996.

22
J. M. Chandonia and M. Karplus.
New methods for accurate prediction of protein secondary structure.
Proteins, 35:293-306, 1999.

23
A. A. Salamov and V. V. Solovyev.
Prediction of protein secondary structure by combining nearest-neighbor algorithms and multiple sequence alignments.
J. Mol. Biol., 247:11-15, 1995.

24
L. Rychlewski and A. Godzik.
Secondary structure prediction using segment similarity.
Prot. Eng., 10:1143-1153, 1997.

25
D. Frishman and P. Argos.
Incorporation of non-local interactions in protein secondary structure prediction from the amino acid sequence.
Prot. Eng., 9:133-142, 1996.

26
T. Yi and E. S. Lander.
Protein secondary structure prediction using nearest-neighbor methods.
J. Mol. Biol., 232:1117-1129, 1993.

27
J. M. Levin.
Exploring the limits of nearest neighbour secondary structure prediction.
Prot. Eng., 10:771-776, 1997.

28
C. Geourjon and G. Deleage.
SOPMA: Significant improvements in protein secondary structure prediction by consensus prediction from multiple alignments.
Comp. App. Biosci., 11:681-684, 1995.

29
N. Goldman, J. Thorne, and D. T. Jones.
Using evolutionary trees in protein secondary structure prediction and other comparative sequence analyses.
J. Mol. Biol., 263:196-208, 1996.

30
P. Lio, N. Goldman, and D. T. Jones.
PASSML: combining evolutionary inference and protein secondary structure prediction.
Bioinformatics, 8:726-733, 1998.

31
P. K. Mehta, J. Heringa, and P. Argos.
A simple and fast approach to prediction of protein secondary structure from multiply aligned sequences with accuracy above 70%.
Prot. Sci., 4:2517-2525, 1995.

32
K. Zimmermann and J. F. Gibrat.
In unison: regularization of protein secondary structure predictions that makes use of multiple sequence alignments.
Prot. Eng., 10:861-865, 1998.

33
V. N. Viswanadhan, B. Denckla, and J. N. Weinstein.
New joint prediction algorithm (Q7-JASEP) improves the prediction of protein secondary structure.
Biochemistry, 46:11164-11172, 1991.

34
J. A. Cuff and G. J. Barton.
Evaluation and improvement of multiple sequence methods for protein secondary structure prediction.
Proteins: Structure, Function and Genetics, 34:508-519, 1999.

35
V. Biou, J. F. Gibrat, B. Robson, and J. Garnier.
Secondary structure prediction: combination of three different methods.
Prot. Eng., 2:185-191, 1995.

36
Y. Guermeur, C. Geourjon, P. Gallinari, and G. Deleage.
Improved performance in protein secondary structure prediction by inhomogeneous score combination.
Bioinformatics, 5:413-421, 1999.

37
J. Moult, T. Hubbard, S. H. Bryant, K. Fidelis, and J. T. Pedersen.
Critical assessment of methods of protein structure prediction (CASP): Round II.
Proteins, Suppl. 1:1-230, 1997.

38
J. Moult, T. Hubbard, J. T. Pedersen, and K. Fidelis.
Third Meeting on the Critical Assessment of Techniques for Protein Structure Prediction: Asilomar Conference Centre, December 13-17.
http://predictioncenter.llnl.gov/casp3/Casp3.html, 1998.

39
D. T. Jones.
Prediction of protein secondary structure at 77% accuracy based on PSIBLAST derived sequence profiles.
Third Meeting on the Critical Assessment of Techniques for Protein Structure Prediction: Asilomar Conference Centre, December 13-17, 1998.

40
S. F. Altschul, T. L. Madden, A. A. Schaffer, J. H. Zhang, Z. Zhang, W. Miller, and D. J. Lipman.
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
Nuc. Ac. Res., 25:3389-3402, 1997.

41
J. Park, K. Karplus, C. Barrett, R. Hughey, D. Haussler, T. Hubbard, and C. Chothia.
Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods.
J. Mol. Biol., 284:1201-1210, 1998.

42
G. J. Barton.
Protein multiple sequence alignment and flexible pattern matching.
Meth. Enz., 183:403-428, 1990.

43
A. Bairoch and R. Apweiler.
The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999.
Nucleic Acids Res., 27:49-54, 1998.

44
J. D. Thompson, D. G. Higgins, and T. J. Gibson.
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.
Nuc. Ac. Res., 22:4673-4680, 1994.

45
S. R. Eddy.
HMMer2.
http://hmmer.wustl.edu/, 1999.

46
A. Krogh, M. Brown, I. S. Mian, K. Sjolander, and D. Haussler.
Hidden Markov models in computational biology.
J. Mol. Biol., 235:1501-1531, 1994.

47
R. Durbin, S. Eddy, A. Krogh, and G. Mitchison.
Biological sequence analysis.
Cambridge University Press, 1998.

48
J. C. Wootton and S. Federhen.
Statistics of local complexity in amino acid sequences and sequence databases.
Comput. Chem., 17:149-163, 1993.

49
D. T. Jones, W. R. Taylor, and J. M. Thornton.
A model recognition approach to the prediction of all-helical membrane protein structure and topology.
Biochemistry, 33:3038-3049, 1994.

50
A. Zell, G. Mamier, M. Vogt, N. Mache, R. Hubner, S. Doring, K. U. Herrmann, T. Soyez, T. Schmalzl, T. Sommer, A. Hatzigeorgiou, D. Posselt, T. Schreiner, B. Kett, G. Clemente, and J. Wieland.
The SNNS users manual version 4.1.
http://www.informatik.uni-stuttgart.de/ipvr/bv/projekte/snns/UserManual/User Manual.html, 1995.

51
M. F. Moller.
A scaled conjugate gradient algorithm for fast supervised learning.
Neural Networks, 6:525-533, 1993.

52
W. Kabsch and C. Sander.
A dictionary of protein secondary structure.
Biopolymers, 22:2577-2637, 1983.

53
G. D. Rose and J. E. Dworkin.
The hydrophobicity profile.
In G. D. Fasman, editor, Prediction of protein structure and the principles of protein conformation, chapter 15, pages 625-634. Plenum press, 233 Spring street, New York, NY, 10013, 1989.

54
B. Rost, C. Sander, and R. Schneider.
Redefining the goals of protein secondary structure prediction.
J. Mol. Biol., 235:13-26, 1994.

55
J. A. Cuff, M. E. Clamp, A. S. Siddiqui, M. Finlay, and G. J. Barton.
Jpred: a consensus secondary structure prediction server.
Bioinformatics, 14:892-893, 1998.

56
A. Zemla, C. Venclovas, K. Fidelis, and B. Rost.
A modified definition of SOV, a segment-based measure for protein secondary structure prediction assessment.
Proteins, 34:220-223, 1999.

57
F. C. Bernstein, T. F. Koetzle, G. J. B. Williams, E. F. Meyer Jr., M. D. Brice, J. R. Rodgers, O. Kennard, T. Shimanouchi, and M. Tasumi.
The protein data bank: A computer based archival file for macromolecular structures.
J. Mol. Biol., 112:535-542, 1977.

58
A. Wlodawer, R. Bott, and L. Sjolin.
The refined crystal structure of ribonuclease A at 2.0 Å resolution.
J. Biol. Chem., 257:1325-1332, 1982.

59
R. McGill, J. W. Tukey, and W. A. Larsen.
Variations of box plots.
The American Statistician, 32:12-16, 1978.

60
A. Lupas, M. Van Dyke, and J. Stock.
Predicting coiled coils from protein sequences.
Science, 252:1162-1164, 1991.

61
E. Wolf, P. S. Kim, and B. Berger.
MultiCoil: a program for predicting two- and three-stranded coiled coils.
Prot. Sci., 6:1178-1189, 1997.


 
Table 1: Comparison of BLAST alignments against those generated from a PSIBLAST search of the SWALL [43] database, compared to those for a database filtered to remove coiled-coils, low complexity segments and transmembrane helices (see text). Figures are Q3 accuracies based on a 7-fold cross-validated test on 480 proteins for the neural network prediction method.
Sequence searching method                                           Q3 accuracy
CLUSTALW alignments for BLAST against SWALL un-filtered database    70.5 %
CLUSTALW alignments for PSIBLAST against a filtered database        71.6 %
 


 
Table 2: Comparison of profiles for training neural networks, based on cross-validated predictions for 480 proteins. AMPS [42] was run with trees generated using the normalised alignment score from the pairwise sequence comparisons. In every run except `Alignments with gaps', columns with a gap in the primary sequence, together with any data directly below that gap in the alignment, were removed. For the `Alignments with gaps' run, the alignments were left unmodified. This run was no more accurate, and took twice as long to train the networks for prediction.
Method used to generate alignment profile                         Q3 accuracy
Simple frequency profiles, alignments from CLUSTALW               71.6%
Simple frequency profiles, alignments from AMPS                   69.5%
Blosum62 profiles, alignments from CLUSTALW                       70.6%
Alignments with gaps (frequency profiles scored from CLUSTALW)    70.5%
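
The gap removal and frequency profiles described in the caption of Table 2 can be illustrated with a minimal Python sketch. It is not the code used for this work: the function names, the toy alignment, and the choice to ignore gap characters when counting residues in a column are all assumptions made for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def strip_query_gaps(alignment):
    """Drop every column in which the first (primary) sequence has a gap."""
    keep = [i for i, ch in enumerate(alignment[0]) if ch != "-"]
    return ["".join(seq[i] for i in keep) for seq in alignment]


def frequency_profile(alignment):
    """Return one dict per column giving the fraction of each amino acid.

    Assumption: gaps and unknown characters are ignored when counting.
    """
    profile = []
    for column in zip(*alignment):
        counts = Counter(ch for ch in column if ch in AMINO_ACIDS)
        n = sum(counts.values())
        profile.append({aa: (counts[aa] / n if n else 0.0) for aa in AMINO_ACIDS})
    return profile


# Hypothetical four-sequence alignment; the first row is the primary sequence.
aln = ["MK-LV", "MR-LI", "MKALV", "-KGLV"]
stripped = strip_query_gaps(aln)
profile = frequency_profile(stripped)
```

Each profile column could then be presented to a network as a vector of 20 frequencies. The encoding actually used is described in the Methods, not here.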

 



 
Table 3: All values are for 7-fold cross-validation on 480 proteins.
Network                                              Q3 accuracy
Frequency profile alignments from CLUSTALW           71.6%
BLOSUM62 scored profile alignments from CLUSTALW     70.8%
PSIBLAST alignment profiles                          72.1%
Arithmetic sum based on the above 3 networks         73.4%
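
The last row of Table 3 combines the three networks by an arithmetic sum. The sketch below shows one way such a combination might look, assuming each network emits three scores per residue (helix, strand, coil) and the state with the highest summed score is chosen. The state labels, function name and example scores are hypothetical.

```python
STATES = "HE-"  # helix, strand, coil (labelling assumed for this sketch)


def combine_by_sum(network_outputs):
    """Sum per-residue state scores across networks and pick the top state.

    network_outputs: one entry per network, each a list of
    [helix, strand, coil] scores, one triple per residue.
    """
    n_residues = len(network_outputs[0])
    prediction = []
    for i in range(n_residues):
        totals = [sum(net[i][s] for net in network_outputs)
                  for s in range(len(STATES))]
        prediction.append(STATES[totals.index(max(totals))])
    return "".join(prediction)


# Hypothetical outputs of three networks for a three-residue sequence.
freq = [[0.6, 0.3, 0.1], [0.4, 0.5, 0.1], [0.2, 0.3, 0.5]]
blosum = [[0.5, 0.4, 0.1], [0.5, 0.3, 0.2], [0.1, 0.4, 0.5]]
psiblast = [[0.7, 0.2, 0.1], [0.3, 0.3, 0.4], [0.3, 0.2, 0.5]]
combined = combine_by_sum([freq, blosum, psiblast])
```

In this toy case the second residue is called strand by the first network alone, but the summed scores favour helix, which is the kind of correction a sum over networks can provide.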
 


 
Table 4: Improving the Jnet method through the use of different scoring methods and alignment approaches. These figures were generated from cross-validated predictions of the 480 non-redundant test set proteins.
Matrix scoring and alignment method                    Q3 accuracy (%)
BLOSUM62 profile CLUSTALW                              70.8
Frequency profile CLUSTALW                             71.6
Frequency profile PSIBLAST                             72.1
HMMER Profile CLUSTALW                                 74.4
HMMER Profile Iterative Alignment (see Figure 2)       74.3
PSSM PSIBLAST                                          75.2
Numerical average of HMMER and PSSM PSIBLAST           76.5
Jury/No Jury network (see Figures 3 and 1)             76.9
 


 
Table 5: Comparison of prediction methods. Tested on the 406 new protein structures not used in the development of the Jnet method (see Methods).
Prediction method     Q3 accuracy (%)
Zpred [6]             62.0
DSC [13]              70.6
PREDATOR [25]         70.7
NNSSP [23]            72.3
PHD [17]              73.3
Jpred [34]            74.6
Jnet (this work)      76.4

 
 


 
Table 6: Average prediction accuracies from 7-fold cross-validation experiments (based on the 480 protein set) for 2-state solvent accessibility. `Rel. Acc.' corresponds to three thresholds of relative solvent accessibility: 25%, 5% and 0%. For the combined method, a simple arithmetic sum of the PSIBLAST and HMMER2 network outputs was applied; the change in brackets is relative to the PSIBLAST network alone.
Rel. Acc. (%)    PSIBLAST (%)    HMMER2 (%)    Combined [change] (%)
25%              75.0            74.2          76.2 [+1.2]
5%               79.0            78.8          79.8 [+0.8]
0%               86.6            86.3          86.5 [-0.1]

     
 


 
Table 7: Improvement assessment between Jnet (this work) and PHD [17], broken down into helix, strand, coil and SOV accuracies. Predictions are for the 406 proteins not used to develop the Jnet method.
Measurement of Accuracy        Jnet (%)    PHD (%)    Improvement (%)
Q3                             76.4        73.3       +3.1
$\alpha$-helix accuracy        78.4        76.8       +1.6
$\beta$-strand accuracy        63.9        63.8       +0.1
coil accuracy                  80.6        76.5       +4.1
Sov2 [56]                      74.2        69.8       +4.4
SOV ($\delta$=0%) [54]         61.6        57.8       +3.8
SOV ($\delta$=50%) [54]        82.9        79.6       +3.3
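
For readers unfamiliar with the per-residue measures in Table 7, the following Python sketch shows how Q3 and a per-state accuracy might be computed from a predicted and an observed three-state string. The per-state definition used here (the percentage of residues observed in a state that are predicted in that state) is an assumption, since the table does not spell it out. The segment-based SOV measures [54,56] are not covered.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def per_state_accuracy(predicted, observed, state):
    """Percentage of residues observed in `state` that are predicted as `state`.

    This definition is assumed for illustration only.
    """
    positions = [i for i, o in enumerate(observed) if o == state]
    if not positions:
        raise ValueError("state does not occur in the observed string")
    hits = sum(predicted[i] == state for i in positions)
    return 100.0 * hits / len(positions)


# Hypothetical ten-residue example: H = helix, E = strand, - = coil.
observed = "HHHHEEEE--"
predicted = "HHHEEEE---"
q3 = q3_accuracy(predicted, observed)
helix = per_state_accuracy(predicted, observed, "H")
strand = per_state_accuracy(predicted, observed, "E")
coil = per_state_accuracy(predicted, observed, "-")
```

The `Improvement' column of Table 7 is then simply the Jnet value minus the PHD value for each measure.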

     
 


  
Figure 1: Outline of the final neural network method incorporated into the Jnet method.
[Figure image not rendered: pics/network.ps]


Figure 2: [caption truncated in source] ...ed proteins from a profile based search.


  
Figure 3: Positions where there is `no jury' (predictions do not all agree) are marked with a (*). These positions are re-predicted with a further neural network (the jury network; see Figure 1). This network has only been trained with positions where there is `no jury'. Also shown is the relative solvent accessibility prediction at 25, 5 and 0% relative accessibility. `B' corresponds to buried residues, with `-' corresponding to exposed residues. The protein shown is Ribonuclease A (PDB [57] code 7rsa [58]). DSSP [52] relative solvent accessibility and secondary structure definitions are shown. Cross-validated prediction accuracy is 72.5% for this protein. (For reference, PHD [17] predicts this protein at 68.5% accuracy.)
[Preformatted prediction output not rendered; only its tail survives: ...B-----BBB-BBB--]
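
The `no jury' marking described in the caption of Figure 3 can be sketched as follows. This is an illustrative Python fragment, not the Jnet implementation: the function names and the example prediction strings are hypothetical, and the jury network that re-predicts the marked positions is not modelled.

```python
def no_jury_positions(predictions):
    """Return the indices at which the individual predictions do not all agree."""
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    return [i for i in range(length)
            if len({p[i] for p in predictions}) > 1]


def jury_marker_line(predictions):
    """Build a line with '*' under every `no jury' position, spaces elsewhere."""
    marked = set(no_jury_positions(predictions))
    return "".join("*" if i in marked else " "
                   for i in range(len(predictions[0])))


# Hypothetical three-state predictions from three networks.
preds = ["HHHH--EE", "HHH---EE", "HHHH--E-"]
positions = no_jury_positions(preds)
marker = jury_marker_line(preds)
```

Only the positions returned by `no_jury_positions` would be passed on for re-prediction. Positions where the predictions agree keep the agreed state.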


  
Figure 4: Boxplots [59] of per-protein average secondary structure prediction accuracy (Q3) for each of the secondary structure prediction methods. Predictions are for the 406-protein blind test set. Boxplots show the variability of the median (white line); the dark box shows the limits of the middle half of the data. The upper and lower brackets mark the upper and lower quartiles. In this case the extreme data (outliers) have been removed for clarity.
[Figure image not rendered: pics/dist.ps]


  
Figure 5: Average secondary structure prediction accuracy (Q3), and percentage of residues, against cumulative reliability score from the Jnet method. For example, for residues with reliability scores greater than or equal to 9, the average accuracy is 92.9%, and the percentage of residues with this score is 20.8%. Predictions are for the 406 blind test set proteins.
[Figure image not rendered: pics/reliability.ps]
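
The cumulative quantities plotted in Figure 5 can be computed along the following lines. This is a minimal Python sketch, assuming integer reliability scores from 0 to 9 and a per-residue flag saying whether the prediction was correct. The function name and the toy data are hypothetical and do not reproduce the values in the figure.

```python
def cumulative_reliability(reliability, correct, max_score=9):
    """For each threshold t, return (t, accuracy %, coverage %).

    Accuracy and coverage are taken over residues with score >= t.
    """
    total = len(reliability)
    if total == 0 or total != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")
    rows = []
    for t in range(max_score + 1):
        selected = [c for r, c in zip(reliability, correct) if r >= t]
        if selected:
            accuracy = 100.0 * sum(selected) / len(selected)
        else:
            accuracy = float("nan")  # no residue reaches this threshold
        coverage = 100.0 * len(selected) / total
        rows.append((t, accuracy, coverage))
    return rows


# Hypothetical scores and correctness flags for ten residues.
scores = [9, 9, 7, 5, 3, 9, 2, 8, 6, 0]
flags = [True, True, True, False, False, True, False, True, True, False]
table = cumulative_reliability(scores, flags)
```

Reading the last row of `table` gives the accuracy and coverage for residues with a score of 9 or more, which is the kind of pair quoted in the caption.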


  
Figure 6: Jnet reliabilities compared to PHD reliabilities
[Figure image not rendered. Surviving caption fragment: "... confidence of greater or equal to the values shown on the x axis"]


  
Figure 7: Jnet residue coverage against reliability compared to PHD
[Figure image not rendered. Surviving caption fragment: "...a confidence of greater or equal to the values shown on the x axis"]


  
Figure 8: Predictions from the improved Jpred2 server
[Figure image not rendered. Surviving caption fragment: "... predictions by COILS [60] and MultiCoil [61]"]


James Cuff
2001-06-29