
Bibliography

1
A. Sali.
Modelling mutations and homologous proteins.
Current Opinion in Biotechnology, 6:437-451, 1995.

2
S. E. Brenner, C. Chothia, and T. J. P. Hubbard.
Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships.
Proc. Nat. Acad. Sci., 95:6073-6078, 1998.

3
W. R. Taylor.
Identification of protein sequence homology by consensus template alignment.
J. Mol. Biol., 188:233-258, 1986.

4
M. Gribskov, A. D. McLachlan, and D. Eisenberg.
Profile analysis: Detection of distantly related proteins.
Proc. Nat. Acad. Sci., 84:4355-4358, 1987.

5
G. J. Barton.
Protein multiple sequence alignment and flexible pattern matching.
Meth. Enz., 183:403-428, 1990.

6
G. J. Barton and M. J. E. Sternberg.
A strategy for the rapid multiple alignment of protein sequences: Confidence levels from tertiary structure comparisons.
J. Mol. Biol., 198:327-337, 1987.

7
S. R. Eddy.
Hidden Markov models.
Current Opinion Structural Biol., 6:361-365, 1996.

8
K. Karplus, K. Sjolander, C. Barrett, M. Cline, D. Haussler, R. Hughey, L. Holm, and C. Sander.
Predicting protein structure using hidden Markov models.
Proteins, Suppl. 1:134-139, 1997.

9
J. Park, S. A. Teichmann, T. Hubbard, and C. Chothia.
Intermediate sequences increase the detection of distant sequence homologues.
J. Mol. Biol., 273:349-354, 1997.

10
D. T. Jones, W. R. Taylor, and J. M. Thornton.
A new approach to protein fold recognition.
Nature, 358:86-89, 1992.

11
C. Lemer, M. J. Rooman, and S. J. Wodak.
Protein structure prediction by threading methods: Evaluation of current techniques.
Proteins, 23:337-355, 1996.

12
R. B. Russell and G. J. Barton.
An SH2-SH3 domain hybrid.
Nature, 364:765, 1993.

13
R. B. Russell, R. R. Copley, and G. J. Barton.
Protein fold recognition by mapping predicted secondary structures.
J. Mol. Biol., 259:349-365, 1996.

14
B. Rost.
TOPITS: Threading one-dimensional predictions into three-dimensional structures.
Proc. 3rd. Int. Conf. Intel. Sys. Mol. Biol., pages 314-321, 1995.

15
B. Rost.
Protein fold recognition by prediction-based threading.
J. Mol. Biol., 270:1-10, 1997.

16
V. I. Lim.
Algorithms for prediction of $\alpha$ helices and $\beta$ structural regions in globular proteins.
J. Mol. Biol., 88:873-894, 1974.

17
P. Y. Chou and G. D. Fasman.
Conformational parameters for amino acids in helical, $\beta$-sheet, and random coil regions calculated from proteins.
Biochem., 13:211-222, 1974.

18
J. Garnier, D. J. Osguthorpe, and B. Robson.
Analysis and implications of simple methods for predicting the secondary structure of globular proteins.
J. Mol. Biol., 120:97-120, 1978.

19
G. E. Schulz and R. H. Schirmer.
Principles of Protein Structure.
Springer-Verlag, New York, 1979.

20
W. Kabsch and C. Sander.
How good are predictions of protein secondary structure?
FEBS Letters, 155:179-182, 1983.

21
C. D. Livingstone and G. J. Barton.
Identification of functional residues and secondary structure from protein multiple sequence alignment.
Meth. Enz., 266:497-512, 1996.

22
I. P. Crawford, T. Niermann, and K. Kirschner.
Prediction of secondary structure by evolutionary comparison: Application to the alpha subunit of tryptophan synthase.
Proteins, 2:118-129, 1987.

23
G. J. Barton, R. H. Newman, P. F. Freemont, and M. J. Crumpton.
Amino acid sequence analysis of the annexin super-gene family of proteins.
European J. Biochem., 198:749-760, 1991.

24
R. B. Russell, J. Breed, and G. J. Barton.
Conservation analysis and structure prediction of the SH2 family of phosphotyrosine binding domains.
FEBS Letters, 304:15-20, 1992.

25
S. A. Benner and D. Gerloff.
Patterns of divergence in homologous proteins as indicators of secondary and tertiary structure: A prediction of the structure of the catalytic domain of protein kinases.
Adv. Enz. Reg., 31:121-181, 1990.

26
M. J. J. M. Zvelebil, G. J. Barton, W. R. Taylor, and M. J. E. Sternberg.
Prediction of protein secondary structure and active sites using the alignment of homologous sequences.
J. Mol. Biol., 195:957-961, 1987.

27
B. Rost and C. Sander.
Prediction of protein secondary structure at better than 70% accuracy.
J. Mol. Biol., 232:584-599, 1993.

28
D. Frishman and P. Argos.
Seventy-five percent accuracy in protein secondary structure prediction.
Proteins, 27:329-335, 1997.

29
R. D. King and M. J. E. Sternberg.
Identification and application of the concepts important for accurate and reliable protein secondary structure prediction.
Prot. Sci., 5:2298-2310, 1996.

30
A. A. Salamov and V. V. Solovyev.
Prediction of protein secondary structure by combining nearest-neighbor algorithms and multiple sequence alignments.
J. Mol. Biol., 247:11-15, 1995.

31
Proteins, Suppl. 1:1-230, 1997.

32
B. Rost.
Better 1D predictions by experts with machines.
Proteins, Suppl. 1:192-197, 1997.

33
V. Biou, J. F. Gibrat, B. Robson, and J. Garnier.
Secondary structure prediction: combination of three different methods.
Prot. Eng., 2:185-191, 1995.

34
X. Zhang, J. Mesirov, and D. Waltz.
A hybrid system for protein secondary structure prediction.
J. Mol. Biol., 225:1049-1063, 1992.

35
K. Nishikawa and T. Ooi.
Amino acid sequence homology applied to the prediction of protein secondary structures, and joint prediction with existing methods.
Biochim. Biophys. Acta, 871:45-54, 1986.

36
K. Nishikawa and T. Noguchi.
Predicting protein secondary structure based on amino acid sequence.
Meth. Enz., 202:31-44, 1995.

37
C. Geourjon and G. Deleage.
SOPMA: Significant improvements in protein secondary structure prediction by consensus prediction from multiple alignments.
Comp. App. Biosci., 11:681-684, 1995.

38
W. Kabsch and C. Sander.
A dictionary of protein secondary structure.
Biopolymers, 22:2577-2637, 1983.

39
F. M. Richards and C. E. Kundrot.
Identification of structural motifs from protein coordinate data: secondary structure and first-level supersecondary structure.
Proteins, 3:71-84, 1988.

40
D. Frishman and P. Argos.
Knowledge-based protein secondary structure assignment.
Proteins, 23:566-579, 1995.

41
P. E. Boscott, G. J. Barton, and W. G. Richards.
Secondary structure prediction for modelling by homology.
Prot. Eng., 6:261-266, 1993.

42
C. Sander and R. Schneider.
Database of homology-derived protein structures and the structural meaning of sequence alignment.
Proteins, 9:56-68, 1991.

43
D. F. Feng, M. S. Johnson, and R. F. Doolittle.
Aligning amino acid sequences: comparison of commonly used methods.
J. Mol. Evol., 21:112-125, 1985.

44
S. B. Needleman and C. D. Wunsch.
A general method applicable to the search for similarities in the amino acid sequence of two proteins.
J. Mol. Biol., 48:443-453, 1970.

45
A. Siddiqui and G. J. Barton.
3Dee -- database of protein domain definitions.
Submitted, 1998.

46
G. J. Barton and M. J. E. Sternberg.
Evaluation and improvements in the automatic alignment of protein sequences.
Prot. Eng., 1:89-94, 1987.

47
A. Murzin, S. E. Brenner, T. Hubbard, and C. Chothia.
SCOP: A structural classification of proteins database for the investigation of sequences and structures.
J. Mol. Biol., 247:536-540, 1995.

48
M. Newman, C. Frazao, G. Khan, I. J. Tickle, T. L. Blundell, M. Safro, N. Andreeva, and A. Zdanov.
X-ray analyses of Aspartic Proteinases. Structure and refinement at 2.2 Angstroms resolution of Bovine Chymosin.
J. Mol. Biol., 221:1295, 1991.

49
A. Sali, B. Veerapandian, J. B. Cooper, S. I. Foundling, D. J. Hoover, and T. L. Blundell.
High resolution X-ray diffraction study of the complex between endothiapepsin and an oligopeptide inhibitor. The analysis of the inhibitor binding and description of the rigid body shift in the enzyme.
EMBO J., 8:2179, 1989.

50
Y. Satow, G. H. Cohen, E. A. Padlan, and D. R. Davies.
Phosphocholine binding Immunoglobulin study at 2.7 angstroms.
J. Mol. Biol., 190:593, 1987.

51
M. Bolognesi, G. Gatti, E. Menegatti, M. Guarneri, M. Marquart, E. Papamokos, and R. Huber.
Three dimensional structure of the complex between pancreatic secretory inhibitor (Kazal type) and trypsinogen at 1.8 Angstroms resolution.
J. Mol. Biol., 162:839, 1982.

52
R. B. Honzatko, W. A. Hendrickson, and W. E. Love.
Refinement of a molecular model for Lamprey Hemoglobin from Petromyzon marinus.
J. Mol. Biol., 184:147, 1985.

53
T. P. J. Garrett, J. M. Guss, and H. C. Freeman.
The crystal structure of Poplar Apoplastocyanin at 1.8 Angstroms resolution.
J. Biol. Chem., 259:2822, 1984.

54
J. L. Smith, P. W. R. Corfield, W. A. Hendrickson, and B. W. Low.
Refinement at 1.4 Angstroms resolution of a model of Erabutoxin B. Treatment of ordered solvent and discrete disorder.
Acta Cryst., 44:357, 1988.

55
V. D. Kumar, L. Lee, and B. F. P. Edwards.
Refined crystal structure of Calcium liganded Carp Parvalbumin 4.25 at 1.5 Angstroms resolution.
Biochem., 29:1404, 1990.

56
P. M. D. Fitzgerald, B. M. McKeever, J. F. Van Middlesworth, and J. P. Springer.
Crystallographic analysis of a complex between Human Immunodeficiency Virus Type 1 Protease and Acetyl Pepstatin at 2.0 Angstroms resolution.
J. Biol. Chem., 265:14209, 1990.

57
F. A. Quiocho, D. K. Wilson, and N. K. Vyas.
Substrate specificity and affinity of a protein modulated by bound water molecules.
Nature, 340:404, 1989.

58
D. Frishman and P. Argos.
Incorporation of non-local interactions in protein secondary structure prediction from the amino acid sequence.
Prot. Eng., 9:133-142, 1996.

59
S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman.
Basic local alignment search tool.
J. Mol. Biol., 215:403-410, 1990.

60
A. J. Bleasby, D. Akrigg, and T. K. Attwood.
OWL -- A non-redundant, composite protein sequence database.
Nuc. Ac. Res., 22:3574-3577, 1994.

61
T. F. Smith and M. S. Waterman.
Identification of common molecular subsequences.
J. Mol. Biol., 147:195-197, 1981.

62
G. J. Barton.
Alscript: A tool to format multiple sequence alignments.
Prot. Eng., 6:37-40, 1993.

63
J. D. Thompson, D. G. Higgins, and T. J. Gibson.
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.
Nuc. Ac. Res., 22:4673-4680, 1994.

64
B. Rost, C. Sander, and R. Schneider.
Redefining the goals of protein secondary structure prediction.
J. Mol. Biol., 235:13-26, 1994.

65
S. M. Weiss and C. A. Kulikowski.
San Mateo, 1991.

66
J. U. Bowie, R. Luthy, and D. Eisenberg.
A method to identify protein sequences that fold into a known three-dimensional structure.
Science, 253:164-170, 1991.

67
X. Huang and W. A. Miller.
Adv. Appl. Math., 12:337-357, 1991.

68
W. R. Taylor.
Classification of amino acid conservation.
J. Theor. Biol., 119:205-218, 1986.

69
G. D. Rose.
Prediction of chain turns in globular proteins on a hydrophobic basis.
Nature, 272:586-591, 1978.

70
A. C. M. Wilmot and J. M. Thornton.
Analysis and prediction of the different types of beta-turn in proteins.
J. Mol. Biol., 203:221-232, 1988.

71
H. Sklenar, C. Etchebest, and R. Lavery.
Proteins, 6:46-60, 1989.

72
J. M. Levin.
Exploring the limits of nearest neighbour secondary structure prediction.
Prot. Eng., 10:771-776, 1997.

73
C. Geourjon and G. Deleage.
SOPM: a self optimised method for protein secondary structure prediction.
Prot. Eng., 7:157-164, 1994.

74
J. Garnier, J. Gibrat, and B. Robson.
GOR method for predicting protein secondary structure from amino acid sequence.
Meth. Enz., 266:540-553, 1996.

75
W. Steigemann and E. Weber.
Structure of Erythrocruorin in different ligand states refined at 1.4 Angstroms resolution.
J. Mol. Biol., 127:309, 1979.

76
E. T. Adman, L. C. Sieker, and L. H. Jensen.
Structural features of Azurin at 2.7 Angstroms.
Isr. J. Chem., 21:8, 1981.

77
A. Wlodawer, M. Miller, and M. Jaskolski.
Crystal structure of a retroviral protease proves relationship to aspartic protease family.
Nature, 337:576, 1989.

78
K. Petratos, Z. Dauter, and K. S. Wilson.
Refinement of the structure of Pseudoazurin from Alcaligenes faecalis S-6 at 1.55 Angstroms.
Acta Cryst., 44:628, 1988.

79
W. E. Royer, Jr.
High resolution crystallographic analysis of a cooperative dimeric Haemoglobin.
J. Mol. Biol., 657:657, 1994.

80
B. Rees, A. Bilwes, J. P. Samama, and D. Moras.
Cardiotoxin from Naja mossambica: The refined crystal structure.
J. Mol. Biol., 214:281, 1990.

81
Y. S. Babu, C. E. Bugg, and W. J. Cook.
Structure of Calmodulin refined at 2.2 Angstroms resolution.
J. Mol. Biol., 204:191, 1988.

82
N. K. Vyas, M. N. Vyas, and F. A. Quiocho.
Sugar and signal transducer binding sites of the Escherichia coli galactose chemoreceptor protein.
Science, 242:1290, 1988.

83
E. Weber, E. Papamokos, W. Bode, R. Huber, I. Kato, and M. Laskowski.
Ovomucoid, a Kazal-type inhibitor, and model building studies of complexes with serine proteases.
J. Mol. Biol., 158:515, 1982.

84
T. O. Fischmann and R. J. Poljak.
Crystallographic refinement of the three-dimensional structure of FAB D1.2 Lysozyme complex at 2.5 Angstroms.
J. Biol. Chem., 266:12915, 1991.


 
Table 1: Pairs in the RS126 set that have an SD score of greater than 5. Alignments were generated by the AMPS package[6] with a BLOSUM62 matrix, a gap penalty of 10, and 100 randomisations. Fold definitions come from the current release (1.37) of the SCOP database[47].

Protein (1)  Protein (2)  SD score  Fold (1)              Fold (2)
1eca[75]     2lhb[52]      5.12     Globin-like           Globin-like
1azu[76]     2pcy[53]      5.40     Cupredoxins           Cupredoxins
2rspa[77]    5hvpa[56]     5.81     Acid proteases        Acid proteases
1paz[78]     2pcy[53]      7.22     Cupredoxins           Cupredoxins
2lhb[52]     4sdha[79]     7.70     Globin-like           Globin-like
1cdta[80]    3ebx[54]      8.26     Snake toxin-like      Snake toxin-like
3cln[81]     4cpv[55]      8.27     EF Hand-like          EF Hand-like
2gbp[82]     8abp[57]      8.86     Periplasmic binding   Periplasmic binding
1ovoa[83]    1tgsi[51]     9.45     Ovomucoid/PCI-1 like  Ovomucoid/PCI-1 like
1fdlh[84]    1mcpl[50]    12.66     Immunoglobulin        Immunoglobulin
4cms[48]     5er2e[49]    15.98     Acid proteases        Acid proteases
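
The SD score reported in Table 1 is conventionally a Z-score: the score of the real alignment minus the mean score obtained from alignments of randomised (shuffled) sequences, divided by the standard deviation of those randomised scores. The following is a minimal sketch of that arithmetic only, assuming the alignment scores have already been computed elsewhere. The name sd_score and its arguments are hypothetical and are not part of the AMPS package, and the choice of the sample standard deviation is an assumption.

```python
import statistics


def sd_score(native_score, randomised_scores):
    """Return the alignment score expressed in standard deviations
    above the mean of the randomised-sequence scores.

    native_score      -- score of the real pairwise alignment
    randomised_scores -- scores from aligning shuffled sequences
                         (Table 1 used 100 randomisations)
    """
    if len(randomised_scores) < 2:
        raise ValueError("need at least two randomised scores")
    mean = statistics.mean(randomised_scores)
    # Assumption: sample standard deviation (n - 1 denominator).
    sd = statistics.stdev(randomised_scores)
    if sd == 0:
        raise ValueError("randomised scores have zero spread")
    return (native_score - mean) / sd
```

A pair would be listed in Table 1 when this value exceeds 5.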

 



 
Table 2: Percentages of secondary structural state per secondary structure definition method
  DSSP[38] STRIDE[40] DEFINE[39]
Helix 28.9 29.8 30.2
Sheet 22.9 24.4 30.0
Coil 48.1 45.8 39.7
 


 
Table 3: Ranges of length for secondary structural elements as defined by DSSP[38], STRIDE[40] and DEFINE[39] for the RS126 set
State   Method      Min  Mean  Max  Total number of secondary structures
Helix   DSSP[38]      3     9   54   817
Helix   STRIDE[40]    2    10   51   753
Helix   DEFINE[39]    5    14   65   553
Strand  DSSP[38]      1     4   19  1302
Strand  STRIDE[40]    1     4   19  1303
Strand  DEFINE[39]    4     6   26  1030
 


 
Table 4: Summary statistics of the alignments used in the predictions
           Ave. % ID between sequences  Ave. sequence length  Ave. No. of sequences per alignment
CB396 set  34                           157 residues          18
RS126 set  31                           185 residues          30
 


 
Table 5: Data for class types used for the predictions
Class definition RS126 set No. (%) CB396 set No. (%)
Alpha and beta (a/b) 25 (20) 107 (27)
Alpha and beta (a+b) 17 (13) 101 (26)
All alpha 27 (21) 68 (17)
All beta 38 (20) 78 (20)
Multi-domain 3 ( 2) 0 ( 0)
Small proteins 18 (14) 27 ( 7)
Membrane 1 ($\le$1) 3 ($\le$1)
Peptides 1 ($\le$1) 12 ( 3)
 


 
Table 6: Q3 and segment overlap (SOV) results for the RS126 and CB396 protein sets

Method        RS126 protein set   CB396 protein set
              Q3      SOV         Q3      SOV
PHD[64]       73.5    73.5        71.9    75.3
DSC[29]       71.1    71.6        68.4    72.0
PREDATOR[28]  70.3    69.9        68.6    69.8
NNSSP[30]     72.7    70.6        71.4    71.3
CONSENSUS     74.8    74.5        72.9    75.4
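
Q3 is the percentage of residues whose predicted three-state label (helix, strand or coil) matches the observed one. The sketch below shows that calculation for a single pair of equal-length state strings. The function name q3 and the one-letter state encoding are assumptions made for illustration. Whether the values in Table 6 were pooled over all residues or averaged per protein is not stated here, so the sketch does not attempt to reproduce them.

```python
def q3(predicted, observed):
    """Percentage of positions where the predicted state equals the
    observed state. Both arguments are strings over a three-letter
    alphabet such as 'H' (helix), 'E' (strand) and 'C' (coil)."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    # Count matching positions, then express as a percentage.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

For example, a six-residue prediction with one wrong state scores five correct out of six.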
 


 
Table 7: Family size for the automatically generated alignments for the RS126 protein set, considering 2 levels of BLAST[59] p-value cutoff
                                        p-value cutoff $10^{-10}$  p-value cutoff $10^{-2}$
Total Number of Residues                1716356                    2013632
Total Number of Sequences               7013                       8974
Average Number of Sequences per Family  55.6                       71.2
 


 
Table 8: Comparison of Q3 accuracy when the BLAST[59] p-value cutoff is relaxed from $10^{-10}$ to $10^{-2}$ with the RS126 set. (The alignments used for these predictions did not use a percentage identity filter)
              p-value cutoff $10^{-10}$  p-value cutoff $10^{-2}$
Method        DSSP    STRIDE             DSSP    STRIDE
PHD[64]       72.4    72.4               73.2    73.2
DSC[29]       70.2    70.0               71.0    70.7
PREDATOR[58]  69.7    69.3               70.7    70.3
NNSSP[30]     71.8    71.2               72.4    71.7
MULPRED       66.7    65.4               67.2    66.8
ZPRED[26]     65.5    64.7               66.7    65.9
CONSENSUS     73.9    73.7               74.5    74.3
 


 
Table 9: Effect on Q3 accuracy of removing all sequences similar to the query at different % identity thresholds, using the RS126 protein set
  100% 95% 80% 75% 60%
PHD[64] 73.2 73.3 73.4 73.5 73.3
DSC[29] 71.0 71.0 71.0 71.1 70.9
PREDATOR[28] 70.7 70.4 70.1 70.3 70.4
NNSSP[30] 72.4 72.5 72.7 72.7 72.8
CONSENSUS 74.5 74.5 74.6 74.8 74.7
Total No. of Sequences 8974 5907 4320 3833 2681
Ave No. of Sequences/Alignment 71 47 34 30 20
 


 
Table 10: Mean percentages of secondary structure state defined by DSSP[38] when different 8 to 3 state reduction methods are used
  Mean % of Helix Mean % of Sheet Mean % of Coil
Method A 28.9 22.9 48.1
Method B 25.3 21.2 52.6
Method C 25.6 21.2 52.3
 


 
Table 11: Effect of changing the DSSP 8 to 3 state reduction on the resultant Q3 accuracy for the consensus method, based on the RS126 set of proteins
Change Q3
Reduction method A[64] 74.8
B $\rightarrow$ Coil only 75.7
G $\rightarrow$ Coil only 76.6
B and G $\rightarrow$ Coil 77.5
GGGHHHH $\rightarrow$ HHHHHHH, B and G $\rightarrow$ Coil (Method C)[30] 77.5
B, G and HHHH EE $\rightarrow$ Coil (Method B)[58] 77.9
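
The simpler rows of Table 11 can be expressed as a per-residue mapping from DSSP's eight states to three. The sketch below is illustrative only. It assumes that reduction method A maps H and G to helix, E and B to strand, and every other state to coil; this base mapping is inferred from the table rows and is not spelled out here. The names BASE_MAP and reduce_states are hypothetical. The segment-level rules of methods B and C (HHHH and EE to coil, GGGHHHH to HHHHHHH) are not implemented.

```python
# Assumed base mapping for reduction method A: H and G become helix,
# E and B become strand, every other DSSP state becomes coil.
BASE_MAP = {"H": "H", "G": "H", "E": "E", "B": "E"}


def reduce_states(dssp, to_coil=""):
    """Reduce a DSSP eight-state string to three states (H, E, C).

    to_coil -- DSSP states forced to coil before the base mapping is
               applied, e.g. "B", "G" or "BG" for the corresponding
               rows of Table 11.
    """
    reduced = []
    for state in dssp:
        if state in to_coil:
            reduced.append("C")
        else:
            reduced.append(BASE_MAP.get(state, "C"))
    return "".join(reduced)
```

Calling reduce_states(s) gives the assumed method A reduction, and reduce_states(s, to_coil="BG") gives the "B and G to Coil" variant.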
 


 
Table 12: Results for the RS126 protein set when the secondary structure definitions are reduced to 3 states by methods A and B
Method DSSP (A) STRIDE (A) DSSP (B) STRIDE (B)
PHD[64] 73.5 73.5 76.3 76.3
DSC[29] 71.1 70.9 73.3 73.4
PREDATOR[28] 70.3 69.6 75.2 74.0
NNSSP[30] 72.7 72.2 77.3 76.5
CONSENSUS 74.8 74.7 77.9 77.9
 


 
Table 13: Results for single sequence prediction methods via a full jack-knife test. The column 'Author' is the authors' own jack-knife value for the method, obtained with their dataset and definition reduction method. All results are calculated using reduction method A, and also converting G and B states to coil. For PHD[64] the authors quote 71.6% as their cross-validated accuracy. However, G and B states were considered in the accuracy calculation for PHD[64]
Method RS126 CB396 Author
PHD[64] 76.3 74.2 -
SIMPA[72] 67.3 67.6 67.7
GOR IV[74] 53.3 64.6 64.4
SOPM[73] 66.8 64.7 69.0
 


  
Figure 1: Comparison of segment length distributions for each definition method
[Figure: ms98188_fig.ps]


james@ebi.ac.uk