Bibliography

1: A. Sali.
Modelling mutations and homologous proteins.
Current Opinion in Biotechnology, 6:437-451, 1995.
2: S. E. Brenner, C. Chothia, and T. J. P. Hubbard.
Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships.
Proc. Nat. Acad. Sci., 95:6073-6078, 1998.
3: W. R. Taylor.
Identification of protein sequence homology by consensus template alignment.
J. Mol. Biol., 188:233-258, 1986.
4: M. Gribskov, A. D. McLachlan, and D. Eisenberg.
Profile analysis: Detection of distantly related proteins.
Proc. Nat. Acad. Sci., 84:4355-4358, 1987.
5: G. J. Barton.
Protein multiple sequence alignment and flexible pattern matching.
Meth. Enz., 183:403-428, 1990.
6: G. J. Barton and M. J. E. Sternberg.
A strategy for the rapid multiple alignment of protein sequences: Confidence levels from tertiary structure comparisons.
J. Mol. Biol., 198:327-337, 1987.
7: S. R. Eddy.
Hidden Markov models.
Current Opinion in Structural Biology, 6:361-365, 1996.
8: K. Karplus, K. Sjolander, C. Barrett, M. Cline, D. Haussler, R. Hughey, L. Holm, and C. Sander.
Predicting protein structure using hidden Markov models.
Proteins, Suppl. 1:134-139, 1997.
9: J. Park, S. A. Teichmann, T. Hubbard, and C. Chothia.
Intermediate sequences increase the detection of distant sequence homologues.
J. Mol. Biol., 273:349-354, 1997.
10: D. T. Jones, W. R. Taylor, and J. M. Thornton.
A new approach to protein fold recognition.
Nature, 358:86-89, 1992.
11: C. Lemer, M. J. Rooman, and S. J. Wodak.
Protein structure prediction by threading methods: Evaluation of current techniques.
Proteins, 23:337-355, 1995.
12: R. B. Russell and G. J. Barton.
An SH2-SH3 domain hybrid.
Nature, 364:765, 1993.
13: R. B. Russell, R. R. Copley, and G. J. Barton.
Protein fold recognition by mapping predicted secondary structures.
J. Mol. Biol., 259:349-365, 1996.
14: B. Rost.
TOPITS: Threading one-dimensional predictions into three-dimensional structures.
Proc. 3rd Int. Conf. Intel. Sys. Mol. Biol., pages 314-321, 1995.
15: B. Rost.
Protein fold recognition by prediction-based threading.
J. Mol. Biol., 270:1-10, 1997.
16: V. I. Lim.
Algorithms for prediction of α-helical and β-structural regions in globular proteins.
J. Mol. Biol., 88:873-894, 1974.
17: P. Y. Chou and G. D. Fasman.
Conformational parameters for amino acids in helical, β-sheet, and random coil regions calculated from proteins.
Biochem., 13:211-222, 1974.
18: J. Garnier, D. J. Osguthorpe, and B. Robson.
Analysis and implications of simple methods for predicting the secondary structure of globular proteins.
J. Mol. Biol., 120:97-120, 1978.
19: G. E. Schulz and R. H. Schirmer.
Principles of Protein Structure.
Springer-Verlag, New York, 1979.
20: W. Kabsch and C. Sander.
How good are predictions of protein secondary structure?
FEBS Letters, 155:179-182, 1983.
21: C. D. Livingstone and G. J. Barton.
Identification of functional residues and secondary structure from protein multiple sequence alignment.
Meth. Enz., 266:497-512, 1996.
22: I. P. Crawford, T. Niermann, and K. Kirschner.
Prediction of secondary structure by evolutionary comparison: Application to the alpha subunit of tryptophan synthase.
Proteins, 2:118-129, 1987.
23: G. J. Barton, R. H. Newman, P. F. Freemont, and M. J. Crumpton.
Amino acid sequence analysis of the annexin super-gene family of proteins.
European J. Biochem., 198:749-760, 1991.
24: R. B. Russell, J. Breed, and G. J. Barton.
Conservation analysis and structure prediction of the SH2 family of phosphotyrosine binding domains.
FEBS Letters, 304:15-20, 1992.
25: S. A. Benner and D. Gerloff.
Patterns of divergence in homologous proteins as indicators of secondary and tertiary structure: A prediction of the structure of the catalytic domain of protein kinases.
Adv. Enz. Reg., 31:121-181, 1990.
26: M. J. J. M. Zvelebil, G. J. Barton, W. R. Taylor, and M. J. E. Sternberg.
Prediction of protein secondary structure and active sites using the alignment of homologous sequences.
J. Mol. Biol., 195:957-961, 1987.
27: B. Rost and C. Sander.
Prediction of protein secondary structure at better than 70% accuracy.
J. Mol. Biol., 232:584-599, 1993.
28: D. Frishman and P. Argos.
Seventy-five percent accuracy in protein secondary structure prediction.
Proteins, 27:329-335, 1997.
29: R. D. King and M. J. E. Sternberg.
Identification and application of the concepts important for accurate and reliable protein secondary structure prediction.
Prot. Sci., 5:2298-2310, 1996.
30: A. A. Salamov and V. V. Solovyev.
Prediction of protein secondary structure by combining nearest-neighbor algorithms and multiple sequence alignments.
J. Mol. Biol., 247:11-15, 1995.
31: Proteins, Suppl. 1:1-230, 1997.
32: B. Rost.
Better 1D predictions by experts with machines.
Proteins, Suppl. 1:192-197, 1997.
33: V. Biou, J. F. Gibrat, B. Robson, and J. Garnier.
Secondary structure prediction: combination of three different methods.
Prot. Eng., 2:185-191, 1988.
34: X. Zhang, J. Mesirov, and D. Waltz.
A hybrid system for protein secondary structure prediction.
J. Mol. Biol., 225:1049-1063, 1992.
35: K. Nishikawa and T. Ooi.
Amino acid sequence homology applied to the prediction of protein secondary structures, and joint prediction with existing methods.
Biochim. Biophys. Acta, 871:45-54, 1986.
36: K. Nishikawa and T. Noguchi.
Predicting protein secondary structure based on amino acid sequence.
Meth. Enz., 202:31-44, 1991.
37: C. Geourjon and G. Deleage.
SOPMA: Significant improvements in protein secondary structure prediction by consensus prediction from multiple alignments.
Comp. App. Biosci., 11:681-684, 1995.
38: W. Kabsch and C. Sander.
A dictionary of protein secondary structure.
Biopolymers, 22:2577-2637, 1983.
39: F. M. Richards and C. E. Kundrot.
Identification of structural motifs from protein coordinate data: secondary structure and first-level supersecondary structure.
Proteins, 3:71-84, 1988.
40: D. Frishman and P. Argos.
Knowledge-based protein secondary structure assignment.
Proteins, 23:566-579, 1995.
41: P. E. Boscott, G. J. Barton, and W. G. Richards.
Secondary structure prediction for modelling by homology.
Prot. Eng., 6:261-266, 1993.
42: C. Sander and R. Schneider.
Database of homology-derived protein structures and the structural meaning of sequence alignment.
Proteins, 9:56-68, 1991.
43: D. F. Feng, M. S. Johnson, and R. F. Doolittle.
Aligning amino acid sequences: comparison of commonly used methods.
J. Mol. Evol., 21:112-125, 1985.
44: S. B. Needleman and C. D. Wunsch.
A general method applicable to the search for similarities in the amino acid sequence of two proteins.
J. Mol. Biol., 48:443-453, 1970.
45: A. Siddiqui and G. J. Barton.
3Dee: database of protein domain definitions.
Submitted, 1998.
46: G. J. Barton and M. J. Sternberg.
Evaluation and improvements in the automatic alignment of protein sequences.
Prot. Eng., 1:89-94, 1987.
47: A. Murzin, S. E. Brenner, T. Hubbard, and C. Chothia.
SCOP: a structural classification of proteins database for the investigation of sequences and structures.
J. Mol. Biol., 247:536-540, 1995.
48: M. Newman, C. Frazao, G. Khan, I. J. Tickle, T. L. Blundell, M. Safro, N. Andreeva, and A. Zdanov.
X-ray analyses of aspartic proteinases. Structure and refinement at 2.2 Angstroms resolution of bovine chymosin.
J. Mol. Biol., 221:1295, 1991.
49: A. Sali, B. Veerapandian, J. B. Cooper, S. I. Foundling, D. J. Hoover, and T. L. Blundell.
High resolution X-ray diffraction study of the complex between endothiapepsin and an oligopeptide inhibitor. The analysis of the inhibitor binding and description of the rigid body shift in the enzyme.
EMBO J., 8:2179, 1989.
50: Y. Satow, G. H. Cohen, E. A. Padlan, and D. R. Davies.
Phosphocholine binding immunoglobulin study at 2.7 Angstroms.
J. Mol. Biol., 190:593, 1987.
51: M. Bolognesi, G. Gatti, E. Menegatti, M. Guarneri, M. Marquart, E. Papamokos, and R. Huber.
Three-dimensional structure of the complex between pancreatic secretory inhibitor (Kazal type) and trypsinogen at 1.8 Angstroms resolution.
J. Mol. Biol., 162:839, 1982.
52: R. B. Honzatko, W. A. Hendrickson, and W. E. Love.
Refinement of a molecular model for lamprey hemoglobin from Petromyzon marinus.
J. Mol. Biol., 184:147, 1985.
53: T. P. J. Garrett, J. M. Guss, and H. C. Freeman.
The crystal structure of poplar apoplastocyanin at 1.8 Angstroms resolution.
J. Biol. Chem., 259:2822, 1984.
54: J. L. Smith, P. W. R. Corfield, W. A. Hendrickson, and B. W. Low.
Refinement at 1.4 Angstroms resolution of a model of erabutoxin b: treatment of ordered solvent and discrete disorder.
Acta Cryst., 44:357, 1988.
55: V. D. Kumar, L. Lee, and B. F. P. Edwards.
Refined crystal structure of calcium-liganded carp parvalbumin 4.25 at 1.5 Angstroms resolution.
Biochem., 29:1404, 1990.
56: P. M. D. Fitzgerald, B. M. McKeever, J. F. Van Middlesworth, and J. P. Springer.
Crystallographic analysis of a complex between human immunodeficiency virus type 1 protease and acetyl pepstatin at 2.0 Angstroms resolution.
J. Biol. Chem., 265:14209, 1990.
57: F. A. Quiocho, D. K. Wilson, and N. K. Vyas.
Substrate specificity and affinity of a protein modulated by bound water molecules.
Nature, 340:404, 1989.
58: D. Frishman and P. Argos.
Incorporation of non-local interactions in protein secondary structure prediction from the amino acid sequence.
Prot. Eng., 9:133-142, 1996.
59: S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman.
Basic local alignment search tool.
J. Mol. Biol., 215:403-410, 1990.
60: A. J. Bleasby, D. Akrigg, and T. K. Attwood.
OWL: a non-redundant, composite protein sequence database.
Nuc. Ac. Res., 22:3574-3577, 1994.
61: T. F. Smith and M. S. Waterman.
Identification of common molecular subsequences.
J. Mol. Biol., 147:195-197, 1981.
62: G. J. Barton.
ALSCRIPT: a tool to format multiple sequence alignments.
Prot. Eng., 6:37-40, 1993.
63: J. D. Thompson, D. G. Higgins, and T. J. Gibson.
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.
Nuc. Ac. Res., 22:4673-4680, 1994.
64: B. R. Rost, C. Sander, and R. Schneider.
Redefining the goals of protein secondary structure prediction.
J. Mol. Biol., 235:13-26, 1994.
65: S. M. Weiss and C. A. Kulikowski.
San Mateo, 1991.
66: J. U. Bowie, R. Luthy, and D. Eisenberg.
A method to identify protein sequences that fold into a known three-dimensional structure.
Science, 253:164-170, 1991.
67: X. Huang and W. A. Miller.
Adv. Appl. Math., 12:337-357, 1991.
68: W. R. Taylor.
Classification of amino acid conservation.
J. Theor. Biol., 119:205-218, 1986.
69: G. D. Rose.
Prediction of chain turns in globular proteins on a hydrophobic basis.
Nature, 272:586-591, 1978.
70: A. C. M. Wilmot and J. M. Thornton.
Analysis and prediction of the different types of beta-turn in proteins.
J. Mol. Biol., 203:221-232, 1988.
71: H. Sklenar, C. Etchebest, and R. Lavery.
Proteins, 6:46-60, 1989.
72: J. M. Levin.
Exploring the limits of nearest neighbour secondary structure prediction.
Prot. Eng., 10:771-776, 1997.
73: C. Geourjon and G. Deleage.
SOPM: a self-optimised method for protein secondary structure prediction.
Prot. Eng., 7:157-164, 1994.
74: J. Garnier, J. Gibrat, and B. Robson.
GOR method for predicting protein secondary structure from amino acid sequence.
Meth. Enz., 266:540-553, 1996.
75: W. Steigemann and E. Weber.
Structure of erythrocruorin in different ligand states refined at 1.4 Angstroms resolution.
J. Mol. Biol., 127:309, 1979.
76: E. T. Adman, L. C. Sieker, and L. H. Jensen.
Structural features of azurin at 2.7 Angstroms.
Isr. J. Chem., 21:8, 1981.
77: A. Wlodawer, M. Miller, and M. Jaskolski.
Crystal structure of a retroviral protease proves relationship to aspartic protease family.
Nature, 337:576, 1989.
78: K. Petratos, Z. Dauter, and K. S. Wilson.
Refinement of the structure of pseudoazurin from Alcaligenes faecalis S-6 at 1.55 Angstroms.
Acta Cryst., 44:628, 1988.
79: W. E. Royer, Jr.
High resolution crystallographic analysis of a cooperative dimeric hemoglobin.
J. Mol. Biol., 657:657, 1994.
80: B. Rees, A. Bilwes, J. P. Samama, and D. Moras.
Cardiotoxin from Naja mossambica: the refined crystal structure.
J. Mol. Biol., 214:281, 1990.
81: Y. S. Babu, C. E. Bugg, and W. J. Cook.
Structure of calmodulin refined at 2.2 Angstroms resolution.
J. Mol. Biol., 204:191, 1988.
82: N. K. Vyas, M. N. Vyas, and F. A. Quiocho.
Sugar and signal-transducer binding sites of the Escherichia coli galactose chemoreceptor protein.
Science, 242:1290, 1988.
83: E. Weber, E. Papamokos, W. Bode, R. Huber, I. Kato, and M. Laskowski.
Ovomucoid, a Kazal-type inhibitor, and model building studies of complexes with serine proteases.
J. Mol. Biol., 158:515, 1982.
84: T. O. Fischmann and R. J. Poljak.
Crystallographic refinement of the three-dimensional structure of the Fab D1.3-lysozyme complex at 2.5 Angstroms.
J. Biol. Chem., 266:12915, 1991.

Table 1: Pairs in the RS126 set with an SD score greater than 5. Alignments were generated by the AMPS package[6] using the BLOSUM62 matrix, a gap penalty of 10 and 100 randomisations. Fold definitions are taken from release 1.37 of the SCOP database[47], the release current at the time of writing.

Protein (1)	Protein (2)	SD score	Fold (1)	Fold (2)
1eca[75]	2lhb[52]	5.12	Globin-like	Globin-like
1azu[76]	2pcy[53]	5.40	Cupredoxins	Cupredoxins
2rspa[77]	5hvpa[56]	5.81	Acid proteases	Acid proteases
1paz[78]	2pcy[53]	7.22	Cupredoxins	Cupredoxins
2lhb[52]	4sdha[79]	7.70	Globin-like	Globin-like
1cdta[80]	3ebx[54]	8.26	Snake toxin-like	Snake toxin-like
3cln[81]	4cpv[55]	8.27	EF Hand-like	EF Hand-like
2gbp[82]	8abp[57]	8.86	Periplasmic binding	Periplasmic binding
1ovoa[83]	1tgsi[51]	9.45	Ovomucoid/PCI-1 like	Ovomucoid/PCI-1 like
1fdlh[84]	1mcpl[50]	12.66	Immunoglobulin	Immunoglobulin
4cms[48]	5er2e[49]	15.98	Acid proteases	Acid proteases
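The SD score in Table 1 compares the score of the real alignment with the scores obtained after shuffling one sequence. The sketch below assumes the usual definition, SD = (S_real - mean(S_shuffled)) / sd(S_shuffled), and uses a plain Needleman-Wunsch[44] global alignment with a linear gap penalty. It is not the AMPS[6] code: to stay short it uses a hypothetical +2/-1 identity score and a gap penalty of 4 in place of BLOSUM62 and a penalty of 10, and it shuffles only the second sequence.

```python
import random
import statistics


def nw_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = 4) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The +2/-1/4 scores are hypothetical stand-ins for BLOSUM62 and a gap penalty of 10.
    """
    prev = [-gap * j for j in range(len(b) + 1)]  # row 0: prefixes of b against gaps
    for i in range(1, len(a) + 1):
        curr = [-gap * i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + pair,  # align a[i-1] with b[j-1]
                          prev[j] - gap,       # a[i-1] against a gap
                          curr[j - 1] - gap)   # b[j-1] against a gap
        prev = curr
    return prev[-1]


def sd_score(a: str, b: str, n_random: int = 100, seed: int = 0) -> float:
    """(real score - mean shuffled score) / standard deviation of shuffled scores."""
    rng = random.Random(seed)
    real = nw_score(a, b)
    shuffled = []
    for _ in range(n_random):
        letters = list(b)
        rng.shuffle(letters)  # same composition, order destroyed
        shuffled.append(nw_score(a, "".join(letters)))
    return (real - statistics.mean(shuffled)) / statistics.stdev(shuffled)
```

Shuffling preserves amino acid composition, so a high SD score reflects sequence order rather than composition alone. The score is undefined if every shuffle gives the same score (for example, a single-letter sequence).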

Table 2: Percentage of residues assigned to each secondary structure state by each secondary structure definition method

State	DSSP[38]	STRIDE[40]	DEFINE[39]
Helix	28.9	29.8	30.2
Sheet	22.9	24.4	30.0
Coil	48.1	45.8	39.7
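Percentages like those in Table 2 can be computed either by pooling all residues or by averaging per-protein percentages (the "mean" of Table 10 suggests the latter); the two differ when protein lengths vary. A minimal sketch of both, assuming each protein's assignment has already been reduced to a three-state H/E/C string:

```python
from collections import Counter


def pooled_percentages(assignments: list[str]) -> dict[str, float]:
    """Percentage of all residues, pooled over proteins, in each of H, E and C."""
    counts = Counter("".join(assignments))
    total = sum(len(s) for s in assignments)
    return {state: 100.0 * counts[state] / total for state in "HEC"}


def mean_percentages(assignments: list[str]) -> dict[str, float]:
    """Per-protein percentages averaged over proteins (each protein weighted equally)."""
    per_protein = [pooled_percentages([s]) for s in assignments]
    return {state: sum(p[state] for p in per_protein) / len(per_protein)
            for state in "HEC"}
```

For the two hypothetical assignments "HHHHCC" and "EEC", pooling gives about 44% helix, while averaging per protein gives about 33%.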

Table 3: Length ranges (in residues) of secondary structure elements as defined by DSSP[38], STRIDE[40] and DEFINE[39] for the RS126 set

State	Method	Min	Mean	Max	No. of elements
Helix	DSSP[38]	3	9	54	817
Helix	STRIDE[40]	2	10	51	753
Helix	DEFINE[39]	5	14	65	553
Strand	DSSP[38]	1	4	19	1302
Strand	STRIDE[40]	1	4	19	1303
Strand	DEFINE[39]	4	6	26	1030
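Statistics of the kind in Table 3 come from splitting each assignment into maximal runs of one state. A minimal sketch, assuming three-state H/E/C strings; two elements that abut with no intervening residue would be counted here as a single element, which may not match how each definition method delimits its segments.

```python
from itertools import groupby
from statistics import mean


def segment_lengths(ss: str, state: str) -> list[int]:
    """Lengths of maximal runs of `state` in one secondary structure string."""
    return [len(list(run)) for s, run in groupby(ss) if s == state]


def length_summary(assignments: list[str], state: str) -> tuple:
    """(min, mean, max, count) of segment lengths for one state over a protein set."""
    lengths = [n for ss in assignments for n in segment_lengths(ss, state)]
    if not lengths:
        raise ValueError(f"no {state!r} segments found")
    return min(lengths), mean(lengths), max(lengths), len(lengths)
```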

Table 4: Summary statistics of the alignments used in the predictions

Set	Ave. % identity between sequences	Ave. sequence length (residues)	Ave. no. of sequences per alignment
CB396 set	34	157	18
RS126 set	31	185	30

Table 5: Structural class composition of the RS126 and CB396 protein sets used for the predictions

Class	RS126 set: No. (%)	CB396 set: No. (%)
Alpha and beta (α/β)	25 (20)	107 (27)
Alpha and beta (α+β)	17 (13)	101 (26)
All alpha	27 (21)	68 (17)
All beta	38 (20)	78 (20)
Multi-domain	3 (2)	0 (0)
Small proteins	18 (14)	27 (7)
Membrane	1 (≤1)	3 (≤1)
Peptides	1 (≤1)	12 (3)

Table 6: Q₃ and segment overlap (SOV) accuracy (%) for the RS126 and CB396 protein sets

Method	RS126 Q₃	RS126 SOV	CB396 Q₃	CB396 SOV
PHD[64]	73.5	73.5	71.9	75.3
DSC[29]	71.1	71.6	68.4	72.0
PREDATOR[28]	70.3	69.9	68.6	69.8
NNSSP[30]	72.7	70.6	71.4	71.3
CONSENSUS	74.8	74.5	72.9	75.4
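Q₃ is the percentage of residues whose predicted three-state class matches the observed one. The consensus function below is a hypothetical per-residue vote: the uniquely most frequent state wins, and on a tie the first method in the list is used. The tie-breaking rule is an assumption, since the table does not state how CONSENSUS resolves disagreement.

```python
from collections import Counter


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose 3-state prediction matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("prediction and observation must be the same length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def majority_consensus(predictions: list[str]) -> str:
    """Per-residue vote; ties fall back to predictions[0] (an assumed rule)."""
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must be the same length")
    out = []
    for i in range(length):
        ranked = Counter(p[i] for p in predictions).most_common()
        top_state, top_n = ranked[0]
        tie = len(ranked) > 1 and ranked[1][1] == top_n
        out.append(predictions[0][i] if tie else top_state)
    return "".join(out)
```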

Table 7: Family sizes of the automatically generated alignments for the RS126 protein set at two BLAST[59] p-value cutoffs

	p-value cutoff 10^-10	p-value cutoff 10^-2
Total number of residues	1716356	2013632
Total number of sequences	7013	8974
Average number of sequences per family	55.6	71.2
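The counts in Table 7 follow from keeping only the BLAST hits that pass a p-value cutoff and summarising what remains. A minimal sketch, assuming hits are already available as hypothetical `Hit` records (it does not parse real BLAST output) and that a hit passes when its p-value is at or below the cutoff:

```python
from dataclasses import dataclass


@dataclass
class Hit:
    sequence: str   # hit sequence; "-" gap characters are not counted as residues
    p_value: float  # BLAST p-value of the hit


def family_stats(families: dict[str, list[Hit]], cutoff: float) -> dict[str, float]:
    """Totals over all query families after applying a p-value cutoff."""
    kept = {q: [h for h in hits if h.p_value <= cutoff] for q, hits in families.items()}
    n_seqs = sum(len(hits) for hits in kept.values())
    n_res = sum(len(h.sequence.replace("-", "")) for hits in kept.values() for h in hits)
    return {"residues": n_res, "sequences": n_seqs,
            "mean_per_family": n_seqs / len(families)}
```

Relaxing the cutoff from 10^-10 to 10^-2 can only add hits, which is why every total in Table 7 grows.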

Table 8: Comparison of Q₃ accuracy (%) on the RS126 set when the BLAST[59] p-value cutoff is relaxed from 10^-10 to 10^-2. (The alignments used for these predictions were not filtered by percentage identity.)

Method	DSSP (10^-10)	STRIDE (10^-10)	DSSP (10^-2)	STRIDE (10^-2)
PHD[64]	72.4	72.4	73.2	73.2
DSC[29]	70.2	70.0	71.0	70.7
PREDATOR[58]	69.7	69.3	70.7	70.3
NNSSP[30]	71.8	71.2	72.4	71.7
MULPRED	66.7	65.4	67.2	66.8
ZPRED[26]	65.5	64.7	66.7	65.9
CONSENSUS	73.9	73.7	74.5	74.3

Table 9: Effect on Q₃ accuracy (%) of removing sequences similar to the query at different percentage identity thresholds, for the RS126 protein set

Identity threshold	100%	95%	80%	75%	60%
PHD[64]	73.2	73.3	73.4	73.5	73.3
DSC[29]	71.0	71.0	71.0	71.1	70.9
PREDATOR[28]	70.7	70.4	70.1	70.3	70.4
NNSSP[30]	72.4	72.5	72.7	72.7	72.8
CONSENSUS	74.5	74.5	74.6	74.8	74.7
Total no. of sequences	8974	5907	4320	3833	2681
Ave. no. of sequences per alignment	71	47	34	30	20
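Table 9 depends on how percentage identity is measured and on which side of the threshold a sequence falls. A minimal sketch, assuming identity is computed over aligned columns where neither sequence has a gap, and that a sequence is removed only when its identity to the query exceeds the threshold (so the 100% column removes nothing, consistent with its 8974 sequences matching the 10^-2 total in Table 7):

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Identity (%) over alignment columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def filter_by_identity(query: str, others: list[str], threshold: float) -> list[str]:
    """Keep aligned sequences whose identity to the query is at most `threshold`."""
    return [s for s in others if percent_identity(query, s) <= threshold]
```

Other identity definitions (for example, dividing by the length of the shorter sequence) give different counts, so this is one reasonable choice rather than the one used for the table.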

Table 10: Mean percentage of residues in each secondary structure state defined by DSSP[38] under different 8-to-3 state reduction methods

Reduction method	Mean % helix	Mean % sheet	Mean % coil
Method A	28.9	22.9	48.1
Method B	25.3	21.2	52.6
Method C	25.6	21.2	52.3

Table 11: Q₃ accuracy (%) of the consensus method on the RS126 set under different 8-to-3 state reductions of the DSSP definition

Change	Q₃
Reduction method A[64]	74.8
B → coil only	75.7
G → coil only	76.6
B and G → coil	77.5
GGGHHHH → HHHHHHH, B and G → coil (Method C)[30]	77.5
B, G and HHHH EE → coil (Method B)[58]	77.9
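The simpler variants in Table 11 amount to different lookup tables from the eight DSSP states to H, E and C. A minimal sketch, assuming method A is the widely used mapping H, G, I → H; E, B → E; all other states → C. The flags reproduce only the "B → coil" and "G → coil" rows; methods B and C, which also depend on segment context, are not attempted.

```python
def reduce_dssp(dssp: str, b_to_coil: bool = False, g_to_coil: bool = False) -> str:
    """Reduce DSSP 8-state letters to H/E/C.

    Baseline (assumed to be method A): H, G, I -> H; E, B -> E; T, S, blank and
    any other letter -> C. The flags send B and/or G to coil instead.
    """
    mapping = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}
    if b_to_coil:
        mapping["B"] = "C"
    if g_to_coil:
        mapping["G"] = "C"
    return "".join(mapping.get(ch, "C") for ch in dssp)
```

Applying the same reduction to the observed structure before computing Q₃ is what makes the rows of Table 11 comparable; sending short or ambiguous states to coil removes residues that predictors rarely get right.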

Table 12: Q₃ accuracy (%) for the RS126 protein set when the DSSP and STRIDE definitions are reduced to three states by methods A and B

Method	DSSP (A)	STRIDE (A)	DSSP (B)	STRIDE (B)
PHD[64]	73.5	73.5	76.3	76.3
DSC[29]	71.1	70.9	73.3	73.4
PREDATOR[28]	70.3	69.6	75.2	74.0
NNSSP[30]	72.7	72.2	77.3	76.5
CONSENSUS	74.8	74.7	77.9	77.9

Table 13: Q₃ accuracy (%) of single sequence prediction methods in a full jack-knife test. The 'Author' column gives the jack-knife accuracy reported by each method's authors with their own dataset and definition reduction method. All other results are calculated using reduction method A, with G and B states additionally converted to coil. For PHD[64] the authors quote 71.6% as their cross-validated accuracy; however, G and B states were included in their accuracy calculation.

Method	RS126	CB396	Author
PHD[64]	76.3	74.2	-
SIMPA[72]	67.3	67.6	67.7
GOR IV[74]	53.3	64.6	64.4
SOPM[73]	66.8	64.7	69.0
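In a full jack-knife test each protein is predicted by a model trained on all the other proteins. The sketch below illustrates the procedure with a deliberately crude, hypothetical predictor (each amino acid is assigned its most frequent observed state, with unseen residues defaulting to coil); it stands in for none of the methods in Table 13. Accuracy is pooled over all residues, which is an assumption; a per-protein average would also be reasonable.

```python
from collections import Counter, defaultdict


def train_propensity(data: list[tuple[str, str]]) -> dict[str, str]:
    """Hypothetical toy model: most frequent observed state for each amino acid."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for seq, ss in data:
        for aa, state in zip(seq, ss):
            counts[aa][state] += 1
    return {aa: c.most_common(1)[0][0] for aa, c in counts.items()}


def predict(model: dict[str, str], seq: str, default: str = "C") -> str:
    return "".join(model.get(aa, default) for aa in seq)


def jackknife_q3(data: list[tuple[str, str]]) -> float:
    """Leave-one-out: train on every protein but one, then test on the one held out."""
    correct = total = 0
    for i, (seq, ss) in enumerate(data):
        model = train_propensity(data[:i] + data[i + 1:])
        pred = predict(model, seq)
        correct += sum(p == o for p, o in zip(pred, ss))
        total += len(ss)
    return 100.0 * correct / total
```

Because the held-out protein never contributes to its own model, residue types seen only in that protein fall back to the default state, which is exactly the optimism a jack-knife test is meant to remove.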

**Figure 1:** Comparison of segment length distributions for each definition method

james@ebi.ac.uk