\documentstyle[12pt,html]{article}
% jmb.sty is identical to apalike.sty 1989 June 19
%
% apalike.sty style, used in conjunction with apalike.bst,
% will produce an apa-like bibliography style:
%
% 1) Bibliography entries formatted alphabetically, last name
%    first, each entry having a hanging indentation and no label.
% 2) References in the following formats:
%      (Author, 1986)
%      (Author and Author, 1986)
%      (Author et al., 1986).
% 3) Multiple references in the form (Author1, 1986; Author2, 1987)
%
% To be used as an optional argument to the \documentstyle command; for example
%   \documentstyle[11pt,apalike]{book}
%
% 16-Sep-86, original version by Susan King and Oren Patashnik.
% 13-Oct-87 changes:
%   Fixed bug in last line by adding the {} that disappeared when
%   the \hbox{} was removed from the pre-APALIKE definition;
%   added club and widow penalties;
%   patched the \newblock LaTeX bug from `-.07em' to simply `.07em';
%   and made this work for document styles that don't define `chapter'.
%
%
% Use parens instead of brackets for \cite, and no label in the bibliography
%
\def\@cite#1#2{(#1\if@tempswa , #2\fi)}
\def\@biblabel#1{}
%
% Set length of hanging indentation for bibliography entries
%
\newlength{\bibhang}
\setlength{\bibhang}{2em}
%
% \thebibliography environment depends on whether or not `chapter's can exist
%
\@ifundefined{chapter}{\def\thebibliography#1{\section*{References\@mkboth
  {REFERENCES}{REFERENCES}}\list
  {\relax}{\setlength{\labelsep}{0em}
  \setlength{\itemindent}{-\bibhang}
  \setlength{\leftmargin}{\bibhang}}
  \def\newblock{\hskip .11em plus .33em minus .07em}
  \sloppy\clubpenalty4000\widowpenalty4000
  \sfcode`\.=1000\relax}}%
 {\def\thebibliography#1{\chapter*{Bibliography\@mkboth
%{\def\thebibliography#1{\chapter*{References\@mkboth
  {BIBLIOGRAPHY}{BIBLIOGRAPHY}}\list
  {\relax}{\setlength{\labelsep}{0em}
  \setlength{\itemindent}{-\bibhang}
  \setlength{\leftmargin}{\bibhang}}
  \def\newblock{\hskip .11em plus .33em minus .07em}
  \sloppy\clubpenalty4000\widowpenalty4000
  \sfcode`\.=1000\relax}}
% `; ' goes between cites, and there's no \hbox around individual cites
%
\def\@citex[#1]#2{\if@filesw\immediate\write\@auxout{\string\citation{#2}}\fi
  \def\@citea{}\@cite{\@for\@citeb:=#2\do
    {\@citea\def\@citea{; }\@ifundefined
      {b@\@citeb}{{\bf ?}\@warning
      {Citation `\@citeb' on page \thepage \space undefined}}%
    {\csname b@\@citeb\endcsname}}}{#1}}
%
% Very cut-down version of this paper
%
% $Id: map_cut.tex,v 1.6 1995/09/29 15:54:07 gjb Exp gjb $
%
% $Log: map_cut.tex,v $
% Revision 1.6 1995/09/29 15:54:07 gjb
% hacked about some more
%
% Revision 1.5 1995/09/28 16:32:53 gjb
% Extensive discussion of results added.
%
% Revision 1.4 1995/09/27 16:37:40 gjb
% notes in intro - start of results section
%
% Revision 1.3 1995/09/25 17:17:55 gjb
% small changes
%
% Revision 1.2 1995/09/25 16:16:25 gjb
% Initial version from Rob
%
%
% Quite a few additions and changes by Rob. 18/10/95
%
%
% I am beginning to think that our best bet is to send this
% back to JMB.... though this may just reflect a downward mood swing.
% There is just so much to say, and I would only fancy its chances
% in Science if it was a "report" rather than an article...
%
% Feedback?
%
% More changes 2/11/95
%
% 12/12/1995
% Geoff's hack at the nearly final version - check tenses, splice out the
% tables, re-write bits here and there. Rewrite Abstract.
%
% 8/2/96 Accepted at bloody long last, some revisions by Rob
% in answer to the referees' criticisms. Annotated by comments
% where done. I'll put a little "REF_RBR" in the comment so that
% others can search forward for the changes
%
% Note: 1PAZ just didn't seem to score highly in the MAP runs in answer to Ref II
%
% Note: and I don't know what the hell to say about Rost's... I mean referee I's
% vague question: "What is fold recognition?" Any ideas?
%
% Note: I don't think we should merge Tables 6 & 7. There would be
% a loss of clarity. Just how would we do it? Damn referee.
%
% Note: I also don't know what to do about Figure 1 and the clarity.
% I don't think the referee made much sense. I think we
% are dealing with a general problem of not being able to use colour
% because we are poor.
%
% There are a few things that I think we should clarify or bring out
%
% a) The fact that one will do better if one uses carefully constructed predictions
% and that we have had to be systematic in the assessment to avoid biasing
% our results. I know we say it, but it doesn't seem to carry somehow.
% Ignore this comment if you disagree.
%
% b) Both referees seemed to miss the distinction between folds and maps. I don't
% really know where to change this; maybe you (Rich, Geoff) can think of something.
%
% c) Both referees seem a bit confused about the Sec-Sec versus Res-Res assessment
% of accuracy. I have tried to fix this, but you may want to try and clarify things
% more. Bon chance.
%
% d) After where we say "the simplicity of the technique" it might be worth saying
% that it is surprising that such simple principles are able to out-perform
% THREADER, and argue about just what fold recognition methods a la THREADER and
% PROFIT are actually doing when they work: Secondary structure prediction and
% accessibility preferences and length. Nothing more, nothing less. Also
%
% e) I (personally) wouldn't mind suggesting that the problem of FOLD recognition would
% now seem to be one of distinguishing between folds of the same folding class.
% Look at how the methods do with, say, patterns containing 8 strands or 4 helices:
% they reliably find folds containing this sequential arrangement of secondary structures
% but are often unable to distinguish between them.
% This is not really surprising;
% For example, if one considers the difference between a lipocalin (up-and-down
% eight stranded beta barrel) and an Ig fold (eight strands in a sandwich), the
% amphipathic character and even the length of the strands are comparable (I think),
% and yet the folds are really wildly different. Similarly for the four helix bundle
% examples and the plastocyanin example (which finds a nice six stranded barrel as
% the top scoring using MAP(PHD)). This is a bit of a rant, so I will let you stop
% and think.
%
% f) Oh, and I think we should somehow separate the true successes from the homologues in
% the globin and plastocyanin searches (i.e. shade them different, or something), since
% they aren't really the same thing as the other "successes".
%
% g) Over to you two.
%
\newcommand{\al}{\mbox{$\alpha$~}}
\newcommand{\be}{\mbox{$\beta$~}}
\newcommand{\albe}{\mbox{$\alpha/\beta$~}}
\newcommand{\tten}{\mbox{${\rm 3_{10}}$~}}
\newcommand{\ea}{\mbox{\em et al. \/}}
\newcommand{\Cal}{\mbox{${\rm C}_{\alpha}$~}}
\newcommand{\Cbe}{\mbox{${\rm C}_{\beta}$~}}
\newcommand{\ii}{\mbox{$i$~}}
\newcommand{\jj}{\mbox{$j$~}}
\newcommand{\ip}{\mbox{$i^{\prime}$~}}
\newcommand{\jp}{\mbox{$j^{\prime}$~}}
\newcommand{\ra}{\mbox{$\rightarrow$}}
\newcommand{\sdp}{\setlength{\baselineskip}{18truept}}
\newcommand{\ssp}{\setlength{\baselineskip}{13.6truept}}
\begin{document}
\begin{titlepage}
\begin{center}
\begin{Large}
{\bf Protein fold recognition by mapping predicted secondary structures}\\
%{\bf Protein fold recognition from secondary structure prediction}\\
\vskip 0.10in
\end{Large}
{\em Robert B. Russell}\ddag{\em, Richard R. Copley and Geoffrey J.
Barton}\dag

University of Oxford\\
Laboratory of Molecular Biophysics\\
The Rex Richards Building, South Parks Road\\
Oxford, OX1 3QU, England\\
Tel: 44 1865 275368 FAX: 44 1865 510454\\
E-mail: gjb@bioch.ox.ac.uk

\ddag Present Address:\\
Biomolecular Modelling Laboratory\\
Imperial Cancer Research Fund Laboratories\\
44 Lincoln's Inn Fields, P.O. Box 123\\
London, WC2A 3PX, England\\
E-mail: russell@icrf.icnet.uk

\dag To whom correspondence should be addressed.

Keywords: protein; structure prediction; fold recognition; threading; nuclear magnetic resonance; secondary structure mapping.

Running title: Fold recognition from secondary structure prediction

Published in {\em J. Mol. Biol.} (1996), {\bf 259}, (3), 349--365.
\end{center}
\end{titlepage}
\section{Abstract}
A strategy is presented for protein fold recognition from secondary structure assignments (\al-helix and \be-strand). The method can detect similarities between protein folds in the absence of sequence similarity. {\em Secondary structure mapping} first identifies all possible matches (maps) between a query string of secondary structures and the secondary structures of protein domains of known three--dimensional structure. The maps are then passed through a series of structural filters to remove those that do not obey simple rules of protein structure. The surviving maps are ranked by scores from the alignment of predicted and experimental accessibilities. Searches made with secondary structure assignments for a test set of eleven fold-families put the correct sequence-dissimilar fold in the first rank 8/11 times. With cross-validated predictions of secondary structure this drops to 4/11, which compares favourably with the widely used THREADER program (1/11). The structural class is correctly predicted 10/11 times by the method, in contrast to 5/11 for THREADER. The new technique obtains comparable accuracy in the alignment of amino acid residues and secondary structure elements.
Searches are also performed with published secondary structure predictions for the von Willebrand factor type A domain, the proteasome 20S \al subunit and the phosphotyrosine interaction domain. These searches demonstrate how the method can find the correct fold for a protein from a carefully constructed secondary structure prediction, multiple sequence alignment and distance restraints. Scans with experimentally determined secondary structures and accessibility recognise the correct fold with high alignment accuracy (86\% on secondary structures). This suggests that the accuracy of mapping will improve alongside any improvements in the prediction of secondary structure or accessibility. Application to NMR structure determination is also discussed.
\newpage
\section{Introduction}
The flood of new protein sequences demands techniques to infer protein 3D $\ddag$ structure from sequence alone. For $\approx 30$\% of protein sequences, conventional alignment techniques (e.g. \cite{lipman85,altschul90,smith81}) or profile and pattern methods (e.g. \cite{gribskov87,bs90}) find similarities to a protein of known 3D structure \cite{chothia92}. The remaining 70\% of protein sequences may adopt previously unseen protein folds. Alternatively, they may have topologies (folds) similar to known protein structures but share no detectable sequence similarity \cite{rb94}. Such fold similarities will normally not be found until both protein 3D structures have been determined experimentally \cite{orengo94a,holm94a}. In an attempt to find fold similarities of this type in advance of 3D structure determination, several fold recognition techniques have been developed (see \cite{bowie93,wodak93,jones93} and references therein). These techniques may locate some fold similarities that are undetectable by the comparison of sequence.
However, the methods are often computationally intensive and many similarities still go undetected \cite{pickett92,lemer96}.\\ \\
In parallel with the development of fold detection methods, the accuracy of secondary structure prediction has improved from $\approx 65$\% to $\approx 72$\% on average. Though this is only a small percentage increase, recent predictions are more useful, since the application of multiple sequence alignments improves the identification of the number, type and location of core secondary structure elements. Prediction from sequence alignments can also accurately identify the position of loops, and residues likely to be buried in the protein core \cite{benner94,barton95,russell95}. Given a good secondary structure prediction, the next question to ask is how the secondary structures might be arranged into a tertiary fold. {\em Ab initio} methods for folding secondary into tertiary structure search for possible arrangements of secondary structures that obey general packing rules \cite{cohen80a,cohen80b,cohen82,smith-brown93,sun95}. These methods have been applied in numerous blind predictions \cite{hurle87,cohen86,curtis91,jin94,huang94} with varied results. A limitation is the number of packing combinations that must be considered. This can become unmanageable for $>9$ secondary structures \cite{cohen82}, though approaches to reduce the number of combinations have been described \cite{taylor91,clark91}.\\ \\
The most successful predictions of protein tertiary structure in the absence of clear sequence similarity to a protein of known 3D structure have been those where secondary structure predictions and/or experimental information were combined to suggest resemblance to an already known fold.
Correct folds have been predicted in this way for the \al subunit of tryptophan synthase \cite{crawford87}, a family of cytokines \cite{bazan90}, and, recently, for the von Willebrand factor type A domain \cite{edwards95} and the synaptotagmin C2 domain \cite{gerloff95}. Although the details of these studies differed, all used predicted secondary structures from multiple alignment, combined with the careful application of protein structural principles (often together with experimental data) to suggest a protein fold. Two automated methods for comparing predicted and experimental secondary structures have been described previously \cite{sheridan85,rost95} with promising though limited preliminary results.\\
%
% REF_RBR I had an idea about ending this sentence in a better way, though
% I can't remember it now. It was something about saying that they
% were potentially promising, but weren't explored to a full enough
% extent, or something. Work on it.
%
\\
In this paper we show how secondary structure and accessibility prediction together with basic rules of protein structure may be used to find the correct fold within a database of protein structural domains. The method first generates all possible matches (referred to as `maps') between query and database secondary structure patterns, allowing for insertions and deletions of whole secondary structure elements. Maps are filtered by a series of structural criteria to arrive at a collection of sensible template structures. The sequence of the query protein is then aligned to the template structures by matching predicted and observed patterns of residue accessibility. Finally, alignments are ranked by a score that combines accessibility matching with a penalty for differences in secondary structure length. The method is designed to cope with incorrect secondary structure assignments, insertions/deletions of whole secondary structure elements, and differences in the lengths and orientations of secondary structures.
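
The map generation step outlined above can be illustrated with a small Python sketch. It is written for illustration only and is not the published matching algorithm: the function name {\tt enumerate\_maps} and its arguments are hypothetical, every unmatched query element is counted as a query deletion, and database elements are counted as deletions only when they are skipped between two matched elements. The default limits of two and five deletions are the values used in this study (see Methods).

\begin{verbatim}
```python
def enumerate_maps(query, target, max_query_del=2, max_target_del=5):
    """List every map between two secondary structure strings.

    query, target: strings of 'H' (helix) and 'B' (strand) characters.
    Each map is a tuple of (query_index, target_index) pairs whose
    elements have the same type and keep their sequential order.
    """
    maps = []

    def extend(qi, ti, q_del, t_del, pairs):
        # qi and ti are the next unread positions in query and target.
        if qi == len(query):
            if pairs:  # keep only maps with at least one matched element
                maps.append(tuple(pairs))
            return
        # Choice 1: delete query element qi (e.g. a wrongly predicted element).
        if q_del < max_query_del:
            extend(qi + 1, ti, q_del + 1, t_del, pairs)
        # Choice 2: match query element qi to a target element of the same type.
        for tj in range(ti, len(target)):
            # Target elements skipped before the first match are overhang
            # and are not counted as deletions.
            skipped = tj - ti if pairs else 0
            if t_del + skipped > max_target_del:
                break
            if query[qi] == target[tj]:
                extend(qi + 1, tj + 1, q_del, t_del + skipped,
                       pairs + [(qi, tj)])

    extend(0, 0, 0, 0, [])
    return maps
```
\end{verbatim}

A call such as \verb|enumerate_maps("HBH", "HBBH")| returns one tuple of index pairs per map. Explicit mismatches never appear in a map, because an H is only ever paired with an H and a B with a B.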
\section{Methods} \subsection{Database of unique protein 3D structural domains} A database of protein 3D structural domains was derived from the Brookhaven Protein Databank \cite{brookhaven}. $930$ non-identical chains were clustered by sequence comparison \cite{smith81,barton93b} to leave $275$ sequence families. One representative of each family was chosen to have the highest resolution and lowest R-factor. The representative structures were then split into $377$ domains by eye. A sub--database of higher quality domains was created for analysis. This contained only those structures determined by X--ray crystallography, refined and of a resolution of $2.5$ \AA~ or better. Secondary structures for all domains were defined by the programs DSSP (definition of secondary structure in proteins) \cite{dssp} or by DEFINE \cite{define} when only \Cal atoms were available. Axial coordinates were calculated for all secondary structures as described in \cite{define}. Extra axial coordinates were calculated at the N-- and C-- terminal ends to allow for possible differences in secondary structure length. The domain database is available via the WWW (http://geoff.biop.ox.ac.uk/). \subsection{Alignment of secondary structures} The secondary structure of the protein is represented as a sequence of H and B characters where each H represents an entire \al helix and each B a \be strand. A fast method for generating all exact matching alignments between two strings that allows up to a maximum number of deletions from each string \cite{rcb95a} is used to find all {\em maps} between the query pattern of secondary structures and the domain database. The method is recursive, and reminiscent of regular expression matching. In this study up to two deletions were permitted from the query secondary structure string, to allow for errors in the prediction. 
Up to five deletions were permitted from each database structure, to allow insertions or deletions of secondary structures typical of proteins having similar 3D structures in the absence of sequence similarity. Deletions from the database structure were only counted if they were contained within matched elements (overhanging deletions were ignored). Explicit mismatches were not allowed; instead, a mismatch was treated as a deletion from either the query or the database structure.
%
% REF_RBR Point to this paragraph in answer to Ref I's comment delta
% I have added one new sentence to the end of the above paragraph,
% since it wasn't quite accurate before.
% REF_GJB I've added another sentence to mention mismatches.
%
% I have added the sentence below in an attempt to answer part
% of Ref I's question about optimisation of "free" parameters
%
These values were chosen since they are typical of the expected accuracy of secondary structure prediction, and typical of insertions and deletions of secondary structure elements across members of a diverse structural family. In practice, the allowable deletions from query and database should be chosen on a case by case basis. For consistency, we kept the maximum numbers of deletions fixed during this study.
\subsection{Filters}
The alignment method will find all maps between two strings of secondary structure elements, but owing to the allowance for deletions, many of these will correspond to implausible topologies. Accordingly, seven filters are used to remove maps corresponding to nonsensical protein 3D structures and/or those not satisfying imposed experimental restraints.
\subsubsection{Removing non-compact structures}
Two filters exploit the radius of gyration, $R_{g}$, to remove non-compact maps. Analysis of the $275$ high quality domains shows that $R_{g} \leq 2.8 L^{0.34} + 4.0$, where $L$ is the length of the structure in residues.
For each map, a coarse $R_{g}$ is first calculated by considering the centroids of secondary structures, and their C-terminal loops, as point masses. A fine $R_{g}$ is also calculated by considering all matched residues (plus C-terminal loops) as point masses. Maps are removed if either $R_{g}$ value is greater than the maximum for compact domains of the same length.
\subsubsection{Loop length distance restraints}
Analysis of the $275$ high quality domains shows that the maximum distance $D_{max}$ between axial coordinates that can be bridged by a loop of $N_{l}$ residues is $11.621 (N_{l}+0.25)^{0.359} + 0.5$ \AA.
%
% REF_RBR Ref I added units
%
Maps in which any loop would have to span a distance larger than $D_{max} + 4$ \AA~ are removed. $4$ \AA~ is added to allow for differences in the packing of database and query secondary structures, since similar structures with little sequence similarity can have shifts of up to $4$ \AA~ \cite{holm95}.\\ \\
Care is taken to allow a range of possible positions for the match of query and database structures. This allows for errors in secondary structure prediction, which may fail to predict the precise start or end of correctly identified elements, and allows for the observed differences between the lengths of secondary structure elements within proteins having similar topologies despite no significant sequence similarity. For a position $x$ on a database secondary structure, and a minimum and maximum length for a query secondary structure, $L_{min}$ and $L_{max}$, the range of allowable positions of the query residue on the database structure (of length $L_{obs}$) is given by:\\
\begin{center}
$x_{min} = \min(L_{obs} - L_{max}, 0) - h + x$\\
$x_{max} = \max(L_{obs} - L_{min}, 0) + h + x$\\
\end{center}
where $h$ is a leniency parameter, allowing for differences in the length of query and database secondary structures.
$h = 4$ allows for differences typical of those found in proteins having similar 3D structures despite no sequence similarity.
\subsubsection{Poor \be sheets}
The deletion of \be strands from a \be sheet can lead to maps corresponding to nonsensical 3D structures. Maps containing isolated \be strands (i.e. those lacking hydrogen bonding partners) are removed. Maps are also removed if \be strands are deleted from the centre of \be sheets contained within the map.\\ \\
Analysis of high quality domains shows that the number of \Cal -- \Cal contacts $\leq 6$ \AA~ made by a \be strand ($C_{\beta - \alpha\alpha}$) with any of its neighbouring \be strands is always $\geq N_{\beta} - 2$, where $N_{\beta}$ is the number of residues in the $\beta$ strand. Thus maps are also removed if one or more \be strands have $C_{\beta - \alpha\alpha} < N_{\beta} - 2$.
\subsubsection{Adjacent parallel structures}
Maps are removed if tandem secondary structures in the query are made to match parallel structures in the database by the deletion of intervening secondary structures. Genuine adjacent parallel structures within the database are allowed. This filter can be turned off in instances when there are long loops connecting query secondary structure elements, as in the phosphotyrosine interaction domain example (see Results).
\subsubsection{Distance restraints}
Distance restraints may be imposed from the results of NMR experiments, knowledge of the disulphide linkages, or knowledge of residues involved in the active or binding site of the query. In this study, distance restraints are only included in the von Willebrand factor and proteasome examples (see Results). A tolerance value $t = 4$ \AA~ is added to all distance restraints, as for the loop length filtering.
% REF_RBR - not a change due to comments
% I just thought the two sections below could be combined, since
% "Redundancy" previously only contained one sentence.
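
The numerical bounds used by the compactness and loop length filters, and the range of allowed positions, can be sketched in Python as follows. This is a minimal illustration and not the MAP program: all function names are hypothetical, the constants are those quoted in the text, and the point masses in the radius of gyration are assumed to be equally weighted, which the text does not state.

\begin{verbatim}
```python
import math


def max_compact_rg(n_residues):
    """Largest R_g (Angstroms) expected for a compact domain of this length."""
    return 2.8 * n_residues ** 0.34 + 4.0


def radius_of_gyration(points):
    """R_g of 3D points treated as equal point masses (assumption of this sketch)."""
    n = len(points)
    centroid = [sum(p[k] for p in points) / n for k in range(3)]
    mean_sq = sum(
        sum((p[k] - centroid[k]) ** 2 for k in range(3)) for p in points
    ) / n
    return math.sqrt(mean_sq)


def is_compact(points, n_residues):
    """True if the points pass the compactness bound for n_residues."""
    return radius_of_gyration(points) <= max_compact_rg(n_residues)


def max_loop_span(n_loop_residues, tolerance=4.0):
    """D_max for a loop of n_loop_residues, plus the packing tolerance."""
    d_max = 11.621 * (n_loop_residues + 0.25) ** 0.359 + 0.5
    return d_max + tolerance


def allowed_positions(x, l_obs, l_min, l_max, h=4):
    """Range (x_min, x_max) of allowed positions around database position x."""
    x_min = min(l_obs - l_max, 0) - h + x
    x_max = max(l_obs - l_min, 0) + h + x
    return x_min, x_max
```
\end{verbatim}

In this sketch the coarse and fine $R_{g}$ filters differ only in the points passed to \verb|is_compact|: secondary structure centroids in one case, matched residues in the other.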
\subsubsection{Consistency \& Redundancy}
Maps are only kept if there is at least one placement of the query onto the database secondary structures where all distance restraints (loop length and/or experimental) are satisfied simultaneously. After application of all the other filters, matches contained entirely within another match are considered redundant, and removed.
\subsubsection{Maps removed by each filter}
It is illustrative to consider the fraction of maps removed by each of the filters described above. For example, for a pattern derived from a DSSP assignment of secondary structure for thioredoxin, allowing 2 secondary structure element deletions from the query and 5 from the database, the initial alignment of secondary structure elements reduces the number of folds from $377$ to $212$. The remaining $165$ folds have no match of secondary structures with the predicted thioredoxin pattern. \htmladdnormallink{Table~1}{ntable1.ps} illustrates the fractions of the initial $204783$ maps within $212$ folds that are removed by each filter when applied independently. \htmladdnormallink{Table~2}{ntable2.ps} shows, for the same example, how the number of maps drops as the filters are applied in succession. The filters are independent of one another apart from consistency filtering, which must be applied {\em after} loop and distance restraint filtering, and redundancy filtering, which must be applied last. The order of filters shown in \htmladdnormallink{Table~1}{ntable1.ps} was chosen so as to optimise speed.\\
%
% RBR_REF Ref I's comment about the order of filters addressed by
% new sentences (above).
\\
The gradual elimination of maps and folds shows how the simple principles of protein structure are sufficient to reduce the number of possible alignments by two orders of magnitude.
Interestingly, the number of folds drops very little after the generation of maps, suggesting that the filters are tending mostly to remove nonsensical maps associated with each identified fold rather than ruling out folds. Note that consistency filtering tends only to remove maps when tight loop lengths or distance restraints are included in the pattern.
%
% RBR_REF Ref I commented that the number of folds only dropped
% by 10 %. Clearly this reflects a failure to distinguish between
% maps and folds. I have added the sentence "Interestingly...
% to try and clarify this a little.
%
\subsection{Fitting sequences on to 3D structures}
Accessibilities for residues within each map are calculated quickly by exploiting the relationship between relative accessibility and the number of other $C_{\beta}$ atoms within $7$ \AA~ ($N_{C\beta7}$) of a residue's $C_{\beta}$ atom. $N_{C\beta7}$ is calculated by considering secondary structures and the C-terminal coils for the matched structures. Analysis of the high quality domains shows that helical residues are buried (b) when $N_{C\beta7} \geq 3$, exposed (e) when $N_{C\beta7} = 0$ and intermediate/unknown (u) otherwise. Similarly, residues in \be strands are b when $N_{C\beta7} \geq 6$, e when $N_{C\beta7} \leq 3$ and u otherwise. In the examples presented here, predicted accessibilities were taken from the SUB line within PHD \cite{rost94b} output, which highlights those regions predicted with confidence. Remaining positions were assigned as unknown (u) accessibility.\\ \\
Given assignments of accessibility, the best alignment for each pair of secondary structures, not permitting gaps within either secondary structure, is found by applying the scoring matrix shown in \htmladdnormallink{Table~3}{ntable3.ps}.
%
% REF_RBR more on "free" parameter optimisation comment by Ref I.
These values were chosen to prevent long overhanging gaps in the alignment of predicted and experimental secondary structures, and designed not to penalise mismatches too heavily. The total similarity score for the alignment is then defined as:
\[ \left( \sum_{i=1}^{N} S_{acc} \right) - L_{diff} \]
where $S_{acc}$ is the best score for a pair of matched secondary structures calculated by summing values from \htmladdnormallink{Table~3}{ntable3.ps}, $N$ is the number of matched secondary structures, and $L_{diff}$ is the total difference in the lengths of the two protein domains being compared. When calculating $L_{diff}$, those secondary structures that have been equivalenced are ignored, since overhanging gaps are already penalised by the gap score in \htmladdnormallink{Table~3}{ntable3.ps}.
\subsection{Protein Structure Patterns for Evaluation}
Representatives (queries) from each of $11$ structural families containing structural similarities despite no sequence similarity \cite{rb94} were chosen to assess the method. The $11$ queries are shown in \htmladdnormallink{Table~4}{ntable4.ps} and represent a diversity of folds from all four protein folding classes. For all queries, there is at least one clear example of a similar fold in the database that does not show any detectable sequence similarity to the query. For reference, similar folds in the database were found by the STAMP (structural alignment of multiple proteins) structure comparison program \cite{rb92b} and with reference to the structural classification of proteins (SCOP) database \cite{murzin95}.\\
%
% REF_RBR added an explanation about why the 11 were chosen.
% (covers range of folding classes, at least one example, etc.
%
\\
Two patterns were defined for each of the eleven structures: a) one taken directly from the DSSP secondary structure assignment and accessibility (i.e.
perfect prediction) and b) one from cross--validated secondary structure and accessibility prediction by the methods of Rost \& Sander \cite{rost93a,rost94b}. The PHD program and jack-knifed neural network architectures were kindly provided by Dr Burkhard Rost (EMBL).
%
% REF_RBR added sentence about where we got the PHD program from
% in answer to Ref II.
%
Experimental secondary structure summaries and accessibilities (a) were taken from DSSP \cite{dssp}. Predicted secondary structure summaries (b) were taken from the `PHD sec'
%
% REF_RBR More in answer to Ref I's comments re point epsilon
% and the minor point about SUB_ACC several changes in the
% paragraph below
entries and accessibilities from the `SUB acc' entries, since these most closely resembled the assignments from the $N_{C\beta7}$ calculation of accessibility. PHD assignments of buried and exposed states were classified as `b' and `e' respectively, and all other positions (`i' or no assignment) as `u'. Strands shorter than two residues, and helices shorter than four residues, were ignored. The length of the secondary structure was given by the number of residues in each secondary structure (maximum = minimum), and the number of residues between the secondary structures was taken as the minimum loop length.\\ \\
Patterns may also contain distance restraints, such as those available from NMR experiments, disulphide linkages, or SDM studies. Distance restraints were only added in the von Willebrand factor and proteasome patterns (see Results).
\subsection{Cross--validation}
Any predictive method that needs large numbers of parameters must be cross-validated to ensure that the method does not do artificially well on the examples used to derive the parameters. For cross validation of the secondary structure and accessibility predictions, we used the jack-knifed neural--network architectures described by Rost \& Sander (1993a), kindly provided by Dr. B. Rost.
Secondary structure and accessibility for each query protein were predicted by an architecture that did not include the query protein or any homologue.\\ \\
The filters and matching algorithm described here use only a few geometric parameters, all of which are independent of the protein sequence. Accordingly, removal of query proteins and homologues from the set used to derive the equations above makes a negligible difference to the parameters.
\subsection{Computational details}
Runs for the patterns shown in \htmladdnormallink{Table~4}{ntable4.ps} take between 5 and 60 minutes on a Silicon Graphics Indigo 2 (150 MHz IP22 Processor MIPS R4400). The MAP program is available from the authors. Contact GJB by e-mail: gjb@bioch.ox.ac.uk or see the WWW address http://geoff.biop.ox.ac.uk/ for details.
\section{Results}
\subsection{Assessing accuracy}
Structural similarity is a continuum and for some fold types opinions differ as to what constitutes ``similar''. For example, thioredoxin has a $\beta$-sheet with helices packing on each side which superficially resembles a Rossmann fold domain. However, the topology of the sheet is different from a Rossmann fold: the connectivity is different, and it contains a mixture of parallel and antiparallel \be hairpins rather than all parallel. To build a detailed model of thioredoxin based on a Rossmann fold would be incorrect, but recognising that thioredoxin has a ``single sheet with helix on each side'' is still useful. For some folds, e.g. the $\beta$-trefoils, there is no such ambiguity. We discuss the accuracy of our method using two grades of success, `strict' and `loose', which are outlined in \htmladdnormallink{Table~5}{ntable5.ps}. Strict similarities are those where the topology of the structure in the database is nearly an exact match of that found in the query (e.g. plastocyanin and azurin).
Loose similarities are those where the topologies are broadly similar, with additional secondary structures in one fold relative to another, and with some differences in topological ordering or orientation of equivalent secondary structure elements (e.g. plastocyanin and an Ig fold). Strict similarities tend to correspond with those specified by {\bf scop} \cite{murzin95}, whereas the loose similarities tend to correspond roughly with those identified by CATH \cite{orengo93a} and by the assessors of the protein structure prediction challenge \cite{lemer96}.\\ \\
For comparison, we also scanned the same eleven queries against the database of domains using the fold recognition program THREADER \cite{jones92} with default parameters. \\
In addition to the recognition of the correct fold, it is important to consider how well the query is aligned onto the database structure. Two measures of alignment accuracy are given: a) the fraction of correct residue equivalences found by each method ({\em \%~Res--Res}), and b) the fraction of correctly overlapping secondary structure elements found ({\em \%~Sec--Sec}). Secondary structures were considered correctly matched if at least two residues from structurally equivalent secondary structures overlapped in the alignment generated by each method. \%~Res--Res is a
%
% REF_RBR Tried to clarify Sec-Sec a little better to
% help out our poor referees
%
strict definition, and broadly measures how accurate a 3D model would be if based on the alignment found. \%~Sec--Sec is a looser definition that allows for slippages of secondary structures, and thus indicates the accuracy of the predicted topology. The second measure is arguably a more reliable guide, since for many pairs of similar protein structures, alignments of sequence based on 3D structure are ambiguous. Problems arise when assessing the symmetrical \albe barrel structures.
Shifting the alignment of secondary structure elements by one \be\al unit can lead to zero accuracy by these measures, though the resulting structure is largely correct. We thus report average accuracies with and without the \albe barrels. To assess the overall alignment accuracies of each method, only those strict similarities that were not detectable by a sensitive sequence comparison algorithm \cite{barton93b} were considered. Similarities excluded were those with the globins, 1ECA, 1HBG and 1MYGA when scanning with sea hare myoglobin, and that with 1PAZ when scanning with plastocyanin. For all other examples, accuracies were included in the calculation of an average, regardless of whether the similarity was found at or near the top of the ranked lists. A total of 36 strict similarities were used in the calculation.
% REF_RBR added the last sentence here in answer to
%         Referee I's point beta.
\subsection{Searches with eleven test proteins}
The results of comparing the eleven protein structures to the database of domains using DSSP patterns, PHD patterns, and the THREADER program are shown in \htmladdnormallink{Table~6}{ntable6.ps}. The table lists the top 10 ranked domains for each query by each method. For each domain, the code, score, structural class and fold description are shown together with the alignment score and the percentage accuracies of the alignments at the residue (\% Res--Res) and secondary structure (\% Sec--Sec) level (see above). Within \htmladdnormallink{Table~6}{ntable6.ps}, domains classified as strict similarities (ignoring those detectable by sequence comparison) are shown in inverse text; loose similarities are shown as shaded.
\htmladdnormallink{Table~7}{ntable7.ps} summarises the rankings shown in \htmladdnormallink{Table~6}{ntable6.ps} (see legend).\\ \\ Judging by the strict criteria shown in \htmladdnormallink{Table~5}{ntable5.ps}, 8/11 of the scans made with experimentally determined secondary structure (MAP(DSSP)) put the correct fold in the first rank. By the loose definition, the method located 10/11 folds in the first rank. Predictably, the scans based on patterns from secondary structure prediction fare worse: 4/11 folds were correctly ranked at position 1 by the strict criteria. However, this compares favourably with THREADER, which placed 1 fold correctly in the first rank. When the loose definitions of fold similarity are used, our method placed 5/11 correct folds at the top of the list compared to 2/11 for THREADER. Expanding the definition of success to include any search that places a correct fold in the top 10, as described by Lemer \ea (1996) \nocite{lemer96}, shows a similar trend \htmladdnormallink{(Table~7)}{ntable7.ps}. The greater success of the DSSP-derived patterns suggests that fold recognition by this method will improve alongside any improvements in secondary structure and accessibility prediction. The structural class of proteins (as identified using SCOP) in the top 10 domains was more consistent by our method: MAP(PHD) scans led to 10/11 correct protein class predictions for the 1st ranked protein, compared to 5/11 for THREADER. Although this improvement may be due mostly to the accuracy of the PHD predictions, the result suggests that other fold recognition methods could profit from the consideration of predicted secondary structures.\\
%
% REF_RBR added comment above about classes from SCOP and
%         that class predictions by MAP(PHD) may be due
%         to the accuracy of PHD.
\\
Our method (MAP) shows an improvement over THREADER with respect to detecting the correct fold. What of alignments of sequence to structure?
Values for individual accuracies are given in \htmladdnormallink{Table~6}{ntable6.ps}. Reference alignments of 3D structures were found by the STAMP algorithm \cite{rb92b} for all strict similarities with the eleven protein families. The averaged values for \% Res--Res and \% Sec--Sec are shown in \htmladdnormallink{Table~8}{ntable8.ps}. MAP(DSSP), MAP(PHD) and THREADER give \% Res--Res of 35, 15 and 11\% respectively and \% Sec--Sec of 75, 43 and 37\%. If one ignores the repetitive \albe barrel alignments, accuracies improve slightly, with \% Res--Res of 39, 15 and 13\% and \% Sec--Sec of 86, 49 and 50\% for MAP(DSSP), MAP(PHD) and THREADER. None of the methods performs well by the \% Res--Res criterion, though \% Sec--Sec suggests that the correct topology is achieved about 50\% of the time. The high \% Sec--Sec for MAP(DSSP) scans suggests that alignment accuracy, like fold recognition, will improve with developments in secondary structure and accessibility prediction.\\ \\ How useful are the detected loose similarities? For some examples, loose similarities imply only a broadly similar architecture, and may not immediately be used for homology modelling studies. However, for others the loose similarity genuinely represents a feasible modelling template. For example, the PHD prediction of hepatocyte nuclear factor 3 (HNF-3) failed to predict two short \be strands found in the native structure, and thus the MAP search did not detect BirA domain I (PDB code 1BIA) or GAP domain I (2GAP) as possible templates. However, the search with the predominantly helical prediction did rank another helix-turn-helix motif first, as shown in \htmladdnormallink{Figure~1}{figure1.ps}. The core three helices have been aligned correctly at the secondary structure level and a prediction of this type could be useful in the absence of experimental 3D structure information.
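\\ \\
The two alignment accuracy measures defined under ``Assessing accuracy'' can be written compactly. The following is an illustrative sketch only: the symbols $E_{\rm ref}$, $E_{\rm map}$, $S_{\rm ref}$ and $S_{\rm ok}$ are notation introduced here, and the choice of the reference alignment as denominator is an assumption. Let $E_{\rm ref}$ be the set of residue equivalences in the reference structural alignment and $E_{\rm map}$ the set of equivalences in the alignment being assessed; let $S_{\rm ref}$ be the set of structurally equivalent pairs of secondary structure elements, and $S_{\rm ok} \subseteq S_{\rm ref}$ those pairs that overlap by at least two residues in the assessed alignment. Then
\[
\%~\mbox{Res--Res} = 100 \times \frac{|E_{\rm ref} \cap E_{\rm map}|}{|E_{\rm ref}|},
\qquad
\%~\mbox{Sec--Sec} = 100 \times \frac{|S_{\rm ok}|}{|S_{\rm ref}|}.
\]
As a purely hypothetical example, an alignment that recovers 28 of 80 reference equivalences would give \%~Res--Res $= 100 \times 28/80 = 35$\%.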
\subsection{Fold recognition from published predictions}
In the tests above only the type and length of secondary structures, the loop length observed in the query structure, and the pattern of burial and exposure, observed or predicted for each secondary structure segment, were used in the search. Many published predictions are augmented by human insight, contain detailed predictions of loop lengths, and consider experimental distance restraints. All of this information can be used with the MAP method described here. To test the method under these circumstances, we considered three predictions: 1) the von Willebrand factor (vWf) prediction by Edwards \& Perkins (1995), 2) the Proteasome prediction by Lupas \ea (1994) \nocite{lupas94} and 3) a prediction for the Phosphotyrosine Interaction Domain (PID) by Bork \& Margolis (1995). \nocite{bork95a,edwards95} All of these predictions were made from very diverse sequences, which is likely to improve prediction accuracy \cite{russell95}. The predictions also comprise carefully constructed sequence alignments that can provide tight loop--length distance restraints. For the three searches, a larger and more up-to-date database of 780 protein domains was scanned (A. S. Siddiqui, pers. comm.). Subsequent 3D structure determination has shown all three of these proteins to resemble previously observed folds \cite{lee95,brannigan95,zhou95}.
\subsubsection{The vWF domain}
Perkins \& co--workers (Perkins \ea, 1994; Edwards \& Perkins, 1995) \nocite{perkins94,edwards95} used an alignment of 92 sequences together with spectroscopic data and prediction algorithms to predict that the vWf domain would comprise a repeating arrangement of \be strands and \al helices. Edwards \& Perkins combined a THREADER scan with analysis of the location of active site residues, a putative disulphide bridge, and the principles of protein 3D structure. They suggested that the vWf domain would be most likely to resemble ras p21.
The subsequently determined 3D structures \cite{lee95} showed this prediction of secondary structure and fold to be largely correct \cite{russell95}.\\ \\ Our mapping technique allows many of the features exploited by Perkins \ea to be combined in a prediction. \htmladdnormallink{Figure~2}{figure2.ps} shows a vWf pattern based on the prediction of Perkins \& co-workers \cite{perkins94,edwards95}. In addition to a pattern of predicted secondary structures, the pattern also contains detailed information as to the loop lengths, and details of two distance restraints: one from a pair of aspartic acids thought to be involved in a metal binding site (constrained to have their axial coordinates within 15 \AA), and a putative disulphide bond (constrained to have their axial coordinates within 9.5 \AA). A tolerance of $t = 4$ \AA~ was added to each of these restraints to allow for changes in secondary structure packing across similar protein 3D structures.\\ % % REF_RBR Re: Ref I's "free" parameters comment % I think that the last sentence (above) is a good enough caveat. % I am tempted to call it a "slop factor", but this is sloppy. % % \\ A comparison of the vWf pattern to the database of 780 domains finds Elongation factor Tu (PDB code 1ETU), Ras P21 (821P) and Che-Y (3CHY) as the three top scoring folds, with other double--wound, \albe, Rossmann-type folds following in the top 20 scoring folds. The top 3 scoring proteins are highly similar to the recently solved structures of the vWf, with Ras P21/Elongation factor Tu being the most similar \cite{lee95}. \subsubsection{The Proteasome} Lupas \ea (1994) predicted the secondary structure for the 20S proteasome \al subunits by a variety of algorithms. We took their predicted pattern of secondary structure elements and accessibility and searched the database of 780 non-redundant protein domains. Without imposing any experimental distance restraints, the method finds $7$ folds ($173$ maps). 
The top scoring fold, according to the amphipathicity scoring scheme, is that of glutamine amidotransferase (PDB code 1GPH), which is structurally and functionally similar to the proteasome \cite{lowe95,brannigan95}.\\ \\ A small number of weak distance restraints can make a significant difference to the results of this search. If alignment positions identified as putative active site residues by Lupas \ea, by the method of Benner and co-workers \cite{benner93a}, are required to have axial coordinates within $15$ \AA~ (tolerance of $4$ \AA) of each other, only $4$ folds ($19$ maps) remain, with the correct fold still at the first rank. Although distance restraints are not always available prior to 3D structure determination, our results suggest that they should be used to aid fold recognition whenever possible.
\subsubsection{The phosphotyrosine interaction domain}
Bork \& Margolis (1995) recently identified a new phosphotyrosine interaction domain (PID) involved in the cytoplasmic signalling cascade. They constructed an alignment of several diverse members of this sequence family, and performed a prediction of secondary structure. We ran the PHD program on a slightly more up-to-date alignment of PID proteins (P. Bork, personal communication), to predict the secondary structure and accessibility. A search pattern was made from the prediction, and the loop length ranges taken from the multiple alignment. The pattern of 9 secondary structures was BBHBBBBBH and these elements are numbered sequentially from 1--9 below. Since there were two long loops connecting the predicted secondary structures, the adjacent parallel filter was not used during the search. Structures corresponding to the best alignment with each of the top six scoring folds are shown in \htmladdnormallink{Figure~3}{figure3.ps}. Recent structure determination has shown the PID (PTB domain) to resemble the pleckstrin homology (PH) domain in structure and function \cite{zhou95}.
By the accessibility scoring scheme, the top ranked fold is not a PH domain, although a PH domain (from dynamin) is ranked at position 2. The top 6 folds are illustrative in that they show how the method can suggest alternative plausible folds that satisfy a pattern of predicted secondary structures and accessibilities.\\ \\ The best scoring fold \htmladdnormallink{(Figure 3a)}{figure3.ps} is that of profilin (PDB code 2BFPP), and the best scoring map gives an anti-parallel \be sheet with the strand order 218754 (predicted strand 6 is deleted) with one helix packing against each face. The second best scoring fold is a correct match with the PH domain from human dynamin (1DYNB), having deleted the first predicted \al helix from the PID pattern. The third best scoring fold (3c) comes from {\em S. aureus} \be lactamase (1BLH, domain 1), with an anti-parallel \be sheet of order 54876, with both helices packing against one face. The fourth and fifth best scoring folds come from members of the Ig superfamily, and comprise alternative arrangements of \be strands to form a greek key \be sandwich. Both of the predicted \al helices from the PID pattern have been deleted in these matches. Finally, the sixth (3e) match comes from the tryptic core of {\em E. coli lac} repressor (1TLFD domain 4), and comprises a parallel \be sheet (42576) with both helices packing against one face. This fold is perhaps the least plausible, since it would require 3 crossover connections between adjacent and parallel \be strands. \\ \\ The method has suggested plausible alternative structures that can be scrutinised, in the absence of 3D structural data, by way of further experiments, secondary structure predictions, or even other methods of fold recognition. The results show how the predicted secondary structure elements can be accommodated into a compact, plausible protein fold, and encouragingly, the method has identified the correct fold high in the list of alternatives. 
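\\ \\
The distance restraints applied in the vWf and proteasome searches above can be summarised by a simple inequality. This is a sketch under the assumption that the tolerance is added to the upper bound of each restraint; the symbols ${\bf a}_i$, ${\bf a}_j$ and $d_{ij}$ are notation introduced here. If ${\bf a}_i$ and ${\bf a}_j$ denote the axial coordinates of two restrained positions in a candidate map, $d_{ij}$ the restraint distance and $t$ the tolerance, the map is retained only if
\[
\| {\bf a}_i - {\bf a}_j \| \le d_{ij} + t .
\]
Under this reading, with $t = 4$ \AA, the putative metal binding restraint ($d_{ij} = 15$ \AA) admits separations of up to 19 \AA, and the putative disulphide restraint ($d_{ij} = 9.5$ \AA) separations of up to 13.5 \AA.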
\section{Discussion \& Conclusions}
In this paper we have presented a new method for protein fold recognition which exploits recent improvements in protein secondary structure prediction, and can use other information such as predictions of accessibility, loop lengths and experimental data to restrict possible folds. When applied to predicted secondary structures and accessibilities, the method has been shown to be slightly better than one widely used fold recognition method \cite{jones92} at detecting the correct fold for eleven test examples. The alignments generated by the method are of comparable accuracy at the residue-residue and secondary structure alignment level. When the query is defined by experimental secondary structures and accessibilities, the method is highly successful at recognising the correct fold. This suggests that the mapping method will improve alongside any future improvement in secondary structure and accessibility prediction. The method also has the advantage of being computationally inexpensive, and so allows for multiple searches to be performed quickly.\\ \\ The simplicity of the technique suggests several enhancements that could improve accuracy even further. The method of aligning sequences onto 3D structures might be developed by the use of empirically derived pair-potentials or accessibility preferences (e.g. \cite{sippl90,jones92}), or by the identification of favourable interaction sites between secondary structures \cite{cohen80a,cohen80b,cohen82}. A more sophisticated alignment and ranking procedure is under development.\\ \\ The initial alignment and filtering procedures are perhaps the most distinctive feature of this technique. Other techniques for fold-recognition tend only to provide a single sequence alignment of query and database structures.
The use of a secondary structure element alignment method has the advantage that exhaustive comparisons of two proteins can be performed; most folds identified have an ensemble of alternative alignments that can be explored further.\\ \\ Since most protein structure similarities occur at the domain level, it is advantageous, whenever possible, to split both query and database structures into domains. The problem of assigning domains for protein 3D structures has been the subject of revived interest \cite{holm94b,siddiqui95,sowdhamini95,islam95} and is likely to lead to accessible databases of protein structural domains. Assigning domains within proteins of unknown 3D structure is more problematic, though approaches based on sequence homology \cite{pongor94,sonnhammer94} are undoubtedly the most promising; the vWf and PID proteins above are both examples of domains that occur in a variety of multi-domain contexts.\\
% REF_RBR point to this paragraph in answer to Ref I's point gamma
%
\\
The method described here has applications in protein structure determination by NMR. During NMR structure determination, a preliminary secondary structure assignment (equivalent to a very accurate prediction) and a small number of distance restraints may be available early in the study. However, these data are usually insufficient to determine a unique structure by distance geometry or molecular dynamics \cite{smith-brown93}. Our results for the vWF and Proteasome domains suggest that the data may be sufficient to locate a similar fold in the database if one is present. Folds predicted from distance restraints and secondary structure assignment may be used to guide the assignment of cross-peaks and thus speed up the structure determination process.
Clearly, the alternative consistent topologies may also give clues as to possible structural/functional/evolutionary relationships that are generally not known until after 3D structure determination (such as that described in Matthews {\em et al.}, 1994). \nocite{matthews94}\\ \\ We have shown that secondary structure predictions of typical accuracy, together with simple principles of protein 3D structures and/or experimental data can be used to recognise correct protein folds in a library of domains. These results and others \cite{edwards95,russell95,gerloff95} suggest that secondary structure prediction, experimental data, and protein structural principles should be used to augment protein fold recognition whenever possible. \section{Acknowledgements} We thank Professor L.N. Johnson for encouragement and support. We are indebted to Dr D.T. Jones (University of Warwick) for giving advice on the THREADER program and its database, Dr B. Rost (EMBL, Heidelberg) for providing the PHD program, Drs P. Bork (EMBL, Heidelberg) and S.J. Perkins (Royal Free Hospital, London) for providing prediction data for the PID and vWF domains, Dr S.K. Burley (Rockefeller University, New York) for providing the coordinates of the HNF--3 structure and Mr A. S. Siddiqui (LMB, Oxford) for providing a database of protein structural domains. RBR thanks Dr C. P. Ponting (Fibrinolysis Research Unit, Oxford) for helpful discussions. RBR and GJB thank the Royal Commission for the Exhibition of 1851 and the Royal Society for support. RRC is funded by an MRC studentship. This research was funded in part by a grant from the BBSRC (UK). % % We also thank our parents for raising us so well, % erm... I'd like to thank my producer and all my friends for % support during the long years, and for understanding. Thanks % to Bill at the Betty Ford clinic. 
%
\section{$\ddag$Abbreviations}
3D three dimensional; NMR nuclear magnetic resonance; Ig Immunoglobulin; SDM site directed mutagenesis; WWW world wide web. The standard one-- and three--letter abbreviations for the amino acids are also used throughout.
\section{Figure and Table Legends}
\subsection{Figure 1}
\htmladdnormallink{Figure~1}{figure1.ps}. An example of a useful `loose' similarity between 3D structures detected using the MAP method and a secondary structure prediction. a) The alignment found by the method between the predicted pattern for HNF--3 and the helical DNA binding motif within phage 434 repressor. Boxed, bold-faced, upper-case regions indicate aligned predicted and experimental secondary structures. Sec denotes the PHD prediction for HNF--3, and a 3-state DSSP secondary structure assignment for 434 repressor. Bur shows predicted and experimental states of burial for HNF--3 and 434 repressor: b = buried; e = exposed; u = intermediate/unknown. b) The equivalent alignment found using the STAMP (Russell \& Barton, 1992) structure comparison algorithm. Boxed, bold-faced, upper-case regions indicate structural equivalences. Sec denotes DSSP 3--state secondary structures for both proteins. c) and d) show the crystallographic structures of the matched regions of HNF--3 and 434 repressor, with structurally equivalent residues shown in ribbon/coil format, and non-equivalent regions shown as \Cal trace. The N- and C- termini of the structures are labelled.\\
\subsection{Figure 2}
\htmladdnormallink{Figure~2}{figure2.ps}. Search pattern for the von Willebrand factor type A domain (derived from Edwards \& Perkins, 1995) as discussed in the text. \al helices are indicated by cylinders, \be strands by arrows. The numbers given beside each secondary structure or loop are the range of predicted lengths. Bullets ($\bullet$) show those secondary structures that are required for any possible map (i.e. those involved in distance restraints).
Two distance restraints, one from a putative disulphide bond ($9.5$ \AA) and the other from knowledge of two residues thought to be involved in metal coordination ($15$ \AA), are shown to the left of the figure.\\
\subsection{Figure 3}
\htmladdnormallink{Figure~3}{figure3.ps}. Maps from the top six scoring folds found during a search with the PID pattern. Details are given in the text.\\
\subsection{Table 6}
\htmladdnormallink{Table~6}{ntable6.ps}. Results of running MAP using secondary structure assignments (I) and PHD secondary structure predictions (II) shown beside THREADER results (III) for eleven protein structures having type B and C similarities (Russell \& Barton 1994) within the domain database. The first column for each method shows the top ten scoring domains, which are denoted by a PDB four letter code (Bernstein {\em et al.}, 1977), a chain identifier as the fifth character (if any), followed by an underscore and a Roman numeral denoting the domain (if any). Bold inverted text denotes a correct match using the strict classification; grey backgrounds show loose classifications (see text). The second column shows the score for each domain, the third the protein structure class, and the fourth the name of the fold/structure. Upper case denotes fold families under the strict definitions. Upper case names in parentheses (if present) denote the name of the loose family classification. The globins 1HBG, 1MYGA and 1ECA and the cupredoxin 1PAZ are similar in sequence to the query, and so are not shown inverted and are not included in the evaluation statistics (see text).
Strict fold classifications: 4HB-1= up-down-up-down four helix bundle (4HB); 4HB-2= up-up-down-down (interleukin-4 type) 4HB; GLOBIN= globin-type folds; W-HTH= winged helix-turn-helix (HTH) folds; EF-HAND= calcium binding EF hands; CYTOC= cytochromes C; THIO= thioredoxin-like folds; FLAVO= flavodoxin-like folds; ROSS= Rossmann folds; PBL= periplasmic binding protein-like folds; ACTIN-ATPASE= actin/HSC-70/hexokinase-like folds; G-PROT= G-protein (ras) like folds; FAD-BIND= FAD/NAD binding protein-like folds; \al\be-BARREL= \al\be (TIM) barrels; \be-GRASP= \be-grasp (ferredoxin) like folds; IG= Immunoglobulin superfamily; CUP= Cupredoxins (plastocyanin-like); \be-TREFOIL= \be-trefoils (interleukin-1-\be-like); OB-FOLD= oligonucleotide/oligosaccharide binding folds.
Loose fold classifications: 4HB= 4HB-1, 4HB-2, ferritin; HTH= W-HTH, $\lambda$-rep., trp-rep.; DWAB (doubly-wound-\al\be)= ROSS, FLAVO, THIO, PBL, G-PROT, sugar phosphatase, pfk, pgk, dhfr; GKBS (Greek key \be sandwich)= IG, CUP, \al-amylase inhibitor, sod, macromycin, prealbumin.
Other abbreviations: sod= superoxide dismutase; pfk= phosphofructokinase; pgk= phosphoglycerate kinase; dhfr= dihydrofolate reductase; ldh= lactate dehydrogenase; ser-prot= serine proteinase; asp-prot= aspartic proteinase; inh.= inhibitor; rep.= repressor; glut.= glutathione; red.= reductase; thym. phosph.= thymidine phosphorylase; ribo.= ribonuclease; glyc.= glycoprotein; P-glucomutase= phosphoglucomutase; glyc. ribo trans.= glycinamide ribotransferase.
\subsection{Table 7}
\htmladdnormallink{Table~7}{ntable7.ps}. Summary of fold recognition success rates. Strict and Loose refer to the criteria for structural similarity discussed in the text. Class refers to structural class success as discussed in the text. (1st) refers to success measured as a correct fold at rank 1, (Top 10) as a correct fold in the top 10 ranked structures.
\section{Figures} [\htmladdnormallink{Figure 1}{figure1.ps}][\htmladdnormallink{Figure 2}{figure2.ps}][\htmladdnormallink{Figure 3}{figure3.ps}]. \section{Tables} [\htmladdnormallink{Table 1}{ntable1.ps}][\htmladdnormallink{Table 2}{ntable2.ps}][\htmladdnormallink{Table 3}{ntable3.ps}][\htmladdnormallink{Table 4}{ntable4.ps}][\htmladdnormallink{Table 5}{ntable5.ps}][\htmladdnormallink{Table 6}{ntable6.ps}][\htmladdnormallink{Table 7}{ntable7.ps}][\htmladdnormallink{Table 8}{ntable8.ps}]. \nocite{TitlesOn} \bibliographystyle{jmb} \bibliography{rbr} \end{document}