
Multiple alignment of protein 3D structures

The protein structural families considered are given in Table 1. All structures are refined and of a resolution of Å or better, with the exception of the virus coat proteins, which are all refined structures with resolutions between and Å. As a further test of structural quality, PROCHECK [Morris et al., 1992] was run on all proteins in the dataset using a resolution of Å. Those structures showing large deviations (i.e., "WORSE") from typical values for main- and side-chain parameters were not used in the study. Despite poorer resolutions, the viral coat proteins listed in Table 1 were found to have a stereochemical quality comparable to a good Å structure, which is expected since molecular averaging greatly improves the quality of these medium-resolution structures. All structures were taken from the January 1993 release of the Brookhaven Protein Data Bank [Bernstein et al., 1977], with the exception of sheep 6-phosphogluconate dehydrogenase and L. mesenteroides glucose-6-phosphate dehydrogenase (PGD and G6P; kindly provided by Dr M. J. Adams); the human fyn SH3 domain (SH3; kindly provided by Dr M. E. M. Noble); chicken src SH2 domain (SH2; kindly provided by Dr J. Kuriyan); and human HNF-3 (HNF; kindly provided by Dr S. K. Burley).

Alignments were generated using the STAMP package [Russell, 1994][Russell & Barton, 1992]. Pairs of structures that could not be aligned accurately by the method (due to gross structural deviations, etc.) were ignored. It is important to emphasise that the alignments used in this paper are derived from comparison of three-dimensional structures, and thus provide a more accurate set of residue equivalences than alignments created without 3D structural information. Within each family of protein 3D structures, every possible pair of structures was aligned separately to give the most accurate structural alignment. Structurally equivalent regions were defined according to Russell & Barton (1992) by those regions having a residue-by-residue structural similarity index for segments of two or more residues.
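As an illustration only, the two bookkeeping steps described above (enumerating every pair within a family, and keeping only runs of two or more consecutive residues that pass a similarity cutoff) might be sketched as follows. This is not STAMP code: the function names, the per-residue `scores` list, and the `threshold` parameter are hypothetical, and the sketch assumes that a residue qualifies when its similarity index is at or above the cutoff.

```python
from itertools import combinations


def all_pairs(structures):
    """Return every unordered pair of structures within one family."""
    return list(combinations(structures, 2))


def equivalent_segments(scores, threshold, min_len=2):
    """Return (start, end) index ranges, inclusive, of consecutive aligned
    positions whose similarity index is >= threshold (an assumed criterion)
    and whose run length is at least min_len."""
    segments = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # a new candidate run begins here
        else:
            if start is not None and i - start >= min_len:
                segments.append((start, i - 1))
            start = None
    # Close a run that extends to the last aligned position.
    if start is not None and len(scores) - start >= min_len:
        segments.append((start, len(scores) - 1))
    return segments
```

Isolated single positions that pass the cutoff are discarded, which mirrors the requirement for segments of two or more residues.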

A total of pairs of aligned protein 3D structures, varying in sequence and 3D structural similarity, were obtained. Type similarities were defined as those proteins having a sequence similarity (, see below) greater than . The remaining similarities were classified as type or depending on whether the proteins were functionally similar. pairs were classified as type , as type and as type . In Table 1, functionally similar proteins (i.e., types and ) are named (e.g., "Oxygen carriers" within the globin fold family) and enclosed by single lines. In all the plots that follow, similarities of types and are indicated by the symbol 'x' and type similarities are indicated by the symbol 'o'. Type and / similarities are separated in all plots by a solid line at .
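The three-way classification can be read as a simple decision rule, sketched below under stated assumptions. The type labels and the cutoff value are not legible in this copy of the text, so the string labels and the `cutoff` argument here are hypothetical placeholders, not the paper's own names or values.

```python
def classify_pair(sequence_similarity, functionally_similar, cutoff):
    """Assign a structurally aligned pair to one of three similarity classes.

    The labels are placeholders; cutoff stands in for the sequence
    similarity threshold given in the text.
    """
    if sequence_similarity > cutoff:
        return "sequence-similar"
    # Below the cutoff, the split depends only on shared function.
    if functionally_similar:
        return "functionally-similar"
    return "structure-only"
```

Note that the sequence similarity test takes precedence: a pair above the cutoff falls in the first class whether or not the proteins share a function.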

To obtain a measure of background similarity, sixteen pairs of dissimilar structures were aligned using a sequence comparison algorithm [Barton & Sternberg, 1987], with Dayhoff's PAM250 matrix [Dayhoff et al., 1978] and a fixed gap opening penalty of . These pairs are given in Table 2. In all the plots that follow, dissimilar structural pairs are indicated by the symbol 'd'.





gjb@
Thu Feb 9 18:06:48 GMT 1995