
Continuous and Discontinuous Domains: An Algorithm for the Automatic Generation of Reliable Protein Domain Definitions

Asim S. Siddiqui and Geoffrey J. Barton
University of Oxford
Laboratory of Molecular Biophysics
The Rex Richards Building
South Parks Road
Oxford OX1 3QU
Tel: 44 865 275368
FAX: 44 865 510454
Author to whom correspondence should be addressed.
(e-mail: geoff@biop.ox.ac.uk)
(The published version of this preprint appeared in Protein Science (1995), 4:872-884.)

Abstract:

An algorithm is presented for the fast and accurate definition of protein structural domains from coordinate data without prior knowledge of the number or type of domains. The algorithm explicitly locates domains that comprise one or two continuous segments of protein chain. Domains that include more than two segments are also located.
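As a rough illustration of what locating one-segment and two-segment domains can involve, the sketch below scans every way of cutting a chain into a continuous stretch and the remainder, and scores each partition from a residue contact map. The abstract does not specify the scoring function or any parameters, so everything here is an assumption made for illustration: the function names, the 8.0 distance cutoff, the use of one point per residue, and the score (contacts inside A times contacts inside B, divided by contacts between them). It is not the authors' implementation.

```python
from itertools import combinations
from math import dist


def contact_map(coords, cutoff=8.0):
    """Return the set of residue pairs (i, j), i < j, within `cutoff` of each other.

    `coords` holds one (x, y, z) point per residue; the cutoff is an assumed value.
    """
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(coords), 2)
        if dist(a, b) <= cutoff
    }


def split_score(contacts, in_a):
    """Assumed score for a two-way partition: (internal A * internal B) / external."""
    int_a = int_b = ext = 0
    for i, j in contacts:
        if in_a[i] and in_a[j]:
            int_a += 1
        elif not in_a[i] and not in_a[j]:
            int_b += 1
        else:
            ext += 1
    # Treat zero cross-contacts as one to avoid division by zero.
    return int_a * int_b / max(ext, 1)


def best_split(coords, cutoff=8.0, min_len=3):
    """Find the stretch [i, j) whose separation from the rest scores highest.

    With i == 0 or j == n the chain is cut once, giving two continuous domains.
    Otherwise [i, j) is a one-segment domain and the remainder,
    [0, i) plus [j, n), is a two-segment (discontinuous) domain.
    """
    n = len(coords)
    contacts = contact_map(coords, cutoff)
    best_score, best_range = 0.0, None
    for i in range(n):
        for j in range(i + min_len, n + 1):
            if n - (j - i) < min_len:
                continue  # the remainder would be too small to be a domain
            in_a = [i <= r < j for r in range(n)]
            score = split_score(contacts, in_a)
            if score > best_score:
                best_score, best_range = score, (i, j)
    return best_score, best_range


# Toy chain: residues 0-2 and 7-9 pack together, residues 3-6 sit far away.
toy = (
    [(float(k), 0.0, 0.0) for k in range(3)]
    + [(100.0 + k, 0.0, 0.0) for k in range(4)]
    + [(float(k), 1.0, 0.0) for k in range(3)]
)
score, (start, end) = best_split(toy)
print(f"one-segment domain: residues {start}..{end - 1}, score {score}")
```

On the toy chain the highest-scoring partition isolates residues 3 to 6, leaving the first and last three residues as a single two-segment domain. The exhaustive scan is quadratic in chain length per contact and is meant only to make the idea concrete; domains with more than two segments are not covered by this sketch.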

The algorithm was applied to a non-redundant database of 230 protein structures, and the results were compared to domain definitions obtained from the literature or by inspection of the coordinates on molecular graphics. For 70% of the proteins, the derived domains agree with the reference definitions; 18% show minor differences, and only 12% (28 proteins) show very different definitions. Three screens were applied to identify the derived domains least likely to agree with the subjective definition set. These screens revealed a set of 173 proteins, 97% of which agree well with the subjective definitions.

The algorithm represents a practical domain identification tool that can be run routinely on the entire structural database. Adjustment of parameters also allows smaller compact units to be identified in proteins.

Keywords: automatic domain definitions, protein structural domains, contacts, domains database





as@bioch.ox.ac.uk