The algorithm was applied to a non-redundant database of 230 protein structures and the results compared to domain definitions obtained from the literature, or by inspection of the coordinates on molecular graphics. For 70% of the proteins, the derived domains agree with the reference definitions, 18% show minor differences and only 12% (28 proteins) show very different definitions. Three screens were applied to identify the derived domains least likely to agree with the subjective definition set. These screens revealed a set of 173 proteins, 97% of which agree well with the subjective definitions.
The algorithm represents a practical domain identification tool that can be run routinely on the entire structural database. Adjustment of parameters also allows smaller compact units to be identified in proteins.
Keywords: automatic domain definitions, protein structural domains,
contacts, domains database