
Discussion

The algorithm described in this paper can locate domains in proteins of any length and is fast enough to be run routinely on the large database of protein structures. After screening, the domain definitions agree closely with conventional, subjective definitions (97% agreement). The algorithm could be extended to apply the screens at an earlier stage: unlikely domains would then be detected, the relevant constraint values altered, and the analysis run again.
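The proposed extension is essentially a feedback loop. The sketch below is hypothetical and illustrative only: `find_domains`, `is_unlikely` and `adjust` are placeholder callbacks standing in for the domain-assignment step, the screens and the constraint update, and none of them reproduces the thresholds of Table 6 or the paper's actual method.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a domain is a (start, end) residue range, and params
# maps constraint names (e.g. a compactness threshold) to their values.
Domain = Tuple[int, int]
Params = Dict[str, float]


def refine_domains(
    find_domains: Callable[[Params], List[Domain]],
    is_unlikely: Callable[[Domain], bool],
    adjust: Callable[[Params, List[Domain]], Params],
    params: Params,
    max_rounds: int = 5,
) -> Tuple[List[Domain], Params]:
    """Assign domains, screen them, and rerun with altered constraints.

    Stops when no domain is flagged by the screen, or after max_rounds
    adjustments, whichever comes first.
    """
    domains = find_domains(params)
    for _ in range(max_rounds):
        # Screen: collect domains unlikely to fit the usual concept of a domain.
        flagged = [d for d in domains if is_unlikely(d)]
        if not flagged:
            break
        # Alter the relevant constraint values, then run the analysis again.
        params = adjust(params, flagged)
        domains = find_domains(params)
    return domains, params
```

Passing each stage as a function keeps the loop independent of any particular domain-assignment algorithm or screen, and the `max_rounds` cap guards against constraint updates that never converge.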

Most of the differences between the automatically derived domain definitions and the reference definitions arise from difficulties and inconsistencies in what is meant by a 'domain'. The algorithm described here finds compact local regions of structure according to a set of thresholds (Table 6). However, these compact regions do not always correspond to what one would intuitively consider to be the domains of the protein. This problem is common to all previous algorithms for protein domain definition [Rossmann & Liljas, 1974; Crippen, 1978; Rose, 1979; Wodak & Janin, 1981; Rashin, 1981; Go, 1983; Zehfus & Rose, 1986; Zehfus, 1994; Holm & Sander, 1994], and is an inevitable consequence of applying an objective set of rules for domain definition to what is essentially a subjective interpretation. A major advantage of the algorithm described here is its ability to screen the derived domains accurately and flag those that are unlikely to fit the usual concept of a domain. Accordingly, the final list of domains may be used with a high degree of confidence. A server providing these domain definitions is accessible via the World Wide Web at http://www.compbio.dundee.ac.uk/.

as@bioch.ox.ac.uk