
Alignment of secondary structures

The secondary structure of the protein is represented as a string of H and B characters, where each H represents an entire helix and each B an entire strand. A fast method [Russell et al., 1995] that generates all exact-matching alignments between two strings, while allowing up to a maximum number of deletions from each string, is used to find all mappings between the query pattern of secondary structures and each entry in the domain database. The method is recursive and reminiscent of regular expression matching.

In this study, up to two deletions were permitted from the query secondary structure string to allow for errors in the prediction. Up to five deletions were permitted from each database structure to allow for the insertions or deletions of secondary structures that are typical of proteins having similar 3D structures in the absence of sequence similarity. Deletions from the database structure were counted only if they lay between matched elements (overhanging deletions were ignored). Explicit mismatches were not allowed; instead, a mismatch was treated as a deletion from either the query or the database structure. These limits were chosen because they are typical of the expected accuracy of secondary structure prediction and of the insertions and deletions of secondary structure elements seen across members of a diverse structural family. In practice, the allowable deletions from query and database should be chosen on a case-by-case basis. For consistency, we kept the maximum numbers of deletions fixed throughout this study.
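The matching rules described above can be illustrated with a minimal Python sketch. This is not the implementation of Russell et al. [1995]; the function name `enumerate_alignments`, its parameters, and the representation of an alignment as a tuple of (query index, database index) pairs are assumptions made for illustration only. The sketch enumerates every set of exactly matching element pairs in which the number of unmatched query elements does not exceed `max_query_del` and the number of unmatched database elements lying between matched elements does not exceed `max_target_del`. Database elements before the first match or after the last match are treated as overhangs and are not counted.

```python
from typing import List, Tuple

# An alignment is a tuple of (query_index, target_index) matched pairs.
Alignment = Tuple[Tuple[int, int], ...]


def enumerate_alignments(query: str, target: str,
                         max_query_del: int = 2,
                         max_target_del: int = 5) -> List[Alignment]:
    """Enumerate exact-match alignments of two H/B strings (illustrative).

    Every query element left unmatched counts as a query deletion.
    Target (database) elements count as deletions only when they lie
    between two matched elements; overhangs at either end are free.
    """
    results: List[Alignment] = []

    def extend(last_i: int, last_j: int, q_del: int, t_del: int,
               pairs: List[Tuple[int, int]]) -> None:
        # Stopping here deletes every remaining query element.
        if pairs and q_del + (len(query) - 1 - last_i) <= max_query_del:
            results.append(tuple(pairs))
        for i in range(last_i + 1, len(query)):
            new_q = q_del + (i - last_i - 1)   # query elements skipped
            if new_q > max_query_del:
                break
            for j in range(last_j + 1, len(target)):
                # Target elements skipped before the first match are an
                # overhang and are therefore not counted.
                new_t = t_del + (j - last_j - 1) if pairs else t_del
                if new_t > max_target_del:
                    break
                if query[i] == target[j]:      # exact matches only
                    pairs.append((i, j))
                    extend(i, j, new_q, new_t, pairs)
                    pairs.pop()

    extend(-1, -1, 0, 0, [])
    return results
```

For example, `enumerate_alignments("HH", "HBH", 0, 1)` finds the single alignment that pairs the two query helices with the first and third database elements and deletes the intervening strand, whereas the same call with `max_target_del=0` finds none. Because the sketch enumerates matched pairs directly, each alignment is reported once regardless of the order in which deletions could have been applied.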


gjb@bioch.ox.ac.uk