The secondary structure of the protein is represented as a sequence of
H and B characters, where each H represents an entire helix and each B
a strand. A fast method that generates all exact-matching
alignments between two strings while allowing up to a maximum number of
deletions from each string [Russell et al., 1995] was used to find all maps
between the query pattern of secondary structures and the domain
database. The method is recursive and reminiscent of regular
expression matching. In this study up to two deletions were permitted
from the query secondary structure string, to allow for errors in the
prediction. Up to five deletions were permitted from each database
structure, to allow insertions or deletions of secondary structures
typical of proteins having similar 3D structures in the absence of
sequence similarity. Deletions from the database structure were
counted only if they fell between matched elements (overhanging
deletions were ignored). Explicit mismatches were not allowed; they were
instead treated as deletions from either the query or the database structure.
These values were chosen because they are typical of the expected
accuracy of secondary structure prediction, and typical of insertions
and deletions of secondary structure elements across members of a
diverse structural family. In practice, the allowable deletions from
query and database should be chosen on a case-by-case basis.
For consistency, we kept the maximum numbers of deletions fixed
during this study.
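
The matching rules described above can be illustrated with a minimal Python sketch. This is not the implementation of Russell et al. [1995]; it is a simple recursive enumeration written under the rules stated in this section. The function name enumerate_maps, its parameter names, and the representation of a map as a tuple of (query index, database index) pairs are assumptions made for illustration only. Query deletions are counted wherever they occur, whereas database deletions are counted only between matched elements, so overhanging database elements are free.

```python
from typing import List, Tuple

# A map is a tuple of (query_index, database_index) pairs (hypothetical format).
Map = Tuple[Tuple[int, int], ...]


def enumerate_maps(query: str, target: str,
                   max_query_del: int = 2,
                   max_target_del: int = 5) -> List[Map]:
    """Enumerate all exact-matching maps between two H/B strings.

    Illustrative sketch only: every unmatched query element counts as a
    query deletion; unmatched database elements count only when they lie
    between two matched elements (overhangs are ignored).
    """
    n, m = len(query), len(target)
    results: List[Map] = []

    def extend(i: int, j: int, q_del: int, t_del: int,
               pairs: List[Tuple[int, int]]) -> None:
        # Stop here if deleting the rest of the query stays within the limit.
        if pairs and q_del + (n - i) <= max_query_del:
            results.append(tuple(pairs))
        # Otherwise choose the next matched pair (qi, tj).
        for qi in range(i, n):
            new_q_del = q_del + (qi - i)  # skipped query elements
            if new_q_del > max_query_del:
                break
            for tj in range(j, m):
                # Database elements skipped before the first match are free.
                new_t_del = t_del + ((tj - j) if pairs else 0)
                if new_t_del > max_target_del:
                    break
                # Only identical element types may be paired (no mismatches).
                if query[qi] == target[tj]:
                    pairs.append((qi, tj))
                    extend(qi + 1, tj + 1, new_q_del, new_t_del, pairs)
                    pairs.pop()

    extend(0, 0, 0, 0, [])
    return results
```

For example, enumerate_maps("HBH", "HHBH", 0, 0) returns the single map ((0, 1), (1, 2), (2, 3)): the leading H of the database string is an overhang and is not counted as a deletion. Note that this sketch also reports maps that end early when the remaining query elements can be deleted within the limit; whether such shorter maps should be retained is a design choice not specified here.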