
Fitting sequences on to 3D structures

Accessibilities for residues within each map are calculated quickly by exploiting the relationship between relative accessibility and the number of other atoms within a fixed distance (in Å) of a residue's atom. This atom count is calculated by considering secondary structures and the C-terminal coils for the matched structures. Analysis of the high-quality domains gives thresholds on the count: helical residues are classed as buried (b) or exposed (e) when the count falls on the appropriate side of these thresholds, and as intermediate/unknown (u) otherwise. Similarly, residues in strands are classed as b, e or u by corresponding thresholds. In the examples presented here, predicted accessibilities were taken from the SUB line of the PHD [Rost & Sander, 1994] output, which highlights those regions predicted with confidence. Remaining positions were assigned unknown (u) accessibility.
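The counting and three-state classification described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the radius and both thresholds are hypothetical placeholders, and the sketch assumes that a higher neighbour count corresponds to a more buried residue.

```python
import math


def neighbour_count(atoms, index, radius):
    """Count the other atoms lying within `radius` of atoms[index].

    `atoms` is a list of (x, y, z) coordinates in Angstroms. The radius is a
    hypothetical parameter supplied by the caller.
    """
    centre = atoms[index]
    return sum(
        1
        for k, atom in enumerate(atoms)
        if k != index and math.dist(centre, atom) <= radius
    )


def classify(count, buried_min, exposed_max):
    """Map a neighbour count to 'b', 'e' or 'u'.

    Assumption: many neighbours means buried, few means exposed. Both
    thresholds are hypothetical and would differ for helices and strands.
    """
    if count >= buried_min:
        return "b"
    if count <= exposed_max:
        return "e"
    return "u"


# Toy coordinates, purely for illustration.
toy_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 3.0, 0.0), (10.0, 0.0, 0.0)]
toy_count = neighbour_count(toy_atoms, 0, radius=5.0)
toy_state = classify(toy_count, buried_min=2, exposed_max=0)
```

In practice one pair of thresholds would be passed for helical residues and another for strand residues.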

Given assignments of accessibility, the best alignment for each pair of secondary structures, with no gaps permitted within either secondary structure, is found by applying the scoring matrix shown in Table 3. These values were chosen to prevent long overhanging gaps in the alignment of predicted and experimental secondary structures, and were designed not to penalise mismatches too heavily. The total similarity score for the alignment is then defined as:

where the terms are the best score for a pair of matched secondary structures (calculated by summing values from Table 3), the number of matched secondary structures, and the total difference in the lengths of the two protein domains being compared. When calculating this length difference, those secondary structures that have been equivalenced are ignored, since overhanging gaps are already penalised by the gap score in Table 3.
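The ungapped alignment of one pair of secondary structures can be sketched as follows. This is an illustration only: the scoring values stand in for Table 3 and are hypothetical, as are the function names. The shorter element is slid along the longer one, overlapping positions are scored from the matrix, and every overhanging position incurs the gap score.

```python
def make_matrix(same=2, opposite=-2, unknown=0):
    """Build a hypothetical b/e/u scoring matrix (not the values of Table 3)."""
    matrix = {}
    for x in "beu":
        for y in "beu":
            if "u" in (x, y):
                matrix[(x, y)] = unknown
            elif x == y:
                matrix[(x, y)] = same
            else:
                matrix[(x, y)] = opposite
    return matrix


def best_ungapped(a, b, matrix, gap=-1):
    """Return (score, offset) of the best alignment with no internal gaps.

    `a` and `b` are accessibility strings over 'b', 'e', 'u'. `offset` is the
    position in `a` aligned with b[0]. Overhanging positions score `gap`.
    """
    best = None
    for offset in range(-(len(b) - 1), len(a)):
        total = 0
        start = min(0, offset)
        stop = max(len(a), offset + len(b))
        for i in range(start, stop):
            j = i - offset
            if 0 <= i < len(a) and 0 <= j < len(b):
                total += matrix[(a[i], b[j])]
            else:
                total += gap  # position covered by only one element
        if best is None or total > best[0]:
            best = (total, offset)
    return best


matrix = make_matrix()
identical = best_ungapped("bbee", "bbee", matrix)
shifted = best_ungapped("ubbeu", "bbe", matrix)
```

Because every offset is tried exhaustively, the cost grows with the product of the two element lengths, which is small for individual helices and strands.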


gjb@bioch.ox.ac.uk