Next: Protein domain databases Up: Worked examples Previous: Database Scanning

Using scans as a starting point for multiple alignment

In some cases an initial fit based on a multiple sequence alignment will be so inaccurate that even an initial conformation-based fit cannot correct the poor superposition, and genuine structural homology will be missed. In these cases the SCAN option can be used to provide a more accurate initial superimposition.

To do this, select one representative of the domains to be superimposed and use it as the query in a sensitive scan of the other domains. Applying the same techniques as used for the scan with the Ig light variable domain (above) yields a set of initial transformations that superimpose each of the other domains onto the query domain.

Aspartic Proteinase Domains

An example of how such an initial superimposition might be obtained is the alignment of the aspartyl proteinase N- and C-terminal lobes (see directory examples/ac_prot).

The N-terminal domain of 1CMS (in the file 1cmsN.domain) can be used to scan a list of all aspartyl proteinase N- and C-terminal domains (ac_prot.domains):

stamp -l 1cmsN.domain -n 2 -s -slide 5 -d ac_prot.domains -prefix ac_prot

This should produce:

     Domain1  Domain2   Fits  Sc      RMS   Len1 Len2 Align Fit   Eq. Secs    %I    %S     P(m)
Scan 1cmsN    1cmsN       1   9.800  10.091  175  175  175  175  174   18  99.43  94.29 0.00e+00 
Scan 1cmsN    1cmsC       2   3.211   7.858  175  148  204   64   57   13   7.43  25.14 2.37e-03 
Scan 1cmsN    4apeN       1   8.195   9.708  175  178  182  155  151   15  26.97  72.47 1.36e-13 
Scan 1cmsN    4apeC       1   3.434   7.939  175  152  210   69   68   14   5.14  30.29 1.00e+00 
Scan 1cmsN    3appN       1   7.967   9.830  175  174  183  149  148   18  26.86  74.86 2.51e-13 
Scan 1cmsN    3appC       1   3.260   8.137  175  149  206   63   54   13   5.71  24.57 2.32e-02 
Scan 1cmsN    2aprN       1   8.386  10.130  175  178  178  158  154   15  30.34  76.40 3.81e-17 
Scan 1cmsN    2aprC       1   3.335   7.787  175  147  202   68   62   14   6.86  27.43 1.11e-02 
Scan 1cmsN    4pepN       1   8.880  10.162  175  174  174  170  169   15  56.00  87.43 3.00e-53 
Scan 1cmsN    4pepC       1   3.227   8.315  175  152  206   63   51   11   6.86  24.00 2.61e-03 
See the file ac_prot.scan
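
The summary lines printed by the scan are plain whitespace-separated columns, so the hits can be inspected with standard tools. The function below is a minimal sketch and is not part of STAMP: it assumes the summary lines are laid out exactly as shown above, with the keyword Scan first and Sc in the fifth column. The function name filter_scan_hits and the file name scan_summary.txt are hypothetical.

```shell
# filter_scan_hits CUTOFF
# Read scan summary lines on standard input and print "Domain2 Sc" for
# every row that starts with "Scan" and whose Sc (5th field) is >= CUTOFF.
# Assumption: the column layout matches the table shown in the text.
filter_scan_hits() {
    awk -v cutoff="$1" '$1 == "Scan" && $5 + 0 >= cutoff + 0 { print $3, $5 }'
}

# Example call (scan_summary.txt is a hypothetical file holding the table):
#   filter_scan_hits 7.0 < scan_summary.txt
```

For the table above, a cutoff of 7.0 would list only the five N-terminal domains, whose Sc values lie between 7.967 and 9.800, while the C-terminal domains all score below 3.5. This only reads the summary lines; it does not touch the transformations stored in ac_prot.scan.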

The file ac_prot.scan will contain all 10 domains superimposed onto 1cmsN. Note that we haven't run the program with the `-cut' option, since the file ac_prot.domains contains an assignment of domains (done by me using molecular graphics). Running SORTTRANS removes any redundancies:

sorttrans -f ac_prot.scan -s Sc 2.5 > ac_prot.sorted

and running stamp on the sorted file will then generate the multiple alignment, as described for the serine proteinase and globin examples above:

stamp -l ac_prot.sorted -prefix ac_prot

The output files from running all of these programs appear in the directory examples/ac_prot.
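
The three steps of this example (scan, sort, align) can be chained in a small wrapper. What follows is a sketch under stated assumptions rather than a script shipped with STAMP: the function name scan_then_align and the RUN variable are hypothetical, and the stamp and sorttrans flags are copied unchanged from the commands above. Setting RUN=echo prints the commands instead of executing them.

```shell
# scan_then_align QUERY DOMAINS PREFIX   (hypothetical helper)
# QUERY   : domain file used as the scan query, e.g. 1cmsN.domain
# DOMAINS : list of domains to scan, e.g. ac_prot.domains
# PREFIX  : prefix for output files, e.g. ac_prot
# Set RUN=echo for a dry run that only prints the commands.
scan_then_align() {
    query=$1
    domains=$2
    prefix=$3

    # 1. Scan all domains with the query (flags as given in the text).
    ${RUN:-} stamp -l "$query" -n 2 -s -slide 5 -d "$domains" -prefix "$prefix" || return 1

    # 2. Sort the scan transformations and remove redundancies.
    if [ -n "${RUN:-}" ]; then
        # Dry run: show the redirection as text rather than performing it.
        $RUN "sorttrans -f $prefix.scan -s Sc 2.5 > $prefix.sorted"
    else
        sorttrans -f "$prefix.scan" -s Sc 2.5 > "$prefix.sorted" || return 1
    fi

    # 3. Build the multiple alignment from the sorted transformations.
    ${RUN:-} stamp -l "$prefix.sorted" -prefix "$prefix"
}

# Example call for this section:
#   scan_then_align 1cmsN.domain ac_prot.domains ac_prot
```

Stopping at the first failing step avoids running the final alignment on a missing or incomplete ac_prot.sorted file.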


Geoff Barton
1999-04-16