
Using scans as a starting point for multiple alignment

In some cases an initial fit based on a multiple sequence alignment will be so inaccurate that even an initial conformation-based fit cannot correct the poor starting superimposition, and genuine structural homology will be missed. In these cases the SCAN option can be used to provide a more accurate initial superimposition.

To do this, select one representative of the domains to be superimposed and use it as the query in a sensitive scan of the other domains. Applying the same techniques as used for the scan with the Ig light variable domain (above) yields a set of initial transformations that superimpose each of the other domains onto the query domain.

Aspartic Proteinase Domains

An example of how such an initial superimposition might be obtained is the alignment of the aspartyl proteinase N- and C-terminal lobes (see directory examples/ac_prot).

The N-terminal domain of 1CMS (in the file 1cmsN.domain) can be used to scan a list of all aspartyl proteinase N- and C-terminal domains (ac_prot.domains):

stamp -l 1cmsN.domain -n 2 -s -slide 5 -d ac_prot.domains -prefix ac_prot

This should produce:

        Domain1    Domain2   Fits  Sc       RMS Len1 Len2 Align Fit   Eq. Secs    %I    %S 
Scan      1cmsN      1cmsN    1   9.744   0.000  174  175  175  174  173   18  99.43  94.86 
Scan      1cmsN      1cmsC    1   2.749   2.352  204  175  148   63   58   13   8.00  41.14 
Scan      1cmsN      4apeN    1   7.854   1.314  181  175  178  155  153   15  27.53  80.90 
Scan      1cmsN      4apeC    1   2.901   2.207  208  175  152   69   68   14   7.43  40.57 
Scan      1cmsN      3appN    1   7.574   1.291  182  175  174  148  147   18  26.86  83.43 
Scan      1cmsN      3appC    1   2.775   2.422  205  175  149   61   58   13   7.43  41.71 
Scan      1cmsN      2aprN    1   8.246   1.115  177  175  178  160  157   15  32.02  83.15 
Scan      1cmsN      2aprC    1   2.941   2.099  201  175  147   65   58   14   9.14  39.43 
Scan      1cmsN      4pepN    1   8.901   1.020  173  175  174  169  168   15  56.57  89.71 
Scan      1cmsN      4pepC    1   2.738   2.561  206  175  152   64   52   11   8.00  40.00 
See the file ac_prot.scan
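The summary lines above can be read programmatically to compare scores across domains. The following is a minimal sketch, not part of STAMP: the function name `parse_scan_lines` is hypothetical, and the column positions (Domain2 in the third field, Sc in the fifth, RMS in the sixth) are assumed from the header line shown above. Only the first four summary lines are copied in, for brevity.

```python
# Summary lines copied from the scan output above (first four rows only).
SCAN_OUTPUT = """\
Scan      1cmsN      1cmsN    1   9.744   0.000  174  175  175  174  173   18  99.43  94.86
Scan      1cmsN      1cmsC    1   2.749   2.352  204  175  148   63   58   13   8.00  41.14
Scan      1cmsN      4apeN    1   7.854   1.314  181  175  178  155  153   15  27.53  80.90
Scan      1cmsN      4apeC    1   2.901   2.207  208  175  152   69   68   14   7.43  40.57
"""


def parse_scan_lines(text):
    """Return (domain, Sc, RMS) tuples for every line starting with 'Scan'.

    Assumed field layout (from the header): Scan, Domain1, Domain2,
    Fits, Sc, RMS, ... ; this is an assumption, not a documented format.
    """
    hits = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0] == "Scan":
            hits.append((fields[2], float(fields[4]), float(fields[5])))
    return hits


hits = parse_scan_lines(SCAN_OUTPUT)

# List the scanned domains from highest to lowest Sc.
for domain, sc, rms in sorted(hits, key=lambda h: h[1], reverse=True):
    print(f"{domain:8s} Sc={sc:6.3f} RMS={rms:6.3f}")
```

Sorted this way, the N-terminal lobes (Sc above 7.5) separate clearly from the C-terminal lobes (Sc below 3.0), which is why a sensitive scan is needed to pick up the N/C similarity at all.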

The file ac_prot.scan will contain all 10 domains superimposed onto 1cmsN. Note that we have not run the program with the -cut option, since the file ac_prot.domains already contains an assignment of domains (done by me using molecular graphics). Running SORTTRANS removes any redundancies:

sorttrans -f ac_prot.scan -s Sc 2.5 > ac_prot.sorted

Running STAMP on the sorted file will then generate the multiple alignment, as described for the serine proteinase and globin examples above:

stamp -l ac_prot.sorted -prefix ac_prot

The output files from running all of these programs appear in the directory examples/ac_prot.
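For repeated use, the three steps of this example (scan, sort, align) could be wrapped in a small driver script. The sketch below is only an illustration: the helper names `build_pipeline` and `run_pipeline` are hypothetical, the command-line arguments are copied from the commands in this section, and it assumes that `stamp` and `sorttrans` are on the PATH. It prints the commands without executing them; `run_pipeline` is defined but not called.

```python
import shlex
import subprocess


def build_pipeline(query="1cmsN.domain", domains="ac_prot.domains", prefix="ac_prot"):
    """Build the three command lines used in this example as argument lists."""
    scan = ["stamp", "-l", query, "-n", "2", "-s", "-slide", "5",
            "-d", domains, "-prefix", prefix]
    sort = ["sorttrans", "-f", f"{prefix}.scan", "-s", "Sc", "2.5"]
    align = ["stamp", "-l", f"{prefix}.sorted", "-prefix", prefix]
    return scan, sort, align


def run_pipeline(scan, sort, align, sorted_file):
    """Run the steps in order, stopping at the first failure."""
    subprocess.run(scan, check=True)
    # sorttrans writes to standard output, so redirect it into the sorted file.
    with open(sorted_file, "w") as out:
        subprocess.run(sort, check=True, stdout=out)
    subprocess.run(align, check=True)


scan, sort, align = build_pipeline()

# Dry run: show the commands (the sorttrans redirect is not shown here).
for cmd in (scan, sort, align):
    print(shlex.join(cmd))
```

To execute the steps for real, one would call `run_pipeline(scan, sort, align, "ac_prot.sorted")` from the examples/ac_prot directory.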



Rob Russell and Geoff Barton