Alignment using an initial rough superimposition

This method avoids the need for an initial sequence alignment. It tends to work for homologous proteins, and for proteins of very similar length even when they share no detectable sequence similarity.

Globins

Since the globin sequences are of similar length, an initial superimposition accurate enough for STAMP to proceed can be obtained simply by aligning the N-terminal ends of the sequences and using the resulting residue equivalences to superimpose the structures. This is the ROUGHFIT procedure, requested here with the -rough option. In addition, an initial conformation-based fit is performed so that any inaccuracies in this rough superimposition can be corrected. See the directory examples/globins.

To run STAMP in this example, type:

stamp -l globin.domains -rough -n 2 -prefix globin

This should produce the following on the standard output (ignoring the header):

Running roughfit.

    Sc = STAMP score, RMS = RMS deviation, Align = alignment length
    Len1, Len2 = length of domain, Nfit = residues fitted
    Secs = no. equivalent sec. strucs. Eq = no. equivalent residues
    %I = seq. identity, %S = sec. str. identity
    P(m)  = P value (p=1/10) calculated after Murzin (1993), JMB, 230, 689-694

     No.  Domain1  Domain2  Sc     RMS    Len1 Len2  Align NFit Eq. Secs.   %I   %S    P(m)
Pair   1  2hhbb    2hhba    6.59  10.63    146  141   147  125 125    7  39.04  72.60 1.45e-24 
Pair   2  2hhbb    2lhb     5.72  10.08    146  149   151  120 120    7  20.13  68.46 1.29e-06 
Pair   3  2hhbb    4mbn     6.03   9.93    146  153   155  122 114    7  18.95  66.01 1.32e-06 
Pair   4  2hhbb    1ecd     6.61  10.37    146  136   143  115 109    7  15.07  65.07 6.37e-04 
Pair   5  2hhbb    1lh1     5.62  10.89    146  153   155  106  92    5   9.80  49.02 1.96e-02 
                                     <etc.>
Pair  14  4mbn     1lh1     4.73  10.30    153  153   159   91  77    6  10.46  45.75 2.21e-03 
Pair  15  1ecd     1lh1     5.84  11.38    136  153   149  110 101    6  11.76  57.52 5.94e-03 
Reading in matrix file globin.mat...
Doing cluster analysis...
Cluster: 1 (  2hhba &   4mbn ) Sc 7.65 RMS 10.25 Len 148 nfit 134 
                                     <etc.>
Cluster: 5 (  1lh1 &   2lhb  2hhba   4mbn  2hhbb   1ecd ) Sc 7.63 RMS 10.11 Len 158 nfit 112 
 See file globin.5 for the alignment and transformations

where the output and files are as described for the serine proteinase example above, with `s_prot' replaced with `globin'.

Here, -rough performs the initial superimpositions (ROUGHFIT), and -n 2 requests that the conformation-biased fit be performed before the final fit. This conformation-biased fit is usually necessary when the initial superimpositions are only approximate.

ROUGHFIT will not always work. Note that in this example the pairwise S_c values listed above are all high (the lowest shown is 4.73, for 4mbn against 1lh1), suggesting strong structural similarity. If you find low S_c values when using the ROUGHFIT option (the program will print `LOW SCORE'), this usually means that ROUGHFIT has not managed to generate a good enough starting superimposition, and you should try another approach, such as the one described in the next section.
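When running many comparisons it can help to scan the pairwise table for weak scores yourself. The Python sketch below assumes the whitespace-separated `Pair' line layout shown in the output above (15 fields, with Sc in the fifth); the function names parse_pairs and low_scoring and the cutoff value are hypothetical, chosen only for illustration, and are not STAMP's internal `LOW SCORE' criterion.

```python
def parse_pairs(text):
    """Return one dict per `Pair' line of the pairwise score table."""
    rows = []
    for line in text.splitlines():
        f = line.split()
        # Skip headers, cluster lines and "<etc.>" placeholders.
        if len(f) >= 15 and f[0] == "Pair":
            rows.append({"no": int(f[1]), "dom1": f[2], "dom2": f[3],
                         "sc": float(f[4]), "rms": float(f[5])})
    return rows


def low_scoring(rows, cutoff):
    """Pairs whose Sc falls below a user-chosen (hypothetical) cutoff."""
    return [(r["dom1"], r["dom2"], r["sc"]) for r in rows if r["sc"] < cutoff]


# Two lines copied from the globin output above.
sample = """\
Pair   1  2hhbb    2hhba    6.59  10.63    146  141   147  125 125    7  39.04  72.60 1.45e-24
Pair  14  4mbn     1lh1     4.73  10.30    153  153   159   91  77    6  10.46  45.75 2.21e-03
"""
rows = parse_pairs(sample)
print(low_scoring(rows, 5.0))  # [('4mbn', '1lh1', 4.73)]  (5.0 is an arbitrary cutoff)
```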

Geoff Barton
1999-04-16