
Alignment without an initial multiple alignment using ROUGHFIT

The method described in this section, in which the ROUGHFIT mode is used to create an initial alignment, was the one originally used in STAMP when an initial multiple sequence alignment was not available. The SCAN mode (see the section on aligning protease domains) now makes it possible to create a reasonable starting alignment even when an accurate sequence-based alignment is impossible. Except where the structures are homologous or of very similar length, SCAN mode generally produces better results than ROUGHFIT. Accordingly, ROUGHFIT is deprecated in favour of SCAN mode as a starting point. It is documented here for the sake of completeness.

This method avoids having to create an initial multiple sequence alignment. It tends to work for homologous proteins, and for proteins of very similar length even when they share no sequence similarity.

Globins

Since the globin sequences are of similar length, an initial superimposition accurate enough to proceed with STAMP can be obtained simply by aligning the N-terminal ends of the sequences and fitting on whatever equivalences result. The command ROUGH (ROUGHFIT procedure) is used. In addition, an initial conformation-biased fit is performed so that any inaccuracies in this initial superimposition can be corrected. See the directory examples/globins.
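The idea of aligning the N-terminal ends can be illustrated with a minimal sketch. It is not taken from the STAMP source: the function name nterminal_equivalences is hypothetical, and the sketch only builds the list of residue pairs. The least-squares fit on those pairs, which produces the actual superimposition, is omitted. The lengths 146 and 141 are those STAMP reports for 2hhbb and 2hhba.

```python
def nterminal_equivalences(len_a, len_b):
    """Pair residue i of domain A with residue i of domain B.

    Hypothetical helper for illustration only: both sequences are
    anchored at their N-termini, so the pairing runs until the
    shorter domain is exhausted. Indices are 0-based.
    """
    n = min(len_a, len_b)
    return [(i, i) for i in range(n)]


# Domain lengths reported by STAMP for 2hhbb (146) and 2hhba (141).
pairs = nterminal_equivalences(146, 141)
print(len(pairs), pairs[0], pairs[-1])  # 141 (0, 0) (140, 140)
```

The five C-terminal residues of the longer domain are left unpaired, which is why such a crude pairing is only adequate when the domains are of similar length.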

To run STAMP in this example, type:

stamp -l globin.domains -rough -n 2 -prefix globin

This should produce the following on the standard output (ignoring the header):

STAMP Structural Alignment of Multiple Proteins

Version 4.4 (May 2010)

 by Robert B. Russell & Geoffrey J. Barton 
 Please cite PROTEINS, v14, 309-323, 1992

Running roughfit.

    Sc = STAMP score, RMS = RMS deviation, Align = alignment length
    Len1, Len2 = length of domain, Nfit = residues fitted
    Secs = no. equivalent sec. strucs. Eq = no. equivalent residues
    %I = seq. identity, %S = sec. str. identity
    P(m)  = P value (p=1/10) calculated after Murzin (1993), JMB, 230, 689-694
            (NC = P value not calculated - potential FP overflow)

     No.  Domain1         Domain2         Sc     RMS    Len1 Len2  Align NFit Eq. Secs.   %I   %S   P(m)
Pair   1  2hhbb           2hhba           8.19   1.38    146  141   147  136 135    7  44.44  82.96 4.82e-25 
Pair   2  2hhbb           2lhb            7.11   1.39    146  149   151  127 127    7  26.77  86.61 4.90e-08 
Pair   3  2hhbb           4mbn            8.06   1.38    146  153   151  141 139    8  25.18  87.05 1.57e-07 
Pair   4  2hhbb           1ecd            6.89   2.04    146  136   144  127 119    7  20.17  86.55 3.91e-04 
Pair   5  2hhbb           1lh1            5.89   2.39    146  153   155  120 110    6  17.27  80.91 6.62e-03 
Pair   6  2hhba           2lhb            6.54   1.66    141  149   150  122 119    7  34.45  88.24 3.97e-13 
Pair   7  2hhba           4mbn            7.78   1.39    141  153   148  136 133    8  27.07  87.97 1.51e-08 
Pair   8  2hhba           1ecd            6.61   2.18    141  136   145  124 118    8  17.80  87.29 3.47e-03 
Pair   9  2hhba           1lh1            5.95   2.20    141  153   153  117 106    6  14.15  82.08 4.45e-02 
Pair  10  2lhb            4mbn            7.13   1.23    149  153   149  131 130    8  25.38  90.77 2.82e-07 
Pair  11  2lhb            1ecd            6.42   1.93    149  136   145  124 123    8  18.70  87.80 1.34e-03 
Pair  12  2lhb            1lh1            5.74   2.11    149  153   155  117 105    6  19.05  85.71 2.04e-03 
Pair  13  4mbn            1ecd            7.46   1.64    153  136   145  134 132    8  21.21  87.88 6.21e-05 
Pair  14  4mbn            1lh1            6.76   2.35    153  153   155  135 133    6  17.29  84.21 3.35e-03 
Pair  15  1ecd            1lh1            5.96   2.59    136  153   149  121 114    6  15.79  87.72 1.62e-02 
Reading in matrix file globin.mat...
Doing cluster analysis...
Cluster:  1 (   2hhbb  &    2hhba ) Sc  8.19 RMS   1.38 Len 147 nfit 136 
 See file globin.1 for the alignment and transformations
Cluster:  2 (    4mbn  &    2hhbb    2hhba ) Sc  8.96 RMS   1.31 Len 151 nfit 138 
 See file globin.2 for the alignment and transformations
Cluster:  3 (    1ecd  &     4mbn    2hhbb    2hhba ) Sc  8.35 RMS   1.81 Len 146 nfit 128 
 See file globin.3 for the alignment and transformations
Cluster:  4 (    2lhb  &     1ecd     4mbn    2hhbb    2hhba ) Sc  8.24 RMS   1.23 Len 152 nfit 120 
 See file globin.4 for the alignment and transformations
Cluster:  5 (    1lh1  &     2lhb     1ecd     4mbn    2hhbb    2hhba ) Sc  7.70 RMS   2.46 Len 160 nfit 121 
 See file globin.5 for the alignment and transformations

where the output and files are as described for the serine proteinase example above, with `s_prot' replaced with `globin'.

-rough performs the initial superimpositions (ROUGHFIT), and -n 2 means that the conformation-biased fit is performed before the final fit. This conformation-biased fit is usually necessary when the initial superimpositions are approximate.

ROUGHFIT will not always work. Note that in this example all the pairwise $S_{c}$ values are high (the lowest is 5.74, for Pair 12), suggesting strong structural similarity. If, when using the ROUGHFIT option, you find low $S_{c}$ values (the program flags them with the message `LOW SCORE'), this usually means that ROUGHFIT has not generated a good enough starting superimposition. In that case, try using SCAN mode to generate an initial alignment, as described in the previous section.
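When many domains are compared, it can be convenient to check the pairwise scores with a small script instead of by eye. The following is a minimal sketch that assumes the whitespace-separated `Pair' rows shown in the output above. The function names and the cutoff of 6.0 are hypothetical and chosen only for illustration; the cutoff is not the threshold STAMP itself uses for `LOW SCORE'.

```python
def parse_pair_line(line):
    """Return (domain1, domain2, Sc, RMS) from one 'Pair' row."""
    fields = line.split()
    # Columns: 'Pair', No., Domain1, Domain2, Sc, RMS, ...
    return fields[2], fields[3], float(fields[4]), float(fields[5])


def low_scoring_pairs(lines, threshold):
    """Collect the pairs whose Sc falls below the given threshold."""
    hits = []
    for line in lines:
        if line.startswith("Pair"):
            d1, d2, sc, _rms = parse_pair_line(line)
            if sc < threshold:
                hits.append((d1, d2, sc))
    return hits


# Two rows copied from the globin output above.
sample = [
    "Pair   1  2hhbb           2hhba           8.19   1.38    146  141   147  136 135    7  44.44  82.96 4.82e-25 ",
    "Pair  12  2lhb            1lh1            5.74   2.11    149  153   155  117 105    6  19.05  85.71 2.04e-03 ",
]

THRESHOLD = 6.0  # hypothetical cutoff, NOT a STAMP default
print(low_scoring_pairs(sample, THRESHOLD))  # [('2lhb', '1lh1', 5.74)]
```

In practice the lines would be read from the captured standard output of the stamp run, and the cutoff would be chosen to suit the data.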
