
Multiple alignment using an initial multiple sequence alignment

Mammalian and Bacterial Serine Proteinases


(This example is discussed in Russell & Barton (1992).)

Despite a pronounced functional similarity (a highly conserved catalytic triad), this family of proteins shows little overall sequence similarity. Indeed, sequence alignment methods generally fail to provide an accurate alignment of these protein sequences. In situations like these, STAMP can be used to provide an accurate alignment of protein sequences based on a comparison of 3D structure. This can often reveal regions of weak sequence similarity that are not detectable by sequence comparison alone. The files for this example are in the directory examples/s_prot in the STAMP installation directory.

The procedure in this example is to create a multiple sequence alignment, which is fed into the ALIGNFIT program to create an initial rough multiple structure alignment, which can then be refined by STAMP.
The list of the domains to be aligned is given in the file s_prot.domains. The sequences are extracted from the PDB files by using the domain file with PDBSEQ:

pdbseq -f s_prot.domains > s_prot.seqs

This produces the file s_prot.seqs. This file is used to generate a multiple alignment using AMPS, the alignment being stored in the file s_prot_amps.align. This file is in AMPS format, which is the only format that ALIGNFIT can read. However, alignments in other formats can be converted to AMPS format using Jalview (www.jalview.org) or the ACONVERT program. For example, if the alignment had been in Clustal W format, it could have been converted by running:

aconvert -in c -out b < sprot_clw.aln > s_prot_amps.align

Running:

aconvert -h

will list the command-line arguments that ACONVERT accepts.

Now that we have the multiple alignment, we can run ALIGNFIT on it:

alignfit -f s_prot_amps.align -d s_prot.domains -out s_prot_alignfit.trans

giving the output:

ALIGNFIT R.B. Russell 1995
Reading in block file...
Blocfile read: Length: 261
Reading in coordinate descriptions...
Reading coordinates...
Checking for inconsistencies...
Doing pairwise comparisons...
Doing treewise comparisons...
ALIGNFIT done.
Look in the file s_prot_alignfit.trans for output and details

The final transformation is in the file s_prot_alignfit.trans.

This provides an initial set of transformations for use by STAMP. To run STAMP, type:

stamp -l s_prot_alignfit.trans -prefix s_prot

This should produce the following output on the terminal:

STAMP Structural Alignment of Multiple Proteins

Version 4.4 (May 2010)

 by Robert B. Russell & Geoffrey J. Barton 
 Please cite PROTEINS, v14, 309-323, 1992


    Sc = STAMP score, RMS = RMS deviation, Align = alignment length
    Len1, Len2 = length of domain, Nfit = residues fitted
    Secs = no. equivalent sec. strucs. Eq = no. equivalent residues
    %I = seq. identity, %S = sec. str. identity
    P(m)  = P value (p=1/10) calculated after Murzin (1993), JMB, 230, 689-694
            (NC = P value not calculated - potential FP overflow)

     No.  Domain1         Domain2         Sc     RMS    Len1 Len2  Align NFit Eq. Secs.   %I   %S   P(m)
Pair   1  4chaa           3est            7.74   1.15    239  240   242  217 214   20  41.59  84.58 1.32e-33 
Pair   2  4chaa           2ptn            7.81   0.97    239  223   234  206 203   20  47.78  91.63 8.32e-43 
Pair   3  4chaa           1ton            6.92   1.19    239  227   241  191 189   19  40.21  89.42 8.18e-28 
Pair   4  4chaa           3rp2a           7.46   1.09    239  224   235  203 199   18  37.19  85.43 6.38e-24 
Pair   5  4chaa           2pkaab          7.28   1.14    239  232   241  203 202   20  37.13  90.10 6.62e-25 
Pair   6  4chaa           1sgt            7.09   1.26    239  223   239  197 191   20  36.13  90.05 2.86e-22 
Pair   7  4chaa           2sga            3.64   1.66    239  181   240  109 101   15  28.71  84.16 8.84e-08 
Pair   8  4chaa           3sgbe           3.62   1.56    239  185   240  105  95   15  26.32  85.26 3.48e-06 

<etc.>
Reading in matrix file s_prot.mat...
Doing cluster analysis...
Cluster:  1 (    2ptn  &   2pkaab ) Sc  8.50 RMS   1.07 Len 232 nfit 216
 See file s_prot.1 for the alignment and transformations
Cluster:  2 (    2sga  &    3sgbe ) Sc  8.36 RMS   0.65 Len 191 nfit 166
 See file s_prot.2 for the alignment and transformations
Cluster:  3 (    1ton  &     2ptn   2pkaab ) Sc  9.03 RMS   0.73 Len 239 nfit 205
 See file s_prot.3 for the alignment and transformations
Cluster:  4 (   3rp2a  &     1ton     2ptn   2pkaab ) Sc  8.73 RMS   0.93 Len 242 nfit 206
 See file s_prot.4 for the alignment and transformations
Cluster:  5 (    3est  &    3rp2a     1ton     2ptn   2pkaab ) Sc  8.51 RMS   1.13 Len 258 nfit 208
 See file s_prot.5 for the alignment and transformations
Cluster:  6 (   4chaa  &     3est    3rp2a     1ton     2ptn   2pkaab ) Sc  8.18 RMS   1.01 Len 260 nfit 201
 See file s_prot.6 for the alignment and transformations
Cluster:  7 (    2alp  &     2sga    3sgbe ) Sc  8.35 RMS   1.06 Len 203 nfit 168
 See file s_prot.7 for the alignment and transformations
Cluster:  8 (    1sgt  &    4chaa     3est    3rp2a     1ton     2ptn   2pkaab ) Sc  7.70 RMS   1.11 Len 267 nfit 190
 See file s_prot.8 for the alignment and transformations
Cluster:  9 (    1sgt    4chaa     3est    3rp2a     1ton     2ptn   2pkaab  &     2alp     2sga    3sgbe ) Sc  4.78 RMS   1.82 Len 292 nfit 122
 See file s_prot.9 for the alignment and transformations

The various fields describe details of the pairwise and treewise comparisons: $S_{c}$, RMS deviation, the alignment length (Align), the length of each structure in residues (Len1, Len2), the number of atoms used in the RMS fit (Nfit), the number of equivalent secondary structure elements (Secs), and the number of equivalent residues (see above, Eq.).
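As an illustration of how these fields line up with the columns of the pairwise table, the following is a minimal Python sketch that splits one "Pair" line into named values. It is not part of STAMP: the names PairRow and parse_pair_line are hypothetical, and the sketch assumes that every "Pair" line has exactly the 15 whitespace-separated columns shown in the output above. The P(m) column is kept as text because STAMP may print NC there.

```python
from dataclasses import dataclass


@dataclass
class PairRow:
    """One row of the pairwise table printed by STAMP (hypothetical helper type)."""
    number: int
    domain1: str
    domain2: str
    sc: float            # STAMP score, Sc
    rms: float           # RMS deviation
    len1: int            # length of first domain
    len2: int            # length of second domain
    align: int           # alignment length
    nfit: int            # atoms used in the RMS fit
    eq: int              # equivalent residues
    secs: int            # equivalent secondary structure elements
    pct_identity: float  # %I
    pct_sec_str: float   # %S
    p_m: str             # P(m); left as text since it may be "NC"


def parse_pair_line(line: str) -> PairRow:
    """Split a single 'Pair' line on whitespace (assumed column layout)."""
    f = line.split()
    if len(f) != 15 or f[0] != "Pair":
        raise ValueError("not a 15-column 'Pair' line: %r" % line)
    return PairRow(
        number=int(f[1]), domain1=f[2], domain2=f[3],
        sc=float(f[4]), rms=float(f[5]),
        len1=int(f[6]), len2=int(f[7]), align=int(f[8]),
        nfit=int(f[9]), eq=int(f[10]), secs=int(f[11]),
        pct_identity=float(f[12]), pct_sec_str=float(f[13]),
        p_m=f[14],
    )


if __name__ == "__main__":
    # Two rows copied from the output above.
    sample = [
        "Pair   1  4chaa           3est            7.74   1.15    239  240   242  217 214   20  41.59  84.58 1.32e-33",
        "Pair   7  4chaa           2sga            3.64   1.66    239  181   240  109 101   15  28.71  84.16 8.84e-08",
    ]
    for row in map(parse_pair_line, sample):
        print(row.domain1, row.domain2, row.sc, row.rms, row.nfit)
```

Saving the terminal output to a file and passing each line that starts with "Pair" through parse_pair_line would give a list that can be sorted or filtered, for example by Sc.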

STAMP will also produce several files:

s_prot.mat - a file containing the information used to derive the structural similarity tree (i.e. the output from the PAIRWISE mode). This is an upper diagonal matrix containing the pairwise $S_{c}$ values.

s_prot.$N$ - a series of files containing transformations and alignments created by running the TREEWISE mode in STAMP. Each file corresponds to a node in the similarity tree (i.e. a cluster), where two groups of one or more structures have been combined to form an alignment and transformations. The higher the value of $N$ the more structurally dissimilar the proteins contained in the file are. Highly similar structures are clustered (aligned/superimposed) at an early stage in the program's run, with more distantly related structures being clustered towards the end.
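To see which structures each s_prot.$N$ file combines, the "Cluster:" lines printed on the terminal can be read programmatically. The following Python sketch is an illustration only: parse_cluster_line is a hypothetical helper, and the regular expression assumes the exact line layout shown in the output above (two groups of domain names separated by "&", followed by Sc, RMS, Len and nfit).

```python
import re

# Layout assumed from the terminal output shown earlier in this example.
CLUSTER_RE = re.compile(
    r"Cluster:\s*(\d+)\s*\((.*?)&(.*?)\)\s*"
    r"Sc\s+([\d.]+)\s+RMS\s+([\d.]+)\s+Len\s+(\d+)\s+nfit\s+(\d+)"
)


def parse_cluster_line(line):
    """Return a dict describing one 'Cluster:' line, or None if it does not match."""
    m = CLUSTER_RE.search(line)
    if m is None:
        return None
    return {
        "node": int(m.group(1)),        # N in s_prot.N
        "left": m.group(2).split(),     # domains in the first group
        "right": m.group(3).split(),    # domains in the second group
        "sc": float(m.group(4)),
        "rms": float(m.group(5)),
        "len": int(m.group(6)),
        "nfit": int(m.group(7)),
    }


if __name__ == "__main__":
    # Two lines copied from the output above.
    lines = [
        "Cluster:  1 (    2ptn  &   2pkaab ) Sc  8.50 RMS   1.07 Len 232 nfit 216",
        "Cluster:  9 (    1sgt    4chaa     3est    3rp2a     1ton     2ptn   2pkaab  &     2alp     2sga    3sgbe ) Sc  4.78 RMS   1.82 Len 292 nfit 122",
    ]
    for line in lines:
        c = parse_cluster_line(line)
        members = c["left"] + c["right"]
        print("s_prot.%d: %d structures, Sc %.2f" % (c["node"], len(members), c["sc"]))
```

For the output in this example, the last node (s_prot.9) joins the seven structures of the first group with the three of the second, so it is the file that contains all ten domains.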

The top of each s_prot.$N$ file contains the information needed to generate superimposed coordinates using TRANSFORM. For example, running:

transform -f s_prot.9 -g -o s_prot.pdb

will create a PDB file containing all of the structures from the alignment in s_prot.9.

Following the transformations at the top of the file, a summary of the similarity (RMS deviation, $S_{c}$ value, etc.) is given. The bottom portion of the file contains the structural alignment in STAMP format. Positions that do not include gaps contain information about the degree of local structural similarity, such as the distance between (averaged) ${\rm C}_{\alpha}$ atoms, and the $P_{ij}^{\prime}$ value.
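For convenience, the commands used in this example can be collected into one script. The following Python sketch simply replays the command lines given above, with the same arguments; the function names example_steps and run_steps are hypothetical. It assumes that the STAMP programs are on the PATH, that it is run from the examples/s_prot directory, and that s_prot_amps.align already exists, since the AMPS step is not shown here. By default it only prints the commands (a dry run).

```python
import subprocess


def example_steps():
    """Commands from this worked example as (argv, stdout_file) pairs."""
    return [
        # Extract sequences; the original command redirects stdout to s_prot.seqs.
        (["pdbseq", "-f", "s_prot.domains"], "s_prot.seqs"),
        # Initial rough superposition from the AMPS-format sequence alignment.
        (["alignfit", "-f", "s_prot_amps.align", "-d", "s_prot.domains",
          "-out", "s_prot_alignfit.trans"], None),
        # Refine with STAMP, writing s_prot.mat and the s_prot.N files.
        (["stamp", "-l", "s_prot_alignfit.trans", "-prefix", "s_prot"], None),
        # Write superimposed coordinates for the final cluster.
        (["transform", "-f", "s_prot.9", "-g", "-o", "s_prot.pdb"], None),
    ]


def run_steps(steps, dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    shown = []
    for argv, stdout_file in steps:
        text = " ".join(argv)
        if stdout_file:
            text += " > " + stdout_file
        print(text)
        shown.append(text)
        if dry_run:
            continue
        if stdout_file:
            with open(stdout_file, "w") as out:
                subprocess.run(argv, stdout=out, check=True)
        else:
            subprocess.run(argv, check=True)
    return shown


if __name__ == "__main__":
    run_steps(example_steps(), dry_run=True)
```

Calling run_steps(example_steps(), dry_run=False) would execute the steps in order and stop at the first program that exits with a non-zero status.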

Methods for displaying sequence alignments and structures are described below.

