Mammalian and Bacterial Serine Proteinases
(This example is discussed in Russell & Barton (1992).)
Despite a pronounced functional similarity (a highly conserved catalytic
triad), this family of proteins shows little overall sequence similarity.
Indeed, sequence alignment methods generally fail to provide an accurate
alignment of these protein sequences. In situations like these, STAMP
can be used to provide an accurate alignment of protein sequences based
on a comparison of 3D structure. This can often reveal regions of weak
sequence similarity that are not detectable by sequence comparison
alone. The files for this example are in the directory examples/s_prot in the
STAMP installation directory.
The procedure in this example is to create a multiple sequence alignment, which
is fed into the ALIGNFIT program to create an initial rough multiple
structure alignment that can then be refined by STAMP.
The list of the domains to be aligned is given in the file s_prot.domains.
The sequences are extracted from the PDB files by using the domain file
with PDBSEQ:
pdbseq -f s_prot.domains > s_prot.seqs
This produces the file s_prot.seqs, which is used to generate a multiple alignment using AMPS. The alignment is stored in the file s_prot_amps.align, which is in AMPS format, the only format that ALIGNFIT can read. However, alignments in other formats can be converted to AMPS format using Jalview (www.jalview.org) or the ACONVERT program. For example, if the alignment had been in Clustal W format it could have been converted by running:
aconvert -in c -out b < sprot_clw.aln > s_prot_amps.align
Running:
aconvert -h
will list the command-line arguments that ACONVERT accepts.
Now that we have the multiple alignment, we can run ALIGNFIT on it:
alignfit -f s_prot_amps.align -d s_prot.domains -out s_prot_alignfit.trans
giving the output:
ALIGNFIT R.B. Russell 1995
Reading in block file...
Blocfile read: Length: 261
Reading in coordinate descriptions...
Reading coordinates...
Checking for inconsistencies...
Doing pairwise comparisons...
Doing treewise comparisons...
ALIGNFIT done.
Look in the file s_prot_alignfit.trans for output and details
The final transformation is in the file s_prot_alignfit.trans.
This provides an initial set of transformations for use by
STAMP. To run STAMP type:
stamp -l s_prot_alignfit.trans -prefix s_prot
This should produce the following output on the terminal:
STAMP Structural Alignment of Multiple Proteins
Version 4.4 (May 2010)
by Robert B. Russell & Geoffrey J. Barton
Please cite PROTEINS, v14, 309-323, 1992

Sc = STAMP score, RMS = RMS deviation, Align = alignment length
Len1, Len2 = length of domain, Nfit = residues fitted
Secs = no. equivalent sec. strucs. Eq = no. equivalent residues
%I = seq. identity, %S = sec. str. identity
P(m) = P value (p=1/10) calculated after Murzin (1993), JMB, 230, 689-694
(NC = P value not calculated - potential FP overflow)

     No. Domain1 Domain2   Sc  RMS Len1 Len2 Align NFit  Eq. Secs.    %I    %S     P(m)
Pair   1 4chaa   3est    7.74 1.15  239  240   242  217  214    20 41.59 84.58 1.32e-33
Pair   2 4chaa   2ptn    7.81 0.97  239  223   234  206  203    20 47.78 91.63 8.32e-43
Pair   3 4chaa   1ton    6.92 1.19  239  227   241  191  189    19 40.21 89.42 8.18e-28
Pair   4 4chaa   3rp2a   7.46 1.09  239  224   235  203  199    18 37.19 85.43 6.38e-24
Pair   5 4chaa   2pkaab  7.28 1.14  239  232   241  203  202    20 37.13 90.10 6.62e-25
Pair   6 4chaa   1sgt    7.09 1.26  239  223   239  197  191    20 36.13 90.05 2.86e-22
Pair   7 4chaa   2sga    3.64 1.66  239  181   240  109  101    15 28.71 84.16 8.84e-08
Pair   8 4chaa   3sgbe   3.62 1.56  239  185   240  105   95    15 26.32 85.26 3.48e-06
<etc.>

Reading in matrix file s_prot.mat...
Doing cluster analysis...

Cluster: 1 ( 2ptn & 2pkaab ) Sc 8.50 RMS 1.07 Len 232 nfit 216
See file s_prot.1 for the alignment and transformations
Cluster: 2 ( 2sga & 3sgbe ) Sc 8.36 RMS 0.65 Len 191 nfit 166
See file s_prot.2 for the alignment and transformations
Cluster: 3 ( 1ton & 2ptn 2pkaab ) Sc 9.03 RMS 0.73 Len 239 nfit 205
See file s_prot.3 for the alignment and transformations
Cluster: 4 ( 3rp2a & 1ton 2ptn 2pkaab ) Sc 8.73 RMS 0.93 Len 242 nfit 206
See file s_prot.4 for the alignment and transformations
Cluster: 5 ( 3est & 3rp2a 1ton 2ptn 2pkaab ) Sc 8.51 RMS 1.13 Len 258 nfit 208
See file s_prot.5 for the alignment and transformations
Cluster: 6 ( 4chaa & 3est 3rp2a 1ton 2ptn 2pkaab ) Sc 8.18 RMS 1.01 Len 260 nfit 201
See file s_prot.6 for the alignment and transformations
Cluster: 7 ( 2alp & 2sga 3sgbe ) Sc 8.35 RMS 1.06 Len 203 nfit 168
See file s_prot.7 for the alignment and transformations
Cluster: 8 ( 1sgt & 4chaa 3est 3rp2a 1ton 2ptn 2pkaab ) Sc 7.70 RMS 1.11 Len 267 nfit 190
See file s_prot.8 for the alignment and transformations
Cluster: 9 ( 1sgt 4chaa 3est 3rp2a 1ton 2ptn 2pkaab & 2alp 2sga 3sgbe ) Sc 4.78 RMS 1.82 Len 292 nfit 122
See file s_prot.9 for the alignment and transformations
The various fields describe details of the pairwise and treewise
comparisons: the STAMP score (Sc), the RMS deviation (RMS), the alignment
length (Align), the length of each structure in residues (Len1, Len2), the
number of atoms used in the RMS fit (Nfit), the number of equivalent
secondary structure elements (Secs), and the number of equivalent residues
(see above, Eq.).
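For further processing, the pairwise rows can be read into named fields. The following Python sketch is one way to do this, assuming each row is whitespace-separated exactly as printed above (the word Pair, a number, two domain identifiers, then the eleven columns in header order). The names PairRow and parse_pair_line are hypothetical and not part of STAMP; the P(m) column is kept as text because it may read NC.

```python
from typing import NamedTuple


class PairRow(NamedTuple):
    """One 'Pair' row of the STAMP pairwise table (hypothetical helper type)."""
    number: int
    domain1: str
    domain2: str
    sc: float      # STAMP score
    rms: float     # RMS deviation
    len1: int      # length of domain 1
    len2: int      # length of domain 2
    align: int     # alignment length
    nfit: int      # atoms used in the RMS fit
    eq: int        # equivalent residues
    secs: int      # equivalent secondary structure elements
    pct_i: float   # % sequence identity
    pct_s: float   # % secondary structure identity
    p_m: str       # P value as printed; may be "NC"


def parse_pair_line(line: str) -> PairRow:
    """Split one row on whitespace; assumes the 15-token layout shown in the text."""
    parts = line.split()
    if len(parts) != 15 or parts[0] != "Pair":
        raise ValueError(f"not a pairwise row: {line!r}")
    return PairRow(
        int(parts[1]), parts[2], parts[3],
        float(parts[4]), float(parts[5]),
        int(parts[6]), int(parts[7]), int(parts[8]),
        int(parts[9]), int(parts[10]), int(parts[11]),
        float(parts[12]), float(parts[13]), parts[14],
    )


if __name__ == "__main__":
    row = parse_pair_line(
        "Pair 7 4chaa 2sga 3.64 1.66 239 181 240 109 101 15 28.71 84.16 8.84e-08"
    )
    print(row.domain1, row.domain2, row.sc, row.nfit)
```

If a real run prints the columns with different spacing or extra fields, the length check will reject the row rather than silently misassign values.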
STAMP will also produce several files:
s_prot.mat - a file containing the information used to derive the
structural similarity tree (i.e. the output from the PAIRWISE mode). This
is an upper diagonal matrix containing the pairwise values.
s_prot.N - a series of files (s_prot.1 to s_prot.9 in this example)
containing transformations and alignments created by running the TREEWISE
mode in STAMP. Each file corresponds to a node in the similarity tree
(i.e. a cluster), where two groups of one or more structures have been
combined to form an alignment and transformations. The higher the value of
N, the more structurally dissimilar the proteins contained in the file are.
Highly similar structures are clustered (aligned/superimposed) at an early
stage in the program's run, with more distantly related structures being
clustered towards the end.
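The order in which structures are merged can be recovered from the "Cluster:" lines of the terminal output. The Python sketch below is an illustration only: it assumes the line layout shown in the output above (two groups of domain identifiers separated by &, followed by Sc, RMS, Len and nfit), and the names Cluster and parse_clusters are hypothetical rather than part of STAMP.

```python
import re
from typing import List, NamedTuple

# Pattern for lines such as:
#   Cluster: 3 ( 1ton & 2ptn 2pkaab ) Sc 9.03 RMS 0.73 Len 239 nfit 205
CLUSTER_RE = re.compile(
    r"Cluster:\s*(\d+)\s*\(\s*(.*?)\s*&\s*(.*?)\s*\)\s*"
    r"Sc\s+([\d.]+)\s+RMS\s+([\d.]+)\s+Len\s+(\d+)\s+nfit\s+(\d+)"
)


class Cluster(NamedTuple):
    number: int        # N, as in the file name <prefix>.N
    left: List[str]    # domains in the first group
    right: List[str]   # domains in the second group
    sc: float
    rms: float
    length: int
    nfit: int


def parse_clusters(text: str) -> List[Cluster]:
    """Collect every cluster record found in the captured STAMP output."""
    return [
        Cluster(
            int(m.group(1)), m.group(2).split(), m.group(3).split(),
            float(m.group(4)), float(m.group(5)),
            int(m.group(6)), int(m.group(7)),
        )
        for m in CLUSTER_RE.finditer(text)
    ]


if __name__ == "__main__":
    sample = (
        "Cluster: 1 ( 2ptn & 2pkaab ) Sc 8.50 RMS 1.07 Len 232 nfit 216\n"
        "Cluster: 9 ( 1sgt 4chaa 3est 3rp2a 1ton 2ptn 2pkaab & 2alp 2sga 3sgbe ) "
        "Sc 4.78 RMS 1.82 Len 292 nfit 122\n"
    )
    for c in parse_clusters(sample):
        members = len(c.left) + len(c.right)
        print(f"s_prot.{c.number}: {members} structures, Sc {c.sc}, RMS {c.rms}")
```

In this example the last cluster combines all ten structures and has the lowest Sc and highest RMS, which matches the description of distantly related groups being joined last.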
The top of each s_prot.N file contains the information needed to generate
superimposed coordinates using TRANSFORM. For example, running:
transform -f s_prot.9 -g -o s_prot.pdb
will create a PDB file, s_prot.pdb, containing all of the structures from the alignment in s_prot.9.
After these details, various measures of the similarity (RMS deviation,
Sc value, etc.) are given. The bottom portion of the file contains
the structural alignment in STAMP format. Positions that do not include
gaps contain information as to the degree of local structural similarity,
such as the distance between (averaged) C-alpha atoms, and the Pij value.
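The commands used in this example can be collected in one place for scripting. The Python sketch below only assembles the command lines exactly as they appear in this section and runs a step if its program is found on the PATH. The function names build_commands and run_step are hypothetical; the AMPS alignment step is left out because no command line is given for it here, so s_prot_amps.align must already exist before ALIGNFIT is run.

```python
import shutil
import subprocess
from typing import Dict, List, Optional


def build_commands(name: str = "s_prot", last_cluster: int = 9) -> Dict[str, List[str]]:
    """Return the argument lists for the steps shown in this example."""
    return {
        # Output of pdbseq goes to stdout; redirect it to <name>.seqs.
        "pdbseq": ["pdbseq", "-f", f"{name}.domains"],
        "alignfit": ["alignfit", "-f", f"{name}_amps.align",
                     "-d", f"{name}.domains",
                     "-out", f"{name}_alignfit.trans"],
        "stamp": ["stamp", "-l", f"{name}_alignfit.trans", "-prefix", name],
        "transform": ["transform", "-f", f"{name}.{last_cluster}",
                      "-g", "-o", f"{name}.pdb"],
    }


def run_step(argv: List[str], stdout_path: Optional[str] = None) -> bool:
    """Run one step; return False without running if the program is not on PATH."""
    if shutil.which(argv[0]) is None:
        print("skipping (not on PATH):", " ".join(argv))
        return False
    if stdout_path is None:
        subprocess.run(argv, check=True)
    else:
        with open(stdout_path, "w") as out:
            subprocess.run(argv, check=True, stdout=out)
    return True


if __name__ == "__main__":
    # Print the commands only; nothing is executed here.
    for step, argv in build_commands().items():
        print(f"{step}: {' '.join(argv)}")
```

A typical use would be run_step(cmds["pdbseq"], stdout_path="s_prot.seqs"), then the AMPS alignment by hand, then the alignfit, stamp and transform steps in that order.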
Methods for displaying sequence alignments and structures are described below.