AMAS - Analysis of Multiply Aligned Sequences
Copyright: C. D. Livingstone and G. J. Barton (c) 1992,1997
C.D.Livingstone* & G.J.Barton
Contact: Geoff Barton: geoff@compbio.dundee.ac.uk
Please cite: Livingstone C.D. and Barton G.J. (1993)
Protein Sequence Alignments: A Strategy for the Hierarchical
Analysis of Residue Conservation
CABIOS Vol. 9 No. 6 (745-756)
***************************************
* AMAS User Guide * January 15th 1993 *
***************************************
Contents:
The AMAS Commandline.
A Brief Description.
Setting Up AMAS.
Using AMAS.
Related Programs.
System Availability.
Conclusion.
Update History.
Appendices.
Files Included.
The AMAS Commandline:
--------------------
Requirements:
AMAS requires the following in order to proceed...
(see later text for details)
Alignment blocfile: A multiple sequence alignment in AMPS blocfile vertical
format (<alignment>). See appendices for description.
Property index: A table identifying each amino acid in the blocfile
in terms of its possession or lack of one or more
properties in binary format (<ptable>). See appendices
for description.
Sensible groups file: File defining sub-groups of aligned sequences within
the main alignment (<sgfile>). See appendices for
description.
Conservation threshold: A number describing, in terms of the property index,
how similar a set of amino acids must be before it is
considered to show conservation. See later text for a
full explanation.
Options:
These may be included if required:
Fontsize: Size in units of points (0.0148 inches) of the largest
typeface in the alignment (determines the dimensions of
the "pretty" output format).
File rootname: Name by which all of the output files will be called,
ie. the rootname "file" would lead to the production of
a file series "file.blc", "file.soa", "file.sum" and
"file.ps" in standard AMAS usage.
Options: A number of options exist to modify the standard output
of the AMAS program. See appendices for description.
If no options are set a ``-'' must be substituted in
this field.
The command line takes the form:
amas [options:a*cdefg*h*imnoprstuv] <alignment> <ptable> <sgfile> <cons> [size]
[file rootname]
(an asterisk marks an option that is followed by an integer; see Appendix 2)
***For EXAMPLE: ***
1. Change directory to folder "first.light"
2. Enter the command line:
amas sf seq.aln intra.pt groups.sg 8 10 amasout
3. Output will be sent to the files:
amasout.blc amasout.soa amasout.sum amasout.ps
A number of further example command lines for the analysis of included alignments
are given in Appendix 3 with full explanations of their results. See below for
further explanation.
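The command line form above can be sketched in Python as follows. This is only
an illustration: the helper names amas_argv and standard_outputs are
hypothetical and not part of AMAS, and the check that a rootname needs the f
option is an assumption drawn from the description of f in Appendix 2.

```python
def amas_argv(options, alignment, ptable, sgfile, cons, size=None, rootname=None):
    """Assemble an AMAS argument list in the order given by the synopsis."""
    # A '-' stands in for an empty option string, as the guide requires.
    argv = ["amas", options or "-", alignment, ptable, sgfile, str(cons)]
    if size is not None:
        argv.append(str(size))
    if rootname is not None:
        # Assumption: a rootname is only honoured when the f option is set.
        if "f" not in (options or ""):
            raise ValueError("a file rootname needs the f option")
        argv.append(rootname)
    return argv


def standard_outputs(rootname):
    """Names of the four files produced in standard AMAS usage."""
    return [rootname + ext for ext in (".blc", ".soa", ".sum", ".ps")]


if __name__ == "__main__":
    print(" ".join(amas_argv("sf", "seq.aln", "intra.pt", "groups.sg", 8, 10, "amasout")))
    print(standard_outputs("amasout"))
```

The list returned by amas_argv could be handed to subprocess.run if AMAS is on
the search path.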
A Brief Description
-------------------
AMAS is a program which performs a systematic characterisation of the physico-
chemical properties seen at each position in a multiple protein sequence
alignment. A flexible set-based description of amino
acid properties is used to define the conservation between any group
of amino acids. Sequences in the alignment are gathered into
sub-groups on the basis of sequence similarity, functional similarity,
or other criteria. All pairs of sub-groups are then compared to
highlight positions that confer the unique features of each sub-group.
Graphical output using the Alscript program (Barton, 1993) is available for
rapid interpretation, while a more complex and detailed text summary is
available for more abstruse queries.
Setting Up AMAS:
---------------
Most of the installation process is straightforward. Choose a location for
the distributed folders and put them there! A makefile is included for Sun
Microsystems and Silicon Graphics machines: enter the c folder and use make.
A descrip.mms is shipped with the VMS version. Enter the [.amas.c] directory
and type "mms".
Unix:
++++
Two environment variables must be defined before using AMAS (three from
version 1.65 onwards; see UPDATES below). One points to the location of the
default AMAS.DFS defaults file and the other points to the location of the
directory containing the property indices, ie.
setenv AMASDEFAULTS ~cdl/c/amas/distribute/AMAS.DFS
setenv AMASPTYPE ~cdl/c/amas/distribute/ptype
You could put these in your .cshrc. Change the environment variables to point
to your own property table directories/defaults files as necessary.
UPDATES:
=======
Version 1.65 and greater:
The AMAS sequence analysis program now requires 3 environment variables to be
set before use. Suggested values:
AMASDEFAULTS /home/orac/cdl/c/amas/distribute/AMAS.DFS
AMASPTYPE /home/orac/cdl/c/amas/distribute/ptype
ALSCRIPTCOMMAND "nice -19 alscript -f" (new in this version)
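A quick check that the three variables are set can save a failed run. The
sketch below is a hypothetical helper, not something shipped with AMAS; it
only tests that each variable is present and non-empty, not that the paths
exist.

```python
import os

# The three variables named in this guide for version 1.65 and greater.
REQUIRED = ("AMASDEFAULTS", "AMASPTYPE", "ALSCRIPTCOMMAND")


def missing_amas_variables(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED if not env.get(name)]


if __name__ == "__main__":
    missing = missing_amas_variables()
    if missing:
        print("Please set: " + ", ".join(missing))
    else:
        print("AMAS environment looks complete")
```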
VAX/VMS(Alpha):
++++++++++++++
Include equivalents for the following lines in your login.com:
$! set up amas
$!
$ amas :== $ $c:[craig.amas.c]amas
$ AMASPTYPE == "$c:[craig.amas.ptype]"
$ AMASDEFAULTS == "$c:[craig.amas]amas.dfs"
$!
$!
$! set up alscript
$!
$ass $b:[geoff.alscript] alsdir
$alscript :== "run alsdir:alscript.exe"
$msf2blc :== "run alsdir:msf2blc.exe"
$clus2blc :== "run alsdir:clus2blc.exe"
$alsnum == "$lmb$dua3:[geoff.alscript]alsnum.exe"
$!
$!
Using AMAS:
----------
The first requirement of AMAS is a sequence alignment in vertical format (see
Appendix 1.1). This type of alignment may be created by the AMPS package of
Barton (1990) or existing alignments in other standard formats may be converted
using one of Alscript's accompanying programs (Barton, 1993).
The starting point for hierarchical conservation analysis is the
identification of two or more sub-sets of sequences within a multiple
sequence alignment. The subsets may be defined by grouping on the basis of
overall sequence similarity, by functional similarity, origin, or other
criteria. Given such groupings, the aim is to highlight which residue
positions define the unique properties of each group.
*******************************************************************************
A: [Dendrogram from the cluster analysis of sequences 1-10, drawn against a
   percentage identity (%ID) scale marked at 25, 50, 75 and 100. The tree
   separates the sequences into the groups A, B and C listed in panel B.]

B: Group  Seq  Fragment
     A      1   GTLYSILDIQ
     A      2   GTAYPVMEVY
     A      3   GTAYLILDLA
     A      4   GTTYFILDLR
     B      5   GTDTCVMELE
     B      6   GGDTCCLDLA
     B      7   GVDSVIMEFL
     B      8   GGDSCLIDMS
     C      9   GTQYGLKRFV
     C     10   GTQFAIMQP
FIGURE 1
********
*******************************************************************************
For example, Figure 1 shows a fragment of a multiple sequence alignment of 10
sequences (B) in which the percentage identity between the sequences has been
determined by single linkage cluster analysis, the results of which are
displayed using an unrooted dendrogram (A). This analysis clearly shows the
sequences to be split into (2 or) 3 clusters or sub-groups at around 30%
identity. If this grouping is consistent with other observations about the
family of proteins this would be a suitable sub-grouping scheme for an AMAS
analysis. Many sequence alignment packages (including the AMPS program
ORDER) will perform this type of analysis. Once identified, the groups of
sequences are recorded in an "sg" file for use by the AMAS program (see
appendix 1.2).
A property table appropriate to the alignment and to the question being posed
is then chosen. Property tables may be easily redefined by the user to
answer specific questions (see appendix 1.3).
An appropriate conservation threshold is chosen for the analysis.
Useful values for the 10-property matrices intra.pt and extra.pt (included)
lie between 6 and 8 depending on the overall similarity of all of the
sequences in the alignment. If the overall sequence similarity is low, a
less stringent threshold is used, eg. 6 (six or more shared properties out of
10 define a set of amino acids as conserved). With similar sequences a more
stringent value, eg. 8, is chosen.
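To make the threshold concrete, here is a minimal Python sketch of counting
shared properties for the residues seen at one alignment position. It assumes
that a property counts as shared when every residue has the same state for it
(all possess it or all lack it), which matches the "possession or lack"
wording of this guide but is not taken from the AMAS source. The matrix rows
are copied from Figure 1.3a; the function names are hypothetical.

```python
# Residue codes and property rows as printed in Figure 1.3a.
CODES = "ILVCAGMFYWHKREQDNSTP BZX"
ROWS = {
    "Hydrophobic": "111111111111000000101001",
    "Polar":       "000000001111111111101111",
    "Small":       "001111000000000111111001",
    "Proline":     "000000000000000000011001",
    "Positive":    "000000000011100000001001",
    "Negative":    "000000000000010100001001",
    "Charged":     "000000000011110100001001",
    "Tiny":        "000011000000000001001001",
    "Aliphatic":   "111000000000000000001001",
    "Aromatic":    "000000011110000000001001",
}


def conservation(residues, positive_only=False):
    """Count properties on which all residues agree.

    With positive_only=True only properties that every residue possesses are
    counted (an assumed reading of the n option).
    """
    shared = 0
    for bits in ROWS.values():
        states = {bits[CODES.index(r)] for r in residues}
        if len(states) == 1 and (not positive_only or states == {"1"}):
            shared += 1
    return shared


def is_conserved(residues, threshold):
    """A set of residues is conserved if it shares at least threshold properties."""
    return conservation(residues) >= threshold


if __name__ == "__main__":
    for column in ("ILV", "KD"):
        print(column, conservation(column), is_conserved(column, 8))
```

Under these assumptions the set I, L, V shares 9 of the 10 properties, while
K and D share 6, so K/D passes a threshold of 6 but fails a threshold of 8.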
A pointsize (0.0148 inches = 1 point) must be defined for the default height
of each character in the Alscript output (10 points is about the height of
text in a paperback novel). Alscript output is selected by default but may
be turned-off by using the t(T) option.
With these criteria satisfied, a simple command line can be constructed in
order to execute AMAS:
amas - seq.aln intra.pt groups.sg 6 10     (example 1a)
or
amas t seq.aln intra.pt groups.sg 6        (example 1b)
These would perform an analysis of the sequence alignment ``seq.aln'' on the
basis of the property index ``intra.pt'' using the subgroupings of the
sequences in the alignment defined in ``groups.sg'' with a conservation
threshold of 6. The - indicates no additional program options have been set.
The files to perform this analysis are included. The output files will take
the names seq.aln... eg.
seq.aln.soa command file for program Alscript
seq.aln.blc new alignment blocfile containing sequences in new, sub-grouped
order
seq.aln.sum text summary of conservation in the alignment
seq.aln.ps Alscript output (PostScript file - can be printed directly on a
PostScript printer or previewed with an appropriate PostScript
previewer, eg. Sun Microsystems' pageview software)
AMAS launches Alscript automatically, if installed, producing the .ps file
from the .soa file. The .soa file may be edited by experienced
Alscript users in order to introduce extra information. The .blc file is
required by the .soa file for operation. The .sum file contains a summary of
conservation as shown in the output of example 1 (see example.sum for
comparison, containing the results of the analysis in Appendix 3, example 9).
The summary file records the sequences in each sub-group, the number of gaps
and atypical residues ignored in an analysis, if any (see below), and the
conservation threshold set by the user. The file lists all positions which
show identities across all the sequences from the alignment included in the
analysis, all pairs of subgroups showing identities and all sub-groups
containing only one type of residue. Positions conserving properties across
all analysed sequences are detailed next, showing the properties conserved in
all subgroups and the positively conserved properties which differ between
subgroups. The percentage of sequences in each subgroup possessing a
"different" property is reported. Conservation and difference between pairs
of conserved subgroups at a position is reported in a similar manner.
Conserved pairs share at least the number of properties defined by the
threshold. Different pairs are pairs in which each subgroup is itself
conserved, but the two subgroups together share fewer properties than the
threshold. Positions where individual
groups conserve properties are reported next, followed by a list of
unconserved groups and completely unconserved positions (by the threshold
criterion).
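The distinction between conserved (similar) and different pairs can be
sketched in Python. This is an assumed reading of the summary description
above, not the AMAS implementation: each subgroup is scored on its own, and
the pair is scored by pooling the residues of both subgroups. The matrix rows
are those of Figure 1.3a and the function names are hypothetical.

```python
CODES = "ILVCAGMFYWHKREQDNSTP BZX"
MATRIX = [
    "111111111111000000101001",  # Hydrophobic
    "000000001111111111101111",  # Polar
    "001111000000000111111001",  # Small
    "000000000000000000011001",  # Proline
    "000000000011100000001001",  # Positive
    "000000000000010100001001",  # Negative
    "000000000011110100001001",  # Charged
    "000011000000000001001001",  # Tiny
    "111000000000000000001001",  # Aliphatic
    "000000011110000000001001",  # Aromatic
]


def shared(residues):
    """Number of properties on which every residue has the same state."""
    return sum(
        1 for bits in MATRIX
        if len({bits[CODES.index(r)] for r in residues}) == 1
    )


def classify_pair(group_a, group_b, threshold):
    """Return 'similar', 'different', or None if either subgroup is unconserved."""
    if shared(group_a) < threshold or shared(group_b) < threshold:
        return None
    pooled = shared(group_a + group_b)
    return "similar" if pooled >= threshold else "different"


if __name__ == "__main__":
    print(classify_pair("ILV", "LM", 8))
    print(classify_pair("ILV", "DE", 8))
    print(classify_pair("ILV", "KD", 8))
```

Here each argument is the string of residues that one subgroup shows at a
single alignment position.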
The Alscript output is very flexible. Output is in the form of a PostScript
file which may be printed on any common PostScript printer or visualised using
a PostScript previewer (eg. PageView V3, Sun Microsystems). A well formatted
copy of the sequences analysed in the program is shown in the form of an
alignment split into the groups defined by the "sg" file. By default, each
sequence position is presented in a font which indicates whether it is
identical across all analysed sequences (boldface white on black), an identity
in one group (Boldface), conserved in one group (plain), is unconserved in one
group (smaller italics) or is unconserved across all sequences (smaller,
different font). Conserved and identical regions within subgroups are boxed
by the program. These distinctions may be made more clear using shading or
colouring schemes.
A table may be presented at the foot of the alignment detailing the degree of
conservation between pairs of conserved sequences, the upper section showing
similarities, and the lower showing differences, or this information may be
presented in the form of a histogram (see the ``h'' option described in
Appx. 2). The title of the alignment is, by default, the title of the
original alignment file, or the file rootname supplied when using the ``f''
option (Appx. 2). The fonts, colours, shading levels and the title at the
base of the alignment may be changed by editing the AMAS.DEFAULTS file included
with the program.
The histogram of conservation has been found to coincide well with observed
secondary structure and may have predictive value (see SH2_analysis.aln series
included).
Related Programs:
----------------
The AMPS package (Barton, 1990). This performs multiple sequence
alignments and databank scanning. The package is currently only
available for Sun, Silicon Graphics and VAX/VMS systems. Contact the
author for details.
ALSCRIPT (Barton, 1993) allows shading, boxing and colouring to be applied to
an alignment in AMPS format. Routines are provided for conversion of other
alignment formats. In response to a command file containing a set of
formatting commands, ALSCRIPT produces a PostScript file which may be printed
on a PostScript laser printer or viewed using a PostScript previewer
(e.g. Sun Microsystems' PageView program). Alscript is NOT a multiple
sequence alignment program, nor is it an alignment editor.
System Availability:
-------------------
AMAS has been used on both Silicon Graphics Indigo and Sun SPARCstation
platforms under the UNIX operating system. A Digital ALPHA/VMS version is now
available and an IBM PC compatible version is under preparation.
AMAS, Alscript and AMPS are freely available to the academic community from the
authors under a (fairly) standard licensing agreement. Enquiries from
prospective industrial users are welcomed.
Conclusion
----------
Whilst all the features identified by AMAS can be
found by inspection of the alignment, the process is laborious
and error-prone. The strategy described here reduces the scope for
error, allows alternative sub-groupings to be rapidly investigated, and
provides structurally relevant shading and boxing.
For additional information on the method and the ideas behind it please see
Livingstone and Barton 1993.
Update History:
--------------
Version 1.0 - 30th June 1992
Version 1.1 - 3rd July 1992
Version 1.2 - 29th July 1992
Version 1.3 - 27th Oct 1992
Version 1.4 - 13th Jan 1993: Many older subroutines revised
for speed and clarity.
Colour and mask option introduced.
Option dealing with unusual residues added.
Version 1.5 - 10th Feb 1993: Minor bug fixes, changes to accord with new
version of Alscript.
Version 1.6 - 1st Apr 1993: Colour bug fixes, pair conservation calculation
modified, positive property conservation markings
modified, command line options modified, Alscript
labelling improved.
Version 1.66 - 1st May 1995 Previous versions of AMAS contain errors in the
property table reading routine. Only the first
eighty characters of each line are read initially.
Subsequent characters result in the production of
additional *unintended* property descriptions.
Users of old versions should restrict descriptions
to less than eighty characters _including_ comments.
This version reads descriptions up to 500 characters
wide. The VMS version has NOT been upgraded yet!
Version 1.67 - 24th May 1995 Similar "bugfix" to 1.66. SG files accept a new
metacharacter ":" at the end of lines which indicates
that the following line is to be considered part
of the same group. Lines in SG files retain an 80
character limit.
Version 2.0 - Available sometime...
Documentation modified 03 May 95.
VAX/VMS Version finally implemented 19 Oct 93 (Alphas only)!!!
Documentation modified 30 Nov 93.
APPENDIX 1
++++++++++
Appendix 1.1 Alignment Blocfile (<alignment>) Format:
----------------------------------------------------
Taken from AMPS (a users guide) by Geoff Barton (G.J.B. 1990):
This defines a multiple alignment in vertical format. The print_vertical
command produces a file in block_file format.
The minimum requirements for a block_file for N aligned sequences are
1. N '>comment line(s)'
2. '* iteration int'
3. 'N or more vertically aligned sequences'
4. '*'
1. The comment lines define the sequence identifiers, and the number of
'>' characters preceding the first '* iteration int' line defines the number of
sequences that are defined in the sequence lines.
2. This line specifies the beginning of the alignment to be read. The '*'
character specifies the column in which the alignment begins. The 'iteration
int' specifier identifies the particular alignment within this block_file.
Several alignments may follow each other providing they are identified by a
different iteration number (eg. 1,2,3).
4. The alignment is ended by a '*' character which should be in the same
column as the '*' character that started the alignment.
Simple example:
This is a block file containing an alignment of three sequences.
The comments that I am writing here may appear in the block file, but are
ignored by MULTALIGN when the file is read. The only proviso is that no
'greater than' or 'star' characters must be present.
Identifier: Title:
>first this is sequence A
>second this is sequence B
>third this is sequence C
* iteration 1
a
a p
avg
llg
lcr
g
pg
www
s
*
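The rules above can be sketched as a small Python reader. It is a hedged
illustration only: read_blocfile is a hypothetical name, identifier lines are
assumed to start with '>' in the first column, only the first alignment in the
file is read, and the sample data is invented for the sketch with its spacing
kept explicit (a space stands for a gap).

```python
def read_blocfile(text):
    """Return {identifier: sequence} for the first alignment in a blocfile."""
    lines = text.splitlines()
    ids = []
    start = None
    for i, line in enumerate(lines):
        if line.startswith(">"):
            ids.append(line[1:].split()[0])   # first word after '>' is the identifier
        elif line.lstrip().startswith("*"):
            start = i                         # the '* iteration int' line
            break
    if start is None:
        raise ValueError("no '* iteration' line found")
    col = lines[start].index("*")             # column where the alignment begins
    n = len(ids)
    rows = []
    for line in lines[start + 1:]:
        if len(line) > col and line[col] == "*":
            break                             # closing '*' in the same column
        rows.append(line[col:col + n].ljust(n))  # pad short rows with gaps
    # Each row is one alignment position; read down the columns.
    return {name: "".join(row[k] for row in rows) for k, name in enumerate(ids)}


SAMPLE = """\
>first  hypothetical sequence A
>second hypothetical sequence B
>third  hypothetical sequence C
* iteration 1
avg
llg
 cr
ww
*
"""

if __name__ == "__main__":
    for name, seq in read_blocfile(SAMPLE).items():
        print(name, repr(seq))
```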
Appendix 1.2 Sensible Groups (sgfile) File Format:
-------------------------------------------------
Sub-group information is stored in an "sg" (sensible groups) file for use in
AMAS. Groups are defined by their sequence numbers, ie. the order in which
they appear in the original alignment blocfile. The format of an sg file
follows the pattern shown in Figure 1.2a:
*******************************************************************************
! sg file for alignment in Figure 1
!
! Comments may be entered on any line which begins with an exclamation mark.
! These lines may occur at ANY position in the sg file, ie. before any group
! definitions...
!
1-4
!
! between group definitions...
!
5-8
9,10
!
! or at the end.
FIGURE 1.2a
***********
*******************************************************************************
Ranges of sequences may be defined, as may lists of sequences with each
sequence separated by a comma. Lists and ranges of sequences need not be in
numerical order and may span other groups or contain members of other groups.
Ranges and lists may be combined in the same group definition as shown in
Figure 1.2b, an sg file for a fictitious alignment:
********************************************************************************
! sg file for imaginary alignment of 29 rhubarbase sequences
!
! first group: sequences 1,2,3 and 4:
!
1-4
!
! second group: sequences 5,7,10,11,12,13,17 and 21:
!
5,7,10-13,17,21
!
! third group: sequences 6,8,9,14,15,16,18,19,20
!
6,8,9,14-16,18-20
!
! fourth group (contains all sequences, including those called already):
1-29
!
! end of sg file
FIGURE 1.2b
***********
*******************************************************************************
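A minimal Python sketch of reading an sg file under the rules above follows.
The function name parse_sg is hypothetical. Treating a range written high to
low (eg. 8-5) the same as 5-8 is an assumption, and the ":" continuation
character introduced in version 1.67 is not handled.

```python
def parse_sg(text):
    """Return a list of groups, each a list of sequence numbers."""
    groups = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("!"):
            continue                      # blank line or comment
        members = []
        for item in line.split(","):
            item = item.strip()
            if "-" in item:
                # Assumption: a range may be written in either order.
                low, high = sorted(int(x) for x in item.split("-"))
                members.extend(range(low, high + 1))
            else:
                members.append(int(item))
        groups.append(members)
    return groups


SAMPLE = """\
! sg file for alignment in Figure 1
1-4
! between group definitions...
5-8
9,10
"""

if __name__ == "__main__":
    print(parse_sg(SAMPLE))
```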
Appendix 1.3 Property Index (ptype) File Format:
-----------------------------------------------
Property indices define each amino acid in an alignment in terms of a set of
user defined properties (see Figure 1.3a). The file contains three main
fields: the single letter amino acid codes for each of the amino acids in the
alignment, a binary matrix showing property set membership (0 = non-member,
1 = member), and a list of property names.
Comments may be added to the right of the colons defining the end of each
property name. Lines beginning with exclamation marks may be used for
comments with the exception of the line containing the three asterisks and the
single letter amino acid codes which MUST occur before the matrix.
The first asterisk occurs in column five and defines the location of the first
single letter code. The column containing the first single letter amino acid
code is also the first column of the property set membership matrix. The
second asterisk occurs in the column following the last single letter code.
The column containing the last single letter code is also the last column of
the matrix. The third asterisk defines the column in which the first letter
of each property name occurs.
Each entry in the property set membership matrix defines the property
membership of the amino acid in whose column it occurs with respect to the
property in whose row it is placed.
The property names are given to the right of the row of matrix entries to which
they refer. 14 characters are allowed for each property name entry. Longer
names are allowed but these will disrupt the text summary output. Names may
contain any alphanumeric characters and punctuation but must not contain spaces.
Numbers at the left of the property matrix are disregarded but provide a useful
guide to the property numbering system used in the program when debugging.
The numbers may, therefore, be omitted, may occur in non numerical order or be
replaced by any alphanumeric characters up to and including the fifth column.
As many symbols/property types as required may be defined, with the proviso
that the total number of characters per line does not exceed 80.
A number of property indices have been included with AMAS. ``intra.pt'' and
``extra.pt'' are general matrices based on the conservation matrix of Zvelebil
et al (1987) which may be used to calculate patterns of overall conservation
between a set of diverse sequence groups from an alignment. ``intra.pt'' is
used with alignments containing proteins which are thought not to contain
disulphide bonded cysteines; here cysteine is defined as small, tiny,
hydrophobic and polar. ``extra.pt'' is used with proteins thought to contain
predominantly disulphide bonded cysteines, defining cysteine as small and
hydrophobic. ``ch.pt'' provides a means of detecting patterns of
conservation of charged groups. Using this matrix and the ignore negative
conservation option (n), one can detect changes in conserved charge at a
position between different subgroups using a conservation threshold of 2.
eg. amas n seq.aln ch.pt groups.sg 2
An example property matrix is shown in Figure 1.3a:
*******************************************************************************
! Property matrix for proteins containing
! disulphide bonded cysteines.
!
! *ILVCAGMFYWHKREQDNSTP BZX**
!
1 111111111111000000101001 Hydrophobic :
2 000000001111111111101111 Polar : Extra comments may be placed
3 001111000000000111111001 Small : here.
4 000000000000000000011001 Proline :
5 000000000011100000001001 Positive :
6 000000000000010100001001 Negative :
7 000000000011110100001001 Charged :
8 000011000000000001001001 Tiny :
9 111000000000000000001001 Aliphatic :
10 000000011110000000001001 Aromatic :
!
!
FIGURE 1.3a
*******************************************************************************
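As a rough illustration of the three fields, the Python sketch below reads a
property index into a dictionary. It is deliberately tolerant and does NOT
follow the strict column rules described above: it assumes the residue codes
are the characters between the first two asterisks of the header line, and it
finds each matrix row by looking for a run of 0s and 1s of the same length.
The name parse_ptable is hypothetical and the sample is a subset of
Figure 1.3a.

```python
def parse_ptable(text):
    """Return (codes, {property_name: {code: bool}}) from a property index."""
    codes = None
    table = {}
    for raw in text.splitlines():
        if codes is None:
            if raw.count("*") >= 3:               # header line with three asterisks
                first = raw.index("*")
                second = raw.index("*", first + 1)
                codes = raw[first + 1:second]     # assumption: codes lie between them
            continue
        if raw.startswith("!") or not raw.strip():
            continue                              # comment or blank line
        tokens = raw.split()
        for i, tok in enumerate(tokens):
            is_bits = len(tok) == len(codes) and set(tok) <= {"0", "1"}
            if is_bits and i + 1 < len(tokens):
                name = tokens[i + 1]              # property name follows the matrix row
                table[name] = {c: b == "1" for c, b in zip(codes, tok)}
                break
    return codes, table


SAMPLE = """\
! Property matrix (subset of Figure 1.3a)
! *ILVCAGMFYWHKREQDNSTP BZX**
1 111111111111000000101001 Hydrophobic :
2 000000001111111111101111 Polar : Extra comments may be placed
9 111000000000000000001001 Aliphatic :
"""

if __name__ == "__main__":
    codes, table = parse_ptable(SAMPLE)
    print(len(codes), sorted(table))
    print([c for c in codes if table["Aliphatic"][c] and c.isalpha()])
```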
Appendix 2
++++++++++
Optional commands are entered as a string composed of single letter codes
(listed below) as shown in the example command lines above. If no options
are required a '-' must be entered.
eg. chg1s and cha10g1s and ch10g2a10 and - are all legitimate option strings.
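The structure of an option string can be sketched in Python as below. This is
an illustration, not the parser used by AMAS: parse_options is a hypothetical
name, the set of accepted letters is taken from the command line synopsis, and
an integer is accepted after any letter even though the synopsis marks only
a, g and h as taking one.

```python
import re

# Letters listed in the command line synopsis of this guide.
ALLOWED = set("acdefghimnoprstuv")


def parse_options(optstring):
    """Split an option string into (letter, integer-or-None) pairs."""
    if optstring == "-":
        return []                                  # '-' means no options set
    if not re.fullmatch(r"(?:[a-z]\d*)+", optstring):
        raise ValueError("malformed option string: " + optstring)
    parsed = []
    for letter, digits in re.findall(r"([a-z])(\d*)", optstring):
        if letter not in ALLOWED:
            raise ValueError("unknown option: " + letter)
        parsed.append((letter, int(digits) if digits else None))
    return parsed


if __name__ == "__main__":
    print(parse_options("a14h8fg1s"))
    print(parse_options("-"))
```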
Optional commands:
a: Ignore _atypical_ or unusual residues. Like g and h, this command is
followed by an integer which, in this case, defines the percentage of
residues at any position which may be ignored. This count includes any
gaps which may be ignored.
eg. xxxa10xx (where x's represent other possible optional commands) would
lead to up to 10% of the residues in a subgroup or pair of subgroups
being ignored in the calculation of conservation values.
c: Alignments are _coloured_ by conservation.
RED Identity across all sub-groups
GREEN Conserved in one sub-group
BLUE Identical in one sub-group
GREY Unconserved
ORANGE Similarities histogram (optional)
VIOLET Differences histogram (optional)
f: Enables a root _filename_ at the end of the command line to be used to name
output files from an AMAS run ie. the rootname "file" would lead to the
production of a file series "file.blc", "file.soa", "file.sum" and "file.ps"
in standard AMAS usage. Can be used with `a'.
g: Number of _gaps_ which may be ignored per sub-group. The number is entered
as an integer immediately after the g in the optional commands string.
eg. xxxxg2xxx (where x's represent other possible optional commands) would
lead to up to two gaps per sub-group being ignored.
h: This option causes a frequency _histogram_ to be produced in place of the
conservation number report at the foot of the highlighted alignment.
An integer entered after the h determines the maximum height of a bar on the
histogram in characters. The histogram is scaled to this height. The
default is 10 characters. A report of the mean of the pair conservation
values more than or equal to the conservation threshold is made for each
position together with a histogram showing the frequency with which pairs of
sub-groups at a position are conserved, the so called similarity plot. A
similar report of the mean of the pair conservation values less than the
threshold at each position is also made, together with a histogram showing
the frequency with which pairs of sub-groups at a position have dissimilar
properties, the so-called difference plot. The similarity plot
distinguishes between frequency of identical and conserved pairs by the use
of dark and light shading respectively. Using the option h with a
value of 0 will lead to the omission of the histograms. A full height bar
indicates a frequency of 100%, while an empty bar indicates 0%.
eg. xxxh6xx (where x's represent other possible optional commands) would
lead to the presentation of the two histograms, each with
maximum bar height of 6 characters.
i: Causes the sequence _identifiers_ contained in each group to be listed rather
than the sequence numbers in the summary file. This increases the length
of the file.
m: All amino acids appearing at unconserved sequence positions within an
alignment go unreported (_masked_) in the Alscript output.
n: Positive property conservation only is considered (think '_Not negative_' or
'_Negative ignored_')
o: _Other_ histogram option. Default shows frequency with which pairs of sub-
groups are conserved (similar) or different by the criteria of the
threshold set. With 'o', the mean pairwise conservation at each sequence
position is reported, split into scores for similar and different pairs.
p: Output Alscript in _portrait_ mode - default is landscape.
s: _Shading_ is used in addition to font changes to display conservation
patterns in Alscript output. Especially useful when large font size or high
quality (>300dpi.) output is possible.
*Background  *Text  Font               Size     Meaning
Black        White  Helvetica-Bold     Default  Identical across all sequences
Dark Grey    White  Helvetica-Bold     Default  Identical in one sub-group
Light Grey   Black  Helvetica          Default  Conserved in one sub-group
White        Black  Helvetica-Oblique  0.70     Unconserved in one sub-group
White        Black  Optima-Bold        0.65     Unconserved across all sequences
* made active by s option, otherwise background is white and text is black.
t: Only the _text_ summary of conservation is provided.
u: Use _user's_ own title from AMAS.DEFAULTS file at the base of the Alscript
output. This may be used in conjunction with `f'.
v: Only the highlighted alignment is produced in the Alscript output.
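The figures reported by the h option can be sketched in Python for a single
alignment position. This is an assumed reading of the h description above, not
the AMAS code: the input is the list of pair conservation values at one
position, the frequency is taken over all pairs supplied, and the bar height
is simply the frequency scaled to the maximum height and rounded. The name
pair_histogram is hypothetical.

```python
def pair_histogram(pair_values, threshold, max_height=10):
    """Summarise similar and different pairs at one alignment position."""
    total = len(pair_values)
    similar = [v for v in pair_values if v >= threshold]
    different = [v for v in pair_values if v < threshold]

    def summarise(values):
        if not values:
            return {"mean": None, "frequency": 0.0, "bar": 0}
        frequency = len(values) / total
        return {
            "mean": sum(values) / len(values),
            "frequency": frequency,
            # Assumption: a full height bar corresponds to a frequency of 100%.
            "bar": round(frequency * max_height),
        }

    return {"similarity": summarise(similar), "difference": summarise(different)}


if __name__ == "__main__":
    print(pair_histogram([9, 8, 5, 3], threshold=6, max_height=6))
```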
Appendix 3
++++++++++
The results of the following examples are contained in the files 1-C within the aln+sg folder.
Example commandlines:
1: amas - seq.aln intra.pt groups.sg 8 10
This example performs the simplest level of analysis on the alignment seq.aln.
The property index intra.pt is used because no disulphide bonded cysteines are
expected in the alignment. extra.pt would be used if disulphides were
suspected. The subgroupings chosen for the alignment are included in
groups.sg. The conservation threshold is set to 8 because the sequences are
fairly similar. The pointsize for the output is 10 because the alignment
is relatively small. The summary output will describe the general
conservation of amino acids in the sequence as described above. The Alscript
output will distinguish conservation state using only font changes and pair
similarity and difference are reported in their simplest form. The Alscript
plot will be given the title ``seq.aln''.
2: amas t seq.aln intra.pt groups.sg 8
This example is identical to example 1 with the exception that the t option is
used. No Alscript output will be produced, so no pointsize is required.
3: amas s seq.aln intra.pt groups.sg 8 10
This example is identical to example 1 with the exception that the s option is
used, meaning that conservation status in the Alscript file will be identified
by use of shading in addition to font changes.
4: amas c seq.aln intra.pt groups.sg 8 10
As example 3 but colour is used in place of shading. Shading and colour
cannot be used together. Colour dominates if both s and c are chosen.
5: amas f seq.aln intra.pt groups.sg 8 10 example1
As example 1 but f option is invoked. The file rootname ``example1'' will be
applied to the output filenames and will be used as the title of the Alscript
plot.
6: amas g1s seq.aln intra.pt groups.sg 8 10
In this example the ``ignore gaps'' option is requested with a value of 1.
Shading is also requested for the Alscript output. One gap per subgroup or
pair of sub-groups will be ignored in the calculation of conservation scores.
7: amas a14s seq.aln intra.pt groups.sg 8 10
Similarly, in this example the a option is chosen with a value of 14. Up to
14% of the residues appearing in a subgroup or pair of subgroups at a position
may be ignored in the calculation of the conservation scores.
8: amas a14g1s seq.aln intra.pt groups.sg 8 10
Here, both gaps and atypical residues are ignored. The gap count is included
in the calculation of the number of atypical residues being ignored.
9: amas a14h8fg1s seq.aln intra.pt groups.sg 6 7.5 example
As 8. The ``h'' option has been invoked with a value of 8. The pair
conservation scores will be replaced by a conservation histogram of maximum
bar height 8 characters, as described above.
A: amas a14vg1s seq.aln intra.pt groups.sg 8 10
Neither the pair conservation scores nor the histogram is produced when the v
option is used.
B: amas ma14vingp1s seq.aln intra.pt groups.sg 3 10
The sequence identifiers are used in the summary file to identify the sequences
in each group in place of the sequence numbers from the original alignment
file when the option ``i'' is used. The options m,n and p are explained
above. The order of appearance of options in the commandline is unimportant.
C: amas nsh10m charge.aln ch.pt ch.sg 2 10
This commandline demonstrates the use of the charge property index to find
incidences of charge change between different subgroups of an alignment. The
alignment charge.aln has been split into the groups described in ch.sg. The
charge index ch.pt is used to find groups which conserve both (2) the property
of being charged and the sign of that charge. Only these groups are displayed
in the Alscript output due to the use of the ``m'' option (shaded - ``s'').
Differences in charge are clearly identified by bars in the difference
histogram.
References
**********
1. Livingstone C.D. and Barton G.J. (1993)
Protein Sequence Alignments: A Strategy for the Hierarchical
Analysis of Residue Conservation
CABIOS Vol. 9 No. 6 (745-756)
2. Barton G.J. (1993)
Prot. Eng. [6], 37-40, (Protocols) Alscript: A Tool to Format
Multiple Sequence Alignments.
3. Barton G.J. (1990)
Methods Enzymol. [183], 403-428, Protein Multiple Sequence
Alignment and Flexible Pattern Matching.
4. Zvelebil M.J.J.M., Barton G.J., Taylor W.R. and Sternberg M.J.E. (1987)
J. Mol. Biol. [195], 957-961, Prediction of Protein Secondary
Structure and Active Sites using the Alignment of Homologous
Sequences.
Known Bugs
**********
1. The program does not check fully if an sg file matches the alignment on which
it is used. If the sg file is not correct, the program has a nasty habit of
producing the most awful segmentation errors. Watch it!
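Until the program checks this itself, a small pre-flight test can catch the
most obvious mismatch. The Python sketch below is a hypothetical helper, not
part of AMAS: it counts the '>' lines that precede the first '*' line of the
blocfile and reports any sequence number in the sg file that falls outside
that range. A trailing ":" (the version 1.67 continuation character) is
simply stripped.

```python
def count_sequences(blocfile_text):
    """Count '>' identifier lines before the first '* iteration' line."""
    count = 0
    for line in blocfile_text.splitlines():
        if line.lstrip().startswith("*"):
            break
        if line.startswith(">"):
            count += 1
    return count


def sg_numbers(sg_text):
    """Collect every sequence number mentioned in an sg file."""
    numbers = set()
    for line in sg_text.splitlines():
        line = line.strip()
        if not line or line.startswith("!"):
            continue
        for item in line.rstrip(":").split(","):
            item = item.strip()
            if not item:
                continue
            if "-" in item:
                low, high = sorted(int(x) for x in item.split("-"))
                numbers.update(range(low, high + 1))
            else:
                numbers.add(int(item))
    return numbers


def out_of_range(blocfile_text, sg_text):
    """Return sg sequence numbers that do not exist in the alignment."""
    n = count_sequences(blocfile_text)
    return sorted(x for x in sg_numbers(sg_text) if x < 1 or x > n)


if __name__ == "__main__":
    bloc = ">a one\n>b two\n>c three\n* iteration 1\nabc\n*\n"
    print(out_of_range(bloc, "! groups\n1-2\n3,4\n"))
```

An empty list means every number in the sg file refers to a sequence present
in the alignment.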
The next version of amas will have simplified command line usage and a much
more even coverage of error messages.