
 AMAS - Analysis of Multiply Aligned Sequences

   Copyright: C. D. Livingstone and G. J. Barton (c)  1992,1997

   C.D.Livingstone* & G.J.Barton

   Contact:  Geoff Barton:  geoff@compbio.dundee.ac.uk

Please cite:  Livingstone C.D. and Barton G.J. (1993)
              Protein Sequence Alignments: A Strategy for the Hierarchical 
	      Analysis of Residue Conservation
              CABIOS Vol. 9 No. 6 (745-756)

***************************************
* AMAS User Guide * January 15th 1993 *
***************************************

Contents:
	The AMAS Commandline.
	A Brief Description.
	Setting Up AMAS.
	Using AMAS.
	Related Programs.
	System Availability.
	Conclusion.
	Update History.
	Appendices.
	References.
	Known Bugs.
	Files Included.


The AMAS Commandline:
--------------------

Requirements:

AMAS requires the following in order to proceed...
(see later text for details)

Alignment blocfile:	A multiple sequence alignment in AMPS blocfile vertical
			format (<alignment>). See appendices for description.
Property index:		A table identifying each amino acid in the blocfile
			in terms of its possession or lack of one or more 
			properties in binary format (<ptable>). See appendices 
			for description.
Sensible groups file:	File defining sub-groups of aligned sequences within 
                        the main alignment (<sgfile>). See appendices for 
			description.
Conservation threshold:	A number describing, in terms of the property index, 
			how similar a set of amino acids must be before it is
			considered to show conservation. See later text for a 
			full explanation.

Options:
These may be included if required:

Fontsize:		Size in units of points (0.0148 inches) of the largest
			typeface in the alignment (determines the dimensions of
			the "pretty" output format).
File rootname:		Name by which all of the output files will be called,
			ie. the rootname "file" would lead to the production of
			a file series "file.blc", "file.soa", "file.sum" and 
			"file.ps" in standard AMAS usage.
Options:		A number of options exist to modify the standard output
			of the AMAS program. See appendices for description.   
			If no options are set a ``-'' must be substituted in 
			this field. 

The command line takes the form:

amas [options:a*cdefg*h*imnoprstuv] <alignment> <ptable> <sgfile> <cons> [size]
     [file rootname]


***For EXAMPLE: ***

1.	Change directory to folder "first.light"

2.	Enter the command line:

	amas sf seq.aln intra.pt groups.sg 8 10 amasout

3.	Output will be sent to the files:

        amasout.blc	amasout.soa	amasout.sum	amasout.ps

A number of further example command lines for the analysis of included alignments
are given in Appendix 3 with full explanations of their results.   See below for
further explanation.
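
The command line layout above can be assembled programmatically.   The 
following Python sketch is not part of AMAS: the function names 
build_amas_command and expected_outputs are hypothetical, and the check that 
a rootname needs the f option is an assumption based on the description of 
``f'' in Appendix 2.

```python
def build_amas_command(alignment, ptable, sgfile, threshold,
                       options="", pointsize=None, rootname=None):
    """Return an argument list laid out as described in this guide."""
    # The guide requires "-" in the options field when no options are set.
    argv = ["amas", options or "-", alignment, ptable, sgfile, str(threshold)]
    if pointsize is not None:
        argv.append(str(pointsize))
    if rootname is not None:
        # Assumption: the trailing rootname is only read with the f option.
        if "f" not in options:
            raise ValueError("a file rootname needs the 'f' option")
        argv.append(rootname)
    return argv


def expected_outputs(alignment, rootname=None):
    """List the four output files of standard AMAS usage."""
    # Without a rootname the guide names the outputs after the alignment.
    root = rootname if rootname is not None else alignment
    return [root + suffix for suffix in (".blc", ".soa", ".sum", ".ps")]
```

Called with the values from the example above (options "sf", threshold 8, 
pointsize 10, rootname "amasout"), build_amas_command returns the same words 
as the example command line, and expected_outputs("seq.aln", "amasout") 
lists the four amasout files.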



A Brief Description
-------------------

AMAS is a program which performs a systematic characterisation of the physico-
chemical properties seen at each position in a multiple protein sequence 
alignment.    A flexible set-based description of amino
acid properties is used to define the conservation between any group
of amino acids.  Sequences in the alignment are gathered into
sub-groups on the basis of sequence similarity, functional similarity,
or other criteria.  All pairs of sub-groups are then compared to
highlight positions that confer the unique features of each sub-group.   
Graphical output using the Alscript program (Barton, 1993) is available for 
rapid interpretation, while a more complex and detailed text summary is 
available for more abstruse queries.



Setting Up AMAS:
---------------

Most of the installation process is straightforward.   Choose a location for 
the distributed folders and put them there!   A makefile is included for Sun 
Microsystems and Silicon Graphics machines: enter the c folder and use make.
A descrip.mms is shipped with the VMS version.   Enter the [.amas.c] directory
and type "mms".

Unix:
++++

Two environment variables must be defined before using AMAS.   One points to 
the location of the default AMAS.DFS defaults file and the other points to
the location of the directory containing the property indices, ie.

setenv AMASDEFAULTS  ~cdl/c/amas/distribute/AMAS.DFS
setenv AMASPTYPE     ~cdl/c/amas/distribute/ptype

You could put these in your .cshrc.   Change the environment variables to point 
to your own property table directories/defaults files as necessary.


UPDATES:
=======

Version 1.65 and greater

The AMAS sequence analysis program now requires 3
environment variables to be set before use:

AMASDEFAULTS
AMASPTYPE
ALSCRIPTCOMMAND	<----- the new one

suggested values:

/home/orac/cdl/c/amas/distribute/AMAS.DFS
/home/orac/cdl/c/amas/distribute/ptype
"nice -19 alscript -f"

respectively.
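
A quick check that the three variables are defined can save a failed run.   
The Python sketch below is an illustration only and is not shipped with AMAS; 
the function name missing_amas_variables is hypothetical.   It tests only 
that each variable is set and non-empty, not that the paths exist.

```python
import os

# The three variable names required from version 1.65 onwards.
REQUIRED_VARS = ("AMASDEFAULTS", "AMASPTYPE", "ALSCRIPTCOMMAND")


def missing_amas_variables(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling missing_amas_variables() with no argument inspects the current 
environment; an empty list means all three variables are set.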



VAX/VMS(Alpha):
++++++++++++++

Include equivalents for the following lines in your login.com:

$! set up amas
$!
$ amas :== $ $c:[craig.amas.c]amas
$ AMASPTYPE == "$c:[craig.amas.ptype]"
$ AMASDEFAULTS == "$c:[craig.amas]amas.dfs"
$!
$!
$! set up alscript
$!
$ass $b:[geoff.alscript] alsdir
$alscript :== "run alsdir:alscript.exe"
$msf2blc :== "run alsdir:msf2blc.exe"
$clus2blc :== "run alsdir:clus2blc.exe" 
$alsnum == "$lmb$dua3:[geoff.alscript]alsnum.exe"
$!
$!



Using AMAS:
----------

The first requirement of AMAS is a sequence alignment in vertical format (see 
Appendix 1.1).   This type of alignment may be created by the AMPS package of 
Barton (1990) or existing alignments in other standard formats may be converted
using one of Alscript's accompanying programs (Barton, 1993).

The starting point for hierarchical conservation analysis is the
identification of two or more sub-sets of sequences within a multiple
sequence alignment.  The subsets may be defined by grouping on the basis of 
overall sequence similarity, by functional similarity, origin, or other
criteria.  Given such groupings, the aim is to highlight which residue
positions define the unique properties of each group.

*******************************************************************************

	A					B
					         ________________
	                  ______ ]    	        | 1 | GTLYSILDIQ |
		       __|       ]]   	        |   |            |
		    __|  |______ ]]]  	        | 2 | GTAYPVMEVY |
		   |  |          ]]]] A		|   |            |
	 __________|  |_________ ]]]  		| 3 | GTAYLILDLA |
	|	   |             ]]   		|   |            |
	|	   |____________ ]    		| 4 | GTTYFILDLR |
	|	                		|___|____________|
	|	  ______________ ]    		| 5 | GTDTCVMELE |
	|        |               ]]   		|   |            |
	|        |        ______ ]]]  		| 6 | GGDTCCLDLA |
	|    	 |     __|       ]]]] B		|   |            |
	|    ____|    |  |______ ]]]  		| 7 | GVDSVIMEFL |
	|   |	 |____|          ]]   		|   |            |
	|   |	      |_________ ]    		| 8 | GGDSCLIDMS |
	|   |	                		|___|____________|
	|___|		 _______ ]    		| 9 | GTQYGLKRFV |
	    |___________|        ]]   C		|   |            |
			|_______ ]  		|10 | GTQFAIMQP  |
	________________________	        |___|____________|
	\	\	\	\
	25	50	75	100 (%ID)	

				FIGURE 1
				********
********************************************************************************

For example, Figure 1 shows a fragment of a multiple sequence alignment of 10 
sequences (B) in which the percentage identity between the sequences has been 
determined by single linkage cluster analysis, the results of which are 
displayed using an unrooted dendrogram (A).   This analysis clearly shows the 
sequences to be split into (2 or) 3 clusters or sub-groups at around 30% 
identity.   If this grouping is consistent with other observations about the 
family of proteins this would be a suitable sub-grouping scheme for an AMAS 
analysis.   Many sequence alignment packages (including the AMPS program 
ORDER) will perform this type of analysis.   Once chosen, the groups of 
sequences are recorded in an "sg" file for use by the AMAS program (see 
Appendix 1.2).

A property table appropriate to the alignment and to the question being posed 
is then chosen.   Property tables may be easily redefined by the user to 
answer specific questions (see Appendix 1.3).

An appropriate conservation threshold is chosen for the analysis.   
Useful values for the 10 property matrices intra.pt and extra.pt (included) 
lie between 6 and 8 depending on the overall similarity of all of the 
sequences in the alignment.   If the overall sequence similarity is low, a 
less stringent threshold is used, eg. 6 (six or more shared properties out of 
10 define a set of amino acids as conserved).   With similar sequences a more 
stringent value, eg. 8, is chosen.   

A pointsize (0.0148 inches = 1 point) must be defined for the default height 
of each character in the Alscript output (10 points is about the height of 
text in a paperback novel).   Alscript output is selected by default but may 
be turned off by using the t(T) option.

With these criteria satisfied, a simple command line can be constructed in 
order to execute AMAS:

 amas - seq.aln intra.pt groups.sg 6 10		example 1a

or

 amas t seq.aln intra.pt groups.sg 6		example 1b
 

These would perform an analysis of the sequence alignment ``seq.aln'' on the 
basis of the property index ``intra.pt'' using the subgroupings of the 
sequences in the alignment defined in ``groups.sg'' with a conservation 
threshold of 6.   The - indicates no additional program options have been set.  
The files to perform this analysis are included.   The output files will take 
the names seq.aln... eg.

seq.aln.soa	command file for program Alscript
seq.aln.blc	new alignment blocfile containing sequences in new, sub-grouped
                order
seq.aln.sum	text summary of conservation in the alignment
seq.aln.ps	Alscript output (PostScript file - can be printed directly on a
		PostScript printer or previewed with an appropriate PostScript
		previewer, eg. Sun Microsystems' pageview software)

AMAS automatically launches Alscript, if installed, producing the .ps file 
from the .soa file.   The .soa file may be edited by experienced 
Alscript users in order to introduce extra information.   The .blc file is 
required by the .soa file for operation.   The .sum file contains a summary of 
conservation as shown in the output of example 1 (see example.sum for 
comparison, containing the results of the analysis in Appendix 3, example 9).

The summary file records the sequences in each sub-group, the number of gaps 
and atypical residues ignored in an analysis, if any (see below), and the 
conservation threshold set by the user.   The file lists all positions which 
show identities across all the sequences from the alignment included in the 
analysis, all pairs of subgroups showing identities and all sub-groups 
containing only one type of residue.   Positions conserving properties across 
all analysed sequences are detailed next, showing the properties conserved in 
all subgroups and the positively conserved properties which differ between 
subgroups.   The percentage of sequences in each subgroup possessing a 
"different" property is reported.   Conservation and difference between pairs 
of conserved subgroups at a position is reported in a similar manner.   
Conserved pairs share at least the number of properties defined by the 
threshold, while different pairs consist of sub-groups that are themselves 
conserved but share fewer properties than the threshold between them.   
Positions where individual 
groups conserve properties are reported next, followed by a list of 
unconserved groups and completely unconserved positions (by the threshold 
criterion).

The Alscript output is very flexible.   Output is in the form of a PostScript 
file which may be printed on any common PostScript printer or visualised using 
a PostScript previewer (eg. PageView V3, Sun Microsystems).   A well formatted 
copy of the sequences analysed in the program is shown in the form of an 
alignment split into the groups defined by the "sg" file.   By default, each 
sequence position is presented in a font which indicates whether it is 
identical across all analysed sequences (boldface white on black), an identity 
in one group (boldface), conserved in one group (plain), unconserved in one 
group (smaller italics) or unconserved across all sequences (smaller, 
different font).   Conserved and identical regions within subgroups are boxed 
by the program.   These distinctions may be made clearer using shading or 
colouring schemes.   

A table may be presented at the foot of the alignment detailing the degree of 
conservation between pairs of conserved sequences, the upper section showing 
similarities, and the lower showing differences, or this information may be 
presented in the form of a histogram (see the ``h'' option described in 
Appx. 2).   The title of the alignment is, by default, the title of the 
original alignment file, or the file rootname supplied when using the ``f'' 
option (Appx. 2).   The fonts, colours, shading levels and the title at the 
base of the alignment may be changed by editing the AMAS.DFS defaults file 
included with the program.   

The histogram of conservation has been found to coincide well with observed 
secondary structure and may have predictive value (see SH2_analysis.aln series 
included).

Related Programs:
----------------

The AMPS package (Barton, 1990).  This performs multiple sequence
alignments and databank scanning.  The package is currently only
available for Sun, Silicon Graphics and VAX/VMS systems.  Contact the
author for details.

ALSCRIPT (Barton, 1993) allows shading, boxing and colouring to be applied to 
an alignment in AMPS format.   Routines are provided for conversion of other 
alignment formats.   In response to a command file containing a set of 
formatting commands, ALSCRIPT produces a PostScript file which may be printed 
on a PostScript laser printer or viewed using a PostScript previewer 
(e.g. Sun Microsystems' PageView program).   Alscript is NOT a multiple 
sequence alignment program, nor is it an alignment editor.

System Availability:
-------------------

AMAS has been used on both Silicon Graphics Indigo and Sun SPARCstation 
platforms under the UNIX operating system.   A Digital ALPHA/VMS version is now
available and an IBM PC compatible version is under preparation.

AMAS, Alscript and AMPS are freely available to the academic community from the
authors under a (fairly) standard licensing agreement.   Enquiries from 
prospective industrial users are welcomed.

Conclusion
----------

Whilst all these features identified by AMAS can be
found by inspection of the alignment, the process is laborious
and error-prone.  The strategy described here reduces the scope for
error, allows alternative sub-groupings to be rapidly investigated, and
provides structurally relevant shading and boxing.

For additional information on the method and the ideas behind it please see 
Livingstone and Barton 1993.



Update History:
--------------

Version 1.0 - 30th June 1992
Version 1.1 -  3rd July 1992
Version 1.2 - 29th July 1992
Version 1.3 - 27th Oct  1992
Version 1.4 - 13th Jan  1993: Many older subroutines revised 
			      for speed and clarity.
                              Colour and mask option introduced.
                              Option dealing with unusual residues added.
Version 1.5 - 10th Feb  1993: Minor bug fixes, changes to accord with new
                              version of Alscript. 
Version 1.6 - 1st  Apr  1993: Colour bug fixes, pair conservation calculation 
                              modified, positive property conservation markings
			      modified, command line options modified, Alscript
			      labelling improved.
Version 1.66 - 1st May  1995  Previous versions of AMAS contain errors in the 
                              property table reading routine. Only the first 
                              eighty characters of each line are read initially.
                              Subsequent characters result in the production of 
                              additional *unintended* property descriptions.    
                              Users of old versions should restrict descriptions 
                              to less than eighty characters _including_ comments.
                              This version reads descriptions up to 500 characters
                              wide.   The VMS version has NOT been upgraded yet!
Version 1.67 - 24th May 1995  Similar "bugfix" to 1.66.   SG files accept a new
                              metacharacter ":" at the end of lines which indicates
                              that the following line is to be considered part 
                              of the same group.   Lines in SG files retain an 80
                              character limit.

Version 2.0 - Available sometime...

Documentation modified 03 May 95.
VAX/VMS Version finally implemented 19 Oct 93 (Alphas only)!!!
Documentation modified 30 Nov 93.



APPENDIX 1
++++++++++

Appendix 1.1 Alignment Blocfile (<alignment>) Format:
----------------------------------------------------

Taken from AMPS (a users guide) by Geoff Barton (G.J.B. 1990):

This defines a multiple alignment in vertical format.  The print_vertical
command produces a file in block_file format.

The minimum requirements for a block_file for N aligned sequences are
1.  N  '>comment line(s)'
2.  '* iteration int'
3.  'N or more vertically aligned sequences'
4.  '*'

1.  The comment lines define the sequence identifiers and the number of
'>' characters preceding the first '* iteration int' line defines the number of
sequences that are defined in the sequence lines.

2.  This line specifies the beginning of the alignment to be read.  The '*'
character specifies the column in which the alignment begins.  The 'iteration
int' specifier identifies the particular alignment within this block_file.
Several alignments may follow each other providing they are identified by a
different iteration number (eg. 1,2,3).

4.  The alignment is ended by a '*' character which should be in the same
column as the '*' character that started the alignment.

Simple example:

This is a block file containing an alignment of three sequences.
The comments that I am writing here may appear in the block file, but are
ignored by MULTALIGN when the file is read.  The only proviso is that no
'greater than' or 'star' characters must be present.
Identifier:  Title:
>first       this is sequence A
>second      this is sequence B
>third       this is sequence C
* iteration 1
a  
a p
avg
llg
lcr
g
 pg
www
s	
*
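
The vertical layout is easy to misread, so the Python sketch below shows one 
way to turn a blocfile back into horizontal sequences.   It is an 
illustration written for this guide and is not the AMPS or AMAS reading 
routine.   The function name read_blocfile is hypothetical, and two choices 
are assumptions: the first word after '>' is taken as the identifier, and 
short lines are padded with spaces so that every position has one character 
per sequence.

```python
def read_blocfile(text):
    """Return (identifiers, sequences) read from blocfile text."""
    lines = [line.expandtabs() for line in text.splitlines()]
    ids = []
    start = None
    for i, line in enumerate(lines):
        stripped = line.lstrip()
        if stripped.startswith(">"):
            # One '>' line per sequence; keep the first word as identifier.
            words = stripped[1:].split()
            ids.append(words[0] if words else "")
        elif "*" in line:
            # Free comments may not contain '*', so this is '* iteration'.
            start = i
            break
    if start is None:
        raise ValueError("no '* iteration' line found")
    col = lines[start].index("*")      # column in which the alignment begins
    n = len(ids)
    positions = []
    for line in lines[start + 1:]:
        if line[col:col + 1] == "*":   # closing '*' in the same column
            break
        positions.append(line[col:col + n].ljust(n))
    else:
        raise ValueError("alignment is not closed by '*'")
    # Read down each column to recover one horizontal string per sequence.
    sequences = ["".join(p[k] for p in positions) for k in range(n)]
    return ids, sequences
```

Applied to the simple example above, this reading gives the identifiers 
first, second and third, each with a nine character sequence in which spaces 
stand for gaps.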



Appendix 1.2 Sensible Groups (sgfile) File Format: 
-------------------------------------------------

Sub-group information is stored in an "sg" (sensible groups) file for use in 
AMAS.   Groups are defined by their sequence numbers, ie. the order in which 
they appear in the original alignment blocfile.   The format of an sg file 
follows the pattern shown in Figure 1.2a:

*******************************************************************************
! sg file for alignment in Figure 1
!
! Comments may be entered on any line which begins with an exclamation mark.
! These lines may occur at ANY position in the sg file, ie. before any group  
! definitions...
!
1-4
!
! between group definitions...
!
5-8
9,10
!
! or at the end.
				FIGURE 1.2a
				***********
*******************************************************************************

Ranges of sequences may be defined, as may lists of sequences with each 
sequence separated by a comma.   Lists and ranges of sequences need not be in 
numerical order and may span other groups or contain members of other groups.  
Ranges and lists may be combined in the same group definition as shown in 
Figure 1.2b, an sg file for a fictitious alignment:

********************************************************************************

! sg file for imaginary alignment of 29 rhubarbase sequences
!
! first group: sequences 1,2,3 and 4:
!
1-4
!
! second group: sequences 5,7,10,11,12,13,17 and 21:
!
5,7,10-13,17,21
!
! third group: sequences 6,8,9,14,15,16,18,19,20
!
6,8,9,14-16,18-20
!
! fourth group (contains all sequences, including those called already):
1-29
!
! end of sg file 	
				FIGURE 1.2b
				***********
*******************************************************************************
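
The group syntax can be checked before a run with a short script.   The 
Python sketch below is an illustration only and is not the AMAS reading 
routine; the names parse_sg and expand_members are hypothetical.   Two points 
are assumptions: a trailing ":" (version 1.67, see Update History) is read as 
"join the next definition line to this group", and a descending range such 
as 13-10 is expanded downwards.

```python
def expand_members(spec):
    """Expand a definition such as '5,7,10-13' into [5, 7, 10, 11, 12, 13]."""
    members = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            low, high = (int(x) for x in item.split("-"))
            step = 1 if high >= low else -1   # assumption: 13-10 counts down
            members.extend(range(low, high + step, step))
        else:
            members.append(int(item))
    return members


def parse_sg(text):
    """Return a list of groups, each a list of 1-based sequence numbers."""
    groups = []
    pending = ""                       # text carried over by a trailing ':'
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue                   # blank line or comment
        if line.endswith(":"):
            # Assumed continuation rule: the next line joins this group.
            pending += line[:-1].rstrip(",") + ","
            continue
        groups.append(expand_members(pending + line))
        pending = ""
    if pending:
        groups.append(expand_members(pending))
    return groups
```

For the file in Figure 1.2a this gives three groups: sequences 1 to 4, 
sequences 5 to 8, and sequences 9 and 10.   Comparing the largest number 
returned with the number of sequences in the alignment is a cheap guard 
against the mismatch described under Known Bugs.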



Appendix 1.3 Property Index (ptype) File Format: 
-----------------------------------------------

Property indices define each amino acid in an alignment in terms of a set of 
user defined properties (see Figure 1.3a).   The file contains three main 
fields: the single letter amino acid codes for each of the amino acids in the 
alignment, a binary matrix showing property set membership (0 = non-member, 
1= member), and a list of property names.   

Comments may be added to the right of the colons defining the end of each 
property name.   Lines beginning with exclamation marks may be used for 
comments with the exception of the line containing the three asterisks and the 
single letter amino acid codes which MUST occur before the matrix.   

The first asterisk occurs in column five and defines the location of the first 
single letter code.   The column containing the first single letter amino acid
code is also the first column of the property set membership matrix.   The 
second asterisk occurs in the column following the last single letter code.   
The column containing the last single letter code is also the last column of 
the matrix.   The third asterisk defines the column in which the first letter 
of each property name occurs.

Each entry in the property set membership matrix defines the property 
membership of the amino acid in whose column it occurs with respect to the 
property in whose row it is placed.

The property names are given to the right of the row of matrix entries to which
they refer.   14 characters are allowed for each property name entry.   Longer 
names are allowed but these will disrupt the text summary output.   Names may 
contain any alphanumeric characters and punctuation but must not contain spaces.

Numbers at the left of the property matrix are disregarded but provide a useful
guide to the property numbering system used in the program when debugging.   
The numbers may therefore be omitted, may occur in non numerical order or be 
replaced by any alphanumeric characters up to and including the fifth column.

As many symbols/property types as required may be defined, with the proviso 
that the total number of characters per line does not exceed 80 (versions 1.66 
and later read lines up to 500 characters wide; see Update History).

A number of property indices have been included with AMAS.   ``intra.pt'' and 
``extra.pt'' are general matrices based on the conservation matrix of Zvelebil 
et al (1987) which may be used to calculate patterns of overall conservation 
between a set of diverse sequence groups from an alignment.   ``intra.pt'' is 
used with alignments containing proteins which are thought not to contain 
disulphide bonded cysteines; cysteine is defined as small, tiny, hydrophobic 
and polar.   ``extra.pt'' is used with proteins thought to contain 
predominantly disulphide bonded cysteines, defining cysteine as small and 
hydrophobic.   ``ch.pt'' provides a means of detecting patterns of 
conservation of charged groups.   Using this matrix and the ignore negative 
conservation option (n), one can detect changes in conserved charge at a 
position between different subgroups using a conservation threshold of 2.

eg. amas n seq.aln ch.pt groups.sg 2

An example property matrix is shown in Figure 1.3a:

*******************************************************************************
! Property matrix for proteins containing 
! disulphide bonded cysteines.
!
!    *ILVCAGMFYWHKREQDNSTP BZX**
! 
1     111111111111000000101001 Hydrophobic :  
2     000000001111111111101111 Polar       : Extra comments may be placed 
3     001111000000000111111001 Small       : here. 
4     000000000000000000011001 Proline     : 
5     000000000011100000001001 Positive    :  
6     000000000000010100001001 Negative    :       
7     000000000011110100001001 Charged     :  
8     000011000000000001001001 Tiny        :  
9     111000000000000000001001 Aliphatic   :  
10    000000011110000000001001 Aromatic    :  
!
!
				FIGURE 1.3a
*******************************************************************************
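
To make the column rules and the conservation threshold concrete, the Python 
sketch below reads the matrix of Figure 1.3a and counts shared properties for 
a set of residues.   It is an illustration and not the AMAS source: the names 
read_ptable and conservation_number are hypothetical, and the counting rule 
(a property is shared when every residue has it or every residue lacks it, 
with only the first case counted when positive_only is set, as for the ``n'' 
option) is an assumed reading of this guide.

```python
# Property matrix copied from Figure 1.3a.
FIGURE_1_3A = """\
!    *ILVCAGMFYWHKREQDNSTP BZX**
1     111111111111000000101001 Hydrophobic :
2     000000001111111111101111 Polar       :
3     001111000000000111111001 Small       :
4     000000000000000000011001 Proline     :
5     000000000011100000001001 Positive    :
6     000000000000010100001001 Negative    :
7     000000000011110100001001 Charged     :
8     000011000000000001001001 Tiny        :
9     111000000000000000001001 Aliphatic   :
10    000000011110000000001001 Aromatic    :
"""


def read_ptable(text):
    """Return (codes, table); table maps property name to {code: bool}."""
    lines = text.splitlines()
    header = next(line for line in lines if line.count("*") == 3)
    first = header.index("*")
    second = header.index("*", first + 1)
    third = header.index("*", second + 1)
    codes = header[first + 1:second]       # single letter codes
    table = {}
    for line in lines:
        if line is header or line.startswith("!") or not line.strip():
            continue
        bits = line[first + 1:second]      # same columns as the codes
        name = line[third:].split(":")[0].strip()
        table[name] = {aa: bit == "1" for aa, bit in zip(codes, bits)}
    return codes, table


def conservation_number(residues, table, positive_only=False):
    """Count the properties shared by every residue in the set."""
    count = 0
    for members in table.values():
        states = {members[aa] for aa in residues}
        if states == {True} or (states == {False} and not positive_only):
            count += 1
    return count
```

Under this reading a set of residues counts as conserved when 
conservation_number(...) is greater than or equal to the conservation 
threshold given on the command line.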



Appendix 2
++++++++++

Optional commands are entered as a string composed of single letter codes 
(listed below) as shown in the example command lines above.   If no options 
are required a '-' must be entered.

eg. chg1s and cha10g1s and ch10g2a10 and - are all legitimate option strings.  

Optional commands:

a: Ignore _atypical_ or unusual residues.   Like g and h, this command is 
   followed by an integer which, in this case, defines the percentage of 
   residues at any position which may be ignored.   This count includes any 
   gaps which may be ignored.
   eg.    xxxa10xx  (where x's represent other possible optional commands) would
                    lead to up to 10% of the residues in a subgroup or pair of
                    subgroups being ignored in the calculation of conservation
                    values.

c: Alignments are _coloured_ by conservation.   

	        RED Identity across all sub-groups
	      GREEN Conserved in one sub-group
	       BLUE Identical in one sub-group
               GREY Unconserved 
             ORANGE Similarities histogram (optional)
             VIOLET Differences histogram  (optional)

f: Enables a root _filename_ at the end of the command line to be used to name 
   output files from an AMAS run ie. the rootname "file" would lead to the   
   production of a file series "file.blc", "file.soa", "file.sum" and "file.ps"
   in standard AMAS usage. Can be used with `a'.

g: Number of _gaps_ which may be ignored per sub-group.   The number is entered 
   as an integer immediately after the g in the optional commands string.
   eg.  xxxxg2xxx (where x's represent other possible optional commands) would
		   lead to up to two gaps per sub-group being ignored. 

h: This option causes a frequency _histogram_ to be produced in place of the 
   conservation number report at the foot of the highlighted alignment. 
   An integer entered after the h determines the maximum height of a bar on the 
   histogram in characters.   The histogram is scaled to this height.   The 
   default is 10 characters.   A report of the mean of the pair conservation 
   values more than or equal to the conservation threshold is made for each
   position together with a histogram showing the frequency with which pairs of
   sub-groups at a position are conserved, the so called similarity plot.   A
   similar report of the mean of the pair conservation values less than the 
   threshold at each position is also made, together with a histogram showing 
   the frequency with which pairs of sub-groups at a position have dissimilar 
   properties, the so-called difference plot.   The similarity plot    
   distinguishes between frequency of identical and conserved pairs by the use 
   of dark and light shading respectively.    Using the option h with a
   value of 0 will lead to the omission of the histograms.   A full height bar 
   indicates a frequency of 100%, while an empty bar indicates 0%.
   eg.   xxxh6xx  (where x's represent other possible optional commands) would
		   lead to the presentation of the two histograms, each with
		   maximum bar height of 6 characters. 

i: Causes the sequence _identifiers_ contained in each group to be listed rather
   than the sequence numbers in the summary file.   This increases the length 
   of the file.

m: All amino acids appearing at unconserved sequence positions within an 
   alignment go unreported (_masked_) in the Alscript output.

n: Positive property conservation only is considered (think '_Not negative_' or
   '_Negative ignored_') 

o: _Other_ histogram option.   Default shows frequency with which pairs of sub-
   groups are conserved (similar) or different by the criteria of the 
   threshold set.   With 'o', the mean pairwise conservation at each sequence 
   position is reported, split into scores for similar and different pairs. 

p: Output Alscript in _portrait_ mode - default is landscape.

s: _Shading_ is used in addition to font changes to display conservation 
   patterns in Alscript output.   Especially useful when large font size or high
   quality (>300dpi.) output is possible.

     *Background  *Text	  Font		    Size

       Black      White   Helvetica-Bold   Default - Identical across all
					             sequences
     Dark Grey	  White   Helvetica-Bold   Default - Identical in one sub-group
     Light Grey	  Black	  Helvetica        Default - Conserved in one sub-group
     White	  Black	  Helvetica-Oblique 0.70   - Unconserved in one 
						      sub-group
     White	  Black	  Optima-Bold	    0.65   - Unconserved across all
					             sequences

   * made active by s option, otherwise background is white and text is black.
                 

t: Only the _text_ summary of conservation is provided.

u: Use _user's_ own title from the AMAS.DFS defaults file at the base of the 
   Alscript output.   This may be used in conjunction with `f'.

v: Only the highlighted alignment is produced in the Alscript output.
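
An option string of the kind described in this appendix can be split into 
letters and their integers as follows.   This Python sketch is an 
illustration and not the AMAS parser; the function name parse_options is 
hypothetical, and characters other than lower case letters and digits are 
silently skipped, which is a simplification.

```python
import re


def parse_options(option_string):
    """Map each option letter to its integer, or None when it has none."""
    if option_string == "-":
        return {}                       # '-' stands for "no options"
    options = {}
    # Each option is one letter optionally followed by an integer (a, g, h).
    for letter, digits in re.findall(r"([a-z])(\d*)", option_string):
        options[letter] = int(digits) if digits else None
    return options
```

For the option string a14h8fg1s the result pairs a with 14, h with 8 and g 
with 1, and records f and s without a number.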



Appendix 3
++++++++++

The results of the following examples are contained in files 1-C within the 
aln+sg folder.

Example commandlines:

1:    	amas - seq.aln intra.pt groups.sg 8 10

This example performs the simplest level of analysis on the alignment seq.aln. 
The property index intra.pt is used because no disulphide bonded cysteines are 
expected in the alignment.   extra.pt would be used if disulphides were 
suspected.   The subgroupings chosen for the alignment are included in 
groups.sg.   The conservation threshold is set to 8 because the sequences are 
fairly similar.   The pointsize for the output is 10 because the alignment 
is relatively small.   The summary output will describe the general 
conservation of amino acids in the sequence as described above.   The Alscript 
output will distinguish conservation state using only font changes and pair 
similarity and difference are reported in their simplest form.   The Alscript 
plot will be given the title ``seq.aln''.

2:	amas t seq.aln intra.pt groups.sg 8

This example is identical to example 1 with the exception that the t option is 
used.   No Alscript output will be produced, so no pointsize is required.

3:      amas s seq.aln intra.pt groups.sg 8 10

This example is identical to example 1 with the exception that the s option is 
used, meaning that conservation status in the Alscript file will be identified 
by use of shading in addition to font changes.

4:	amas c seq.aln intra.pt groups.sg 8 10

As example 3 but colour is used in place of shading.   Shading and colour 
cannot be used together.   Colour dominates if both s and c are chosen.

5:      amas f seq.aln intra.pt groups.sg 8 10 example1

As example 1 but f option is invoked.   The file rootname ``example1'' will be 
applied to the output filenames and will be used as the title of the Alscript 
plot.

6:	amas g1s seq.aln intra.pt groups.sg 8 10

In this example the ``ignore gaps'' option is requested with a value of 1.   
Shading is also requested for the Alscript output.   One gap per subgroup or 
pair of sub-groups will be ignored in the calculation of conservation scores.

7:      amas a14s seq.aln intra.pt groups.sg 8 10

Similarly, in this example the a option is chosen with a value of 14.   Up to 
14% of the residues appearing in a subgroup or pair of subgroups at a position 
may be ignored in the calculation of the conservation scores. 

8:	amas a14g1s seq.aln intra.pt groups.sg 8 10

Here, both gaps and atypical residues are ignored.   The gap count is included 
in the calculation of the number of atypical residues being ignored.

9:      amas a14h8fg1s seq.aln intra.pt groups.sg 6 7.5 example

As 8.   The ``h'' option has been invoked with a value of 8.   The pair 
conservation scores will be replaced by a conservation histogram of maximum 
bar height 8 characters, as described above.

A:	amas a14vg1s seq.aln intra.pt groups.sg 8 10

Neither the pair conservation scores nor the histogram is produced when the v 
option is used.

B:	amas ma14ving1ps seq.aln intra.pt groups.sg 3 10

The sequence identifiers are used in the summary file to identify the sequences
in each group in place of the sequence numbers from the original alignment 
file when the option ``i'' is used.   The options m,n and p are explained 
above.   The order of appearance of options in the commandline is unimportant.

C:	amas nsh10m charge.aln ch.pt ch.sg 2 10	

This commandline demonstrates the use of the charge property index to find 
incidences of charge change between different subgroups of an alignment.   The 
alignment charge.aln has been split into the groups described in ch.sg.   The 
charge index ch.pt is used to find groups which conserve both (2) the property 
of being charged and the sign of that charge.   Only these groups are displayed
in the Alscript output due to the use of the ``m'' option (shaded - ``s'').    
Differences in charge are clearly identified by bars in the difference 
histogram. 




References
**********

1. Livingstone C.D. and Barton G.J. (1993)
              Protein Sequence Alignments: A Strategy for the Hierarchical 
	      Analysis of Residue Conservation
              CABIOS Vol. 9 No. 6 (745-756)


2. Barton G.J. (1993) 
	      Prot. Eng. [6], 37-40, (Protocols) Alscript: A Tool to Format 
	      Multiple Sequence Alignments.

3. Barton G.J. (1990)
	      Methods Enzymol. [183], 403-428, Protein Multiple Sequence 
	      Alignment and Flexible Pattern Matching.

4. Zvelebil M.J.J.M., Barton G.J., Taylor W.R. and Sternberg M.J.E. (1987)
	      J. Mol. Biol. [195], 957-961, Prediction of Protein Secondary 
	      Structure and Active Sites using the Alignment of Homologous 
	      Sequences.



Known Bugs
**********

1. The program does not check fully if an sg file matches the alignment on which
   it is used.   If the sg file is not correct, the program has a nasty habit of
   producing the most awful segmentation errors.   Watch it!

   The next version of amas will have simplified command line usage and a much
   more even coverage of error messages.