------------------------------------------------------------------------------- AMAS - Analysis of Multiply Aligned Sequences (c) University of Oxford (UK.) 1992 C.D.Livingstone* & G.J.Barton Laboratory of Molecular Biophysics The Rex Richards Building South Parks Road Oxford OX1 3QU UK Tel: (+44) 865 275368 Fax: (+44) 865 510454 CRAIG@BIOP.OX.AC.UK GEOFF@BIOP.OX.AC.UK or cdl@bioch.ox.ac.uk gjb@bioch.ox.ac.uk *Queries regarding AMAS should be directed to CDL, e-mail queries are more likely to be promptly dealt with. Please cite: Livingstone C.D. and Barton G.J. (1993) Protein Sequence Alignments: A Strategy for the Hierarchical Analysis of Residue Conservation CABIOS Vol. 9 No. 6 (745-756) *************************************** * AMAS User Guide * January 15th 1993 * *************************************** Contents: The AMAS Commandline. A Brief Description. Using AMAS. Related Programs. System Availability. Conclusion. Update History. Appendices. Files Included. The AMAS Commandline: -------------------- Requirements: AMAS requires the following in order to proceed... (see later text for details) Alignment blocfile: A multiple sequence alignment in AMPS blocfile vertical format (). See appendices for description. Property index: A table identifying each amino acid in the blocfile in terms of its possession or lack of one or more properties in binary format (). See appendices for description. Sensible groups file: File defining sub-groups of aligned sequences within the main alignment (). See appendices for description. Conservation threshold: A number describing, in terms of the property index, how similar a set of amino acids must be before it is considered to show conservation. See later text for a full explanation. Options: These may be included if required: Fontsize: Size in units of points (0.0148 inches) of the largest typeface in the alignment (determines the dimensions of the "pretty" output format. 
File rootname: Name by which all of the output files will be called, ie. the rootname "file" would lead to the production of a file series "file.blc", "file.soa", "file.sum" and "file.ps" in standard AMAS usage.

Options: A number of options exist to modify the standard output of the AMAS program. See appendices for description. If no options are set a ``-'' must be substituted in this field.

The command line takes the form:

amas [options:a*cdefg*h*imnoprstuv][size] [file rootname]

***For EXAMPLE: ***

1. Change directory to folder "first.light"
2. Enter the command line:
   amas sf seq.aln intra.pt groups.sg 8 10 amasout
3. Output will be sent to the files: amasout.blc amasout.soa amasout.sum amasout.ps

A number of further example command lines for the analysis of included alignments are given in Appendix 3 with full explanations of their results. See below for further explanation.

A Brief Description
-------------------
AMAS is a program which performs a systematic characterisation of the physico-chemical properties seen at each position in a multiple protein sequence alignment. A flexible set-based description of amino acid properties is used to define the conservation between any group of amino acids. Sequences in the alignment are gathered into sub-groups on the basis of sequence similarity, functional similarity, or other criteria. All pairs of sub-groups are then compared to highlight positions that confer the unique features of each sub-group. Graphical output using the Alscript program (Barton, 1993) is available for rapid interpretation, while a more complex and detailed text summary is available for more abstruse queries.

Setting Up AMAS:
---------------
Most of the installation process is straightforward. Choose a location for the distributed folders and put them there! A makefile is included for Sun Microsystems and Silicon Graphics machines; enter the c folder and use make. A discrip.mms is shipped with the VMS version.
Enter the [.amas.c] directory and type "mms". Unix: ++++ Two environment variables must be defined before using AMAS. One points to the location of the default AMAS.DFS defaults file and the other points to the location of the directory containing the property indices, ie. setenv AMASDEFAULTS ~cdl/c/amas/distribute/AMAS.DFS setenv AMASPTYPE ~cdl/c/amas/distribute/ptype You could put these in your .cshrc. Change the environment variables to point to your own property table directories/defaults files as necessary. UPDATES: ======= Version 1.65 and greater The AMAS sequence analysis program now requires 3 environment variables to be set before use: AMASDEFAULTS AMASPTYPE ALSCRIPTCOMMAND <----- the new one suggested values: /home/orac/cdl/c/amas/distribute/AMAS.DFS /home/orac/cdl/c/amas/distribute/ptype "nice -19 alscript -f" respectively. VAX/VMS(Alpha): ++++++++++++++ Include equivalents for the following lines in your login.com: $! set up amas $! $ amas :== $ $c:[craig.amas.c]amas $ AMASPTYPE == "$c:[craig.amas.ptype]" $ AMASDEFAULTS == "$c:[craig.amas]amas.dfs" $! $! $! set up alscript $! $ass $b:[geoff.alscript] alsdir $alscript :== "run alsdir:alscript.exe" $msf2blc :== "run alsdir:msf2blc.exe" $clus2blc :== "run alsdir:clus2blc.exe" $alsnum == "$lmb$dua3:[geoff.alscript]alsnum.exe" $! $! Using AMAS: ---------- The first requirement of AMAS is a sequence alignment in vertical format (see Appendix 1.1). This type of alignment may be created by the AMPS package of Barton (1990) or existing alignments in other standard formats may be converted using one of Alscript's accompanying programs (Barton, 1993). The starting point for hierarchical conservation analysis is the identification of two or more sub-sets of sequences within a multiple sequence alignment. The subsets may be defined by grouping on the basis of overall sequence similarity, by functional similarity, origin, or other criteria. 
Given such groupings, the aim is to highlight which residue positions define the unique properties of each group.

*******************************************************************************

 A (alignment fragment)

       | No. | Sequence   |
       |-----|------------|
       |  1  | GTLYSILDIQ |
       |  2  | GTAYPVMEVY |
    A  |  3  | GTAYLILDLA |
       |  4  | GTTYFILDLR |
       |-----|------------|
       |  5  | GTDTCVMELE |
       |  6  | GGDTCCLDLA |
    B  |  7  | GVDSVIMEFL |
       |  8  | GGDSCLIDMS |
       |-----|------------|
       |  9  | GTQYGLKRFV |
    C  | 10  | GTQFAIMQP  |
       |-----|------------|

 B (dendrogram)

    [Dendrogram of sequences 1-10 from single linkage cluster analysis;
     horizontal axis: 25 50 75 100 (%ID)]

FIGURE 1
********
*******************************************************************************

For example, Figure 1 shows a fragment of a multiple sequence alignment of 10 sequences (A) in which the percentage identity between the sequences has been determined by single linkage cluster analysis, the results of which are displayed using an unrooted dendrogram (B). This analysis clearly shows the sequences to be split into (2 or) 3 clusters or sub-groups at around 30% identity. If this grouping is consistent with other observations about the family of proteins this would be a suitable sub-grouping scheme for an AMAS analysis. Many sequence alignment packages (including the AMPS program ORDER) will perform this type of analysis. Once identified, the groups of sequences are recorded in an "sg" file for use by the AMAS program (see appendix 1.2).

A property table appropriate to the alignment and to the question being posed is then chosen. Property tables may be easily redefined by the user to answer specific questions (see appendix 1.3).

An appropriate conservation threshold is chosen for the analysis.
Useful values for the 10 property matrices intra.pt and extra.pt (included) lie between 6 and 8 depending on the overall similarity of all of the sequences in the alignment. If the overall sequence similarity is low, a less stringent threshold is used, eg. 6 (six or more shared properties out of 10 define a set of amino acids as conserved). With similar sequences a more stringent value, eg. 8, is chosen. A pointsize (0.0148 inches = 1 point) must be defined for the default height of each character in the Alscript output (10 points is about the height of text in a paperback novel). Alscript output is selected by default but may be turned-off by using the t(T) option. With these criteria satisfied, a simple command line can be constructed in order to execute AMAS: amas - seq.aln intra.pt groups.sg 6 10 example 1a or amas t seq.aln intra.pt groups.sg 6 example 1b These would perform an analysis of the sequence alignment ``seq.aln'' on the basis of the property index ``intra.pt'' using the subgroupings of the sequences in the alignment defined in ``groups.sg'' with a conservation threshold of 6. The - indicates no additional program options have been set. The files to perform this analysis are included. The output files will take the names seq.aln... eg. seq.aln.soa command file for program Alscript seq.aln.blc new alignment blocfile containing sequences in new, sub-grouped order seq.aln.sum text summary of conservation in the alignment seq.aln.ps Alscript output (PostScript file - can be printed directly on a PostScript printer or previewed with an appropriate PostScript previewer, eg. Sun Microsystems' pageview software) AMAS automatically launches Alscript, if installed, producing the .ps file automatically from the .soa file. The .soa file may be edited by experienced Alscript users in order to introduce extra information. The .blc file is required by the .soa file for operation. 
The .sum file contains a summary of conservation as shown in the output of example 1 (see example.sum for comparison, containing the results of the analysis in Appendix 3, example 9). The summary file records the sequences in each sub-group, the number of gaps and atypical residues ignored in an analysis, if any (see below), and the conservation threshold set by the user. The file lists all positions which show identities across all the sequences from the alignment included in the analysis, all pairs of subgroups showing identities and all sub-groups containing only one type of residue. Positions conserving properties across all analysed sequences are detailed next, showing the properties conserved in all subgroups and the positively conserved properties which differ between subgroups. The percentage of sequences in each subgroup possessing a "different" property is reported. Conservation and difference between pairs of conserved subgroups at a position is reported in a similar manner. Conserved pairs share at least the number of properties defined by the threshold, while different pairs are themselves conserved, but which share fewer properties than the threshold between them. Positions where individual groups conserve properties are reported next, followed by a list of unconserved groups and completely unconserved positions (by the threshold criterion). The Alscript output is very flexible. Output is in the form of a PostScript file which may be printed on any common PostScript printer or visualised using a PostScript previewer (eg. PageView V3, sun microsystems). A well formatted copy of the sequences analysed in the program is shown in the form of an alignment split into the groups defined by the "sg" file. 
By default, each sequence position is presented in a font which indicates whether it is identical across all analysed sequences (boldface white on black), an identity in one group (Boldface), conserved in one group (plain), is unconserved in one group (smaller italics) or is unconserved across all sequences (smaller, different font). Conserved and identical regions within subgroups are boxed by the program. These distinctions may be made more clear using shading or colouring schemes. A table may be presented at the foot of the alignment detailing the degree of conservation between pairs of conserved sequences, the upper section showing similarities, and the lower showing differences, or this information may be presented in the form of a histogram (see the ``h'' option described in Appx. 2). The title of the alignment is, by default, the title of the original alignment file, or the file rootname supplied when using the ``f'' option (Appx. 2). The fonts, colours, shading levels and the title at the base of the alignment may be changed by editing the AMAS.DEFAULTS file included with the program. The histogram of conservation has been found to coincide well with observed secondary structure and may have predictive value (see SH2_analysis.aln series included). Related Programs: ---------------- The AMPS package (Barton, 1990). This performs multiple sequence alignments and databank scanning. The package is currently only available for Sun, Silicon Graphics and VAX/VMS systems. Contact the author for details. ALSCRIPT (Barton, 1993) allows shading, boxing and colouring to be applied to an alignment in AMPS format. Routines are provided for conversion of other alignment formats. In response to a command file containing a set of formatting commands, ALSCRIPT produces a PostScript file which may be printed on a PostScript laser printer or viewed using a PostScript previewer (e.g. Sun Microsystem's PageView program). 
Alscript is NOT a multiple sequence alignment program, nor is it an alignment editor.

System Availability:
-------------------
AMAS has been used on both Silicon Graphics Indigo and Sun SPARCstation platforms under the UNIX operating system. A Digital ALPHA/VMS version is now available and an IBM PC compatible version is under preparation. AMAS, Alscript and AMPS are freely available to the academic community from the authors under a (fairly) standard licensing agreement. Enquiries from prospective industrial users are welcomed.

Conclusion
----------
Whilst all the features identified by AMAS can be found by inspection of the alignment, the process is laborious and error-prone. The strategy described here reduces the scope for error, allows alternative sub-groupings to be rapidly investigated, and provides structurally relevant shading and boxing. For additional information on the method and the ideas behind it please see Livingstone and Barton (1993).

Update History:
--------------
Version 1.0 - 30th June 1992
Version 1.1 - 3rd July 1992
Version 1.2 - 29th July 1992
Version 1.3 - 27th Oct 1992
Version 1.4 - 13th Jan 1993: Many older subroutines revised for speed and clarity. Colour and mask option introduced. Option dealing with unusual residues added.
Version 1.5 - 10th Feb 1993: Minor bug fixes, changes to accord with new version of Alscript.
Version 1.6 - 1st Apr 1993: Colour bug fixes, pair conservation calculation modified, positive property conservation markings modified, command line options modified, Alscript labelling improved.
Version 1.66 - 1st May 1995
Previous versions of AMAS contain errors in the property table reading routine. Only the first eighty characters of each line are read initially. Subsequent characters result in the production of additional *unintended* property descriptions. Users of old versions should restrict descriptions to less than eighty characters _including_ comments. This version reads descriptions up to 500 characters wide.
The VMS version has NOT been upgraded yet!
Version 1.67 - 24th May 1995
Similar "bugfix" to 1.66. SG files accept a new metacharacter ":" at the end of lines which indicates that the following line is to be considered part of the same group. Lines in SG files retain an 80 character limit.
Version 2.0 - Available sometime...

Documentation modified 03 May 95.
VAX/VMS Version finally implemented 19 Oct 93 (Alphas only)!!!
Documentation modified 30 Nov 93.

APPENDIX 1
++++++++++

Appendix 1.1 Alignment Blocfile () Format:
----------------------------------------------------
Taken from AMPS (a users guide) by Geoff Barton (G.J.B. 1990):

This defines a multiple alignment in vertical format. The print_vertical command produces a file in block_file format. The minimum requirements for a block_file for N aligned sequences are

1. N '>comment line(s)'
2. '* iteration int'
3. 'N or more vertically aligned sequences'
4. '*'

1. The comment lines define the sequence identifiers and the number of '>' characters preceding the first '* iteration int' line define the number of sequences that are defined in the sequence lines.
2. This line specifies the beginning of the alignment to be read. The '*' character specifies the column in which the alignment begins. The 'iteration int' specifier identifies the particular alignment within this block_file. Several alignments may follow each other providing they are identified by a different iteration number (eg. 1,2,3).
3. The alignment is ended by a '*' character which should be in the same column as the '*' character that started the alignment.

Simple example:
This is a block file containing an alignment of three sequences. The comments that I am writing here may appear in the block file, but are ignored by MULTALIGN when the file is read. The only proviso is that no 'greater than' or 'star' characters may be present.
Identifier: Title: >first this is sequence A >second this is sequence B >third this is sequence C * iteration 1 a a p avg llg lcr g pg www s *

Appendix 1.2 Sensible Groups (sgfile) File Format:
-------------------------------------------------
Sub-group information is stored in an "sg" (sensible groups) file for use in AMAS. Groups are defined by their sequence numbers, ie. the order in which they appear in the original alignment blocfile. The format of an sg file follows the pattern shown in Figure 1.2a:

*******************************************************************************
! sg file for alignment in Figure 1
!
! Comments may be entered on any line which begins with an exclamation mark.
! These lines may occur at ANY position in the sg file, ie. before any group
! definitions...
!
1-4
!
! between group definitions...
!
5-8
9,10
!
! or at the end.

FIGURE 1.2a
***********
*******************************************************************************

Ranges of sequences may be defined, as may lists of sequences with each sequence separated by a comma. Lists and ranges of sequences need not be in numerical order and may span other groups or contain members of other groups. Ranges and lists may be combined in the same group definition as shown in Figure 1.2b, an sg file for a fictitious alignment:

********************************************************************************
! sg file for imaginary alignment of 29 rhubarbase sequences
!
! first group: sequences 1,2,3 and 4:
!
1-4
!
! second group: sequences 5,7,9,10,11,12,13,17 and 21:
!
5,7,9,10-13,17,21
!
! third group: sequences 6,8,9,14,15,16,18,19,20
!
6,8,9,14-16,18-20
!
! fourth group (contains all sequences, including those called already):
1-29
!
! end of sg file

FIGURE 1.2b
***********
*******************************************************************************

Appendix 1.3 Property Index (ptype) File Format:
-----------------------------------------------
Property indices define each amino acid in an alignment in terms of a set of user defined properties (see Figure 1.3a). The file contains three main fields: the single letter amino acid codes for each of the amino acids in the alignment, a binary matrix showing property set membership (0 = non-member, 1 = member), and a list of property names. Comments may be added to the right of the colons defining the end of each property name. Lines beginning with exclamation marks may be used for comments with the exception of the line containing the three asterisks and the single letter amino acid codes which MUST occur before the matrix.

The first asterisk occurs in column five and defines the location of the first single letter code. The column containing the first single letter amino acid code is also the first column of the property set membership matrix. The second asterisk occurs in the column following the last single letter code. The column containing the last single letter code is also the last column of the matrix. The third asterisk defines the column in which the first letter of each property name occurs.

Each entry in the property set membership matrix defines the property membership of the amino acid in whose column it occurs with respect to the property in whose row it is placed. The property names are given to the right of the row of matrix entries to which they refer. 14 characters are allowed for each property name entry. Longer names are allowed but these will disrupt the text summary output. Names may contain any alphanumeric characters and punctuation but must not contain spaces.

Numbers at the left of the property matrix are disregarded but provide a useful guide to the property numbering system used in the program when debugging.
The numbers may therefore be omitted, may occur in non-numerical order, or may be replaced by any alphanumeric characters up to and including the fifth column. As many symbols/property types as required may be defined, with the proviso that the total number of characters per line does not exceed 80.

A number of property indices have been included with AMAS. ``intra.pt'' and ``extra.pt'' are general matrices based on the conservation matrix of Zvelebil et al (1987) which may be used to calculate patterns of overall conservation between a set of diverse sequence groups from an alignment. ``intra.pt'' is used with alignments containing proteins which are thought not to contain disulphide bonded cysteines; cysteine is defined as small, tiny, hydrophobic and polar. ``extra.pt'' is used with proteins thought to contain predominantly disulphide bonded cysteines, defining cysteine as small and hydrophobic. ``ch.pt'' provides a means of detecting patterns of conservation of charged groups. Using this matrix and the ignore negative conservation option (n), one can detect changes in conserved charge at a position between different subgroups using a conservation threshold of 2. eg. amas -n seq.aln ch.pt groups.sg 2

An example property matrix is shown in Figure 1.3a:

*******************************************************************************
! Property matrix for proteins containing
! disulphide bonded cysteines.
!
!
    *ILVCAGMFYWHKREQDNSTP BZX* *
!
 1   111111111111000000101001  Hydrophobic  :
 2   000000001111111111101111  Polar        : Extra comments may be placed
 3   001111000000000111111001  Small        : here.
 4   000000000000000000011001  Proline      :
 5   000000000011100000001001  Positive     :
 6   000000000000010100001001  Negative     :
 7   000000000011110100001001  Charged      :
 8   000011000000000001001001  Tiny         :
 9   111000000000000000001001  Aliphatic    :
10   000000011110000000001001  Aromatic     :
!
!
FIGURE 1.3a
*******************************************************************************

Appendix 2
++++++++++
Optional commands are entered as a string composed of single letter codes (listed below) as shown in the example command lines above. If no options are required a '-' must be entered. eg. chg1s and chr10g1s and ch10g2r10 and - are all legitimate option strings.

Optional commands:

a: Ignore _atypical_ or unusual residues. Like g and h, this command is followed by an integer which, in this case, defines the percentage of residues at any position which may be ignored. This count includes any gaps which may be ignored. eg. xxxa10xx (where x's represent other possible optional commands) would lead to up to 10% of the residues in a subgroup or pair of subgroups being ignored in the calculation of conservation values.

c: Alignments are _coloured_ by conservation.
   RED     Identity across all sub-groups
   GREEN   Conserved in one sub-group
   BLUE    Identical in one sub-group
   GREY    Unconserved
   ORANGE  Similarities histogram (optional)
   VIOLET  Differences histogram (optional)

f: Enables a root _filename_ at the end of the command line to be used to name output files from an AMAS run ie. the rootname "file" would lead to the production of a file series "file.blc", "file.soa", "file.sum" and "file.ps" in standard AMAS usage. Can be used with `a'.

g: Number of _gaps_ which may be ignored per sub-group. The number is entered as an integer immediately after the g in the optional commands string. eg. xxxxg2xxx (where x's represent other possible optional commands) would lead to up to two gaps per sub-group being ignored.

h: This option causes a frequency _histogram_ to be produced in place of the conservation number report at the foot of the highlighted alignment. An integer entered after the h determines the maximum height of a bar on the histogram in characters. The histogram is scaled to this height. The default is 10 characters.
A report of the mean of the pair conservation values more than or equal to the conservation threshold is made for each position together with a histogram showing the frequency with which pairs of sub-groups at a position are conserved, the so-called similarity plot. A similar report of the mean of the pair conservation values less than the threshold at each position is also made, together with a histogram showing the frequency with which pairs of sub-groups at a position have dissimilar properties, the so-called difference plot. The similarity plot distinguishes between frequency of identical and conserved pairs by the use of dark and light shading respectively. Using the option h with a value of 0 will lead to the omission of the histograms. A full height bar indicates a frequency of 100%, while an empty bar indicates 0%. eg. xxxh6xx (where x's represent other possible optional commands) would lead to the presentation of the two histograms, each with maximum bar height of 6 characters.

i: Causes the sequence _identifiers_ contained in each group to be listed rather than the sequence numbers in the summary file. This increases the length of the file.

m: All amino acids appearing at unconserved sequence positions within an alignment go unreported (_masked_) in the Alscript output.

n: Positive property conservation only is considered (think '_Not negative_' or '_Negative ignored_')

o: _Other_ histogram option. Default shows frequency with which pairs of sub-groups are conserved (similar) or different by the criteria of the threshold set. With 'o', the mean pairwise conservation at each sequence position is reported, split into scores for similar and different pairs.

p: Output Alscript in _portrait_ mode - default is landscape.

s: _Shading_ is used in addition to font changes to display conservation patterns in Alscript output. Especially useful when large font size or high quality (>300dpi.) output is possible.
   *Background  *Text   Font               Size
   Black        White   Helvetica-Bold     Default  - Identical across all sequences
   Dark Grey    White   Helvetica-Bold     Default  - Identical in one sub-group
   Light Grey   Black   Helvetica          Default  - Conserved in one sub-group
   White        Black   Helvetica-Oblique  0.70     - Unconserved in one sub-group
   White        Black   Optima-Bold        0.65     - Unconserved across all sequences

   * made active by s option, otherwise background is white and text is black.

t: Only the _text_ summary of conservation is provided.

u: Use _user's_ own title from AMAS.DEFAULTS file at the base of the Alscript output. This may be used in conjunction with `f'.

v: Only the highlighted alignment is produced in the Alscript output.

Appendix 3
++++++++++
The results of the following examples are contained in the files 1-C within the aln+sg file.

Example commandlines:

1: amas - seq.aln intra.pt groups.sg 8 10

This example performs the simplest level of analysis on the alignment seq.aln. The property index intra.pt is used because no disulphide bonded cysteines are expected in the alignment. extra.pt would be used if disulphides were suspected. The subgroupings chosen for the alignment are included in groups.sg. The conservation threshold is set to 8 because the sequences are fairly similar. The pointsize for the output is 10 because the alignment is relatively small. The summary output will describe the general conservation of amino acids in the sequence as described above. The Alscript output will distinguish conservation state using only font changes and pair similarity and difference are reported in their simplest form. The Alscript plot will be given the title ``seq.aln''.

2: amas t seq.aln intra.pt groups.sg 8

This example is identical to example 1 with the exception that the t option is used. No Alscript output will be produced, so no pointsize is required.
3: amas s seq.aln intra.pt groups.sg 8 10

This example is identical to example 1 with the exception that the s option is used, meaning that conservation status in the Alscript file will be identified by use of shading in addition to font changes.

4: amas c seq.aln intra.pt groups.sg 8 10

As example 3 but colour is used in place of shading. Shading and colour cannot be used together. Colour dominates if both s and c are chosen.

5: amas f seq.aln intra.pt groups.sg 8 10 example1

As example 1 but f option is invoked. The file rootname ``example1'' will be applied to the output filenames and will be used as the title of the Alscript plot.

6: amas g1s seq.aln intra.pt groups.sg 8 10

In this example the ``ignore gaps'' option is requested with a value of 1. Shading is also requested for the Alscript output. One gap per subgroup or pair of sub-groups will be ignored in the calculation of conservation scores.

7: amas a14s seq.aln intra.pt groups.sg 8 10

Similarly, in this example the a option is chosen with a value of 14. Up to 14% of the residues appearing in a subgroup or pair of subgroups at a position may be ignored in the calculation of the conservation scores.

8: amas a14g1s seq.aln intra.pt groups.sg 8 10

Here, both gaps and atypical residues are ignored. The gap count is included in the calculation of the number of atypical residues being ignored.

9: amas a14h8fg1s seq.aln intra.pt groups.sg 6 7.5 example

As 8. The ``h'' option has been invoked with a value of 8. The pair conservation scores will be replaced by a conservation histogram of maximum bar height 8 characters, as described above.

A: amas a14vg1s seq.aln intra.pt groups.sg 8 10

Neither the pair conservation scores nor the histogram is produced when the v option is used.

B: amas ma14vingp1s seq.aln intra.pt groups.sg 3 10

The sequence identifiers are used in the summary file to identify the sequences in each group in place of the sequence numbers from the original alignment file when the option ``i'' is used.
The options m,n and p are explained above. The order of appearance of options in the commandline is unimportant. C: amas nsh10m charge.aln ch.pt ch.sg 2 10 This commandline demonstrates the use of the charge property index to find incidences of charge change between different subgroups of an alignment. The alignment charge.aln has been split into the groups described in ch.sg. The charge index ch.pt is used to find groups which conserve both (2) the property of being charged and the sign of that charge. Only these groups are displayed in the Alscript output due to the use of the ``m'' option (shaded - ``s''). Differences in charge are clearly identified by bars in the difference histogram. References ********** 1. Livingstone C.D. and Barton G.J. (1993) Protein Sequence Alignments: A Strategy for the Hierarchical Analysis of Residue Conservation CABIOS Vol. 9 No. 6 (745-756) 2. Barton G.J. (1993) Prot. Eng. [6], 37-40, (Protocols) Alscript: A Tool to Format Multiple Sequence Alignments. 3. Barton G.J. (1990) Methods Enzymol. [183], 403-428, Protein Multiple Sequence Alignment and Flexible Pattern Matching. 4. Zvelebil M.J.J.M., Barton G.J., Taylor W.R. and Sternberg M.J.E. (1987) J. Mol. Biol. [195], 957-961, Prediction of Protein Secondary Structure and Active Sites using the Alignment of Homologous Sequences. Known Bugs ********** 1. The program does not check fully if an sg file matches the alignment on which it is used. If the sg file is not correct, the program has a nasty habit of producing the most awful segmentation errors. Watch it! The next version of amas will have simplified command line usage and a much more even coverage of error messages.
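
Since AMAS does not fully check that an sg file matches its alignment (see Known Bugs above), it can be worth checking the pair of files before a run. The following Python script is a minimal sketch of such a check and is NOT part of the AMAS distribution. It assumes the formats described in Appendix 1.1 and 1.2: the number of sequences is taken as the number of lines starting with '>' before the first '*' line, each non-comment line of the sg file defines one group, and a trailing ":" joins the next line to the same group. The function names (expand_group, parse_sg, count_sequences, check_sg) are hypothetical, and treating a reversed range such as 4-1 as 1-4 is an assumption.

```python
def expand_group(spec):
    """Expand one group definition, e.g. '5,7,9,10-13', into sequence numbers."""
    members = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            first, last = (int(x) for x in item.split("-", 1))
            if first > last:  # assumption: a reversed range means the same span
                first, last = last, first
            members.extend(range(first, last + 1))
        else:
            members.append(int(item))
    return members


def parse_sg(sg_text):
    """Return one list of 1-based sequence numbers per group in an sg file."""
    groups = []
    pending = ""  # text carried over from lines ending in ':'
    for raw in sg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # blank line or comment
        if line.endswith(":"):
            # ':' means the following line belongs to the same group
            pending += line[:-1].strip().rstrip(",") + ","
            continue
        groups.append(expand_group(pending + line))
        pending = ""
    if pending:
        groups.append(expand_group(pending))
    return groups


def count_sequences(blocfile_text):
    """Count the '>' identifier lines that precede the first '*' line."""
    count = 0
    for raw in blocfile_text.splitlines():
        line = raw.lstrip()
        if line.startswith("*"):
            break
        if line.startswith(">"):
            count += 1
    return count


def check_sg(sg_text, blocfile_text):
    """Return a list of messages for group members outside the alignment."""
    n = count_sequences(blocfile_text)
    problems = []
    for number, group in enumerate(parse_sg(sg_text), start=1):
        bad = sorted({m for m in group if m < 1 or m > n})
        if bad:
            problems.append(
                f"group {number}: sequence numbers {bad} outside 1-{n}"
            )
    return problems


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 3:
        with open(sys.argv[1]) as blc, open(sys.argv[2]) as sg:
            for message in check_sg(sg.read(), blc.read()):
                print(message)
```

If the script is saved under a name of your choosing, say check_sg.py, it would be run as "python check_sg.py seq.aln groups.sg"; it prints nothing when every group member lies within the alignment. The check covers only sequence numbers, so it does not detect a well-formed sg file that simply describes the wrong grouping.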