ALSCRIPT takes a multiple sequence alignment in the simple AMPS
[Barton \& Sternberg, 1987][Barton, 1990] block-file format and a set of formatting
commands, and produces a PostScript file that may be printed on a
PostScript laser printer or viewed using a PostScript previewer (e.g.
Sun Microsystems' PageView program). GCG ``MSF'' format files and
CLUSTAL format files (PIR) are also supported. ALSCRIPT is strictly a
formatting, display and annotation tool. It is not a program for
multiple sequence alignment or editing. The previous programs that
offer the closest functionality to ALSCRIPT are PRETTYPLOT and
PRETTYBOX as supplied with the GCG package [Devereux et al., 1984], and the latest
colour version of SOMAP [Parry-Smith \& Attwood, 1991]. Whilst these programs do not
provide the same degree of flexibility in display as offered by
ALSCRIPT, they do allow the calculation of consensus sequences and
automatic shading/boxing according to defined rules. The aim of
ALSCRIPT is to allow the user total control over the display of the
sequence alignment; such control is essential if non-sequence
information is to be used to highlight features of the sequence, for
example the location of active-site residues, the positions of
secondary structures ($\alpha$-helices and $\beta$-strands), and domain or
intron/exon boundaries. The flexibility of ALSCRIPT also permits it to be
used as a ``front-end'' display tool for programs that offer
sophisticated alignment analysis features. For example, the AMAS
(Analysis of Multiply Aligned Sequences; Livingstone \& Barton, in
preparation) system, which highlights structurally important regions of
a multiple alignment, generates ALSCRIPT commands as one output
option. Similarly, the consensus algorithms encoded in the GCG program PRETTY
could be used to generate ALSCRIPT shading, boxing or font-changing
commands for graphical output.
Given a block-file and text pointsize, ALSCRIPT calculates how many residues can be fitted across the page and how many sequences will fit down the page; it then prints the alignment at the chosen pointsize on as many pages as are needed. Running ALSCRIPT with a smaller or larger pointsize will automatically re-scale the alignment to fit on fewer or more pages as appropriate. The actual page dimensions may be re-set to any value, so if an A3 PostScript printer or typesetting machine is available, alignments can readily be scaled to make best use of the extra space.
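The page-fitting calculation described above can be sketched as follows. This is only an illustration, not the ALSCRIPT source: the function name, the default page size (A4 in PostScript points), the space reserved for identifiers and position numbers, and the assumption that each character cell is one pointsize wide and high are all hypothetical.

```python
import math

def page_layout(n_seqs, aln_len, pointsize,
                page_w=595.0, page_h=842.0,   # assumed A4 page, in points
                id_width=60.0,                # assumed room for identifiers
                top_height=40.0):             # assumed room for numbering
    """Return (residues per page, sequences per page, total pages)."""
    cell = float(pointsize)                   # assumed square character cell
    res_per_page = int((page_w - id_width) // cell)
    seqs_per_page = int((page_h - top_height) // cell)
    if res_per_page < 1 or seqs_per_page < 1:
        raise ValueError("pointsize too large for the page")
    pages_across = math.ceil(aln_len / res_per_page)
    pages_down = math.ceil(n_seqs / seqs_per_page)
    return res_per_page, seqs_per_page, pages_across * pages_down

# A smaller pointsize fits more residues per page, so fewer pages are needed.
print(page_layout(20, 300, 10))
print(page_layout(20, 300, 5))
```

Changing `page_w` and `page_h` in this sketch corresponds to re-setting the page dimensions, for example for an A3 printer.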
Each output page has three regions. The left-hand edge contains identifying text for each sequence, the main part of the page holds the alignment, and the top part holds the position numbers and optional tick marks. ALSCRIPT commands make use of a character coordinate system for font changes and other formatting commands. Thus, any residue in the alignment may be referred to by its sequence position number (x-axis) and sequence number (y-axis); ranges of residue positions or sequences may also be defined in the character coordinate system.
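A minimal sketch of such a character coordinate system is given below. It assumes that both axes are numbered from 1 and that the alignment is held as a list of equal-length strings; the function names are hypothetical and are not ALSCRIPT commands.

```python
def residue_at(alignment, pos, seq):
    """Residue at position `pos` (x-axis) of sequence `seq` (y-axis)."""
    return alignment[seq - 1][pos - 1]        # assumed 1-based coordinates

def cells_in_range(x1, y1, x2, y2):
    """All (position, sequence) cells in an inclusive rectangular range."""
    return [(x, y) for y in range(y1, y2 + 1) for x in range(x1, x2 + 1)]

def select_residues(alignment, chars, x1, y1, x2, y2):
    """Cells in the range whose residue is one of `chars`."""
    return [(x, y) for (x, y) in cells_in_range(x1, y1, x2, y2)
            if residue_at(alignment, x, y) in chars]

alignment = ["MKVC", "MRVC"]
print(residue_at(alignment, 2, 2))
print(select_residues(alignment, "C", 1, 1, 4, 2))
```

A formatting operation such as a font change or shading would then be applied to each cell returned for the chosen range.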
The basic ALSCRIPT commands allow the following functionality:
Fonts: Any PostScript font at any size may be defined and used on individual residues, regions or identifier codes.
Boxing: Simple rectangular boxes may be drawn around any part of the alignment. Particular residue types may be selected and automatically ``surrounded'' by lines. For example, if the characters ``G'' and ``P'' are selected, then lines will not be drawn between G and P characters, but only where G and P border with other characters.
Shading: Grey shading of any level from black to white may be applied to any region of the alignment, either as a rectangular region or as residue-specific shading, e.g. to ``shade all Cys residues between positions 6 and 30''.
Text: Specific text strings may be added to the alignment at any position and in any font or font size.
Lines: Horizontal or vertical lines may be drawn to the left, right, top or bottom of any residue position or group of positions.
Defaults: All defaults, e.g. maximum number/length of sequences, character spacing etc., may be modified using ALSCRIPT commands without the need to recompile the program.
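The ``surround'' rule described under Boxing can be illustrated with the sketch below. It is one possible reading of the rule rather than the ALSCRIPT implementation: the function name and the 1-based coordinates are hypothetical, and it assumes that the edge of the alignment or of the chosen range counts as a border.

```python
def surround_edges(alignment, chars, x1=1, y1=1, x2=None, y2=None):
    """Return (position, sequence, side) for every line to be drawn."""
    x2 = len(alignment[0]) if x2 is None else x2
    y2 = len(alignment) if y2 is None else y2

    def selected(x, y):
        # Cells outside the range are treated as unselected (assumption).
        return (x1 <= x <= x2 and y1 <= y <= y2
                and alignment[y - 1][x - 1] in chars)

    neighbours = {"left": (-1, 0), "right": (1, 0),
                  "top": (0, -1), "bottom": (0, 1)}
    edges = []
    for y in range(y1, y2 + 1):
        for x in range(x1, x2 + 1):
            if not selected(x, y):
                continue
            for side, (dx, dy) in neighbours.items():
                # Draw a line only where a selected residue meets an
                # unselected one; adjacent G and P cells share no line.
                if not selected(x + dx, y + dy):
                    edges.append((x, y, side))
    return edges

for edge in surround_edges(["AGPA", "AGGA"], "GP"):
    print(edge)
```

In this small example the four G and P cells form a single block, so lines are produced only around its outline and none between the adjacent G and P characters.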
Figure 1a illustrates a small section of a multiple sequence alignment in AMPS block-file format. The sequence identifier codes stored above the alignment have been deleted for brevity. In this example, the block-file also contains character-based histograms representing the prediction of helix, strand and turn. Figure 1c illustrates the result of running ALSCRIPT on this file using the commands shown in Figure 1b. It is not suggested that this combination of options gives the clearest representation of the data; rather, the options have been chosen to illustrate many of the capabilities of the program.
Although written with the aim of producing figures for publication, ALSCRIPT is a useful research tool for interpreting multiple sequence alignments. For example, the boxing, shading and font changing facilities can be applied to highlight amino acids of a particular type and thus draw attention to clusters of positive or negative charge, hydrophobicity, and so on. Furthermore, computer programs for the automatic analysis of alignments can be made to produce ALSCRIPT formatting commands and a block file, thus simplifying the task of generating graphical representations of such analyses.