
Description of ALSCRIPT

ALSCRIPT takes a multiple sequence alignment in the simple AMPS [Barton & Sternberg, 1987][Barton, 1990] block-file format and a set of formatting commands, and produces a PostScript file that may be printed on a PostScript laser printer or viewed using a PostScript previewer (e.g. Sun Microsystems' PageView program). GCG "MSF" format files and CLUSTAL format files (PIR) are also supported. ALSCRIPT is strictly a formatting, display and annotation tool. It is not a program for multiple sequence alignment or editing.

The existing programs closest in functionality to ALSCRIPT are PRETTYPLOT and PRETTYBOX, as supplied with the GCG package [Devereux et al., 1984], and the latest colour version of SOMAP [Parry-Smith & Attwood, 1991]. Whilst these programs do not provide the same degree of flexibility in display as ALSCRIPT, they do allow the calculation of consensus sequences and automatic shading/boxing according to defined rules. The aim of ALSCRIPT is to allow the user total control over the display of the sequence alignment; such control is essential if non-sequence information is to be used to highlight features of the sequence, for example the location of active-site residues, the positions of secondary structures (α-helices and β-strands), and domain or intron/exon boundaries.

The flexibility of ALSCRIPT also permits it to be used as a "front-end" display tool for programs that offer sophisticated alignment analysis features. For example, the AMAS system (Analysis of Multiply Aligned Sequences; Livingstone & Barton, in preparation), which highlights structurally important regions of a multiple alignment, generates ALSCRIPT commands as one output option. Similarly, the consensus algorithms encoded in the GCG program PRETTY could be used to generate ALSCRIPT shading, boxing or font-changing commands for graphical output.

Given a block-file and text pointsize, ALSCRIPT calculates how many residues can be fitted across the page and how many sequences will fit down the page. It then prints the alignment at the chosen pointsize on as many pages as are needed. Running ALSCRIPT with a smaller or larger pointsize will automatically re-scale the alignment to fit on fewer or more pages as appropriate. The actual page dimensions may be re-set to any value, so if an A3 PostScript printer or typesetting machine is available, alignments can readily be scaled to make best use of the extra space.
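The page calculation described above can be sketched in a few lines of Python. This is a simplified model and not ALSCRIPT's actual code: the function name `page_layout`, the margins, the widths reserved for identifiers and position numbers, and the assumption that each residue occupies a square cell whose side equals the pointsize are all illustrative choices. Page sizes are in PostScript points (1/72 inch), with A4 taken as roughly 595 x 842.

```python
import math


def page_layout(n_seqs, aln_len, pointsize,
                page_width=595.0, page_height=842.0,
                id_width=72.0, top_height=36.0, margin=36.0):
    """Return (residues across, sequences down, pages needed).

    Hypothetical model: every residue sits in a square cell of side
    `pointsize`; `id_width` is reserved on the left for identifiers and
    `top_height` at the top for position numbers and tick marks.
    """
    cell = float(pointsize)
    usable_w = page_width - 2 * margin - id_width
    usable_h = page_height - 2 * margin - top_height
    across = int(usable_w // cell)  # residues that fit across one page
    down = int(usable_h // cell)    # sequences that fit down one page
    if across < 1 or down < 1:
        raise ValueError("pointsize too large for this page size")
    # Assumption: the alignment is tiled, one tile per page.
    pages = math.ceil(aln_len / across) * math.ceil(n_seqs / down)
    return across, down, pages


if __name__ == "__main__":
    print(page_layout(10, 200, 10))   # A4 at 10 point
    print(page_layout(10, 200, 5))    # same alignment at 5 point
    print(page_layout(10, 200, 10, page_width=842.0, page_height=1191.0))  # approx. A3
```

Under these assumptions, halving the pointsize or moving to a larger page increases the number of residues per row, and the page count falls accordingly.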

Each output page has three regions. The left-hand edge contains identifying text for each sequence, the main part of the page holds the alignment, and the top part holds the position numbers and optional tick marks. ALSCRIPT commands make use of a character coordinate system for font changes and other formatting commands. Thus, any residue in the alignment may be referred to by its sequence position number (x-axis) and sequence number (y-axis). Ranges of residue positions or of sequences may also be defined in the character coordinate system.
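As an illustration of the character coordinate system, the following Python sketch maps a residue address (position, sequence) to a location on a single page, and tests whether a residue falls inside a rectangular range. The names `cell_origin` and `in_region`, the 1-based numbering, the `left` and `top` offsets and the square cell size are assumptions made for this example only. The one fixed fact relied on is that the default PostScript origin is at the lower left of the page with y increasing upwards, so the first sequence is placed highest.

```python
def cell_origin(position, sequence, pointsize, left=108.0, top=770.0):
    """Lower-left corner, in PostScript points, of one residue cell.

    `position` is the alignment column (x-axis) and `sequence` the row
    (y-axis), both counted from 1 (an assumption of this sketch).
    `left` and `top` are hypothetical offsets of the alignment area.
    """
    if position < 1 or sequence < 1:
        raise ValueError("position and sequence are counted from 1")
    x = left + (position - 1) * pointsize
    y = top - sequence * pointsize  # sequence 1 is the top row
    return x, y


def in_region(position, sequence, region):
    """True if the residue lies in region = (x1, y1, x2, y2), inclusive."""
    x1, y1, x2, y2 = region
    return x1 <= position <= x2 and y1 <= sequence <= y2


if __name__ == "__main__":
    print(cell_origin(6, 3, 10))
    print(in_region(6, 3, (1, 1, 30, 5)))
```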

The basic ALSCRIPT commands allow the following functionality:

Fonts: Any PostScript font at any size may be defined and used on individual residues, regions or identifier codes.

Boxing: Simple rectangular boxes may be drawn around any part of the alignment. Particular residue types may be selected and automatically "surrounded" by lines. For example, if the characters "G" and "P" are selected, then lines will not be drawn between G and P characters, but only where G and P border with other characters.

Shading: Grey shading of any level from black to white may be applied to any region of the alignment, either as a rectangular region or as residue-specific shading, e.g. "shade all Cys residues between positions 6 and 30".

Text: Specific text strings may be added to the alignment at any position and in any font or font size.

Lines: Horizontal or vertical lines may be drawn to the left, right, top or bottom of any residue position or group of positions.

Defaults: All defaults, e.g. maximum number/length of sequences, character spacing etc., may be modified using ALSCRIPT commands without the need to recompile the program.
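The "surround" behaviour described under Boxing and the residue-specific selection described under Shading can be made concrete with a short Python sketch. It illustrates the logic only and is not ALSCRIPT's implementation or command syntax: the function names `surround_edges` and `shade_cells`, the representation of the alignment as a list of equal-length strings, and the 1-based (position, sequence) addresses are all assumptions of this example.

```python
def surround_edges(alignment, chars):
    """Return the cell edges to draw as a set of (position, sequence, side).

    An edge is drawn on a selected residue only where its neighbour on
    that side is not selected (or lies outside the alignment), so no
    line appears between two adjacent selected residues.
    """
    selected = set(chars)
    n_seq = len(alignment)

    def is_selected(pos, seq):
        if seq < 1 or seq > n_seq:
            return False
        row = alignment[seq - 1]
        if pos < 1 or pos > len(row):
            return False
        return row[pos - 1] in selected

    neighbours = {"left": (-1, 0), "right": (1, 0),
                  "top": (0, -1), "bottom": (0, 1)}
    edges = set()
    for seq in range(1, n_seq + 1):
        for pos in range(1, len(alignment[seq - 1]) + 1):
            if not is_selected(pos, seq):
                continue
            for side, (dx, dy) in neighbours.items():
                if not is_selected(pos + dx, seq + dy):
                    edges.add((pos, seq, side))
    return edges


def shade_cells(alignment, chars, first_pos, last_pos):
    """List the (position, sequence) cells holding any of `chars`
    between columns first_pos and last_pos inclusive."""
    selected = set(chars)
    return [(pos, seq)
            for seq, row in enumerate(alignment, start=1)
            for pos, ch in enumerate(row, start=1)
            if first_pos <= pos <= last_pos and ch in selected]


if __name__ == "__main__":
    toy = ["AGPA",
           "AGAA"]
    for edge in sorted(surround_edges(toy, "GP")):
        print(edge)
    print(shade_cells(toy, "G", 1, 4))
```

In the toy alignment, the G and P in the first sequence share no line between them, and neither do the two vertically adjacent G residues. A request such as "shade all Cys residues between positions 6 and 30" would correspond to `shade_cells(alignment, "C", 6, 30)` in this sketch.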

Figure 1a illustrates a small section of a multiple sequence alignment in AMPS block-file format. The sequence identifier codes stored above the alignment have been deleted for brevity. In this example, the block-file also contains character-based histograms representing the prediction of helix, strand and turn. Figure 1c illustrates the result of running ALSCRIPT on this file using the commands shown in Figure 1b. It is not suggested that this combination of options gives the clearest representation of the data; rather, the options have been chosen to illustrate many of the capabilities of the program.

Although written with the aim of producing figures for publication, ALSCRIPT is a useful research tool for interpreting multiple sequence alignments. For example, the boxing, shading and font changing facilities can be applied to highlight amino acids of a particular type and thus draw attention to clusters of positive or negative charge, hydrophobicity, and so on. Furthermore, computer programs for the automatic analysis of alignments can be made to produce ALSCRIPT formatting commands and a block file, thus simplifying the task of generating graphical representations of such analyses.





gjb@bioch.ox.ac.uk