Introduction

This section describes how to use AMPS to perform flexible pattern matching by the method of Barton and Sternberg (1990). This approach can help identify weak similarities between proteins that conventional sequence comparison methods would miss. AMPS can also scan the database with a multiple sequence alignment using the Needleman-Wunsch algorithm, but this is normally less effective than deriving a pattern and scanning with that.

PLEASE NOTE: You should be familiar with the AMPS alignment features described above before attempting flexible pattern matching.

AMPS allows the following operations to be performed:

  1. Definition of a pattern representative of a particular protein fold, including an explicit description of the allowed flexibility in gap length between defined regions.

  2. A variety of scoring systems for each element of the pattern, e.g. based on frequency weights, Dayhoff's matrix, conservation, or fully user-defined weights.

  3. Scanning of the pattern against a database of protein sequences, with subsequent rank ordering and display of the results of the scan.

  4. Detailed analysis of a single sequence for the presence of multiple occurrences of the pattern, and calculation of the significance of the best matching pattern by reference to randomized sequences.
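
To make the idea of a pattern with flexible gaps concrete, the following Python sketch scores a toy pattern against a sequence. It is not AMPS code and does not reproduce the Barton and Sternberg algorithm in detail: the pattern layout (`ELEMENTS`, `GAPS`), the `MISMATCH` penalty and the function names are all assumptions chosen for illustration. Each element is an ungapped run of positions with its own residue weights, and each gap is given as an allowed (minimum, maximum) spacing between consecutive elements.

```python
from functools import lru_cache

# Hypothetical toy pattern: each element is a list of per-position weights
# (residue -> score); GAPS[i] is the (min, max) spacing allowed between
# element i and element i + 1.
ELEMENTS = [
    [{"C": 2.0}, {"A": 1.0, "G": 1.0}],   # element 0, two positions
    [{"H": 2.0}, {"K": 1.0, "R": 1.0}],   # element 1, two positions
]
GAPS = [(1, 3)]
MISMATCH = -1.0  # assumed score for a residue not listed at a position


def element_score(element, seq, start):
    """Score one ungapped element placed at seq[start:]."""
    return sum(w.get(seq[start + k], MISMATCH) for k, w in enumerate(element))


def best_match(seq, elements=ELEMENTS, gaps=GAPS):
    """Return (score, element start positions) for the best placement of
    all elements in seq, or None if the pattern cannot fit."""

    @lru_cache(maxsize=None)
    def place(i, start):
        # Best result for elements i, i+1, ... with element i at `start`.
        element = elements[i]
        end = start + len(element)
        if end > len(seq):
            return None
        here = element_score(element, seq, start)
        if i == len(elements) - 1:
            return here, (start,)
        lo, hi = gaps[i]
        best = None
        for gap in range(lo, hi + 1):  # try every allowed gap length
            rest = place(i + 1, end + gap)
            if rest is not None and (best is None or rest[0] > best[0]):
                best = rest
        if best is None:
            return None
        return here + best[0], (start,) + best[1]

    candidates = [r for s in range(len(seq)) if (r := place(0, s)) is not None]
    return max(candidates, key=lambda r: r[0]) if candidates else None
```

For example, `best_match("MMCAXXHKMM")` places the two elements at zero-based positions 2 and 6, that is, with a spacing of two residues between them. Swapping the weight dictionaries for values taken from a substitution matrix or from observed residue frequencies would correspond to choosing a different scoring system for an element.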

These points will be illustrated by reference to examples.
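
As a rough picture of what "significance by reference to randomized sequences" can mean, the Python sketch below shuffles a sequence many times, rescores each shuffled copy, and expresses the observed score as a number of standard deviations above the random mean. This section does not state which statistic AMPS reports, so the z-score used here is an assumption, and `toy_score` is a hypothetical stand-in for a real pattern score.

```python
import random
import statistics


def toy_score(seq, motif="CAH"):
    """Hypothetical stand-in scorer: the largest number of identities to
    `motif` over all ungapped placements (seq must be at least as long
    as motif)."""
    return max(
        sum(a == b for a, b in zip(seq[i:i + len(motif)], motif))
        for i in range(len(seq) - len(motif) + 1)
    )


def significance(seq, score=toy_score, n_random=100, seed=0):
    """Return (observed, mean, sd, z) against shuffled copies of seq.

    z is None when all randomized scores are identical (sd == 0).
    """
    rng = random.Random(seed)      # fixed seed keeps the sketch repeatable
    observed = score(seq)
    residues = list(seq)
    random_scores = []
    for _ in range(n_random):
        rng.shuffle(residues)      # same composition, random order
        random_scores.append(score("".join(residues)))
    mean = statistics.mean(random_scores)
    sd = statistics.pstdev(random_scores)
    z = (observed - mean) / sd if sd > 0 else None
    return observed, mean, sd, z
```

Shuffling preserves the length and composition of the sequence, so a high value of `z` suggests that the match depends on the order of the residues and not merely on which residues are present.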

A typical pattern analysis might proceed as follows:

  1. Define a pattern and choose a scoring scheme.
  2. Scan the pattern against the database (PROGRAM MULTALIGN).
  3. Sort the results (PROGRAM SORTER).
  4. Get IDs of interesting proteins (PROGRAM SORTER).
  5. Extract sequences of interesting proteins from the database (PROGRAM SELECT).
  6. Align pattern to the proteins, including alternative alignments (PROGRAM MULTALIGN).
  7. Produce compressed output for inspection (PROGRAM PATT).

Not every stage need be performed. For example, if we already know the subset of the protein database that is interesting, then steps 1-4 can be skipped.
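
The data flow through the scan, sort and select stages (steps 2 to 5) can be pictured with the short Python sketch below. It does not call or imitate the command-line interfaces of MULTALIGN, SORTER or SELECT; the in-memory dictionary, the function names and the score cutoff are all assumptions made for illustration.

```python
def scan(database, score):
    """Step 2: score every database entry; returns {identifier: score}."""
    return {ident: score(seq) for ident, seq in database.items()}


def rank(scores):
    """Step 3: order (identifier, score) pairs from best to worst."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def top_ids(ranked, cutoff):
    """Step 4: keep identifiers whose score reaches an assumed cutoff."""
    return [ident for ident, s in ranked if s >= cutoff]


def select(database, ids):
    """Step 5: pull the chosen sequences back out of the database."""
    return {ident: database[ident] for ident in ids}


# Hypothetical three-entry database and a deliberately trivial scorer.
database = {"P1": "MCCAC", "P2": "MAAAA", "P3": "MCAAA"}
ranked = rank(scan(database, lambda seq: seq.count("C")))
subset = select(database, top_ids(ranked, cutoff=1))
```

If the interesting subset is already known, `subset` can be supplied directly and the earlier stages omitted, which mirrors the shortcut described above.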




gjb@bioch.ox.ac.uk