
Amino acid scoring schemes

All algorithms for comparing protein sequences rely on some scheme to score the equivalence of each of the 210 possible pairs of amino acids (190 pairs of different amino acids plus 20 pairs of identical amino acids). Most scoring schemes represent the 210 pair scores as a matrix of similarities in which identical amino acids and those of similar character (e.g. I, L) score higher than those of different character (e.g. I, D). Since the first protein sequences were obtained, many different types of scoring scheme have been devised. The most commonly used are those based on observed substitutions, and of these the 1976 Dayhoff matrix for 250 PAMs [1] has until recently dominated. This and other schemes are discussed in the following sections.
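The count of 210 pairs can be checked with a short Python sketch. It is illustrative only: the names `AMINO_ACIDS`, `make_matrix` and `lookup` are hypothetical, and the placeholder score (1 for identical residues, 0 otherwise) is a stand-in, not one of the published matrices.

```python
from itertools import combinations_with_replacement

# The 20 standard amino acids as one-letter codes, in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Every unordered pair, including each amino acid paired with itself.
pairs = list(combinations_with_replacement(AMINO_ACIDS, 2))
identical = [p for p in pairs if p[0] == p[1]]
different = [p for p in pairs if p[0] != p[1]]


def make_matrix(score_fn):
    """Store one score per unordered pair (a symmetric similarity matrix)."""
    return {pair: score_fn(*pair) for pair in pairs}


def lookup(matrix, a, b):
    """Return the score for a pair regardless of the order of a and b."""
    return matrix[tuple(sorted((a, b)))]


# Placeholder scores (an assumption for illustration, not a real matrix).
toy = make_matrix(lambda a, b: 1 if a == b else 0)

print(len(different), len(identical), len(pairs))
print(lookup(toy, "I", "L") == lookup(toy, "L", "I"))
```

Because the matrix is symmetric, only the 210 unordered pairs need to be stored; a full 20 x 20 table would hold each off-diagonal score twice.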

Identity scoring
Genetic code scoring
Chemical similarity scoring
Observed substitutions
Which matrix should I use?

geoff.barton@ox.ac.uk