
Simple Prolog Clauses Describing Brookhaven Entries

A Fortran program (BRKSEQ) reads the PDB files and extracts the information needed to describe each protein entry by up to eleven different types of Prolog clause:



header(Ident,List).
compnd(Ident,List).
source(Ident,List).
resolution(Ident,R).
chain(Ident,Chcode).
nchains(Ident,N).
chain_range(Chcode,[Cstart,CstartIN],[Cend,CendIN]).
chain_length(Chcode,Len).
residues(Chcode).
no_mainchain(Chcode).
no_sidechains(Chcode).

Not all clauses need be present for a particular entry, as shown for the immunoglobulin structure 2fb4:



header(2fb4,[immunoglobulin,18-apr-89,2fb4]).
compnd(2fb4,[immunoglobulin,fab]).
source(2fb4,[human,(homo,sapiens),myeloma,patient,kol,serum]).
resolution(2fb4,   1.900).
nchains(2fb4,    2).
chain(2fb4,2fb4l).
chain_range(2fb4l,[   1,-],[ 214,-]).
chain_length(2fb4l,  216).
residues(2fb4l).
chain(2fb4,2fb4h).
chain_range(2fb4h,[   1,-],[ 221,-]).
chain_length(2fb4h,  229).
residues(2fb4h).

The header, compnd, source and resolution clauses are extracted directly from the records at the beginning of every PDB file. Ident is the PDB identification code for the protein (e.g. 9lyz), List is a Prolog list containing textual information, and R is the resolution of the structure in Ångströms.

The remaining seven clauses are derived from an analysis of the PDB ATOM records. Each chain clause links the PDB identification code to a chain code Chcode, whilst nchains gives the number of chains present in the PDB entry. The nchains clause is included for convenience; it is strictly unnecessary, since a Prolog rule could count the chain clauses present for each protein. For every chain clause there is one chain_range clause, which specifies the starting and ending residue numbers of the chain, and one chain_length clause, which states the number of residues present in the chain. The chain_length clause is essential because the residue numbering scheme used by the PDB is alphanumeric, so the length cannot be calculated from the range alone: chain 2fb4l above runs from residue 1 to residue 214, yet contains 216 residues.

The residues clause identifies a chain as having amino acid residues other than UNK (or X), whilst the presence of a no_mainchain or no_sidechains clause for a chain shows that the protein entry is incomplete (some PDB entries contain only mainchain, or only Cα, atoms).
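As an illustration of how such clauses might be queried, the following is a minimal sketch rather than part of BRKSEQ or its output. The rule names count_chains and complete_chain are hypothetical, and the identifiers are quoted (e.g. '2fb4') on the assumption that a standard Prolog system is used, since an unquoted token beginning with a digit is not read as an atom there. The first rule counts chain clauses, as the remark about nchains suggests is possible; the second treats a chain as complete when it has a residues clause and neither a no_mainchain nor a no_sidechains clause.

```prolog
% Facts copied from the 2fb4 example above, with identifiers quoted
% (an assumption of this sketch, for standard Prolog readers).
chain('2fb4', '2fb4l').
chain('2fb4', '2fb4h').
chain_length('2fb4l', 216).
chain_length('2fb4h', 229).
residues('2fb4l').
residues('2fb4h').

% 2fb4 has no no_mainchain or no_sidechains clauses; declaring the
% predicates dynamic lets queries on them fail instead of raising
% an unknown-procedure error.
:- dynamic(no_mainchain/1).
:- dynamic(no_sidechains/1).

% count_chains(+Ident, -N): hypothetical rule.
% N is the number of chain/2 clauses recorded for entry Ident.
count_chains(Ident, N) :-
    findall(Chcode, chain(Ident, Chcode), Chcodes),
    length(Chcodes, N).

% complete_chain(?Chcode): hypothetical rule.
% Succeeds for a chain that has known residues and is not flagged
% as lacking mainchain or sidechain atoms.
complete_chain(Chcode) :-
    residues(Chcode),
    \+ no_mainchain(Chcode),
    \+ no_sidechains(Chcode).
```

With these facts loaded, the query count_chains('2fb4', N) binds N to 2, in agreement with the nchains clause of the example, and complete_chain('2fb4l') succeeds.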



gjb@bioch.ox.ac.uk