There are 7102 clauses defining the November 1990 release of the Brookhaven PDB. These provide a very simple database that can be read directly into the Prolog system and interrogated with simple Prolog queries. For example, the following query (also known as a goal) could be typed at the Prolog prompt (| ?-):
| ?- resolution(PID,R),R < 2, R > 0,chain(PID,CID),residues(CID).
The query returns the protein identifier (PID), chain identifier (CID) and crystallographic resolution (R) for each chain in the databank whose entry has a resolution better than 2 Å (that is, 0 < R < 2) and that has amino acid residues deposited.
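To try the query without the full databank, a tiny stand-in database is enough. The sketch below assumes SWI-Prolog or any ISO-style Prolog and uses the predicate names and arities from the query. The 1alc and 1amt values are taken from the example session in this section; the demo1, demo2 and demo3 entries are hypothetical and exist only to exercise each test in the query. Identifiers that begin with a digit, such as 1alc, must be written as quoted atoms in source code.

```prolog
% Hypothetical miniature of the fact database.
% Only the 1alc and 1amt values echo the example session in the text.
resolution('1alc', 1.7).
resolution('1amt', 1.5).
resolution(demo1, 2.5).        % invented: rejected by R < 2
resolution(demo2, 0.0).        % invented: rejected by R > 0
resolution(demo3, 1.2).        % invented: its chain has no residues fact

chain('1alc', '1alc').
chain('1amt', '1amta').
chain('1amt', '1amtb').
chain(demo1, demo1a).
chain(demo2, demo2a).
chain(demo3, demo3a).

residues('1alc').
residues('1amta').
residues('1amtb').
residues(demo1a).
residues(demo2a).
% No residues(demo3a) fact, so residues(CID) fails for demo3.
```

Consulting this file and typing the query at the | ?- prompt should produce the three solutions from the example session, in the same order; demo1, demo2 and demo3 are rejected by R < 2, R > 0 and residues(CID) respectively.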
The Prolog interpreter attempts to satisfy this query from left to right. First, Prolog looks in its database for facts called resolution. When the first such fact is found, the variables PID and R are unified with its arguments. The value of R is then tested to see whether it is less than 2. If it is, R is tested to see whether it is greater than zero. If this succeeds, Prolog looks up a chain clause whose first argument unifies with the current value of PID. Finally, if such a chain clause is found, Prolog looks up a residues clause that unifies with the value of CID.
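The left-to-right, clause-by-clause search described above can be made explicit with a "vanilla" meta-interpreter, a standard Prolog technique. This is a minimal sketch assuming SWI-Prolog (for predicate_property/2); the solve/1 predicate and the demo1 entry are hypothetical, and the facts are declared dynamic so that clause/2 may inspect them. It prints each database lookup and each arithmetic test as it is attempted.

```prolog
% Facts are declared dynamic so that clause/2 is allowed to read them.
:- dynamic resolution/2, chain/2, residues/1.

resolution('1alc', 1.7).
resolution('1amt', 1.5).
resolution(demo1, 2.5).          % hypothetical entry, fails R < 2

chain('1alc', '1alc').
chain('1amt', '1amta').
chain('1amt', '1amtb').
chain(demo1, demo1a).            % hypothetical

residues('1alc').
residues('1amta').
residues('1amtb').
residues(demo1a).                % hypothetical

% solve(+Goal): hypothetical meta-interpreter mirroring Prolog's own
% depth-first, left-to-right strategy.
solve(true) :- !.
solve((A, B)) :- !,
    solve(A),                    % satisfy the left goal first...
    solve(B).                    % ...then the rest, using A's bindings
solve(G) :-
    predicate_property(G, built_in), !,
    format("Test:    ~q~n", [G]),
    call(G).                     % arithmetic tests such as R < 2
solve(G) :-
    format("Look up: ~q~n", [G]),
    clause(G, Body),             % try matching clauses in database order;
    solve(Body).                 % backtracking resumes here on failure
```

Calling solve/1 on the conjunction from the text first prints a lookup of the resolution goal, then the two arithmetic tests, then the chain and residues lookups: the same order in which the interpreter works through the query.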
The query can fail at any stage. For example, if no residues fact contains the current value of CID, that goal fails. Prolog then backtracks in search of another solution: it returns to the most recent goal that still has untried alternatives, in this case chain. If another chain fact for the same PID is present, CID is unified with the value it contains and the database is again checked for a corresponding residues clause. If no further chain facts remain, Prolog backtracks further, to the next resolution fact. If all the goals in the query succeed, the values of PID, CID and R are displayed. Typing a semicolon forces Prolog to backtrack and search for alternative solutions to the entire query. For example, the following are the first three solutions to the query shown above.
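Typing ; repeatedly enumerates solutions interactively. In a program, the standard findall/3 built-in collects all of them at once by driving the same backtracking automatically. A minimal sketch, with a hypothetical all_solutions/1 predicate and a small stand-in database (the demo1 entry is invented; the other values echo the example session):

```prolog
resolution('1alc', 1.7).
resolution('1amt', 1.5).
resolution(demo1, 2.5).     % invented: rejected by R < 2

chain('1alc', '1alc').
chain('1amt', '1amta').
chain('1amt', '1amtb').
chain(demo1, demo1a).

residues('1alc').
residues('1amta').
residues('1amtb').
residues(demo1a).

% all_solutions(-Sols): hypothetical helper that collects every
% PID-CID-R triple the interactive query would report, in the order
% that repeatedly typing ';' would reveal them.
all_solutions(Sols) :-
    findall(PID-CID-R,
            ( resolution(PID, R), R < 2, R > 0,
              chain(PID, CID), residues(CID) ),
            Sols).
```

The goal `all_solutions(S), length(S, N)` should bind S to the three PID-CID-R triples from the example session, in backtracking order, and N to 3.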
| ?- resolution(PID,R),R < 2, R > 0,chain(PID,CID),residues(CID).
PID = CID = 1alc,
R = 1.7 ;            % First solution found - type ';' to force backtracking
PID = 1amt,
R = 1.5,
CID = 1amta ;        % Second solution - type ';' again for the next solution
PID = 1amt,
R = 1.5,
CID = 1amtb          % ... and so on ...
If we often want to select protein chains by these criteria, it is simple to build the query into a general-purpose Prolog rule, such as the rule select_chains:
(Comments start with a %.)
select_chains(Rmin,Rmax,CID):-
    resolution(PID,R),   % look up resolution
    R < Rmax,            % resolution below Rmax
    R > Rmin,            % resolution above Rmin
    chain(PID,CID),      % find a chain identifier for this PID
    residues(CID).       % check the chain has residues
We can now type:
| ?- select_chains(0,2,CID).
at the Prolog prompt to find out which chains satisfy our criteria. Having established this rule, we can use it in further queries. For example:
| ?- select_chains(2,3,CID),chain_length(CID,Len),Len >= 150.
will return the chain identifier and length of every chain that belongs to an entry with a resolution between 2 and 3 Å (exclusive) and whose length is at least 150 amino acids.
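Rules can be layered further. The sketch below wraps the query just shown in a new rule. The long_chain/5 rule and every fact in the block are hypothetical (the demo entries and their chain lengths are invented), and select_chains/3 is reproduced from the text so that the block loads on its own. Because select_chains/3 uses strict comparisons, entries with a resolution of exactly 2 or 3 Å are excluded.

```prolog
% Hypothetical stand-in facts; none of these entries or lengths are real.
resolution(demo1, 1.8).
resolution(demo2, 2.5).
resolution(demo3, 2.8).

chain(demo1, demo1a).
chain(demo2, demo2a).
chain(demo3, demo3a).

residues(demo1a).
residues(demo2a).
residues(demo3a).

chain_length(demo1a, 300).
chain_length(demo2a, 212).
chain_length(demo3a, 87).

% select_chains/3 as defined in the text.
select_chains(Rmin, Rmax, CID) :-
    resolution(PID, R),
    R < Rmax,
    R > Rmin,
    chain(PID, CID),
    residues(CID).

% long_chain(+Rmin, +Rmax, +MinLen, -CID, -Len): hypothetical rule that
% packages the select_chains/3 + chain_length/2 + length-test query.
long_chain(Rmin, Rmax, MinLen, CID, Len) :-
    select_chains(Rmin, Rmax, CID),
    chain_length(CID, Len),
    Len >= MinLen.
```

The goal `long_chain(2, 3, 150, CID, Len)` should give CID = demo2a, Len = 212 as its only solution: demo1a falls outside the resolution range and demo3a is shorter than 150 residues.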