Although many questions can be answered by considering protein structure at the residue level, some analyses require access to the individual atomic coordinates: for example, locating close approaches between residue sidechains to identify hydrophobic or electrostatic interactions. The analysis of all atoms creates several additional complications:
A simple strategy for the representation of all-atom sets in Prolog was adopted, whereby each atom is represented by a Prolog fact of the form:
brk(I,RN,IN,ATYPE,CID,RTYPE,ATTYPE,XYZ)
where I is the atom number; RN is the residue number (e.g. 2); IN is the residue insertion code (e.g. "-" for no insertion code); ATYPE is either atm or het, for ATOM or HETATM records respectively; CID is the chain identifier code (e.g. "1fb4l"); RTYPE is the amino acid type in three-letter code (e.g. val); ATTYPE is the atom type, as a list that includes the atom insertion code (e.g. [cg1,-]); and XYZ is the list of atomic coordinates. The temperature factor, occupancy and footnote fields are not included in the current implementation.
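The text does not show how PDB records are turned into brk/8 facts. As a hedged illustration, the following SWI-Prolog sketch converts a single fixed-column ATOM or HETATM record into a brk/8 term. The predicate names pdb_line_brk/3, field/4, code_or_dash/2 and record_type/2 are hypothetical, as are the conventions applied: names are lowercased, a blank one-character code becomes -, and CID is formed by appending the lowercase chain letter to the PDB code (as in 5chaa). Column positions follow the PDB ATOM record layout.

```prolog
% Sketch (SWI-Prolog): turn one fixed-column PDB ATOM/HETATM record,
% given as an atom, into a brk/8 term. All predicate names are
% hypothetical; the naming conventions are assumptions based on the
% examples in the text.

% field(+Line, +First, +Last, -Text): columns First..Last (1-based,
% inclusive) of Line, with leading and trailing blanks removed.
field(Line, First, Last, Text) :-
    Before is First - 1,
    Length is Last - First + 1,
    sub_atom(Line, Before, Length, _, Raw),
    normalize_space(atom(Text), Raw).

% A blank one-character code is represented by '-'.
code_or_dash('', '-') :- !.
code_or_dash(Code, Code).

% Record name (columns 1-6) mapped to the ATYPE field of brk/8.
record_type('ATOM', atm).
record_type('HETATM', het).

% pdb_line_brk(+PdbCode, +Line, -Brk)
pdb_line_brk(PdbCode, Line,
             brk(I, RN, IN, AType, CID, RType, [AtName, Alt], [X, Y, Z])) :-
    field(Line, 1, 6, Rec),    record_type(Rec, AType),
    field(Line, 7, 11, SI),    atom_number(SI, I),         % atom serial
    field(Line, 13, 16, SAt),  downcase_atom(SAt, AtName), % atom name
    field(Line, 17, 17, SAlt), code_or_dash(SAlt, Alt0),
    downcase_atom(Alt0, Alt),                              % column 17: the "atom insertion code"
    field(Line, 18, 20, SRes), downcase_atom(SRes, RType), % residue name
    field(Line, 22, 22, SCh),  downcase_atom(SCh, Chain),
    atom_concat(PdbCode, Chain, CID),                      % e.g. 5chaa
    field(Line, 23, 26, SRN),  atom_number(SRN, RN),       % residue number
    field(Line, 27, 27, SIns), code_or_dash(SIns, IN0),
    downcase_atom(IN0, IN),                                % residue insertion code
    field(Line, 31, 38, SX),   atom_number(SX, X),         % coordinates
    field(Line, 39, 46, SY),   atom_number(SY, Y),
    field(Line, 47, 54, SZ),   atom_number(SZ, Z).
```

Applied with PDB code '5cha' to a record with serial 1, atom name N, residue CYS 1 of chain A and coordinates 40.935, 13.504, 1.417, it yields the term brk(1,1,-,atm,'5chaa',cys,[n,-],[40.935,13.504,1.417]); records of any other type simply fail.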
The PDB CONECT records are converted into bond/3 clauses, each of the form:
bond(I,J,Type)
signifying a bond between atoms I and J of type Type, where Type is one of the following:
covalent
hbond_da (hydrogen bond: I is the donor, J the acceptor)
hbond_ad (hydrogen bond: I is the acceptor, J the donor)
saltb_neg (salt bridge: I is the negative partner, J the positive)
saltb_pos (salt bridge: I is the positive partner, J the negative)
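Because hbond_da and hbond_ad (like saltb_neg and saltb_pos) describe the same interaction from opposite ends, queries become simpler if the directions are normalised once. A minimal sketch, in which the bond/3 facts (atom numbers invented for illustration) and the predicate names hbond/2, salt_bridge/2 and bonded/2 are all hypothetical:

```prolog
% Invented bond/3 facts, for illustration only.
bond(10, 11, covalent).
bond(20, 30, hbond_da).    % 20 is the donor, 30 the acceptor
bond(40, 41, hbond_ad).    % 40 is the acceptor, 41 the donor
bond(50, 60, saltb_neg).   % 50 is the negative partner
bond(70, 80, saltb_pos).   % 70 is the positive partner

% hbond(?Donor, ?Acceptor): a hydrogen bond recorded in either direction.
hbond(D, A) :- bond(D, A, hbond_da).
hbond(D, A) :- bond(A, D, hbond_ad).

% salt_bridge(?Neg, ?Pos): a salt bridge recorded from either partner.
salt_bridge(N, P) :- bond(N, P, saltb_neg).
salt_bridge(N, P) :- bond(P, N, saltb_pos).

% bonded(?I, ?J): covalent bond, independent of the order stored.
bonded(I, J) :- bond(I, J, covalent).
bonded(I, J) :- bond(J, I, covalent).
```

If the conversion already records each CONECT bond from both ends, these rules will report each interaction twice; collecting the answers with setof/3 removes the duplicates.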
This format of a PDB entry may be used directly for analysis in Prolog. For example, given a rule rdist/3 that returns the Euclidean (straight-line) distance between two points in space, we can readily calculate the distance between any pair of atoms simply by typing:
| ?- brk(I,RN1,IN1,ATYPE1,CID1,RTYPE1,ATTYPE1,XYZ1), brk(J,RN2,IN2,ATYPE2,CID2,RTYPE2,ATTYPE2,XYZ2), J > I, rdist(XYZ1,XYZ2,Distance).
which returns as the first solution:
I = RN1 = RN2 = 1, IN1 = IN2 = -, ATYPE1 = ATYPE2 = atm, CID1 = CID2 = 5chaa, RTYPE1 = RTYPE2 = cys, ATTYPE1 = [n,-], XYZ1 = [40.935,13.504,1.417], J = 2, ATTYPE2 = [ca,-], XYZ2 = [40.345,14.599,2.14], Distance = 1.43871
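The definition of rdist/3 is not given in the text. A minimal sketch, assuming coordinates are held as [X,Y,Z] lists of numbers as in brk/8:

```prolog
% rdist(+XYZ1, +XYZ2, -Distance): Euclidean distance between two points
% given as [X,Y,Z] lists. A sketch only; the original rdist/3 is not shown.
rdist([X1, Y1, Z1], [X2, Y2, Z2], D) :-
    DX is X1 - X2,
    DY is Y1 - Y2,
    DZ is Z1 - Z2,
    D is sqrt(DX*DX + DY*DY + DZ*DZ).
```

Because is/2 needs bound arguments, rdist/3 must be called after both brk/8 goals have bound XYZ1 and XYZ2, which is why it comes last in the query. For the two atoms of the first solution it gives a distance of about 1.4387.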
It is a simple matter to restrict the distance search to atoms of a particular type. For example, to search for close approaches between cysteine sulphur (sg) atoms:
| ?- brk(I,RN1,IN1,ATYPE1,CID1,cys,[sg,_],XYZ1), brk(J,RN2,IN2,ATYPE2,CID2,cys,[sg,_],XYZ2), J > I, rdist(XYZ1,XYZ2,Distance), Distance < 5.
which returns:
I = 6, RN1 = 1, IN1 = IN2 = -, ATYPE1 = ATYPE2 = atm, CID1 = CID2 = 5chaa, XYZ1 = [37.649,15.819,1.913], J = 893, RN2 = 122, XYZ2 = [36.339,14.497,2.687], Distance = 2.01565
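When the same kind of search is run repeatedly, the query can be packaged as a rule. In this hypothetical sketch, close_pair/5 is not part of the described system; the first two brk/8 facts are the sulphur atoms reported above, and the third is an invented distant atom included only to show the cutoff at work. The J > I test keeps each pair once and excludes an atom paired with itself, so it is only appropriate when both atoms come from the same selection.

```prolog
% Sulphur atoms from the example above, plus one invented distant atom.
brk(6,    1,   '-', atm, '5chaa', cys, [sg, '-'], [37.649, 15.819, 1.913]).
brk(893,  122, '-', atm, '5chaa', cys, [sg, '-'], [36.339, 14.497, 2.687]).
brk(2000, 300, '-', atm, '5chaa', cys, [sg, '-'], [10.0, 10.0, 10.0]).  % invented

% Euclidean distance between two [X,Y,Z] points (as sketched for rdist/3).
rdist([X1, Y1, Z1], [X2, Y2, Z2], D) :-
    DX is X1 - X2, DY is Y1 - Y2, DZ is Z1 - Z2,
    D is sqrt(DX*DX + DY*DY + DZ*DZ).

% close_pair(+RType, +AtName, +Cutoff, ?I-J, -D): two distinct atoms of
% the same residue type and atom name closer than Cutoff, each pair once.
close_pair(RType, AtName, Cutoff, I-J, D) :-
    brk(I, _, _, _, _, RType, [AtName, _], XYZ1),
    brk(J, _, _, _, _, RType, [AtName, _], XYZ2),
    J > I,                        % skip self-pairs and mirrored duplicates
    rdist(XYZ1, XYZ2, D),
    D < Cutoff.
```

The query | ?- close_pair(cys, sg, 5, Pair, D). then finds only Pair = 6-893, with D of about 2.016.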
Alternatively, we can identify close approaches (under 3 Å) between water molecules and glutamate residues and write out the findings in Prolog clausal form:
| ?- brk(I,RN1,IN1,het,CID1,hoh,ATTYPE1,XYZ1), brk(J,RN2,IN2,atm,CID2,glu,ATTYPE2,XYZ2), rdist(XYZ1,XYZ2,Distance), Distance < 3, writeq(water_glu(water(I,[RN1,IN1],ATTYPE1,CID1), glu(J,[RN2,IN2],ATTYPE2,CID2),Distance)), nl, fail.
which writes:
water_glu(water(3603,[554,-],[o,-],5chaa),glu(123,[20,-],[oe1,-],5chaa),2.93873)
water_glu(water(3606,[557,-],[o,-],5chaa),glu(2264,[70,-],[cb,-],5chab),2.98971)
water_glu(water(3638,[589,-],[o,-],5chaa),glu(1898,[21,-],[ca,-],5chab),2.76785)
water_glu(water(3638,[589,-],[o,-],5chaa),glu(1899,[21,-],[c,-],5chab),2.90702)
water_glu(water(3638,[589,-],[o,-],5chaa),glu(1902,[21,-],[cg,-],5chab),2.49055)
water_glu(water(3644,[595,-],[o,-],5chaa),glu(492,[70,-],[cb,-],5chaa),2.85485)
water_glu(water(3648,[599,-],[o,-],5chaa),glu(551,[78,-],[cb,-],5chaa),2.77389)
water_glu(water(3663,[614,-],[o,-],5chaa),glu(2263,[70,-],[o,-],5chab),2.87087)
water_glu(water(3680,[631,-],[o,-],5chaa),glu(1895,[20,-],[oe1,-],5chab),2.3201)
water_glu(water(3723,[674,-],[o,-],5chaa),glu(120,[20,-],[cb,-],5chaa),2.8711)
water_glu(water(3723,[674,-],[o,-],5chaa),glu(125,[21,-],[n,-],5chaa),2.8872)
water_glu(water(3724,[675,-],[o,-],5chaa),glu(2121,[49,-],[oe2,-],5chab),2.20559)
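Since writeq/1 followed by nl/0 writes no terminating full stop, output of this kind cannot be consulted back into Prolog as it stands. A hedged alternative is to define the search as a rule and print each result with portray_clause/1, which quotes atoms where needed (such as '5chaa') and ends every clause with a full stop. The brk/8 facts and their coordinates below are invented for illustration, and water_glu/3 and report_water_glu/0 are hypothetical names.

```prolog
% Invented facts for illustration only (not data from 5cha).
brk(1, 10, '-', het, '5chaa', hoh, [o, '-'],   [0.0, 0.0, 0.0]).
brk(2, 20, '-', atm, '5chaa', glu, [oe1, '-'], [2.9, 0.0, 0.0]).
brk(3, 20, '-', atm, '5chaa', glu, [oe2, '-'], [6.0, 0.0, 0.0]).

% Euclidean distance between two [X,Y,Z] points (as sketched for rdist/3).
rdist([X1, Y1, Z1], [X2, Y2, Z2], D) :-
    DX is X1 - X2, DY is Y1 - Y2, DZ is Z1 - Z2,
    D is sqrt(DX*DX + DY*DY + DZ*DZ).

% water_glu(-Water, -Glu, -D): a water oxygen within 3 A of a glutamate atom.
water_glu(water(I, [RN1, IN1], AT1, CID1), glu(J, [RN2, IN2], AT2, CID2), D) :-
    brk(I, RN1, IN1, het, CID1, hoh, AT1, XYZ1),
    brk(J, RN2, IN2, atm, CID2, glu, AT2, XYZ2),
    rdist(XYZ1, XYZ2, D),
    D < 3.

% Write every match as a clause terminated by a full stop, so that the
% output file can later be consulted.
report_water_glu :-
    forall(water_glu(W, G, D), portray_clause(water_glu(W, G, D))).
```

Unlike the cysteine search, no J > I test is used here, because the water and glutamate atoms come from different selections and every pairing is wanted.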
Consulting (loading into the Prolog system) the 3719 brk/8 clauses for protein 5cha took 46 seconds, and the query then required 75 seconds to run. When the brk/8 clauses were instead compiled into the Prolog system, the execution time fell to 30 seconds. Unfortunately, compilation itself required 162 seconds, so the total time increased (192 seconds, against 121 seconds for the consulted clauses).
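The trade-off can be made explicit. Assuming, hypothetically, that the clauses are loaded once and the same query is then run n times, each run costing the measured time, the total times in seconds are:

```latex
\begin{aligned}
T_{\text{consult}}(n) &= 46 + 75n, \qquad T_{\text{compile}}(n) = 162 + 30n,\\
T_{\text{consult}}(1) &= 121 < 192 = T_{\text{compile}}(1),\\
T_{\text{consult}}(n) = T_{\text{compile}}(n) &\iff 45n = 116 \iff n \approx 2.58.
\end{aligned}
```

Under this assumption compilation would pay off only from the third run of the query onwards (252 s against 271 s for n = 3).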
The ease with which these simple queries can be executed in Prolog belies the complications that would be necessary to provide such flexibility in a conventional Fortran or C program. As in Prolog, the conventional program would first have to read the complete dataset into the chosen internal representation. A general-purpose command parser would then need to be written to let the operator specify which comparison was required, together with a general selection routine to let the operator choose the subset of atoms to be compared. Whilst all these routines could certainly be written in Fortran or C, Prolog provides a far more concise route to such analyses.