SNAPPI-DB API (Application Programming Interface)
A Java 5 API has been developed to allow rapid development and
fast database queries without the need for complex SQL.
Java 5 is employed
as it provides many features that were not available in previous
versions of Java, such as generics, enhanced for loops and
auto-boxing/unboxing.
The same notation as the MSD is employed so
that users familiar with the MSD can seamlessly learn the SNAPPI
API. The same serial numbering is also used so that structures can
be mapped back to the MSD.
As the database is object-oriented, it must be accessed through this API. The API comes fully documented and should be easy to use and understand. It can be used immediately with the prebuilt SNAPPI-DB (which can be downloaded here), or to generate your own version of the database from a local copy of the MSD.
API Structure

The simplified UML diagram above shows the overall structure of the API. Although there are many different ways of navigating through the
data in SNAPPI, the database is optimised for searching from one of the four roots shown at the top of the tree: Entries (contains each PDB structure), Domains (domains classified by their domain Family/Superfamily), DomainInteractions (domain-domain interactions classified by their domain Family/Superfamily pair) or OrientationSimilarInteractions (domain-domain interactions classified by their interaction interface). Each of these access routes is described below.
Methods for Accessing the Data
Entries
Navigation through each PDB Entry is straightforward as the data is stored in a hierarchical structure, as shown in the UML diagram above. The database stores a list of all PDB entries in Entries. Each Entry contains one or more Assemblies (PQS predicted
structures), each Assembly contains one or more Chains, each Chain
contains one or more Residues and each Residue in turn contains one or more Atoms. Each level of the hierarchy also holds other information
relevant to that item; for example, each Atom holds its co-ordinate
position. The Assemblies also contain the domains and
domain interactions for SCOP, CATH and Pfam.
Pseudo code
//This iterates through all of the PDB entries stored in the database
for (Entry e : Entries.getEntries())
{
    //This iterates through all of the PQS Assemblies within an Entry
    for (Assembly assembly : e.getAssemblies())
    {
        //This iterates through all of the Chains within an Assembly
        for (Chain c : assembly.getChains())
        {
            //This iterates through all of the Residues within a Chain
            for (Residue r : c.getResidues())
            {
                //This iterates through all of the Atoms within a Residue
                for (Atom a : r.getAtoms())
                {
                    //This gets the coordinates for an Atom
                    float[] coordinates = a.getCoordinates();
                    //This prints out the coordinates for an Atom
                    System.out.println(coordinates[0] + "," + coordinates[1] + "," + coordinates[2]);
                }
            }
        }
    }
}
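The nested loops above lend themselves to simple aggregate calculations. As a minimal sketch, the following computes the geometric centre of a set of residues from the float[] arrays returned by getCoordinates(). The Atom and Residue classes here are hypothetical stand-ins written only so that the example compiles on its own; only the accessor names getAtoms() and getCoordinates() are taken from the pseudo code, and the real SNAPPI classes will differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CentroidSketch {

    // Hypothetical stand-in for the SNAPPI Atom class (not the real class).
    static final class Atom {
        private final float[] coordinates;

        Atom(float x, float y, float z) {
            this.coordinates = new float[] {x, y, z};
        }

        float[] getCoordinates() {
            return coordinates;
        }
    }

    // Hypothetical stand-in for the SNAPPI Residue class (not the real class).
    static final class Residue {
        private final List<Atom> atoms = new ArrayList<Atom>();

        Residue add(Atom atom) {
            atoms.add(atom);
            return this;
        }

        List<Atom> getAtoms() {
            return atoms;
        }
    }

    // Averages the x, y and z coordinates over every atom of every residue.
    static float[] centroid(List<Residue> residues) {
        double[] sum = new double[3];
        int atomCount = 0;
        for (Residue r : residues) {
            for (Atom a : r.getAtoms()) {
                float[] c = a.getCoordinates();
                for (int i = 0; i < 3; i++) {
                    sum[i] += c[i];
                }
                atomCount++;
            }
        }
        if (atomCount == 0) {
            throw new IllegalArgumentException("no atoms supplied");
        }
        return new float[] {
            (float) (sum[0] / atomCount),
            (float) (sum[1] / atomCount),
            (float) (sum[2] / atomCount)
        };
    }

    public static void main(String[] args) {
        // Two made-up atoms, one per residue, purely for illustration.
        List<Residue> residues = new ArrayList<Residue>();
        residues.add(new Residue().add(new Atom(0f, 0f, 0f)));
        residues.add(new Residue().add(new Atom(2f, 4f, 6f)));
        System.out.println(Arrays.toString(centroid(residues))); // prints [1.0, 2.0, 3.0]
    }
}
```

With the real API, the same accumulation could presumably sit inside the innermost loop of the pseudo code, restricted to whichever Chain or Assembly is of interest.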
Domains
Domains can be easily accessed by their domain classification to any
level of the domain hierarchy for SCOP and CATH and at the family
level for Pfam. For example for the SCOP domain definition, at the
family level of similarity, there is a map which stores the name of
the SCOP family (e.g. a.1.2.3) as the key and a list of all of the
domains with this classification as the value. A non-redundant set
of domains using the SCOP family level classification can easily be
generated by taking a random example of each domain from each SCOP
family employing this map structure.
Pseudo code
//This gets the Domains classified by their SCOP family (the family level is denoted by 4)
Map<Family, Collection<Domain>> domainsHashedByFamily = Domains.getDomainsHashedByFamily(SCOP.class, 4);
for (Map.Entry<Family, Collection<Domain>> map : domainsHashedByFamily.entrySet())
{
    //The Domain Family Classification is obtained by map.getKey() e.g. SCOP family a.1.2.3
    System.out.println("SCOP family = " + map.getKey());
    //This iterates through all the Domains with the same Family classification
    for (Domain domain : map.getValue())
    {
        //This iterates through all of the Residues within a Domain
        for (Residue r : domain.getResidues())
        {
            //This iterates through all of the Atoms within a Residue
            for (Atom a : r.getAtoms())
            {
                //This gets the coordinates for an Atom
                float[] coordinates = a.getCoordinates();
                //This prints out the coordinates for an Atom
                System.out.println(coordinates[0] + "," + coordinates[1] + "," + coordinates[2]);
            }
        }
    }
}
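The non-redundant set described above, one randomly chosen domain per SCOP family, can be sketched independently of the SNAPPI classes. The helper below is an illustration only: pickOnePerKey is a hypothetical name, and plain strings stand in for the Family keys and Domain values of the map returned by getDomainsHashedByFamily.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class NonRedundantSketch {

    // Picks one member at random from each group; empty groups are skipped.
    static <K, V> Map<K, V> pickOnePerKey(Map<K, Collection<V>> grouped, Random random) {
        Map<K, V> representatives = new LinkedHashMap<K, V>();
        for (Map.Entry<K, Collection<V>> entry : grouped.entrySet()) {
            List<V> members = new ArrayList<V>(entry.getValue());
            if (!members.isEmpty()) {
                int index = random.nextInt(members.size());
                representatives.put(entry.getKey(), members.get(index));
            }
        }
        return representatives;
    }

    public static void main(String[] args) {
        // Made-up domain labels keyed by family name, for illustration only.
        Map<String, Collection<String>> byFamily = new LinkedHashMap<String, Collection<String>>();
        byFamily.put("a.1.2.3", Arrays.asList("domainA", "domainB", "domainC"));
        byFamily.put("b.1.4.7", Arrays.asList("domainD"));

        // One representative per family; the choice within a.1.2.3 varies per run.
        Map<String, String> nonRedundant = pickOnePerKey(byFamily, new Random());
        System.out.println(nonRedundant);
    }
}
```

Passing a seeded Random makes the selection reproducible, which may be useful when the same non-redundant set has to be regenerated later.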
Domain Interactions
Each pair of interacting domains can
be accessed by their pairwise domain classification to any level of
the domain hierarchy for SCOP and CATH and at the family level for
Pfam in a symmetric way. For example for the SCOP domain definition
at the family level of similarity there is a map which stores the
name of the pairwise SCOP family (e.g. a.1.1.1-b.1.4.7) as the key
and a list of all of the domain interactions with this
classification as the value. A non-redundant set of domain
interactions using the SCOP family level classification can easily
be generated by taking a random example of each domain interaction
from each pairwise SCOP family employing this map structure.
Pseudo code
//This gets the Domain-Domain Interactions classified by their SCOP family pair (the family level is denoted by 4)
Map<Pair<Family>, Collection<DomainInteraction>> domIntsHashedByFamilyPair =
    DomainInteractions.getDomainInteractionsHashedByFamilyPair(SCOP.class, 4);
for (Map.Entry<Pair<Family>, Collection<DomainInteraction>> map : domIntsHashedByFamilyPair.entrySet())
{
    //The Domain Interaction Family Classification is obtained by map.getKey() e.g. SCOP pairwise family a.1.2.3 interacting
    // with b.1.4.7. The print statement below would give "SCOP pairwise family = a.1.2.3,b.1.4.7"
    System.out.println("SCOP pairwise family = " + map.getKey());
    //This iterates through all the Domain Interactions with the same Family classification
    for (DomainInteraction domainInteraction : map.getValue())
    {
        //Do Something
    }
}
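Accessing interactions "in a symmetric way" implies that a lookup for family a interacting with family b finds the same entry as a lookup for b interacting with a. How the SNAPPI Pair class achieves this is not described here; the following is a minimal sketch of one common approach, an unordered pair whose equals and hashCode ignore the order of its members. UnorderedPair is a hypothetical class written for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class SymmetricPairSketch {

    // Hypothetical unordered pair; the real SNAPPI Pair class may differ.
    static final class UnorderedPair<T> {
        private final T first;
        private final T second;

        UnorderedPair(T first, T second) {
            if (first == null || second == null) {
                throw new IllegalArgumentException("pair members must not be null");
            }
            this.first = first;
            this.second = second;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true;
            }
            if (!(other instanceof UnorderedPair)) {
                return false;
            }
            UnorderedPair<?> that = (UnorderedPair<?>) other;
            // Equal if the members match in either order.
            return (first.equals(that.first) && second.equals(that.second))
                || (first.equals(that.second) && second.equals(that.first));
        }

        @Override
        public int hashCode() {
            // Addition is commutative, so (a, b) and (b, a) hash identically.
            return first.hashCode() + second.hashCode();
        }

        @Override
        public String toString() {
            return first + "," + second;
        }
    }

    public static void main(String[] args) {
        Map<UnorderedPair<String>, Integer> interactionCounts =
            new HashMap<UnorderedPair<String>, Integer>();
        // The count of 3 is made up for illustration.
        interactionCounts.put(new UnorderedPair<String>("a.1.1.1", "b.1.4.7"), 3);
        // Looking up the reversed pair finds the same entry.
        System.out.println(interactionCounts.get(new UnorderedPair<String>("b.1.4.7", "a.1.1.1"))); // prints 3
    }
}
```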
OrientationSimilarInteractions
Each pair of interacting domains can also
be accessed by the orientation of their interaction. As with the DomainInteractions above, each domain-domain interaction is classified by its family pair, but the interactions are then further classified by the orientation of the interaction, giving a list of lists of domain-domain interactions for each pairwise family. For example, for the SCOP domain definition
at the family level of similarity there is a map which stores the
name of the pairwise SCOP family (e.g. a.1.1.1-b.1.4.7) as the key
and, as the value, a list of lists of all of the domain interactions with this
family classification, grouped by orientation. These lists store OrientatedDomInt objects rather than DomainInteraction objects. An OrientatedDomInt contains a DomainInteraction together with additional information regarding the transform and alignment of the DomainInteraction.
Pseudo code
//This gets the Domain-Domain Interactions classified by their SCOP family pair (the family level is denoted by 4) and by orientation
Map<Pair<Family>, Collection<Collection<OrientatedDomInt>>> interactionsHashedByFamilyPair =
    OrientationSimilarInteractions.getDomainInteractionsHashedByFamilyPair(SCOP.class, 4);
for (Map.Entry<Pair<Family>, Collection<Collection<OrientatedDomInt>>> map : interactionsHashedByFamilyPair.entrySet())
{
    //The Domain Interaction Family Classification is obtained by map.getKey() e.g. SCOP pairwise family a.1.2.3 interacting
    // with b.1.4.7. The print statement below would give "SCOP pairwise family = a.1.2.3,b.1.4.7"
    System.out.println("SCOP pairwise family = " + map.getKey());
    //This iterates through all the Collections of Domain Interactions classified by orientation with the same Family classification
    for (Collection<OrientatedDomInt> collection : map.getValue())
    {
        //This iterates through all the Domain Interactions that share an orientation
        for (OrientatedDomInt orientatedDomInt : collection)
        {
            DomainInteraction domainInteraction = orientatedDomInt.getDomainInteraction();
            //Do Something
        }
    }
}
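The list-of-lists value can be reduced in the same spirit as the non-redundant sets described for Domains and DomainInteractions. As a sketch of one plausible use, not something the API is stated to provide, the helper below keeps the first member of each orientation group, giving one representative interaction per orientation for every family pair. onePerOrientation is a hypothetical name, and strings stand in for the Pair<Family> keys and OrientatedDomInt values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class OrientationGroupsSketch {

    // Keeps the first member of each non-empty orientation group, per key.
    static <K, V> Map<K, List<V>> onePerOrientation(Map<K, Collection<Collection<V>>> grouped) {
        Map<K, List<V>> result = new LinkedHashMap<K, List<V>>();
        for (Map.Entry<K, Collection<Collection<V>>> entry : grouped.entrySet()) {
            List<V> representatives = new ArrayList<V>();
            for (Collection<V> orientationGroup : entry.getValue()) {
                if (!orientationGroup.isEmpty()) {
                    representatives.add(orientationGroup.iterator().next());
                }
            }
            result.put(entry.getKey(), representatives);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up interaction labels in two orientation groups, for illustration only.
        Collection<Collection<String>> orientations = new ArrayList<Collection<String>>();
        orientations.add(Arrays.asList("interaction1", "interaction2"));
        orientations.add(Arrays.asList("interaction3"));

        Map<String, Collection<Collection<String>>> byFamilyPair =
            new LinkedHashMap<String, Collection<Collection<String>>>();
        byFamilyPair.put("a.1.1.1-b.1.4.7", orientations);

        System.out.println(onePerOrientation(byFamilyPair));
    }
}
```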
Java Data Objects Technology (JDO)
JDO is an
object persistence framework for the Java language which allows the
storage, retrieval and querying of objects. JDO for biological data
was extensively investigated in Srdanovic et al. In essence the JDO interface provides an automatic mapping between a
data-store and a Java object. This approach has many benefits:
- JDO reduces development time, as performing complex queries
using this technology is easier than accessing a relational database
directly via SQL.
- Employing JDO removes the difficulty of mapping objects to
a relational database. The problem of mapping between objects and
relational databases is commonly known as the "object-relational
impedance mismatch", or simply "impedance mismatch". The difficulty is caused by the fact
that in the object-oriented programming paradigm data is traversed
via the relationships between objects whereas in the relational
database paradigm data is traversed by joining table rows.
- The JDO specification is intentionally data-store agnostic
and so the JDO interface is the same regardless of the database
back-end. Possible data-stores include relational databases, object
databases, file systems and XML documents. The choice of data-store
will depend upon the user requirements. For example, a relational
database is preferable if queries are to be performed by another
application. In the case of high performance data mining an object
oriented data-store has many advantages over other data-store
mechanisms such as lack of SQL overhead, speed and direct two way
references. SNAPPI currently uses an object-oriented data-store,
however, if required the data could be ported to a relational
database and the same API used.
- JDO allows flexibility by storing only the objects that
need to be persistent. The objects that are to be made persistent
are described in an XML document. This enables the JDO
implementation to determine which objects are to be stored and which objects
are transient.
- Biological data is more suited to the object model than the relational model.
Accordingly, SNAPPI-DB employs the JDO interface with an
object-oriented database as the data store (FastObjects community
edition implementation).
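The contrast behind the "impedance mismatch" benefit listed above, following object references versus joining table rows, can be illustrated without JDO itself. The sketch below is plain Java with hypothetical classes (ResidueObject, AtomRow) and made-up data; it does not use the JDO API or any SNAPPI class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TraversalVersusJoinSketch {

    // Object model: a residue holds direct references to its atoms.
    static final class ResidueObject {
        final String name;
        final List<String> atomNames = new ArrayList<String>();

        ResidueObject(String name) {
            this.name = name;
        }
    }

    // Relational model: an atom row refers to its residue only through a key.
    static final class AtomRow {
        final int residueId;
        final String atomName;

        AtomRow(int residueId, String atomName) {
            this.residueId = residueId;
            this.atomName = atomName;
        }
    }

    // Object traversal: follow the reference held by the residue.
    static List<String> atomsByTraversal(ResidueObject residue) {
        return residue.atomNames;
    }

    // Relational traversal: scan the atom table and match on the foreign key,
    // which is the work a SQL join on the residue id would perform.
    static List<String> atomsByJoin(int residueId, List<AtomRow> atomTable) {
        List<String> names = new ArrayList<String>();
        for (AtomRow row : atomTable) {
            if (row.residueId == residueId) {
                names.add(row.atomName);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        ResidueObject glycine = new ResidueObject("GLY");
        glycine.atomNames.addAll(Arrays.asList("N", "CA"));

        List<AtomRow> atomTable = new ArrayList<AtomRow>();
        atomTable.add(new AtomRow(1, "N"));
        atomTable.add(new AtomRow(1, "CA"));
        atomTable.add(new AtomRow(2, "N"));

        System.out.println(atomsByTraversal(glycine)); // prints [N, CA]
        System.out.println(atomsByJoin(1, atomTable)); // prints [N, CA]
    }
}
```

Both calls return the same atoms, but the second has to reconstruct the relationship from keys on every query, which is the mapping work that an object persistence layer takes over.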
Portability
The API and database are available for both Linux and Windows operating systems. Some of the programs used to generate SNAPPI-DB need to run through Cygwin on Windows, so Cygwin must be installed if generating SNAPPI-DB from scratch.
To download the API and documentation click here.