SNAPPI-DB API (Application Programming Interface)
A Java 5 API has been developed to allow rapid development and fast database queries without any requirement for complex SQL queries. Java 5 is employed as it provides many features that were not available in previous versions of Java such as generics, enhanced for loops and auto-boxing/unboxing.

The same notation as the MSD is employed so that users familiar with the MSD can seamlessly learn the SNAPPI API. The same serial numbering is also used so that structures can be mapped back to the MSD.

As the database is object-oriented it must be accessed through this API. The API comes fully documented and should be easy to use and understand. The API allows immediate use of SNAPPI-DB (which can be downloaded here) or the ability to generate your own version of the database from a local version of the MSD.

API Structure

The simplified UML diagram above shows the overall structure of the API. Although there are many different ways of navigating through the data in SNAPPI, the database is optimised for searching from one of the four roots shown at the top of the tree: Entries (contains each PDB structure), Domains (domains classified by their domain Family/Superfamily), DomainInteractions (domain-domain interactions classified by their domain Family/Superfamily pair) or OrientationSimilarInteractions (domain-domain interactions classified by their interaction interface). Each of these access methods is described below.

Methods for Accessing the Data

Entries

Navigation through each PDB Entry is straightforward as the data is stored in a hierarchical structure, as shown in the UML diagram above. The database stores a list of each PDB Entry in Entries. Each Entry contains one or more Assemblies (PQS predicted structures) and each Assembly contains one or more Chains. Each Chain contains one or more Residues and in turn each Residue contains one or more Atoms. Each level of the hierarchy also contains other information relevant to the item. For example, each Atom contains the coordinate positions of the Atom. The Assemblies also contain domains and domain interactions for SCOP, CATH and Pfam.

Pseudo code

//This iterates through all of the PDB entries stored in the database
for (Entry e : Entries.getEntries())
{
     //This iterates through all of the PQS Assemblies within an Entry
     for (Assembly assembly : e.getAssemblies())
     {
          //This iterates through all of the Chains within an Assembly
          for (Chain c : assembly.getChains())
          {
               //This iterates through all of the Residues within a Chain
               for (Residue r : c.getResidues())
               {
                    //This iterates through all of the Atoms within a Residue
                    for (Atom a : r.getAtoms())
                    {
                         //This gets the coordinates for an Atom
                         float[] coordinates = a.getCoordinates();
                         //This prints out the coordinates for an Atom
                         System.out.println(coordinates[0] + "," + coordinates[1] + "," + coordinates[2]);
                    }
               }
          }
     }
}

Domains

Domains can be easily accessed by their domain classification to any level of the domain hierarchy for SCOP and CATH and at the family level for Pfam. For example for the SCOP domain definition, at the family level of similarity, there is a map which stores the name of the SCOP family (e.g. a.1.2.3) as the key and a list of all of the domains with this classification as the value. A non-redundant set of domains using the SCOP family level classification can easily be generated by taking a random example of each domain from each SCOP family employing this map structure.

Pseudo code

//This gets the Domains grouped by their SCOP family classification (the family level is denoted by 4)
Map<Family, Collection<Domain>> domainsHashedByFamily = Domains.getDomainsHashedByFamily(SCOP.class, 4);
for (Map.Entry<Family, Collection<Domain>> entry : domainsHashedByFamily.entrySet())
{
     //The Domain Family Classification is obtained by entry.getKey() e.g. SCOP family a.1.2.3
     System.out.println("SCOP family = " + entry.getKey());

     //This iterates through all the Domains with the same Family classification
     for (Domain domain : entry.getValue())
     {
          //This iterates through all of the Residues within a Domain
          //(assumes Domain provides getResidues() in the same way as Chain)
          for (Residue r : domain.getResidues())
          {
               //This iterates through all of the Atoms within a Residue
               for (Atom a : r.getAtoms())
               {
                    //This gets the coordinates for an Atom
                    float[] coordinates = a.getCoordinates();
                    //This prints out the coordinates for an Atom
                    System.out.println(coordinates[0] + "," + coordinates[1] + "," + coordinates[2]);
               }
          }
     }
}
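
The non-redundant set described above could be built along the following lines. This is a minimal sketch, not part of the SNAPPI-DB API: the class NonRedundantSample and the method pickOnePerGroup are hypothetical names, and plain strings stand in for the Family and Domain objects so that the example runs on its own. With the real API, the map returned by Domains.getDomainsHashedByFamily would take the place of the hand-built map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical helper class; not part of the SNAPPI-DB API.
class NonRedundantSample {

    // Picks one randomly chosen member from each non-empty group.
    // K stands in for a family key and V for a domain.
    static <K, V> Map<K, V> pickOnePerGroup(Map<K, Collection<V>> groups, Random random) {
        Map<K, V> representatives = new LinkedHashMap<K, V>();
        for (Map.Entry<K, Collection<V>> entry : groups.entrySet()) {
            // Copy to a list so that a member can be selected by index
            List<V> members = new ArrayList<V>(entry.getValue());
            if (!members.isEmpty()) {
                representatives.put(entry.getKey(), members.get(random.nextInt(members.size())));
            }
        }
        return representatives;
    }

    public static void main(String[] args) {
        // Placeholder data: the family names and domain labels are made up for illustration
        Map<String, Collection<String>> domainsByFamily = new LinkedHashMap<String, Collection<String>>();
        domainsByFamily.put("a.1.2.3", Arrays.asList("domainA", "domainB", "domainC"));
        domainsByFamily.put("b.1.4.7", Arrays.asList("domainD"));

        Map<String, String> nonRedundant = pickOnePerGroup(domainsByFamily, new Random());
        for (Map.Entry<String, String> entry : nonRedundant.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Passing a seeded Random (for example new Random(42)) makes the selection repeatable between runs, which may be useful when the same non-redundant set has to be regenerated.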

Domain Interactions

Each pair of interacting domains can be accessed by their pairwise domain classification to any level of the domain hierarchy for SCOP and CATH and at the family level for Pfam in a symmetric way. For example for the SCOP domain definition at the family level of similarity there is a map which stores the name of the pairwise SCOP family (e.g. a.1.1.1-b.1.4.7) as the key and a list of all of the domain interactions with this classification as the value. A non-redundant set of domain interactions using the SCOP family level classification can easily be generated by taking a random example of each domain interaction from each pairwise SCOP family employing this map structure.

Pseudo code

//This gets the Domain-Domain Interactions grouped by their SCOP family pair (the family level is denoted by 4)
Map<Pair<Family>, Collection<DomainInteraction>> domIntsHashedByFamilyPair =
          DomainInteractions.getDomainInteractionsHashedByFamilyPair(SCOP.class, 4);
for (Map.Entry<Pair<Family>, Collection<DomainInteraction>> entry : domIntsHashedByFamilyPair.entrySet())
{
     //The Domain Interaction Family Classification is obtained by entry.getKey() e.g. SCOP pairwise family a.1.2.3 interacting
     //with b.1.4.7. The print statement below would give "SCOP pairwise family = a.1.2.3,b.1.4.7"
     System.out.println("SCOP pairwise family = " + entry.getKey());

     //This iterates through all the Domain Interactions with the same Family classification
     for (DomainInteraction domainInteraction : entry.getValue())
     {
          //Do Something
     }
}
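
The symmetric access mentioned above means that a pair such as a.1.1.1-b.1.4.7 and its reverse should lead to the same interactions. How SNAPPI-DB's own Pair class achieves this is not described here. The sketch below shows one common way to obtain that behaviour, using a hypothetical UnorderedPair class whose equals and hashCode ignore the order of the two members. All names in it are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; SNAPPI-DB's Pair class may work differently.
class SymmetricPairDemo {

    // An unordered pair: (x, y) and (y, x) are treated as equal.
    static final class UnorderedPair<T> {
        private final T first;
        private final T second;

        UnorderedPair(T first, T second) {
            if (first == null || second == null) {
                throw new NullPointerException("pair members must not be null");
            }
            this.first = first;
            this.second = second;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true;
            }
            if (!(other instanceof UnorderedPair)) {
                return false;
            }
            UnorderedPair<?> that = (UnorderedPair<?>) other;
            return (first.equals(that.first) && second.equals(that.second))
                || (first.equals(that.second) && second.equals(that.first));
        }

        @Override
        public int hashCode() {
            // Addition is commutative, so both orderings give the same hash code
            return first.hashCode() + second.hashCode();
        }

        @Override
        public String toString() {
            return first + "," + second;
        }
    }

    public static void main(String[] args) {
        // Placeholder value: the string stands in for a collection of interactions
        Map<UnorderedPair<String>, String> byFamilyPair = new HashMap<UnorderedPair<String>, String>();
        byFamilyPair.put(new UnorderedPair<String>("a.1.1.1", "b.1.4.7"), "example interactions");

        // Looking up the pair in reverse order finds the same map entry
        System.out.println(byFamilyPair.get(new UnorderedPair<String>("b.1.4.7", "a.1.1.1")));
    }
}
```

Because the key compares equal in either order, each family pair is stored once and no second, reversed entry has to be kept in the map.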

OrientationSimilarInteractions

Each pair of interacting domains can also be accessed by its interaction orientation. As with the DomainInteractions above, each domain-domain interaction is classified by its family pair, but the interactions are then further classified by the orientation of the interaction, giving a list of lists of domain-domain interactions for each pairwise family. For example, for the SCOP domain definition at the family level of similarity, there is a map which stores the name of the pairwise SCOP family (e.g. a.1.1.1-b.1.4.7) as the key and, as the value, a list of lists of all of the domain interactions with this family classification, grouped by orientation. These lists store OrientatedDomInt objects rather than DomainInteraction objects. An OrientatedDomInt contains a DomainInteraction and additional information regarding the transform and alignment of the DomainInteraction.

Pseudo code

//This gets the Domain-Domain Interactions grouped by their SCOP family pair (the family level is denoted by 4)
Map<Pair<Family>, Collection<Collection<OrientatedDomInt>>> interactionsHashedByFamilyPair =
          OrientationSimilarInteractions.getDomainInteractionsHashedByFamilyPair(SCOP.class, 4);

for (Map.Entry<Pair<Family>, Collection<Collection<OrientatedDomInt>>> entry : interactionsHashedByFamilyPair.entrySet())
{
     //The Domain Interaction Family Classification is obtained by entry.getKey() e.g. SCOP pairwise family a.1.2.3 interacting
     //with b.1.4.7. The print statement below would give "SCOP pairwise family = a.1.2.3,b.1.4.7"
     System.out.println("SCOP pairwise family = " + entry.getKey());

     //This iterates through the Collections of Domain Interactions, one Collection per orientation,
     //that share the same Family pair classification
     for (Collection<OrientatedDomInt> collection : entry.getValue())
     {
          //This iterates through all the Domain Interactions with the same orientation
          for (OrientatedDomInt orientatedDomInt : collection)
          {
               DomainInteraction domainInteraction = orientatedDomInt.getDomainInteraction();
               //Do Something
          }
     }
}


Java Data Objects Technology (JDO)

JDO is an object persistence framework for the Java language which allows the storage, retrieval and querying of objects. JDO for biological data was extensively investigated in Srdanovic et al. In essence the JDO interface provides an automatic mapping between a data-store and a Java object. This approach has many benefits:

  • Reduces development time, as performing complex queries
    using this technology is easier than accessing a relational database directly via SQL.
  • Employing JDO removes the difficulty of mapping objects to a relational database. The problem of mapping between objects and relational databases is commonly known as the "object-relational impedance mismatch", or simply "impedance mismatch". The difficulty is caused by the fact that in the object-oriented programming paradigm data is traversed via the relationships between objects whereas in the relational database paradigm data is traversed by joining table rows.
  • The JDO specification is intentionally data-store agnostic and so the JDO interface is the same regardless of the database back-end. Possible data-stores include relational databases, object databases, file systems and XML documents. The choice of data-store will depend upon the user requirements. For example, a relational database is preferable if queries are to be performed by another application. In the case of high performance data mining, an object-oriented data-store has many advantages over other data-store mechanisms, such as lack of SQL overhead, speed and direct two-way references. SNAPPI currently uses an object-oriented data-store; however, if required, the data could be ported to a relational database and the same API used.
  • JDO allows flexibility by storing only the objects that need to be persistent. The objects that are to be made persistent are described in an XML document. This enables the JDO implementation to determine which objects are to be stored and which objects are transient.
  • Biological data is more suited to the object model than the relational model.

Accordingly, SNAPPI-DB employs the JDO interface with an object-oriented database as the data store (FastObjects community edition implementation).
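
The difference between the two paradigms described in the impedance mismatch point can be illustrated with a small, self-contained sketch. It does not use JDO or the SNAPPI-DB classes: ChainObject and ResidueRow are hypothetical stand-ins, the residue names are placeholder data, and the "join" is simply a scan over an in-memory list of rows that matches on a key.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of object traversal versus a relational-style join.
class TraversalVersusJoin {

    // Object model: a chain holds direct references to its residues.
    static class ChainObject {
        final String id;
        final List<String> residues = new ArrayList<String>();

        ChainObject(String id) {
            this.id = id;
        }
    }

    // Relational model: each residue row points back to its chain by key.
    static class ResidueRow {
        final String chainId;
        final String name;

        ResidueRow(String chainId, String name) {
            this.chainId = chainId;
            this.name = name;
        }
    }

    // Traversal: follow the reference that the object already holds
    static List<String> residuesByTraversal(ChainObject chain) {
        return chain.residues;
    }

    // Join: scan the residue table and keep the rows whose key matches the chain
    static List<String> residuesByJoin(String chainId, List<ResidueRow> residueTable) {
        List<String> result = new ArrayList<String>();
        for (ResidueRow row : residueTable) {
            if (row.chainId.equals(chainId)) {
                result.add(row.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ChainObject chainA = new ChainObject("A");
        chainA.residues.add("ALA");
        chainA.residues.add("SER");

        List<ResidueRow> residueTable = new ArrayList<ResidueRow>();
        residueTable.add(new ResidueRow("A", "ALA"));
        residueTable.add(new ResidueRow("B", "GLY"));
        residueTable.add(new ResidueRow("A", "SER"));

        // Both routes yield the residues of chain A
        System.out.println(residuesByTraversal(chainA));
        System.out.println(residuesByJoin("A", residueTable));
    }
}
```

In the object version the relationship is part of the data itself, whereas in the row version it has to be rebuilt by matching keys each time. An object persistence layer such as JDO is intended to let application code be written in the first style regardless of how the data is stored.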

Portability

The API and database are available for both Linux and Windows operating systems. Some of the programs used to generate SNAPPI-DB must be run through Cygwin on Windows, so Cygwin needs to be installed if generating SNAPPI-DB from scratch on that platform.

To download the API and documentation click here.