
Protein domain databases

The program PDBC may be used to output a set of STAMP-readable domain descriptions, given a list of four-letter Brookhaven (PDB) codes and an optional set of chains. This will only work if you have a suitable `pdb.directories' file. See the chapter on installation for details on how to set this up. For example, the commands

pdbc -d 2hhba >! globin_fold.domains
pdbc -d 2hhbb >> globin_fold.domains
pdbc -d 4mbn  >> globin_fold.domains
pdbc -d 1lh1  >> globin_fold.domains
pdbc -d 1cola >> globin_fold.domains
pdbc -d 1cpca >> globin_fold.domains

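(written in csh/tcsh syntax, where `>!` overwrites any existing file and `>>` appends) will leave the following in globin_fold.domains (ignoring comments, which are marked by a `%` in the first column):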

/(PDB PATH)/pdb2hhb.ent 2hhba { CHAIN A }
/(PDB PATH)/pdb2hhb.ent 2hhbb { CHAIN B }
/(PDB PATH)/pdb4mbn.ent 4mbn { ALL }
/(PDB PATH)/pdb1lh1.ent 1lh1 { ALL }
/(PDB PATH)/pdb1col.ent 1cola { CHAIN A }
/(PDB PATH)/pdb1cpc.ent 1cpca { CHAIN A }

Here (PDB PATH) denotes the location of the relevant PDB file on your system. Your PDB files may be called (code).pdb instead, or may follow some other convention. This is fine; see Chapter 5 (installation) for details on setting this up.

Note that the domain file does not need to contain a real filename. You can leave it as `Unknown` or any other string (but not blank space), and the programs will try to find the file corresponding to the four-letter code on your system. In other words, the files given in this distribution should work on your system, provided that you have all the PDB files.
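To make the layout of a domain file concrete, the following Python sketch reads descriptors of the form shown above. It is not part of STAMP. It assumes each descriptor sits on a single line and that the filename contains no spaces; the names `Domain` and `parse_domains` and the path `/data/pdb/` are hypothetical and chosen only for illustration.

```python
import re
from typing import List, NamedTuple


class Domain(NamedTuple):
    path: str   # PDB filename, or a placeholder such as "Unknown"
    ident: str  # domain identifier, e.g. "2hhba"
    spec: str   # text between the braces, e.g. "CHAIN A" or "ALL"


# One descriptor per line: <file> <identifier> { <specification> }
# (assumption: no spaces in the filename, descriptor not split over lines)
_DOMAIN_RE = re.compile(r"^(\S+)\s+(\S+)\s*\{\s*(.*?)\s*\}\s*$")


def parse_domains(text: str) -> List[Domain]:
    """Return the descriptors in `text`, skipping blanks and '%' comments."""
    domains = []
    for line in text.splitlines():
        # Comments are marked by '%' in the first column.
        if not line.strip() or line.startswith("%"):
            continue
        match = _DOMAIN_RE.match(line.strip())
        if match is None:
            raise ValueError(f"not a single-line domain descriptor: {line!r}")
        domains.append(Domain(*match.groups()))
    return domains


# Hypothetical file contents; /data/pdb/ stands in for (PDB PATH).
example = """\
% globin fold domains
/data/pdb/pdb2hhb.ent 2hhba { CHAIN A }
/data/pdb/pdb4mbn.ent 4mbn { ALL }
Unknown 1lh1 { ALL }
"""

for domain in parse_domains(example):
    print(domain.ident, domain.path, domain.spec)
```

The last entry uses the `Unknown` placeholder described above. The sketch only records it; locating the actual PDB file is left to the STAMP programs.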

PDBC can also be used to probe information about a PDB entry by using the `-q' option. Try it and see: this is a good test of whether STAMP has been set up properly on your system. If you just want to test where STAMP is looking for PDB and DSSP files, use the `-m' (minimal) option, which reports any PDB/DSSP files found and then exits.

STAMP database comparisons are computationally intensive, so it is prudent to avoid redundant comparisons (e.g. against multiple mutants or binding studies of the same protein, such as T4 lysozyme).

The STAMP distribution contains a series of non-redundant databases derived by parsing the SCOP database (release 1.75). These are located in the `STAMPDIR' directory. The files were created using the scop2stamp program, which can be found in the `bin/' directory of the STAMP installation. Running `scop2stamp' without arguments lists the options that this program accepts.

Domain database     Domains (N)   Description
scop.dom            109747        All PDB domains classified in SCOP
scop_species.dom    13816         One representative per species of each SCOP protein
scop_prot.dom       9621          One representative of each SCOP protein
scop_fam.dom        3883          One representative of each SCOP family
scop_supf.dom       1950          One representative of each SCOP superfamily
scop_fold.dom       1190          One representative of each SCOP fold

The complete set of SCOP domains (scop.dom) contains a high degree of redundancy. The time required to search it will depend on your system, but you should expect it to take on the order of 20 hours of CPU time on the current generation of processors. If you have access to multiple CPUs, you can divide the database into subsets, search the subsets in parallel, and then aggregate the search outputs into a single results file, which can be filtered using SORTTRANS in the usual way.
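The division into subsets can be done with any tool that handles text files. The Python sketch below is one possible approach, not something supplied with STAMP. It assumes one domain descriptor per line, and the function names and the `.partN` output filenames are hypothetical. It covers only the splitting step; running the searches and merging their outputs is not shown.

```python
from typing import List


def split_domains(lines: List[str], n_parts: int) -> List[List[str]]:
    """Deal descriptor lines round-robin into n_parts subsets."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    # Keep only descriptors: drop blank lines and '%' comment lines.
    entries = [ln for ln in lines if ln.strip() and not ln.startswith("%")]
    # Subset i receives entries i, i + n_parts, i + 2 * n_parts, ...
    return [entries[i::n_parts] for i in range(n_parts)]


def write_subsets(db_path: str, n_parts: int) -> List[str]:
    """Write <db_path>.part1 ... <db_path>.partN and return their names."""
    with open(db_path) as handle:
        subsets = split_domains(handle.read().splitlines(), n_parts)
    names = []
    for index, subset in enumerate(subsets, start=1):
        name = f"{db_path}.part{index}"  # hypothetical naming scheme
        with open(name, "w") as out:
            out.writelines(line + "\n" for line in subset)
        names.append(name)
    return names
```

For example, `write_subsets("scop.dom", 8)` would write eight files, `scop.dom.part1` to `scop.dom.part8`, each of which can be searched separately. Dealing entries round-robin rather than in contiguous blocks is a design choice intended to spread neighbouring entries across subsets; whether that balances the run times depends on how the database file is ordered.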

