
Protein domain databases

The program PDBC may be used to output a set of STAMP-readable domain descriptions, given a list of four-letter Brookhaven codes and an optional set of chains. This will only work if you have a suitable `pdb.directories' file. See the chapter on installation for details on how to set this up. For example, the commands

pdbc -d 2hhba >! globin_fold.domains
pdbc -d 2hhbb >> globin_fold.domains
pdbc -d 4mbn  >> globin_fold.domains
pdbc -d 1lh1  >> globin_fold.domains
pdbc -d 1cola >> globin_fold.domains
pdbc -d 1cpca >> globin_fold.domains

will produce output of the following form in globin_fold.domains (ignoring comments, which are marked by a `%` in column 0):

/(PDB PATH)/pdb2hhb.ent 2hhba { CHAIN A }
/(PDB PATH)/pdb2hhb.ent 2hhbb { CHAIN B }
/(PDB PATH)/pdb4mbn.ent 4mbn { ALL }
/(PDB PATH)/pdb1lh1.ent 1lh1 { ALL }
/(PDB PATH)/pdb1col.ent 1cola { CHAIN A }

Here (PDB PATH) denotes the location of the relevant PDB file on your system. Your PDB files may instead be called (code).pdb, or may follow some other naming convention. This is fine; see Chapter 5 (installation) for details on setting this up.
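The repeated pdbc commands above can also be written as a loop. The following is a minimal sketch in Bourne-style shell (the commands above use csh redirection); the function name build_domain_file is hypothetical and not part of STAMP, and the only pdbc usage assumed is the `-d` form shown above.

```shell
# Hypothetical helper (not part of STAMP): write one domain description
# per code to a file, using the `pdbc -d <code>` form shown above.
build_domain_file() {
    out=$1
    shift
    : > "$out"                          # start with an empty output file
    for code in "$@"; do
        # Append the description for this code; stop if pdbc fails.
        pdbc -d "$code" >> "$out" || return 1
    done
}

# Usage (requires a working pdbc and pdb.directories file):
#   build_domain_file globin_fold.domains 2hhba 2hhbb 4mbn 1lh1 1cola 1cpca
```

Collecting the codes in one argument list keeps the set of structures in a single place, which may be easier to maintain than a series of separate commands.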

Note that the domain file does not need to contain a filename. One can simply put `Unknown` or some other non-blank string in its place, and the programs will try to find the file corresponding to the four-letter code on your system. In other words, the domain files given in this distribution should work on your system, provided that you have all the PDB files.
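As an illustration of the placeholder convention, a domain file written on one machine could be made independent of that machine's paths by replacing the first field of each entry with `Unknown`. This is only a sketch: the function name strip_domain_paths is hypothetical, and it assumes that the filename is the first whitespace-separated field and that comment lines begin with `%`, as described above.

```shell
# Hypothetical helper (not part of STAMP): replace the filename field of
# each domain entry with the placeholder string `Unknown`.
# Comment lines (starting with '%') and blank lines are passed through.
strip_domain_paths() {
    awk '/^%/ || NF == 0 { print; next }
         { $1 = "Unknown"; print }' "$1"
}

# Usage:
#   strip_domain_paths globin_fold.domains > globin_fold.portable.domains
```

Note that awk rewrites each modified line with single spaces between fields, so any original column alignment is lost.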

Sensitive STAMP database comparisons can take a long time. For this reason we have compared the current PDB database to itself based on sequence, and clustered the data so that only one member of each sequence family is in our domain database. We have also split these structures into domains using author definitions. This database, even when a high degree of sequence similarity is required for clustering, drastically reduces the size of the Brookhaven database (from over 4000 independent chains to just under 600 protein domains). It is probably sensible to scan this representative database first; if something interesting is found, one can then scan the other structures related to that representative by sequence. A copy of this representative database is in the file brookhaven_subset.domains.
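To see how many domains a given database file actually contains, one can count its non-comment entries. The sketch below assumes one domain description per line and `%` comments in column 0, as in the example output earlier in this section; the function name count_domains is hypothetical.

```shell
# Hypothetical helper (not part of STAMP): count domain entries in a
# domain file, ignoring '%' comment lines and blank lines.
# Assumes each domain description occupies a single line.
count_domains() {
    awk '!/^%/ && NF > 0 { n++ } END { print n + 0 }' "$1"
}

# Usage:
#   count_domains brookhaven_subset.domains
```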

Note that PDBC can also be used to query information about a PDB entry with the `-q' option. Try it and see. This is a good test of whether STAMP has been set up properly on your system.


Rob Russell and Geoff Barton