Given the complexity of the network ensemble used for the final predictions (Figure 1), it was essential that the prediction method be `blind tested' on a new set of proteins. The CASP3 proteins would have made a good blind test. However, by the time this work was completed, the CASP3 experiment was 6 months old, and the alignments would have contained new sequences. In addition, the CASP3 set contained only 17 structures and so has limited statistical value.
The 480 proteins used to train and test the new prediction method were derived from the 1996 release of the Protein Data Bank (PDB). Since then, more than 3,378 new protein structures have been released. This new set of structures provides the basis for a separate non-redundant test set. Chains from the 3,378 proteins were compared pairwise with AMPS, and the set was screened so that no pair of sequences had a significance score above 5 SD. The new non-redundant sequences of known structure were then compared with the 480 proteins used to train the prediction method, and the same 5 SD cutoff was applied. This resulted in a set of 406 protein sequences with which to blind test the prediction methods.
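The two-stage screen described above can be illustrated with a minimal Python sketch. It is not the procedure actually run for this work: the greedy keep-or-drop order, the function names, and the `toy_score` function are all assumptions made for illustration. In particular, `toy_score` is a crude stand-in that does not compute an AMPS SD significance score; any pairwise scoring function with the same call signature could be substituted.

```python
from typing import Callable, Dict, List

# A pairwise scoring function: takes two sequences, returns a similarity score.
ScoreFn = Callable[[str, str], float]


def toy_score(a: str, b: str) -> float:
    """Hypothetical placeholder score (NOT an AMPS SD score).

    Returns 10 * (fraction of identical positions), for illustration only.
    """
    matches = sum(x == y for x, y in zip(a, b))
    return 10.0 * matches / max(len(a), len(b))


def filter_redundant(candidates: Dict[str, str], score: ScoreFn,
                     cutoff: float = 5.0) -> List[str]:
    """Stage 1 (assumed greedy order): keep a chain only if its score
    against every chain already kept does not exceed the cutoff."""
    kept: List[str] = []
    for name, seq in candidates.items():
        if all(score(seq, candidates[k]) <= cutoff for k in kept):
            kept.append(name)
    return kept


def remove_similar_to_training(names: List[str], candidates: Dict[str, str],
                               training: Dict[str, str], score: ScoreFn,
                               cutoff: float = 5.0) -> List[str]:
    """Stage 2: drop any chain scoring above the cutoff against a training chain."""
    return [
        n for n in names
        if all(score(candidates[n], t) <= cutoff for t in training.values())
    ]


def build_blind_test_set(candidates: Dict[str, str], training: Dict[str, str],
                         score: ScoreFn, cutoff: float = 5.0) -> List[str]:
    """Apply both stages with the same cutoff, as in the text."""
    nonredundant = filter_redundant(candidates, score, cutoff)
    return remove_similar_to_training(nonredundant, candidates, training,
                                      score, cutoff)


if __name__ == "__main__":
    # Made-up sequences, purely to exercise the two stages.
    new_chains = {
        "new1": "AAAAAAAAAA",
        "new2": "AAAAAAAAAC",  # near-duplicate of new1, removed in stage 1
        "new3": "GGGGGGGGGG",
        "new4": "TTTTTTTTTT",  # close to a training chain, removed in stage 2
    }
    training_chains = {"train1": "TTTTTTTTTA"}
    print(build_blind_test_set(new_chains, training_chains, toy_score))
    # -> ['new1', 'new3']
```

A greedy filter of this kind depends on the order in which chains are visited, so a different ordering can retain a different (and differently sized) non-redundant set.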