About the polyAdb project

The polyAdb dataset browser originated as a joint project between the Barton Group and the Simpson lab that aimed to provide easy access to the complex polyadenylation-related datasets generated as part of the ongoing collaboration between the two labs. In particular, the labs needed a way to disseminate the published A. thaliana Direct RNA Sequencing (DRS) datasets generated for the collaboration by Helicos Biosciences.

To make these data as useful and usable as possible for the wider community, we developed the polyAdb dataset browser to present both the raw data and a variety of ready-to-go processed data formats. The raw sequencing data for these experiments are available in public archives (such as ENA and SRA); however, the DRS data and their file format are very different from those of other types of Next Generation Sequencing data, making data processing and interpretation difficult for scientists without specialised knowledge. This issue was exacerbated when Helicos Biosciences filed for Chapter 11 bankruptcy in November 2012. As a result, development of the open-source software for processing DRS data (named helisphere) ceased and support from the company stopped. Installing this legacy software and its dependencies without support is not straightforward, making the ready-to-go processed data formats available from polyAdb particularly useful.

The polyAdb browser exposes a range of useful processed formats, along with extensive metadata on the steps used to generate the datasets and datafiles, and other resources related to individual datasets, such as links to the papers where the data have been published. Of course, these data are not used in isolation, so we extended the same principle to any other large datasets that are used alongside the DRS data in our publications. The datasets are presented as an interactive data table that includes links for quick-viewing the data in the IGB and Ensembl genome browsers, links to download the processed data files directly, and links to other resources (figures, papers, scripts, etc.) related to each dataset and datafile.

polyAdb now includes data from several other collaborations between bioinformaticians in the Barton Group and wet-lab biologists at the School of Life Sciences in Dundee, notably the Mclean Lab and the Storey lab. We hope that the processed datasets and associated metadata available here will make these data accessible to a wide audience, including non-specialist scientists, lowering the barrier to data reuse and bridging the gap between the data in other archives and the datasets that are actually used to make figures and draw conclusions in papers.