9 March 2015
"The distinction between replication and reproducibility is, from what I understand, that
'replicable' means 'other people get exactly the same results when doing exactly the same thing',
while
'reproducible' means 'something similar happens in other people's hands'.
The latter is far stronger, in general, because it indicates that your results are not merely some quirk of your setup and may actually be right."
"Statisticians and computer scientists - if there is no code, there is no paper
So I have a new policy when evaluating CV's of candidates for jobs, or when I'm reading a paper as a referee. If the paper is about a new statistical method or machine learning algorithm and there is no software available for that method - I simply mentally cross it off the CV. If I'm reading a data analysis and there isn't code that reproduces their analysis - I mentally cross it off."
Myth 3: We need new platforms for reproducible computational science.
Engineers like building stuff. It sure is easier (and hence more fun, at least in the short term) than doing science. But what we need right now is scientists actually using stuff that already exists, not engineers building new stuff that no one will ever use.
…to a first approximation, IPython Notebook and knitr have won.
Transparent scientific analysis - distributing analysis/code and data
versus
"Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do."
devtools Wickham and Chang (2015)
    packagename
    |
    |- README
    |- DESCRIPTION
    |- NAMESPACE
    |- R
    |  `- This is where your commented R scripts live
    |
    |- man
    |  `- This is where the autogenerated help goes
    |
    |- tests
    |  `- This is where the test structure and code go
    |
    |- vignettes
    :  `- This is where the how-tos and tutorials go
    :
    :..src
       `.. This is potentially where cpp (Rcpp) code would live
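As a rough illustration of the layout above, the skeleton can be created by hand with base R. This is only a sketch: the location under `tempdir()` and the empty placeholder files are assumptions made for the example, and `src` is used as the conventional directory name for compiled code.

```r
# Throwaway location for the example; "packagename" is the placeholder
# name used in the tree above, not a real package.
pkg_root <- file.path(tempdir(), "packagename")

# Sub-directories from the tree above
pkg_dirs <- c("R", "man", "tests", "vignettes", "src")
for (d in pkg_dirs) {
  dir.create(file.path(pkg_root, d), recursive = TRUE, showWarnings = FALSE)
}

# Empty placeholder files for the top-level metadata
pkg_files <- c("README", "DESCRIPTION", "NAMESPACE")
created <- file.create(file.path(pkg_root, pkg_files))

# Show what now exists under the package root
list.files(pkg_root)
```

In practice the NAMESPACE file and the contents of `man` are generated from the roxygen comments rather than written by hand.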
    Package: SpikeNorm
    Type: Package
    Title: A package to normalise RNA-seq data using Spike-in information
    Version: 1.0
    Date: 2014-11-20
    Authors@R: c(
        person("Pieta","Schofield",email="pschofield@dundee.ac.uk",role=c("aut","cre")),
        person("Nick","Schurch",email="nschurc@dundee.ac.uk",role="aut"))
    Description: This package uses expression values for spike-ins of known
        concentration to normalise RNA-seq data
    License: GPL2
    Imports: MASS, matrixStats, robust, plyr, ggplot2, limma, edgeR
    Suggests: testthat, BiocStyle, knitr
    VignetteBuilder: knitr
    LazyData: true
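A DESCRIPTION file is in Debian control file (DCF) format, so base R can read it with `read.dcf()`. The sketch below writes a cut-down copy of the fields shown above to a temporary file (the temporary file is an assumption made so the example is self-contained) and extracts the list of imported packages.

```r
# A cut-down copy of the DESCRIPTION fields shown above
desc_lines <- c(
  "Package: SpikeNorm",
  "Type: Package",
  "Version: 1.0",
  "Imports: MASS, matrixStats, robust, plyr, ggplot2, limma, edgeR",
  "Suggests: testthat, BiocStyle, knitr"
)

# Write it to a temporary file so the example does not depend on a checkout
desc_file <- tempfile("DESCRIPTION")
writeLines(desc_lines, desc_file)

# read.dcf() returns a character matrix with one column per field
desc <- read.dcf(desc_file)

# Split the comma-separated Imports field into package names
imports <- strsplit(desc[1, "Imports"], ",\\s*")[[1]]
imports
```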
    #' subScript will submit a script to the cluster
    #'
    #' Calls subJob to submit a script to the cluster as a temporary file. It relies on the
    #' temporary directory, where the temporary file will be written, being mounted on the
    #' local machine. I could get round this by writing it locally and then copying it with
    #' scp, but at the moment this is not worth the effort.
    #'
    #' @param scriptstub the stub for the temporary file name
    #' @param script the content of the script as a vector of strings
    #' @param tmpdir location for the temporary file
    #' @param scriptext extension for the temporary file
    #' @param logdir location for the batch job logs
    #' @param cores number of cores
    #'
    #' @export
    subScript <- function(scriptstub="ssh",script=c("#!/bin/bash","hostname"),
                          tmpdir="/homes/pschofield/tmp/",scriptext=".sh",logdir="",cores=8)
    {
      # write the script to a uniquely named temporary file
      batchJob <- tempfile(pattern=scriptstub,tmpdir=tmpdir,fileext=scriptext)
      filecon <- file(batchJob)
      writeLines(script, filecon)
      close(filecon)
      # hand the temporary file to subJob for submission
      subJob(scriptfile=batchJob,logdir=logdir,mcCores=cores)
    }
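The temporary-file step of `subScript` can be tried in isolation, without a cluster. The helper below, `write_script`, is a hypothetical name introduced for this sketch; it omits the call to `subJob` and defaults to R's session temporary directory rather than a cluster-mounted path.

```r
# Hypothetical helper: only the file-writing part of subScript, no submission
write_script <- function(script, scriptstub = "ssh",
                         tmpdir = tempdir(), scriptext = ".sh") {
  # tempfile() builds a unique name from the stub, directory and extension
  path <- tempfile(pattern = scriptstub, tmpdir = tmpdir, fileext = scriptext)
  writeLines(script, path)
  path
}

# Write the default two-line script and read it back
path <- write_script(c("#!/bin/bash", "hostname"))
readLines(path)
```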
edit your functions then rinse and repeat
    # create the documentation from the roxygen comments in the R sources
    devtools::document()
    # load the package for testing
    devtools::load_all()
    # run the test scripts stored in the tests subdirectory
    devtools::test()
    # eventually install the package so it can be used outwith the source directory
    devtools::install(reload=TRUE)
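For context, `devtools::test()` runs testthat files, which conventionally live under `tests/testthat/`. The sketch below shows what one such file might contain. It assumes the testthat package is installed, and `scale_by_spike` is a hypothetical function invented for the example, not part of SpikeNorm.

```r
library(testthat)

# Hypothetical function standing in for something defined in the package's R/ directory
scale_by_spike <- function(counts, spike_factor) {
  stopifnot(is.numeric(counts), is.numeric(spike_factor), spike_factor > 0)
  counts / spike_factor
}

test_that("scale_by_spike divides counts by the spike factor", {
  # normal use: every count is divided by the factor
  expect_equal(scale_by_spike(c(10, 20), 2), c(5, 10))
  # a non-positive factor is rejected
  expect_error(scale_by_spike(c(10, 20), 0))
})
```

In a real package the function would live in `R/` and only the `test_that()` block would sit in the test file.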
knitr, Xie (2013), embeds the code for the analysis within a natural-language description of the analysis.
It is an evolution of Sweave
knitr permits
An rmarkdown file starts with a YAML header
    ---
    title: "how i R(oll)"
    author: "Pietà Schofield"
    date: "9 March 2015"
    output:
      ioslides_presentation:
        fig_caption: true
        fig_width: 10
        fig_height: 7
        wide: true
        css: presentation.css
    ---
In rmarkdown you write natural-language descriptive text in a Markdown dialect, interspersed with chunks of R code
for example, to list the content of the SpikeNorm package's DESCRIPTION file
```{r , eval=FALSE, comment=NA}
#
# Specify the file to open
#
desFile <- "/Users/pschofield/git_hub/SpikeNorm/DESCRIPTION"
#
# read the file and write it to stdout; this will be sent to
# the chunk output
#
writeLines(readLines(desFile))
#
```
or to generate and display a graph
```{r , fig.cap="Some Random Stuff", eval=FALSE}
#
# Plot anything R can plot
#
plot(1:10+rnorm(10),1:10+rnorm(10),pch="x",
     xlab="expected", ylab="measured",main="Demo Plot")
#
# One option is to just send it to the default device; knitr
# captures it and puts it in a temporary place
#
abline(0,1,col="red")
#
# alternatively write it to a file and link to the file in the markdown
#
```
Some Random Stuff
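The alternative mentioned in the chunk comments, writing the plot to a file and linking to it from the markdown, might look like the sketch below. The file name, image size and seed are assumptions made for the example, and `png()` needs an R build with bitmap device support.

```r
# Fix the seed so the "random stuff" is the same on every run
set.seed(42)

# Assumed output location for the example
plot_file <- file.path(tempdir(), "demo_plot.png")

# Open a PNG device, draw the plot, then close the device to flush the file
png(plot_file, width = 800, height = 600)
plot(1:10 + rnorm(10), 1:10 + rnorm(10), pch = "x",
     xlab = "expected", ylab = "measured", main = "Demo Plot")
abline(0, 1, col = "red")
invisible(dev.off())

# Markdown image link that could be pasted into the document text
md_link <- sprintf("![Some Random Stuff](%s)", plot_file)
md_link
```

Setting the seed is a small step towards replicability: without it the figure changes every time the document is rebuilt.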
knitr and hence rmarkdown interface with a program called pandoc by MacFarlane
pandoc will convert the markdown or latex generated by knitr into
(NB: pandoc is written in Haskell, which makes it sort of cool in itself)
It is possible to include references from a bibtex library with knitcitations
I prefer RefManageR McLean (2014)
```{r , eval=FALSE}
# load the package
require(RefManageR)
mybibfile <- "/Users/pschofield/git_tree/biblio/bioinf.bib"
# specify the bibliography options
BibOptions(check.entries = FALSE,
           style = "markdown",
           cite.style = "authoryear",
           bib.style = "authoryear")
# load the file
bib <- ReadBib(mybibfile, check=FALSE)
```
Then you can include a citation in your text as you type with `r TextCite(bib,"refkey")`
Finally, you add a code chunk to print the bibliography
```{r , eval=FALSE}
PrintBibliography(bib)
```
Normally code chunks appear without the options and ticks
    #
    # load the RefManageR package so I can have a central bib
    # file rather than it have to be in the same directory as
    # the markdown file
    #
    require(RefManageR)
    mybibfile <- "/Users/pschofield/git_tree/biblio/bioinf.bib"
    #
    # specify the bibliography options
    #
    BibOptions(check.entries = FALSE,
               style = "markdown",
               cite.style = "authoryear",
               bib.style = "authoryear")
    #
    bib <- ReadBib(mybibfile, check=FALSE)
but I have been showing the decorations for illustrative purposes
Normally they are also syntax highlighted
Brown, T. (2014). Some myths of reproducible computational research. URL: http://ivory.idyll.org/blog/2014-myths-of-computational-reproducibility.html (visited on 2014).
Brown, T. (2015). Our approach to replication in computational science. URL: http://ivory.idyll.org/blog/replication-i.html (visited on 2015).
Knuth, D. E. (1984). "Literate Programming". In: Comput. J. 27.2, pp. 97-111. ISSN: 0010-4620. DOI: 10.1093/comjnl/27.2.97. URL: http://dx.doi.org/10.1093/comjnl/27.2.97.
Leek, J. (2014). rpackages. URL: https://github.com/jtleek/rpackages (visited on 2014).
Leek, J. (2015). Statisticians and computer scientists - if there is no code, there is no paper. URL: http://simplystatistics.org/2013/01/23/statisticians-and-computer-scientists-if-there-is-no-code-there-is-no-paper (visited on 2015).
McLean, M. W. (2014). Straightforward Bibliography Management in R Using the RefManager Package. arXiv: 1403.2036 [cs.DL]. Submitted. URL: http://arxiv.org/abs/1403.2036.
Popper, K. (1935). Logik der Forschung. Verlag von Julius Springer, Vienna, Austria.
Wickham, H. (2015). R Packages. URL: http://r-pkgs.had.co.nz (visited on 2015).
Wickham, H. and W. Chang (2015). devtools: Tools to Make Developing R Packages Easier. R package version 1.7.0. URL: http://CRAN.R-project.org/package=devtools.
Xie, Y. (2013). Dynamic Documents with R and knitr. Boca Raton, Florida: Chapman and Hall/CRC. URL: http://yihui.name/knitr/.
This is a work in progress; I hope I am getting better at it
The code for this presentation can be found here
The image of I.R.Baboon is copyright to Cartoon Network