RStudio is designed to make literate programming and the production of documented analyses relatively simple and painless. There are a few ways to achieve this; the main one is provided by the knitr package, which enables documents to be written in a markup language such as HTML, LaTeX or markdown with embedded code chunks, and then compiled into a final document in a range of formats such as Portable Document Format (PDF), Microsoft Word (docx) or HTML. During compilation the code chunks are executed, and output such as figures is generated and incorporated into the final document.
In this course we will focus on the use of Rmarkdown; extensive documentation is available at http://rmarkdown.rstudio.com. We will also be using RStudio via a web browser interface to RStudio Server.
In a web browser such as Chrome, Firefox or Safari, navigate to http://rstudio.compbio.dundee.ac.uk, the Life Sciences instance of RStudio Server, and log in with your cluster id and password.
You can start a new rmarkdown document from within RStudio using the File - New File - R Markdown… menu option
R markdown menu
that opens the dialogue box below
R markdown dialogue box in RStudio
Select the Document option on the left-hand side and the PDF option on the right-hand side, fill in a title of your choice and your name, and confirm to create a demo rmarkdown document, which will open in the RStudio editor panel. Without making any changes to this file you can compile it into a PDF by clicking the Knit PDF button on the editor panel button bar. You will be prompted to save the file, so you should choose a name and location for it.
R markdown knit PDF button
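The same compilation can also be started from the R console, which is handy when the button bar is not available. The sketch below assumes the demo document was saved as demo.Rmd in the current working directory (a hypothetical name; use whatever you chose when saving) and that the rmarkdown package is installed.

```r
# Hypothetical file name: replace with the name you chose when saving.
rmd_file <- "demo.Rmd"

# Only attempt the render if the rmarkdown package and the file are available.
if (requireNamespace("rmarkdown", quietly = TRUE) && file.exists(rmd_file)) {
  # output_format requests PDF output, as the Knit PDF button does;
  # render() returns the path of the generated file invisibly.
  pdf_path <- rmarkdown::render(rmd_file, output_format = "pdf_document")
  print(pdf_path)
}
```

PDF output additionally needs a working LaTeX installation on the machine doing the rendering.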
If you examine the file in the editor you will see it is structured in a particular way.
In recursive geek speak, YAML is an acronym that stands for YAML Ain’t Markup Language. In an rmarkdown file it forms a header block of parameters used throughout the document compilation process. It is a standard format used with many programming and markup languages, though the permissible options vary. It is separated from the rest of the document by being enclosed between two lines of three dashes (---).
For example the YAML block for this document is
---
title: 'RNA-seq Analysis 1: Introduction to Bioconductor'
author: "Pietà Schofield"
output:
  html_document:
    fig_caption: yes
    toc: yes
    toc_depth: 2
    toc_float:
      collapsed: no
      smooth_scroll: yes
    code_folding: show
    css: workshop.css
---
There are lines in the output section of the above YAML that are not going to be explained here; however, RStudio will create a simple YAML for the output type you selected in the dialogue used to generate the document. RStudio will also alter the YAML if you choose to Knit the document into a different output format, so an in-depth understanding of YAML is not needed to get going with rmarkdown. **Note Side floating
Sections of R (and other language) code can be embedded in the document, surrounded by three-backtick delimiters as in the example below.
```{r chunkName, fig.cap="Some Random Stuff", eval=FALSE}
#
# Plot anything R can plot
#
plot(1:10+rnorm(10),1:10+rnorm(10),pch="x",
xlab="expected", ylab="measured",main="Demo Plot")
#
# One option is to just send it to the default device and knitr
# captures it and put it in a temporary place
#
abline(0,1,col="red")
#
# alternatively write it to a file and link to the file in the markdown
#
```
The first line holds parameters, or chunk options, between the braces; these control how the chunk will be processed during compilation and how the output from the chunk, if it is executed, will be incorporated and formatted in the document. For example, whether the chunk is executed is controlled by the eval option, and whether the chunk's code is displayed in the output document is controlled by the echo option.
The rest of the document is in markdown, and certain codes will produce various formatting styles in the output document. For example, the number of # characters at the start of a line controls the heading level, and wrapping text in single or double * or _ characters produces italic or bold text respectively.
The code chunk above produces a graph in the output. It is possible to change the code in the chunk to change the figure. It is also possible to set chunk options so that only the graph is displayed and not the code, or only the code and not the graph.
#
# Plot anything R can plot
#
plot(1:10+rnorm(10),1:10+rnorm(10),pch="x",
xlab="expected", ylab="measured",main="Demo Plot")
#
# One option is to just send it to the default device and knitr
# captures it and put it in a temporary place
#
abline(0,1,col="red")
Some Random Stuff
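Besides setting echo and eval in the header of an individual chunk, the defaults for all following chunks can be changed from R code, typically in a first "setup" chunk. The sketch below assumes the knitr package is installed; it only changes the defaults and prints the new value of echo.

```r
if (requireNamespace("knitr", quietly = TRUE)) {
  # Show the graph but hide the code in every chunk that follows.
  knitr::opts_chunk$set(echo = FALSE)

  # To show the code without running it (so no graph), use instead:
  # knitr::opts_chunk$set(echo = TRUE, eval = FALSE)

  # Check the current default for the echo option.
  print(knitr::opts_chunk$get("echo"))
}
```

Options given in a chunk's own header still take precedence over these defaults, so a single chunk can opt back in with echo=TRUE.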
There are several useful R commands that will be used extensively over the course of the workshops. As always with R there are possibly too many ways of doing these things, but these are my favourite ways and the ones I have used in the workshop rmarkdown scripts, so I introduce them here.
fileName <- "/homes/pschofield/public_html/Projects/teaching_pg/index.html"
# parsing filenames
basename(fileName)
[1] "index.html"
dirname(fileName)
[1] "/homes/pschofield/public_html/Projects/teaching_pg"
# constructing strings
paste0("The file name is '",basename(fileName),
"' and the full directory name is '",dirname(fileName),"'")
[1] "The file name is 'index.html' and the full directory name is '/homes/pschofield/public_html/Projects/teaching_pg'"
# listing files in directories
fileNames <- list.files(dirname(fileName),pattern=".*")
indexFile <- list.files(dirname(fileName),pattern=".*html$")
fullIndexFile <- list.files(dirname(fileName),pattern=".*html$",full.names = TRUE)
# finding items in vectors
grepl("html$",fileNames)
[1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
grep("html$",fileNames)
[1] 5
which(!grepl("html$",fileNames))
[1] 1 2 3 4 6 7 8
# editing text programmatically
gsub("^index","myindex",fileNames)
[1] "code" "data" "figures" "myindex_files"
[5] "myindex.html" "notebook" "presentations" "workshops"
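The commands above depend on the contents of a particular directory. As a sketch that runs anywhere, the example below uses a hard-coded vector in place of the list.files() result; the names are inferred from the output shown above and are only illustrative, as is the directory name passed to file.path().

```r
# Illustrative vector standing in for the list.files() result.
fileNames <- c("code", "data", "figures", "index_files",
               "index.html", "notebook", "presentations", "workshops")

# Logical indexing: keep only the names that end in "html".
htmlFiles <- fileNames[grepl("html$", fileNames)]

# file.path() joins path components with "/".
fullPath <- file.path("teaching_pg", htmlFiles)

# sub() replaces only the first match in each string;
# here it swaps the .html extension for .pdf.
renamed <- sub("\\.html$", ".pdf", htmlFiles)

print(htmlFiles)
print(fullPath)
print(renamed)
```

Combining grepl() with square-bracket indexing returns the matching names themselves rather than their positions, which is often what later steps of a script need.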
Create an rmarkdown document that will compile to PDF.
Use the head() function to display the first 10 items in the pressure data frame.
Use the sessionInfo() function to include the R configuration information in the document.
R version 3.2.4 (2016-03-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.4 (El Capitan)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] knitr_1.12.3 RefManageR_0.10.13 pietalib_0.1
loaded via a namespace (and not attached):
[1] Rcpp_0.12.4 XVector_0.10.0 magrittr_1.5
[4] GenomicRanges_1.22.4 BiocGenerics_0.16.1 zlibbioc_1.16.0
[7] IRanges_2.4.8 R6_2.1.2 bibtex_0.4.0
[10] plyr_1.8.3 httr_1.1.0 stringr_1.0.0
[13] GenomeInfoDb_1.6.3 tools_3.2.4 parallel_3.2.4
[16] htmltools_0.3.5 yaml_2.1.13 digest_0.6.9
[19] RJSONIO_1.3-0 formatR_1.3 S4Vectors_0.8.11
[22] bitops_1.0-6 RCurl_1.95-4.8 evaluate_0.8.3
[25] rmarkdown_0.9.5 stringi_1.0-1 XML_3.98-1.4
[28] stats4_3.2.4 lubridate_1.5.6