RStudio

RStudio is designed to make literate programming and the production of documented analyses relatively simple and painless. There are a few ways to achieve this; the main one is provided by the package knitr, which enables the writing of documents in a markup language such as HTML, LaTeX or markdown, with embedded code chunks, that can then be compiled into a final document in a range of formats such as Portable Document Format (PDF), Microsoft Word (docx) or HTML. During the compilation process the code chunks are executed, and output such as figures is generated and incorporated into the final document.

In this course we will focus on the use of Rmarkdown, for which extensive documentation is available at http://rmarkdown.rstudio.com. We will also be using RStudio via a web browser interface to RStudio Server.

RStudio Server

In a web browser such as Chrome, Firefox or Safari, navigate to http://rstudio.compbio.dundee.ac.uk, the Life Sciences instance of RStudio Server, and log in with your cluster id and password.

Rmarkdown Document

You can start a new rmarkdown document from within RStudio using the File - New File - R Markdown… menu option

R markdown menu

that opens the dialogue box below

R markdown dialogue box in RStudio

Select the Document option on the left hand side and the PDF option on the right hand side, fill in a title of your choice and your name, and confirm the dialogue to create a demo rmarkdown document that will open in the RStudio editor panel. Without making any changes to this file you can compile it into a PDF by clicking the Knit PDF button on the editor panel button bar. You will be prompted to save the file, so you should choose a name and location for it.

R markdown knit PDF button
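
The Knit button is a front end to the rmarkdown package, so the same compilation can also be sketched from the R console. The example below is a minimal sketch rather than the course's own script: the file name demo.Rmd and its contents are made up for illustration, and the render step is skipped if rmarkdown or pandoc is unavailable (PDF output additionally needs a LaTeX installation).

```r
# Hypothetical file name and contents, used only to keep the sketch self-contained
rmdFile <- file.path(tempdir(), "demo.Rmd")
fence <- paste(rep("`", 3), collapse = "")   # three back ticks
writeLines(c(
  "---",
  "title: \"Demo\"",
  "output: pdf_document",
  "---",
  "",
  "Some markdown text.",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
), rmdFile)

# Compile the document; try() stops a missing LaTeX installation from
# aborting the rest of the script
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  result <- try(rmarkdown::render(rmdFile, output_format = "pdf_document",
                                  quiet = TRUE), silent = TRUE)
}
```

If the render succeeds, result should hold the path of the compiled PDF, which is written next to the input file by default.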

If you examine the file in the editor you will see it is structured in a particular way.

Rmarkdown Document Structure

YAML header

In recursive geek speak, YAML is an acronym that stands for YAML Ain’t Markup Language. The YAML header is a block of parameters used throughout the document compilation process. YAML is a standard format used by many programming and markup languages, though the permissible options vary. The header is separated from the rest of the document by being surrounded by lines of three dashes (---).

For example the YAML block for this document is

---
title: 'RNA-seq Analysis 1: Introduction to Bioconductor'
author: "Pietà Schofield"
output: 
  html_document:
    fig_caption: yes
    toc: yes
    toc_depth: 2
    toc_float:
      collapsed: no
      smooth_scroll: yes
    code_folding: show
    css: workshop.css
---

There are lines in the output section of the above YAML that are not going to be explained here; however, RStudio will create a simple YAML header for the output type you selected in the dialogue used to generate the document. RStudio will also alter the YAML if you choose to Knit the document into a different output format. So an in-depth understanding of YAML is not needed to get going with rmarkdown.

Code Chunks

Sections of R (and other language) code can be embedded in the document, surrounded by three-backtick delimiters as in the example below.

```{r chunkName, fig.cap="Some Random Stuff", eval=FALSE}
#
# Plot anything R can plot
#
plot(1:10+rnorm(10),1:10+rnorm(10),pch="x", 
     xlab="expected", ylab="measured",main="Demo Plot")
#
# One option is to just send it to the default device and knitr
# captures it and put it in a temporary place
#
abline(0,1,col="red")
# 
# alternatively write it to a file and link to the file in the markdown
#
```

The first line holds parameters, or chunk options, between the braces that control how the chunk will be processed during compilation and how the output from the chunk, if it is executed, will be incorporated and formatted in the document. For example, whether the chunk is executed is controlled by the eval option, and whether the chunk's code is displayed in the output document is controlled by the echo option.
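
Chunk options can also be given document-wide defaults from R code, typically in a first setup chunk. The sketch below uses knitr's opts_chunk object to make echo=FALSE the default; individual chunks can still override it in their first line. The guard is only there so the snippet also runs where knitr is not installed.

```r
# Set a document-wide default: run every chunk but hide its code.
# Individual chunks can override this, e.g. with echo=TRUE in their header.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(echo = FALSE, eval = TRUE)

  # Inspect the defaults that are now in force
  print(knitr::opts_chunk$get("echo"))
  print(knitr::opts_chunk$get("eval"))
}
```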

Markdown Text

The rest of the document is in markdown, and certain codes will produce various formatting styles in the output document. For example, the number of # characters at the start of a line controls the heading level, and surrounding text with * or _ characters produces italic (single) or bold (double) text.

Graphics

The code chunk above produces a graph in the output. It is possible to change the code in the chunk to change the figure. It is also possible to set chunk options so that only the graph is displayed and not the code, or only the code and not the graph.

#
# Plot anything R can plot
#
plot(1:10+rnorm(10),1:10+rnorm(10),pch="x", 
     xlab="expected", ylab="measured",main="Demo Plot")
#
# One option is to just send it to the default device and knitr
# captures it and put it in a temporary place
#
abline(0,1,col="red")
Some Random Stuff
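
The comments in the chunk mention a second route: writing the figure to a file and linking to that file from the markdown. A minimal sketch of that route is given below. The file name demo_plot.pdf and the use of tempdir() are assumptions for illustration, and the pdf() device is chosen because this course compiles to PDF; png() would be the analogous call for HTML output.

```r
# Hypothetical output file; tempdir() keeps the example self-contained
figFile <- file.path(tempdir(), "demo_plot.pdf")

set.seed(1)                          # make the random jitter reproducible
pdf(figFile, width = 5, height = 5)  # open a file device instead of the default
plot(1:10 + rnorm(10), 1:10 + rnorm(10), pch = "x",
     xlab = "expected", ylab = "measured", main = "Demo Plot")
abline(0, 1, col = "red")
invisible(dev.off())                 # close the device so the file is written

# Markdown line that would link the saved figure into the document
cat(sprintf("![Some Random Stuff](%s)\n", figFile))
```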

Useful R Commands

There are several useful R commands that will be used extensively over the course of the workshops. As always with R, there are possibly too many ways of doing these things, but these are my favourite ways and the ones I have used in the workshop rmarkdown scripts, so I introduce them here.

fileName <- "/homes/pschofield/public_html/Projects/teaching_pg/index.html"
# parsing filenames
basename(fileName)
[1] "index.html"
dirname(fileName)
[1] "/homes/pschofield/public_html/Projects/teaching_pg"
# constructing strings
paste0("The file name is '",basename(fileName),
       "' and the full directory name is '",dirname(fileName),"'")
[1] "The file name is 'index.html' and the full directory name is '/homes/pschofield/public_html/Projects/teaching_pg'"
# listing files in directories
fileNames <- list.files(dirname(fileName),pattern=".*")
indexFile <- list.files(dirname(fileName),pattern=".*html$")
fullIndexFile <- list.files(dirname(fileName),pattern=".*html$",full.names = TRUE)

# finding items in vectors
grepl("html$",fileNames)
[1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
grep("html$",fileNames)
[1] 5
which(!grepl("html$",fileNames))
[1] 1 2 3 4 6 7 8
# editing text programmatically
gsub("^index","myindex",fileNames)
[1] "code"          "data"          "figures"       "myindex_files"
[5] "myindex.html"  "notebook"      "presentations" "workshops"    
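
These commands are often combined. As a minimal sketch, the example below uses the logical vector from grepl() to pick matching names out of a vector and derives an output file name from an input file name. The vector repeats the directory listing shown above; the path /tmp/analysis/report.Rmd is hypothetical.

```r
# Directory listing as shown above, typed in by hand for this example
fileNames <- c("code", "data", "figures", "index_files",
               "index.html", "notebook", "presentations", "workshops")

# A logical vector from grepl() can be used directly to subset
htmlFiles  <- fileNames[grepl("html$", fileNames)]
otherFiles <- fileNames[!grepl("html$", fileNames)]

# Hypothetical input path; swap the extension to name the compiled output
inputFile  <- "/tmp/analysis/report.Rmd"
stem       <- sub("\\.Rmd$", "", basename(inputFile))
outputFile <- file.path(dirname(inputFile), paste0(stem, ".pdf"))

htmlFiles
outputFile
```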

Task

Create an rmarkdown document that will compile to PDF.

  • Alter the document to hide the code that displays the summary of the cars dataset.
  • Alter the document to add a table of contents.
  • Add a code chunk to the document to use the head() function to display the first 10 items in the pressure data frame.
  • Add a final section heading and code chunk using the sessionInfo() function to include the R configuration information in the document.

Session Info

R version 3.2.4 (2016-03-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.4 (El Capitan)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] knitr_1.12.3       RefManageR_0.10.13 pietalib_0.1      

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.4          XVector_0.10.0       magrittr_1.5        
 [4] GenomicRanges_1.22.4 BiocGenerics_0.16.1  zlibbioc_1.16.0     
 [7] IRanges_2.4.8        R6_2.1.2             bibtex_0.4.0        
[10] plyr_1.8.3           httr_1.1.0           stringr_1.0.0       
[13] GenomeInfoDb_1.6.3   tools_3.2.4          parallel_3.2.4      
[16] htmltools_0.3.5      yaml_2.1.13          digest_0.6.9        
[19] RJSONIO_1.3-0        formatR_1.3          S4Vectors_0.8.11    
[22] bitops_1.0-6         RCurl_1.95-4.8       evaluate_0.8.3      
[25] rmarkdown_0.9.5      stringi_1.0-1        XML_3.98-1.4        
[28] stats4_3.2.4         lubridate_1.5.6