
Knitting beautiful documents in RStudio

Markdown

This document was written using Markdown in RStudio. RStudio is a wonderful IDE for writing R scripts and keeping track of variables, data frames, history, packages, and much more. Markdown is a simple formatting syntax for authoring web pages. It allows for easy formatting, for example:

# Header 1
## Header 2
### Header 3
#### Header 4
##### Header 5

Header 1

Header 2

Header 3

Header 4

Header 5

Code can be embedded in documents with nice formatting using backticks (the symbol below the tilde on the key in the upper left of an American QWERTY keyboard). For example, if you were to type the following (without the > characters)

>```{r}
>install.packages("lme4", repos = 'http://cran.us.r-project.org')
>```

you could run that block in R to install the lme4 package. In the knitted document it would show up as:

install.packages("lme4", repos = "http://cran.us.r-project.org")

## 
## The downloaded binary packages are in
##  /var/folders/r5/l8t2qxpj3m1_gnlf6ys7fwdm0000gq/T//RtmpiqoAcq/downloaded_packages
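Knitting re-runs every chunk. That means an install.packages() call like the one above will reinstall the package every time you knit. One way around this is a small helper that installs only the packages that are missing. This is a minimal sketch and is not part of knitr or base R. The names missing_packages() and install_if_missing() are hypothetical, and the sketch assumes the same CRAN mirror used above.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed locally
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Hypothetical helper: install only the missing packages, so re-knitting
# a document does not reinstall everything each time
install_if_missing <- function(pkgs, repos = "http://cran.us.r-project.org") {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install, repos = repos)
  }
  invisible(to_install)  # names of the packages it tried to install
}

# Example usage (commented out because it needs network access):
# install_if_missing(c("lme4", "knitr"))
```

Inside an .Rmd chunk you would call install_if_missing() where the plain install.packages() line was. On repeated knits it then does nothing once the packages are present.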

You should try this with the knitr package [UPDATE: You will likely have to restart RStudio after installing and loading the knitr library because it changes options in RStudio]. knitr is a fantastic package for converting an R Markdown file (.Rmd) into an HTML document. The design of knitr allows any input language (e.g. R, Python, and awk) and any output markup language (e.g. LaTeX, HTML, Markdown, AsciiDoc, and reStructuredText). I find Markdown to be the easiest language to use to create documents for coursework and assignments. knitr is dynamic, so it will run your code and print the results below it. The best part is that if you get new data or find an error in your code, you just re-knit the file and it’s updated a few seconds later.

In RStudio, once you install knitr, a Knit HTML button shows up that you can click to knit the Markdown file (produce your HTML document). When you click the Knit HTML button, a web page will be generated that includes both the content and the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

summary(cars)

##      speed           dist    
##  Min.   : 4.0   Min.   :  2  
##  1st Qu.:12.0   1st Qu.: 26  
##  Median :15.0   Median : 36  
##  Mean   :15.4   Mean   : 43  
##  3rd Qu.:19.0   3rd Qu.: 56  
##  Max.   :25.0   Max.   :120

These chunks are run and their output embedded in the document when you knit, but you can also run single lines or blocks (chunks) of code in RStudio to make sure they work before knitting. You can also embed plots, for example:

plot(cars)

[Figure: scatterplot from plot(cars), with speed on the x-axis and dist on the y-axis]

This is great, but after knitting you are left with an HTML document. What if you want a PDF or Word document? There may be other ways to do this, but I use pandoc to convert the Markdown (.md) file that knitr produces into other formats such as PDF or MS Word documents. After installing pandoc, you use it via the command line. I use a Mac, so I just open the Terminal to get to my Bash shell, cd to the folder containing my document, and run

pandoc Knitting_Beautiful_Documents_in_RStudio.md -o Knitting_Beautiful_Documents_in_RStudio.pdf

and I have a PDF. Unfortunately, pandoc's default margins are huge and look ridiculous. To correct for that, you can save a LaTeX style file in the folder with the document you’re converting. To do that, you just create a text file with the line:

\usepackage[vmargin=1in,hmargin=1in]{geometry}

and save it as something like margins.sty. Then in the command line you run

pandoc Knitting_Beautiful_Documents_in_RStudio.md -o Knitting_Beautiful_Documents_in_RStudio.pdf -H margins.sty
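If you would rather stay in R, you can produce both the margins file and the pandoc call above from the R console. The sketch below is one way to do that and is not the author's script. Some assumptions:

* write_margins_sty() and pandoc_args() are hypothetical names.
* pandoc is installed and on the PATH, along with a LaTeX distribution for PDF output.
* pandoc picks the output format from the extension given to -o. Under that assumption, asking for "docx" instead of "pdf" would give a Word document.

```r
# Hypothetical helper: write the one-line LaTeX style file described above
write_margins_sty <- function(path = "margins.sty", margin = "1in") {
  line <- sprintf("\\usepackage[vmargin=%s,hmargin=%s]{geometry}", margin, margin)
  writeLines(line, path)
  invisible(path)
}

# Hypothetical helper: build the arguments for
#   pandoc input.md -o input.<ext> [-H header]
pandoc_args <- function(md_file, ext = "pdf", header = NULL) {
  out_file <- sub("\\.md$", paste0(".", ext), md_file)
  args <- c(md_file, "-o", out_file)
  if (!is.null(header)) args <- c(args, "-H", header)
  args
}

md <- "Knitting_Beautiful_Documents_in_RStudio.md"
# Only call pandoc if it is installed and the knitted .md file exists
if (nzchar(Sys.which("pandoc")) && file.exists(md)) {
  write_margins_sty("margins.sty")
  # shQuote() protects file names that contain spaces
  system2("pandoc", shQuote(pandoc_args(md, "pdf", header = "margins.sty")))
}
```

The argument vector is kept separate from the system2() call, so you can print it and compare it with the command-line version before running anything.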

Now when you view the PDF it should have 1-inch margins and look much nicer. That is about it for making attractive, dynamic documents using R, RStudio, knitr, and pandoc. It seems like a lot, but after doing it once or twice it becomes quick and easy. I have yet to use it, but there is also a package, pander, which allows you to skip the command-line step and do the file conversion right in R (RStudio).

Using Markdown in RStudio with knitr is wonderful for creating reports, class assignments, and homework solutions (as a student, you’d really impress your instructor), but other people take it even further. Karthik Ram of rOpenSci fame has a great blog post on how to extend this system to include citations and bibliographies so that you can ditch Word.

For a nice Markdown reference list, check out this useful cheat sheet.

Finally, if you want to see my Markdown code and everything that went into making this, you can find it in my GitHub repository. Specifically, you can find the .Rmd, .md, margins.sty, and built PDF files there.

[UPDATE: On a Mac, R, RStudio, or pandoc may have trouble finding your LaTeX distribution when building the PDF (pandoc still uses LaTeX to produce the PDF, even though you are writing Markdown rather than LaTeX directly). A potential solution I found when building an R package was to specify the path to LaTeX: http://danieljhocking.wordpress.com/2012/12/15/missing-pdflatex/]

 

[UPDATE 2: From RProgramming.net I found a nice way to create a PDF without leaving R (calling pandoc from R)]

# Create .md, .html, and .pdf files
library(knitr)     # provides knit()
library(markdown)  # provides markdownToHTML()
knit("File.Rmd")
markdownToHTML("File.md", "File.html", options = c("use_xhtml"))
system("pandoc -s File.html -o File.pdf")

Building R packages: missing path to pdflatex

Recently, while trying to build an R package for generalized estimating equation model selection (QICpack on GitHub), I was getting a LaTeX error when creating the PDF package manual. This seems to be a relatively common problem on some versions of Mac OS X, but the answer was not easy to find, so I thought I’d describe my ultimate solution.

The specific error I was getting was:

Error in texi2dvi(file = file, pdf = TRUE, clean = clean, quiet = quiet, : 
pdflatex is not available

I was getting this error regardless of whether I built the package with R CMD build packagename in the Terminal or with build() from the devtools package in R or RStudio.

I tried reinstalling MacTeX and restarting my computer, to no avail. It turned out that the directory containing the TeX binaries was not on the PATH that R searches. You can check this by running the following in R (either in RStudio or in R started from the Terminal):

Sys.which("pdflatex")
Sys.getenv("PATH")
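You can also wrap the two checks in a small function that reports the result in one place. It splits PATH into one directory per element, which makes a missing TeX directory easier to spot. This is a minimal sketch: check_pdflatex() is a hypothetical name and is not part of R or devtools.

```r
# Hypothetical helper: is pdflatex visible to R, and which directories are on PATH?
check_pdflatex <- function() {
  # Sys.which() returns "" for a program it cannot find on the PATH
  location <- unname(Sys.which("pdflatex"))
  # Split PATH on the platform separator (":" on Mac/Linux, ";" on Windows)
  path_dirs <- strsplit(Sys.getenv("PATH"), .Platform$path.sep, fixed = TRUE)[[1]]
  list(available = nzchar(location), location = location, path = path_dirs)
}

str(check_pdflatex())
```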

In my case, I got something like:

Sys.which("pdflatex")

 ""

Sys.getenv("PATH")

"/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin"

I don’t know much about setting paths, profiles, and environments, but this indicated that the LaTeX binaries were not on the PATH. As a result, the build function couldn’t find pdflatex, which it needs to convert the .Rd files (the LaTeX-like documentation source files in R packages) into the PDF manual. I was able to fix the problem by appending the directory containing the LaTeX binaries to the PATH:

Sys.setenv(PATH=paste(Sys.getenv("PATH"),"/usr/texbin",sep=":"))

Now when I run Sys.getenv("PATH") I get

"/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/texbin"

Now R is able to find the LaTeX binaries and when I run Sys.which("pdflatex") I get

              pdflatex 
"/usr/texbin/pdflatex"

That should fix the problem, and now the build function can turn the .Rd files into PDF documentation of the R functions.
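The fix above has two caveats. First, Sys.setenv() only changes PATH for the current R session. Second, running the paste() line twice appends /usr/texbin twice. A slightly more defensive approach is a function that appends a directory only if it exists and is not already listed. Some assumptions in the sketch below:

* add_to_path() is a hypothetical name.
* /Library/TeX/texbin is included as an assumed location for newer MacTeX installations. Check where pdflatex actually lives on your own machine.

To make the change persist across sessions, one option is to put the call in your ~/.Rprofile, which R runs at startup.

```r
# Hypothetical helper: append `dir` to PATH for this R session only,
# skipping directories that do not exist or are already on the PATH
add_to_path <- function(dir) {
  sep <- .Platform$path.sep
  current <- strsplit(Sys.getenv("PATH"), sep, fixed = TRUE)[[1]]
  if (dir.exists(dir) && !(dir %in% current)) {
    Sys.setenv(PATH = paste(c(current, dir), collapse = sep))
  }
  invisible(Sys.getenv("PATH"))
}

# /usr/texbin is the directory from this post; /Library/TeX/texbin is an
# assumed location for newer MacTeX installs
for (d in c("/usr/texbin", "/Library/TeX/texbin")) add_to_path(d)
Sys.which("pdflatex")
```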

R Workflow


When working with R, you end up using a large number of datasets, packages, functions, objects, output files, workspaces, etc. It can get a bit overwhelming trying to keep everything organized. That is why a consistent, well-organized workflow is needed. I definitely do not have one yet. I’ll post more on script editors and IDEs another time, but for now I just wanted to share this video on R Workflow that covers Eclipse, Sweave, LaTeX, and R.

http://www.vcasmo.com/video/drewconway/10362

Good luck and feel free to comment with any personal experiences or suggestions.
