Blog Archives

Knitting beautiful documents in RStudio

Markdown

This document was written using Markdown in RStudio. RStudio is a wonderful IDE for writing R scripts and keeping track of variables, dataframes, history, packages, and much more. Markdown is a simple formatting syntax for authoring web pages. It allows for easy formatting; for example:

# Header 1
## Header 2
### Header 3
#### Header 4
##### Header 5

Header 1

Header 2

Header 3

Header 4

Header 5

Code can be embedded in documents with nice formatting using backticks (the symbol below the tilde in the upper left of an American QWERTY keyboard). For example, if you were to type the following (without the >)

>```{r}
>install.packages("lme4", repos = 'http://cran.us.r-project.org')
>```

you could run that block in R and install the lme4 package. In the markdown document it would show up as:

install.packages("lme4", repos = "http://cran.us.r-project.org")

## 
## The downloaded binary packages are in
##  /var/folders/r5/l8t2qxpj3m1_gnlf6ys7fwdm0000gq/T//RtmpiqoAcq/downloaded_packages

You should try this with the knitr package [UPDATE: You will likely have to restart RStudio after installing and loading the knitr library because it changes options in RStudio]. knitr is a fantastic package for converting the R markdown file (.Rmd) into an HTML document. The design of knitr allows any input language (e.g., R, Python, and awk) and any output markup language (e.g., LaTeX, HTML, Markdown, AsciiDoc, and reStructuredText). I find Markdown to be the easiest language for creating documents for coursework/assignments. knitr is dynamic, so it will run your code and print the results below it. The best part is that if you get new data or find an error in your code, you just re-knit the file and it's updated a few seconds later.
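For example, installing and loading knitr looks like this (using the same CRAN mirror as above; remember the note about restarting RStudio afterwards):

install.packages("knitr", repos = "http://cran.us.r-project.org")
library(knitr)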

In RStudio, once you install knitr, a button appears that you can click to knit the markdown file (i.e., produce your HTML document). When you click the Knit HTML button, a web page is generated that includes both the content and the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

summary(cars)

##      speed           dist    
##  Min.   : 4.0   Min.   :  2  
##  1st Qu.:12.0   1st Qu.: 26  
##  Median :15.0   Median : 36  
##  Mean   :15.4   Mean   : 43  
##  3rd Qu.:19.0   3rd Qu.: 56  
##  Max.   :25.0   Max.   :120
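In the .Rmd source, that chunk is written with the same backtick syntax shown earlier (here without the leading >):

```{r}
summary(cars)
```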

These will be embedded and run in the document, but you can also run single lines or blocks (chunks) of code in RStudio to make sure they work before knitting the document. You can also embed plots, for example:

plot(cars)

[Figure: scatter plot of the cars dataset]

This is great, but after knitting you are left with an HTML document. What if you want a PDF or Word document? There may be other ways to do this, but I use pandoc to convert the knitted Markdown (or HTML) file into other formats such as PDF or MS Word documents. After installing pandoc, you use it via the command line. I use a Mac, so I just open the Terminal to get to my Bash shell, cd to the folder containing my document, and run

pandoc Knitting_Beautiful_Documents_in_RStudio.md -o Knitting_Beautiful_Documents_in_RStudio.pdf

and I have a PDF. Unfortunately, the pandoc default is to have huge margins that look ridiculous. To correct for that you can save a style file in the folder with the document you’re converting. To do that you just create a text file with the line:

\usepackage[vmargin=1in,hmargin=1in]{geometry}

and save it as something like margins.sty. Then in the command line you run

pandoc Knitting_Beautiful_Documents_in_RStudio.md -o Knitting_Beautiful_Documents_in_RStudio.pdf -H margins.sty
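If you would rather create margins.sty without leaving R, writing that one-line style file with writeLines() works too (a minimal sketch; the file name just matches the example above):

# write the one-line LaTeX style file that pandoc picks up via -H margins.sty
writeLines("\\usepackage[vmargin=1in,hmargin=1in]{geometry}", "margins.sty")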

Now when you view the PDF it should have 1-inch margins and look much nicer. That is about it for making attractive, dynamic documents using R, RStudio, knitr, and pandoc. It seems like a lot, but after doing it once or twice it becomes very quick and easy. There is also a package, pander, which I have yet to use, that lets you skip the command-line step and do the file conversion right in R (RStudio).

Using Markdown in RStudio with knitr is wonderful for creating reports, class assignments, and homework solutions (as a student, you'd really impress your instructor), but other people take it even further. Karthik Ram of rOpenSci fame has a great blog post on how to extend this system to include citations and bibliographies, allowing you to ditch Word.

For a nice Markdown reference list, check out this useful cheat sheet.

Finally, if you want to see my Markdown code and everything that went into making this, you can find it in my GitHub repository. Specifically, you can find the .Rmd, .md, margins.sty, and built PDF files.

[UPDATE: On a Mac, R/RStudio/pandoc may have trouble finding your LaTeX distribution to use to build the PDF (even though you are writing Markdown, not LaTeX directly). A potential solution I found when building an R package was to specify the path to LaTeX: http://danieljhocking.wordpress.com/2012/12/15/missing-pdflatex/]
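One rough sketch of that kind of fix is to add the TeX binary directory to the PATH that R sees (the /usr/texbin path is an assumption for a typical MacTeX install; see the linked post and adjust for your system):

# assumption: pdflatex lives in /usr/texbin (typical MacTeX location);
# append it to the PATH so R and pandoc can find LaTeX
Sys.setenv(PATH = paste(Sys.getenv("PATH"), "/usr/texbin", sep = ":"))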

 

[UPDATE 2: From RProgramming.net I found a nice way to create a PDF without leaving R (calling pandoc from R)]

# Create .md, .html, and .pdf files
library(knitr)
library(markdown)
knit("File.Rmd")
markdownToHTML("File.md", "File.html", options = c("use_xhtml"))
system("pandoc -s File.html -o File.pdf")

High Resolution Figures in R

As I was recently preparing a manuscript for PLOS ONE, I realized the default resolution of R and RStudio images is insufficient for publication. PLOS ONE requires 300 ppi images in TIFF or EPS (Encapsulated PostScript) format. In R, plots are exported at 72 ppi by default. I love RStudio but was disappointed to find that there were no options for exporting figures at high resolution.

PLOS ONE has extensive instructions for scaling, compressing, and converting image files to meet their standards. Unfortunately, there is no good way to go from low resolution to high resolution (i.e., rescaling in Photoshop), as my friend Liam Revell, author of phytools, pointed out with an entertaining illustration from PhD Comics (upper right panel). The point is that if the original image isn't created at a high enough resolution, artificially increasing the resolution in Photoshop produces graininess because Photoshop can only guess how to interpolate between the points. This might not be a big problem with simple plots created in R, because interpolation between points in a line shouldn't be difficult, particularly when starting with a PDF.

Even if scaling up from a low resolution PDF would work, it would be better to have a direct solution in R.

[UPDATE: In the original post, I wrote about my trial and error when first trying this out. It was helpful for me but less helpful for others. Since this post gets a fair amount of traffic, I now have the solution first and the rest of the original post below that]

If you need to use a font not included in R, such as the Arial family of fonts for a publisher like PLOS, the extrafont package is very useful, but it takes a long time to run (it should only have to run once, although you may have to do it again when you update R).

install.packages("extrafont")
library(extrafont)
font_import() # this gets fonts installed anywhere on your computer, most commonly from MS Office install fonts. It takes a LONG while.
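After importing, the fonts still need to be registered with the graphics device for the current session; extrafont provides loadfonts() for this (a minimal sketch):

library(extrafont)
loadfonts()                       # register imported fonts with the pdf() device
loadfonts(device = "postscript")  # register them for postscript()/EPS output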

Now, to make a nice-looking plot at higher resolution: PostScript (EPS) is the best format, but you may need to download another program to view it on your computer. TIFF is good for raster data, including photos and color gradients. PDF should be good for line- and point-based plots. JPEG is not going to be good for high resolution figures due to compression and detail loss, but it is easy to view and use in presentations, and PNG is a lower-quality format that is useful for websites.

Here are some high resolution figure examples. They all start by telling R to open a connection (device) to make a PDF, TIFF, or EPS rather than just printing to the default R/RStudio plot device. Then you make the plot and close the device, at which point the file is saved. It won't show up on the screen but will be saved to your working directory.

x = 1:20
y = x * 2

setwd('/Users/.../Folder/') # place to save the file - can be overridden by
                            # putting a path in the file = "" argument of the functions below

pdf(file = "FileName.pdf", width = 12, height = 17, family = "Helvetica") # defaults to 7 x 7 inches
plot(x, y, type = "l")
dev.off()

postscript("FileName.eps", width = 12, height = 17, horizontal = FALSE, 
           onefile = FALSE, paper = "special", colormodel = "cmyk", 
           family = "Courier")
plot(x, y, type = "l")
dev.off()

bitmap("FileName.tiff", height = 12, width = 17, units = 'cm', 
       type = "tifflzw", res = 300)
plot(x, y, type = "l")
dev.off()

tiff("FileName.tiff", height = 12, width = 17, units = 'cm', 
     compression = "lzw", res = 300)
plot(x, y, type = "l")
dev.off()

[ORIGINAL POST Follows]

It took some time to figure out but here are some trials and the ultimate solution I came up with:

x <- 1:100
y <- 1:100
plot(x, y) # Make plot
tiff("Plot1.tiff")
dev.off()

Nothing happens in this case. However, by opening the TIFF device first and then making the plot, the resulting TIFF file is saved to your working directory and is 924 KB, 72 ppi, 480 x 480 pixels.

To increase the resolution I tried the following:

tiff("Plot2.tif", res = 300)
plot(x, y) # Make plot
dev.off()

but in RStudio the plot could not be printed, and hence not saved, because it was too large for the print area. Therefore, I had to open up R directly and run the code. Interestingly, a blank TIFF file was created of the same size as Plot1.tiff. This is where I got hung up for a while. I eventually found that R can't determine the other image parameters when the resolution is changed (or the defaults become too big to print), so they have to be specified directly, like this:

tiff("Plot3.tiff", width = 4, height = 4, units = 'in', res = 300)
plot(x, y) # Make plot
dev.off()

This creates a TIFF file that is 5,800 KB, 300 ppi, 4 inches by 4 inches. Surprisingly, on a Mac, Preview still indicates that it's only 72 ppi. The larger file size suggests that it really is 300 ppi (a 4 x 4 inch image at 300 ppi should be 1200 x 1200 pixels, versus 288 x 288 at 72 ppi). I ran the same code but explicitly specified res = 72, and the file was only 334 KB, suggesting that Preview is incorrect and the file really is 300 ppi. I played with various compressions: lzw and none produced files of the same size, while rle resulted in a larger file (less compression), which again seems odd.

Finally, I tried using the bitmap function to create a TIFF:

bitmap("Plot7.tiff", height = 4, width = 4, units = 'in', type="tifflzw", res=300)
plot(x, y)
dev.off()
par(mfrow = c(1,1))

Interestingly, this file is only 9 KB but is listed as 300 dpi, 1200 x 1200 pixels. I'm really not sure why these functions don't seem to be working as smoothly as expected, but hopefully this can help you get high resolution images directly from R for publication. I plan to use the bitmap function in the future to create high resolution TIFF files for publication, which is what is desired by outlets such as PLOS ONE, and it's easier than dealing with postscript. I also don't know whether EPS files from R or RStudio are created with LaTeX; I know that can be a problem for PLOS ONE.

[UPDATE: Here is the code for my final figure for PLOS ONE with both postscript and tiff options plus it uses the extrafonts package to allow Arial fonts in postscript figures as required by PLOS ONE]

install.packages("extrafont")
library(extrafont)
font_import() # this gets fonts installed anywhere on your computer, 
# most commonly from MS Office install fonts. 
# It takes a long while.

bitmap("CummulativeCaptures.tiff", height = 12, width = 17, 
units = 'cm', type="tifflzw", res=300)
postscript("CummulativeCaptures.eps", width = 8, height = 8, 
horizontal = FALSE, onefile = FALSE, paper = "special", 
colormodel = "cmyk", family = "Arial")
par(mar = c(3.5, 3.5, 1, 1), mgp = c(2, 0.7, 0), tck = -0.01)
plot(refmCumm$date, refmCumm$cumm, type = "n", xaxt = "n",
xlab = "Date",
ylab = "Cumulative number of salamanders per plot",
xlim = c(r[1], r[2]),
ylim = ylims)
axis.POSIXct(1, at = seq(r[1], r[2], by = "year"), format = "%b %Y")
lines(refmCumm$date, refmCumm$cumm, lty = 1, pch = 19)
lines(depmCumm$date, depmCumm$cumm, lty = 2, pch = 24)
arrows(refmCumm$date, refmCumm$cumm+refmCumm$se, refmCumm$date, refmCumm$cumm-refmCumm$se, angle=90, code=3, length=0)
arrows(depmCumm$date, depmCumm$cumm+depmCumm$se, depmCumm$date, depmCumm$cumm-depmCumm$se, angle=90, code=3, length=0)
dev.off()
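If the journal requires the fonts to be embedded in the PostScript or PDF output, the extrafont package also provides embed_fonts(), which uses Ghostscript. A minimal sketch, assuming Ghostscript is installed (the output file name here is just an example):

library(extrafont)
# embed the Arial font into the EPS so it renders correctly without the font
# being installed on the viewer's machine (requires Ghostscript)
embed_fonts("CummulativeCaptures.eps", outfile = "CummulativeCaptures_embedded.eps")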

EDIT: Just found a nice blog post with recommendations on device outputs on Revolutions here.

Below is all the code that includes comparison of sizes with PDF, PNG, JPEG, and EPS files as well.

plot(x, y) # Make plot
tiff("Plot1.tiff")
dev.off()

tiff("Plot2.tiff", res = 300)
plot(x, y) # Make plot
dev.off()

tiff("Plot3.tiff", width = 4, height = 4, units = 'in', res = 300)
plot(x, y) # Make plot
dev.off()

tiff("Plot3.72.tiff", width = 4, height = 4, units = 'in', res = 72)
plot(x, y) # Make plot
dev.off()

tiff("Plot4.tiff", width = 4, height = 4, units = 'in', res = 300, compression = 'lzw')
plot(x, y) # Make plot
dev.off()

tiff("Plot5.tiff", width = 4, height = 4, units = 'in', res = 300, compression = 'none')
plot(x, y) # Make plot
dev.off()

tiff("Plot6.tiff", width = 4, height = 4, units = 'in', res = 300, compression = 'rle')
plot(x, y) # Make plot
dev.off()

bitmap("Plot7.tiff", height = 4, width = 4, units = 'in', type="tifflzw", res=300)
plot(x, y)
dev.off()
par(mfrow = c(1,1))

bitmap("Plot8.tiff", height = 480, width = 480, type="tifflzw", res=300)
plot(x, y)
dev.off()
par(mfrow = c(1,1))

jpeg("Plot3.jpeg", width = 4, height = 4, units = 'in', res = 300)
plot(x, y) # Make plot
dev.off()

tiff("Plot4b.tiff", width = 4, height = 4, pointsize = 1/300, units = 'in', res = 300)
plot(x, y) # Make plot
dev.off()

png("Plot3.png", width = 4, height = 4, units = 'in', res = 300)
plot(x, y) # Make plot
dev.off()

pdf("Plot3.pdf", width = 4, height = 4)
plot(x, y) # Make plot
dev.off()

postscript("Plot3.eps", width = 480, height = 480) # note: postscript() sizes are in inches, not pixels
plot(x, y) # Make plot
dev.off()

Reference and PDF management

Organizing and Managing PDFs and References
Organization and administration are an ever-increasing part of any academic's or researcher's life. They begin as a small piece of the job for an undergraduate or beginning master's student but grow almost exponentially over time. Luckily, there are numerous programs to help manage an academic's electronic life. Below is a list of programs and some useful links. Here is a link to one rather negative review of a variety of programs. You can find another summary of programs here. Also see these recommendations and this link for a nice comparison of the programs.


Papers – organize your .pdf files the way iTunes does for music (Mac only). Here is a video showing how to integrate Papers and Endnote. I love this program but have not figured out a smooth workflow (or combination of existing files) for Papers-to-Endnote.

Mendeley – a great reference manager that I recommend for people who aren't Mac users and therefore can't use Papers. It is better at linking to PDFs than Endnote.

Zotero – another excellent reference manager. I have less experience with this than the others but it seems very nice.

Endnote – citation manager that has been the standard for academics. Excellent for citations and you can link to the PDFs but it’s not great for managing your PDFs like Papers or even Mendeley.

Bibdesk (Mac only, oriented for LaTeX)

Refworks – no experience with this one

Labmeeting – I haven’t tried this yet since I just came across it but it looks interesting. Drop me a note if you have any experience with this.

There are so many citation/reference management programs available now that it is difficult to choose one. Here is a wiki with info on dozens of options. Personally, I am using Papers and Endnote. I love Papers, and I started with Endnote nearly a decade ago and haven't had the time/energy to fully migrate to another program. If I were to start now, I'd probably try Mendeley for all of my reference management needs because it's free, open source, available for multiple operating systems, and can sync online so as to avoid being tied to a single computer. There are also good sharing options for lab meetings and collaborative projects. The group at Mendeley also seems to update frequently and really make an effort to improve the product. The only downside is that I've heard it can be slow and the server is unavailable at times.

Please join the conversation and tell us about your experiences with these or other programs. Let’s not all suffer in silence.
