Category Archives: R

Blue Jay and Scrub Jay : Using rvertnet to check the distributions in R

Daniel Hocking:

This is a great post about a great package. Unlike some other broad vertebrate databases, VertNet has good QA/QC. It’s great being able to access it directly in R, where the data manipulation, analysis, and even plotting will occur. Thank you to the rOpenSci team for developing the rvertnet package and to Vijay Barve for the clear, simple tutorial.

Originally posted on Vijay Barve:

As part of my Google Summer of Code project, I am also working on another R package called rvertnet. This package is an R wrapper for the VertNet websites. VertNet is a distributed vertebrate database network consisting of FishNet2, MaNIS, HerpNET, and ORNIS. Of these, FishNet, HerpNET, and ORNIS currently have their v2 portals serving data. rvertnet now has functions to access these data and import them into R data frames.

Some of my lab mates had difficulty downloading data for Scrub Jay (Aphelocoma spp.) because of the large number of records (220k+) on ORNIS, so I decided to try the rvertnet package, which was still in development at the time. The package really helped and returned the results quickly. While exploring that data I came up with this case study.

So here to get data for Blue Jay (Cyanocitta cristata) which is distributed in…


Teaching Scientific Computing: Peer Review

This post is going out on a bit of a limb because I am not familiar with the pedagogical literature relating to teaching scientific computing. As such, I can only speak from my very limited experience. I’ve taken a couple of short courses on scientific computing, but the only formal full-semester course I’ve taken was Introduction to C Programming for Engineers 15 years ago. In that course, the instructor spent 50 minutes 3 days a week writing code on the chalkboard in front of us and we were expected to learn. Homework was to write increasingly large programs throughout the semester. If they didn’t work we got a 0%, if they produced the wrong output we got a 50%, and if they worked properly we got a 100%. Obviously it was a terrible course (although a number of my statistics courses that involved programming were not very different, so this might be more common than I’d like to believe). Besides some of the conspicuous instructional problems, I was just thinking that scientific programming courses could learn from pedagogy in the humanities. The University of New Hampshire requires undergraduates to take a number of writing intensive courses. To qualify as writing intensive a course must meet 3 criteria:

  1. Students in the course should do substantial writing that enhances learning and demonstrates knowledge of the subject or the discipline. Writing should be an integral part of the course and should account for a significant part (approximately 50% or more) of the final grade.
  2. Writing should be assigned in such a manner as to require students to write regularly throughout the course. Major assignments should integrate the process of writing (prewriting, drafting, revision, editing). Students should be able to receive constructive feedback of some kind (peer response, workshop, professor, TA, etc.) during the drafting/revising process to help improve their writing.
  3. The course should include both formal (graded) and informal (heuristic) writing.  There should be papers written outside of class which are handed in for formal evaluation as well as informal assignments designed to promote learning, such as invention activities, in-class essays, reaction papers, journals, reading summaries, or other appropriate exercises.

I think these criteria could be applied or at least adapted for scientific computing courses. The 1st one is easy. The 2nd and 3rd are what I think computing courses could really take advantage of. From what I’ve seen, there is often not a lot of time spent on informal feedback from instructors and peers to help with revision. In programming, especially with flexible languages like R, there are often many solutions to the same problem. Useful assignments could be to critique the programs of peers, find ways to improve code efficiency, and provide alternative solutions to sections of code. This could include critiques of the commenting and README files.
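As a small illustration of the kind of code a peer-review assignment might compare, here is a sketch of three interchangeable solutions to one simple task. The data frame and the function names are made up for this example; they are not from any actual course.

```r
# Three ways to compute the mean of each column in a numeric data frame.
# The data and function names are hypothetical, for illustration only.
scores <- data.frame(quiz1 = c(8, 9, 6), quiz2 = c(7, 10, 5))

# Solution 1: an explicit loop, as a beginner might write it
col_means_loop <- function(df) {
  out <- numeric(ncol(df))
  for (i in seq_len(ncol(df))) {
    out[i] <- sum(df[[i]]) / nrow(df)
  }
  names(out) <- names(df)
  out
}

# Solution 2: apply mean() over the columns
col_means_sapply <- function(df) sapply(df, mean)

# Solution 3: the built-in vectorized function
col_means_builtin <- function(df) colMeans(df)

# A reviewer can first confirm the alternatives agree,
# then compare them on readability, comments, and speed
all.equal(col_means_loop(scores), col_means_sapply(scores))
all.equal(col_means_loop(scores), col_means_builtin(scores))
```

A reviewer might note that the loop is the most explicit, while the built-in version is the shortest and usually the fastest.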

In introductory courses there is often an emphasis on covering content. Some people will balk at the idea of spending time on learning alternatives to simple options when there is clearly 1 best solution and so much material to cover to get students writing even simple scripts. However, it’s better in my opinion to learn a few things well than many things superficially. By evaluating, revising, and developing alternatives to code written by peers, students will learn how to program better. There is a reason that informal assessment, peer review, and revision are a required part of writing intensive courses. Those same reasons apply to scientific computing courses. Just as review and revision make us better writers, they will make us better programmers.

RStudio Presentations

After finally getting around to posting about knitr and Markdown just yesterday, RStudio comes out with an update that makes it even easier: http://www.rstudio.com/ide/docs/presentations/overview

P.S. I Love RStudio

Knitting beautiful documents in RStudio

Title: Knitting Beautiful Documents in RStudio

Header: Markdown

This document was written using Markdown in RStudio. RStudio is a wonderful IDE for writing R scripts and keeping track of variables, dataframes, history, packages and much more. Markdown is a simple formatting syntax for authoring web pages. It allows for easy formatting, for example

# Header 1
## Header 2
### Header 3
#### Header 4
##### Header 5

Header 1

Header 2

Header 3

Header 4

Header 5

Code can be embedded in documents with nice formatting using backticks (the symbol below the tilde in the upper left of an American QWERTY keyboard). For example, if you were to type the following (without the >)

>```{r}
>install.packages("lme4", repos = 'http://cran.us.r-project.org')
>```

you could run that block in R and install the lme4 package. In the markdown document it would show up as:

install.packages("lme4", repos = "http://cran.us.r-project.org")

## 
## The downloaded binary packages are in
##  /var/folders/r5/l8t2qxpj3m1_gnlf6ys7fwdm0000gq/T//RtmpiqoAcq/downloaded_packages

You should try this with the knitr package [UPDATE: You will likely have to restart RStudio after installing and loading the knitr library because it changes options in RStudio]. knitr is a fantastic package for converting the R Markdown file (.Rmd) into an HTML document. The design of knitr allows any input languages (e.g. R, Python and awk) and any output markup languages (e.g. LaTeX, HTML, Markdown, AsciiDoc, and reStructuredText). I find Markdown to be the easiest language to use to create documents for coursework/assignments. knitr is dynamic, so it will run your code and print the results below it. The best part is if you get new data or find an error in your code, you just re-knit the file and it’s updated a few seconds later.

In RStudio, once you install knitr a button shows up that you can click to knit the markdown file (produce your html document). When you click the Knit HTML button a web page will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

summary(cars)

##      speed           dist    
##  Min.   : 4.0   Min.   :  2  
##  1st Qu.:12.0   1st Qu.: 26  
##  Median :15.0   Median : 36  
##  Mean   :15.4   Mean   : 43  
##  3rd Qu.:19.0   3rd Qu.: 56  
##  Max.   :25.0   Max.   :120

These will be embedded and run in the document, but you can also run single lines or blocks (chunks) of code in RStudio to make sure they work before knitting the document. You can also embed plots, for example:

plot(cars)

[Figure: scatter plot produced by plot(cars)]
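The Knit HTML button is the easiest route, but the knitting step can also be run from the R console. The following is a minimal sketch, assuming knitr is installed; the file name example.Rmd and its contents are hypothetical, and the file is written to a temporary folder so nothing in your project is touched.

```r
# Sketch: write a tiny R Markdown file and knit it from the console.
# "example.Rmd" and its contents are hypothetical; knitr must be installed.
fence <- strrep("`", 3)  # build the chunk delimiter (three backticks)
rmd_lines <- c(
  "# A small report",
  "",
  "Summary of the built-in cars data:",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
)

work_dir <- tempdir()
rmd_file <- file.path(work_dir, "example.Rmd")
md_file  <- file.path(work_dir, "example.md")
writeLines(rmd_lines, rmd_file)

if (requireNamespace("knitr", quietly = TRUE)) {
  # knit() runs the chunks and writes a Markdown file that includes their output
  knitr::knit(rmd_file, output = md_file, quiet = TRUE)
}
```

The resulting .md file contains both the code and the printed summary, which is what gets converted to HTML in the next step.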

This is great, but after knitting you are left with an HTML document. What if you want a PDF or Word document? There may be other ways to do this, but I use pandoc to convert the Markdown (.md) file that knitr produces into other forms such as PDF or MS Word documents. After installing pandoc, you use it via the command line. I use a Mac, so I just open the Terminal to get to my Bash shell and cd to the folder containing my document, then just run

pandoc Knitting_Beautiful_Documents_in_RStudio.md -o Knitting_Beautiful_Documents_in_RStudio.pdf

and I have a PDF. Unfortunately, the pandoc default is to have huge margins that look ridiculous. To correct for that you can save a style file in the folder with the document you’re converting. To do that you just create a text file with the line:

\usepackage[vmargin=1in,hmargin=1in]{geometry}

and save it as something like margins.sty. Then in the command line you run

pandoc Knitting_Beautiful_Documents_in_RStudio.md -o Knitting_Beautiful_Documents_in_RStudio.pdf -H margins.sty

Now when you view the PDF it should have 1 inch margins and look much nicer. That is about it for making attractive, dynamic documents using R, RStudio, knitr, and pandoc. It seems like a lot but after doing it once or twice it becomes very quick and easy. I have yet to use it but there is also a package pander, which allows you to skip the command line step and do the file conversion right in R (RStudio).
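If you would rather stay in R for these steps, something like the following sketch could create the style file and call pandoc. It assumes pandoc is installed and on your PATH and that the .md file is in the working directory; the file names simply follow the example above.

```r
# Sketch: create the margins style file and run pandoc from R.
# File names follow the post; swap in your own.
writeLines("\\usepackage[vmargin=1in,hmargin=1in]{geometry}", "margins.sty")

input  <- "Knitting_Beautiful_Documents_in_RStudio.md"
output <- "Knitting_Beautiful_Documents_in_RStudio.pdf"
pandoc_args <- c(input, "-o", output, "-H", "margins.sty")

# Only run the conversion if pandoc is found and the input file exists
if (nzchar(Sys.which("pandoc")) && file.exists(input)) {
  system2("pandoc", pandoc_args)
}
```

Passing the arguments as a character vector to system2() avoids pasting together one long command string by hand.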

Using Markdown in RStudio with knitr is wonderful for creating reports, class assignments, and homework solutions (as a student, you’d really impress your instructor), but other people take it even further. Karthik Ram of rOpenSci fame has a great blog post on how to extend this system to include citations and bibliographies that allow you to ditch Word.

For a nice Markdown reference list, check out this useful cheat sheet

Finally, if you want to see my Markdown code and everything that went into making this, you can find it in my GitHub repository. Specifically, you can find the .Rmd, .md, margins.sty, and built PDF files.

[UPDATE: On a Mac R/RStudio/pandoc may have trouble finding your LaTeX distribution to use to build the PDF (even though you are using Markdown, not LaTeX directly). A potential solution I found when building an R package was to specify the path to LaTeX: http://danieljhocking.wordpress.com/2012/12/15/missing-pdflatex/]

 

[UPDATE 2: From RProgramming.net I found a nice way to create a PDF without leaving R (calling pandoc from R)]

# Create .md, .html, and .pdf files
library(knitr)    # provides knit()
library(markdown) # provides markdownToHTML()
knit("File.Rmd")  # writes File.md
markdownToHTML('File.md', 'File.html', options = c("use_xhtml"))
system("pandoc -s File.html -o File.pdf")

High Resolution Figures in R

As I was recently preparing a manuscript for PLOS ONE, I realized the default resolution of R and RStudio images is insufficient for publication. PLOS ONE requires 300 ppi images in TIFF or EPS (encapsulated postscript) format. In R, plots are exported at 72 ppi by default. I love RStudio but was disappointed to find that there were no options for exporting figures at high resolution.

PLOS ONE has extensive instructions for scaling, compressing, and converting image files to meet their standards. Unfortunately, there is no good way to go from low resolution to high resolution (i.e. rescaling in Photoshop), as my friend Liam Revell, author of phytools, pointed out with an entertaining illustration from PhD Comics (upper right panel). The point is that if the original image isn’t created at a high enough resolution, then using Photoshop to artificially increase the resolution produces graininess because Photoshop can only guess how to interpolate between the points. This might not be a big problem with simple plots created in R because interpolation between points in a line shouldn’t be difficult, particularly when starting with a PDF.

Even if scaling up from a low resolution PDF would work, it would be better to have a direct solution in R.

[UPDATE: In the original post, I wrote about my trial and error when first trying this out. It was helpful for me but less helpful for others. Since this post gets a fair amount of traffic, I now have the solution first and the rest of the original post below that]

If you need to use a font not included in R, such as the Arial family of fonts for a publisher like PLOS, the extrafont package is very useful but takes a long time to run (but should only have to run once – except maybe when you update R you’ll have to do it again).

install.packages("extrafont")
library(extrafont)
font_import() # this gets fonts installed anywhere on your computer, most commonly from MS Office install fonts. It takes a LONG while.

To make a nice-looking, high-resolution plot, postscript (EPS) is the best option, but you may need to download another program to view it on your computer. TIFF is good for raster data, including photos and color gradients. PDF should be good for line- and point-based plots. JPEG is not good for high-resolution figures because of compression and detail loss, but it is easy to view and use in presentations. PNG is lower quality but useful for websites. Here are some high-resolution figure examples. They all start by telling R to open a connection (device) to make a PDF, TIFF, or EPS file rather than printing to the default R/RStudio plot device. You then make the plot and close the device, at which point the file is saved. The plot won’t show up on the screen but will be saved to your working directory.

x = 1:20
y = x * 2

setwd('/Users/.../Folder/') # place to save the file - can be overridden by putting a path in the
                            # file = " " part of the functions below.

pdf(file = "FileName.pdf", width = 12, height = 17, family = "Helvetica") # defaults to 7 x 7 inches
plot(x, y, type = "l")
dev.off()

postscript("FileName.eps", width = 12, height = 17, horizontal = FALSE, 
           onefile = FALSE, paper = "special", colormodel = "cmyk", 
           family = "Courier")
plot(x, y, type = "l")
dev.off()

bitmap("FileName.tiff", height = 12, width = 17, units = 'cm', 
       type = "tifflzw", res = 300) # bitmap() requires Ghostscript to be installed
plot(x, y, type = "l")
dev.off()

tiff("FileName.tiff", height = 12, width = 17, units = 'cm', 
     compression = "lzw", res = 300)
plot(x, y, type = "l")
dev.off()
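Because every example follows the same open device, plot, close device pattern, it can be wrapped in a small helper. This is only a sketch: save_pdf is a hypothetical name, not a function from R or any package, and it writes to a temporary folder here so it can be run anywhere.

```r
# Sketch of a helper (hypothetical name) that opens a PDF device, draws,
# and always closes the device, even if the plotting code throws an error.
save_pdf <- function(file, draw, width = 7, height = 7) {
  pdf(file = file, width = width, height = height)
  on.exit(dev.off(), add = TRUE)  # runs when the function exits
  draw()
  invisible(file)
}

x <- 1:20
y <- x * 2
out_file <- file.path(tempdir(), "line_plot.pdf")
save_pdf(out_file, function() plot(x, y, type = "l"), width = 4, height = 4)
file.exists(out_file)
```

Using on.exit() means a failed plot call does not leave a device open, which is a common reason later plots seem to disappear.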

[ORIGINAL POST Follows]

It took some time to figure out but here are some trials and the ultimate solution I came up with:

x <- 1:100
y <- 1:100
plot(x, y) # Make plot
tiff("Plot1.tiff")
dev.off()

Nothing happens in this case. However, by setting up the tiff file first, then making the plot, the resulting TIFF file is saved to your working directory and is 924 KB, 72 ppi, 480 x 480 pixels.

To increase the resolution I tried the following:

tiff("Plot2.tif", res = 300)
plot(x, y) # Make plot
dev.off()

but in RStudio the plot could not be printed and hence not saved because it was too large for the print area. Therefore, I had to open up R directly and run the code. Interestingly, a blank TIFF file was created of the same size as Plot1.tiff. This is where I got hung up for a while. I eventually found that R can't figure out the other image parameters when the resolution is changed, or the default is too big to print, so they have to be specified directly:

tiff("Plot3.tiff", width = 4, height = 4, units = 'in', res = 300)
plot(x, y) # Make plot
dev.off()

This creates a TIFF file that is 5,800 KB, 300 ppi, 4 inches by 4 inches. Surprisingly, on a Mac it still indicates that it's only 72 ppi when viewed in Preview. The larger size indicates that it is actually 300 ppi. I ran the same code but explicitly set res = 72 and the file was only 334 KB, suggesting that Preview is incorrect and the file is really 300 ppi. I played with various compressions, but lzw and none gave the same file size while rle resulted in a larger file (less compression). That seems odd again.
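A quick back-of-the-envelope calculation supports that conclusion. The sketch below assumes an uncompressed image stored at 4 bytes per pixel (8 bits each for red, green, blue, and alpha); that storage format is my assumption, not something reported by R. Under that assumption the expected sizes come out to roughly 922 KB, 5,760 KB, and 332 KB, close to the 924 KB, 5,800 KB, and 334 KB observed for the three files.

```r
# Back-of-the-envelope check of the file sizes reported above.
# Assumption (not confirmed by the post): an uncompressed image stored at
# 4 bytes per pixel (8-bit red, green, blue, and alpha channels).
expected_kb <- function(width_px, height_px, bytes_per_pixel = 4) {
  width_px * height_px * bytes_per_pixel / 1000
}

px_300 <- 4 * 300  # 4 inches at 300 ppi
px_72  <- 4 * 72   # 4 inches at 72 ppi

expected_kb(480, 480)        # default 480 x 480 pixel device (Plot1.tiff)
expected_kb(px_300, px_300)  # Plot3.tiff at res = 300
expected_kb(px_72, px_72)    # same plot at res = 72
```

The pixel count is just inches times ppi, so the 4 inch, 300 ppi file holds 1200 x 1200 pixels regardless of what Preview displays.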

Finally, I tried using the bitmap function to create a TIFF:

bitmap("Plot7.tiff", height = 4, width = 4, units = 'in', type="tifflzw", res=300)
plot(x, y)
dev.off()
par(mfrow = c(1,1))

Interestingly, this file is only 9 KB but is listed as 300 dpi, 1200 x 1200 pixels. I'm really not sure why these functions don't seem to be working as smoothly as expected but hopefully this can help get high resolution images directly from R for publication. I plan to use the bitmap function in the future to create high resolution TIFF files for publication. This is what is desired by outlets such as PLOS ONE. It's easier than dealing with postscript. I also don't know if EPS files from R or RStudio are created with LaTeX. I know that can be a problem for PLOS ONE.

[UPDATE: Here is the code for my final figure for PLOS ONE with both postscript and tiff options, plus it uses the extrafont package to allow Arial fonts in postscript figures as required by PLOS ONE]

install.packages("extrafont")
library(extrafont)
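font_import() # this gets fonts installed anywhere on your computer, 
# most commonly from MS Office install fonts. 
# It takes a long while.
loadfonts(device = "postscript") # register the imported fonts with the postscript device

# Open only ONE device before plotting: either bitmap() for TIFF or postscript() for EPS.
# To make the TIFF instead, uncomment bitmap() and comment out postscript().
# bitmap("CummulativeCaptures.tiff", height = 12, width = 17, 
#        units = 'cm', type="tifflzw", res=300)
postscript("CummulativeCaptures.eps", width = 8, height = 8, 
           horizontal = FALSE, onefile = FALSE, paper = "special", 
           colormodel = "cmyk", family = "Arial")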
par(mar = c(3.5, 3.5, 1, 1), mgp = c(2, 0.7, 0), tck = -0.01)
plot(refmCumm$date, refmCumm$cumm, type = "n", xaxt = "n",
xlab = "Date",
ylab = "Cumulative number of salamanders per plot",
xlim = c(r[1], r[2]),
ylim = ylims)
axis.POSIXct(1, at = seq(r[1], r[2], by = "year"), format = "%b %Y")
lines(refmCumm$date, refmCumm$cumm, lty = 1, pch = 19)
lines(depmCumm$date, depmCumm$cumm, lty = 2, pch = 24)
arrows(refmCumm$date, refmCumm$cumm+refmCumm$se, refmCumm$date, refmCumm$cumm-refmCumm$se, angle=90, code=3, length=0)
arrows(depmCumm$date, depmCumm$cumm+depmCumm$se, depmCumm$date, depmCumm$cumm-depmCumm$se, angle=90, code=3, length=0)
dev.off()

EDIT: Just found a nice blog post with recommendations on device outputs on Revolutions here

Below is all the code that includes comparison of sizes with PDF, PNG, JPEG, and EPS files as well.

plot(x, y) # Make plot
tiff("Plot1.tiff")
dev.off()

tiff("Plot2.tiff", res = 300)
plot(x, y) # Make plot
dev.off()

tiff("Plot3.tiff", width = 4, height = 4, units = 'in', res = 300)
plot(x, y) # Make plot
dev.off()

tiff("Plot3.72.tiff", width = 4, height = 4, units = 'in', res = 72)
plot(x, y) # Make plot
dev.off()

tiff("Plot4.tiff", width = 4, height = 4, units = 'in', res = 300, compression = 'lzw')
plot(x, y) # Make plot
dev.off()

tiff("Plot5.tiff", width = 4, height = 4, units = 'in', res = 300, compression = 'none')
plot(x, y) # Make plot
dev.off()

tiff("Plot6.tiff", width = 4, height = 4, units = 'in', res = 300, compression = 'rle')
plot(x, y) # Make plot
dev.off()

bitmap("Plot7.tiff", height = 4, width = 4, units = 'in', type="tifflzw", res=300)
plot(x, y)
dev.off()
par(mfrow = c(1,1))

bitmap("Plot8.tiff", height = 480, width = 480, type="tifflzw", res=300)
plot(x, y)
dev.off()
par(mfrow = c(1,1))

jpeg("Plot3.jpeg", width = 4, height = 4, units = 'in', res = 300)
plot(x, y) # Make plot
dev.off()

tiff("Plot4b.tiff", width = 4, height = 4, pointsize = 1/300, units = 'in', res = 300)
plot(x, y) # Make plot
dev.off()

png("Plot3.png", width = 4, height = 4, units = 'in', res = 300)
plot(x, y) # Make plot
dev.off()

pdf("Plot3.pdf", width = 4, height = 4)
plot(x, y) # Make plot
dev.off()

postscript("Plot3.eps", width = 480, height = 480)
plot(x, y) # Make plot
dev.off()