Blog Archives

New Paper: Modeling Daily Stream Temperature

Our paper, “A hierarchical model of daily stream temperature using air-water temperature synchronization, autocorrelation, and time lags” was just published today in the journal PeerJ. We developed a statistical model of daily stream temperature for a watershed in Western Massachusetts, USA.


Observed (blue) and predicted (red) temperatures compared to air temperature (black points).

We outline the major challenges of statistical stream temperature modeling and show how our model addresses hysteresis, autocorrelation, and time lags, and avoids the problems of modeling winter temperatures, when the air-water relationship breaks down because of phase changes and the lower limit of water temperature. Daily stream temperature models have potential uses in regulation (protection levels based on thermal classifications) and in understanding the ecology of stream organisms.
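The flavor of the air-water relationship can be sketched in a few lines of R. This is a toy illustration, not the paper's actual hierarchical model: water temperature is regressed on one-day-lagged air temperature with AR(1) autocorrelation in the residuals, using the base arima() function. All variable names and parameter values here are invented for the example.

```r
# Toy sketch (NOT the paper's model): water temperature as a linear function
# of one-day-lagged air temperature with AR(1) residual autocorrelation.
set.seed(1)
n <- 200
air <- 10 + 8 * sin(2 * pi * (1:n) / 365) + rnorm(n)  # synthetic air temps
air_lag <- c(NA, air[-n])                             # one-day time lag
water <- 2 + 0.7 * air_lag + as.numeric(arima.sim(list(ar = 0.6), n))

ok <- !is.na(air_lag)
fit <- arima(water[ok], order = c(1, 0, 0), xreg = cbind(air = air_lag[ok]))
coef(fit)  # recovers a slope near 0.7 and an AR(1) coefficient near 0.6
```

The AR(1) term soaks up the day-to-day autocorrelation that an ordinary regression would leave in the residuals, which is one of the challenges the paper addresses.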

If you aren’t familiar with PeerJ, you should check them out. They provide open-access, peer-reviewed articles for a low price. Their review process is fast, and it is one of the most pleasant experiences you will have submitting a manuscript, pre-print, or review. Check out some reviews here.


dplyr is great…but

I have been loving Hadley Wickham’s new dplyr package for R. It provides a small set of verbs for quickly and easily manipulating data. It is generally about as fast as data.table, but unless you’re already very familiar with data.table, the dplyr syntax is much easier. There are a number of great introductions and tutorials, which I won’t attempt to recreate, but here are some links:

The only challenge in moving to dplyr has been dealing with non-standard evaluation (see vignette("nse")) and the new tbl_df class that gets created. In theory, the dataframes returned by dplyr should act both as data.frame objects and as tbl_df objects. There are some nice new features associated with tbl_df objects, but also some idiosyncrasies that can break existing code.

For example, I had a function:

stdCovs <- function(x, y, var.names) {
    for (i in seq_along(var.names)) {
        x[, var.names[i]] <- (x[, var.names[i]] - mean(y[, var.names[i]], na.rm = TRUE)) /
            sd(y[, var.names[i]], na.rm = TRUE)
    }
    x
}
that used the means and standard deviations of covariates in one dataframe (y) to standardize the same covariates in another dataframe (x). This was particularly useful for making predictions for new data using coefficients estimated from a different, standardized dataset. The code above worked when I was using data.frame objects. However, I have now started using dplyr for a variety of reasons, and the code no longer works. I believe this is due to how tbl_df objects handle single-bracket subsetting (e.g. y[, var]). There is an issue on GitHub related to this.
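A base-R analogue makes the breakage easy to see: single-bracket subsetting on a tbl_df returns a one-column table, much like drop = FALSE on a data.frame, and mean() and sd() expect a plain vector. The column name below is invented for the example.

```r
df <- data.frame(flow = c(1.2, 3.4, 2.1))  # hypothetical covariate column

v1 <- df[, "flow"]                # data.frame: single bracket drops to a vector
m1 <- df[, "flow", drop = FALSE]  # one-column data.frame: what a tbl_df returns
v2 <- df[["flow"]]                # double bracket: a plain vector either way

is.numeric(v1)     # TRUE
is.data.frame(m1)  # TRUE: mean() and sd() choke on this
identical(v1, v2)  # TRUE: which is why y[[var]] fixes the function
```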

In the end, it turned out that double bracketing to extract the column works: rather than the y[, var] formulation, it should be y[[var]]. Not too hard, but the type of thing that wastes an hour when moving to a new package. Despite this and similar challenges, I highly recommend dplyr. Hopefully the simplicity of the verbs and the speed of computation make up for the few hours spent learning and adjusting code.

Here’s my final code:

stdCovs <- function(x, y, var.names) {
  for (i in seq_along(var.names)) {
    x[, var.names[i]] <- (x[, var.names[i]] - mean(y[[var.names[i]]], na.rm = TRUE)) /
        sd(y[[var.names[i]]], na.rm = TRUE)
  }
  x
}
One of the best features of dplyr is its use of magrittr for piping functions together with the %>% operator. It’s not shown above, but it’s worth learning. Previously, you would either create a bunch of useless intermediate objects or write a long string of nested functions that were hard (at times impossible) to decipher. The “then” operator (%>%) does away with this, as shown in this example from Revolution Analytics:

hourly_delay <- filter(
  summarise(
    group_by(
      filter(flights, !is.na(dep_delay)),
      date, hour
    ),
    delay = mean(dep_delay),
    n = n()
  ),
  n > 10
)

Here’s the same code, but rather than nesting one function call inside the next, data is passed from one function to the next using the %>% operator:

hourly_delay <- flights %>%
  filter(!is.na(dep_delay)) %>%
  group_by(date, hour) %>%
  summarise(
    delay = mean(dep_delay),
    n = n()
  ) %>%
  filter(n > 10)

Beautiful, right? Pipes are fantastic, and combined with dplyr’s simple verbs they make code much easier to write and to read once you get the hang of it.

Paper Summary: Natural Disturbance and Logging Effects on Salamanders

Paper Summary:

Hocking, D.J., K.J. Babbitt, and M. Yamasaki. 2013. Comparison of Silvicultural and Natural Disturbance Effects on Terrestrial Salamanders in Northern Hardwood Forests. Biological Conservation 167:194-202. doi:

Unfortunately, this paper is behind a paywall. Please email me if you would like a copy for educational purposes.

We were interested in how red-backed salamanders respond to various logging practices compared with natural disturbance. Specifically, we compared the abundance of salamanders in the two years following a major ice storm with abundance in clearcuts, patch cuts, group cuts, single-tree selection harvests, and undisturbed forest patches in the White Mountains of New Hampshire (Northern Appalachian Mountains). The 100-year ice storm caused ~65% canopy loss in the affected areas. We know that clearcutting has detrimental effects on populations of woodland salamanders, but the impacts of less intense harvesting and natural disturbances are less well understood.

We used transects of coverboards running from 80 m inside each forest patch to 80 m outside each patch into the surrounding, undisturbed forest. Repeated counts of salamanders under these coverboards allowed us to employ a Dail-Madsen open-population model to estimate abundance in each treatment while accounting for imperfect detection. The results were quite clear, as demonstrated in this figure:

Abundance Plot by Treatment

There were slightly fewer salamanders in the ice-storm damaged sites compared with undisturbed reference sites. The single-tree selection sites were most similar to the ice-storm damage sites. The group cut, patch cut, and clearcut didn’t differ from each other and all had ~88% fewer salamanders compared with reference sites.

In addition to comparing natural and anthropogenic disturbances, we were interested in examining how salamanders respond along the edges of even-aged harvests. Wind, solar exposure, and similar factors are altered in the forest edge adjacent to harvested areas, which can reduce salamander abundance in the forest edges around clearcuts. Previous researchers have used nonparametric estimates of edge effects. A limitation of these methods is that the effects cannot be projected well across the landscape; they are also unable to account for imperfect detection. We developed a method to model edge effects as a logistic function while accounting for imperfect detection. As with the treatment effects, the results were quite clear: very few salamanders in the center of the even-aged harvests, a gradual increase in abundance near the forest edge, increasingly more salamanders in the forest moving away from the edge, and eventually a leveling off at carrying capacity. In this case, red-backed salamander abundance reached 95% of carrying capacity 34 m into the surrounding forest. Because the model is parametric, predictions can be projected across landscapes. The equation can be used in GIS, and land managers can predict the total number of salamanders that would be lost from a landscape under a variety of alternative timber harvest plans.
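As a rough sketch of the idea (the parameterization and values below are invented for illustration, not the exact equation or estimates from the paper), a logistic edge-effect curve and the distance at which it reaches 95% of carrying capacity can be written in a few lines of R:

```r
# Hypothetical logistic edge-effect curve: expected abundance rises from a
# low value in the harvest interior toward carrying capacity K in the forest.
# d is distance from the edge in meters (negative = inside the harvest).
edge_abundance <- function(d, K = 20, beta = 0.1, d50 = 5) {
  K / (1 + exp(-beta * (d - d50)))
}

# Distance at which abundance reaches 95% of K, from inverting the logistic:
d95 <- function(K = 20, beta = 0.1, d50 = 5) d50 + log(0.95 / 0.05) / beta

edge_abundance(d95())  # 19, i.e. 95% of K = 20
```

Because the curve is a closed-form function of distance, it can be evaluated at every cell of a distance-to-edge raster in GIS, which is what makes landscape-scale projections possible.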

Hopefully other researchers find this method useful and apply it for a variety of taxa. It could also be incorporated into ArcGIS or QGIS toolboxes/plugins as a tool for land managers. You can read our paper for more details if you’re interested. In addition to methodological details there is more information on environmental factors that affect detection and abundance of salamanders in this landscape.

Edge Effects

In Praise of Exploratory Statistics

If you haven’t seen it, you should definitely check out the latest post by Brian McGill at Dynamic Ecology. It’s a great post on the use of exploratory statistics. While it may be impossible to get an NSF grant with an analysis framed in terms of exploratory statistics, reviewers should definitely be more open to their appropriate use. It is wrong to force authors to frame exploratory analyses in a hypothetico-deductive framework after the fact. Brian suggests that there are three forms of scientific analysis:

  1. Hypothesis testing
  2. Prediction
  3. Exploratory

and only one of these can be used for a single data set. My favorite quote is,

“If exploratory statistics weren’t treated like the crazy uncle nobody wants to talk about and everybody is embarrassed to admit being related to, science would be much better off.”

There’s lots of good stuff in his post, so I recommend spending 10 minutes of your day to read through it.

No Statistical Panacea, Hierarchical or Otherwise

Everyone in academia knows how painful the peer-review publication process can be. It’s a lot like democracy, in that it’s the worst system ever invented, except for all the others. The peer-review process does a fair job of promoting good science overall, but it’s far from perfect. Sure, anyone can point out a hundred flaws in the system, but I’m just going to focus on one aspect that has been bothering me and that has the potential to change: complicated statistical demands.

I have found that reviewers frequently require the data to be reanalyzed with a particularly popular or pet method. In my opinion, reviewers need to ask whether the statistical techniques are appropriate for answering the questions of interest. Do the data meet the assumptions of the model? If there are violations, are they likely to lead to biased inference? Let’s be honest: no model is perfect, and there are always potential violations of assumptions. Data are samples from reality, and a statistical model creates a representation of that reality from the data. The old adage, “All models are wrong, but some are useful,” is important for reviewers to remember. The questions are: does the model answer the question of interest (not the question the reviewer wishes had been asked), is the question interesting, and were the data collected in a manner appropriate for the model?

Last year I had a manuscript rejected primarily because I did not use a hierarchical model to account for detection probability when analyzing count data. In my opinion, the reviewer was far too rigid in requiring a specific type of analysis. The worst part is that the reviewer didn’t seem to have extensive experience with the method. The reviewer actually wrote,

They present estimates based on raw counts, rather than corrected totals. Given that they cite one manuscript, McKenny et al., that estimated detection probability and abundance for the same taxa evaluated in the current manuscript, I was puzzled by this decision. The literature base on this issue is large and growing rapidly, and I cannot recommend publication of a paper that uses naïve count data from a descriptive study to support management implications. (emphasis mine)

I’m not alone in having publications rejected on this account. I personally know of numerous manuscripts shot down for this very reason. After such a hardline statement, this reviewer goes on to say,

The sampling design had the necessary temporal and spatial replication of sample plots to use estimators for unmarked animals. The R package ‘unmarked’ provides code to analyze these data following Royle, J. A. 2004. N-mixture models for estimating population size from spatially replicated counts. Biometrics 60:108-115.

That would seem reasonable, except that our study had 8 samples in each of 2 years at ~50 sites in 6 different habitats. That would suggest sufficient spatial and temporal replication to use Royle’s N-mixture model. HOWEVER, the N-mixture model assumes that all the temporal replicates occur during a period of population closure (no changes in abundance through births, deaths, immigration, or emigration). Clearly, 2 years of count data are going to violate this assumption. The N-mixture model would be an inappropriate choice for these data. Even if the 2 years were analyzed separately, the assumption would still be violated because the data were collected biweekly from May to October each year (and eggs hatch from June to September).
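To make the closure assumption concrete, here is a minimal simulation of the data structure an N-mixture model assumes. The site count, visit count, and rates are invented for illustration, roughly matching the scale of the study:

```r
# Under closure, latent abundance N[i] is fixed across all repeat visits,
# and each count is an independent binomial thinning of that same N[i].
set.seed(42)
n_sites <- 50; n_visits <- 8   # roughly the scale of the study
lambda <- 6; p <- 0.3          # assumed mean abundance and detection probability

N <- rpois(n_sites, lambda)                                 # latent abundance
y <- sapply(1:n_visits, function(j) rbinom(n_sites, N, p))  # counts matrix

# Closure implies no observed count can ever exceed the site's abundance;
# counts spanning two breeding seasons violate this because N itself changes
# between (and within) years.
all(apply(y, 1, max) <= N)  # TRUE
```

When N changes over the sampling period, the model misattributes that variation to detection, which is why fitting it to multi-year count data is inappropriate.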

Recently, Dail and Madsen (2011; Biometrics) developed a generalized form of the N-mixture model that works for open populations. This model might work for these data, but in my experience the Dail-Madsen model requires a huge number of spatial replicates. All of these hierarchical models accounting for detection tend to be quite sensitive to limited spatial replication (more so than temporal), low detection probability (common with terrestrial salamanders, which were the focus of the study), and variation in detection not well modeled with covariates. Additionally, the Dail-Madsen model was published only a few months before my submission and hadn’t come out when I analyzed the data, and the reviewer did not mention it. Given the lack of time for people to become aware of the model, and the lack of rigorous testing of the model, it would seem insane to require it for publication. To be fair, I believe Marc Kery did have a variation of the N-mixture model that allows for population change (Kery et al. 2009).

So if I can’t use the N-mixture model because of extreme violations of model assumptions, and the data are insufficient for the Dail-Madsen model, what was I supposed to do with this study? The associate editor rejected the paper without a chance for rebuttal. It was a decent management journal, but certainly not Science, or even Ecology or Conservation Biology. The data had been collected in 1999-2000, before most of these hierarchical detection models had been invented. They have unfortunately been sitting in a drawer for too long. Had they been published in 2001-2002, no one would have questioned this and the paper would have gotten favorable reviews. The data were collected quite well (I didn’t collect them, so it’s not bragging) and the results are extremely clear. I’m not saying detection isn’t important to think about, but in this case even highly biased detection wouldn’t change the story, just the magnitude of the already very large effect. There has recently been good discussion of the importance of accounting for detection and how well these models actually parse abundance/occupancy and detection, so I won’t rehash it here. See Brian McGill’s posts on Statistical Machismo and the plethora of thoughtful comments here and here.

Based on this one reviewer’s hardline comments and the associate editor’s decision to reject the paper outright, they seem to be suggesting that these data reside in a drawer forever (if they can’t be used with an N-mixture or Dail-Madsen model). With that mindset, all papers using count data published before ~2002-2004 should be ignored, and most data collected before then should be thrown out to free up server space. That would be a real shame for long-term datasets, of which there are too few in ecology! This idea of “hierarchical detection model or no publication” is a hypercritical standard of review. I’m still working on the reanalysis and revision to send to another journal. We’ll see what happens with it, and if it ever gets published I’ll post a paper summary on this blog. If I don’t use a hierarchical detection model, then I am lumping abundance processes with detection processes, and that should be acknowledged. It adds uncertainty to the inference about abundance, but given the magnitude of the differences among habitats and knowledge of the system, it’s hard to imagine it changing the management implications of the study at all.

My point in all of this is that there is no statistical panacea. I think hierarchical models are great, and in fact I spend most of my days running various forms of these models. However, I don’t think they solve all problems, and they aren’t the right tool for every job. I think most current studies where there is any real chance of detection bias should be designed to account for it, but that doesn’t mean that all studies that don’t use these models are worthless. These models are WAY more difficult to fit than most people realize, and they don’t always work. Hopefully, as science and statistics move forward in ever more complicated ways, more reviewers will realize that there is no perfect model or method. They should simply ask whether the methods employed are adequate to answer the question and whether the inference from the statistical models accurately reflects the data and model assumptions. Just because a technique is new and sexy doesn’t mean that everyone needs to use it in every study.

