Blog Archives

Paper Summary: Natural Disturbance and Logging Effects on Salamanders

Paper Summary:

Hocking, D.J., K.J. Babbitt, and M. Yamasaki. 2013. Comparison of Silvicultural and Natural Disturbance Effects on Terrestrial Salamanders in Northern Hardwood Forests. Biological Conservation 167:194-202. doi: http://dx.doi.org/10.1016/j.biocon.2013.08.006

Unfortunately, this paper is behind a paywall. Please email me if you would like a copy for educational purposes.

We were interested in how red-backed salamanders respond to various logging practices compared with natural disturbance. Specifically, we compared salamander abundance in the two years following a major ice storm with abundance in clearcuts, patch cuts, group cuts, single-tree selection harvests, and undisturbed forest patches in the White Mountains of New Hampshire (Northern Appalachian Mountains). The 100-year ice storm caused ~65% canopy loss in the affected areas. We know that clearcutting has detrimental effects on populations of woodland salamanders, but the impacts of less intense harvesting and natural disturbances are less well understood.

We used transects of coverboards running from 80 m inside each forest patch to 80 m outside each patch into the surrounding, undisturbed forest. Repeated counts of salamanders under these coverboards allowed us to employ a Dail-Madsen open-population model to estimate abundance in each treatment while accounting for imperfect detection. The results were quite clear, as demonstrated in this figure:

Abundance Plot by Treatment

There were slightly fewer salamanders in the ice-storm-damaged sites than in the undisturbed reference sites. The single-tree selection sites were most similar to the ice-storm-damaged sites. The group cut, patch cut, and clearcut sites didn't differ from each other, and all had ~88% fewer salamanders than the reference sites.

In addition to comparing natural and anthropogenic disturbances, we were interested in examining how salamanders respond along the edges of even-aged harvests. Wind, solar exposure, and similar factors are altered in the forest edge adjacent to harvested areas, which can reduce salamander abundance in forest edges around clearcuts. Previous researchers have used nonparametric estimates of edge effects. A limitation of these methods is that the effects cannot be projected well across the landscape, and they are also unable to account for imperfect detection. We developed a method to model edge effects as a logistic function while accounting for imperfect detection. As with the treatment effects, the results are quite clear: very few salamanders in the center of the even-aged harvests, a gradual increase in abundance near the forest edge, increasingly more salamanders in the forest moving away from the edge, and eventually a leveling off at carrying capacity. In this case, red-backed salamander abundance reached 95% of carrying capacity 34 m into the surrounding forest. Because the model is parametric, predictions can be projected across landscapes. The equation can be used in GIS, and land managers can predict the total number of salamanders that would be lost from a landscape under a variety of alternative timber harvest plans.
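
To give a sense of the shape of this kind of edge-effect curve, here is a minimal R sketch of expected abundance as a logistic function of distance from the harvest edge. The parameter values (K, beta0, beta1) are made up for illustration and are not the estimates from the paper.

# Hypothetical logistic edge-effect curve: expected abundance as a function
# of distance from the harvest edge (negative = inside the harvest,
# positive = into the surrounding forest)
edge.abundance <- function(d, K = 10, beta0 = 0, beta1 = 0.15) {
  K / (1 + exp(-(beta0 + beta1 * d)))
}

d <- seq(-80, 80, by = 1)   # distance from edge (m), matching the transect extent
plot(d, edge.abundance(d), type = "l",
     xlab = "Distance from edge (m)", ylab = "Expected abundance")
abline(v = 0, lty = 2)      # the forest edge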

Hopefully other researchers find this method useful and apply it to a variety of taxa. It could also be incorporated into ArcGIS or QGIS toolboxes/plugins as a tool for land managers. You can read our paper for more details if you're interested; in addition to the methodological details, there is more information on the environmental factors that affect detection and abundance of salamanders in this landscape.

Edge Effects

In Praise of Exploratory Statistics

If you haven't seen it, you should definitely check out the latest post by Brian McGill at Dynamic Ecology. It's a great post on the use of exploratory statistics. While it may be impossible to get an NSF grant with an analysis framed in terms of exploratory statistics, reviewers should definitely be more open to their appropriate use. It is wrong to force authors to frame exploratory analyses in a hypothetico-deductive framework after the fact. Brian suggests that there are three forms of scientific analysis:

  1. Hypothesis testing
  2. Prediction
  3. Exploratory

and only one of these can be used for a single data set. My favorite quote is,

“If exploratory statistics weren’t treated like the crazy uncle nobody wants to talk about and everybody is embarrassed to admit being related to, science would be much better off.”

There’s lots of good stuff in his post, so I recommend spending 10 minutes of your day to read through it.

No Statistical Panacea, Hierarchical or Otherwise

Everyone in academia knows how painful the peer-review publication process can be. It's a lot like democracy: the worst system ever invented, except for all the others. The peer-review process does a fair job of promoting good science overall, but it's far from perfect. Sure, anyone can point out a hundred flaws in the system, but I'm just going to focus on one aspect that has been bothering me in particular and that has the potential to change: complicated statistical demands.

I have found that reviewers frequently require the data to be reanalyzed with a particularly popular or pet method. In my opinion, reviewers need to ask whether the statistical techniques are appropriate for answering the questions of interest. Do the data meet the assumptions necessary for the model? If there are violations, are they likely to lead to biased inference? Let's be honest, no model is perfect and there are always potential violations of assumptions. Data are samples from reality, and a statistical model creates a representation of that reality from the data. The old adage, "All models are wrong, but some are useful," is important for reviewers to remember. The questions to ask are: does the model answer the question of interest (not the question the reviewer wished had been asked), is the question interesting, and were the data collected in a manner appropriate for the model?

Last year I had a manuscript rejected primarily because I did not use a hierarchical model to account for detection probability when analyzing count data. In my opinion, the reviewer was far too rigid in requiring a specific type of analysis. The worst part is that it seemed like the reviewer didn't have extensive experience with the method. The reviewer actually wrote,

They present estimates based on raw counts, rather than corrected totals. Given that they cite one manuscript, McKenny et al., that estimated detection probability and abundance for the same taxa evaluated in the current manuscript, I was puzzled by this decision. The literature base on this issue is large and growing rapidly, and I cannot recommend publication of a paper that uses naïve count data from a descriptive study to support management implications. (emphasis mine)

I’m not alone in having publications rejected on this account. I personally know of numerous manuscripts shot down for this very reason. After such a hardline statement, this reviewer goes on to say,

The sampling design had the necessary temporal and spatial replication of sample plots to use estimators for unmarked animals. The R package ‘unmarked’ provides code to analyze these data following Royle, J. A. 2004. N-mixture models for estimating population size from spatially replicated counts. Biometrics 60:108-115.

That would seem reasonable except that our study had 8 samples in each of 2 years at ~50 sites in 6 different habitats. That would suggest there was sufficient spatial and temporal replication to use Royle's N-mixture model. HOWEVER, the N-mixture model assumes that the temporal replicates all occur during a period of population closure (no changes in abundance through births, deaths, immigration, or emigration). Clearly, 2 years of count data are going to violate this assumption, so the N-mixture model would be an inappropriate choice for these data. Even if the 2 years were analyzed separately, the assumption would still be violated because the data were collected biweekly from May through October each year (and eggs hatch from June through September).

Recently, Dail and Madsen (2011; Biometrics) developed a generalized form of the N-mixture model that works for open populations. This model might work for these data, but in my experience the Dail-Madsen model requires a large number of spatial replicates. All of these hierarchical models accounting for detection tend to be quite sensitive to limited spatial replication (more so than temporal), low detection probability (common with terrestrial salamanders, which were the focus of the study), and variation in detection that is not well modeled with covariates. Additionally, the Dail-Madsen model was published only a few months before my submission and hadn't come out when I analyzed the data, and the reviewer did not mention it. Given the lack of time for people to become aware of the model and the lack of rigorous testing of it, it seems unreasonable to require it for publication. To be fair, I believe Marc Kery did have a variation of the N-mixture model that allowed for population change (Kery et al. 2009).
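
For readers unfamiliar with these models, here is a minimal sketch of what the two approaches look like in the unmarked package. The data objects and covariate names (counts, site_covs, obs_covs, habitat, date) are hypothetical, and the code is only meant to illustrate the model structure, not to reproduce any analysis discussed here.

library(unmarked)

# counts: a matrix of repeated counts, one row per site and one column per
# visit (hypothetical). The closed N-mixture model assumes all visits to a
# site occur while the population is closed.
umf.closed <- unmarkedFramePCount(y = counts, siteCovs = site_covs,
                                  obsCovs = obs_covs)
fm.closed <- pcount(~ date ~ habitat, data = umf.closed, K = 100)

# The Dail-Madsen generalization relaxes closure by modeling recruitment
# (gamma) and apparent survival (omega) between primary sampling periods.
umf.open <- unmarkedFramePCO(y = counts, siteCovs = site_covs,
                             obsCovs = obs_covs, numPrimary = 2)
fm.open <- pcountOpen(lambdaformula = ~ habitat, gammaformula = ~ 1,
                      omegaformula = ~ 1, pformula = ~ date,
                      data = umf.open, K = 100)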

So if I can't use the N-mixture model because of extreme violations of model assumptions and the data are insufficient for the Dail-Madsen model, what was I supposed to do with this study? The associate editor rejected the paper without a chance for rebuttal. It was a decent management journal, but certainly not Science, or even Ecology or Conservation Biology. The data had been collected in 1999-2000, before most of these hierarchical detection models had been invented, and they've unfortunately been sitting in a drawer for too long. Had they been published in 2001-2002, no one would have questioned this and the paper would have gotten favorable reviews. The data were collected quite well (I didn't collect them, so it's not bragging) and the results are extremely clear. I'm not saying that detection isn't important to think about, but in this case even highly biased detection wouldn't change the story, just the magnitude of the already very large effect. There has recently been good discussion about the importance of accounting for detection and how well these models actually parse abundance/occupancy and detection, so I won't rehash it too much here. See Brian McGill's posts on Statistical Machismo and the plethora of thoughtful comments here and here.

Based on this one reviewer's hardline comments and the associate editor's decision to reject the paper outright, it seems like they are suggesting that these data reside in a drawer forever (if they can't be used with an N-mixture or Dail-Madsen model). With that mindset, all papers using count data published before ~2002-2004 should be ignored, and most data collected before then should be thrown out to free up server space. That would be a real shame for long-term datasets, of which there are too few in ecology! This idea of "hierarchical detection model or no publication" seems like a hypercritical perspective and review. I'm still working on the reanalysis and revision to send to another journal. We'll see what happens with it in the future, and if it ever gets published I'll post a paper summary on this blog. If I don't use a hierarchical detection model, then I am lumping abundance processes with detection processes, and that should be acknowledged. It adds uncertainty to the inference about abundance, but given the magnitude of the differences among habitats and knowledge of the system, it's hard to imagine it changing the management implications of the study at all.

My point in all of this is that there is no statistical panacea. I think hierarchical models are great, and in fact I spend most of my days running various forms of these models. However, I don't think they solve all problems, and they aren't the right tool for every job. I think most current studies with even a slight chance of detection bias should be designed to account for it, but that doesn't mean that all studies that don't use these models are worthless. These models are WAY more difficult to fit than most people realize and don't always work. Hopefully, as science and statistics move forward in ever more complicated ways, more reviewers will start to realize that there is no perfect model or method. Just ask whether the methods employed are adequate to answer the question and whether the inference from the statistical models accurately reflects the data and model assumptions. Just because a technique is new and sexy doesn't mean that everyone needs to use it in every study.

R script to calculate QIC for Generalized Estimating Equation (GEE) Model Selection

[UPDATE: IMPROVED CODE AND EXTENSIONS ARE NOW AVAILABLE ON https://github.com/djhocking/qicpack INCLUDING AS AN R PACKAGE]

Generalized Estimating Equations (GEE) can be used to analyze longitudinal count data, that is, repeated counts taken from the same subject or site. This is often referred to as repeated-measures data, although longitudinal data typically involve more repeated observations per subject. Longitudinal data arise from studies in virtually all branches of science. In psychology or medicine, repeated measurements are taken on the same patients over time. In sociology, schools or other socially distinct groups are observed over time. In my field, ecology, we frequently record data from the same plants or animals repeatedly over time. Furthermore, the repeated measures don't have to be separated in time: a researcher could take multiple tissue samples from the same subject at a given time. I often repeatedly visit the same field sites (e.g., the same patch of forest) over time. If the data are discrete counts of things (e.g., number of red blood cells, number of acorns, number of frogs), they will generally follow a Poisson distribution.

Longitudinal count data following a Poisson distribution can be analyzed with Generalized Linear Mixed Models (GLMM) or with GEE. I won't get into the computational or philosophical differences between the conditional, subject-specific estimates associated with GLMM and the marginal, population-level estimates obtained by GEE in this post. However, if you decide that GEE is right for you (I have a paper in preparation comparing GLMM and GEE), you may also want to compare multiple GEE models. Unlike GLMM, GEE does not use a full likelihood, but rather relies on a quasi-likelihood function. Therefore, the popular AIC approach to model selection doesn't apply to GEE models. Luckily, Pan (2001) developed an analogous QIC for model comparison. Like AIC, it balances model fit against model complexity to pick the most parsimonious model.
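
For reference, Pan's criterion takes the following form (my notation, following Pan 2001), where the quasi-likelihood is evaluated under the working-independence model at the coefficient estimates obtained with working correlation structure R:

\[ \mathrm{QIC}(R) = -2\,Q\big(\hat{\beta}(R);\, I\big) + 2\,\operatorname{trace}\big(\hat{\Omega}_I\, \hat{V}_R\big) \]

Here the Omega term is the inverse of the model-based (naive) variance estimator from the independence fit, and the V term is the robust (sandwich) variance estimator from the fitted model; the trace term plays the role of the 2k penalty in AIC.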

Unfortunately, there is currently no QIC package in R for GEE models. geepack is a popular R package for GEE analysis, so I wrote the short R script below to calculate Pan's QIC statistic from the output of a GEE model fit with geepack. It employs the Moore-Penrose generalized matrix inverse from the MASS package. I left in my original version using solve(), but it is commented out so it doesn't run. [Edit: April 10, 2012] The input for the QIC function needs to come from the geeglm function (as opposed to geese) within geepack.

I hope you find it useful. I’m still fairly new to R and this is one of my first custom functions, so let me know if you have problems using it or if there are places it can be improved. If you decide to use this for analysis in a publication, please let me know just for my own curiosity (and ego boost!).

######################################################################
# QIC for GEE models
# Daniel J. Hocking
# 07 February 2012
# Refs:
#   Pan (2001)
#   Liang and Zeger (1986)
#   Zeger and Liang (1986)
#   Hardin and Hilbe (2003)
#   Dornmann et al. (2007)
#   http://www.unc.edu/courses/2010spring/ecol/562/001/docs/lectures/lecture14.htm
######################################################################

# Poisson QIC for geeglm{geepack} output
# Ref: Pan (2001)
QIC.pois.geeglm <- function(model.R, model.indep) {
  library(MASS)

  # Fitted and observed values for the quasi-likelihood
  mu.R <- model.R$fitted.values
  # Alternative:
  #   X <- model.matrix(model.R)
  #   names(model.R$coefficients) <- NULL
  #   beta.R <- model.R$coefficients
  #   mu.R <- exp(X %*% beta.R)
  y <- model.R$y

  # Quasi-likelihood for the Poisson family (scale and weights = 1);
  # cf. poisson()$dev.resids
  quasi.R <- sum((y * log(mu.R)) - mu.R)

  # Trace term (penalty for model complexity)
  # Omega-hat(I) via the Moore-Penrose generalized inverse (ginv in MASS)
  AIinverse <- ginv(model.indep$geese$vbeta.naiv)
  # Alternative: AIinverse <- solve(model.indep$geese$vbeta.naiv)
  Vr <- model.R$geese$vbeta
  trace.R <- sum(diag(AIinverse %*% Vr))
  px <- length(coef(model.R))  # number of non-redundant columns in the design matrix

  # QIC and QICu (QICu approximates QIC when the model is correctly specified)
  QIC <- (-2) * quasi.R + 2 * trace.R
  QICu <- (-2) * quasi.R + 2 * px
  output <- c(QIC, QICu, quasi.R, trace.R, px)
  names(output) <- c("QIC", "QICu", "Quasi Lik", "Trace", "px")
  output
}
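
As a hypothetical usage example (the data frame and covariate names are made up), you would fit the candidate model and an otherwise identical working-independence model with geeglm, then pass both to the function:

library(geepack)

# Hypothetical longitudinal count data: 'count' measured repeatedly at each 'site'
fit.exch <- geeglm(count ~ year + habitat, family = poisson, data = dat,
                   id = site, corstr = "exchangeable")
fit.indep <- geeglm(count ~ year + habitat, family = poisson, data = dat,
                    id = site, corstr = "independence")

QIC.pois.geeglm(model.R = fit.exch, model.indep = fit.indep)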

Model Validation: Interpreting Residual Plots

When conducting any statistical analysis, it is important to evaluate how well the model fits the data and whether the data meet the assumptions of the model. There are numerous ways to do this and a variety of statistical tests to evaluate deviations from model assumptions; however, there is little general acceptance of any of them. Generally, statisticians (which I am not, but I do my best impression) examine various diagnostic plots after running their regression models. There are a number of good sources of information on how to do this. My recommendation is Fox and Weisberg's An R Companion to Applied Regression (Chp 6). You can refer to Fox's book, Applied Regression Analysis and Generalized Linear Models, for the theory and details behind these plots, but the companion R book is more of a "how to" guide. A very brief but good introduction to checking linear model assumptions can be found here.

The point of this post isn't to go over the details or theory but rather to discuss one of the challenges that I and others have had with interpreting these diagnostic plots. Without going into the differences between standardized, studentized, Pearson, and other residuals, I will say that most model validation centers on the residuals (essentially the distances of the data points from the fitted regression line). Here is an example from Zuur and colleagues' excellent book, Mixed Effects Models and Extensions in Ecology with R:

So these residuals appear to exhibit homogeneity, normality, and independence. Those are pretty clear, although I'm not sure whether the variation in residuals associated with the predictor (independent) variable Month is a problem; it might indicate heterogeneity. Most books just show a few examples like this and then residuals with clear patterning, most often increasing residual spread with increasing fitted values (i.e., large values of the response/dependent variable result in greater variation, which is often corrected with a log transformation). A good example of this can be seen in panel (d) below in the fitted vs. residuals plots (like the top left plot in the figure above).

These are the types of idealized examples usually shown. I think it's important to show these textbook examples of problems, but I wish I could get expert opinions on more subtle, realistic examples. These figures are often challenging to interpret because the density of points also changes along the x-axis. I don't have a good example of this but will add one when I get one. Instead, I will show some diagnostic plots that I generated as part of a recent attempt to fit a Generalized Linear Mixed Model (GLMM) to problematic count data.

The assumption of normality (upper left) is probably adequately met. However, the plot of fitted vs. residuals (upper right) seems to show more variation at mid-level fitted values than at low or high fitted values. Is this pattern enough to be problematic and suggest a poor model fit? Is it driven by the greater number of points at mid-level fitted values? I'm not sure. The diagonal, dense line of points is generated by the large number of zeros in the dataset; my model does seem to have some trouble fitting the zeros. I have two random effects in my GLMM. The residuals across plots (5 independent sites/subjects on which the data were repeatedly measured; salamanders were counted on the same 5 plots repeatedly over 4 years) don't show any pattern. However, there is heterogeneity in the residuals among years (bottom right). This isn't surprising given that I collected much more data over a greater range of conditions in some years. This is a problem for the model, and this variation will need to be modeled better.

So I refit the model and came up with these plots (different plots for further discussion rather than direct comparison):

Here you can see considerable deviation from normality for the overall model (upper left) but okay normality within plots (lower right). The upper right plot is a decent example of what I was talking about with changes in point density making interpretation difficult. There are far more points at lower fitted values and a sparsity of points at very high fitted values. The eye is often pulled toward the few points on the right, creating difficulty in interpretation. To help with this, I like to add a loess smoother or smoothing spline (solid line) and a horizontal line at zero (broken line). The smoothing line should be approximately straight and horizontal around zero; basically, it should overlay the horizontal zero line. Here's the code to do it in R for a fitted linear mixed model (lme1):
plot(fitted(lme1), residuals(lme1),
     xlab = "Fitted Values", ylab = "Residuals")
abline(h = 0, lty = 2)
lines(smooth.spline(fitted(lme1), residuals(lme1)))

This also helps determine whether the points are symmetrical around zero. I often also find it useful to plot the absolute values of the residuals against the fitted values. This helps visualize whether there is a trend in direction (bias), and it can also make changes in the spread of the residuals (heterogeneity) easier to see. The bias can be detected with a sloping loess or smoothing spline, as in the sketch below. In the lower left plot, you can see little evidence of bias but some evidence of heterogeneity (change in the spread of points). Again, I am not sure whether this is bad enough to invalidate the model, but in combination with the deviation from normality I would reject the fit of this model.
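
Here is a minimal sketch of that absolute-residual plot for the same hypothetical fitted model (lme1), again with a smoothing spline added to highlight any trend:

# Absolute residuals vs. fitted values; a sloping smoothing spline indicates
# that the typical size of the residuals changes with the fitted values
abs.res <- abs(residuals(lme1))
plot(fitted(lme1), abs.res,
     xlab = "Fitted Values", ylab = "|Residuals|")
lines(smooth.spline(fitted(lme1), abs.res))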

In a mixed model, it can be important to look at variation across the levels of the random effects. In my case, here is an example of fitted vs. residuals for each of the plots (random sites/subjects). I used the following code, which takes advantage of the lattice package in R:
# Check for residual pattern within groups and differences between groups
library(lattice)
xyplot(residuals(glmm1) ~ fitted(glmm1) | Count$plot,
       main = "glmm1 - full model by plot",
       panel = function(x, y) {
         panel.xyplot(x, y)
         panel.loess(x, y, span = 0.75)
         panel.lmline(x, y, lty = 2)  # least-squares line (broken line)
       })

And here is another way to visualize a mixed model:

You can see that the variation across the two random effects (Plot and Year) is much better in this model, but there are problems with normality and potentially heterogeneity. Since violations of normality are of less concern than the other assumptions, I wonder whether this model is completely invalid or whether I could make some inference from it. I don't know and would welcome expert opinion.

Regardless, this model was fit using a Poisson GLMM, and the deviance divided by the residual degrees of freedom (df) was 5.13, which is much greater than 1, indicating overdispersion (a quick way to compute this kind of ratio is sketched below). Therefore, I tried to refit the regression using a negative binomial distribution; that code follows the sketch.
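
This is a minimal sketch of the overdispersion check, assuming a fitted model object named glmm1 (hypothetical); depending on the fitting function, the Pearson-residual version may be more appropriate than the deviance version:

# Ratio of residual deviance to residual degrees of freedom; values much
# greater than 1 suggest overdispersion relative to the Poisson
deviance(glmm1) / df.residual(glmm1)

# Alternative using Pearson residuals (often recommended for GLMMs)
sum(residuals(glmm1, type = "pearson")^2) / df.residual(glmm1)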

# Using glmmPQL via the MASS package
library(MASS)

# Recommended to run the model first as a non-mixed negative binomial GLM
# to get a starting value for the theta estimate:
glmNB1 <- glm.nb(count ~ cday + cday2 + cSoilT + cSoilT2 + cRainAmt24 +
                   cRainAmt242 + RHpct + soak24 + windspeed,
                 data = Count, na.action = na.omit)
summary(glmNB1)
# anova(glmNB1)
# plot(glmNB1)

# Now run the full GLMM with the initial theta estimate from the glm.nb fit
glmmPQLnb1 <- glmmPQL(count ~ cday + cday2 + cSoilT + cSoilT2 + cRainAmt24 +
                        cRainAmt242 + RHpct + soak24 + windspeed,
                      random = list(~1 | plot, ~1 | year),
                      data = Count,
                      family = negative.binomial(theta = 1.480, link = log),
                      na.action = na.exclude)

Unfortunately, I got the following validation plots:

Clearly, this model doesn't work for the data. This is quite surprising given the fit of the Poisson model and that the negative binomial is a more general distribution than the Poisson and usually handles overdispersed count data well. I'm not sure what the problem is in this case.

Next I tried to run the model with each observation treated as a level of a random effect (an observation-level random effect):

# Poisson GLMM with an observation-level random effect, (1 | obs), a common
# way to handle overdispersion (in current lme4 this call would use glmer())
glmmObs1 <- lmer(count ~ cday + cday2 + cSoilT + cSoilT2 + cRainAmt24 +
                   cRainAmt242 + RHpct + soak24 + windspeed +
                   (1 | plot) + (1 | year) + (1 | obs),
                 data = Count, family = poisson)

Again I ended up with more problematic validation/diagnostic plots:

So that's about it for now. Hopefully this post helps some people with model validation and the interpretation of fitted vs. residual plots. I would love to hear opinions on interpreting residuals: when is a pattern too much, and when is it acceptable? Let me know if you have examples of other, more subtle residual plots.

Happy coding and may all your analyses run smoothly and provide clear interpretations!
