Blog Archives

Mapping Abundance in Streams in R

I haven’t worked with raster or spatial polygon data in R much before, but I want to create maps to show the results of a spatiotemporal model. Specifically, I want to depict the changes in abundance of fish in a stream network over time and space. I decided to play around with various packages for manipulating spatial data and then using base `plot` and `ggplot2` functions to make the maps. Rather than post the code and output here, it was easier to publish it to RPubs and link to it:

Fish abundance in a stream network

The scale, colors, projection, and background will have to be adjusted based on the purpose of the map, but hopefully the code and explanations will help people get started. This was intended as a learning tool for me, rather than strictly as a tutorial, but I figured I would share in case it proved useful to others.

Here are some links to additional information I found useful along the way:

Add a scale bar

Add a north arrow and scale bar

R GIS Tutorial

R CRS System by NCEAS

ggplot2 mapping in R by Zev Ross (he has tons of great info on his site!!!)

ggplot2 cheat sheet

rasterVis – haven’t tried yet but looks very useful

Raster data in R by NEON


Review of an Information Theoretic Approach to Ecology and Evolution

There are always those papers that you mean to read but that just sit on your desktop or in your To_Read folder forever. Grueber et al. 2011 was one of those for me and I finally got around to reading it.

Grueber, Nakagawa, Laws, and Jamieson. 2011. Multimodel inference in ecology and evolution: challenges and solutions. Journal of Evolutionary Biology. 24:699-711.

The Information Theoretic (IT) approach, and AIC in particular, has become so pervasive in ecology that it feels almost compulsory for studies outside very controlled laboratory experiments. However, it seems like many authors and reviewers either don’t believe in the value of other approaches or don’t understand the limitations of its use, especially with respect to complex statistical modeling.

I constantly struggle both philosophically and practically with how to best approach analyses of field surveys. I want to develop models that both explain observed patterns for increased understanding and have the power to predict unobserved points in time and space (i.e. sites not surveyed and future conditions of monitored and unmonitored sites). I frequently use linear and generalized linear mixed models as well as more complex hierarchical models. These are areas of rapid statistical development, so getting appropriately fitting models with sparse ecological data adds to the practical challenges, regardless of philosophical desires.

The general idea in an IT approach is to balance model fit with model complexity. Generally, a more complex model will describe the data better (high fit, high complexity). However, if you describe the data perfectly, it is unlikely to have good predictive power because some of the model parameters will only apply to the data collected at those locations at those times. Hence the desire to have a simpler model that still describes the data well. Despite the desire for predictive models, few ecologists actually test the predictive power of their models. I won’t say more about that now, but will refer the reader to posts by Brian McGill on the Dynamic Ecology blog for thoughtful discussion of this topic.
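To make the fit versus complexity trade-off concrete, here is a minimal sketch of how AIC and its small-sample correction (AICc) are computed from a model's log-likelihood. The formulas are the standard ones (AIC = 2k − 2 ln L, with AICc adding 2k(k + 1)/(n − k − 1)); the log-likelihoods, parameter counts, and sample size below are made-up numbers for illustration, not results from any real analysis.

```python
def aic(log_lik, k):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_lik


def aicc(log_lik, k, n):
    """AIC with the small-sample correction; requires n > k + 1."""
    if n <= k + 1:
        raise ValueError("AICc is undefined unless n > k + 1")
    return aic(log_lik, k) + (2 * k * (k + 1)) / (n - k - 1)


# Hypothetical example: the complex model fits slightly better (higher
# log-likelihood) but pays a larger penalty for its extra parameters.
n_obs = 40
simple_model = aicc(log_lik=-120.0, k=3, n=n_obs)
complex_model = aicc(log_lik=-118.5, k=8, n=n_obs)

print(f"simple:  AICc = {simple_model:.2f}")
print(f"complex: AICc = {complex_model:.2f}")
print("preferred:", "simple" if simple_model < complex_model else "complex")
```

In this invented case the small gain in fit from five extra parameters does not offset the penalty, so the simpler model has the lower AICc.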

The balancing of fit and complexity sounds great but it is much more difficult in practice (as are most things). When too many models are compared, especially without a priori formulation, it is common to get spurious results. If people are going to try every combination of variables from the most complex global model then model average, I hope the resulting model is validated on independent data to ensure that the model is useful. An extreme alternative to this approach is one often taken by Bayesians. Just develop a sensible biological model and estimate the parameters. Don’t worry about the “best” model but rather about the parameter estimates and their uncertainties.

For those who are interested in an IT approach and want to learn more about the practical uses, Grueber et al. (2011) provide a great resource. I can’t believe I waited this long to read it. Box 1 provides a nice overview of the different Information Criteria (e.g. AIC, BIC, AICc, DIC). Table 2 is really a great overview of the practical issues and tentative solutions. They point out that,

Translating biological hypotheses into statistical models is likely to remain the most difficult aspect of using an IT approach…because of the complexity of biological processes.

I agree but also think this is the most important part of the process. Significant time should be spent on this step and it’s generally helpful to talk through the hypotheses with colleagues (perfect for lab meetings). Model averaging should be avoided when competing models cannot be combined to form a biologically relevant model.

One interesting point the authors make is to, “Always fit [random] slope if possible, otherwise use just the intercept”. I would love to hear what people think about this. In the past, I had avoided fitting many random slopes to avoid model complexity and because I often had trouble thinking how the effect would vary by subject (often survey site). However, more recently, I’ve been including random slopes to differentiate variation in the effect of a parameter from uncertainty (SE) of the effect (fixed effect coefficient). The authors point out that including random slopes reduces the incidence of Type I and II errors and reduces the chance of overconfident estimates.

Another interesting point is whether to do exploratory plots or not. The authors are in favor of it, but note that IT advocates such as Burnham and Anderson (2002: Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach) oppose any data exploration because it results in post hoc creation of statistical models and therefore associated biological processes. I generally do a fair amount of exploratory plotting.

Grueber and colleagues recommend generating a model set from all possible submodels of the global model, assuming that all the submodels are biologically plausible. After this though, they provide a large number of caveats and cautions. This remains an area in need of further research.
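As a rough illustration of how quickly an all-submodels set grows, the sketch below enumerates every subset of the fixed effects in a global model. The predictor names are hypothetical, and interactions and random effects are ignored, so treat it as a counting exercise rather than a model selection routine.

```python
from itertools import combinations


def all_submodels(predictors):
    """Return every subset of predictors, from the intercept-only
    model (empty tuple) up to the global model."""
    subsets = []
    for size in range(len(predictors) + 1):
        subsets.extend(combinations(predictors, size))
    return subsets


# Hypothetical global model with four fixed effects.
global_model = ["temperature", "flow", "canopy", "elevation"]
model_set = all_submodels(global_model)

print(len(model_set), "candidate models")
for terms in model_set[:5]:
    print("y ~", " + ".join(terms) if terms else "1")
```

With k predictors there are 2^k candidate models, which is one reason each submodel needs to be checked for biological plausibility before the set is fitted.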

I am curious about how they would generate all submodels with inclusion of random slopes whenever possible. I have generally followed Zuur and colleagues’ recommendations: put in all fixed effects parameters (the most complex, over-parametrized global model), select random effects via AIC while holding the fixed effects constant, and then reduce the complexity of the fixed effects. This method limits the fixed effects that can be removed to those without random slopes. It can also be a problem if the global model has convergence problems. I’d love to hear how you proceed with model selection in mixed and other hierarchical models. Let me know in the comments.

Some take home points:

  • Use a 10:1 subject-to-predictor ratio in multiple regression
  • Generally avoid retaining a focal parameter of interest in all models, especially when interested in model averaging.
  • Recommend model averaging but not the full set of models. Their tentative solution to which to average is to exclude models from the set that are more complex versions of those with lower AICc, but with caution.
  • The zero method should be used for model averaging when the aim of the study is to determine which factors have the strongest effect on the response variable.
  • Recommend standardizing input variables with a mean of zero and a standard deviation of 0.5 (traditionally 1) to allow the standardization of binary and categorical dummy variables
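To show what the zero method in the list above means in practice, here is a minimal sketch that converts AICc values to Akaike weights and then averages a coefficient two ways. With the zero method, a coefficient counts as 0 in models that omit the term; with the alternative (often called the natural or conditional average), only models containing the term are averaged. The three models and their AICc values and coefficients are invented for illustration.

```python
import math


def akaike_weights(aicc_values):
    """Relative support for each model, scaled to sum to 1."""
    best = min(aicc_values)
    rel = [math.exp(-0.5 * (a - best)) for a in aicc_values]
    total = sum(rel)
    return [r / total for r in rel]


def average_zero(models, term):
    """Zero method: a term absent from a model contributes 0."""
    weights = akaike_weights([m["aicc"] for m in models])
    return sum(w * m["coef"].get(term, 0.0) for w, m in zip(weights, models))


def average_natural(models, term):
    """Natural average: only models that contain the term are used."""
    weights = akaike_weights([m["aicc"] for m in models])
    pairs = [(w, m["coef"][term]) for w, m in zip(weights, models)
             if term in m["coef"]]
    if not pairs:
        raise ValueError(f"no model contains {term!r}")
    total = sum(w for w, _ in pairs)
    return sum(w * c for w, c in pairs) / total


# Hypothetical candidate set (standardized coefficients).
models = [
    {"aicc": 100.0, "coef": {"temp": 0.8, "flow": -0.3}},
    {"aicc": 101.2, "coef": {"temp": 0.7}},
    {"aicc": 103.0, "coef": {"flow": -0.4}},
]

for term in ("temp", "flow"):
    print(term,
          "zero:", round(average_zero(models, term), 3),
          "natural:", round(average_natural(models, term), 3))
```

The zero method shrinks the estimate toward zero for terms that appear only in weakly supported models, which is why it suits comparisons of which factors have the strongest effect.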

Overall this paper provides a great overview and good recommendations for using an Information Theoretic approach in ecology. Hopefully it also indicates that AIC isn’t perfect and doesn’t invalidate other approaches to scientific understanding in ecology. For those who use a lot of mixed models, Zuur et al. (2009) provide valuable guidance as well. Although we all want specific rules to follow, model development and selection remains nearly as much an art as a science. This paper would make great lab group reading and I hope it stimulates a healthy discussion in ecology and evolution circles.

Research Summary: Journal Impact Metrics

I recently published my first sole-author paper, which was also my first open access publication (plus first preprint). It was a fun side project unrelated to my primary research, comparing the influence of ecology journals using a variety of metrics (like the journal impact factor). The paper was published in Ideas in Ecology and Evolution, a journal I’m really excited about. It’s a great outlet for creative ideas in the field of EcoEvo, plus they have a section on the Future of Publishing, which has explored some exceptionally innovative ideas regarding scientific publishing and peer review.


Hocking, D. J. 2013. Comparing the influence of ecology journals using citation-based indices: making sense of a multitude of metrics. Ideas in Ecology and Evolution, 6(1), 55–65. doi:10.4033/iee.v6i1.4949


Most researchers are at least moderately familiar with the Journal Impact Factor (JIF), the first and most prevalent citation-based metric of journal influence. The JIF represents the average number of citations in a given year to articles in a journal published in the previous 2 years. Despite its prevalence, the JIF has a number of serious problems, such as drawing inference from the mean of a highly skewed distribution (a small minority of articles receive the vast majority of citations in any journal). Other criticisms of the JIF include an insufficient time period and bias among journals because not all articles are included in the denominator of the average, only “substantial” articles, but citations to all articles are included in the numerator. Numerous metrics have been proposed to improve upon the JIF. I compared 11 citation-based metrics for 110 ecology journals.
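A small sketch can illustrate two of these criticisms: the skew problem and the numerator and denominator mismatch. The citation counts are invented, and the function is only a simplified reading of the JIF definition given above, not the procedure used by the indexing service.

```python
import statistics


def journal_impact_factor(citations_to_all_items, n_citable_items):
    """Citations in the census year to everything the journal published
    in the previous two years, divided by the number of 'substantial'
    (citable) articles from those two years."""
    return citations_to_all_items / n_citable_items


# Hypothetical journal: ten citable articles, one of them a big hit.
citable_article_citations = [0, 0, 1, 1, 1, 2, 2, 3, 4, 46]
# Hypothetical citations to editorials and similar items, which count in
# the numerator although those items are not in the denominator.
other_item_citations = 5

jif = journal_impact_factor(
    sum(citable_article_citations) + other_item_citations,
    len(citable_article_citations),
)
mean_per_article = statistics.fmean(citable_article_citations)
median_per_article = statistics.median(citable_article_citations)

print("JIF-style score:", jif)
print("mean citations per citable article:", mean_per_article)
print("median citations per citable article:", median_per_article)
```

In this made-up case a single highly cited article pulls the mean far above the median, so the score says little about what a typical article in the journal receives.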

Journal Impact Factor (JIF), 5-year Journal Impact Factor (JIF5), SCImago Journal Rank (SJR), Source-Normalized Impact per Paper (SNIP), Eigenfactor, Article Influence (AI), H-index, contemporary h-index (Hc-index), g-index, e-index, AR-index

The relationship among metrics can be visualized via a plot of principal components analysis. On the left side of the plot are the metrics that are averaged per article, whereas the metrics that group on the right side of the graph are metrics that tend to be higher for journals with higher rates of publication (not explicitly on a per article basis). What is also evident from the PCA plot is that no single metric can encompass all of the multidimensional complexity of scholarly influence among journals. Different metrics can be used to understand different aspects of influence, impact, and prestige.
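Two of the metrics in the comparison, the h-index and the g-index, are easy to compute from a list of per-article citation counts, which also shows why they are not per-article averages. The sketch below uses the usual definitions (h is the largest number such that h articles have at least h citations each; g is the largest number such that the top g articles together have at least g² citations) and an invented list of counts.

```python
def h_index(citations):
    """Largest h such that h articles each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g articles together have at least
    g squared citations (g capped at the number of articles)."""
    ranked = sorted(citations, reverse=True)
    running_total = 0
    g = 0
    for rank, count in enumerate(ranked, start=1):
        running_total += count
        if running_total >= rank * rank:
            g = rank
    return g


# Hypothetical citation counts for six articles.
counts = [10, 8, 5, 4, 3, 0]
print("h-index:", h_index(counts))
print("g-index:", g_index(counts))
```

Because both indices can only grow as more cited articles are added, a journal that publishes more articles has more opportunity to reach a high value.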

In addition to whether a metric is on a per-article basis, metrics split philosophically on whether they use network theory or just direct citations. The Eigenfactor, AI, and SJR use variations of the Google PageRank algorithm. This basically means that citations from highly cited journals are worth more than citations from less influential journals.
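The idea that a citation from an influential journal is worth more can be sketched with a basic PageRank power iteration on a toy citation network. This is the textbook algorithm, not the exact Eigenfactor, AI, or SJR calculation (those add their own adjustments), and the journal names and citation links are hypothetical.

```python
def pagerank(cites, damping=0.85, iterations=100):
    """Basic PageRank. cites[j] lists the journals that journal j cites;
    every cited journal must also appear as a key."""
    journals = sorted(cites)
    n = len(journals)
    rank = {j: 1.0 / n for j in journals}
    for _ in range(iterations):
        new_rank = {j: (1.0 - damping) / n for j in journals}
        for j in journals:
            targets = cites[j]
            if targets:
                share = damping * rank[j] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A journal that cites nothing spreads its rank evenly.
                share = damping * rank[j] / n
                for t in journals:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical network: X and Y each receive exactly one citation.
# X is cited by "Hub" (itself cited by three journals); Y is cited by
# "Minor" (cited by nobody).
cites = {
    "Hub": ["X"],
    "Minor": ["Y"],
    "P": ["Hub"],
    "Q": ["Hub"],
    "R": ["Hub"],
    "X": [],
    "Y": [],
}

scores = pagerank(cites)
for journal in sorted(scores, key=scores.get, reverse=True):
    print(f"{journal:6s} {scores[journal]:.3f}")
```

A direct citation count would rank X and Y equally, whereas the network score places X above Y because its single citation comes from a well-cited journal.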

Overall, I would recommend using Article Influence (AI; available via Web of Science) or alternatively the SCImago Journal Rank (SJR; available via Scopus) in place of the JIF when average article influence is of interest. The Eigenfactor is the best metric of the total influence of a journal on science. The Source-Normalized Impact per Paper (SNIP) can be especially useful when comparing journals across disparate fields of research. It corrects for differences in publishing and citation practices among fields of study.

Since review articles tend to get more citations than original research articles on average, journals that publish reviews tend to have higher scores across all metrics. Therefore, it’s not surprising that the top ranked ecology journals across most metrics are Annual Review of Ecology, Evolution, and Systematics, Trends in Ecology and Evolution (TREE), and Ecology Letters. A list of some of the journals and metrics is below, but you can find much more information in the original article.


Paper Summary: Natural Disturbance and Logging Effects on Salamanders

Paper Summary:

Hocking, D.J., K.J. Babbitt, and M. Yamasaki. 2013. Comparison of Silvicultural and Natural Disturbance Effects on Terrestrial Salamanders in Northern Hardwood Forests. Biological Conservation 167:194-202. doi:

Unfortunately, this paper is behind a paywall. Please email me if you would like a copy for educational purposes.

We were interested in how red-backed salamanders respond to various logging practices compared with natural disturbance. Specifically, we compared abundance of salamanders in the two years following a major ice storm with clearcuts, patch cuts, group cuts, single-tree selection harvests, and undisturbed forest patches in the White Mountains of New Hampshire (Northern Appalachian Mountains). The 100-year ice storm caused ~65% canopy loss in the affected areas. We know that clearcutting has detrimental effects on populations of woodland salamanders, but the impacts of less intense harvesting and natural disturbances are less well understood.

We used transects of coverboards from 80 m inside each forest patch extending to 80 m outside each patch into the surrounding, undisturbed forest. Repeated counts of salamanders under these coverboards allowed us to employ a Dail-Madsen open population model to estimate abundance in each treatment, while accounting for imperfect detection. The results were quite clear as demonstrated in this figure:

Abundance Plot by Treatment

There were slightly fewer salamanders in the ice-storm damaged sites compared with undisturbed reference sites. The single-tree selection sites were most similar to the ice-storm damage sites. The group cut, patch cut, and clearcut didn’t differ from each other and all had ~88% fewer salamanders compared with reference sites.

In addition to comparing natural and anthropogenic disturbances, we were interested in examining how salamanders respond along the edge of even-aged harvests. Wind, solar exposure, and similar factors are altered in the forest edge adjacent to harvested areas. This can result in salamander abundance being reduced in forest edges around clearcuts. Previous researchers have used nonparametric estimates of edge effects. A limitation of these methods is that effects cannot be projected well across the landscape. These methods are also unable to account for imperfect detection. We developed a method to model edge effects as a logistic function while accounting for imperfect detection. As with the treatment effects, the results are quite clear with very few salamanders in the center of the even-aged harvests, a gradual increase in abundance near the forest edge, increasingly more salamanders in the forest moving away from the edge, and eventually leveling off at carrying capacity. In this case, red-backed salamander abundance reached 95% of carrying capacity 34 m into the surrounding forest. As the model is parametric, predictions can be projected across landscapes. The equation can be used in GIS and land managers can predict the total number of salamanders that would be lost from a landscape given a variety of alternative timber harvest plans.
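For readers who want a feel for the shape of a logistic edge effect, here is a minimal sketch. It assumes a generic three-parameter logistic curve (carrying capacity, midpoint distance, and scale); the exact parameterization and the detection component in the paper may differ, and the parameter values below are purely illustrative, not the published estimates.

```python
import math


def expected_abundance(distance, capacity, midpoint, scale):
    """Logistic edge-effect curve. distance is metres from the harvest
    edge (negative inside the harvest, positive into the forest)."""
    return capacity / (1.0 + math.exp(-(distance - midpoint) / scale))


def distance_at_fraction(fraction, midpoint, scale):
    """Distance at which abundance reaches a given fraction of
    carrying capacity (0 < fraction < 1)."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return midpoint + scale * math.log(fraction / (1.0 - fraction))


# Hypothetical parameter values, for illustration only.
capacity = 12.0   # salamanders per plot at carrying capacity
midpoint = 0.0    # distance (m) where abundance is half of capacity
scale = 10.0      # larger values give a more gradual edge effect

d95 = distance_at_fraction(0.95, midpoint, scale)
print(f"95% of carrying capacity reached at {d95:.1f} m")
for d in (-80, -40, 0, 40, 80):
    print(d, round(expected_abundance(d, capacity, midpoint, scale), 2))
```

Because the curve is a closed-form function of distance, it can be evaluated for every cell of a distance-to-edge raster, which is what makes landscape-level projection possible.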

Hopefully other researchers find this method useful and apply it for a variety of taxa. It could also be incorporated into ArcGIS or QGIS toolboxes/plugins as a tool for land managers. You can read our paper for more details if you’re interested. In addition to methodological details there is more information on environmental factors that affect detection and abundance of salamanders in this landscape.

Edge Effects

In Praise of Exploratory Statistics

If you haven’t seen it, you should definitely check out the latest post by Brian McGill at Dynamic Ecology. It’s a great post on the use of exploratory statistics. While it may be impossible to get an NSF grant with an analysis framed in terms of exploratory statistics, reviewers should definitely be more open to their appropriate use. It is wrong to force authors to frame exploratory analyses in a hypothetico-deductive framework after the fact. Brian suggests that there are three forms of scientific analysis:

  1. Hypothesis testing
  2. Prediction
  3. Exploratory

and only one of these can be used for a single data set. My favorite quote is,

“If exploratory statistics weren’t treated like the crazy uncle nobody wants to talk about and everybody is embarrassed to admit being related to, science would be much better off.”

There’s lots of good stuff in his post, so I recommend spending 10 minutes of your day to read through it.

