Blog Archives

Year in Review: 2014

This year has flown by. I can’t believe it will be 2015 before the week is over. Overall, 2014 was a good year. I published 2 papers, have 2 in press, and 2 more in review. The two published papers have already received 3 citations each, with only 1 of the 6 being a self-citation. That bodes well for the future, hopefully.

Milanovich, J.R., D.J. Hocking, W.E. Peterman, and J.A. Crawford. In Press. Effective use of trails for assessing terrestrial salamander abundance and detection: A case study at Great Smoky Mountains National Park. Natural Areas Journal.

Anderson, T. L., D. J. Hocking, C. A. Conner, J. E. Earl, E. B. Harper, M. S. Osbourn, W. E. Peterman, T. A. G. Rittenhouse, and R. D. Semlitsch. In Press. Abundance and phenology patterns of two pond-breeding salamanders determine species interactions in natural populations. Oecologia. DOI: 10.1007/s00442-014-3151-z

Hocking, D. J. and K. J. Babbitt. 2014. Amphibian Contributions to Ecosystem Services. Herpetological Conservation and Biology. 9(1): 1-17. (Open Access)

Hocking, D. J. and K. J. Babbitt. 2014. The role of red-backed salamanders on ecosystem functions. PLoS ONE 9(1): e86854. DOI:10.1371/journal.pone.0086854 (Open Access).

I also received a USGS Mendenhall Postdoctoral Fellowship. This has been great on many levels. I enjoy the projects I’m working on and really like all the people I work with. In general, 2014 has been a year of building and networking. I have met well over 100 new people related to my work (state, federal, environmental NGO, and academic). I’m collaborating with 4 new PIs and more than a dozen people I hadn’t met before this year. I spent over 4 weeks traveling to meet with collaborators and attend meetings, and I participated in more webinars than in the rest of my life combined. The travel was generally quite nice though: a week in Fort Collins at the USGS Powell Center working on integrated (joint) population models to combine occupancy, count, and capture-recapture data, then a week in Glacier National Park discussing stream temperature modeling and fish abundance and distributions. I even had time to take an epic bike ride up the Going-to-the-Sun Road.

I also got to hang out with bright young climate scientists at the Northeast Climate Science Center retreat in the Missouri Ozarks. It was very cool to hear about all the different projects. The neatest one, or at least the one I had never thought about before, was relating climate warming to airport takeoffs. Warmer air is less dense, so a plane needs to be lighter or have a longer runway to take off. This could be incredibly important for many airports, especially in developed areas like the northeast US where runways can’t easily be extended (pesky skyscrapers and oceans get in the way).

To round out my travel, I presented at the American Fisheries Society conference in Quebec City (great city) and at the Northeast Fish and Wildlife Society meeting in Portland, ME. I gave two talks at the AFS meeting, one on the effect of repeated sampling (1 vs 3 pass) on brook trout abundance and detection estimates and the other on combining information from occupancy, abundance, and capture-recapture (CJS) analyses for understanding climate and land-use effects on brook trout.

In addition to networking, it was a learning and building year in other ways too. I developed a daily stream temperature model for the northeastern US that used nested random effects and spatially-explicit autoregressive residual effects. Working with millions of temperature records across thousands of sites and predicting to 1.8 billion points required learning a lot about handling big data. I also had to learn how to do this in a way that would work within a web-based decision support tool that our research group is developing. It certainly slowed the progress of manuscripts but I am very excited about the potential utility for state and federal agencies and anyone else interested in headwater stream management. The tool will allow managers and scientists to not only visualize all the data from the region, but also the model predictions for sites and years without data. There will be slider bars that allow managers to do on-the-fly and formal scenario testing (e.g. change forest cover, impervious surface, etc. and see what happens to temperatures and fish populations). It will also be used in a Structured Decision Making process with managers that I’m involved with. The web tool is undergoing internal testing now but I will definitely share it on this blog when ready for use.
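For the curious, the model structure described above can be sketched roughly as follows. This is a hypothetical Python toy, not the production model: the real model is far larger and includes spatially-explicit structure in the residuals, and every name and coefficient value below is made up for illustration. It shows the two pieces named in the text that are easy to demonstrate: nested random intercepts (site within catchment) and first-order autoregressive daily residuals.

```python
import random

random.seed(42)

def simulate_daily_temps(n_days, air_temp, catchment_eff, site_eff,
                         rho=0.8, beta0=10.0, beta1=0.4, sigma=0.5):
    """Toy daily stream temperature simulator (illustrative only).

    Structure mirrors the model described in the text:
      - fixed effect of daily air temperature (beta1),
      - nested random intercepts (site_eff within catchment_eff),
      - AR(1) residuals: today's error carries over a fraction (rho)
        of yesterday's error, plus fresh Gaussian noise.
    The spatial residual structure of the real model is omitted here.
    """
    temps, resid = [], 0.0
    for d in range(n_days):
        # AR(1) residual: correlated day-to-day, not independent noise
        resid = rho * resid + random.gauss(0.0, sigma)
        mu = beta0 + beta1 * air_temp[d] + catchment_eff + site_eff
        temps.append(mu + resid)
    return temps

# Toy air-temperature series for one site over 60 days
air = [15 + 5 * (d % 30) / 30 for d in range(60)]
temps = simulate_daily_temps(60, air, catchment_eff=0.7, site_eff=-0.3)
print(len(temps))  # 60
```

Fitting such a model to millions of records (rather than simulating from it) is what requires the big-data machinery mentioned above; the simulation just makes the hierarchical structure concrete.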

On a personal level, things have been challenging at times but life is good. I spend the work week away from my family, which is hard (but productive) and results in a huge amount of driving. I also retired from competitive running after 22 years. My body just couldn’t handle training at an elite level anymore. After 3 stress fractures, a collapsed arch, arthritic hip impingement, damaged Achilles tendons, and 2 knee/hamstring surgeries, I decided enough was enough. I couldn’t give up competing in athletic events though, so I took up cycling, which, apart from falling, is much easier on the body. I did 2 road races and moved from Cat5 to Cat4, and ~12 cyclocross races and moved from Cat5 to Cat3. Cycling has been a fun new challenge, but I don’t feel the need to take it as seriously as I did running. It’s also a great way to see new areas!

Finally, although I haven’t blogged that much this year, it has been a good year for the blog. The WordPress auto-summary says,

The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 28,000 times in 2014. If it were a concert at Sydney Opera House, it would take about 10 sold-out performances for that many people to see it.

The busiest day of the year was December 3rd with 264 views. The most popular post that day was Lags and Moving Means in dplyr.

That’s almost twice as many views as last year. Hopefully some of them are actual people and not bots. My most popular posts were similar to last year’s; it’s nice to see that people are still finding them and finding them useful.

You can find the full blog report here.

Wishing Everyone the Best in 2015

Happy New Year!

No Statistical Panacea, Hierarchical or Otherwise

Everyone in academia knows how painful the peer-review publication process can be. It’s a lot like democracy, in that it’s the worst system ever invented, except for all the others. The peer-review process does a fair job at promoting good science overall, but it’s far from perfect. Sure, anyone can point out a hundred flaws in the system, but I’m just going to focus on one aspect that has been particularly bothering me and has the potential to change: complicated statistical demands.

I have found that reviewers frequently require the data to be reanalyzed with a particularly popular or pet method. In my opinion, reviewers need to ask whether the statistical techniques are appropriate for answering the questions of interest. Do the data meet the assumptions necessary for the model? If there are violations, are they likely to lead to biased inference? Let’s be honest, no model is perfect and there are always potential violations of assumptions. Data are samples from reality, and a statistical model creates a representation of that reality from the data. The old adage, “All models are wrong, but some are useful,” is important for reviewers to remember. The questions are: does the model answer the question of interest (not the question the reviewer wishes had been asked)? Is the question interesting? And were the data collected in a manner appropriate for the model?

Last year I had a manuscript rejected primarily because I did not use a hierarchical model to account for detection probability when analyzing count data. In my opinion, the reviewer was overly rigid in requiring a specific type of analysis. The worst part is that it seemed the reviewer didn’t have extensive experience with the method. The reviewer actually wrote,

They present estimates based on raw counts, rather than corrected totals. Given that they cite one manuscript, McKenny et al., that estimated detection probability and abundance for the same taxa evaluated in the current manuscript, I was puzzled by this decision. The literature base on this issue is large and growing rapidly, and I cannot recommend publication of a paper that uses naïve count data from a descriptive study to support management implications. (emphasis mine)

I’m not alone in having publications rejected on this account. I personally know of numerous manuscripts shot down for this very reason. After such a hardline statement, this reviewer goes on to say,

The sampling design had the necessary temporal and spatial replication of sample plots to use estimators for unmarked animals. The R package ‘unmarked’ provides code to analyze these data following Royle, J. A. 2004. N-mixture models for estimating population size from spatially replicated counts. Biometrics 60:108-115.

That would seem reasonable, except that our study had 8 samples in each of 2 years at ~50 sites in 6 different habitats. That suggests there was sufficient spatial and temporal replication to use Royle’s N-mixture model. HOWEVER, the N-mixture model assumes that all temporal replicates occur under population closure (no changes in abundance through births, deaths, immigration, or emigration). Clearly, 2 years of count data are going to violate this assumption, so the N-mixture model would be an inappropriate choice for these data. Even analyzing the 2 years separately would violate the assumption, because the data were collected biweekly from May – October each year (and eggs hatch in June – September).
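To make the closure assumption concrete, here is a minimal sketch of the Royle (2004) N-mixture likelihood for a single site, written in Python rather than R (the function name and all parameter values are mine, for illustration; this is not code from the study or from the unmarked package). The key point: every repeated count at a site is modeled as a binomial draw from one latent abundance N that is assumed constant across all visits.

```python
import math

def nmixture_site_likelihood(counts, lam, p, K=100):
    """N-mixture likelihood for one site, following Royle (2004).

    counts: repeated counts at the site, all assumed to come from a
            single latent abundance N (the closure assumption).
    lam:    Poisson mean of latent abundance.
    p:      per-individual detection probability.
    K:      upper bound for the sum over possible values of N.
    """
    lik = 0.0
    for N in range(max(counts), K + 1):
        # Poisson prior on the single, closed abundance N
        prior = math.exp(-lam) * lam**N / math.factorial(N)
        # Each visit's count is Binomial(N, p), conditional on that same N
        cond = 1.0
        for y in counts:
            cond *= math.comb(N, y) * p**y * (1 - p) ** (N - y)
        lik += prior * cond
    return lik

# With perfect detection (p = 1) every count must equal N exactly, so
# unequal counts get likelihood zero -- the model has no way to let
# abundance change between visits.
print(nmixture_site_likelihood([3, 2, 4], lam=5.0, p=1.0))  # 0.0
print(nmixture_site_likelihood([3, 3, 3], lam=5.0, p=0.5))  # small positive value
```

When abundance genuinely changes between visits (births, deaths, movement), the single shared N is the wrong generative story, which is exactly the violation described above for multi-year, season-spanning counts.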

Recently, Dail and Madsen (2011; Biometrics) developed a generalized form of the N-mixture model that works for open populations. This model might work for these data, but in my experience the Dail-Madsen model requires a huge number of spatial replicates. All of these hierarchical models accounting for detection tend to be quite sensitive to spatial replication (more than temporal), low detection probability (common with terrestrial salamanders, which were the focus of the study), and variation in detection not well modeled with covariates. Additionally, the Dail-Madsen model was published only a few months before my submission, hadn’t come out when I analyzed the data, and wasn’t even mentioned by the reviewer. Given the lack of time for people to become aware of the model and the lack of rigorous testing of it, it would seem insane to require its use for publication. To be fair, I believe Marc Kery did have a variation of the N-mixture model that allowed for population change (Kery et al. 2009).

So if I can’t use the N-mixture model because of extreme violations of model assumptions, and the data are insufficient for the Dail-Madsen model, what was I supposed to do with this study? The associate editor rejected the paper without chance for rebuttal. It was a decent management journal, but certainly not Science or even Ecology or Conservation Biology. The data had been collected in 1999-2000, before most of these hierarchical detection models had been invented. They’ve unfortunately been sitting in a drawer for too long. Had they been published in 2001-2002, no one would have questioned this and the paper would have gotten favorable reviews. The data were collected quite well (I didn’t collect them, so it’s not bragging) and the results are extremely clear. I’m not saying detection isn’t important to think about, but in this case even highly biased detection wouldn’t change the story, just the magnitude of the already very large effect. There has recently been good discussion over the importance of accounting for detection and how well these models actually parse abundance/occupancy and detection, so I won’t rehash it too much here. See Brian McGill’s posts on Statistical Machismo and the plethora of thoughtful comments here and here.

Based on this one reviewer’s hardline comments and the associate editor’s decision to reject the paper outright, they seem to be suggesting that these data reside in a drawer forever (if they can’t be used with an N-mixture or Dail-Madsen model). With that mindset, all papers using count data published before ~2002-2004 should be ignored, and most data collected before then should be thrown out to create more server space. This would be a real shame for long-term datasets, of which there are too few in ecology! This “hierarchical detection model or no publication” idea seems like a hypercritical perspective and review. I’m still working on the reanalysis and revision to send to another journal. We’ll see what happens with it in the future, and if it ever gets published I’ll post a paper summary on this blog. If I don’t use a hierarchical detection model, then I am lumping abundance processes with detection processes, and that should be acknowledged. It adds uncertainty to the inference about abundance, but given the magnitude of the differences among habitats and knowledge of the system, it’s hard to imagine it changing the management implications of the study at all.

My point in all of this is that there is no statistical panacea. I think hierarchical models are great, and in fact I spend most of my days running various forms of these models. However, I don’t think they solve all problems, and they aren’t the right tool for every job. I think most current studies where there is even a slight chance of detection bias should be designed to account for it, but that doesn’t mean that all studies are worthless if they don’t use these models. These models are WAY more difficult to fit than most people realize and don’t always work. Hopefully, as science and statistics move forward in ever more complicated ways, more reviewers will start to realize that there is no perfect model or method. Reviewers should simply ask whether the methods employed are adequate to answer the question and whether the inference from the statistical models accurately reflects the data and model assumptions. Just because a technique is new and sexy doesn’t mean that everyone needs to use it in every study.
