I recently published my first sole-author paper, which was also my first open access publication (plus first preprint). It was a fun side project unrelated to my primary research, comparing the influence of ecology journals using a variety of metrics (like the journal impact factor). The paper was published in Ideas in Ecology and Evolution, a journal I’m really excited about. It’s a great outlet for creative ideas in the field of EcoEvo, plus they have a section on the Future of Publishing, which has explored some exceptionally innovative ideas regarding scientific publishing and peer review.
Hocking, D. J. 2013. Comparing the influence of ecology journals using citation-based indices: making sense of a multitude of metrics. Ideas in Ecology and Evolution, 6(1), 55–65. doi:10.4033/iee.v6i1.4949
Most researchers are at least moderately familiar with the Journal Impact Factor (JIF), the first and most prevalent citation-based metric of journal influence. The JIF represents the average number of citations in a given year to articles in a journal published in the previous 2 years. Despite its prevalence, the JIF has a number of serious problems, such as drawing inference from the mean of a highly skewed distribution (a small minority of articles receive the vast majority of citations in any journal). Other criticisms of the JIF include an insufficient time period and bias among journals because only “substantial” articles are counted in the denominator of the average, whereas citations to all articles are counted in the numerator. Numerous metrics have been proposed to improve upon the JIF. I compared 11 citation-based metrics for 110 ecology journals.
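To make the two main criticisms concrete, here is a minimal sketch of a JIF-style calculation. The citation counts are made up for illustration (they are not from the paper), and `impact_factor` is a hypothetical helper, not anyone's official formula implementation.

```python
from statistics import mean, median

def impact_factor(citations_to_all_items, citable_items):
    """JIF-style ratio: citations to everything the journal published in
    the window, divided by the number of 'substantial' (citable) items."""
    return sum(citations_to_all_items) / citable_items

# Hypothetical journal: 10 research articles plus 2 editorials.
article_citations = [0, 0, 1, 1, 1, 2, 2, 3, 5, 45]
editorial_citations = [2, 3]

# Editorial citations count in the numerator, but editorials are not
# counted in the denominator.
jif = impact_factor(article_citations + editorial_citations,
                    citable_items=len(article_citations))

print(jif)                        # 6.5
print(mean(article_citations))    # mean citations per research article
print(median(article_citations))  # 1.5
```

In this toy case a single article with 45 citations pulls the average far above what a typical article receives (a median of 1.5), and the editorials nudge the ratio up further without adding to the denominator.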
The relationship among metrics can be visualized via a plot of principal components analysis. On the left side of the plot are the metrics that are averaged per article, whereas the metrics that group on the right side of the graph are metrics that tend to be higher for journals with higher rates of publication (not explicitly on a per article basis). What is also evident from the PCA plot is that no single metric can encompass all of the multidimensional complexity of scholarly influence among journals. Different metrics can be used to understand different aspects of influence, impact, and prestige.
In addition to whether a metric is on a per-article basis, metrics split philosophically on whether they use network theory or just direct citations. The Eigenfactor, AI, and SJR use variations of the Google PageRank algorithm. This basically means that citations from highly cited journals are worth more than citations from less influential journals.
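As a rough illustration of the network idea, the sketch below runs a plain PageRank power iteration on a made-up citation matrix for four hypothetical journals. It is not the actual Eigenfactor, AI, or SJR algorithm (those add their own refinements, such as how self-citations and article counts are handled). It only shows why the source of a citation matters.

```python
def pagerank_scores(cites, damping=0.85, tol=1e-10, max_iter=1000):
    """Plain PageRank by power iteration.

    cites[i][j] is the number of citations from journal i to journal j.
    """
    n = len(cites)
    out_totals = [sum(row) for row in cites]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1.0 - damping) / n] * n
        for i in range(n):
            for j in range(n):
                if out_totals[i] == 0:
                    # A journal that cites nothing spreads its weight evenly.
                    share = 1.0 / n
                else:
                    share = cites[i][j] / out_totals[i]
                new[j] += damping * scores[i] * share
        if sum(abs(a - b) for a, b in zip(new, scores)) < tol:
            return new
        scores = new
    return scores

# Hypothetical journals A, B, C, D (rows cite columns).
# B and C each receive 10 direct citations, but B's come from the
# heavily cited journal A and C's come from the uncited journal D.
cites = [
    [0, 10, 0, 0],   # A cites B
    [10, 0, 0, 0],   # B cites A
    [10, 0, 0, 0],   # C cites A
    [0, 0, 10, 0],   # D cites C
]
score_a, score_b, score_c, score_d = pagerank_scores(cites)
print(score_b > score_c)  # True
```

With direct citation counts, B and C would tie. Under the network-based score, B ranks well above C because its citations come from a more influential source.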
Overall, I would recommend using Article Influence (AI; available via Web of Science) or alternatively the SCImago Journal Rank (SJR; available via Scopus) in place of the JIF when average article influence is of interest. The Eigenfactor is the best metric of the total influence of a journal on science. The Source-Normalized Impact per Paper (SNIP) can be especially useful when comparing journals across disparate fields of research. It corrects for differences in publishing and citation practices among fields of study.
Since review articles tend to get more citations than original research articles on average, journals that publish reviews tend to have higher scores across all metrics. Therefore, it’s not surprising that the top ranked ecology journals across most metrics are Annual Review of Ecology, Evolution, and Systematics, Trends in Ecology and Evolution (TREE), and Ecology Letters. A list of some of the journals and metrics is below, but you can find much more information in the original article.
Everyone in academia knows how painful the peer-review publication process can be. It’s a lot like democracy, in that it’s the worst system ever invented, except for all the others. The peer-review process does a fair job at promoting good science overall, but it’s far from perfect. Sure, anyone can point out a hundred flaws in the system, but I’m just going to focus on one aspect that has been bothering me particularly and has the potential to change: complicated statistical demands.
I have found that reviewers frequently require the data to be reanalyzed with a particularly popular or pet method. In my opinion, reviewers need to ask whether the statistical techniques are appropriate to answer the questions of interest. Do the data meet the assumptions necessary for the model? If there are violations, are they likely to lead to biased inference? Let’s be honest, no model is perfect and there are always potential violations of assumptions. Data are samples from reality and a statistical model creates a representation of this reality from the data. The old adage, “All models are wrong, but some are useful,” is important for reviewers to remember. The questions are: does the model answer the question of interest (not the question the reviewer wished was asked), is the question interesting, and were the data collected in an appropriate manner to fit with the model?
Last year I had a manuscript rejected primarily because I did not use a hierarchical model to account for detection probability when analyzing count data. In my opinion, the reviewer was far too staunch in requiring a specific type of analysis. The worst part is that it seemed like the reviewer didn’t have extensive experience with the method. The reviewer actually wrote,
They present estimates based on raw counts, rather than corrected totals. Given that they cite one manuscript, McKenny et al., that estimated detection probability and abundance for the same taxa evaluated in the current manuscript, I was puzzled by this decision. The literature base on this issue is large and growing rapidly, and I cannot recommend publication of a paper that uses naïve count data from a descriptive study to support management implications. (emphasis mine)
I’m not alone in having publications rejected on this account. I personally know of numerous manuscripts shot down for this very reason. After such a hardline statement, this reviewer goes on to say,
The sampling design had the necessary temporal and spatial replication of sample plots to use estimators for unmarked animals. The R package ‘unmarked’ provides code to analyze these data following Royle, J. A. 2004. N-mixture models for estimating population size from spatially replicated counts. Biometrics 60:108-115.
That would seem reasonable except that our study had 8 samples in each of 2 years at ~50 sites in 6 different habitats. That would suggest there was sufficient spatial and temporal replication to use Royle’s N-mixture model. HOWEVER, the N-mixture model has the assumption that the temporal replication all occur with population closure (no changes in abundance through births, deaths, immigration, and emigration). Clearly, 2 years of count data are going to violate this assumption. The N-mixture model would be an inappropriate choice for this data. Even if the 2 years were analyzed separately it would violate this assumption because the data were collected biweekly from May – October each year (and eggs hatch in June – September).
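For readers who haven't seen it written out, here is a minimal sketch of the basic N-mixture likelihood as I understand it from Royle (2004): abundance at a site is Poisson with mean `lam`, and each repeated count is a binomial draw from that same abundance with detection probability `p`. The function names, the truncation point `n_max`, and the example counts are my own choices for illustration. This is not the code in 'unmarked'.

```python
import math

def poisson_pmf(n, lam):
    """P(N = n) for a Poisson with mean lam (computed on the log scale)."""
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))

def binom_pmf(y, n, p):
    """P(Y = y) for a binomial with n trials and success probability p."""
    return math.comb(n, y) * p**y * (1 - p)**(n - y)

def site_likelihood(counts, lam, p, n_max=200):
    """Marginal likelihood of the repeated counts at one site.

    The closure assumption lives here: every count in `counts` is drawn
    from the SAME latent abundance n, which is summed out up to n_max.
    """
    total = 0.0
    for n in range(max(counts), n_max + 1):
        detect = 1.0
        for y in counts:
            detect *= binom_pmf(y, n, p)
        total += poisson_pmf(n, lam) * detect
    return total

def neg_log_lik(sites, lam, p, n_max=200):
    """Negative log-likelihood over sites (to be minimized over lam, p)."""
    return -sum(math.log(site_likelihood(c, lam, p, n_max)) for c in sites)

# Hypothetical repeated counts at three sites.
example_sites = [[2, 0, 1], [4, 3, 5], [0, 0, 1]]
print(neg_log_lik(example_sites, lam=6.0, p=0.4))
```

The single shared `n` inside `site_likelihood` is exactly what breaks when animals are born, die, or move between visits. Also note that with only one visit per site the likelihood reduces to a Poisson with mean `lam * p`, so abundance and detection cannot be separated without repeated counts of a closed population.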
Recently, Dail and Madsen (2011; Biometrics) developed a generalized form of the N-mixture model that works for open populations. This model might work for this data, but in my experience the Dail-Madsen model requires a huge number of spatial replicates. All of these hierarchical models accounting for detection tend to be quite sensitive to spatial replication (more than temporal), low detection probability (common with terrestrial salamanders, which were the focus of the study), and variation in detection not well modeled with covariates. Additionally, the Dail-Madsen model was only published a few months before my submission and hadn’t come out when I analyzed the data, plus the reviewer did not mention it. Given the lack of time for people to become aware of the model and lack of rigorous testing of the model, it would seem insane to require it be used for publication. To be fair, I believe Marc Kery did have a variation of the N-mixture model that allowed for population change (Kery et al. 2009).
So if I can’t use the N-mixture model because of extreme violations of model assumptions and the data are insufficient for the Dail-Madsen model, what was I supposed to do with this study? The associate editor rejected the paper without chance for rebuttal. It was a decent management journal, but certainly not Science or even Ecology or Conservation Biology. The data had been collected in 1999-2000 before most of these hierarchical detection models had been invented. They’ve unfortunately been sitting in a drawer for too long. Had they been published in 2001-2002, no one would have questioned this and it would have gotten favorable reviews. The data were collected quite well (I didn’t collect them, so it’s not bragging) and the results are extremely clear. I’m not saying that detection isn’t important to think about, but in this case even highly biased detection wouldn’t change the story, just the magnitude of the already very large effect. There has recently been good discussion over the importance of accounting for detection and how well these models actually parse abundance/occupancy and detection, so I won’t rehash it too much here. See Brian McGill’s posts on Statistical Machismo and the plethora of thoughtful comments here and here.
Based on this one reviewer’s hardline comments and the associate editor’s decision to reject it outright, it seems like they are suggesting that these data reside in a drawer forever (if they can’t be used with an N-mixture or Dail-Madsen model). With that mindset, all papers using count data published before ~2002-2004 should be ignored and most data collected before then should be thrown out to create more server space. This would be a real shame for long-term datasets, of which there are too few in ecology! This idea of hierarchical detection model or no publication seems like a hypercritical perspective and review. I’m still working on reanalysis and revision to send to another journal. We’ll see what happens with it in the future and if it ever gets published I’ll post a paper summary on this blog. If I don’t use a hierarchical detection model, then I am lumping abundance processes with detection processes and that should be acknowledged. It adds uncertainty to the inference about abundance, but given the magnitude of the differences among habitats and knowledge of the system, it’s hard to imagine it changing the management implications of the study at all.
My point in all of this is there is no statistical panacea. I think hierarchical models are great and in fact I spend most of my days running various forms of these models. However, I don’t think they solve all problems and they aren’t the right tool for every job. I think most current studies where there is a slight chance of detection bias should be designed to account for it, but that doesn’t mean that all studies are worthless if they don’t use these models. These models are WAY more difficult to fit than most people realize and don’t always work. Hopefully, as science and statistics move forward in ever more complicated ways, more reviewers start to realize that there is no perfect model or method. Reviewers should just ask whether the methods employed are adequate to answer the question and whether the inference from the statistical models accurately reflects the data and model assumptions. Just because a technique is new and sexy doesn’t mean that everyone needs to use it in every study.