Conference Presentations

I recently gave a talk at the Ecological Society of America (ESA) annual meeting in Portland, OR and presented a poster at the World Congress of Herpetology in Vancouver, BC, Canada. Both presentations compared generalized linear mixed models (GLMM) and generalized estimating equations (GEE) for analyzing repeated count data. I advocate using GEE over the more common GLMM to analyze longitudinal count (or binomial) data when the specific subjects (sites as random effects) are not of special interest. The overall confidence intervals are much smaller in the GEE models, and the coefficient estimates are averaged over all subjects (sites). This means each coefficient is interpreted as the change in log(Y) for each one-unit change in X on average (averaged across subjects). Below you can see my two presentations for more details.
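As a rough illustration of the population-averaged interpretation (a sketch with made-up numbers, not the models from the presentations): in a Poisson-type model with a log link and a normally distributed random site intercept, averaging the site-specific means over sites shifts the intercept by half the random-effect variance but leaves the slope unchanged. The coefficient values and function names below are hypothetical. For other links, such as the logit, the slopes differ as well.

```python
import math

# Hypothetical values chosen only for illustration (not from the talk or poster).
B0, B1, SIGMA = 1.0, 0.3, 0.5  # intercept, slope, SD of random site intercepts

def conditional_mean(x, u):
    """Site-specific expected count for a site whose random intercept is u."""
    return math.exp(B0 + u + B1 * x)

def marginal_mean_numeric(x, n=4001):
    """Average the site-specific mean over u ~ Normal(0, SIGMA^2) (trapezoid rule)."""
    lo, hi = -10.0 * SIGMA, 10.0 * SIGMA
    step = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        u = lo + i * step
        density = math.exp(-0.5 * (u / SIGMA) ** 2) / (SIGMA * math.sqrt(2.0 * math.pi))
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * conditional_mean(x, u) * density
    return total * step

def marginal_mean_closed_form(x):
    """Closed form of the same average: the intercept shifts by SIGMA^2 / 2."""
    return math.exp(B0 + 0.5 * SIGMA ** 2 + B1 * x)

# Multiplicative change in the site-averaged count per one-unit increase in x.
rate_ratio = marginal_mean_numeric(3.0) / marginal_mean_numeric(2.0)
```

Here `rate_ratio` matches `exp(B1)`, so on the log scale a one-unit change in X moves the averaged count by `B1`, while the averaged intercept is larger than the intercept of a "typical" site with `u = 0`.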




About Daniel Hocking

I am a USGS Mendenhall Postdoctoral Fellow at the S.O. Conte Anadromous Fish Research Center. I am interested in the use of statistical models in ecology and population biology. I model the abundance and occupancy of organisms in response to land-use and climate change. All opinions are my own and do not represent those of the government or any other organization.

Posted on August 16, 2012, in GEE, GLMM, Modeling. 2 Comments.

  1. Nice! I have a couple of questions and possible suggestions:

    You might center your temperatures (subtract overall mean airT, then compute airT^2 from that centered value) to make the first 3 parameters more easily interpretable. Your intercepts are extrapolations to airT=0, which are even more unreliable for a quadratic fit in T than for a linear fit. With centered airT, the intercept is (log) count at mean airT and much more directly comparable between the GLMM and GEE estimates. That also affects your SEs of the parameter estimates (which are highly correlated in your non-centered fits). You might think about centering values of RH, too.
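A minimal sketch of why centering helps, using invented temperatures rather than the study data: a raw predictor and its square are almost perfectly correlated, while the centered predictor and its square are not. The `pearson` helper and the `air_t` values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical air temperatures in degrees C (not the study data).
air_t = [4.0, 8.0, 11.0, 14.0, 17.0, 20.0, 23.0]

# Correlation between the raw predictor and its square.
raw_r = pearson(air_t, [t ** 2 for t in air_t])

# Center first, then square the centered values.
mean_t = sum(air_t) / len(air_t)
centered = [t - mean_t for t in air_t]
centered_r = pearson(centered, [c ** 2 for c in centered])

print(f"raw airT vs airT^2:      r = {raw_r:.2f}")
print(f"centered airT vs square: r = {centered_r:.2f}")
```

With the centered version, the intercept refers to the mean temperature rather than to airT = 0.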

    To what extent are the pairs {airT + airT^2} and {sin(DOY) + cos(DOY)} defining the same parameter space (in arm-waving terms, something like “co-planar” instead of colinear)? I’m not sure how to answer that, but if you think of a plot of airT v. DOY, {sin(DOY) + cos(DOY)} can fit seasonal airT pretty well, and +/- {airT + airT^2} is likely to fit DOY pretty well (2 DOY values for a given airT + airT^2). Are you treating these pairs of predictors as sequential: after the effects of airT + airT^2, is there an additional component predicted by an annual sinusoidal function of DOY (e.g., day length)? Or, are you thinking in terms of your sin(DOY) + cos(DOY) function fitting a unimodal component for something like “non-dormant abundance” and airT + airT^2 fitting the decrease in activity in mid-summer?

    If 100 / 455 site-nights had 0, yet mean ~ 10, do you have zero-inflation as well as overdispersion, or do your predictors fit those 0s well (e.g., very low RH and large droughtdays predict the nights you got skunked)? Also what is the +/-0.6 in observed 10.0 +/-0.6 per site-night? Whether Var or SD, that would be gross underdispersion.
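The arithmetic behind this question can be sketched as follows. The counts come from the comment; everything else is an assumption. The zero calculation uses a single Poisson mean with no covariates, so it says nothing about how well a fitted model predicts the zeros. The reading of +/-0.6 as a standard error of the mean is purely hypothetical, since the presentations do not say what it is.

```python
import math

n_site_nights = 455   # from the comment
n_zero_nights = 100   # from the comment
mean_count = 10.0     # approximate mean per site-night, from the comment

# Under one Poisson distribution with this mean, P(Y = 0) = exp(-mean).
expected_zero_nights = n_site_nights * math.exp(-mean_count)

# ASSUMPTION: treat the reported +/-0.6 as a standard error of the mean.
assumed_se = 0.6
implied_sd = assumed_se * math.sqrt(n_site_nights)

# Variance-to-mean ratio; a Poisson distribution has a ratio of 1.
implied_dispersion = implied_sd ** 2 / mean_count

print(f"zero nights expected under Poisson(10): {expected_zero_nights:.3f}")
print(f"zero nights observed: {n_zero_nights}")
print(f"variance/mean if 0.6 is an SE: {implied_dispersion:.1f}")
```

Under that reading the value would imply overdispersion rather than underdispersion.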

    Do you think that 5 random sites is sufficient for GEEs? I’m a bit leery about estimating random effects in GLMM from n=5 in my work, and you note that GEEs can have trouble with small N.

    • Hi Tom,

      Thanks for the comments and suggestions. When there aren’t problems with convergence, I often just use variables without centering or standardizing. However, in this case, since the model is rather complex, with low replication of sites and coefficients that are not independently interpretable, I think centering or standardizing would make sense. Even then, the models might be more stable, but most of the coefficients wouldn’t be interpretable because the harmonic functions are out of phase. I hadn’t thought about it improving the reliability of the coefficients and SEs. I didn’t mention it in the presentations, but air temperature was divided by 10 (before calculating airT^2) for model stability. It was back-transformed for all of the plots.

      I was somewhat surprised that so many of these parameters were selected in the best model using AIC (for the GLMM). I think the harmonic function of day of the year tracks temperature roughly, but the airT and airT^2 terms appear to explain additional variance and improve the model. There can be relatively hot and cold days in any season in New England. I probably could have left out the squared term and just had an interaction between airT and DOY. Generally, I think of it sequentially: after the effects of airT, there are additional components predicted by day of the year. You can see it in the plot of just DOY while holding all other variables constant (good levels of rainfall and temperature). Even under optimal conditions, the captures in the summer go way down. Even three days of very cool summer temperatures and lots of rain fail to bring out salamanders. Those same conditions in the spring or fall lead to huge numbers of salamanders.
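One way to make a sin(DOY) + cos(DOY) pair easier to read is to convert the two coefficients into an amplitude and a peak day. This is a sketch under assumptions: it assumes the terms were coded as sin(2π·DOY/365) and cos(2π·DOY/365), which the presentations do not confirm, and the coefficient values and function names are hypothetical.

```python
import math

DAYS_PER_YEAR = 365.0  # ASSUMPTION: one annual cycle of 365 days

def harmonic_summary(b_sin, b_cos):
    """Rewrite b_sin*sin(w*d) + b_cos*cos(w*d) as amplitude*cos(w*d - phase)."""
    amplitude = math.hypot(b_sin, b_cos)
    phase = math.atan2(b_sin, b_cos)  # angle (radians) at which the curve peaks
    peak_day = (phase / (2.0 * math.pi) * DAYS_PER_YEAR) % DAYS_PER_YEAR
    return amplitude, peak_day

def harmonic_value(b_sin, b_cos, day):
    """Evaluate the seasonal term on the linear-predictor (log) scale."""
    angle = 2.0 * math.pi * day / DAYS_PER_YEAR
    return b_sin * math.sin(angle) + b_cos * math.cos(angle)

# Hypothetical coefficients, not estimates from the salamander models.
amplitude, peak_day = harmonic_summary(0.4, 0.9)
trough_day = (peak_day + DAYS_PER_YEAR / 2.0) % DAYS_PER_YEAR

print(f"amplitude on the log scale: {amplitude:.2f}")
print(f"seasonal term peaks near day {peak_day:.0f}, lowest near day {trough_day:.0f}")
```

A single sine and cosine pair can only describe one peak and one trough per year, so the individual coefficients matter less than the amplitude and timing they jointly imply.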

      I think my models predict the zeros relatively well, better than they predict the large values. There still might be slight zero inflation but there is definitely overdispersion. The fact that GEE are a bit better when the data don’t fit the assumed distribution well is one reason why I like them.

      I don’t think 5 random sites is ideal since GEE estimates are asymptotically accurate. I would much rather have 10+ sites, but this was part of another project and 5 sites was the most I could get to in a given night. Since the sites were quite similar and the variance wasn’t very high across sites, I think the models are okay. I think it really just limits the extent of inference. The models apply to beech-dominated forest stands in southeast New Hampshire but are unlikely to be more general than that. I think that is the limitation whether GEE or GLMM models are used, especially if GLMM estimates are incorrectly interpreted as population-averaged (which most ecologists do).

      I would definitely like to use a more robust data set to demonstrate the pros and cons of GLMM and GEE, but I also think my salamander activity data have value. Although the absolute numbers predicted by the models are limited in inference, based on my experience, I think the models show the patterns of activity of red-backed salamanders outside mountainous regions throughout much of their range. I’m still working out the best way to present the data for publication.

      Thanks for all the thoughts,
