# R script to calculate QIC for Generalized Estimating Equation (GEE) Model Selection

[UPDATE: IMPROVED CODE AND EXTENSIONS ARE NOW AVAILABLE ON https://github.com/djhocking/qicpack INCLUDING AS AN R PACKAGE]

Generalized Estimating Equations (GEE) can be used to analyze longitudinal count data; that is, repeated counts taken from the same subject or site. This is often referred to as repeated measures data, but longitudinal data often has more repeated observations. Longitudinal data arises from studies in virtually all branches of science. In psychology or medicine, repeated measurements are taken on the same patients over time. In sociology, schools or other social distinct groups are observed over time. In my field, ecology, we frequently record data from the same plants or animals repeated over time. Furthermore, the repeated measures don’t have to be separated in time. A researcher could take multiple tissue samples from the same subject at a given time. I often repeatedly visit the same field sites (e.g. same patch of forest) over time. If the data are discrete counts of things (e.g. number of red blood cells, number of acorns, number of frogs), the data will generally follow a Poisson distribution.

Longitudinal count data, following a Poisson distribution, can be analyzed with Generalized Linear Mixed Models (GLMM) or with GEE. I won’t get into the computational or philosophical differences between conditional, subject-specific estimates associated with GLMM and marginal, population-level estimates obtained by GEE in this post. However, if you decide that GEE is right for you (I have a paper in preparation comparing GLMM and GEE), you may also want to compare multiple GEE models. Unlike GLMM, GEE does not use full likelihood estimates, but rather, relies on a quasi-likelihood function. Therefore, the popular AIC approach to model selection don’t apply to GEE models. Luckily, Pan (2001) developed an equivalent QIC for model comparison. Like AIC, it balances the model fit with model complexity to pick the most parsimonious model.

Unfortunately, there is currently no QIC package in R for GEE models. geepack is a popular R package for GEE analysis. So, I wrote the short R script below to calculate Pan’s QIC statistic from the output of a GEE model run in geepack using the geese function. It currently employs the Moore-Penrose Generalized Matrix Inverse through the MASS package. I left in my original code using the identity matrix but it is preceded by a pound sign so it doesn’t run. [edition: April 10, 2012] The input for the QIC function needs to come from the geeglm function (as opposed to “geese”) within geepack.

I hope you find it useful. I’m still fairly new to R and this is one of my first custom functions, so let me know if you have problems using it or if there are places it can be improved. If you decide to use this for analysis in a publication, please let me know just for my own curiosity (and ego boost!).

###################################################################################### QIC for GEE models# Daniel J. Hocking# 07 February 2012# Refs:
# Pan (2001)
# Liang and Zeger (1986)
# Zeger and Liang (1986)
# Hardin and Hilbe (2003)
# Dornmann et al 2007
# # http://www.unc.edu/courses/2010spring/ecol/562/001/docs/lectures/lecture14.htm###################################################################################### Poisson QIC for geese{geepack} output# Ref: Pan (2001)
QIC.pois.geeglm <-function(model.R, model.indep){
library(MASS)
# Fitted and observed values for quasi likelihood
mu.R <- model.R$fitted.values # alt: X <- model.matrix(model.R) # names(model.R$coefficients) <- NULL
#  beta.R <- model.R$coefficients # mu.R <- exp(X %*% beta.R) y <- model.R$y

# Quasi Likelihood for Poisson
quasi.R <- sum((y*log(mu.R))- mu.R)# poisson()$dev.resids - scale and weights = 1  # Trace Term (penalty for model complexity) AIinverse<- ginv(model.indep$geese$vbeta.naiv)# Omega-hat(I) via Moore-Penrose  generalized inverse of a matrix in MASS package # Alt: AIinverse <- solve(model.indep$geese$vbeta.naiv) # solve via identity Vr<- model.R$geese\$vbeta
trace.R <- sum(diag(AIinverse%*%Vr))
px <- length(mu.R)# number non-redunant columns in design matrix

# QIC
QIC <-(-2)*quasi.R +2*trace.R
QICu<-(-2)*quasi.R +2*px    # Approximation assuming model structured correctly
output <- c(QIC,QICu, quasi.R, trace.R, px)
names(output)<- c('QIC','QICu','Quasi Lik','Trace','px')
output}

I am a USGS Mendenhall Postdoctoral Fellow at the the S.O. Conte Anadromous Fish Research Center. I am interested in the use of statistical models in ecology and population biology. I model the abundance and occupancy of organisms in response to land-use and climate change. All opinions are my own and do not represent those of the government or any other organization.

Posted on March 24, 2012, in Uncategorized and tagged , , , , , , , , , , . Bookmark the permalink. 9 Comments.

1. Liz

Hi, thanks for publishing this. But I’m confused, what is the second argument to the function?
Thanks!
Liz

• Hi Liz,

I should have explained it in the post. Pan’s (2000) QIC formulation compares the GEE fit with an autoregressive or exchangeable or other parametrization with the same GEE model fit assuming an independent correlation structure. The first argument is the parametrization of interest and the second argument is the same model fit with an “independence” correlation structure using geepack. I actually have a better function written now and am in the process of creating an R package for this were the independent model is calculated within the function, so only one argument is required. I will post it on my blog when I get it finished (early this winter likely, now that the field season is winding down).

Thanks for your interest. Let me know if you have other questions.
Dan

2. Vinícius Menarin

Hi Daniel,

If I want to calculate the QIC for the independent correlation structure, how can I do it? Is it possible?
qic.pois.geeglm(ajust_independent,ajust_independent)

I obtained a QIC value using it, but I’m not sure about that.
What I need is a relation of QIC values for some different correlation structures.

Thanks!
Vinícius

• Hi Vinicius,

Thanks for the interest in the QIC code. It’s still a work in progress. I just updated the code so it should be much improved and provided an example of how to use it. You can just run in with a geeglm model that used and “independence” correlation structure to get the QIC. The new post and code can be found at https://danieljhocking.wordpress.com/2012/11/15/gee-qic-update/

It shows how to run the QIC function on multiple models that use different correlation structures or different predictor variables using the sapply function. In the new code you don’t have to run a separate independent model to input into the QIC function, the independent structure model is automatically run within the QIC function. However, you can still run an independent correlation structure model in the QIC function to get a QIC function. It seems circular but in theory it should provide a reliable QIC value that you can compare with other models that differ in correlation structure. The new function also includes more distribution options (normal, binomial, Poisson, or gamma). Negative binomial is unnecessary in GEE models because there is already an inherent overdispersion term (Phi).

Let me know if you have more questions or if this doesn’t work for you.
-Dan
————————————————————————————
Daniel J. Hocking
114 James Hall
Department of Natural Resources & the Environment
University of New Hampshire
Durham, NH 03824

https://danieljhocking.wordpress.com/

“Somewhere, something incredible is waiting to be known” – Carl Sagan
————————————————————————————-

3. Hi Daniel. i want ask about QIC in GEE… I accounted QIC in my data, but result always structure correlation independence are the best for my data, my data is data panel.. why?
thanks you, sorri i cant speak english …

Thanks
Yuni Y. , Indonesian

• Hi Yuni,

Unfortunately, I haven’t worked with GEE models enough to get a feel for when independence correlation structures are selected over exchangeable or autoregressive. I think I remember reading once that QIC shouldn’t be used to select correlation structures, but should only be used for testing different models once the error structure is specified. I’m not sure what that is or if it’s really a best practice. I’ll let you know if I figure it out or if I figure out when the independence structure tends to be selected.

Sorry I don’t have more answers for you. I’ve just been dipping my feet in the GEE waters but still don’t have a good feel for them. I’ve used mixed models and hierarchical Bayesian models more frequently for my research.

Best of luck,
Dan

4. The QIC statistic has been demonstrated in several journal articles to be in fact a poor selector of the appropriate GEE correlation structures. Two new statistics have been designed through simulations to better select the best structure. See Hardin & Hilbe, Generalized Estimating Equations, 2nd ed,, Chapman & Hall/CRC Press(2013) and Shults & Hilbe, Quasi-least Squares Regression, Chapman & Hall/CRC Press (2014)

• Thanks for the information. Does QIC seem appropriate for selection of the predictor variables if the correlation structure is “correct”? I’ll take a look at the two books and papers referenced therein. I’ll have to see if I can find code for these new methods in R. Otherwise maybe I’ll work something up. We’ll see, I haven’t been using GEE much lately. Mostly GLMM and other hierarchical regression models.