A generalized linear mixed model (GLMM) extends the linear mixed model to non-normal response variables such as binary outcomes and counts. Equivalently, it adds random effects to a generalized linear model. GLMMs are widely used in fields such as biology, medicine, the social sciences, and economics. In R, the lme4 package fits GLMMs with the glmer() function.
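In the usual notation (a standard textbook formulation, not taken from this article), a GLMM for observation j in group i links the conditional mean of the response to a linear predictor that contains both fixed effects β and group-level random effects b_i:

```latex
g\bigl(\mathbb{E}[y_{ij} \mid b_i]\bigr) = x_{ij}^\top \beta + z_{ij}^\top b_i,
\qquad b_i \sim \mathcal{N}(0, \Sigma)
```

Here g is the link function (logit for binomial, log for Poisson), x_ij and z_ij are the fixed- and random-effect covariates, and y_ij given b_i follows the chosen distribution. With an identity link and a normal response, this reduces to the linear mixed model.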
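One common mistake when fitting GLMMs is choosing the wrong distribution (family) for the response variable. GLMMs support a variety of distributions, including binomial for binary or proportion data, Poisson for counts, and gamma for positive, right-skewed continuous data. The distribution should match how the response was generated. For example, count data whose variance greatly exceeds the mean (overdispersion) are often better modeled with a negative binomial GLMM, which lme4 fits with glmer.nb().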
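A quick way to check whether the Poisson assumption is reasonable for count data is to compare the Pearson chi-square statistic with the residual degrees of freedom. The sketch below applies this check to simulated data; the variables y, x, and group and all generating values are hypothetical. Because the counts here really are Poisson, the ratio should be close to 1, whereas values well above 1 would suggest trying glmer.nb() instead.

```r
library(lme4)

set.seed(1)
# Hypothetical data: 20 groups with 10 observations each and one predictor x
n_groups <- 20
dat <- data.frame(group = factor(rep(seq_len(n_groups), each = 10)),
                  x = runif(n_groups * 10))
group_eff <- rnorm(n_groups, sd = 0.5)  # true random intercepts
dat$y <- rpois(nrow(dat), lambda = exp(0.5 + 1.2 * dat$x + group_eff[dat$group]))

fit_pois <- glmer(y ~ x + (1 | group), data = dat, family = poisson(link = "log"))

# Pearson dispersion ratio: about 1 for Poisson data; well above 1 signals overdispersion
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)
dispersion
```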
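Another mistake is ignoring the random-effect structure. GLMMs include both fixed effects and random effects. The random effects account for correlation among observations within the same group or cluster, such as repeated measurements on the same subject. Neglecting this structure treats correlated observations as independent, which typically underestimates standard errors and produces overly optimistic p-values.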
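The consequence of dropping the random effects can be seen in a small simulation. All names and generating values are hypothetical. Binary outcomes are generated with a shared intercept per cluster and a cluster-level predictor z. An ordinary glm() that treats all 600 rows as independent reports a smaller standard error for z than a GLMM that models the clustering.

```r
library(lme4)

set.seed(42)
n_clusters <- 30
n_per <- 20
cluster <- factor(rep(seq_len(n_clusters), each = n_per))
z <- rnorm(n_clusters)            # hypothetical cluster-level predictor
u <- rnorm(n_clusters, sd = 1.5)  # true random intercepts
eta <- -0.5 + 0.8 * z[cluster] + u[cluster]
dat <- data.frame(y = rbinom(n_clusters * n_per, 1, plogis(eta)),
                  z = z[cluster], cluster = cluster)

# Ordinary GLM: treats all 600 observations as independent
fit_glm <- glm(y ~ z, data = dat, family = binomial)
# GLMM: one random intercept per cluster
fit_glmm <- glmer(y ~ z + (1 | cluster), data = dat, family = binomial)

se_glm  <- coef(summary(fit_glm))["z", "Std. Error"]
se_glmm <- coef(summary(fit_glmm))["z", "Std. Error"]
c(glm = se_glm, glmm = se_glmm)
```

The GLM's smaller standard error does not reflect extra precision. Most of the gap comes from counting 20 correlated observations per cluster as if each carried independent information about a predictor that only varies between clusters.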
Example 1: Binary Response Variable
We will use the swiss dataset built into R, which contains a standardized fertility measure and socio-economic indicators for 47 French-speaking Swiss provinces around 1888.
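Before fitting, it is worth inspecting the data. This quick check uses base R only. It shows that the column names are capitalized, that Catholic is stored as a percentage rather than a proportion, and that the data contain no grouping column such as region.

```r
data(swiss)                 # ships with base R (datasets package)
str(swiss)                  # 47 rows; Fertility, Education, Catholic, ...
range(swiss$Catholic)       # a percentage, so "more than 50%" means Catholic > 50
"region" %in% names(swiss)  # FALSE: no grouping variable is included
```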
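Suppose we want to investigate the effect of fertility and education on the probability that a province has a high Catholic population. We define "high Catholic" as more than 50% of the population being Catholic. Because the Catholic column stores percentages (0 to 100), the threshold is 50, not 0.5. A GLMM also needs a grouping factor for its random effect, and swiss does not include one, so the region variable used below is hypothetical and must be added to the data before the model can be fit. With such a variable in place, we can fit a GLMM with a binomial distribution and a logit link function: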
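library(lme4)
# Catholic is a percentage (0 to 100); 'region' is a hypothetical grouping factor not present in swiss
swiss$high_catholic <- as.integer(swiss$Catholic > 50)
model1 <- glmer(high_catholic ~ Fertility + Education + (1 | region),
                data = swiss, family = binomial(link = "logit"))
summary(model1)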
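The summary() output reports the fixed-effect estimates on the log-odds scale, together with Wald z-tests that indicate whether fertility and education have a significant effect on the probability of a province having a high Catholic population.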
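Because swiss has no grouping factor, a runnable illustration of reading binomial GLMM output is easier with the cbpp dataset that ships with lme4. This is a switch of dataset, not part of the original example. cbpp records new cases of contagious bovine pleuropneumonia (incidence) out of herd size (size) for 15 herds over 4 periods, so herd supplies the random intercept. The sketch fits the model and exponentiates the log-odds coefficients into odds ratios with Wald intervals.

```r
library(lme4)

# cbind(successes, failures) response: new cases vs. animals without a new case
gm <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
            data = cbpp, family = binomial(link = "logit"))

# Fixed effects are on the log-odds scale; exponentiate to get odds ratios
odds_ratios <- exp(fixef(gm))
# Wald intervals for the fixed effects only (parm = "beta_")
or_ci <- exp(confint(gm, parm = "beta_", method = "Wald"))
print(round(cbind(OR = odds_ratios, or_ci), 3))
```

An odds ratio below 1 for a period means lower odds of a new case than in the reference period (period 1) for a given herd.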
Example 2: Count Response Variable
We will use the sleepstudy dataset from the lme4 package, which contains the average daily reaction times (in milliseconds) of 18 subjects recorded over 10 days (Days 0 to 9) of a sleep deprivation study.
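Suppose we want to investigate the effect of days of sleep deprivation on the number of reaction-time errors, with a random intercept for each subject to account for repeated measurements on the same person. The sleepstudy data record only reaction times (Reaction), not error counts, so the errors variable below is hypothetical. For a count response, we can fit a GLMM with a Poisson distribution and a log link function: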
# 'errors' is a hypothetical count column; sleepstudy records only Reaction, Days and Subject
model2 <- glmer(errors ~ Days + (1 | Subject), data = sleepstudy,
                family = poisson(link = "log"))
summary(model2)
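Because sleepstudy contains no error counts, the Poisson model above cannot be fit on the real data. Here is a minimal runnable sketch that uses purely simulated counts. It copies sleepstudy, adds a hypothetical errors column drawn from a Poisson model with subject-specific intercepts, and fits the same GLMM. The generating values are illustrative only and say nothing about the real study.

```r
library(lme4)

set.seed(123)
ss <- sleepstudy  # 180 rows: Reaction, Days, Subject (18 levels)
# HYPOTHETICAL: simulate error counts, since sleepstudy has none
subj_eff <- rnorm(nlevels(ss$Subject), sd = 0.3)
ss$errors <- rpois(nrow(ss), lambda = exp(1 + 0.1 * ss$Days + subj_eff[ss$Subject]))

model2_sim <- glmer(errors ~ Days + (1 | Subject), data = ss,
                    family = poisson(link = "log"))
summary(model2_sim)
```

Because the counts were generated with a Days coefficient of 0.1, the fitted Days estimate should land near that value, which makes this a convenient check that the model is specified as intended.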
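In the output, the fixed-effect estimate for Days (on the log scale) and its Wald z-test indicate whether days of sleep deprivation have a significant effect on the number of reaction-time errors. Subject is a random effect, so it is not tested like a fixed effect. Instead, the Random effects section reports the variance of the subject-level intercepts, which measures how much baseline error rates differ between subjects.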
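With a log link, the Days coefficient is easiest to read as a rate ratio. For subject i on day j with a random intercept b_i (the structure of the glmer() formula above), the model and the effect of one extra day are:

```latex
\log \mathbb{E}[\text{errors}_{ij} \mid b_i] = \beta_0 + \beta_1\,\text{Days}_{ij} + b_i,
\qquad b_i \sim \mathcal{N}(0, \sigma_b^2)

\frac{\mathbb{E}[\text{errors} \mid \text{Days} = d + 1,\ b_i]}
     {\mathbb{E}[\text{errors} \mid \text{Days} = d,\ b_i]}
= \frac{e^{\beta_0 + \beta_1 (d + 1) + b_i}}{e^{\beta_0 + \beta_1 d + b_i}}
= e^{\beta_1}
```

So e^{β1} is the multiplicative change in the expected error count per additional day for a given subject, and σ_b^2 is the subject-intercept variance shown under Random effects. In R, exp(fixef(model2)) returns these ratios.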