19 GLMM, Binary Outcome: Contraception & Amenorrhea
19.1 Packages
19.1.1 CRAN
library(tidyverse) # all things tidy
library(pander) # nice looking genderal tabulations
library(furniture) # nice table1() descriptives
library(texreg) # Convert Regression Output to LaTeX or HTML Tables
library(psych) # contains some useful functions, like headTail
library(lme4) # Linear, generalized linear, & nonlinear mixed models
library(effects) # Plotting estimated marginal means
library(emmeans)
library(interactions)
library(performance)
library(optimx) # Unify and streamline optimization capabilities in R19.2 Data Prep
Data on Amenorrhea from Clinical Trial of Contracepting Women.
Source:
Table 1 (page 168) of Machin et al. (1988). With permission of Elsevier.
Reference:
Machin D, Farley T, Busca B, Campbell M and d’Arcangues C. (1988). Assessing changes in vaginal bleeding patterns in contracepting women. Contraception, 38, 165-179.
Description:
The data are from a longitudinal clinical trial of contracepting women. In this trial women received an injection of either 100 mg or 150 mg of depot-medroxyprogesterone acetate (DMPA) on the day of randomization and three additional injections at 90-day intervals. There was a final follow-up visit 90 days after the fourth injection, i.e., one year after the first injection. Throughout the study each woman completed a menstrual diary that recorded any vaginal bleeding pattern disturbances. The diary data were used to determine whether a women experienced amenorrhea, the absence of menstrual bleeding for a specified number of days. A total of 1151 women completed the menstrual diaries and the diary data were used to generate a binary sequence for each woman according to whether or not she had experienced amenorrhea in the four successive three month intervals.
In clinical trials of modern hormonal contraceptives, pregnancy is exceedingly rare (and would be regarded as a failure of the contraceptive method), and is not the main outcome of interest in this study. Instead, the outcome of interest is a binary response indicating whether a woman experienced amenorrhea in the four successive three month intervals. A feature of this clinical trial is that there was substantial dropout. More than one third of the women dropped out before the completion of the trial.
Variable List:
Indicators
idparticipant identificationoccasiondenotes the four 90-day periods
Outcome or dependent variable
amenorrheaAmenorrhea Status: 1=Amenorrhea, 0=No Amenorrhea
Main predictor or independent variable of interest
dose0 = Low (100 mg), 1 = High (150 mg)
19.2.1 Import
data_raw <- read.table("https://raw.githubusercontent.com/CEHS-research/data/master/MLM/RCTcontraception.txt", header=TRUE)str(data_raw)'data.frame': 4604 obs. of 4 variables:
$ id : int 1 1 1 1 2 2 2 2 3 3 ...
$ dose : int 0 0 0 0 0 0 0 0 0 0 ...
$ occasion : int 1 2 3 4 1 2 3 4 1 2 ...
$ amenorrhea: chr "0" "." "." "." ...
psych::headTail(data_raw, top = 10) id dose occasion amenorrhea
1 1 0 1 0
2 1 0 2 .
3 1 0 3 .
4 1 0 4 .
5 2 0 1 0
6 2 0 2 .
7 2 0 3 .
8 2 0 4 .
9 3 0 1 0
10 3 0 2 .
... ... ... ... <NA>
4601 1151 1 1 1
4602 1151 1 2 1
4603 1151 1 3 1
4604 1151 1 4 1
19.2.2 Long Format
data_long <- data_raw %>%
dplyr::mutate(id = factor(id)) %>%
dplyr::mutate(dose = factor(dose,
levels = c("0", "1"),
labels = c("Low", "High"))) %>%
dplyr::mutate(time = occasion - 1) %>%
dplyr::mutate(amenorrhea = amenorrhea %>% # outcome needs to be numeric
as.character() %>%
as.numeric()) %>%
dplyr::filter(complete.cases(amenorrhea)) %>% # dump missing occations
dplyr::arrange(id, time)str(data_long)'data.frame': 3616 obs. of 5 variables:
$ id : Factor w/ 1151 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
$ dose : Factor w/ 2 levels "Low","High": 1 1 1 1 1 1 1 1 1 1 ...
$ occasion : int 1 1 1 1 1 1 1 1 1 1 ...
$ amenorrhea: num 0 0 0 0 0 0 0 0 0 0 ...
$ time : num 0 0 0 0 0 0 0 0 0 0 ...
psych::headTail(data_long, bottom = 10) id dose occasion amenorrhea time
1 1 Low 1 0 0
2 2 Low 1 0 0
3 3 Low 1 0 0
4 4 Low 1 0 0
... <NA> <NA> ... ... ...
3607 1149 High 3 1 2
3608 1149 High 4 1 3
3609 1150 High 1 1 0
3610 1150 High 2 1 1
3611 1150 High 3 1 2
3612 1150 High 4 1 3
3613 1151 High 1 1 0
3614 1151 High 2 1 1
3615 1151 High 3 1 2
3616 1151 High 4 1 3
19.2.3 Wide Format
data_wide <- data_long %>%
dplyr::select(-time) %>%
tidyr::pivot_wider(id_cols = c(id, dose),
names_from = occasion,
names_prefix = "occasion_",
values_from = amenorrhea)str(data_wide)tibble [1,151 × 6] (S3: tbl_df/tbl/data.frame)
$ id : Factor w/ 1151 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
$ dose : Factor w/ 2 levels "Low","High": 1 1 1 1 1 1 1 1 1 1 ...
$ occasion_1: num [1:1151] 0 0 0 0 0 0 0 0 0 0 ...
$ occasion_2: num [1:1151] NA NA NA NA NA NA NA NA NA NA ...
$ occasion_3: num [1:1151] NA NA NA NA NA NA NA NA NA NA ...
$ occasion_4: num [1:1151] NA NA NA NA NA NA NA NA NA NA ...
psych::headTail(data_wide, bottom = 10) id dose occasion_1 occasion_2 occasion_3 occasion_4
1 1 Low 0 <NA> <NA> <NA>
2 2 Low 0 <NA> <NA> <NA>
3 3 Low 0 <NA> <NA> <NA>
4 4 Low 0 <NA> <NA> <NA>
5 <NA> <NA> ... ... ... ...
6 1142 High 1 1 1 1
7 1143 High 1 1 1 1
8 1144 High 1 1 1 1
9 1145 High 1 1 1 1
10 1146 High 1 1 1 1
11 1147 High 1 1 1 1
12 1148 High 1 1 1 1
13 1149 High 1 1 1 1
14 1150 High 1 1 1 1
15 1151 High 1 1 1 1
19.3 Exploratory Data Analysis
19.3.1 Summary Statistics
data_summary <- data_long %>%
dplyr::group_by(dose, occasion) %>%
dplyr::summarise(N = n(),
M = mean(amenorrhea),
SD = sd(amenorrhea),
SE = SD/sqrt(N))
data_summary# A tibble: 8 × 6
# Groups: dose [2]
dose occasion N M SD SE
<fct> <int> <int> <dbl> <dbl> <dbl>
1 Low 1 576 0.186 0.389 0.0162
2 Low 2 477 0.262 0.440 0.0202
3 Low 3 409 0.389 0.488 0.0241
4 Low 4 361 0.501 0.501 0.0264
5 High 1 575 0.205 0.404 0.0169
6 High 2 476 0.336 0.473 0.0217
7 High 3 389 0.494 0.501 0.0254
8 High 4 353 0.535 0.499 0.0266
19.3.2 Visualize
data_summary %>%
ggplot(aes(x = occasion,
y = M,
fill = dose)) +
geom_col(position = "dodge") +
theme_bw() +
theme(legend.position = c(0, 1),
legend.justification = c(-0.1, 1.1),
legend.background = element_rect(color = "black"),
legend.key.width = unit(1.5, "cm")) +
labs(x = "90-day windows",
y = "Observed Proportion of Amenorrhea",
fill = "Dosage") +
scale_x_continuous(breaks = 1:4,
labels = c("First",
"Second",
"Third",
"Fourth"))
data_summary %>%
ggplot(aes(x = occasion,
y = M,
color = dose %>% fct_rev())) +
geom_errorbar(aes(ymin = M - SE,
ymax = M + SE),
width = .3,
position = position_dodge(width = .25)) +
geom_point(position = position_dodge(width = .25)) +
geom_line(position = position_dodge(width = .25)) +
theme_bw() +
theme(legend.position = c(0, 1),
legend.justification = c(-0.1, 1.1),
legend.background = element_rect(color = "black"),
legend.key.width = unit(1.5, "cm")) +
labs(x = "90-day windows",
y = "Observed Proportion of Amenorrhea",
color = "Dosage") +
scale_x_continuous(breaks = 1:4,
labels = c("First",
"Second",
"Third",
"Fourth"))
19.4 GLMM - Basic
19.4.1 Fit Models
fit_1 <- lme4::glmer(amenorrhea ~ time*dose + (1 | id),
data = data_long,
family = binomial(link = "logit"))
fit_2 <- lme4::glmer(amenorrhea ~ time + dose + (1 | id),
data = data_long,
family = binomial(link = "logit"))19.4.1.1 Compare via LRT
Should the interaction be included? No.
anova(fit_1, fit_2)Data: data_long
Models:
fit_2: amenorrhea ~ time + dose + (1 | id)
fit_1: amenorrhea ~ time * dose + (1 | id)
npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
fit_2 4 3931 3956 -1961 3923
fit_1 5 3932 3963 -1961 3922 0.69 1 0.41
performance::compare_performance(fit_1, fit_2, rank = TRUE)# Comparison of Model Performance Indices
Name | Model | R2 (cond.) | R2 (marg.) | ICC | RMSE | Sigma | Log_loss | Score_log | Score_spherical | AIC weights | AICc weights | BIC weights | Performance-Score
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
fit_1 | glmerMod | 0.619 | 0.126 | 0.564 | 0.303 | 1.000 | 0.311 | -Inf | 0.004 | 0.342 | 0.342 | 0.023 | -Inf%
fit_2 | glmerMod | 0.619 | 0.126 | 0.563 | 0.303 | 1.000 | 0.311 | -Inf | 0.003 | 0.658 | 0.658 | 0.977 | -Inf%
19.4.2 Model Parameter Tables
19.4.2.1 Logit Scale
texreg::knitreg(list(fit_1, fit_2),
custom.model.names = c("with", "without"),
single.row = TRUE,
caption = "MLM Parameter Estimates: Inclusion of Interaction (SE and p-values)")| with | without | |
|---|---|---|
| (Intercept) | -2.55 (0.17)*** | -2.61 (0.16)*** |
| time | 0.87 (0.07)*** | 0.91 (0.05)*** |
| doseHigh | 0.39 (0.21) | 0.50 (0.16)** |
| time:doseHigh | 0.08 (0.09) | |
| AIC | 3932.14 | 3930.83 |
| BIC | 3963.11 | 3955.61 |
| Log Likelihood | -1961.07 | -1961.42 |
| Num. obs. | 3616 | 3616 |
| Num. groups: id | 1151 | 1151 |
| Var: id (Intercept) | 4.26 | 4.24 |
| ***p < 0.001; **p < 0.01; *p < 0.05 | ||
19.4.2.2 Odds ratio scale
texreg::knitreg(list(extract_glmer_exp(fit_1),
extract_glmer_exp(fit_2)),
custom.model.names = c("with", "without"),
ci.test = 1,
single.row = TRUE,
caption = "MLM Parameter Estimates: Inclusion of Interaction (95% CI's)")| with | without | |
|---|---|---|
| (Intercept) | 0.08 [0.06; 0.11]* | 0.07 [0.05; 0.10]* |
| time | 2.40 [2.08; 2.75]* | 2.49 [2.24; 2.77]* |
| doseHigh | 1.48 [0.98; 2.22] | 1.64 [1.19; 2.27]* |
| time:doseHigh | 1.08 [0.90; 1.30] | |
| * Null hypothesis value outside the confidence interval. | ||
19.4.3 Visualize the Model
19.4.3.1 Sclae = Likert
interactions::interact_plot(model = fit_2,
pred = time,
modx = dose,
interval = TRUE,
outcome.scale = "link",
y.label = "Likert Scale for Ammenorea")
19.4.3.2 Scale = Probability
interactions::interact_plot(model = fit_2,
pred = time,
modx = dose,
interval = TRUE,
outcome.scale = "response",
y.label = "Estimated Marginal Probability of Ammenorea") 
interactions::interact_plot(model = fit_2,
pred = time,
modx = dose,
interval = TRUE,
outcome.scale = "response",
x.label = "Months",
y.label = "Predicted Probability of Amenorrhea",
legend.main = "Dosage:",
colors = c("black", "black")) +
geom_hline(yintercept = 0.5, # reference lines
color = "gray",
size = 1.5,
alpha = .5) +
theme_bw() +
theme(legend.position = c(0, 1),
legend.justification = c(-0.1, 1.1),
legend.background = element_rect(color = "black"),
legend.key.width = unit(1.5, "cm")) +
scale_x_continuous(breaks = 0:3,
labels = c("0",
"3",
"6",
"9"))
effects::Effect(focal.predictors = c("dose", "time"),
xlevels = list(time = seq(from = 0, to = 3, by = .1)),
mod = fit_2) %>%
data.frame %>%
ggplot(aes(x = time,
y = fit)) +
geom_hline(yintercept = c(0, 0.5, 1), # reference lines
color = "gray",
size = 1.5) +
geom_ribbon(aes(ymin = fit - se,
ymax = fit + se,
fill = dose),
alpha = .2) +
geom_line(aes(color = dose),
size = 1.5) +
theme_bw() +
labs(y = "Predicted Probability")
Remove the error bands:
effects::Effect(focal.predictors = c("dose", "time"),
xlevels = list(time = seq(from = 0, to = 3, by = .1)),
mod = fit_2) %>%
data.frame %>%
ggplot(aes(x = time,
y = fit)) +
geom_hline(yintercept = c(0, 0.5),
color = "gray",
size = 1.5) +
geom_line(aes(linetype = dose),
size = 1) +
theme_bw() +
theme(legend.position = c(0, 1),
legend.justification = c(-0.1, 1.1),
legend.background = element_rect(color = "black"),
legend.key.width = unit(1.5, "cm")) +
labs(x = "90-day Window",
y = "Predicted Probability of Amenorrhea",
linetype = "Dosage:") +
scale_x_continuous(breaks = 0:3,
labels = c("First",
"Second",
"Third",
"Fourth"))
19.5 GLMM - Optimizers
From the documentation:
The lme4::glmer() function fits a generalized linear mixed model, which incorporates both fixed-effects parameters and random effects in a linear predictor, via maximum likelihood. The linear predictor is related to the conditional mean of the response through the inverse link function defined in the GLM family.
The expression for the likelihood of a mixed-effects model is an integral over the random effects space. For a linear mixed-effects model (LMM), as fit by lmer, this integral can be evaluated exactly. For a GLMM the integral must be approximated. The most reliable approximation for GLMMs is adaptive Gauss-Hermite quadrature, at present implemented only for models with a single scalar random effect. The nAGQ argument controls the number of nodes in the quadrature formula. A model with a single, scalar random-effects term could reasonably use up to 25 quadrature points per scalar integral.
The lme4::lmerControl() function includes an argument for the optimizer, which is the name of a optimizing function(s). IT is a character vector or list of functions: length 1 for lmer or glmer, possibly length 2 for glmer). The built-in optimizers are Nelder_Mead and bobyqa (from the minqa package). Other minimizing functions are allows (constraints do apply).
Special provisions are made for bobyqa, Nelder_Mead, and optimizers wrapped in the optimx package; to use the optimx optimizers (including L-BFGS-B from base optim and nlminb), pass the method argument to optim in the optCtrl argument (you may also need to load the optimx package manually using library(optimx).
19.5.1 Adaptive Gauss-Hermite Quadrature: Increase the number of quadrature points
nAGQ(integer scalar) the number of points per axis for evaluating the adaptive Gauss-Hermite approximation to the log-likelihood. Defaults to 1, corresponding to the Laplace approximation. Values greater than 1 produce greater accuracy in the evaluation of the log-likelihood at the expense of speed. A value of zero uses a faster but less exact form of parameter estimation for GLMMs by optimizing the random effects and the fixed-effects coefficients in the penalized iteratively reweighted least squares step. (See Details.)
fit_3a <- lme4::glmer(amenorrhea ~ time + I(time^2) + time:dose + I(time^2):dose + (1 | id),
data = data_long,
nAGQ = 50, # increase the number of points
family = binomial)19.5.2 Laplace Approximation: switch to the Nelder_Mead optimizer
fit_3b <- lme4::glmer(amenorrhea ~ time + I(time^2) + time:dose + I(time^2):dose + (1 | id),
data = data_long,
control = glmerControl(optimizer ="Nelder_Mead"),
family = binomial)19.6 Quadratic Time?
Assess need for quadratic time with the LRT
anova(fit_2, fit_3d)Data: data_long
Models:
fit_2: amenorrhea ~ time + dose + (1 | id)
fit_3d: amenorrhea ~ time + I(time^2) + time:dose + I(time^2):dose + (1 | id)
npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
fit_2 4 3931 3956 -1961 3923
fit_3d 6 3925 3962 -1957 3913 9.72 2 0.0078 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
texreg::knitreg(list(fit_3a, fit_3b, fit_3c, fit_3d),
custom.model.names = c("nAGQ", "Nelder_Mead",
"L BFGS B", "nlminb"),
caption = "GLMM: Various methods of ML approximation",
digits = 4)| nAGQ | Nelder_Mead | L BFGS B | nlminb | |
|---|---|---|---|---|
| (Intercept) | -2.4829*** | -2.4604*** | -2.4601*** | -2.4604*** |
| (0.1416) | (0.1397) | (0.1397) | (0.1397) | |
| time | 0.7714*** | 0.7561*** | 0.7558*** | 0.7561*** |
| (0.2026) | (0.1985) | (0.1985) | (0.1985) | |
| time^2 | 0.0346 | 0.0340 | 0.0341 | 0.0340 |
| (0.0667) | (0.0655) | (0.0655) | (0.0655) | |
| time:doseHigh | 0.8920*** | 0.8861*** | 0.8860*** | 0.8861*** |
| (0.2574) | (0.2513) | (0.2513) | (0.2513) | |
| time^2:doseHigh | -0.2599** | -0.2579** | -0.2579** | -0.2579** |
| (0.0895) | (0.0879) | (0.0879) | (0.0879) | |
| AIC | 3879.4906 | 3925.1127 | 3925.1128 | 3925.1127 |
| BIC | 3916.6493 | 3962.2715 | 3962.2715 | 3962.2715 |
| Log Likelihood | -1933.7453 | -1956.5564 | -1956.5564 | -1956.5564 |
| Num. obs. | 3616 | 3616 | 3616 | 3616 |
| Num. groups: id | 1151 | 1151 | 1151 | 1151 |
| Var: id (Intercept) | 5.0794 | 4.3478 | 4.3494 | 4.3479 |
| ***p < 0.001; **p < 0.01; *p < 0.05 | ||||
interactions::interact_plot(model = fit_3d,
pred = time,
modx = dose,
interval = TRUE)
effects::Effect(focal.predictors = c("dose", "time"),
xlevels = list(time = seq(from = 0, to = 3, by = .1)),
mod = fit_3d) %>%
data.frame %>%
ggplot(aes(x = time,
y = fit)) +
geom_hline(yintercept = c(0, 0.5),
color = "gray",
size = 1.5) +
geom_line(aes(linetype = dose),
size = 1) +
theme_bw() +
theme(legend.position = c(0, 1),
legend.justification = c(-0.1, 1.1),
legend.background = element_rect(color = "black"),
legend.key.width = unit(1.5, "cm")) +
labs(x = "90-day Window",
y = "Predicted Probability of Amenorrhea",
linetype = "Dosage:") +
scale_x_continuous(breaks = 0:3,
labels = c("First",
"Second",
"Third",
"Fourth"))
19.6.1 Post hoc compairsons
fit_3d %>%
emmeans::emmeans(pairwise ~ dose, at = c(time = 0))$emmeans
dose emmean SE df asymp.LCL asymp.UCL
Low -2.46 0.14 Inf -2.73 -2.19
High -2.46 0.14 Inf -2.73 -2.19
Results are given on the logit (not the response) scale.
Confidence level used: 0.95
$contrasts
contrast estimate SE df z.ratio p.value
Low - High 0 0 Inf NaN NaN
Results are given on the log odds ratio (not the response) scale.
fit_3d %>%
emmeans::emmeans(pairwise ~ dose, at = c(time = 0), type = "response")$emmeans
dose prob SE df asymp.LCL asymp.UCL
Low 0.0787 0.0101 Inf 0.061 0.101
High 0.0787 0.0101 Inf 0.061 0.101
Confidence level used: 0.95
Intervals are back-transformed from the logit scale
$contrasts
contrast odds.ratio SE df null z.ratio p.value
Low / High 1 0 Inf 1 NaN NaN
Tests are performed on the log odds ratio scale
fit_3d %>%
emmeans::emmeans(pairwise ~ dose, at = c(time = 1))$emmeans
dose emmean SE df asymp.LCL asymp.UCL
Low -1.67 0.141 Inf -1.95 -1.394
High -1.04 0.132 Inf -1.30 -0.783
Results are given on the logit (not the response) scale.
Confidence level used: 0.95
$contrasts
contrast estimate SE df z.ratio p.value
Low - High -0.628 0.169 Inf -3.711 0.0002
Results are given on the log odds ratio (not the response) scale.
fit_3d %>%
emmeans::emmeans(pairwise ~ dose, at = c(time = 1), type = "response")$emmeans
dose prob SE df asymp.LCL asymp.UCL
Low 0.158 0.0188 Inf 0.125 0.199
High 0.261 0.0255 Inf 0.214 0.314
Confidence level used: 0.95
Intervals are back-transformed from the logit scale
$contrasts
contrast odds.ratio SE df null z.ratio p.value
Low / High 0.533 0.0903 Inf 1 -3.711 0.0002
Tests are performed on the log odds ratio scale
fit_3d %>%
emmeans::emmeans(pairwise ~ dose, at = c(time = 2), type = "response")$emmeans
dose prob SE df asymp.LCL asymp.UCL
Low 0.307 0.0304 Inf 0.251 0.37
High 0.482 0.0348 Inf 0.415 0.55
Confidence level used: 0.95
Intervals are back-transformed from the logit scale
$contrasts
contrast odds.ratio SE df null z.ratio p.value
Low / High 0.477 0.0935 Inf 1 -3.776 0.0002
Tests are performed on the log odds ratio scale
fit_3d %>%
emmeans::emmeans(pairwise ~ dose, at = c(time = 3), type = "response")$emmeans
dose prob SE df asymp.LCL asymp.UCL
Low 0.528 0.0418 Inf 0.446 0.609
High 0.611 0.0403 Inf 0.530 0.686
Confidence level used: 0.95
Intervals are back-transformed from the logit scale
$contrasts
contrast odds.ratio SE df null z.ratio p.value
Low / High 0.714 0.166 Inf 1 -1.446 0.1482
Tests are performed on the log odds ratio scale