12 Extensions of Logistic Regression - Ex: Canadian Women’s Labour-Force Participation
library(tidyverse)
library(haven) # read in SPSS dataset
library(furniture) # nice table1() descriptives
library(stargazer) # display nice tables: summary & regression
library(texreg) # Convert Regression Output to LaTeX or HTML Tables
library(psych) # contains some useful functions, like headTail
library(car) # Companion to Applied Regression
library(sjPlot) # Quick plots and tables for models
library(car) # Companion to Applied Regression (a text book - includes datasets)
library(MASS) # Support Functions and Datasets
library(nnet) # Multinomial Log-Linear Models
12.1 Background
The Womenlf
data frame has 263 rows and 4 columns. The data are from a 1977 survey of the Canadian population.
Dependent variable (DV) or outcome
partic
Labour-Force Participation, a factor with levels:fulltime
Working full-time
not.work
Not working outside the home
parttime
Working part-time
Indepdentend variables (IV) or predictors
hincome
Husband’s income, in $1000’schildren
Presence of children in the household, a factor with levels:absent
no children in the home
present
at least one child at home
region
A factor with levels:Atlantic
Atlantic CanadaBC
British ColumbiaOntario
Prairie
Prairie provincesQuebec
12.1.1 Raw Dataset
The data is included in the carData
package which installs and loads with the car
package.
data(Womenlf, package = "carData") # load the internal data
::glimpse(Womenlf) # glimpse a bit of the data tibble
Rows: 263
Columns: 4
$ partic <fct> not.work, not.work, not.work, not.work, not.work, not.work, n~
$ hincome <int> 15, 13, 45, 23, 19, 7, 15, 7, 15, 23, 23, 13, 9, 9, 45, 15, 5~
$ children <fct> present, present, present, present, present, present, present~
$ region <fct> Ontario, Ontario, Ontario, Ontario, Ontario, Ontario, Ontario~
%>%
Womenlf :: filter(row_number() %in% sample(1:nrow(.), size = 10)) # select a random sample of 10 rows dplyr
# A tibble: 10 x 4
partic hincome children region
<fct> <int> <fct> <fct>
1 not.work 15 present Ontario
2 parttime 4 present Ontario
3 not.work 17 present Atlantic
4 parttime 13 present Prairie
5 fulltime 14 absent Ontario
6 fulltime 1 absent Prairie
7 not.work 7 present Ontario
8 not.work 19 present Quebec
9 not.work 11 present Quebec
10 not.work 11 absent Quebec
Notice the order of the factor levels, especially for the partic
factor
str(Womenlf) # view the structure of the data
'data.frame': 263 obs. of 4 variables:
$ partic : Factor w/ 3 levels "fulltime","not.work",..: 2 2 2 2 2 2 2 1 2 2 ...
$ hincome : int 15 13 45 23 19 7 15 7 15 23 ...
$ children: Factor w/ 2 levels "absent","present": 2 2 2 2 2 2 2 2 2 2 ...
$ region : Factor w/ 5 levels "Atlantic","BC",..: 3 3 3 3 3 3 3 3 3 3 ...
We can view the order of the factors levels
$partic %>% levels() # view the levels (in order) of the variable Womenlf
[1] "fulltime" "not.work" "parttime"
12.1.2 Declare Factors
<- Womenlf %>%
Womenlf_clean ::mutate(working_ord = partic %>%
dplyr::fct_recode("Full Time" = "fulltime",
forcats"Not at All" = "not.work",
"Part Time" = "parttime") %>%
factor(levels = c("Not at All",
"Part Time",
"Full Time"))) %>%
::mutate(working_any = dplyr::case_when(partic %in% c("fulltime", "parttime") ~ "At Least Part Time",
dplyr== "not.work" ~ "Not at All") %>%
partic factor(levels = c("Not at All",
"At Least Part Time"))) %>%
::mutate(working_full = dplyr::case_when(partic == "fulltime" ~ "Full Time",
dplyr%in% c("not.work", "parttime") ~ "Less Than Full Time")%>%
partic factor(levels = c("Less Than Full Time",
"Full Time"))) %>%
::mutate(working_type = dplyr::case_when(partic == "fulltime" ~ "Full Time",
dplyr== "parttime" ~ "Part Time")%>%
partic factor(levels = c("Part Time",
"Full Time")))
Display the structure of the ‘clean’ version of the dataset
str(Womenlf_clean) # view the structure of the data
'data.frame': 263 obs. of 8 variables:
$ partic : Factor w/ 3 levels "fulltime","not.work",..: 2 2 2 2 2 2 2 1 2 2 ...
$ hincome : int 15 13 45 23 19 7 15 7 15 23 ...
$ children : Factor w/ 2 levels "absent","present": 2 2 2 2 2 2 2 2 2 2 ...
$ region : Factor w/ 5 levels "Atlantic","BC",..: 3 3 3 3 3 3 3 3 3 3 ...
$ working_ord : Factor w/ 3 levels "Not at All","Part Time",..: 1 1 1 1 1 1 1 3 1 1 ...
$ working_any : Factor w/ 2 levels "Not at All","At Least Part Time": 1 1 1 1 1 1 1 2 1 1 ...
$ working_full: Factor w/ 2 levels "Less Than Full Time",..: 1 1 1 1 1 1 1 2 1 1 ...
$ working_type: Factor w/ 2 levels "Part Time","Full Time": NA NA NA NA NA NA NA 2 NA NA ...
12.2 Exploratory Data Analysis
Three versions of the outcome
%>%
Womenlf_clean ::table1(working_ord, working_any,
furniture
working_full, working_type,na.rm = FALSE, # do NOT restrict to complete cases!!!
output = "markdown")
Mean/Count (SD/%) | |
---|---|
n = 263 | |
working_ord | |
Not at All | 155 (58.9%) |
Part Time | 42 (16%) |
Full Time | 66 (25.1%) |
NA | 0 (0%) |
working_any | |
Not at All | 155 (58.9%) |
At Least Part Time | 108 (41.1%) |
NA | 0 (0%) |
working_full | |
Less Than Full Time | 197 (74.9%) |
Full Time | 66 (25.1%) |
NA | 0 (0%) |
working_type | |
Part Time | 42 (16%) |
Full Time | 66 (25.1%) |
NA | 155 (58.9%) |
Other Predisctors, univariate
%>%
Womenlf_clean ::table1("Husband's Income, $1000's" = hincome,
furniture"Children In the Home" = children,
"Region of Canada" = region,
output = "markdown")
Mean/Count (SD/%) | |
---|---|
n = 263 | |
Husband’s Income, $1000’s | |
14.8 (7.2) | |
Children In the Home | |
absent | 79 (30%) |
present | 184 (70%) |
Region of Canada | |
Atlantic | 30 (11.4%) |
BC | 29 (11%) |
Ontario | 108 (41.1%) |
Prairie | 31 (11.8%) |
Quebec | 65 (24.7%) |
%>%
Womenlf_clean ::table1("Husband's Income, $1000's" = hincome,
furniture"Children In the Home" = children,
"Region of Canada" = region,
splitby = ~ working_ord,
row_wise = TRUE, # show row %s rather than default column %s
test = TRUE,
total = TRUE,
output = "markdown")
Total | Not at All | Part Time | Full Time | P-Value | |
---|---|---|---|---|---|
n = 263 | n = 155 | n = 42 | n = 66 | ||
Husband’s Income, $1000’s | 0.002 | ||||
14.8 (7.2) | 15.6 (7.2) | 16.0 (8.1) | 12.1 (6.1) | ||
Children In the Home | <.001 | ||||
absent | 79 (100%) | 26 (32.9%) | 7 (8.9%) | 46 (58.2%) | |
present | 184 (100%) | 129 (70.1%) | 35 (19%) | 20 (10.9%) | |
Region of Canada | 0.71 | ||||
Atlantic | 30 (100%) | 20 (66.7%) | 4 (13.3%) | 6 (20%) | |
BC | 29 (100%) | 14 (48.3%) | 8 (27.6%) | 7 (24.1%) | |
Ontario | 108 (100%) | 64 (59.3%) | 17 (15.7%) | 27 (25%) | |
Prairie | 31 (100%) | 17 (54.8%) | 6 (19.4%) | 8 (25.8%) | |
Quebec | 65 (100%) | 40 (61.5%) | 7 (10.8%) | 18 (27.7%) |
12.2.1 Husband’s Income
%>%
Womenlf_clean ggplot(aes(hincome,
fill = working_ord)) +
geom_density(alpha = .3)
%>%
Womenlf_clean ggplot(aes(x = working_ord,
y = hincome)) +
geom_jitter(position=position_jitter(0.2)) +
stat_summary(fun.y = mean,
geom = "errorbar",
aes(ymax = ..y..,
ymin = ..y..),
width = .75,
color = "red",
size = 1)
%>%
Womenlf_clean ggplot(aes(hincome,
x = working_ord,
fill = working_ord)) +
geom_boxplot(alpha = .3)
%>%
Womenlf_clean ggplot(aes(hincome,
x = working_ord,
fill = working_ord)) +
geom_violin(alpha = .3) +
stat_summary(fun = mean,
geom = "errorbar",
aes(ymax = ..y..,
ymin = ..y..),
width = .75,
color = "red",
size = 1)
12.2.2 Children in Home
%>%
Womenlf_clean ggplot(aes(x = children,
fill = working_ord)) +
geom_bar()
%>%
Womenlf_clean ggplot(aes(x = children,
fill = working_ord %>% fct_rev)) +
geom_bar(position="fill") +
labs(x = "Children in the Home",
y = "Proportion of Women",
fill = "Working") +
theme_bw() +
scale_fill_manual(values = c("gray25", "gray50", "gray75"))
12.2.3 Region of Canada
%>%
Womenlf_clean ggplot(aes(x = region,
fill = working_ord)) +
geom_bar()
%>%
Womenlf_clean ggplot(aes(x = region,
fill = working_ord %>% fct_rev)) +
geom_bar(position="fill") +
labs(x = "Region of Canada",
y = "Proportion of Women",
fill = "Working") +
theme_bw() +
scale_fill_manual(values = c("gray25", "gray50", "gray75"))
12.3 Hierarchical (nested) Logistic Regression
For an \(m-\)category polytomy dependent variable is respecified as a series of \(m – 1\) nested dichotomies. A single or combined levels of outcome compared to another single or combination of levels. Then they are analyzed using a series of binary logistic regressions, such that:
- Dichotomies selected based on theory
- Avoid redundancy
- Similar to contrast coding, but for outcome
For this dataset example, the outcome (partic
) has \(3-\)categories, so we will investigate TWO nested dichotomies
- outcome =
working_any
- outcome =
working_type
12.3.1 Role of Predictors on ANY working
Fit a regular logistic model with all three predictors regressed on the binary indicator for any working. Use the glm()
function in the base \(R\) stats
package.
<- glm(working_any ~ hincome + children + region,
fit_glm_1 data = Womenlf_clean,
family = binomial(link = "logit"))
summary(fit_glm_1)
Call:
glm(formula = working_any ~ hincome + children + region, family = binomial(link = "logit"),
data = Womenlf_clean)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.7927 -0.8828 -0.7283 0.9562 2.0074
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.26771 0.55296 2.293 0.0219 *
hincome -0.04534 0.02057 -2.204 0.0275 *
childrenpresent -1.60434 0.30187 -5.315 1.07e-07 ***
regionBC 0.34196 0.58504 0.585 0.5589
regionOntario 0.18778 0.46762 0.402 0.6880
regionPrairie 0.47186 0.55680 0.847 0.3967
regionQuebec -0.17310 0.49957 -0.347 0.7290
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 356.15 on 262 degrees of freedom
Residual deviance: 317.30 on 256 degrees of freedom
AIC: 331.3
Number of Fisher Scoring iterations: 4
Check if region
is statistically significant with the drop1()
function from the base \(R\) stats
package. This may be done with a Likelihood Ratio Test (test = "LRT"
, which is the same as test = "Chisq"
for glm
models).
drop1(fit_glm_1, test = "LRT")
# A tibble: 4 x 5
Df Deviance AIC LRT `Pr(>Chi)`
<dbl> <dbl> <dbl> <dbl> <dbl>
1 NA 317. 331. NA NA
2 1 322. 334. 5.13 0.0236
3 1 348. 360. 30.5 0.0000000326
4 4 320. 326. 2.43 0.657
Since the region doesn’t have exhibit any effect on odds a women is in the labor force, remove that predictor in the model to simplify to a ‘best’ final model. Also, center husband’s income at a value near the mean so the intercept has meaning.
<- glm(working_any ~ I(hincome - 14) + children,
fit_glm_2 data = Womenlf_clean,
family = binomial(link = "logit"))
summary(fit_glm_2)
Call:
glm(formula = working_any ~ I(hincome - 14) + children, family = binomial(link = "logit"),
data = Womenlf_clean)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.6767 -0.8652 -0.7768 0.9292 1.9970
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.74351 0.24312 3.058 0.00223 **
I(hincome - 14) -0.04231 0.01978 -2.139 0.03244 *
childrenpresent -1.57565 0.29226 -5.391 7e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 356.15 on 262 degrees of freedom
Residual deviance: 319.73 on 260 degrees of freedom
AIC: 325.73
Number of Fisher Scoring iterations: 4
The texreg
package uses an intermediate function called extract()
to extract information for the model and then put it in the right places in the table. We can invervene by writing our own extract_exp()
function to use instead.
<- function(fit_glm){
extract_exp = coef(fit_glm)
beta = confint(fit_glm)
betaci = texreg::extract(fit_glm)
fit_glm_exp @coef = exp(beta)
fit_glm_exp@ci.low = exp(betaci[, 1])
fit_glm_exp@ci.up = exp(betaci[, 2])
fit_glm_expreturn(fit_glm_exp)
}
::knitreg(extract_glm_exp(fit_glm_2),
texregcustom.coef.names = c("BL: No children, Husband Earns $14,000/yr",
"Husband's Income, $1000's",
"Children in the Home"),
custom.model.names = "OR, Women is in the Workforce at All",
single.row = TRUE,
custom.note = "* The value of '1' is outside the confidence interval for the OR")
OR, Women is in the Workforce at All | |
---|---|
BL: No children, Husband Earns $14,000/yr | 2.10 [1.32; 3.44]* |
Husband’s Income, $1000’s | 0.96 [0.92; 1.00]* |
Children in the Home | 0.21 [0.12; 0.36]* |
|
Interpretation:
Among women without children in the home and a husband making $14,000 annually, there is about a 2:1 odds she is in the workforce.
For each additional thousand dollars the husband makes, the odds ratio decreases by about 4 percent.
If there are children in the home, the odds of being in the workforce is nearly a fifth as large.
::interact_plot(model = fit_glm_2,
interactionspred = hincome,
modx = children,
interval = TRUE)
::interact_plot(model = fit_glm_2,
interactionspred = hincome,
modx = children,
interval = TRUE,
x.label = "Husband's Income, in $1,000's",
y.label = "Predicted Probability of\nWomen Being in the Workforce",
legend.main = "Children in the Home:",
modx.labels = c("Absent", "Present"),
colors = rep("black", 2)) +
geom_vline(xintercept = 14, color = "gray25") + # reference line for intercept
theme_bw() +
theme(legend.position = "bottom",
legend.key.width = unit(2, "cm"))
::allEffects(fit_glm_2) effects
model: working_any ~ I(hincome - 14) + children
hincome effect
hincome
1 10 20 30 40
0.5476466 0.4527392 0.3514450 0.2619655 0.1886414
children effect
children
absent present
0.6707323 0.2964731
::Effect(focal.predictors = c("hincome", "children"),
effectsxlevels = list(hincome = seq(from = 1, to = 45, by = .1)),
mod = fit_glm_2) %>%
data.frame() %>%
ggplot(aes(x = hincome,
y = fit,
color = children,
linetype = children)) +
geom_vline(xintercept = 14, color = "gray25") + # reference line for intercept
geom_line(size = 1) +
theme_bw() +
labs(x = "Husband's Income, in $1000's",
y = "Predicted Probability of\nWomen Being in the Workforce",
color = "Children in\nthe Home:",
linetype = "Children in\nthe Home:") +
theme(legend.position = c(1, 1),
legend.justification = c(1.1, 1.1),
legend.background = element_rect(color = "black")) +
coord_cartesian(ylim = c(0, 1))
12.3.2 Role of Predictors on TYPE of work
Fit a regular logistic model with all three predictors regressed on the binary indicator for level/type of working.
<- glm(working_type ~ hincome + children + region,
fit_glm_3 data = Womenlf_clean,
family = binomial(link = "logit"))
summary(fit_glm_3)
Call:
glm(formula = working_type ~ hincome + children + region, family = binomial(link = "logit"),
data = Womenlf_clean)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.5202 -0.8048 0.3583 0.7201 1.9957
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.76164 1.05718 3.558 0.000373 ***
hincome -0.10475 0.04032 -2.598 0.009383 **
childrenpresent -2.74781 0.56893 -4.830 1.37e-06 ***
regionBC -1.18248 1.02764 -1.151 0.249865
regionOntario -0.14876 0.84703 -0.176 0.860589
regionPrairie -0.39173 0.96310 -0.407 0.684200
regionQuebec 0.14842 0.93300 0.159 0.873612
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 144.34 on 107 degrees of freedom
Residual deviance: 101.84 on 101 degrees of freedom
(155 observations deleted due to missingness)
AIC: 115.84
Number of Fisher Scoring iterations: 5
Check if region
is statistically significant with the drop1()
function from the base \(R\) stats
package. This may be done with a Likelihood Ratio Test (test = "LRT"
, which is the same as test = "Chisq"
for glm
models).
drop1(fit_glm_3, test = "LRT")
# A tibble: 4 x 5
Df Deviance AIC LRT `Pr(>Chi)`
<dbl> <dbl> <dbl> <dbl> <dbl>
1 NA 102. 116. NA NA
2 1 110. 122. 7.84 0.00512
3 1 134. 146. 31.9 0.0000000162
4 4 104. 110. 2.65 0.618
Since the region doesn’t have exhibit any effect on odds a working women is in the labor force full time, remove that predictor in the model to simplify to a ‘best’ final model. Also, center husband’s income at a value near the mean so the intercept has meaning.
<- glm(working_type ~ I(hincome - 14) + children,
fit_glm_4 data = Womenlf_clean,
family = binomial(link = "logit"))
summary(fit_glm_4)
Call:
glm(formula = working_type ~ I(hincome - 14) + children, family = binomial(link = "logit"),
data = Womenlf_clean)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.4047 -0.8678 0.3949 0.6213 1.7641
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.97602 0.43024 4.593 4.37e-06 ***
I(hincome - 14) -0.10727 0.03915 -2.740 0.00615 **
childrenpresent -2.65146 0.54108 -4.900 9.57e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 144.34 on 107 degrees of freedom
Residual deviance: 104.49 on 105 degrees of freedom
(155 observations deleted due to missingness)
AIC: 110.49
Number of Fisher Scoring iterations: 5
The texreg
package uses an intermediate function called extract()
to extract information for the model and then put it in the right places in the table. We can invervene by writing our own extract_exp()
function to use instead.
::knitreg(list(extract_glm_exp(fit_glm_2),
texregextract_glm_exp(fit_glm_4)),
custom.coef.names = c("BL: No children, Husband Earns $14,000/yr",
"Husband's Income, $1000's",
"Children in the Home"),
custom.model.names = c("Working at All",
"Full vs. Part-Time"),
single.row = TRUE,
custom.note = "* The value of '1' is outside the confidence interval for the OR")
Working at All | Full vs. Part-Time | |
---|---|---|
BL: No children, Husband Earns $14,000/yr | 2.10 [1.32; 3.44]* | 7.21 [3.34; 18.45]* |
Husband’s Income, $1000’s | 0.96 [0.92; 1.00]* | 0.90 [0.83; 0.97]* |
Children in the Home | 0.21 [0.12; 0.36]* | 0.07 [0.02; 0.19]* |
|
Interpretation:
Among working women without children in the home and a husband making $14,000 annually, there is more than 7:1 odds she is working full time verses part time.
For each additional thousand dollars the husband makes, the odds ratio decreases by about 10 percent.
If there are children in the home, the odds of being in the workforce is drastically reduced.
::interact_plot(model = fit_glm_4,
interactionspred = hincome,
modx = children,
interval = TRUE)
::interact_plot(model = fit_glm_4,
interactionspred = hincome,
modx = children,
interval = TRUE,
x.label = "Husband's Income, in $1,000's",
y.label = "Predicted Probability of\nWomen Being in the Workforce Full Time\nif they are working",
legend.main = "Children in the Home:",
modx.labels = c("Absent", "Present"),
colors = rep("black", 2)) +
geom_vline(xintercept = 14, color = "gray25") + # reference line for intercept
theme_bw() +
theme(legend.position = "bottom",
legend.key.width = unit(2, "cm"))
::allEffects(fit_glm_4) effects
model: working_type ~ I(hincome - 14) + children
hincome effect
hincome
1 10 20 30 40
0.8829045 0.7416987 0.4955346 0.2515165 0.1031024
children effect
children
absent present
0.8830579 0.3475686
::Effect(focal.predictors = c("hincome", "children"),
effectsxlevels = list(hincome = seq(from = 1, to = 45, by = .1)),
mod = fit_glm_4) %>%
data.frame() %>%
ggplot(aes(x = hincome,
y = fit,
color = children,
linetype = children)) +
geom_vline(xintercept = 14, color = "gray25") + # reference line for intercept
geom_line(size = 1) +
theme_bw() +
labs(x = "Husband's Income, in $1000's",
y = "Predicted Probability of\nWomen Being in the Workforce Full Time\nif they are working",
color = "Children in\nthe Home:",
linetype = "Children in\nthe Home:") +
theme(legend.position = c(1, 1),
legend.justification = c(1.1, 1.1),
legend.background = element_rect(color = "black")) +
coord_cartesian(ylim = c(0, 1))
12.4 Multinomial (nominal) Logistic Regression
Multinomial Logistic Regression fits a single model by specifing a reference level of the outcome and comparing each additional level to it. In our case we will choose not working
as the reference category adn get a set of parameter estimates (betas) for each of the two options part time
and full time
.
12.4.1 Fit Model 1: main effects only
Use multinom()
function in the base \(R\) nnet
package. You will also need the MASS
and \(R\) package (only to compute MLEs). Make sure to remove cases with missing data on predictors before modeling or use the na.action = na.omit
optin in the multinom()
model command.
<- nnet::multinom(working_ord ~ I(hincome - 14) + children + region,
fit_multnom_1 data = Womenlf_clean)
# weights: 24 (14 variable)
initial value 288.935032
iter 10 value 208.470682
iter 20 value 207.732796
iter 20 value 207.732796
iter 20 value 207.732796
final value 207.732796
converged
summary(fit_multnom_1,
corr = FALSE,
wald = TRUE)
Call:
nnet::multinom(formula = working_ord ~ I(hincome - 14) + children +
region, data = Womenlf_clean)
Coefficients:
(Intercept) I(hincome - 14) childrenpresent regionBC regionOntario
Part Time -1.7521373 0.005261435 0.1462009 1.0863549 0.2856917
Full Time 0.7240428 -0.100034170 -2.6977927 -0.4599247 0.1135573
regionPrairie regionQuebec
Part Time 0.5747258 -0.1105184
Full Time 0.4681016 -0.3116829
Std. Errors:
(Intercept) I(hincome - 14) childrenpresent regionBC regionOntario
Part Time 0.7204798 0.02468887 0.4901621 0.7193077 0.6175050
Full Time 0.6102008 0.02901623 0.3876731 0.7837044 0.6175128
regionPrairie regionQuebec
Part Time 0.7259118 0.6873048
Full Time 0.7332449 0.6515172
Residual Deviance: 415.4656
AIC: 443.4656
12.4.2 Fit Model 2: only significant predictors
Reduce the model by removing the region
variable.
<- nnet::multinom(working_ord ~ I(hincome - 14) + children,
fit_multnom_2 data = Womenlf_clean)
# weights: 12 (6 variable)
initial value 288.935032
iter 10 value 211.456740
final value 211.440963
converged
summary(fit_multnom_2,
corr = FALSE,
wald = TRUE)
Call:
nnet::multinom(formula = working_ord ~ I(hincome - 14) + children,
data = Womenlf_clean)
Coefficients:
(Intercept) I(hincome - 14) childrenpresent
Part Time -1.3357927 0.00688932 0.02149927
Full Time 0.6215469 -0.09723492 -2.55867912
Std. Errors:
(Intercept) I(hincome - 14) childrenpresent
Part Time 0.4340062 0.02345463 0.4690285
Full Time 0.2585136 0.02809670 0.3622077
Residual Deviance: 422.8819
AIC: 434.8819
12.4.3 Compre model fit
Check if we need to keep the region
variable in our model.
anova(fit_multnom_1, fit_multnom_2)
# A tibble: 2 x 7
Model `Resid. df` `Resid. Dev` Test ` Df` `LR stat.` `Pr(Chi)`
<chr> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
1 I(hincome - 14) ~ 520 423. "" NA NA NA
2 I(hincome - 14) ~ 512 415. "1 vs~ 8 7.42 0.492
::compare_performance(fit_multnom_1, fit_multnom_2, rank = TRUE) performance
# A tibble: 2 x 8
Name Model AIC BIC R2_Nagelkerke RMSE Sigma Performance_Score
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 fit_multnom_2 multinom 435. 456. 0.300 0.397 1.28 0.6
2 fit_multnom_1 multinom 443. 493. 0.325 0.392 1.29 0.4
12.4.4 Extract parameters
12.4.4.1 Logit Scale
Here is one way to extract the parameter estimates, but recall they are in terms of the logit or log-odds, not probability.
::tidy(fit_multnom_2) %>%
broom::mutate(p.value = round(p.value, 4)) dplyr
# A tibble: 6 x 6
y.level term estimate std.error statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Part Time (Intercept) -1.34 0.434 -3.08 0.0021
2 Part Time I(hincome - 14) 0.00689 0.0235 0.294 0.769
3 Part Time childrenpresent 0.0215 0.469 0.0458 0.963
4 Full Time (Intercept) 0.622 0.259 2.40 0.0162
5 Full Time I(hincome - 14) -0.0972 0.0281 -3.46 0.0005
6 Full Time childrenpresent -2.56 0.362 -7.06 0
12.4.4.2 Odds-Ratio Scale
The effects::allEffects()
function provides probability estimates for each outcome level for different levels of the predictors.
::allEffects(fit_multnom_2) effects
model: working_ord ~ I(hincome - 14) + children
hincome effect (probability) for Not at All
hincome
1 10 20 30 40
0.4265713 0.5819837 0.6888795 0.7333353 0.7439897
hincome effect (probability) for Part Time
hincome
1 10 20 30 40
0.1041120 0.1511290 0.1916462 0.2185644 0.2375547
hincome effect (probability) for Full Time
hincome
1 10 20 30 40
0.46931674 0.26688731 0.11947430 0.04810031 0.01845552
children effect (probability) for Not at All
children
absent present
0.3339937 0.7122698
children effect (probability) for Part Time
children
absent present
0.08828253 0.19236144
children effect (probability) for Full Time
children
absent present
0.57772376 0.09536878
12.4.5 Tabulate parameters
The texreg
package know how to handle this type of model and displays the parameters estimates in two separate columns.
::extract(fit_multnom_2) texreg
coef. s.e. p
Part Time: (Intercept) -1.33579274 0.43400624 2.085214e-03
Part Time: I(hincome - 14) 0.00688932 0.02345463 7.689645e-01
Part Time: childrenpresent 0.02149927 0.46902849 9.634395e-01
Full Time: (Intercept) 0.62154688 0.25851357 1.620301e-02
Full Time: I(hincome - 14) -0.09723492 0.02809670 5.387245e-04
Full Time: childrenpresent -2.55867912 0.36220774 1.616364e-12
GOF dec. places
AIC 434.8819 TRUE
BIC 456.3149 TRUE
Log Likelihood -211.4410 TRUE
Deviance 422.8819 TRUE
Num. obs. 263.0000 FALSE
K 3.0000 FALSE
::knitreg(fit_multnom_2,
texregcustom.model.name = c("b (SE)"),
custom.coef.map = list("Part Time: (Intercept)" = "PT-BL: No children, Husband Earns $14,000/yr",
"Part Time: I(hincome - 14)" = "PT-Husband's Income, in $1,000's",
"Part Time: childrenpresent" = "PT-Children in the Home",
"Full Time: (Intercept)" = "FT-BL: No children, Husband Earns $14,000/yr",
"Full Time: I(hincome - 14)" = "FT-Husband's Income, in $1,000's",
"Full Time: childrenpresent" = "FT-Children in the Home"),
groups = list("Part Time" = 1:3,
"Full Time" = 4:6),
single.row = TRUE)
b (SE) | |
---|---|
Part Time | |
PT-BL: No children, Husband Earns $14,000/yr | -1.34 (0.43)** |
PT-Husband’s Income, in $1,000’s | 0.01 (0.02) |
PT-Children in the Home | 0.02 (0.47) |
Full Time | |
FT-BL: No children, Husband Earns $14,000/yr | 0.62 (0.26)* |
FT-Husband’s Income, in $1,000’s | -0.10 (0.03)*** |
FT-Children in the Home | -2.56 (0.36)*** |
AIC | 434.88 |
BIC | 456.31 |
Log Likelihood | -211.44 |
Deviance | 422.88 |
Num. obs. | 263 |
K | 3 |
p < 0.001; p < 0.01; p < 0.05 |
%>% coef() fit_multnom_2
(Intercept) I(hincome - 14) childrenpresent
Part Time -1.3357927 0.00688932 0.02149927
Full Time 0.6215469 -0.09723492 -2.55867912
%>% coef() %>% exp() fit_multnom_2
(Intercept) I(hincome - 14) childrenpresent
Part Time 0.2629496 1.0069131 1.02173205
Full Time 1.8618058 0.9073428 0.07740692
%>% confint() %>% exp() fit_multnom_2
, , Part Time
2.5 % 97.5 %
(Intercept) 0.1123171 0.6156011
I(hincome - 14) 0.9616729 1.0542816
childrenpresent 0.4074734 2.5619744
, , Full Time
2.5 % 97.5 %
(Intercept) 1.12172715 3.0901640
I(hincome - 14) 0.85872767 0.9587102
childrenpresent 0.03805993 0.1574315
12.4.6 Plot Predicted Probabilities
NOTE: I’m not sure how to use the
interactions::interact_plot()
function with multinomial models.
12.4.6.1 Manually Compute
- estimates for probabilities and the associated 95% confidence intervals
::Effect(focal.predictors = c("hincome", "children"),
effectsxlevels = list(hincome = c(20, 30, 40)),
mod = fit_multnom_2)
hincome*children effect (probability) for Not at All
children
hincome absent present
20 0.4323538 0.7350680
30 0.5929483 0.7516619
40 0.6834699 0.7502613
hincome*children effect (probability) for Part Time
children
hincome absent present
20 0.1184851 0.2058207
30 0.1740850 0.2254779
40 0.2149730 0.2411093
hincome*children effect (probability) for Full Time
children
hincome absent present
20 0.4491611 0.059111252
30 0.2329667 0.022860160
40 0.1015571 0.008629454
12.4.6.2 Wrange
::Effect(focal.predictors = c("hincome", "children"),
effectsxlevels = list(hincome = c(20, 30, 40)),
mod = fit_multnom_2) %>%
data.frame()
# A tibble: 6 x 26
hincome children prob.Not.at.All prob.Part.Time prob.Full.Time
<dbl> <fct> <dbl> <dbl> <dbl>
1 20 absent 0.432 0.118 0.449
2 30 absent 0.593 0.174 0.233
3 40 absent 0.683 0.215 0.102
4 20 present 0.735 0.206 0.0591
5 30 present 0.752 0.225 0.0229
6 40 present 0.750 0.241 0.00863
# ... with 21 more variables: logit.Not.at.All <dbl>, logit.Part.Time <dbl>,
# logit.Full.Time <dbl>, se.prob.Not.at.All <dbl>, se.prob.Part.Time <dbl>,
# se.prob.Full.Time <dbl>, se.logit.Not.at.All <dbl>,
# se.logit.Part.Time <dbl>, se.logit.Full.Time <dbl>,
# L.prob.Not.at.All <dbl>, L.prob.Part.Time <dbl>, L.prob.Full.Time <dbl>,
# U.prob.Not.at.All <dbl>, U.prob.Part.Time <dbl>, U.prob.Full.Time <dbl>,
# L.logit.Not.at.All <dbl>, L.logit.Part.Time <dbl>, ...
::Effect(focal.predictors = c("hincome", "children"),
effectsxlevels = list(hincome = c(20, 30, 40)),
mod = fit_multnom_2) %>%
data.frame() %>%
::select(hincome, children,
dplyrstarts_with("prob"),
starts_with("L.prob"),
starts_with("U.prob")) %>%
::rename(est_fit_none = prob.Not.at.All,
dplyrest_fit_part = prob.Part.Time,
est_fit_full = prob.Full.Time,
est_lower_none = L.prob.Not.at.All,
est_lower_part = L.prob.Part.Time,
est_lower_full = L.prob.Full.Time,
est_upper_none = U.prob.Not.at.All,
est_upper_part = U.prob.Part.Time,
est_upper_full = U.prob.Full.Time)
# A tibble: 6 x 11
hincome children est_fit_none est_fit_part est_fit_full est_lower_none
<dbl> <fct> <dbl> <dbl> <dbl> <dbl>
1 20 absent 0.432 0.118 0.449 0.306
2 30 absent 0.593 0.174 0.233 0.399
3 40 absent 0.683 0.215 0.102 0.418
4 20 present 0.735 0.206 0.0591 0.655
5 30 present 0.752 0.225 0.0229 0.597
6 40 present 0.750 0.241 0.00863 0.486
# ... with 5 more variables: est_lower_part <dbl>, est_lower_full <dbl>,
# est_upper_none <dbl>, est_upper_part <dbl>, est_upper_full <dbl>
::Effect(focal.predictors = c("hincome", "children"),
effectsxlevels = list(hincome = c(20, 30, 40)),
mod = fit_multnom_2) %>%
data.frame() %>%
::select(hincome, children,
dplyrstarts_with("prob"),
starts_with("L.prob"),
starts_with("U.prob")) %>%
::rename(est_fit_none = prob.Not.at.All,
dplyrest_fit_part = prob.Part.Time,
est_fit_full = prob.Full.Time,
est_lower_none = L.prob.Not.at.All,
est_lower_part = L.prob.Part.Time,
est_lower_full = L.prob.Full.Time,
est_upper_none = U.prob.Not.at.All,
est_upper_part = U.prob.Part.Time,
est_upper_full = U.prob.Full.Time) %>%
::pivot_longer(cols = starts_with("est"),
tidyrnames_to = c(".value", "work_level"),
names_pattern = "est_(.*)_(.*)",
values_to = "fit")
# A tibble: 18 x 6
hincome children work_level fit lower upper
<dbl> <fct> <chr> <dbl> <dbl> <dbl>
1 20 absent none 0.432 0.306 0.568
2 20 absent part 0.118 0.0568 0.231
3 20 absent full 0.449 0.318 0.588
4 30 absent none 0.593 0.399 0.762
5 30 absent part 0.174 0.0728 0.361
6 30 absent full 0.233 0.104 0.443
7 40 absent none 0.683 0.418 0.866
8 40 absent part 0.215 0.0692 0.502
9 40 absent full 0.102 0.0256 0.327
10 20 present none 0.735 0.655 0.802
11 20 present part 0.206 0.145 0.283
12 20 present full 0.0591 0.0319 0.107
13 30 present none 0.752 0.597 0.861
14 30 present part 0.225 0.120 0.383
15 30 present full 0.0229 0.00782 0.0649
16 40 present none 0.750 0.486 0.905
17 40 present part 0.241 0.0887 0.509
18 40 present full 0.00863 0.00176 0.0411
12.4.6.3 Plot, version 1
::Effect(focal.predictors = c("hincome", "children"),
effectsxlevels = list(hincome = seq(from = 1, to = 45, by = .1)),
mod = fit_multnom_2) %>%
data.frame() %>%
::select(hincome, children,
dplyrstarts_with("prob"),
starts_with("L.prob"),
starts_with("U.prob")) %>%
::rename(est_fit_none = prob.Not.at.All,
dplyrest_fit_part = prob.Part.Time,
est_fit_full = prob.Full.Time,
est_lower_none = L.prob.Not.at.All,
est_lower_part = L.prob.Part.Time,
est_lower_full = L.prob.Full.Time,
est_upper_none = U.prob.Not.at.All,
est_upper_part = U.prob.Part.Time,
est_upper_full = U.prob.Full.Time) %>%
::pivot_longer(cols = starts_with("est"),
tidyrnames_to = c(".value", "work_level"),
names_pattern = "est_(.*)_(.*)",
values_to = "fit") %>%
::mutate(work_level = work_level %>%
dplyrfactor() %>%
::fct_recode("Not at All" = "none",
forcats"Part Time" = "part",
"Full Time" = "full") %>%
::fct_rev()) %>%
forcatsggplot(aes(x = hincome,
y = fit)) +
geom_vline(xintercept = 14, color = "gray50") + # reference line for intercept
geom_hline(yintercept = 0.5, color = "gray50") + # 50% chance line for reference
geom_ribbon(aes(ymin = lower,
ymax = upper,
fill = work_level),
alpha = .3) +
geom_line(aes(color = work_level,
linetype = work_level),
size = 1) +
facet_grid(. ~ children, labeller = label_both) +
theme_bw() +
labs(x = "Husband's Income, in $1000's",
y = "Predicted Probability",
color = "Woman Works:",
fill = "Woman Works:",
linetype = "Woman Works:") +
theme(legend.position = "bottom",
legend.key.width = unit(2, "cm")) +
coord_cartesian(ylim = c(0, 1)) +
scale_linetype_manual(values = c("solid", "longdash", "dotted"))
12.4.6.4 Plot, version 2
::Effect(focal.predictors = c("hincome", "children"),
effectsxlevels = list(hincome = seq(from = 1, to = 45, by = .1)),
mod = fit_multnom_2) %>%
data.frame() %>%
::select(hincome, children,
dplyrstarts_with("prob"),
starts_with("L.prob"),
starts_with("U.prob")) %>%
::rename(est_fit_none = prob.Not.at.All,
dplyrest_fit_part = prob.Part.Time,
est_fit_full = prob.Full.Time,
est_lower_none = L.prob.Not.at.All,
est_lower_part = L.prob.Part.Time,
est_lower_full = L.prob.Full.Time,
est_upper_none = U.prob.Not.at.All,
est_upper_part = U.prob.Part.Time,
est_upper_full = U.prob.Full.Time) %>%
::pivot_longer(cols = starts_with("est"),
tidyrnames_to = c(".value", "work_level"),
names_pattern = "est_(.*)_(.*)",
values_to = "fit") %>%
::mutate(work_level = factor(work_level,
dplyrlevels = c("none", "part", "full"),
labels = c("Not at All",
"Part Time",
"Full Time"))) %>%
ggplot(aes(x = hincome,
y = fit)) +
geom_vline(xintercept = 14, color = "gray50") + # reference line for intercept
geom_hline(yintercept = 0.5, color = "gray50") + # 50% chance line for reference
geom_ribbon(aes(ymin = lower,
ymax = upper,
fill = children),
alpha = .3) +
geom_line(aes(color = children,
linetype = children),
size = 1) +
facet_grid(. ~ work_level) +
theme_bw() +
labs(x = "Husband's Income, in $1000's",
y = "Predicted Probability",
color = "Children in the Home:",
fill = "Children in the Home:",
linetype = "Children in the Home:") +
theme(legend.position = "bottom",
legend.key.width = unit(2, "cm")) +
coord_cartesian(ylim = c(0, 1)) +
scale_linetype_manual(values = c("solid", "longdash")) +
scale_fill_manual(values = c("dodgerblue", "coral3"))+
scale_color_manual(values = c("dodgerblue", "coral3"))
12.4.7 Interpretation
Among women without children in the home and a husband making $14,000 annually, there is about 1:4 odds she is working part time verses not at all. and a 1.8:1 odds she is working full time.
For each additional thousand dollars the husband makes, the odds ratio decreases by about 10 percent that she is working full time, yet stay the same that she works part time.
If there are children in the home, the odds of working part time increase by 2 percent and there is a very unlikely change she works full time.
12.5 Proportional-odds (ordinal) Logistic Regression
This type of logisit regression model forces the predictors to have similar relationship with the outcome (slopes), but different means (intercepts). This is called the proportional odds assumption.
12.5.1 Fit the Model
Use polr()
function in the base \(R\) MASS
package. While outcome variable (dependent variable, “Y”) may be a regular factor
, it is preferable to specify it as an ordered
factor.
<- MASS::polr(working_ord ~ hincome + children,
fit_polr_1 data = Womenlf_clean)
summary(fit_polr_1)
Call:
MASS::polr(formula = working_ord ~ hincome + children, data = Womenlf_clean)
Coefficients:
Value Std. Error t value
hincome -0.0539 0.01949 -2.766
childrenpresent -1.9720 0.28695 -6.872
Intercepts:
Value Std. Error t value
Not at All|Part Time -1.8520 0.3863 -4.7943
Part Time|Full Time -0.9409 0.3699 -2.5435
Residual Deviance: 441.663
AIC: 449.663
12.5.2 Extract Parameters
12.5.2.1 Logit Scale
$zeta fit_polr_1
Not at All|Part Time Part Time|Full Time
-1.8520378 -0.9409261
%>% confint() fit_polr_1
2.5 % 97.5 %
hincome -0.09323274 -0.0166527
childrenpresent -2.54479931 -1.4177487
12.5.2.2 Odds-Ratio Scale
%>% coef() %>% exp() fit_polr_1
hincome childrenpresent
0.9475262 0.1391841
%>% confint() %>% exp() fit_polr_1
2.5 % 97.5 %
hincome 0.9109815 0.9834852
childrenpresent 0.0784888 0.2422588
12.5.2.3 Predicted Probabilities
::allEffects(fit_polr_1) effects
model: working_ord ~ hincome + children
hincome effect (probability) for Not at All
hincome
1 10 20 30 40
0.3968718 0.5166413 0.6469356 0.7585238 0.8433820
hincome effect (probability) for Part Time
hincome
1 10 20 30 40
0.22384583 0.21001062 0.17311759 0.12800001 0.08713914
hincome effect (probability) for Full Time
hincome
1 10 20 30 40
0.37928237 0.27334810 0.17994676 0.11347618 0.06947888
children effect (probability) for Not at All
children
absent present
0.2579513 0.7140863
children effect (probability) for Part Time
children
absent present
0.2057297 0.1472491
children effect (probability) for Full Time
children
absent present
0.5363190 0.1386646
12.5.3 Tabulate parameters
::knitreg(fit_polr_1,
texregcustom.model.name = "b (SE)",
custom.coef.map = list("hincome" = "Husband's Income, $1000's",
"childrenpresent" = "Children in the Home",
"Not at All|Part Time" = "Not at All vs.Part Time",
"Part Time|Full Time" = "Part Time vs.Full Time"),
groups = list("Coefficents" = 1:2,
"Intercepts" = 3:4),
caption = "Proportional-odds (ordinal) Logistic Regression",
caption.above = TRUE,
single.row = TRUE)
b (SE) | |
---|---|
Coefficents | |
Husband’s Income, $1000’s | -0.05 (0.02)** |
Children in the Home | -1.97 (0.29)*** |
Intercepts | |
Not at All vs.Part Time | -1.85 (0.39)*** |
Part Time vs.Full Time | -0.94 (0.37)* |
AIC | 449.66 |
BIC | 463.95 |
Log Likelihood | -220.83 |
Deviance | 441.66 |
Num. obs. | 263 |
p < 0.001; p < 0.01; p < 0.05 |
12.5.4 Plot Predicted Probabilities
12.5.4.1 Manually Compute
- estimates for probabilities and the associated 95% confidence intervals
::Effect(focal.predictors = c("hincome", "children"),
effectsxlevels = list(hincome = c(20, 30, 40)),
mod = fit_polr_1)
hincome*children effect (probability) for Not at All
children
hincome absent present
20 0.3156092 0.7681570
30 0.4415146 0.8502980
40 0.5754175 0.9068653
hincome*children effect (probability) for Part Time
children
hincome absent present
20 0.2186091 0.12362226
30 0.2213518 0.08359281
40 0.1957828 0.05347906
hincome*children effect (probability) for Full Time
children
hincome absent present
20 0.4657817 0.10822076
30 0.3371336 0.06610916
40 0.2287997 0.03965563
12.5.4.2 Wrangle
::Effect(focal.predictors = c("hincome", "children"),
effectsxlevels = list(hincome = c(20, 30, 40)),
mod = fit_polr_1) %>%
data.frame()
# A tibble: 6 x 26
hincome children prob.Not.at.All prob.Part.Time prob.Full.Time
<dbl> <fct> <dbl> <dbl> <dbl>
1 20 absent 0.316 0.219 0.466
2 30 absent 0.442 0.221 0.337
3 40 absent 0.575 0.196 0.229
4 20 present 0.768 0.124 0.108
5 30 present 0.850 0.0836 0.0661
6 40 present 0.907 0.0535 0.0397
# ... with 21 more variables: logit.Not.at.All <dbl>, logit.Part.Time <dbl>,
# logit.Full.Time <dbl>, se.prob.Not.at.All <dbl>, se.prob.Part.Time <dbl>,
# se.prob.Full.Time <dbl>, se.logit.Not.at.All <dbl>,
# se.logit.Part.Time <dbl>, se.logit.Full.Time <dbl>,
# L.prob.Not.at.All <dbl>, L.prob.Part.Time <dbl>, L.prob.Full.Time <dbl>,
# U.prob.Not.at.All <dbl>, U.prob.Part.Time <dbl>, U.prob.Full.Time <dbl>,
# L.logit.Not.at.All <dbl>, L.logit.Part.Time <dbl>, ...
::Effect(focal.predictors = c("hincome", "children"),
effectsxlevels = list(hincome = c(20, 30, 40)),
mod = fit_polr_1) %>%
data.frame() %>%
::select(hincome, children,
dplyrstarts_with("prob"),
starts_with("L.prob"),
starts_with("U.prob")) %>%
::rename(est_fit_none = prob.Not.at.All,
dplyrest_fit_part = prob.Part.Time,
est_fit_full = prob.Full.Time,
est_lower_none = L.prob.Not.at.All,
est_lower_part = L.prob.Part.Time,
est_lower_full = L.prob.Full.Time,
est_upper_none = U.prob.Not.at.All,
est_upper_part = U.prob.Part.Time,
est_upper_full = U.prob.Full.Time)
# A tibble: 6 x 11
hincome children est_fit_none est_fit_part est_fit_full est_lower_none
<dbl> <fct> <dbl> <dbl> <dbl> <dbl>
1 20 absent 0.316 0.219 0.466 0.217
2 30 absent 0.442 0.221 0.337 0.274
3 40 absent 0.575 0.196 0.229 0.320
4 20 present 0.768 0.124 0.108 0.692
5 30 present 0.850 0.0836 0.0661 0.741
6 40 present 0.907 0.0535 0.0397 0.775
# ... with 5 more variables: est_lower_part <dbl>, est_lower_full <dbl>,
# est_upper_none <dbl>, est_upper_part <dbl>, est_upper_full <dbl>
::Effect(focal.predictors = c("hincome", "children"),
effectsxlevels = list(hincome = c(20, 30, 40)),
mod = fit_polr_1) %>%
data.frame() %>%
::select(hincome, children,
dplyrstarts_with("prob"),
starts_with("L.prob"),
starts_with("U.prob")) %>%
::rename(est_fit_none = prob.Not.at.All,
dplyrest_fit_part = prob.Part.Time,
est_fit_full = prob.Full.Time,
est_lower_none = L.prob.Not.at.All,
est_lower_part = L.prob.Part.Time,
est_lower_full = L.prob.Full.Time,
est_upper_none = U.prob.Not.at.All,
est_upper_part = U.prob.Part.Time,
est_upper_full = U.prob.Full.Time) %>%
::pivot_longer(cols = starts_with("est"),
tidyrnames_to = c(".value", "work_level"),
names_pattern = "est_(.*)_(.*)",
values_to = "fit")
# A tibble: 18 x 6
hincome children work_level fit lower upper
<dbl> <fct> <chr> <dbl> <dbl> <dbl>
1 20 absent none 0.316 0.217 0.435
2 20 absent part 0.219 0.163 0.286
3 20 absent full 0.466 0.346 0.589
4 30 absent none 0.442 0.274 0.623
5 30 absent part 0.221 0.165 0.291
6 30 absent full 0.337 0.195 0.516
7 40 absent none 0.575 0.320 0.796
8 40 absent part 0.196 0.122 0.299
9 40 absent full 0.229 0.0926 0.463
10 20 present none 0.768 0.692 0.830
11 20 present part 0.124 0.0875 0.172
12 20 present full 0.108 0.0716 0.160
13 30 present none 0.850 0.741 0.919
14 30 present part 0.0836 0.0467 0.145
15 30 present full 0.0661 0.0327 0.129
16 40 present none 0.907 0.775 0.965
17 40 present part 0.0535 0.0210 0.129
18 40 present full 0.0397 0.0138 0.109
12.5.4.3 Plot, version 1
::Effect(focal.predictors = c("hincome", "children"),
effectsxlevels = list(hincome = seq(from = 1, to = 45, by = .1)),
mod = fit_polr_1) %>%
data.frame() %>%
::select(hincome, children,
dplyrstarts_with("prob"),
starts_with("L.prob"),
starts_with("U.prob")) %>%
::rename(est_fit_none = prob.Not.at.All,
dplyrest_fit_part = prob.Part.Time,
est_fit_full = prob.Full.Time,
est_lower_none = L.prob.Not.at.All,
est_lower_part = L.prob.Part.Time,
est_lower_full = L.prob.Full.Time,
est_upper_none = U.prob.Not.at.All,
est_upper_part = U.prob.Part.Time,
est_upper_full = U.prob.Full.Time) %>%
::pivot_longer(cols = starts_with("est"),
tidyrnames_to = c(".value", "work_level"),
names_pattern = "est_(.*)_(.*)",
values_to = "fit") %>%
::mutate(work_level = work_level %>%
dplyrfactor() %>%
::fct_recode("Not at All" = "none",
forcats"Part Time" = "part",
"Full Time" = "full") %>%
::fct_rev()) %>%
forcatsggplot(aes(x = hincome,
y = fit)) +
geom_vline(xintercept = 14, color = "gray50") + # reference line for intercept
geom_hline(yintercept = 0.5, color = "gray50") + # 50% chance line for reference
geom_ribbon(aes(ymin = lower,
ymax = upper,
fill = work_level),
alpha = .3) +
geom_line(aes(color = work_level,
linetype = work_level),
size = 1) +
facet_grid(. ~ children, labeller = label_both) +
theme_bw() +
labs(x = "Husband's Income, in $1000's",
y = "Predicted Probability",
color = "Woman Works:",
fill = "Woman Works:",
linetype = "Woman Works:") +
theme(legend.position = "bottom",
legend.key.width = unit(2, "cm")) +
coord_cartesian(ylim = c(0, 1)) +
scale_linetype_manual(values = c("solid", "longdash", "dotted"))
12.5.4.4 Plot, version 2
::Effect(focal.predictors = c("hincome", "children"),
effectsxlevels = list(hincome = seq(from = 1, to = 45, by = .1)),
mod = fit_polr_1) %>%
data.frame() %>%
::select(hincome, children,
dplyrstarts_with("prob"),
starts_with("L.prob"),
starts_with("U.prob")) %>%
::rename(est_fit_none = prob.Not.at.All,
dplyrest_fit_part = prob.Part.Time,
est_fit_full = prob.Full.Time,
est_lower_none = L.prob.Not.at.All,
est_lower_part = L.prob.Part.Time,
est_lower_full = L.prob.Full.Time,
est_upper_none = U.prob.Not.at.All,
est_upper_part = U.prob.Part.Time,
est_upper_full = U.prob.Full.Time) %>%
::pivot_longer(cols = starts_with("est"),
tidyrnames_to = c(".value", "work_level"),
names_pattern = "est_(.*)_(.*)",
values_to = "fit") %>%
::mutate(work_level = factor(work_level,
dplyrlevels = c("none", "part", "full"),
labels = c("Not at All",
"Part Time",
"Full Time"))) %>%
ggplot(aes(x = hincome,
y = fit)) +
geom_vline(xintercept = 14, color = "gray50") + # reference line for intercept
geom_hline(yintercept = 0.5, color = "gray50") + # 50% chance line for reference
geom_ribbon(aes(ymin = lower,
ymax = upper,
fill = children),
alpha = .3) +
geom_line(aes(color = children,
linetype = children),
size = 1) +
facet_grid(. ~ work_level) +
theme_bw() +
labs(x = "Husband's Income, in $1000's",
y = "Predicted Probability",
color = "Children in the Home:",
fill = "Children in the Home:",
linetype = "Children in the Home:") +
theme(legend.position = "bottom",
legend.key.width = unit(2, "cm")) +
coord_cartesian(ylim = c(0, 1)) +
scale_linetype_manual(values = c("solid", "longdash")) +
scale_fill_manual(values = c("dodgerblue", "coral3"))+
scale_color_manual(values = c("dodgerblue", "coral3"))
12.5.5 Interpretation
Among women without children in the home and a husband making $14,000 annually, there is a 26% chance she is not working, 21% change she is working part time and just over a 53% change she is working full time.
For each additional thousand dollars the husband makes, the odds ratio decreases by about 5 percent that she is working part time vs not at all and 5% that she is working full time vs part time.
If there are children in the home, the odds ratio of working part time vs not at all decreases by 86% and similartly the odds ratio fo working full time vs part time also decreases by 86%.
12.6 Compare Model Fits: Multinomial vs. Ordinal
The multinomail and proportional-odds models aren’t truely nested, so you can NOT conduct a Likelihood-Ratio Test (aka Deviance Difference Test) with the anova()
command.
You can use the performance::compare_performance()
command to compare overal model performance via the Bayes factor (BF
).
::compare_performance(fit_multnom_2, fit_polr_1, rank = TRUE) performance
# A tibble: 2 x 8
Name Model AIC BIC R2_Nagelkerke RMSE Sigma Performance_Score
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 fit_multnom_2 multinom 435. 456. 0.300 0.397 1.28 1
2 fit_polr_1 polr 450. 464. 0.236 1.60 1.30 0
12.6.1 Interpretation
The multinomial model looks to fit these data better than the proportional-odds (aka ordinal) logisic model.