Chapter 8 discusses how we can assess the importance of an individual predictor or a set of predictors. The principles discussed there also let us test whether two predictors in the same regression model have the same effect on the outcome.
For this chapter, we'll use the tidyverse, furniture, and educ7610 packages.

library(tidyverse)
library(furniture)
library(educ7610)

We'll also load the risk data set into R.

data(risk)
Let's assess whether being in foster care (dfoster == 1) or lacking water access in the home (water == 0) predicts how happy participants rate a negative image (neg3happy).

risk %>%
  lm(neg3happy ~ dfoster + water,
     data = .) %>%
  summary()
##
## Call:
## lm(formula = neg3happy ~ dfoster + water, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.34031 -0.17813 -0.05088 -0.05088 1.15520
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.21305 0.05403 3.943 0.000141 ***
## dfosterFoster 0.12726 0.05717 2.226 0.028026 *
## waterYes -0.16218 0.06643 -2.441 0.016196 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2753 on 112 degrees of freedom
## (10 observations deleted due to missingness)
## Multiple R-squared: 0.066, Adjusted R-squared: 0.04932
## F-statistic: 3.957 on 2 and 112 DF, p-value: 0.02185
Because the two coefficients have opposite signs, testing whether their effects are equal in *magnitude* takes a little care. We can do this with car's linearHypothesis(). Instead of testing whether the coefficients themselves are equal: \[
dfoster - water = 0
\] we will test: \[
dfoster - (-water) = 0
\] or, in other words, \[
dfoster + water = 0
\]

risk %>%
  lm(neg3happy ~ dfoster + water,
     data = .) %>%
  car::linearHypothesis("dfosterFoster + waterYes = 0")
## Linear hypothesis test
##
## Hypothesis:
## dfosterFoster + waterYes = 0
##
## Model 1: restricted model
## Model 2: neg3happy ~ dfoster + water
##
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 113 8.5075
## 2 112 8.4881 1 0.019359 0.2554 0.6143
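Behind the scenes, linearHypothesis() is performing a Wald test on a linear combination of coefficients. As a minimal sketch of the same computation by hand, here is the test of b1 + b2 = 0 on simulated data (the variable names x1, x2, y are made up for illustration, not the risk data):

```r
# Sketch: Wald t-test of b1 + b2 = 0 by hand, on simulated data.
set.seed(42)
n  <- 100
x1 <- rbinom(n, 1, 0.5)                       # binary predictor, like dfoster
x2 <- rbinom(n, 1, 0.5)                       # binary predictor, like water
y  <- 0.2 + 0.1 * x1 - 0.1 * x2 + rnorm(n, sd = 0.3)

fit <- lm(y ~ x1 + x2)
b   <- coef(fit)
V   <- vcov(fit)

# Estimate and standard error of the combination b1 + b2
est <- unname(b["x1"] + b["x2"])
se  <- sqrt(V["x1", "x1"] + V["x2", "x2"] + 2 * V["x1", "x2"])
t   <- est / se
p   <- 2 * pt(-abs(t), df = fit$df.residual)
c(estimate = est, se = se, t = t, p = p)
```

The p-value here matches what car::linearHypothesis("x1 + x2 = 0") would report (the F statistic it prints is this t statistic squared).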
So, does dfoster predict the outcome significantly more or less than water? Based on this test (p = .61), no: their effects do not differ significantly in magnitude.

Next, we'll use the poverty.xlsx data set to look at this (it can be downloaded from Canvas or from this site).

poverty <- readxl::read_excel("poverty.xlsx")
We'll assess whether the poverty rate (poverty_pct) or violent crime (ViolentCrime) better predicts the teen birthrate (TeenBirth). Let's take a look at a multiple regression model.

poverty %>%
  lm(TeenBirth ~ poverty_pct + ViolentCrime,
     data = .) %>%
  summary()
##
## Call:
## lm(formula = TeenBirth ~ poverty_pct + ViolentCrime, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.5183 -6.4833 -0.3196 6.5035 14.7125
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.6848 3.8686 4.571 3.41e-05 ***
## poverty_pct 1.6304 0.3119 5.227 3.70e-06 ***
## ViolentCrime 0.4037 0.1497 2.697 0.00962 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.329 on 48 degrees of freedom
## Multiple R-squared: 0.5611, Adjusted R-squared: 0.5428
## F-statistic: 30.68 on 2 and 48 DF, p-value: 2.608e-09
Both predictors are significant. Let's test whether their coefficients are equal.

poverty %>%
  lm(TeenBirth ~ poverty_pct + ViolentCrime,
     data = .) %>%
  car::linearHypothesis("poverty_pct = ViolentCrime")
## Linear hypothesis test
##
## Hypothesis:
## poverty_pct - ViolentCrime = 0
##
## Model 1: restricted model
## Model 2: TeenBirth ~ poverty_pct + ViolentCrime
##
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 49 3968.3
## 2 48 3329.9 1 638.44 9.203 0.003892 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
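This "are the coefficients equal" test is equivalent to comparing the full model against a restricted model that forces the two predictors to share a single slope. A minimal sketch on simulated data (x1, x2, y are illustrative names, not the poverty data):

```r
# Sketch: linearHypothesis("x1 = x2") is the same F test as comparing
# the full model to a restricted model with one shared slope.
set.seed(7)
n  <- 50
x1 <- rnorm(n, 13, 4)
x2 <- rnorm(n, 6, 3)
y  <- 18 + 1.6 * x1 + 0.4 * x2 + rnorm(n, sd = 8)

full       <- lm(y ~ x1 + x2)
restricted <- lm(y ~ I(x1 + x2))   # forces the two slopes to be equal
anova(restricted, full)            # same F and p as linearHypothesis("x1 = x2")
```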
The test suggests the coefficients differ (p = .004). However, these two predictors are on very different scales, so their raw coefficients are not directly comparable. Let's standardize both predictors and refit the model.

poverty %>%
  mutate(poverty_std = scale(poverty_pct),
         violent_std = scale(ViolentCrime)) %>%
  lm(TeenBirth ~ poverty_std + violent_std,
     data = .) %>%
  summary()
##
## Call:
## lm(formula = TeenBirth ~ poverty_std + violent_std, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.5183 -6.4833 -0.3196 6.5035 14.7125
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 42.243 1.166 36.220 < 2e-16 ***
## poverty_std 6.974 1.334 5.227 3.7e-06 ***
## violent_std 3.598 1.334 2.697 0.00962 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.329 on 48 degrees of freedom
## Multiple R-squared: 0.5611, Adjusted R-squared: 0.5428
## F-statistic: 30.68 on 2 and 48 DF, p-value: 2.608e-09
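A quick check of what scale() is doing here: it centers each variable at its mean and divides by its standard deviation, so each standardized slope is the change in the outcome per one standard deviation of that predictor, putting the slopes in comparable units.

```r
# Sketch of what scale() does: center, then divide by the standard deviation.
x      <- c(2, 4, 6, 8, 10)
manual <- (x - mean(x)) / sd(x)
all.equal(as.numeric(scale(x)), manual)   # TRUE
```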
The standardized slopes look closer to each other. Let's repeat the equality test with the standardized predictors.

poverty %>%
  mutate(poverty_std = scale(poverty_pct),
         violent_std = scale(ViolentCrime)) %>%
  lm(TeenBirth ~ poverty_std + violent_std,
     data = .) %>%
  car::linearHypothesis("poverty_std = violent_std")
## Linear hypothesis test
##
## Hypothesis:
## poverty_std - violent_std = 0
##
## Model 1: restricted model
## Model 2: TeenBirth ~ poverty_std + violent_std
##
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 49 3481.0
## 2 48 3329.9 1 151.09 2.178 0.1465
Once standardized, the coefficients no longer differ significantly (p = .15). Another way to compare predictors is dominance analysis, via the dominanceanalysis package and its dominanceAnalysis() function. Let's install and load the package.

# devtools::install_github("clbustos/dominanceAnalysis")
library(dominanceanalysis)

Now let's see which of the two predictors (poverty_pct or ViolentCrime) is more dominant.

dominanceAnalysis(lm(TeenBirth ~ poverty_pct + ViolentCrime,
                     data = poverty))
## $predictors
## [1] "poverty_pct" "ViolentCrime"
##
## $constants
## NULL
##
## $fit.functions
## [1] "r2"
##
## $fits
## $fits$r2
## poverty_pct ViolentCrime
## 1 0.4946093 0.31129354
## poverty_pct NA 0.06651268
## ViolentCrime 0.2498284 NA
## poverty_pct+ViolentCrime NA NA
##
##
## $base.fits
## r2
## 1 0.0000000
## poverty_pct 0.4946093
## ViolentCrime 0.3112935
## poverty_pct+ViolentCrime 0.5611220
##
## $level
## [1] 0 1 1 2
##
## attr(,"class")
## [1] "daRawResults"
##
## $contribution.by.level
## $contribution.by.level$r2
## level poverty_pct ViolentCrime
## 1 0 0.4946093 0.31129354
## 2 1 0.2498284 0.06651268
##
##
## $contribution.average
## $contribution.average$r2
## poverty_pct ViolentCrime
## 0.3722189 0.1889031
##
##
## $complete
## $complete$r2
## poverty_pct ViolentCrime
## poverty_pct 0.5 1.0
## ViolentCrime 0.0 0.5
##
##
## $conditional
## $conditional$r2
## poverty_pct ViolentCrime
## poverty_pct 0.5 1.0
## ViolentCrime 0.0 0.5
##
##
## $general
## $general$r2
## poverty_pct ViolentCrime
## poverty_pct 0.5 1.0
## ViolentCrime 0.0 0.5
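To demystify the $contribution.average output: with two predictors, a predictor's average contribution is the mean of (a) its R² when entered alone and (b) its R² increment when entered after the other predictor. A minimal sketch on simulated data (x1, x2, y are illustrative names):

```r
# Sketch: average R^2 contribution of x1 across the two entry orders,
# the quantity dominance analysis reports in $contribution.average.
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1.5 * x1 + 0.8 * x2 + rnorm(n)

r2 <- function(f) summary(f)$r.squared
alone_x1 <- r2(lm(y ~ x1))               # R^2 when x1 enters first
after_x2 <- r2(lm(y ~ x1 + x2)) - r2(lm(y ~ x2))  # increment when x1 enters last

mean(c(alone_x1, after_x2))              # average contribution of x1
```

Applying the same arithmetic to the output above, poverty_pct's average contribution is (0.4946 + 0.2498) / 2 ≈ 0.372, matching $contribution.average.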
Comparing predictors in the same regression model is often of substantive interest. Knowing how, and when, to make these comparisons can be a great benefit to your research.