Chapter 3 introduces multiple linear regression. The following examples help illustrate a number of principles that were discussed. Follow all instructions to complete Chapter 3.
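Download the `GSS_reduced_example.csv` data set from Canvas or tysonbarrett.com/teaching to your computer. Save it in a directory you can access fairly easily.

Load the `tidyverse` package and import the data.

library(tidyverse)
# a relative path like this is read from your working directory
gss <- read.csv("GSS_reduced_example.csv")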
Assess the effect of education (`educ`) on income (`income06`) while controlling for home population (`hompop`).

gss %>%
  lm(income06 ~ educ + hompop,
     data = .)
##
## Call:
## lm(formula = income06 ~ educ + hompop, data = .)
##
## Coefficients:
## (Intercept) educ hompop
## -18417 4286 7125
The model gives two slopes: one for `educ`, and another for `hompop`. What does the slope of `educ` mean here (i.e., interpret the value)?

To better understand how `educ` and `hompop` change each other's simple effect into a partial effect, we can first check the correlation between the two. As a reminder, if the correlation between them is non-zero and both are correlated with the outcome, the simple effect and the partial effect will generally differ (at least by a little).
gss %>%
  furniture::tableC(income06, educ, hompop,
                    na.rm = TRUE)
##
## ────────────────────────────────────────────────
## [1] [2] [3]
## [1]income06 1.00
## [2]educ 0.341 (<.001) 1.00
## [3]hompop 0.207 (<.001) -0.058 (0.015) 1.00
## ────────────────────────────────────────────────
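The reminder about correlated predictors follows from the standard two-predictor formula for a standardized partial slope, beta_1 = (r_y1 - r_y2 * r_12) / (1 - r_12^2). When r_12 = 0 it reduces to the simple correlation r_y1; otherwise it generally moves away from it. Plugging in the rounded values from the table (r_y1 = 0.341, r_y2 = 0.207, r_12 = -0.058) gives roughly 0.354, slightly above 0.341, though the table may not use exactly the same rows as the model. The sketch below checks the formula on simulated data; `x1`, `x2`, and `y` are hypothetical stand-ins for `educ`, `hompop`, and `income06`, not the GSS variables.

```r
# Hypothetical stand-ins: x1 ~ educ, x2 ~ hompop, y ~ income06 (not GSS data)
set.seed(2024)
n  <- 500
x1 <- rnorm(n)
x2 <- -0.3 * x1 + rnorm(n)            # the two predictors are correlated
y  <- 0.5 * x1 + 0.4 * x2 + rnorm(n)

# Sample correlations
r_y1 <- cor(y, x1)
r_y2 <- cor(y, x2)
r_12 <- cor(x1, x2)

# Two-predictor formula for the standardized partial slope of x1
beta1_formula <- (r_y1 - r_y2 * r_12) / (1 - r_12^2)

# The same slopes from lm() on z-scores
zy  <- as.numeric(scale(y))
zx1 <- as.numeric(scale(x1))
zx2 <- as.numeric(scale(x2))
beta1_lm     <- coef(lm(zy ~ zx1 + zx2))[["zx1"]]  # partial (standardized)
beta1_simple <- coef(lm(zy ~ zx1))[["zx1"]]        # simple (standardized) = r_y1

c(simple = beta1_simple, r_y1 = r_y1,
  partial_formula = beta1_formula, partial_lm = beta1_lm)
```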
What are the simple effects of `educ` and `hompop`? That is, run both simple regressions and compare them to the results from the multiple regression earlier.

gss %>%
  lm(income06 ~ educ,
     data = .)
##
## Call:
## lm(formula = income06 ~ educ, data = .)
##
## Coefficients:
## (Intercept) educ
## 1742 4127
gss %>%
  lm(income06 ~ hompop,
     data = .)
##
## Call:
## lm(formula = income06 ~ hompop, data = .)
##
## Coefficients:
## (Intercept) hompop
## 41206 6486
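The comparison between simple and partial slopes can be made exact with the omitted-variable identity for least squares: the simple slope of one predictor equals its partial slope plus the other predictor's partial slope times the slope from regressing that other predictor on the first. It holds exactly only when all the models use the same rows (with missing data in the GSS file, the simple models may use more rows). Its signs line up with the output above: the `hompop` partial slope is positive and `hompop` is slightly negatively correlated with `educ`, so the simple `educ` slope (4127) sits a little below the partial one (4286). A minimal sketch on simulated data, with hypothetical variables `x1` and `x2`:

```r
# Hypothetical simulated data (not GSS): x1 ~ education, x2 ~ household size
set.seed(2024)
n  <- 500
x1 <- rnorm(n, mean = 13, sd = 3)
x2 <- 3 - 0.05 * x1 + rnorm(n)
y  <- 2000 + 4000 * x1 + 7000 * x2 + rnorm(n, sd = 20000)

b_multiple <- coef(lm(y ~ x1 + x2))        # partial slopes
b_simple   <- coef(lm(y ~ x1))[["x1"]]     # simple slope of x1
d          <- coef(lm(x2 ~ x1))[["x1"]]    # slope of x2 regressed on x1

# Simple slope = partial slope of x1 + (partial slope of x2) * (slope of x2 on x1)
b_reconstructed <- b_multiple[["x1"]] + b_multiple[["x2"]] * d

c(simple = b_simple, reconstructed = b_reconstructed)
```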
Assess the effect of education (`educ`) on income (`income06`) while controlling for home population (`hompop`) and age (`age`).

gss %>%
  lm(income06 ~ educ + hompop + age,
     data = .)
##
## Call:
## lm(formula = income06 ~ educ + hompop + age, data = .)
##
## Coefficients:
## (Intercept) educ hompop age
## -33492.0 4319.8 8222.6 251.2
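With three predictors, each slope keeps the same "holding the others constant" reading. The sketch below illustrates this with `predict()`: two people who match on `hompop` and `age` but differ by one year of `educ` get predicted incomes that differ by exactly the `educ` slope. The data frame `dat`, the `people` rows, and the generating coefficients are invented for illustration and only loosely mimic the GSS variables.

```r
# Hypothetical simulated data loosely mimicking educ, hompop, and age (not GSS data)
set.seed(2024)
n <- 300
dat <- data.frame(
  educ   = round(rnorm(n, mean = 13, sd = 3)),
  hompop = rpois(n, lambda = 2) + 1,
  age    = round(runif(n, min = 18, max = 80))
)
dat$income <- -30000 + 4300 * dat$educ + 8000 * dat$hompop +
  250 * dat$age + rnorm(n, sd = 25000)

fit <- lm(income ~ educ + hompop + age, data = dat)

# Two hypothetical people who match on hompop and age but differ by one year of educ
people <- data.frame(educ = c(12, 13), hompop = c(3, 3), age = c(40, 40))
pred   <- predict(fit, newdata = people)

diff(pred)             # difference in predicted income
coef(fit)[["educ"]]    # partial slope of educ: the same number
```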
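We can also obtain standardized coefficients by standardizing each variable with `scale()` before fitting the model. We first have to keep only the complete cases of these variables (again, consider why).

gss %>%
  # keep rows with no missing values on any model variable
  filter(complete.cases(income06, educ, hompop, age)) %>%
  # standardize each variable; scale() returns a matrix, so convert back to a vector
  mutate(incomeZ = scale(income06) %>% as.numeric(),
         educZ   = scale(educ) %>% as.numeric(),
         hompopZ = scale(hompop) %>% as.numeric(),
         ageZ    = scale(age) %>% as.numeric()) %>%
  lm(incomeZ ~ educZ + hompopZ + ageZ,
     data = .)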
##
## Call:
## lm(formula = incomeZ ~ educZ + hompopZ + ageZ, data = .)
##
## Coefficients:
## (Intercept) educZ hompopZ ageZ
## -1.196e-16 3.568e-01 2.628e-01 9.756e-02
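The complete-cases step matters because `lm()` drops any row with a missing value, so standardizing on all available values would take means and standard deviations from a different set of rows than the model uses. One quick symptom is the intercept: when every variable is standardized on the rows the model actually uses, the intercept is zero up to rounding (the -1.196e-16 above). A minimal sketch on simulated data, assuming missing values in just one hypothetical predictor `z`; the helper `std()` is also hypothetical:

```r
set.seed(2024)
n <- 200
x <- rnorm(n)
z <- rnorm(n)
y <- 0.5 * x + 0.3 * z + rnorm(n)
z[sample(n, 40)] <- NA   # hypothetical missingness in one predictor only

# z-score helper; scale() skips NAs when computing the mean and SD
std <- function(v) as.numeric(scale(v))

# (a) standardize using all available values; lm() then drops the 40 incomplete rows
fit_all <- lm(std(y) ~ std(x) + std(z))

# (b) keep complete cases first, then standardize
cc     <- complete.cases(y, x, z)
fit_cc <- lm(std(y[cc]) ~ std(x[cc]) + std(z[cc]))

coef(fit_all)   # intercept is not 0: x and y were centered on 200 rows, fit on 160
coef(fit_cc)    # intercept is 0 up to floating-point rounding
```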
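# standard deviations computed on the same complete cases that lm() uses
sds <- gss %>%
  filter(complete.cases(income06, educ, hompop, age)) %>%
  summarize(s_educ = sd(educ),
            s_hom  = sd(hompop),
            s_age  = sd(age),
            s_inc  = sd(income06))

# rescale each unstandardized slope (dropping the intercept): b * sd(x) / sd(y)
# the braces make the whole expression operate on the piped coefficients
gss %>%
  lm(income06 ~ educ + hompop + age,
     data = .) %>%
  coef() %>%
  {.[-1] * sds[, 1:3] / sds[[4]]}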
## s_educ s_hom s_age
## 1 0.3568421 0.2628036 0.0975636
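The rescaling above uses the standard link between an unstandardized slope $b_j$ and its standardized counterpart $\beta_j$, with both standard deviations taken on the same complete cases:

$$
\beta_j = b_j \, \frac{s_{x_j}}{s_y}, \qquad b_j = \beta_j \, \frac{s_y}{s_{x_j}}
$$

For `educ`, this is $4319.8 \times s_{\text{educ}} / s_{\text{income06}}$, which the output above reports as 0.3568, matching the `educZ` slope from the standardized model.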
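# partial correlation: correlate the parts of income and education
# that are not explained by hompop and age
gss %>%
  filter(complete.cases(income06, educ, hompop, age)) %>%
  mutate(residincom = resid(lm(income06 ~ hompop + age, data = .)),
         resideduc  = resid(lm(educ ~ hompop + age, data = .))) %>%
  furniture::tableC(residincom, resideduc)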
##
## ───────────────────────────────────
## [1] [2]
## [1]residincom 1.00
## [2]resideduc 0.365 (<.001) 1.00
## ───────────────────────────────────
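The residual approach above is an instance of the Frisch-Waugh-Lovell result: regressing the outcome residuals on the predictor residuals (both taken from models on the other predictors) reproduces the multiple-regression slope, and the correlation of those same residuals is the partial correlation. A minimal sketch on simulated data; `age`, `hom`, `edu`, and `inc` are hypothetical stand-ins, not the GSS columns:

```r
# Hypothetical simulated stand-ins for age, hompop, educ, and income06 (not GSS data)
set.seed(2024)
n   <- 400
age <- runif(n, 18, 80)
hom <- rpois(n, 2) + 1
edu <- 12 + 0.02 * age - 0.1 * hom + rnorm(n, sd = 2)
inc <- 1000 * edu + 2000 * hom + 50 * age + rnorm(n, sd = 5000)

# Residualize the outcome and the focal predictor on the other predictors
r_inc <- resid(lm(inc ~ hom + age))
r_edu <- resid(lm(edu ~ hom + age))

b_full    <- coef(lm(inc ~ edu + hom + age))[["edu"]]  # multiple-regression slope
b_fwl     <- coef(lm(r_inc ~ r_edu))[["r_edu"]]         # residual-on-residual slope
partial_r <- cor(r_inc, r_edu)                           # partial correlation

c(b_full = b_full, b_fwl = b_fwl, partial_r = partial_r)
```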
## Do not edit this part
set.seed(843)
df <- tibble(
  x = rnorm(100),
  y = 2*x + rnorm(100, 2, 5)
)
df %>%
  lm(y ~ x, data = .)
##
## Call:
## lm(formula = y ~ x, data = .)
##
## Coefficients:
## (Intercept) x
## 1.912 1.798
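In the simulation above, `y` is generated as `2*x` plus noise from `rnorm(100, 2, 5)` (mean 2, standard deviation 5), so the population intercept and slope are both 2, and the estimates 1.912 and 1.798 differ from them only through sampling error. A hedged base-R sketch (the 1000 replications are an arbitrary choice) that repeats the same data-generating process many times shows the estimates centering on those true values:

```r
set.seed(843)
n_reps <- 1000

# Re-create the data-generating process many times and keep both coefficients
estimates <- t(replicate(n_reps, {
  x <- rnorm(100)
  y <- 2 * x + rnorm(100, 2, 5)
  coef(lm(y ~ x))
}))

colMeans(estimates)       # averages should be close to the true values (2, 2)
apply(estimates, 2, sd)   # spread of the estimates across samples
```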
This was an introduction to some of the features of multiple regression that we'll be using throughout the class. Although there is not much of a workflow here, each piece will play a role in larger analyses.