Chapter 3 introduces multiple linear regression. The following examples illustrate a number of the principles discussed there. Follow all instructions to complete Chapter 3.
Download the GSS_reduced_example.csv data set from Canvas or tysonbarrett.com/teaching to your computer, and save it in a directory you can access fairly easily. Then load the tidyverse package and import the data set.
library(tidyverse)
gss <- read.csv("GSS_reduced_example.csv")
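As a quick check that the import worked, it can help to glance at the handful of variables used below. A minimal sketch, assuming the file was read into gss as above:

# Peek at the variables used in this chapter's examples
gss %>%
  select(income06, educ, hompop, age) %>%
  glimpse()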
First, run a multiple regression assessing the effect of education (educ) on income (income06) while controlling for home population (hompop).
gss %>%
  lm(income06 ~ educ + hompop,
     data = .)
##
## Call:
## lm(formula = income06 ~ educ + hompop, data = .)
##
## Coefficients:
## (Intercept) educ hompop
## -18417 4286 7125
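The Coefficients output above shows only point estimates. To also see standard errors, t-values, p-values, and R-squared, summary() on a saved fit provides them. A minimal sketch (fit1 is just an illustrative name):

# Save the fit so the full summary can be printed
fit1 <- gss %>%
  lm(income06 ~ educ + hompop,
     data = .)
summary(fit1)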
Notice we get one estimate for the intercept, one for the slope of educ, and another for the slope of hompop. What does the slope of educ mean here (i.e., interpret the value)?

To better understand how educ and hompop change each other's simple effect into a partial effect, we can first check the correlation between the two. As a reminder, if the correlation between them is non-zero and both are correlated with the outcome, the simple effect and the partial effect will differ (at least by a little).
gss %>%
  furniture::tableC(income06, educ, hompop,
                    na.rm = TRUE)
##
## ────────────────────────────────────────────────
## [1] [2] [3]
## [1]income06 1.00
## [2]educ 0.341 (<.001) 1.00
## [3]hompop 0.207 (<.001) -0.058 (0.015) 1.00
## ────────────────────────────────────────────────
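If the furniture package is not available, base R's cor() gives the same kind of correlation matrix (values may differ slightly from the table above depending on how missing data are handled). A minimal sketch:

# Pairwise-complete correlations among the three variables, rounded for readability
gss %>%
  select(income06, educ, hompop) %>%
  cor(use = "pairwise.complete.obs") %>%
  round(3)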
What are the simple effects of educ and hompop? That is, run both simple regressions and compare their slopes to the results from the multiple regression above.
gss %>%
  lm(income06 ~ educ,
     data = .)
##
## Call:
## lm(formula = income06 ~ educ, data = .)
##
## Coefficients:
## (Intercept) educ
## 1742 4127
gss %>%
  lm(income06 ~ hompop,
     data = .)
##
## Call:
## lm(formula = income06 ~ hompop, data = .)
##
## Coefficients:
## (Intercept) hompop
## 41206 6486
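To make the comparison concrete, the simple and partial slopes can be collected side by side. A minimal sketch (the object names here are illustrative):

# Simple slopes from the one-predictor models
simple_educ   <- coef(lm(income06 ~ educ, data = gss))["educ"]
simple_hompop <- coef(lm(income06 ~ hompop, data = gss))["hompop"]
# Partial slopes from the two-predictor model
partial <- coef(lm(income06 ~ educ + hompop, data = gss))[c("educ", "hompop")]
# Stack them for a quick comparison
rbind(simple = c(simple_educ, simple_hompop), partial = partial)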
Next, run a multiple regression assessing the effect of education (educ) on income (income06) while controlling for home population (hompop) and age (age).
gss %>%
  lm(income06 ~ educ + hompop + age,
     data = .)
##
## Call:
## lm(formula = income06 ~ educ + hompop + age, data = .)
##
## Coefficients:
## (Intercept) educ hompop age
## -33492.0 4319.8 8222.6 251.2
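Before standardizing in the next step, it is worth checking how much missing data these four variables have, since that determines which rows end up in the model. A minimal sketch:

# Count all rows and the rows complete on all four variables
gss %>%
  summarize(n_total = n(),
            n_complete = sum(complete.cases(income06, educ, hompop, age)))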
To obtain standardized coefficients, we can standardize each variable with scale(). We first have to grab just the complete cases of the variables (again, consider why).
gss %>%
  filter(complete.cases(income06, educ, hompop, age)) %>%
  mutate(incomeZ = scale(income06) %>% as.numeric,
         educZ = scale(educ) %>% as.numeric,
         hompopZ = scale(hompop) %>% as.numeric,
         ageZ = scale(age) %>% as.numeric) %>%
  lm(incomeZ ~ educZ + hompopZ + ageZ,
     data = .)
##
## Call:
## lm(formula = incomeZ ~ educZ + hompopZ + ageZ, data = .)
##
## Coefficients:
## (Intercept) educZ hompopZ ageZ
## -1.196e-16 3.568e-01 2.628e-01 9.756e-02
Equivalently, the standardized slopes can be computed from the unstandardized model by multiplying each slope by the ratio of the predictor's standard deviation to the outcome's standard deviation (that is, b * SD(x) / SD(y)), using the same complete cases.
sds <- gss %>%
  filter(complete.cases(income06, educ, hompop, age)) %>%
  summarize(s_educ = sd(educ),
            s_hom = sd(hompop),
            s_age = sd(age),
            s_inc = sd(income06))
gss %>%
  lm(income06 ~ educ + hompop + age,
     data = .) %>%
  coef() %>%
  .[-1] * sds[,1:3]/sds[[4]]
## s_educ s_hom s_age
## 1 0.3568421 0.2628036 0.0975636
Notice that these match the standardized slopes from the scale() approach above. We can also look at the partial relationship between educ and income06 directly: residualize both income and education on the other predictors (hompop and age), then correlate the residuals.
gss %>%
  filter(complete.cases(income06, educ, hompop, age)) %>%
  mutate(residincom = lm(income06 ~ hompop + age) %>% resid,
         resideduc = lm(educ ~ hompop + age) %>% resid) %>%
  furniture::tableC(residincom, resideduc)
##
## ───────────────────────────────────
## [1] [2]
## [1]residincom 1.00
## [2]resideduc 0.365 (<.001) 1.00
## ───────────────────────────────────
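These residuals do more than provide a correlation: regressing the residualized income on the residualized education reproduces the educ slope from the multiple regression above (the Frisch-Waugh-Lovell result). A minimal sketch reusing the same residualizing step:

# The resideduc slope here matches the educ coefficient (4319.8) from the
# income06 ~ educ + hompop + age model fit earlier
gss %>%
  filter(complete.cases(income06, educ, hompop, age)) %>%
  mutate(residincom = lm(income06 ~ hompop + age) %>% resid,
         resideduc = lm(educ ~ hompop + age) %>% resid) %>%
  lm(residincom ~ resideduc,
     data = .)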
Finally, the code below (do not edit the data-generating part) simulates data in which the true slope of x is 2, then fits a simple regression of y on x. Compare the estimated intercept and slope to the values used to generate the data.
## Do not edit this part
set.seed(843)
df <- data_frame(
  x = rnorm(100),
  y = 2*x + rnorm(100, 2, 5)
)
df %>%
  lm(y ~ x, data = .)
##
## Call:
## lm(formula = y ~ x, data = .)
##
## Coefficients:
## (Intercept) x
## 1.912 1.798
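To see how sampling variability affects how close these estimates land to the values used to generate the data, the same process can be simulated with a much larger sample. A minimal sketch (the sample size is arbitrary, and tibble() is the current name for data_frame()):

# Same data-generating process with n = 10,000; the slope estimate should
# typically land much closer to the true value of 2
set.seed(843)
df_big <- tibble(
  x = rnorm(10000),
  y = 2*x + rnorm(10000, 2, 5)
)
lm(y ~ x, data = df_big)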
This was an introduction to some of the features of multiple regression that we will use throughout the class. Although these examples do not form much of a workflow on their own, each piece will play a role in larger analyses.