```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Introduction
Chapter 2 introduces simple linear regression. The following examples help illustrate a number of principles that were discussed. Follow all instructions to complete Chapter 2.
## Simple Regression
1. Download the `GSS_reduced_example.csv` data set from [Canvas](https://login.usu.edu/cas/login?service=https%3a%2f%2fmy.usu.edu%2f) or [tysonbarrett.com/teaching](https://tysonstanley.github.io/teaching) to your computer. Save it in a directory you can access fairly easily.
2. Open RStudio and start a new R script or RMarkdown document.
3. Load the `tidyverse` package (you can ignore the startup messages it prints when you load it).
```{r}
library(tidyverse)
```
4. Import the data set into R.
```{r, eval = FALSE}
gss <- read.csv("GSS_reduced_example.csv")
```
```{r, echo = FALSE}
gss <- read.csv(here::here("GSS_Data/Data/GSS_reduced_example.csv"))
```
5. Use a simple regression model to assess the effect of the number of years of education (`educ`) on income (`income06`).
```{r}
gss %>%
  lm(income06 ~ educ,
     data = .)
```
6. This output gives you two estimates: one for the intercept and one for the slope of `educ`. What does the slope of `educ` mean here (i.e., interpret the value)?
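If you'd rather pull the slope out directly than read it off the printed output, `coef()` returns the estimates as a named vector. Here's a minimal sketch using simulated data in place of the GSS file so the chunk runs on its own (with your own `gss` object you would pass `income06 ~ educ` and `data = gss`):

```{r}
# Simulated stand-in for the GSS variables (hypothetical values, not real data)
set.seed(1)
d <- data.frame(educ = sample(8:20, 200, replace = TRUE))
d$income06 <- 2000 * d$educ + rnorm(200, sd = 5000)

sim_fit <- lm(income06 ~ educ, data = d)
coef(sim_fit)["educ"]  # the slope: estimated change in income per extra year of education
```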
## Regression and Correlation
Simple regression and correlation are intimately tied. Let's show that below.
1. Using the GSS data you already imported above, let's look at the correlation between income and years of education.
```{r}
gss %>%
  furniture::tableC(income06, educ,
                    na.rm = TRUE)
```
2. Let's compare this to the regression value after we standardize both variables with `scale()`. We first have to grab just the complete cases of the variables (consider why).
```{r}
gss %>%
  filter(complete.cases(income06, educ)) %>%
  mutate(incomeZ = scale(income06) %>% as.numeric(),
         educZ = scale(educ) %>% as.numeric()) %>%
  lm(incomeZ ~ educZ,
     data = .)
```
3. The intercept is essentially zero and the slope is the same as the correlation from before. Is this surprising? Why or why not?
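The equivalence in step 2 can be checked directly: with both variables standardized, the slope from `lm()` equals the Pearson correlation from `cor()`. A small sketch with simulated data (so it runs without the GSS file):

```{r}
set.seed(42)
d <- data.frame(x = rnorm(100))
d$y <- 2 * d$x + rnorm(100)

r <- cor(d$x, d$y)                               # Pearson correlation
b <- coef(lm(scale(y) ~ scale(x), data = d))[2]  # standardized slope
all.equal(unname(b), r)                          # TRUE: they match
```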
## Residuals
Residuals can help us understand several things about the model and the relations we are assessing. We'll talk about most of this later, but here are some ways to access the residuals.
1. Assign the first model to `fit` (or any other name you want to use).
```{r}
fit <- gss %>%
  lm(income06 ~ educ,
     data = .)
```
2. With that object, use the function `resid()` to produce the residuals of the model. (Note that `head()` was used just to see the first six values instead of all two thousand.)
```{r}
resid(fit) %>%
  head()
```
3. Often we'll be looking at the residuals in plots. To do this simply, we'll use `plot()`. You don't need to know what the four plots mean yet, but know that they are built from the model's residuals.
```{r, eval = FALSE}
plot(fit)
```
```{r, echo = FALSE}
par(mfrow = c(2,2))
plot(fit)
```
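As a final check on what `resid()` is actually returning: residuals are just the observed outcome minus the model's fitted values. A quick sketch with simulated (hypothetical) data so the chunk is self-contained:

```{r}
set.seed(2)
d <- data.frame(x = rnorm(50))
d$y <- 3 + 2 * d$x + rnorm(50)
m <- lm(y ~ x, data = d)

by_hand <- d$y - fitted(m)                    # observed minus fitted
all.equal(unname(resid(m)), unname(by_hand))  # TRUE
```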
## Conclusion
This was an introduction to many of the features of regression that we'll be using throughout the class. Although there wasn't much of a workflow here, each piece will play a role in larger analyses.