```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
```

## Introduction

Chapter 12 introduces the idea of modeling nonlinear relationships using linear regression. This is one of the many ways regression is flexible to many important situations. A nice, but incomplete, list of useful functions can be found [here](https://cran.r-project.org/doc/contrib/Ricci-refcard-regression.pdf).

## Nonlinear Relationships

1. Let's start by loading the `tidyverse` package, the `furniture` package, and the `educ7610` package.
```{r}
library(tidyverse)
library(furniture)
library(educ7610)
```
2. Import the `poli` data set into R.
```{r}
data(poli)
```

### Descriptives and Visualizations 

3. Let's get some descriptive statistics of the variables of interest: political knowledge (`pknow`), age (`age`), news use (`npnews`), and SES (`ses`).
```{r}
poli %>%
  table1(pknow, age, npnews, ses,
         type = "condense")
```
4. Let's look at the relationship between political knowledge (`pknow`) and age (`age`).
```{r}
poli %>%
  ggplot(aes(age, pknow)) +
    geom_point() +
    geom_smooth()
```


### Polynomial Regression

5. Let's model the relationship that we see in question 5. That is, `pknow` regressed on `age` where we also include the quadratic effect of `age`. To add the quadratic effect directly in the model we can use `I(age^2)`. 
```{r}
poli %>%
  lm(pknow ~ age + I(age^2),
     data = .) %>%
  summary()
```
6. Is there evidence that there is a quadratic effect of age on political knowledge?
7. We could similarly create a new variable and put that in the model instead as below.
```{r}
poli %>%
  mutate(age2 = age^2) %>%
  lm(pknow ~ age + age2,
     data = .) %>%
  summary()
```
8. With these results, what is the peak age for political knowledge? How do we know that this is a peak (max) and not a min?
9. We haven't mean-centered age, yet. Let's do that and re-run the analysis.
```{r}
poli %>%
  mutate(age_centered = age - mean(age, na.rm = TRUE)) %>%
  lm(pknow ~ age_centered + I(age_centered^2),
     data = .) %>%
  summary()
```
10. What changed from the mean-centered model as compared to the previous one?
11. Does the peak age change from these results compared to what you computed earlier?
12. Finally, let's control for some covariates (SES and news use).
```{r}
poli %>%
  mutate(age_centered = age - mean(age, na.rm = TRUE)) %>%
  lm(pknow ~ age_centered + I(age_centered^2) + ses + npnews,
     data = .) %>%
  summary()
```
13. Is age or age$^2$ significant any more? Why would it change so much?
14. Let's look at the relationship of age with political knowledge after controlling for SES and news use.
```{r}
poli %>%
  mutate(pknow_sesnpnews = resid(lm(pknow ~ ses + npnews))) %>%
  ggplot(aes(age, pknow_sesnpnews)) +
    geom_point() +
    geom_smooth()
```

15. Does this plot look like what the model is telling us? 
16. Let's remove the quadratic effect of age since it doesn't look important once we are controlling for SES and news use.
```{r}
poli %>%
  mutate(age_centered = age - mean(age, na.rm = TRUE)) %>%
  lm(pknow ~ age_centered + ses + npnews,
     data = .) %>%
  summary()
```
17. Using this final model, what is the interpretation of this model? Include the interpretation of the covariates as well. (Note: the relationship between `pknow` and `npnews` is likely non-linear as well but we won't look into that here; see spline regression below.)


### Spline Regression

18. Based on the plot below, there may be two joints: one at `npnews == 1.5` and `npnews == 6`. Let's try these two and see what we find.
```{r}
poli %>%
  ggplot(aes(npnews, pknow)) +
    geom_point() +
    geom_smooth()
```

19. We can define our joints manually for a spline regression. Let's do that below.
```{r}
poli %>%
  mutate(j1 = ifelse(npnews >= 1.5, npnews, 0),
         j2 = ifelse(npnews >= 6.0, npnews, 0)) %>%
  lm(pknow ~ npnews + j1 + j2,
     data = .) %>%
  summary()
```
20. What do these results tell us about the relationship between `npnews` and `pknow`? Is spline regression necessary here?
21. We can further look at the predictions of the model and see how well they fit the observations. First, we'll save a model object and use that to get the predictions.
```{r}
poli %>%
  mutate(j1 = ifelse(npnews >= 1.5, npnews, 0),
         j2 = ifelse(npnews >= 6.0, npnews, 0)) %>%
  mutate(preds = predict(lm(pknow ~ npnews + j1 + j2))) %>%
  ggplot(aes(npnews, preds)) +
    geom_line() +
    geom_point(data = poli, 
               aes(npnews, pknow))
```

22. Does the shape of the line match the results of the model?


### Monotonic Transformations
#### Log

23. Now, let's use a log transformation using `log()`. Since there are some zeros, the natural log isn't defined. To fix this, we add 1 to `pknow` before taking the log. This model would be considered a "log-lin" model.
```{r}
poli %>%
  lm(log(pknow+1) ~ npnews, 
     data = .) %>%
  summary()
```
24. What is the interpretation of the coefficient on `npnews`?
25. Let's also do a log transformation of `npnews` in the previous model. It has the same issue as `pknow` so we can just add 1 to the values. This model would be considered a "log-log" model.
```{r}
poli %>%
  lm(log(pknow+1) ~ log(npnews+1), 
     data = .) %>%
  summary()
```
26. Let's see how well this fits our data. We can do that as we've done before by getting the model predictions and plotting them with the observed values. Let's do the log-lin model.
```{r}
poli %>%
  mutate(preds = predict(lm(log(pknow+1) ~ npnews))) %>%
  ggplot(aes(npnews, preds)) +
    geom_line() +
    geom_point(data = poli, 
               aes(npnews, log(pknow+1)))
```

27. Thought-experiment: What if we decided on a different adjustment so that `log()` will work? Let's try `+2` as well.
```{r}
poli %>%
  lm(log(pknow+2) ~ npnews, 
     data = .) %>%
  summary()
```
28. The estimate changed a little. Why might that be? (Hint: Think of the interpretation of the estimate and what that means for the actual values of the outcome.)

#### Box-Cox

29. Let's take a look at using "Box-Cox" transformations. This approach actually can look for the best *power* for the data. From the `MASS` package, we can use the `boxcox()` function. For this function to work, the outcome must be $>0$.
```{r}
lm(pknow+1 ~ npnews, data = poli) %>%
  MASS::boxcox()
```

30. This figure shows that the optimal power to use for this situation is 1. In other words, a box cox transformation is not necessary here (e.g., `pknow^1`). If the line was at .5, then that would be `pknow^.5` or `sqrt(pknow)` (same thing). The function tests powers between -2 and 2 for us, which is usually sufficient.


## Conclusion

Ironically, linear regression can model non-linear relationships very naturally. Each approach covered here are useful for specific situations and each has a decent interpretation.