Before beginning to answer these questions, choose two continuous variables from your data set: one as the outcome of interest and one as the predictor of interest. Of the variables in your data, pick the pair most likely to have a non-linear relationship or most likely to interact with each other (it is alright if no such relationship or interaction is actually present in your data).

Chapter 12 - Nonlinear Relationships

12.1. Provide descriptive statistics (using table1() or a similar function) for the variables you chose, including the outcome and predictor of interest as well as any covariates you are interested in.

12.2. Show a scatter plot of your outcome and predictor of interest with a smoothed line.

Polynomial Regression

12.3. Let’s model the relationship that we see in question 12.2. In the model, also include the quadratic effect of the predictor of interest to test for non-linearity. Is there evidence of a quadratic effect?
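
A minimal sketch of how this could look (outcome, predictor, and data are placeholder names, as in the later examples):

# quadratic model: the predictor plus its squared term
fit_quad <- lm(outcome ~ predictor + I(predictor^2), data = data)
summary(fit_quad)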

Answer:

12.4. If the quadratic effect is significant, what is the predictor’s value at the max or min of the outcome? If it is not significant, how would you go about finding that value if it were?
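
For reference, with a fitted model \(\hat{y} = b_0 + b_1 x + b_2 x^2\), the max or min occurs at \(x = -b_1 / (2 b_2)\). A sketch using the illustrative fit_quad object from above:

b <- coef(fit_quad)
# vertex of the parabola: -b1 / (2 * b2)
-b[2] / (2 * b[3])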

Answer:

12.5. If you haven’t mean-centered your predictor yet, do so now and re-run the regression model from above. What changed in the mean-centered model compared to the previous one? Would the max or min computed from these results differ from what you computed earlier?
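
One way to mean-center, as a sketch (the name pred_c is just an illustration):

# subtract the mean so the predictor is centered at 0
data$pred_c <- data$predictor - mean(data$predictor, na.rm = TRUE)
fit_quad_c <- lm(outcome ~ pred_c + I(pred_c^2), data = data)
summary(fit_quad_c)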

Answer:

12.6. Add at least one covariate to the model you just ran. Is the quadratic effect still significant? If it changed, why might it have?

Answer:

12.7. What is the interpretation of this final model? Include the interpretation of the covariates as well.

Answer:

Spline Regression

12.8. Based on the plot from 12.2, does it look like spline regression would be useful here? Why?

Answer:

12.9. How would you do spline regression here, if there were appropriate knots (join points) in the relationship? (Show the R code that you would use if there were knots.)
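
If there were a visible knot, one hedged sketch uses the splines package (the knot location of 10 below is purely illustrative):

library(splines)
# cubic spline with a single illustrative knot at predictor = 10
fit_spline <- lm(outcome ~ bs(predictor, knots = c(10)), data = data)
summary(fit_spline)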

Monotonic Transformations

Log

12.10. Use a log transformation via log(). This model would be considered a “log-lin” model. (If there are zeros in your outcome variable, the natural log isn’t defined; to fix this, you can add 1 to the outcome before taking the log.) What is the interpretation of the coefficient on your predictor of interest?
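
A minimal sketch (the + 1 shift is only needed if zeros are present):

# log-lin: log the outcome, keep the predictor on its raw scale
fit_loglin <- lm(log(outcome + 1) ~ predictor, data = data)
summary(fit_loglin)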

Answer:

12.11. Let’s also do a log transformation of your predictor of interest in the previous model. This model would be considered a “log-log” model. Which model fits the data better? Use the anova() function to check this (it isn’t shown in the example, so one is provided for you below; note that you will need to change eval = FALSE to eval = TRUE when you want this chunk to run).

# log-lin: log of the outcome, raw predictor
fit1 <- lm(log(outcome) ~ predictor, data = data)
# log-log: log of both the outcome and the predictor
fit2 <- lm(log(outcome) ~ log(predictor), data = data)
# compare the fit of the two models
anova(fit1, fit2)

Answer:

Box-Cox

12.12. Let’s take a look at using “Box-Cox” transformations. This approach searches for the best power to which to raise the outcome (power in the algebraic sense, not statistical power). From the MASS package, we can use the boxcox() function. For this function to work, the outcome must be \(>0\). The resulting figure shows you the optimal power to use. An example is shown below. (Note that you will need to change eval = FALSE to eval = TRUE when you want this chunk to run.) From the figure shown, what is the optimal power to use in this situation?

# fit the model, then profile the Box-Cox power transformation of the outcome
# (the %>% pipe requires magrittr or the tidyverse to be loaded)
lm(outcome ~ predictor, data = data) %>%
  MASS::boxcox()

Answer:

Chapters 13 and 14 - Interactions (Moderation)

13.1. Interactions are often either theory-driven or data-driven. Of the two, theory-driven is generally the more respected reason to test and report an interaction. If you have a reason to believe an interaction exists between two predictors, use that pair for the following examples. Show the potential interaction in a scatter plot.
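
One hedged way to visualize this with ggplot2, assuming the moderator is continuous (it is cut into groups purely for plotting; all names are placeholders):

library(ggplot2)
# color by tertiles of the second predictor to see whether slopes differ
data$pred2_group <- cut(data$predictor2, breaks = 3)
ggplot(data, aes(x = predictor1, y = outcome, color = pred2_group)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)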

13.2. Test this interaction by running a linear regression model with the cross product of the two predictors. R is smart enough that if we put predictor1 * predictor2 in the formula (where those are the actual variable names), it will include both main effects and the interaction for us. What is the interpretation of each coefficient? Does this interpretation match the figure we created before? Why or why not?
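
A minimal sketch:

# predictor1 * predictor2 expands to both main effects plus their interaction
fit_int <- lm(outcome ~ predictor1 * predictor2, data = data)
summary(fit_int)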

Answer:

13.3. Let’s include some covariates in the model. First, is the interaction still significant? Did the interaction effect decrease or increase once the covariates were added? Next, what do the coefficients on the two predictors mean here?

Answer:

Conditional Effects

13.4. It can be beneficial to mean-center our predictors (if continuous) before fitting an interaction. Mean-center your two predictors and re-run the analysis above. Interpret the coefficients again.
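
A sketch of the centering step (names are illustrative):

# mean-center both predictors, then refit the interaction model
data$pred1_c <- data$predictor1 - mean(data$predictor1, na.rm = TRUE)
data$pred2_c <- data$predictor2 - mean(data$predictor2, na.rm = TRUE)
fit_int_c <- lm(outcome ~ pred1_c * pred2_c, data = data)
summary(fit_int_c)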

Answer:

Probing Interactions

13.5. We will be using the fantastic package called interactions. Here, run a regression with an interaction term. Then use interactions::interact_plot() to visualize the interaction. Interpret the interaction as shown in the figure.
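
A sketch of the call (pred and modx take unquoted variable names; everything here is a placeholder):

fit_int <- lm(outcome ~ predictor1 * predictor2, data = data)
# plot the simple slopes of predictor1 at levels of predictor2
interactions::interact_plot(fit_int, pred = predictor1, modx = predictor2)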

Answer:

13.6. We also want to look at the Johnson-Neyman interval. Let’s do this with interactions::sim_slopes() on the regression model with the interaction term from 13.5.
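
A sketch, reusing the illustrative fit_int object from above:

# simple slopes along with the Johnson-Neyman interval
interactions::sim_slopes(fit_int, pred = predictor1, modx = predictor2,
                         johnson_neyman = TRUE)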

Chapters 16 and 17 - Irregularities

Extreme Cases

16.1. Let’s look at some simple descriptives to see if there are any apparent extreme cases. Note that the output will be split into two sections, but you can connect the lines by looking at the row numbers. (Note that for educ7610::diagnostics() to work, we have to put the actual data in the data argument in the lm() function, as shown below. You also need educ7610 installed from GitHub: devtools::install_github("tysonstanley/educ7610").) Are there any problematic residuals, Cook’s distances, or hat values?
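
A sketch of what the call might look like (the exact interface of educ7610::diagnostics() isn’t documented here, so passing the fitted lm object is an assumption; check the package help):

# keep the data in the data argument so diagnostics() can find it
fit <- lm(outcome ~ predictor, data = data)
educ7610::diagnostics(fit)  # assumed interface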

Answer:

Assumptions

16.2. There are a few ways to check normality and homoscedasticity. In R, the easiest way is to use the plot() function on the fitted model. Assess each plot.
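
A minimal sketch, assuming a fitted model object named fit:

# show all four diagnostic plots in a 2 x 2 grid
par(mfrow = c(2, 2))
plot(fit)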

Answer:

  • “Residuals vs Fitted” plot:
  • “Normal Q-Q” plot:
  • “Scale-Location” plot:
  • “Residuals vs Leverage” plot:

16.3. Which of these plots is most important? (Just argue your case.)

Answer: