Chapter 2

Simple Regression

2.1. Load the tidyverse package (you can ignore the startup messages it prints when it loads).

2.2. Import your data set into R.

2.3. Use a simple regression model to assess the relationship of interest between a continuous outcome and a continuous predictor (you pick).
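As a minimal sketch of what such a model call can look like, the example below uses R's built-in airquality data as a stand-in for your own data set. The choice of Ozone as the outcome and Temp as the predictor is an assumption made only for illustration.

```r
# Illustrative stand-in data: the built-in airquality data set.
# Ozone (outcome) and Temp (predictor) are both continuous.
dat <- airquality

# Fit the simple regression. lm() drops rows with missing values by default.
fit <- lm(Ozone ~ Temp, data = dat)

# Printing the model shows the intercept and the slope for Temp.
print(fit)
coef(fit)
```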

2.4. This output gives you two values: one for the intercept and one for the slope of your predictor. What does the slope of the predictor mean here (i.e., interpret the value)?

Regression and Correlation

2.5. Using the data you imported above, look at the correlation between those same variables. What is the correlation?

2.6. Let’s compare this to the regression value after we standardize both variables with scale(). We first have to grab just the complete cases of the variables.
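One possible way to carry out the correlation and the standardized regression is sketched below. It again assumes the built-in airquality data with Ozone and Temp as illustrative variables. The names cc, z_ozone, and z_temp are hypothetical.

```r
dat <- airquality

# Pearson correlation, using only rows where both variables are observed.
r <- cor(dat$Ozone, dat$Temp, use = "complete.obs")

# Keep complete cases first so both variables are standardized on the same rows.
cc <- dat[complete.cases(dat[, c("Ozone", "Temp")]), c("Ozone", "Temp")]

# scale() returns a one-column matrix; as.numeric() turns it back into a vector.
cc$z_ozone <- as.numeric(scale(cc$Ozone))
cc$z_temp  <- as.numeric(scale(cc$Temp))

# Regression on the standardized variables.
fit_z <- lm(z_ozone ~ z_temp, data = cc)
coef(fit_z)
r
```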

2.7. The intercept is essentially zero and the slope is the same as the correlation you reported before. Should the intercept be zero?

Residuals

2.8. Assign the first model to fit (or any other name you want to use).

2.9. With that object, use the function resid() to produce the residuals of the model.

2.10. Often we are going to be looking at the residuals in plots. To do this simply, we’ll use plot(). You don’t need to know what the four plots mean, but know that it uses the residuals from the model to do it. Show these plots below.
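The three residual steps above can be sketched as follows, assuming the illustrative airquality model rather than your own. The 2 x 2 layout set with par() is an optional convenience, not something the exercise requires.

```r
# Assign the model to an object so it can be reused.
fit <- lm(Ozone ~ Temp, data = airquality)

# Residuals: observed outcome minus fitted value, one per row used in the fit.
res <- resid(fit)
head(res)

# plot() on an lm object draws four diagnostic plots based on the residuals.
# par(mfrow = c(2, 2)) arranges them in a 2 x 2 grid on one page.
old_par <- par(mfrow = c(2, 2))
plot(fit)
par(old_par)
```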

Chapter 3

Multiple Regression

3.1. Use a multiple regression model to assess the relationship of interest while controlling for another variable of your choosing.
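A minimal sketch of a multiple regression call, assuming the built-in airquality data with Temp as the predictor of interest and Wind as the covariate (both choices are illustrative):

```r
# Outcome: Ozone. Predictor of interest: Temp. Covariate: Wind.
fit2 <- lm(Ozone ~ Temp + Wind, data = airquality)

# Three coefficients: the intercept, the slope for Temp, and the slope for Wind.
coef(fit2)
```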

3.2. This output gives you three values, one for the intercept, one for the slope of your predictor of interest, and another for the slope of your covariate. What does the slope of your predictor of interest mean here (i.e., interpret the value)?

Partial Effects (partial regression, partial standardized effect, and partial correlation)

3.3. Using the data you imported above, let’s look at the correlations among these three variables using furniture::tableC().
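If you want to cross-check the table, a base R correlation matrix gives the same pairwise Pearson correlations. The sketch below uses cor() as a stand-in for furniture::tableC(), whose printed layout differs, and assumes the illustrative airquality variables.

```r
# Hypothetical choice of three variables from the built-in airquality data.
vars <- c("Ozone", "Temp", "Wind")

# use = "complete.obs" keeps only rows where all three variables are observed.
cor_mat <- cor(airquality[, vars], use = "complete.obs")
round(cor_mat, 3)
```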

3.4. Are the covariate and the predictor of interest correlated?

3.5. Now, run the simple regression between your outcome and predictor and compare to the results from the multiple regression earlier. How does the estimate on the predictor of interest change?

3.6. Let’s compare this to the regression value after we standardize both variables with scale(). We first have to grab just the complete cases of the variables (again, consider why).

3.7. How is the estimate on the predictor of interest interpreted in this case?

3.8. Run a partial correlation by using the residuals as shown in the example for chapter 3.
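One way to compute a partial correlation from residuals is sketched below. It may differ in detail from the chapter example, and it assumes the illustrative airquality variables with Wind as the covariate.

```r
vars <- c("Ozone", "Temp", "Wind")
cc <- airquality[complete.cases(airquality[, vars]), vars]

# Remove the part of the outcome and of the predictor that the covariate explains.
res_y <- resid(lm(Ozone ~ Wind, data = cc))
res_x <- resid(lm(Temp ~ Wind, data = cc))

# The correlation between the two sets of residuals is the partial correlation
# of Ozone and Temp, controlling for Wind.
partial_r <- cor(res_y, res_x)
partial_r
```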

3.9. Are these values the same as the standardized regression? Should they be the same?

3.10. Consider if we had a variable that we wanted to control for (let’s call it $$c$$) but didn’t have access to it. We report on the regression below with $$x$$ predicting $$y$$. If we know that the correlation between $$x$$ and $$c$$ is positive and the correlation between $$y$$ and $$c$$ is positive, would the estimate on $$x$$ go up, go down, or stay the same if we could control for $$c$$?

Chapter 4

4.1. Load the furniture package.

4.2. Use your simple regression model from Chapter 2 and use summary() to obtain the F-statistic of the model with its accompanying p-value, as well as the standard error, t-value, and p-value of the estimate itself.
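A short sketch of where these quantities live in the summary() object, assuming the illustrative airquality model. The object name s is hypothetical.

```r
fit <- lm(Ozone ~ Temp, data = airquality)
s <- summary(fit)

# The full printout includes the coefficient table and the F-statistic line.
print(s)

# Coefficient table: estimate, standard error, t-value, and p-value per term.
coef(s)

# F-statistic with its numerator and denominator degrees of freedom.
s$fstatistic

# p-value of the F-statistic, computed from the upper tail of the F distribution.
pf(s$fstatistic["value"], s$fstatistic["numdf"], s$fstatistic["dendf"],
   lower.tail = FALSE)
```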

4.3. This output gives you the estimate, the standard error of the estimate, the t-value of the estimate, and the p-value of the estimate (where the null is that there is no relationship). Is the relationship statistically significant at the $$\alpha = .05$$ level?

4.4. Let’s run the multiple regression that you used in Chapter 3. How is the effect of interest interpreted in this case? Is it statistically significant?

4.5. Let’s compare this to the regression value after we standardize the variables with scale(). We first have to grab just the complete cases of the variables. Are the standardized estimates more, less, or equally significant compared with the unstandardized estimates?

4.6. Use your simple regression model (assign it to fit1) and then your multiple regression model (assign it to fit2). Compare the models with anova(). Does this match the significance of the predictor of interest above in the multiple regression?
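The model comparison can be sketched as below, assuming the illustrative airquality variables. The object name cmp is hypothetical.

```r
vars <- c("Ozone", "Temp", "Wind")
cc <- airquality[complete.cases(airquality[, vars]), vars]

# Simple model and multiple model, fit to the same rows.
fit1 <- lm(Ozone ~ Temp, data = cc)
fit2 <- lm(Ozone ~ Temp + Wind, data = cc)

# anova() tests whether the larger model fits significantly better.
cmp <- anova(fit1, fit2)
cmp
```

Both models are fit to the same complete cases because anova() requires the models it compares to be based on the same observations.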

Check Some Assumptions

4.7. Let’s test these assumptions using plots as we discussed in class. First, let’s see if the relationship between your predictor of interest and the outcome is linear. This one we can take a look at via a scatterplot with a smoothed line showing the relationship. (Note: due to overplotting–points on top of points–we used geom_count() instead of geom_point().) Does the relationship look linear?
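A possible version of this scatterplot is sketched below. It assumes the ggplot2 package (part of the tidyverse) is installed and again uses the illustrative airquality variables. The loess smoother and the alpha value are choices made for this sketch, not requirements of the exercise.

```r
library(ggplot2)

vars <- c("Ozone", "Temp")
cc <- airquality[complete.cases(airquality[, vars]), vars]

p <- ggplot(cc, aes(x = Temp, y = Ozone)) +
  # geom_count() sizes each point by how many observations share that location.
  geom_count(alpha = 0.5) +
  # A loess smoother shows the shape of the relationship without assuming linearity.
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE)

print(p)
```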

4.8. Next take a look at homoscedasticity. Using the plot() function with our model object we get four plots. Does it look homoscedastic?

4.9. Using the plots from above, does it look like normality is violated?

4.10. Remember that if we want to make inference about a conditional mean (essentially a predicted point when we select the values of the X’s), we can use a simple trick: subtract the chosen values from the original variables and then the information on the intercept is the information for the conditional mean at that point. Let’s try this below by using your multiple regression model from before and centering your predictor of interest and a covariate at a value that you pick (needs to be part of the sample). Let’s say we want to get a confidence interval around that point. What is the standard error of this conditional mean?

4.11. What is the interpretation for the conditional mean intercept?

4.12. Get the confidence interval of this conditional mean using confint(). What does this 95% confidence interval mean?
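The centering trick from 4.10 through 4.12 can be sketched as follows. The point Temp = 80 and Wind = 10 is a hypothetical choice inside the observed range of the illustrative airquality data. Substitute values from your own sample.

```r
vars <- c("Ozone", "Temp", "Wind")
cc <- airquality[complete.cases(airquality[, vars]), vars]

# Center each predictor at the chosen value inside the formula with I().
# Hypothetical point of interest: Temp = 80 and Wind = 10.
fit_c <- lm(Ozone ~ I(Temp - 80) + I(Wind - 10), data = cc)

# The intercept row now describes the conditional mean at that point:
# its estimate is the predicted Ozone and its standard error is the
# standard error of that conditional mean.
coef(summary(fit_c))["(Intercept)", ]

# 95% confidence interval for the conditional mean.
confint(fit_c, "(Intercept)", level = 0.95)
```

The slopes are unchanged by centering. Only the intercept and its standard error move, which is why the intercept row carries the information about the chosen point.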

Chapter 5

Dummy Variables

5.1. R is very helpful in creating dummy variables since it does all the work for you. All you need to do is let R know that the variable is a factor. Select one of your categorical variables and see if R already knows that it should be a factor or not. If it says <fct> at the top of the column, then R already knows that it is a factor; otherwise we need to use the factor() function to tell R that it is a factor. Is it currently a factor?
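A small sketch of checking and converting a variable, assuming the built-in mtcars data, where the transmission variable am is stored as numeric 0/1. The level labels are supplied here for readability. is.factor() and class() are base R checks that complement the <fct> label shown when a tibble is printed.

```r
dat <- mtcars

# am is stored as a number, so R does not yet treat it as categorical.
is.factor(dat$am)
class(dat$am)

# Convert it to a factor; in mtcars, 0 is automatic and 1 is manual.
dat$am <- factor(dat$am, levels = c(0, 1), labels = c("automatic", "manual"))

is.factor(dat$am)
levels(dat$am)
```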

5.2. Using this categorical variable, let’s predict an outcome of interest (obtain the inferential statistics as well with summary()). The output for the coefficient now shows the variable name plus one of the levels, meaning the unlisted level is the reference category and the estimate is in comparison to it. With that in mind, what is the interpretation of the estimate? Is it significant?
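A minimal sketch of a regression on a two-level factor, assuming the built-in mtcars data with mpg as an illustrative outcome and am as the categorical predictor:

```r
dat <- mtcars
dat$am <- factor(dat$am, levels = c(0, 1), labels = c("automatic", "manual"))

# R builds the dummy variable automatically from the factor.
fit_d <- lm(mpg ~ am, data = dat)
summary(fit_d)

# The coefficient is named from the variable plus the non-reference level.
# The first factor level ("automatic") is the reference category.
names(coef(fit_d))
```

If a different reference category is wanted, relevel() can change which level comes first before the model is fit.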