5 One-Way ANOVA

Required Packages

library(tidyverse)    # Loads several very helpful 'tidy' packages
library(furniture)    # Nice tables (by our own Tyson Barrett)
library(car)          # Companion for Applied Regression (and ANOVA)
library(afex)         # Analysis of Factorial Experiments
library(emmeans)      # Estimated marginal means (Least-squares means)
library(multcomp)     # Simultaneous Inference in General Parametric Models 
library(readxl)       # Necessary for reading in an example data set

5.1 Prepare for Modeling

5.1.1 Ensure the Data is in “long” Format

First, the data must be restructured from wide to long format, so that each observation is on its own line. All categorical variables must be declared as fators. We also must add an distinct indicator variable.

# convert the dataset: wide --> long
data_long <- data_wide %>% 
  tidyr::gather(key   = group_IV,                      # new var name = groups
                value = continuous_DV,                 # new var name = measurements
                var_1, var_2, var_3, ... , var_k) %>%  # all old variable names
  dplyr::mutate(id_var = row_number()) %>%             # create a sequential id variable
  dplyr::select(id_var, group_IV, continuous_DV) %>%   # reorder the variables
  dplyr::mutate_at(vars(id_var, group_IV), factor)     # declare factors

data_long %>% head(n = 10)                             # display the top 10 rows only

5.1.2 Compute Summary Statistics

Second, check the summary statistics for each of the \(k\) groups.

# Raw data: summary table
data_long %>% 
  dplyr::group_by(group_IV) %>%          # divide into groups
  furniture::table1(continuous_DV)       # gives M(SD)

5.1.3 Plot the Raw Data

Third, plot the data to eyeball the potential effect. Remember the center line in each box represents the median, not the mean.

# Raw data: boxplots
data_long %>% 
  ggplot(aes(x = group_IV,
             y = continuous_DV)) + 
  geom_boxplot() +
  geom_point() 
# Raw data: Mean-SD plots
data_long %>% 
  ggplot(aes(x = group_IV,
             y = continuous_DV)) + 
  stat_summary() 

5.2 Fitting One-way ANOVA Model

The aov_4() function from the afex package fits ANOVA models (oneway, two-way, repeated measures, and mixed design). It needs at least two arguments:

  1. formula: continuous_DV ~ group_IV + (1|id_var) one observation per subject and id_var is distinct for each subject

  2. dataset: data = . we use the period to signify that the datset is being piped from above

Here is an outline of what your syntax should look like when you fit and save a one-way ANOVA. Of course you will replace the dataset name and the variable names, as well as the name you are saving it as.

NOTE: The aov_4() function works on data in LONG format only. Each observation needs to be on its one line or row with seperate variables for the group membership (categorical factor or fct) and the continuous measurement (numberic or dbl).

# One-way ANOVA: fit and save
aov_name <- data_long %>% 
  afex::aov_4(continuous_DV ~ group_IV + (1|id_var),
              data = .)

5.3 ANOVA Output

By running the name you saved you model under, you will get a brief set of output, including a measure of Effect Size.

NOTE: The ges is the generalized eta squared. In a one-way ANOVA, the eta-squared effect size is only one value, ie. generalized \(\eta_g\) and partial \(\eta_p\) are the same.

# Display basic ANOVA results (includes effect size)
aov_name 

To fully fill out a standard ANOVA table and compute other effect sizes, you will need a more complete set of output, including the Sum of Squares components, you will need to add $Anova at the end of the model name before running it.

NOTE: IGNORE the first line that starts with (Intercept)! Also, the ‘mean sum of squares’ are not included in this table, nor is the Total line at the bottom of the standard ANOVA table. You will need to manually compute these values and add them on the homework page. Remember that Sum of Squares (SS) and degrees of freedom (df) add up, but Mean Sum of Squreas (MS) do not add up. Also: MS = SS/df for each term.

# Display fuller ANOVA results (includes sum of squares)
aov_name$Anova 

5.4 Inho data set example

Import Data, Define Factors, and Compute New Variables

  • Make sure the dataset is saved in the same folder as this file
  • Make sure the that folder is the working directory

NOTE: I added the second line to convert all the variables names to lower case. I still kept the F as a capital letter at the end of the five factor variables.

ihno_clean <- read_excel("Ihno_dataset.xls") %>% 
  dplyr::rename_all(tolower) %>% 
  dplyr::mutate(genderF = factor(gender, 
                                 levels = c(1, 2),
                                 labels = c("Female", 
                                            "Male"))) %>% 
  dplyr::mutate(majorF = factor(major, 
                                levels = c(1, 2, 3, 4,5),
                                labels = c("Psychology",
                                           "Premed",
                                           "Biology",
                                           "Sociology",
                                           "Economics"))) %>% 
  dplyr::mutate(reasonF = factor(reason,
                                 levels = c(1, 2, 3),
                                 labels = c("Program requirement",
                                            "Personal interest",
                                            "Advisor recommendation"))) %>% 
  dplyr::mutate(exp_condF = factor(exp_cond,
                                   levels = c(1, 2, 3, 4),
                                   labels = c("Easy",
                                              "Moderate",
                                              "Difficult",
                                              "Impossible"))) %>% 
  dplyr::mutate(coffeeF = factor(coffee,
                                 levels = c(0, 1),
                                 labels = c("Not a regular coffee drinker",
                                            "Regularly drinks coffee")))  %>% 
  dplyr::mutate(hr_base_bps = hr_base / 60) 

It is important that we test for violations of the assumption of Homogeneity of variance.

# Levene's Test of HOV
data_name %>% 
  car::leveneTest(continuous_var ~ group_var, 
                  data = .,            
                  center = "mean")     

The aov_4() function from the afex package fits ANOVA models (oneway, two-way, repeated measures, and mixed design). It needs at least two arguments:

  1. formula: continuous_var ~ group_var + (1|id_var) one observation per subject and id_var is distinct for each subject

  2. dataset: data = . we use the period to signify that the datset is being piped from above

Here is an outline of what your syntax should look like when you fit and save a one-way ANOVA. Of course you will replace the dataset name and the variable names, as well as the name you are saving it as.

NOTE: The aov_4() function works on data in LONG format only. Each observation needs to be on its one line or row with seperate variables for the group membership (categorical factor or fct) and the continuous measurement (numberic or dbl).

# One-way ANOVA: fit and save
aov_name <- data_name %>% 
  afex::aov_4(continuous_var ~ group_var + (1|id_var),
              data = .)

5.4.1 Basic Output - stored name of model

By running the name you saved you model under, you will get a brief set of output, including a measure of Effect Size.

NOTE: The ges is the generalized eta squared. In a one-way ANOVA, the eta-squared effect size is the same value, ie. generalized (\(\eta_g^2\)) and partial \(\eta_p^2\)) are the same.

# Display basic ANOVA results (includes effect size)
aov_name 

5.4.2 Fuller Output - add $Anova on model name

To fully fill out a standard ANOVA table and compute other effect sizes, you will need a more complete set of output, including the Sum of Squares components, you will need to add $Anova at the end of the model name before running it.

NOTE: IGNORE the first line that starts with (Intercept)! Also, the ‘mean sum of squares’ are not included in this table, nor is the Total line at the bottom of the standard ANOVA table. You will need to manually compute these values and add them on the homework page. Remember that Sum of Squares (SS) and degrees of freedom (df) add up, but Mean Sum of Squreas (MS) do not add up. Also: MS = SS/df for each term.

# Display fuller ANOVA results (includes sum of squares)
aov_name$Anova 

5.4.3 All Pairwise Comparisons with pairs()

There are two steps to conduct all possible pairwise comparisons:

  1. emmeans(~ group_var) - Calculate the Estimated Marinal Means
  2. pairs() - Determine if each pair is significantly different

Within the pairs() function there are several options for controling for multiple comparisons, including:

  • adjust = "none" - Fisher’s LSD
  • adjust = "tukey" - Tukey’s HSD
  • adjust = "bon" - Bonferroni
# Pairwise post hoc: Tukey's HSD adjustment for multiple comparisons
aov_name %>% 
  emmeans::emmeans(~ group_var) %>%        # Calculate Estimated Marinal Means
  pairs(adjust = "tukey")                  # Is each pair signif different?

5.4.4 Contrast Statements with emmeans::contrast()

There are two steps to conduct a contrast comparison:

  1. emmeans(~ group_var) - Calculate the Estimated Marinal Means
  2. contrast() - Determine if each pair is significantly different
Inside the contrast statement, list the named sets of linear contrast weights. We will only be doing one-at-a-time, but we must still use a nested list.

NOTE: You must provide one weight (\(c_i\)) for each of the \(k\) groups. If you wish to ignore a group, that group’s weight is \(c_i = 0\). The sum total of all the weights must be zero (\(\sum c_i = 0\)), so use positive and negative numbers.

# Contrast statement : Impossible vs. Rest
aov_name %>% 
  emmeans::emmeans(~ group_var) %>% 
  emmeans::contrast(list("your contrast name" = c(c_1, c_2, ... , c_k)))