5 One-Way ANOVA: Code
Required Packages
library(tidyverse) # Loads several very helpful 'tidy' packages
library(furniture) # Nice tables (by our own Tyson Barrett)
library(afex) # Analysis of Factorial Experiments
library(emmeans) # Estimated marginal means (Least-squares means)
library(readxl) # Necessary for reading in an example data set
5.1 Prepare Work
5.1.0.1 Hypotheses
H_0: There is no difference in the DV average between IV group
H_a: The DV average is different for at least one IV group
5.2 Exploratory Data Analysis
5.2.1 Compute Summary Statistics
NOTE: Calculate summary statistics of the DV by the groups to the data to eyeball the potential effect or differences in the means. Also check if the standard deviations and samples sizes are somewhat similar in each group.
%>%
data_long ::group_by(group_IV) %>% # divide into groups
dplyr::table1(continuous_DV # gives M(SD)
furnituredigits = 2,
na.rm = FALSE,
total = TRUE,
output = "markdown")
5.2.2 Plot the Observed Data
NOTE: Plot the data to eyeball the potential effect. Remember the center line in each box represents the median, not the mean.
5.2.2.1 Side-by-Side Boxplots
%>%
data_long ggplot(aes(x = group_IV, # groups along the x-axis
y = continuous_DV)) + # DV on the y-axis
geom_boxplot() +
geom_point() +
theme_bw()
5.2.2.2 Mean by Group Dot-Plot
%>%
data_long ggplot(aes(x = group_IV, # groups along the x-axis
y = continuous_DV)) + # DV on the y-axis
stat_summary() +
theme_bw()
5.2.2.3 Violin-&-Boxplots
%>%
data_long ggplot(aes(x = group_IV, # groups along the x-axis
y = continuous_DV)) + # DV on the y-axis
geom_violin(fill = "gray") +
geom_boxplot(width = .07,
fill = "white") +
geom_jitter(position = position_jitter(0.21)) +
stat_summary(fun = mean,
geom = "point",
shape = 18,
color = "red",
size = 5) +
theme_bw()
5.3 Assumption Checks
5.3.1 Independent, Random Samples
The Sample(s) was drawn at random (at least as representative as possible)
NOTE: Nothing can be done to fix NON-representative samples and there is no statistically test to test this.
5.3.2 Normality of the DV in each Group
A variable is said to follow the normal distribution if it resembles the normal curve. Specifically it is symetrical, unimodal, and bell shaped.
The continuous variable has a NORMAL distribution in ALL relevant populations
- Not as important if the sample is large (Central Limit Theorem)
- IF the sample is far from normal &/or small, might want to use a different method
Options to judging normality:
-
Visualization of each sample’s distribution
- Stacked histograms, but is sensitive to binning choices (number or width)
- Side-by-side boxplots, shows median instead of mean as central line
- Seperate QQ plots (straight \(45^\circ\) line), but is sensitive to outliers!
-
Calculate Skewness and Kurtosis
-
Divided each value by its standard error (SE)
-
A result \(\gt \pm 2\) indicates
issues
- Best used when sample is small, n < 30’ish
-
Divided each value by its standard error (SE)
-
Formal Inferential Tests for Normality
- Null-hypothesis: population is normally distributed
- A \(p \lt .05\) ???indicate snon-normality
- For smaller samples, use Shapiro-Wilk’s Test
- For larger samples, use Kolmogorov-Smirnov’s Test
5.3.2.1 Histogram by Groups
%>%
data_long ggplot(aes(continuous_var)) + # DV along the x-axis
geom_histogram(fill = "gray",
color = "black",
alpha = .5) +
facet_grid(group_var ~ ., # Panel by group
labeller = label_both) +
theme_bw()
5.3.2.2 QQ Plot by Group
%>%
data_long ggplot(aes(sample = continuous_var)) + # make sure to include "sample = DV"
geom_qq() +
stat_qq_line() +
facet_wrap(~ group_var, # Panel by group
labeller = label_both) +
theme_bw()
5.3.2.3 Skewness and Kurtosis Values
%>%
data_long ::describeBy(continuous_var ~ group_var, # Formula: DV ~ groups
psychdata = .,
mat = TRUE,
digits = 2)
5.3.2.4 Skapiro-Wilk’s Test
%>%
data_long ::group_by(group_var) %>% # By group
dplyr::summarise(W.stat = shapiro.test(continuous_var)$statistic, # For the DV
dplyrp.value = shapiro.test(continuous_var)$p.value) # For the DV
5.3.2.5 Skewness, Kurtosis, & Shapiro-Wilk’s Test
%>%
data_long ::nest_by(group_var) %>% # By group
dplyr::summarise(summ = list(pastecs::stat.desc(data$continuous_var, # For the DV
dplyrnorm = TRUE))) %>%
::mutate(summ = list(summ %>%
dplyrunlist() %>%
data.frame() %>%
::rename("value" = ".") %>%
dplyr::rownames_to_column(var = "stat"))) %>%
tibble::unnest(summ) %>%
tidyr::filter(stat %in% c("skewness", "skew.2SE",
dplyr"kurtosis", "kurt.2SE",
"normtest.W", "normtest.p")) %>%
::pivot_wider(names_from = stat,
tidyrvalues_from = value)
skew.2SE
- Calculateed as
skew
divided by 2 times the SE of skewness - If > 1, then we can conclude violations to normality
- Calculateed as
kurt.2SE
- Calculateed as
kurt
divided by 2 times the SE of skewness - If > 1, then we can conclude violations to normality
- Calculateed as
5.3.3 Homogeneity of Variance (HOV) in the DV between groups
All three of the Tests below have:
- \(H_0\): all populations’ variances are equal
- \(H_1\) At least two of the populations have different variances
NOTE: When using Bartlett’s test one of the main assumptions data should be normally distributed. In the case of non-normal data, the Levene’s test is an alternative to the Bartlett test. The Fligner-Killeen test is one of the many tests for homogeneity of variances which is most robust against departures from normality.
5.3.3.1 Bartlett’s Test
%>%
df_long bartlett.test(continuous_var ~ group_var, # Formula: DV ~ groups
data = .)
5.4 One-way ANOVA Model
5.4.1 Fit & Save the Model
The aov_4()
function from the afex
package fits ANOVA models (oneway, two-way, repeated measures, and mixed design). It needs at least two arguments:
formula:
continuous_DV ~ group_IV + (1|id_var)
one observation per subject andid_var
is distinct for each subjectdataset:
data = .
we use the period to signify that the datset is being piped from above
Here is an outline of what your syntax should look like when you fit and save a one-way ANOVA. Of course you will replace the dataset name and the variable names, as well as the name you are saving it as.
NOTE: The aov_4()
function works on
data in LONG format only. Each observation needs to be on its one line
or row with seperate variables for the group membership (categorical
factor or fct
) and the continuous measurement (numberic or
dbl
). The (1|id_var)
indicates there is only
ONE observation PER individual.
# One-way ANOVA: fit and save
<- data_long %>%
aov_name ::aov_4(continuous_DV ~ group_IV + (1|id_var), # Formula: DV ~ groups + (1| id_var)
afexdata = .)
5.4.2 Basic Output - stored name of model
By running the name you saved you model under, you will get a brief set of output, including a measure of Effect Size.
NOTE: The ges
is the generalized
eta squared. In a one-way ANOVA, the eta-squared effect size is
only one value, ie. generalized \(\eta_g\) and partial \(\eta_p\) are the same.
# Display basic ANOVA results (includes effect size)
aov_name
5.4.3 Fuller Output - add $Anova on model name
To fully fill out a standard ANOVA table and compute other effect sizes, you will need a more complete set of output, including the Sum of Squares components, you will need to add $Anova
at the end of the model name before running it.
NOTE: IGNORE the first line that starts with
(Intercept)
! Also, the ‘mean sum of squares’ are not
included in this table, nor is the Total line at the
bottom of the standard ANOVA table. You will need to manually compute
these values and add them on the homework page. Remember that
Sum of Squares (SS)
and
degrees of freedom (df)
add up, but
Mean Sum of Squreas (MS)
do not add up. Also:
MS = SS/df
for each term.
# Display fuller ANOVA results (includes sum of squares)
$Anova aov_name
5.5 Followup-tests
5.5.1 Calculate Estimated Marginal Means
NOTE: Estimated Marginal Means (EMM) will we identical WHEN the groups are exactly the same size AND there is only ONE grouping variable.
%>%
aov_name ::emmeans(~ group_var) # by groups emmeans
5.5.2 Plot Estimated Marginal Means
%>%
aov_name ::emmip(~ group_var, # Calculate Estimated Marginal Means
emmeansCIs = TRUE) # Include Confidence Intervals
5.5.3 All Pairwise Comparisons
There are two steps to conduct all possible pairwise comparisons:
emmeans::emmeans(~ group_var)
- Calculate the Estimated Marinal Meanspairs()
- Determine if each pair is significantly different
Within the pairs()
function there are several options for controling for multiple comparisons, including:
adjust = "none"
- Fisher’s LSDadjust = "tukey"
- Tukey’s HSDadjust = "bon"
- Bonferroni
NOTE: Fisher’s LSD approach is valid if you have 2-3 groups. If there are 4 or more groups, Tukey’s HSD adjustment is more appropriate, provided the groups are the same size. Bonferroni’s adjustment is most conservative.
%>%
aov_name ::emmeans(~ group_var) %>% # Calculate Estimated Marginal Means
emmeanspairs(adjust = "tukey") # Is each pair signif different?
5.5.4 Effect Size for Pairwise Comparisons
Cohen’s d like “Standardized Mean Differences”
%>%
aov_name ::emmeans(~ group_var) %>%
emmeans::eff_size(sigma = sigma(aov_name$lm),
emmeansedf = df.residual(aov_name$lm))
5.5.5 Contrast Statements
There are two steps to conduct a contrast comparison:
emmeans(~ group_var)
- Calculate the Estimated Marinal Meanscontrast()
- Determine if each pair is significantly different
Inside the contrast statement, list the named sets of linear contrast weights. We will only be doing one-at-a-time, but we must still use a nested list
.
NOTE: You must provide one weight (\(c_i\)) for each of the \(k\) groups. If you wish to ignore a group, that group’s weight is \(c_i = 0\). The sum total of all the weights must be zero (\(\sum c_i = 0\)), so use positive and negative numbers.
%>%
aov_name ::emmeans(~ group_var) %>%
emmeans::contrast(list("your contrast name" = c(c_1, c_2, ... , c_k))) emmeans
5.6 APA Write-up
5.6.1 Methods Section
- Research Question(s)
- Specific testable hypotheses
- Sample
- participants
- inclusion & exclusion criteria
- Procedure
- setting
- timeline
- intervention & placebo (if applicable)
- Variables
- Measures used
- DV & IV(s)
- Analytic Plan
- Main Method Used, options
- Assumptions & How Tested
- Follow-up tests
- Effect Size
5.6.2 Results Section
- Descriptive Summary
- Table 1 with sample and variables
- Eyeball treands seen
- Main Test Results
- Assumptions of interest
- May need a table(s) and/or plot(s)
- Follow-up Tests
- final conclusion
Make sure to include: * Effect Sizes * Direction of any Significant relationship * Contex (not “reject the null”)