6 Poison Example: 1 & 2-way ANOVA
Required Packages
library(tidyverse) # Loads several very helpful 'tidy' packages
library(furniture) # Nice tables (by our own Tyson Barrett)
library(psych) # Descriptive helpers
library(pastecs) # Normality Tests
library(afex) # Analysis of Factorial Experiments
library(emmeans) # Estimated marginal means (Least-squares means)
library(ggpubr)
|
| | 0%
|
|........ | 11%
ordinary text without R code
|
|................ | 22%
label: unnamed-chunk-172
|
|....................... | 33%
ordinary text without R code
|
|............................... | 44%
label: unnamed-chunk-173
|
|....................................... | 56%
ordinary text without R code
|
|............................................... | 67%
label: unnamed-chunk-174
|
|...................................................... | 78%
ordinary text without R code
|
|.............................................................. | 89%
label: unnamed-chunk-175
|
|......................................................................| 100%
ordinary text without R code
[1] "C:\\Users\\A00315~1\\AppData\\Local\\Temp\\RtmpyU9d73\\file35a8572d40e7"
6.1 Prepare for Modeling
6.1.1 Ensure the Data is in “long” Format
https://www.guru99.com/r-anova-tutorial.html
The “poison” dataset contains 48 rows and 4 variables:
- ID: Guinea pig identification number
- Time: Survival time of the animal
- poison: Type of poison used: factor level: 1,2 and 3
- treat: Type of treatment used: factor level: 1,2 and 3
Before you start to compute the ANOVA test, you need to prepare the data as follow:
- Import the data from online
- Rename the
id
variable - Declare factors label categories of poison type & treatment
<- "https://raw.githubusercontent.com/guru99-edu/R-Programming/master/poisons.csv"
PATH
<- read.csv(PATH) %>%
df_survival ::rename(id = X) %>%
dplyr::mutate(poison = factor(poison)) %>%
dplyr::mutate(treat = factor(treat)) dplyr
::glimpse(df_survival) tibble
Rows: 48
Columns: 4
$ id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, …
$ time <dbl> 0.31, 0.45, 0.46, 0.43, 0.36, 0.29, 0.40, 0.23, 0.22, 0.21, 0.1…
$ poison <fct> 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 2, 2, 3, …
$ treat <fct> A, A, A, A, A, A, A, A, A, A, A, A, B, B, B, B, B, B, B, B, B, …
str(df_survival)
'data.frame': 48 obs. of 4 variables:
$ id : int 1 2 3 4 5 6 7 8 9 10 ...
$ time : num 0.31 0.45 0.46 0.43 0.36 0.29 0.4 0.23 0.22 0.21 ...
$ poison: Factor w/ 3 levels "1","2","3": 1 1 1 1 2 2 2 2 3 3 ...
$ treat : Factor w/ 4 levels "A","B","C","D": 1 1 1 1 1 1 1 1 1 1 ...
6.2 Example 1) Survival Time by Poison Type
6.2.1 Research Question
We want to know if there is a difference between the mean of the survival time according to the type of poison given to the Guinea pig.
6.2.2 Exploratory Data Analysis
6.2.2.1 Summary Statistics
%>%
df_survival ::group_by(poison) %>%
dplyr::table1("Survival Time" = time,
furnituredigits = 2,
na.rm = FALSE,
total = TRUE,
output = "markdown",
caption = "Summary of Survival Time by Type of Poisson Used")
Total | 1 | 2 | 3 | |
---|---|---|---|---|
n = 48 | n = 16 | n = 16 | n = 16 | |
Survival Time | ||||
0.48 (0.25) | 0.62 (0.21) | 0.54 (0.29) | 0.28 (0.06) |
6.2.2.2 Plot the Observed Data
%>%
df_survival ggplot(aes(x = poison,
y = time)) +
geom_violin(fill = "gray") +
geom_boxplot(width = .07,
fill = "white") +
geom_jitter(position = position_jitter(0.21)) +
stat_summary(fun = mean,
geom = "point",
shape = 18,
color = "red",
size = 5) +
theme_bw() +
labs(x = "Type of Poison",
y = "Observed Survival Time")
6.2.2.3 Summary
Things you should notice, but not a firm conclusion yet!
SAMPLE BALANCE (N)
The sample of 48 guinea pigs is balanced between the three poisons. There are the sample number of animals in each group (n = 16).
CENTER (MEANS)
Guinea pigs given Type 3 poison had shorter survival times (M = 0.28), on average, compared to Type 1 (M = 0.48) and Type 2 (M = 0.54).
SPREAD (STANDARD DEVIATIONS)
There was much less variation in survival times among the 16 guinea pics given Type 3 poison (SD = 0.06) compared to the other types (Type 1, SD = 0.21; Type 2, SD = 0.29).
6.2.3 Assumption Checks
6.2.3.1 Independent, Random Samples
The Sample(s) was drawn at random (at least as representative as possible) and the groups are independent of each other.
- Nothing can be done to fix NON-representative samples!
- Can not for with any statistically test
- Lack of independence requires a different model
6.2.3.2 Normality of the DV in each Group
%>%
df_survival ggplot(aes(time)) +
geom_histogram(binwidth = .05,
fill = "gray",
color = "black",
alpha = .5) +
facet_grid(poison ~ .,
labeller = label_both) +
theme_bw() +
labs(x = "Observed Survival Time")
%>%
df_survival ggplot(aes(sample = time)) +
geom_qq() +
stat_qq_line() +
facet_wrap(~ poison,
labeller = label_both) +
theme_bw()
%>%
df_survival ::group_by(poison) %>%
dplyr::summarise(skewness = psych::skew(time),
dplyrkurtosis = psych::kurtosi(time))
# A tibble: 3 × 3
poison skewness kurtosis
<fct> <dbl> <dbl>
1 1 0.567 -0.543
2 2 1.07 -0.0824
3 3 0.246 -1.38
%>%
df_survival ::group_by(poison) %>%
dplyr::summarise(SW.stat = shapiro.test(time)$statistic,
dplyrp.value = shapiro.test(time)$p.value)
# A tibble: 3 × 3
poison SW.stat p.value
<fct> <dbl> <dbl>
1 1 0.932 0.266
2 2 0.853 0.0149
3 3 0.936 0.302
%>%
df_survival ::nest_by(poison) %>%
dplyr::summarise(summ = list(pastecs::stat.desc(data$time,
dplyrnorm = TRUE))) %>%
::mutate(summ = list(summ %>%
dplyrunlist() %>%
data.frame() %>%
::rename("value" = ".") %>%
dplyr::rownames_to_column(var = "stat"))) %>%
tibble::unnest(summ) %>%
tidyr::filter(stat %in% c("skewness", "skew.2SE",
dplyr"kurtosis", "kurt.2SE",
"normtest.W", "normtest.p")) %>%
::pivot_wider(names_from = stat,
tidyrvalues_from = value)
# A tibble: 3 × 7
# Groups: poison [3]
poison skewness skew.2SE kurtosis kurt.2SE normtest.W normtest.p
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0.567 0.502 -0.543 -0.249 0.932 0.266
2 2 1.07 0.951 -0.0824 -0.0378 0.853 0.0149
3 3 0.246 0.218 -1.38 -0.634 0.936 0.302
6.2.3.3 Homogeneity of Variance (HOV) in the DV between groups
Since the distribution of survival time might not be normally distributed, Bartlett’s test is not the best.
%>%
df_survival ::leveneTest(time ~ poison,
cardata = .,
center = "mean")
Levene's Test for Homogeneity of Variance (center = "mean")
Df F value Pr(>F)
group 2 8.0941 0.0009937 ***
45
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
%>%
df_survival fligner.test(time ~ poison,
data = .)
Fligner-Killeen test of homogeneity of variances
data: time by poison
Fligner-Killeen:med chi-squared = 9.8895, df = 2, p-value = 0.007121
6.2.3.4 Summary
NORMALITY
The distribution of survival time does not seem to be normal based on visual inspection. For poison Type 2, survival time significantly deviates from normality (skew = 1.07, kurtosis = -0.08), W = 0.85, p = .015.
HOMOGENEITY OF VARIANCE
Survival time is significantly less variable for poison Type 3 (SD = 0.06), compared to the others (Type 1, SD = 0.21; Type 2, SD = 0.29), F(2, 45) = 8.09, p < .001.
6.2.4 Fitting One-way ANOVA Model
<- df_survival %>%
aov_poison ::aov_4(time ~ poison + (1|id),
afexdata = .)
6.2.4.1 Basic Output - stored name of model
::anova_apa(aov_poison) apa
Effect
1 (Intercept) F(1, 45) = 251.70, p < .001, petasq = .85 ***
2 poison F(2, 45) = 11.79, p < .001, petasq = .34 ***
aov_poison
Anova Table (Type 3 tests)
Response: time
Effect df MSE F ges p.value
1 poison 2, 45 0.04 11.79 *** .344 <.001
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1
6.2.4.2 Fuller Output - add $Anova on model name
$Anova aov_poison
Anova Table (Type III tests)
Response: dv
Sum Sq Df F value Pr(>F)
(Intercept) 11.0304 1 251.700 < 2.2e-16 ***
poison 1.0330 2 11.786 7.656e-05 ***
Residuals 1.9721 45
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
6.2.5 Followup-tests
6.2.5.1 Estimated Marginal Means with emmeans::emmeans()
%>%
aov_poison ::emmeans(~ poison) emmeans
poison emmean SE df lower.CL upper.CL
1 0.618 0.0523 45 0.512 0.723
2 0.544 0.0523 45 0.439 0.650
3 0.276 0.0523 45 0.171 0.382
Confidence level used: 0.95
%>%
aov_poison ::emmip(~ poison,
emmeansCIs = TRUE) +
theme_bw() +
labs(x = "Type of Poison",
y = "Estmated Marginal Mean: Survival Time")
6.2.5.2 All Pairwise Comparisons with pairs()
Pairwise post hoc: Fisher’s LSD adjustment for multiple comparisons
<- aov_poison %>%
aov_pairs ::emmeans(~ poison) %>%
emmeanspairs(adjust = "none")
aov_pairs
contrast estimate SE df t.ratio p.value
poison1 - poison2 0.0731 0.074 45 0.988 0.3284
poison1 - poison3 0.3412 0.074 45 4.611 <.0001
poison2 - poison3 0.2681 0.074 45 3.623 0.0007
6.2.5.3 Effect Size for Pairs
<- aov_poison %>%
aov_eff ::emmeans(~ poison) %>%
emmeans::eff_size(sigma = sigma(aov_poison$lm),
emmeansedf = df.residual(aov_poison$lm))
aov_eff
contrast effect.size SE df lower.CL upper.CL
poison1 - poison2 0.349 0.355 45 -0.367 1.07
poison1 - poison3 1.630 0.393 45 0.838 2.42
poison2 - poison3 1.281 0.378 45 0.519 2.04
sigma used for effect sizes: 0.2093
Confidence level used: 0.95
6.2.5.4 Contrast Statements with emmeans::contrast()
Compare the mean of Type 3 to the other two types.
%>%
aov_poison ::emmeans(~ poison) %>%
emmeans::contrast(list("1 & 2 vs. 3" = c(1, 1, -2))) emmeans
contrast estimate SE df t.ratio p.value
1 & 2 vs. 3 0.609 0.128 45 4.754 <.0001
6.2.5.5 Summary
PAIRS
Survival time for Type 3 (M = 0.28) is is lower than type 1 (M = 0.62), t(45) = 4.61, p < .001, d = 1.63, and type 2 (M = 0.54), t(45) = 3.62, p < .001, d = 1.28. There is no evidence between types 1 & 2, t(45) = 0.99, p = .328, d = 0.35.
CONTRAST
Survival time for Type 3 (M = 0.28) is is lower than both type 1 (M = 0.62) and type 2 (M = 0.54), t(45) = 4.75, p < .001.
6.2.6 APA Write Up
Note: There are many unkowns regarding the situation of this experiment and can not be include in this write-up.
6.2.6.1 Methods Section
To investigate potency of poisons, 48 guinea pigs were randomized to three types of poison (n = 16) and time till mortality was measured. Survival time (dependent variable) was then subjected to a 1-way independent groups analysis of variance (ANOVA) comparing the three types of poison (independent variable). Prior to analysis, the assumptions of normality and homogeneity of variance were assessed with with Shapiro-Wilks and Levene’s Test respectively. The significant omnibus effect was investigated further with pairwise and appropriate contrast comparisons to determine which of the types different on survival time. Fisher’s method was used for all pairwise tests and effects sizes are given as Cohen’s d-like standardized mean differences.
6.2.6.2 Results Section
The distribution of survival time did not seem to be normal based on visual inspection. For poison Type 2, survival time significantly deviates from normality (skew = 1.07, kurtosis = -0.08), W = 0.85, p = .015. Survival time also was significantly less variable for poison Type 3 (SD = 0.06), compared to the others (Type 1, SD = 0.21; Type 2, SD = 0.29), F(2, 45) = 8.09, p < .001. The ANOVA was still conducted and found statistically significant evidence that survival time do indeed varry by poison type, F(2, 45) = 11.79, p < .001, \(\eta^2\) = 0.34. Specifically, survival time for Type 3 (M = 0.28) was shorter than type 1 (M = 0.62), t(45) = 4.61, p < .001, d = 1.63, and type 2 (M = 0.54), t(45) = 3.62, p < .001, d = 1.28. There was no evidence of a difference in survival time between types 1 & 2, t(45) = 0.99, p = .328, d = 0.35.
Here is a publication worthy plot
<- dplyr::left_join(data.frame(aov_pairs),
aov_poison_label data.frame(aov_eff),
by = "contrast") %>%
::separate(col = contrast,
tidyrinto = c("start", "stop")) %>%
::mutate_at(vars(start, stop),
dplyr::str_sub,
stringrstart = -1) %>%
::mutate(lab = glue::glue("d = {round(effect.size, 2)}, {pval_label_apa(p.value)}"))
dplyr
%>%
aov_poison ::emmeans(~ poison) %>%
emmeansdata.frame() %>%
ggplot(aes(x = poison,
y = emmean)) +
geom_errorbar(aes(ymin = emmean - SE,
ymax = emmean + SE),
width = .3) +
geom_point() +
geom_line(aes(group = 1)) +
theme_bw() +
labs(x = "Poison Type",
y = "Estimated MArginal Mean Survival Time") +
::geom_bracket(data = aov_poison_label %>% dplyr::filter(p.value < .06),
ggpubraes(xmin = start,
xmax = stop,
label = lab),
y.position = 0.7,
step.increase = -0.15,
type = "text")
6.3 Example 2) Survival Time by Treatment Type
6.3.1 Parepare for Analysis
Our objective is to test the following assumption:
H_0: There is no difference in survival time average between group
H_a: The survival time average is different for at least one group.
In other words, you want to know if there is a statistical difference between the mean of the survival time according to the type of treatment given to the Guinea pig.
Variables:
id
Guinea pig identification numbertreat
treatment type, 4-level IV (independent groups)time
survival time, continuous DV (Y, outcome)
Note: ignore
poison
for now
6.3.1.1 Compute Summary Statistics
Second, check the summary statistics for each of the \(k=4\) groups.
%>%
df_survival ::group_by(treat) %>% # divide into groups
dplyr::table1("Survival Time" = time,
furnituredigits = 2, # gives M(SD)
na.rm = FALSE,
total = TRUE,
output = "markdown",
caption = "Summary of Survival Time by Type of Treatment Used")
Total | A | B | C | D | |
---|---|---|---|---|---|
n = 48 | n = 12 | n = 12 | n = 12 | n = 12 | |
Survival Time | |||||
0.48 (0.25) | 0.31 (0.10) | 0.68 (0.32) | 0.39 (0.17) | 0.53 (0.22) |
Interpretion:
6.3.1.2 Plot the Raw Data
Plot the data to eyeball the potential effect. Remember the center line in each box represents the median, not the mean. I have added a red diamond on the mean survival time for each type of poison.
%>%
df_survival ggplot(aes(x = treat,
y = time)) +
geom_violin(fill = "gray") +
geom_boxplot(width = .07,
fill = "white") +
geom_jitter(position = position_jitter(0.21)) +
stat_summary(fun = mean,
geom = "point",
shape = 18,
color = "red",
size = 5) +
theme_bw() +
labs(x = "Type of Treatment",
y = "Observed Survival Time")
6.3.2 Fitting One-way ANOVA Model
<- df_survival %>%
aov_treat ::aov_4(time ~ treat + (1|id),
afexdata = .)
6.3.3 Followup-tests
6.3.3.1 Estimated Marginal Means with emmeans::emmeans()
%>%
aov_treat ::emmeans(~ treat) # Calculate Estimated Marginal Means emmeans
treat emmean SE df lower.CL upper.CL
A 0.314 0.0628 44 0.188 0.441
B 0.677 0.0628 44 0.550 0.803
C 0.393 0.0628 44 0.266 0.519
D 0.534 0.0628 44 0.408 0.661
Confidence level used: 0.95
%>%
aov_treat ::emmip(~ treat,
emmeansCIs = TRUE) +
theme_bw() +
labs(x = "Type of Treatment",
y = "Estmated Marginal Mean: Survival Time")
6.3.3.2 All Pairwise Comparisons with pairs()
Pairwise post hoc: Fisher’s LSD adjustment for multiple comparisons
%>%
aov_treat ::emmeans(~ treat) %>% # Calculate Estimated Marginal Means
emmeanspairs(adjust = "none") # Is each pair signif different?
contrast estimate SE df t.ratio p.value
A - B -0.3625 0.0888 44 -4.080 0.0002
A - C -0.0783 0.0888 44 -0.882 0.3827
A - D -0.2200 0.0888 44 -2.476 0.0172
B - C 0.2842 0.0888 44 3.198 0.0026
B - D 0.1425 0.0888 44 1.604 0.1159
C - D -0.1417 0.0888 44 -1.595 0.1180
6.4 Example 3) Survival Time by Both Poisson & Treatment Type
6.4.1 Parepare for Analysis
Our objective is to test the following assumption:
H_0: There is no difference in survival time average between group
H_a: The survival time average is different for at least one group
In other words, you want to know if there is a statistical difference between the mean of the survival time according to BOTH type of poisons and treatment given to the Guinea pig.
Variables:
id
Guinea pig identification numberpoison
poison type, 3-level IV (independent groups)treat
treatment type, 4-level IV (independent groups)time
survival time, continuous DV (Y, outcome)
6.4.1.1 Compute Summary Statistics
Second, check the summary statistics for each of the \(k\) groups.
%>%
df_survival ::group_by(treat, poison) %>% # divide into groups
dplyr::summarise(n = n(),
dplyrmean = mean(time),
sd = sd(time)) %>%
::gt(caption = "Summary of Survival Time by Type of Treatment Used") gt
poison | n | mean | sd |
---|---|---|---|
A | |||
1 | 4 | 0.4125 | 0.06946222 |
2 | 4 | 0.3200 | 0.07527727 |
3 | 4 | 0.2100 | 0.02160247 |
B | |||
1 | 4 | 0.8800 | 0.16083117 |
2 | 4 | 0.8150 | 0.33630343 |
3 | 4 | 0.3350 | 0.04654747 |
C | |||
1 | 4 | 0.5675 | 0.15671099 |
2 | 4 | 0.3750 | 0.05686241 |
3 | 4 | 0.2350 | 0.01290994 |
D | |||
1 | 4 | 0.6100 | 0.11284207 |
2 | 4 | 0.6675 | 0.27097048 |
3 | 4 | 0.3250 | 0.02645751 |
6.4.1.2 Plot the Raw Data
Third, plot the data to eyeball the potential effect. Remember the center line in each box represents the median, not the mean. I have added a red diamond on the mean survival time for each type of poison.
%>%
df_survival ggplot(aes(x = treat,
y = time)) +
geom_violin(fill = "gray") +
geom_boxplot(width = .07,
fill = "white") +
geom_jitter(position = position_jitter(0.21)) +
stat_summary(fun = mean,
geom = "point",
shape = 18,
color = "red",
size = 5) +
theme_bw() +
labs(x = "Type of Treatment",
y = "Observed Survival Time") +
facet_wrap(~ poison, labeller = label_both)
6.4.1.3 Assumption Check: Normality
%>%
df_survival ggplot(aes(time)) +
geom_density(fill = "gray",
alpha = .5) +
facet_grid(treat ~ poison,
labeller = label_both) +
theme_bw() +
labs(x = "Observed Survival Time")
6.4.1.4 Assumption Check: HOV
Levene’s Test of HOV
%>%
df_survival ::leveneTest(time ~ treat*poison,
cardata = .,
center = "mean")
Levene's Test for Homogeneity of Variance (center = "mean")
Df F value Pr(>F)
group 11 4.8535 0.0001442 ***
36
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
6.4.2 Fitting Two-way ANOVA Model
<- df_survival %>%
aov_survival2 ::aov_4(time ~ poison*treat + (1|id),
afexdata = .)
6.4.2.1 Basic Output - stored name of model
aov_survival2
Anova Table (Type 3 tests)
Response: time
Effect df MSE F ges p.value
1 poison 2, 36 0.02 23.22 *** .563 <.001
2 treat 3, 36 0.02 13.81 *** .535 <.001
3 poison:treat 6, 36 0.02 1.87 .238 .112
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1
6.4.2.2 Fuller Output - add $Anova on model name
$Anova aov_survival2
Anova Table (Type III tests)
Response: dv
Sum Sq Df F value Pr(>F)
(Intercept) 11.0304 1 495.9194 < 2.2e-16 ***
poison 1.0330 2 23.2217 3.331e-07 ***
treat 0.9212 3 13.8056 3.777e-06 ***
poison:treat 0.2501 6 1.8743 0.1123
Residuals 0.8007 36
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
6.4.3 Followup-tests
6.4.3.2 All Pairwise Comparisons for Main Effects of Poison
%>%
aov_survival2 ::emmeans(~ poison) %>% # Calculate Estimated Marginal Means
emmeanspairs(adjust = "none") # Is each pair signif different?
contrast estimate SE df t.ratio p.value
poison1 - poison2 0.0731 0.0527 36 1.387 0.1740
poison1 - poison3 0.3412 0.0527 36 6.472 <.0001
poison2 - poison3 0.2681 0.0527 36 5.085 <.0001
Results are averaged over the levels of: treat
%>%
aov_survival2 ::emmip( ~ poison,
emmeansCIs = TRUE) +
theme_bw() +
labs(x = "Type of Poison",
y = "Estmated Marginal Mean: Survival Time")
6.4.3.3 All Pairwise Comparisons for Main Effects of Treatment
%>%
aov_survival2 ::emmeans(~ treat) %>% # Calculate Estimated Marginal Means
emmeanspairs(adjust = "tukey") # Is each pair signif different?
contrast estimate SE df t.ratio p.value
A - B -0.3625 0.0609 36 -5.954 <.0001
A - C -0.0783 0.0609 36 -1.287 0.5772
A - D -0.2200 0.0609 36 -3.613 0.0049
B - C 0.2842 0.0609 36 4.667 0.0002
B - D 0.1425 0.0609 36 2.340 0.1077
C - D -0.1417 0.0609 36 -2.327 0.1108
Results are averaged over the levels of: poison
P value adjustment: tukey method for comparing a family of 4 estimates
%>%
aov_survival2 ::emmip( ~ treat,
emmeansCIs = TRUE) +
theme_bw() +
labs(x = "Type of
Treatment",
y = "Estmated Marginal Mean: Survival Time")