class: center, middle, inverse, title-slide

# Two Independent Sample Means: t Test
## Cohen Chapter 7
.small[EDUC/PSY 6600]

---
class: center, middle

## “We cannot solve problems by using the same kind of thinking that we used when we created them.”

### -- Albert Einstein

---
## Introduction

> Same continuous (interval, ratio) .bluer[Dependent Variable (DV)] compared across 2 independent (random) samples

--

### .dcoral[Research Questions]

* Is there a significant .nicegreen[difference] between the 2 group means?
* Do 2 samples come from .nicegreen[different] normal distributions with different means?

--

### .dcoral[Also called]

* Independent-groups design
* Between-subjects design
* Between-groups design
* Randomized-groups design

---

<!-- Research by Design: How to Compare Two Samples Using t Tests (11-1) (5 min)-->

<iframe width="1000" height="750" src="https://www.youtube.com/embed/uMLPR8aTYgQ?controls=0&start=2" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

---

<!-- Dr. Nic: Hypothesis testing: step-by-step, p-value, t-test for difference of two means - Statistics Help (7.5 min)-->

<iframe width="1000" height="750" src="https://www.youtube.com/embed/0zZYBALbZgg?controls=0&start=2" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

---
### Hypothesis Test Steps

.pull-left[

1. .nicegreen[State the **Hypotheses**]
<br><br>
2. .nicegreen[Select the **Statistical Test** & Significance Level]
<br><br>
3. .nicegreen[Select random **samples** and collect data]
<br><br>
4. .nicegreen[Find the **Region of Rejection**]
<br><br>
5. .nicegreen[Calculate the **Test Statistic**]
<br><br>
6. .nicegreen[Make the **Statistical Decision**]
<br><br>

]

--

.pull-left[

* Null: no mean difference .dcoral[*difference = 0*]
* Alternative: there *is* a mean difference
<br><br>
* α level, *default = .05*
* One vs.
Two tails, *default = 2 tails*
<br><br>
* 2 simple random samples, *independent*
* 1 sample *divided* into independent groups
<br><br>
* Based on α & # of tails
* *only really do if working by hand*
<br><br>
* Examples include: z, t, F, χ2
* .dcoral[*use a p-value more by computer*]
<br><br>
* big p-value: "no evidence of a difference"
* tiny p-value: "evidence of a difference"
* .coral[make sure it is **in context**]

]

---
### Test Statistic Format

`$$\text{Test Statistic} = \frac{\text{Estimate} - \text{Hypothesis}}{SE}$$`

--

.pull-left[

**For a Single MEAN**

.dcoral[
`$$\text{Test Statistic} = \frac{\text{Estimate}_{\text{Mean}} - \text{Hypothesis}_\text{Mean}}{SE_{\text{Mean}}}$$`
]

* Assume the population's SD...

`$$z = \frac{\bar{x} - \mu_0}{\frac{\sigma}{\sqrt{N}}}$$`

* Use the sample's SD...

`$$t=\frac{\bar{x} - \mu_0}{\frac{s}{\sqrt{N}}}$$`

`$$df = N - 1$$`

]

--

.pull-right[

**For the DIFFERENCE in Two Means**

.nicegreen[
`$$\text{Test Statistic} = \frac{\text{Estimate}_{\text{Difference}} - \text{Hypothesis}_\text{Difference}}{SE_{\text{Difference}}}$$`
]

`$$t = \frac{(\overline{x_1} - \overline{x_2}) - 0}{SED}$$`

> Note: use *z* instead of *t* **IF** N's > 100

* Null Hypothesis is .nicegreen["No Difference"]

`$$H_0:\overline{D}=0$$`

`$$H_0:\mu_1-\mu_2=0$$`

* The .coral[Standard Error for the Difference (SED)] & .coral[degrees of freedom] are more complex

]

---
### Homogeneity of Variance (HOV)

> **HOV**: both populations have the SAME spread

**Levene's Test** is used to check for evidence that HOV is violated (Null = nothing going on...HOV is fine)

$$ H_0: \sigma_1 = \sigma_2 $$
$$ H_1: \sigma_1 \ne \sigma_2 $$

> Note: Levene's doesn't work as well for small samples

--

.pull-left[

.dcoral[**IF p-value is BIGGER than alpha**]

- *F*(1, 98) = 0.78, *p* = .377

Does .nicegreen[**NOT**] provide evidence that HOV is violated

]

--

.pull-right[

.dcoral[**IF p-value is SMALLER than alpha**]

- *F*(1, 77) = 4.58, *p* = .013

.nicegreen[**DOES**] provide evidence that HOV
is violated

]

---

.pull-left[

### .dcoral[Pooled Variance Test]

**Requirement**

* groups have the **SAME SD's** .bluer[(HOV)]

**Optional in R's `t.test()`**

* `var.equal = TRUE`

**Degrees of Freedom**

`$$df_{pooled}=n_1+n_2-2$$`

**Standard Error**

`$$SE_{pooled}=\sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}$$`

> first, pool the variances

`$$s_p^2=\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}$$`

]

--

.pull-right[

### .dcoral[Separate Variance Test]

**Requirement**

* groups may have **different SD's**

**Default in R's `t.test()`**

* `var.equal = FALSE`

**Degrees of Freedom**

`$$min(n_1,n_2)-1<df_{SV}<n_1+n_2-2$$`

**Standard Error**

`$$SE_{SV}=\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}$$`

.nicegreen[
- Use is somewhat controversial
- Welch's *df* are more complex
- Easier to compute the SE by hand
- Use with unequal N's, especially small
]

]

---
## Confidence Intervals

> .bluer[We construct the confidence interval for the DIFFERENCE between the 2 means]

--

**General Form for All Confidence Intervals**

`$$\text{Point Estimate} \quad \pm \quad CV_{\text{Estimate}} \times SE_{\text{Estimate}}$$`

--

**General Form for CIs for the Difference in Means**

`$$(\; \overline{x_1} - \overline{x_2} \;) \quad \pm \quad t_{CV} \times SE_{Diff}$$`

--

.pull-left[

### .dcoral[Pooled Variance CI]

* groups have the **SAME SD's** .bluer[(HOV)]

`$$(\; \overline{x_1} - \overline{x_2} \;) \quad \pm \quad t_{CV} \times \sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}$$`

]

--

.pull-right[

### .dcoral[Separate Variance CI]

* groups may have **different SD's**

`$$(\; \overline{x_1} - \overline{x_2} \;) \quad \pm \quad t_{CV} \times \sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}$$`

]

---

<img src="figures/formula_sheet_p1.png" width="85%" style="display: block; margin: auto;" />

---

<img src="figures/formula_sheet_p1_1sample.jpg" width="85%" style="display: block; margin: auto;" />

---

<img src="figures/formula_sheet_p1_2samples.jpg" width="85%" style="display: block; margin: auto;"
/>

---
### Example 1) A Fully Randomized Experiment

In order to assess the efficacy of a new antidepressant drug, .dcoral[**10**] clinically depressed participants are **randomly assigned** to one of **TWO groups**.

--

* Assume that prior to introducing the treatments, the experimenter confirmed that the level of depression in the 2 groups was equal.

--

After 6 months, all participants are rated by a psychiatrist (**blind to participant assignment**) on their level of depression.

--

.pull-left[

.dcoral[Five] participants are assigned to Group 1, which is administered the .nicegreen[**antidepressant drug**] for 6 months.

.center[ .coral[ 11, 1, 3, 4, 0 ] ]

]

--

.pull-right[

The other .dcoral[five] participants are assigned to Group 2, which is administered a .nicegreen[**placebo**] during the same 6 month period.

.center[ .coral[ 11, 11, 5, 8, 4 ] ]

]

--

> **Research Question**: Is there evidence that the antidepressant drug reduces depression more than a placebo?

---
### Example 1) Entering Small Dataset by Hand

.pull-leftbig[

.dcoral[**Load Packages**]

```r
library(tidyverse)
library(furniture)
library(car)
```

.dcoral[**Enter the data**]

> Each row holds two people, but they are not actually paired

```r
df1_wide <- data.frame(drug    = c(11,  1, 3, 4, 0),
                       placebo = c(11, 11, 5, 8, 4))
```

]

--

.pull-leftsmall[

.dcoral[**View the data frame**]

```r
df1_wide
```

```
  drug placebo
1   11      11
2    1      11
3    3       5
4    4       8
5    0       4
```

]

---
### Example 1) Pivoting Wide-to-Long

.pull-leftbig[

.dcoral[**Restructure the data frame**]

> We want one line per person

```r
df1_long <- df1_wide %>% 
  tidyr::pivot_longer(cols = c(drug, placebo),   # columns to stack
                      names_to = "pill",         # old column names go here
                      values_to = "depression")  # old cell values go here
```

]

--

.pull-rightsmall[

.dcoral[**View the data frame**]

```r
df1_long
```

```
# A tibble: 10 x 2
   pill    depression
   <chr>        <dbl>
 1 drug            11
 2 placebo         11
 3 drug             1
 4 placebo         11
 5 drug             3
 6 placebo          5
 7 drug             4
 8 placebo          8
 9 drug             0
10 placebo          4
```

]

---
### Example 1) Exploratory Data Analysis
.pull-left[

.dcoral[**Summarize:** *N, M, SD*]

```r
df1_long %>% 
  dplyr::group_by(pill) %>% 
  furniture::table1(depression,
                    digits = 2,
                    na.rm = FALSE,
                    output = "markdown")
```

|            | drug        | placebo     |
|------------|-------------|-------------|
|            | n = 5       | n = 5       |
| depression |             |             |
|            | 3.80 (4.32) | 7.80 (3.27) |

.nicegreen[Note: Normality is nearly impossible to assess in very small samples]

]

--

.pull-right[

.dcoral[**Visualize:** *boxplots*]

```r
df1_long %>% 
  ggplot(aes(x = pill, y = depression)) +
  geom_boxplot()
```

<img src="ch7_t-test_2means_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />

]

---
### Example 1) Levene's Test for HOV

.dcoral[**BEFORE the t-test, check for violations of HOV**]

--

```r
df1_long %>%                          # name of the data
  car::leveneTest(depression ~ pill,  # continuous DV ~ 2-group IV
                  data = .,           # pipe the data from above
                  center = mean)      # default is median
```

```
Levene's Test for Homogeneity of Variance (center = mean)
      Df F value Pr(>F)
group  1  0.0526 0.8244
       8               
```

<br><br>

--

.dcoral[**Conclusion**]

--

No evidence of a violation of HOV was found, *F*(1, 8) = 0.05, *p* = .824.

--

> Choose to do the pooled variance t-test (`var.equal = TRUE`)

---
### Example 1) Run a Pooled Variance t-Test

.dcoral[**Run the POOLED variance t-test**]

--

```r
df1_long %>%                  # name of the data
  t.test(depression ~ pill,   # continuous DV ~ 2-group IV
         data = .,            # pipe the data from above
         var.equal = TRUE)    # do the pooled-variance version
```

```
	Two Sample t-test

data:  depression by pill
t = -1.6496, df = 8, p-value = 0.1376
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -9.591763  1.591763
sample estimates:
   mean in group drug mean in group placebo 
                  3.8                   7.8 
```

--

.dcoral[**Conclusion**]

--

No evidence of a difference in depression was found, *t*(8) = -1.65, *p* = .138.
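---
### Example 1) Checking the Pooled t-Test by Hand

As a rough cross-check, the pooled-variance formulas from the earlier slides can be applied directly to the ten scores entered in `df1_wide`. This is only a sketch in base R; the object names (`sp2`, `se_pooled`, `t_stat`, and so on) are illustrative and not part of the course materials.

```r
# Scores as entered on the data-entry slide (df1_wide)
drug    <- c(11,  1, 3, 4, 0)
placebo <- c(11, 11, 5, 8, 4)

n1 <- length(drug)
n2 <- length(placebo)

# Pooled variance: df-weighted average of the two sample variances
sp2 <- ((n1 - 1) * var(drug) + (n2 - 1) * var(placebo)) / (n1 + n2 - 2)

# Standard error of the difference and pooled degrees of freedom
se_pooled <- sqrt(sp2 * (1 / n1 + 1 / n2))
df_pooled <- n1 + n2 - 2

# Test statistic (estimate - hypothesis) / SE, with a two-tailed p-value
mean_diff <- mean(drug) - mean(placebo)
t_stat    <- (mean_diff - 0) / se_pooled
p_value   <- 2 * pt(-abs(t_stat), df = df_pooled)

# 95% confidence interval: estimate +/- critical value * SE
ci_95 <- mean_diff + c(-1, 1) * qt(0.975, df = df_pooled) * se_pooled

round(c(t = t_stat, df = df_pooled, p = p_value), 4)
round(ci_95, 3)
```

The printed values should agree with the `t.test(..., var.equal = TRUE)` output on the previous slide.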
---
### Example 1) Writing Methods & Results

.dcoral[**Methods Section - analysis**]

--

To test the effectiveness of the drug at reducing depression, an .nicegreen[**independent samples t-test**] was performed, with pill type *(antidepressant drug vs. placebo)* functioning as the .nicegreen[independent variable] and depression score as the .nicegreen[dependent variable].

--

.nicegreen[**Levene's test**] assessed the assumption of homogeneity of variance to determine whether a .nicegreen[pooled] or .nicegreen[separate variance] test should be performed.

<br><br>

--

.dcoral[**Results Section**]

--

After six months, the .nicegreen[five] participants taking the drug scored numerically lower on the depression scale .dcoral[(*M* = 3.80, *SD* = 4.32)], compared to their .nicegreen[five] counterparts taking the placebo .dcoral[(*M* = 7.80, *SD* = 3.27)].

--

The assumption of homogeneity of variances was tested and satisfied.dcoral[, *F*(1, 8) = 0.05, *p* = .824], thus the pooled test was conducted.

--

No evidence was found to support the claim that this antidepressant changes depression.dcoral[, *t*(8) = -1.65, *p* = .138].

---
## Assumptions of a 2-sample t-Test

.large[**1. BOTH Samples were drawn at .dcoral[RANDOM] -AND- are .dcoral[INDEPENDENT] of each other** *(at least as representative as possible)*]

- Nothing can be done to fix NON-representative samples & you can .bluer[**NOT**] statistically test for representativeness
- If not independent samples, use a *Paired-Samples t-Test*

--

.large[**2. The Continuous DV follows a .dcoral[NORMAL] distribution in BOTH populations**]

- .bluer[**NOT**] as important if the sample is large, due to the **Central Limit Theorem**
- .bluer[**CAN**] statistically test:
    - Visual inspection of a .nicegreen[histogram], .nicegreen[boxplot], and/or .nicegreen[QQ plot] *(straight 45 degree line)*
    - Calculate the Skewness & Kurtosis... less clear guidelines
    - Conduct .nicegreen[Shapiro-Wilk] test *(p < .05 → not normal)*

--

.large[**3.
.dcoral[Homogeneity of Variance] (HOV)**]

- BOTH populations have the .bluer[SAME] spreads (*SD*s)
- Use .dcoral[Levene's Test] to assess for HOV (Null Hypothesis is HOV is valid)

---
## Violating the Assumptions

.pull-left[

.dcoral[**Equal Size Groups**] violations "hurt" less

Only violate .nicegreen[HOV]

- Small effect: *p* off by `\(\pm.02\)`

Only violate .nicegreen[Normality]

- Small effect: *p* off by `\(\pm.02\)`
- .bluer[**HOWEVER**: If samples are highly skewed or are skewed in opposite directions p-values can be **very** inaccurate!]

Violate .nicegreen[Both]

- Moderate effects if N is large, p-value can be inaccurate
- Large effects if N is small, p-value can be **very** inaccurate

]

--

.pull-right[

.dcoral[**UN-equal Size Groups**] violations have "BIG" impacts

Only violate .nicegreen[HOV]

- Large effect

Only violate .nicegreen[Normality]

- Large effect

Violate .nicegreen[Both]

- Huge effect

> .bluer[p-values can be **very** inaccurate with unequal *n*s AND violations of assumptions, especially when *N* is small]

]

---
## Alternatives for when you Violate Assumptions

Violation of .dcoral[Normality] or your DV is .dcoral[ordinal]...

> None of these are covered in this class

- Two Sample Wilcoxon test *(aka, .nicegreen[Mann-Whitney U Test])*
- Sample re-use methods
    - Rely on empirical, rather than theoretical, probability distributions
    * Exact statistical methods
    * Permutation and randomization tests
    * Bootstrapping

---
## Random Assignment & Limits on Conclusions

Random assignment to groups DECREASES experimenter biases & confounding

- Cases are enumerated and numbers drawn to assign groups

> Randomization does NOT ensure **equality** of group characteristics, but does allow for group differences to be **attributed** to the randomized factor.
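--

The "enumerate the cases and draw numbers" step can be sketched in base R as follows. This is only an illustration: the ten participant IDs, the seed, and the `drug`/`placebo` labels are assumptions chosen to mirror Example 1, not a record of how any real study was randomized.

```r
set.seed(6600)                  # arbitrary seed so the draw is reproducible

participant_id <- 1:10          # enumerate the cases

# Shuffle a balanced set of labels so exactly five people land in each group
assignment <- data.frame(id   = participant_id,
                         pill = sample(rep(c("drug", "placebo"), each = 5)))

table(assignment$pill)          # confirm the group sizes
```

Shuffling a balanced vector of labels, rather than flipping a coin per person, keeps the group sizes equal, which matters given how much harder unequal groups are hit by assumption violations.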
--

**True Experiment**

- Random assignment & manipulation of IV
- Even better if it is .nicegreen[double blind]

**Quasi-experiment**

- Either randomization or manipulation, but not both
- Could randomize groups or clusters

**Non-experiment or Observational Study**

- Neither randomization nor manipulation
- Participants self-select or form naturally occurring groups

---
### Example 2) A Fully Randomized Experiment

An industrial psychologist is investigating the effects of different types of **motivation** on the performance of simulated clerical **tasks**.

--

The **ten** participants in the .coral[“individual motivation”] sample are told that they will be rewarded according to how many tasks .dcoral[**they**] successfully complete.

--

The **12** participants in the .coral[“group motivation”] sample are told that they will be rewarded according to the average number of tasks completed .dcoral[**by all the participants**] in their sample.

--

The number of tasks completed by each participant is as follows:

* Individual Motivation (`Self`): .nicegreen[11, 17, 14, 12, 11, 15, 13, 12, 15, 16]
* Group Motivation (`Collective`): .nicegreen[10, 15, 4, 8, 9, 14, 6, 15, 7, 11, 13, 5]

--

> **Research Question**: Is there evidence that performance on clerical tasks is affected by the type of motivation?

---
### Example 2) Entering Small Dataset by Hand

> Do NOT worry about this...I will do this part in the assignments when needed

```r
df_self <- data.frame(motivation = "Self",
                      tasks_done = c(11, 17, 14, 12, 11, 15, 13, 12, 15, 16))
```

```r
df_coll <- data.frame(motivation = "Collective",
                      tasks_done = c(10, 15, 4, 8, 9, 14, 6, 15, 7, 11, 13, 5))
```

```r
# Stack the two groups into one long data frame (one row per person)
df2_long <- dplyr::bind_rows(df_self, df_coll) %>% 
  dplyr::mutate(motivation = factor(motivation))
```

```r
tibble::glimpse(df2_long)
```

```
Rows: 22
Columns: 2
$ motivation <fct> Self, Self, Self, Self, Self, Self, Self, Self, Self, Se...
$ tasks_done <dbl> 11, 17, 14, 12, 11, 15, 13, 12, 15, 16, 10, 15, 4, 8, 9,...
```

---
### Example 2) Exploratory Data Analysis

.pull-left[

.dcoral[**Summarize:** *N, M, SD*]

```r
df2_long %>% 
  dplyr::group_by(motivation) %>% 
  furniture::table1(tasks_done,
                    digits = 2,
                    output = "markdown")
```

|            | Collective  | Self         |
|------------|-------------|--------------|
|            | n = 12      | n = 10       |
| tasks_done |             |              |
|            | 9.75 (3.89) | 13.60 (2.12) |

]

--

.pull-right[

.dcoral[**Visualize:** *boxplots*]

```r
df2_long %>% 
  ggplot(aes(x = motivation, y = tasks_done)) +
  geom_boxplot()
```

<img src="ch7_t-test_2means_files/figure-html/unnamed-chunk-18-1.png" style="display: block; margin: auto;" />

]

---
### Example 2) Levene's Test for HOV

.dcoral[**BEFORE the t-test, check for violations of HOV**]

--

```r
df2_long %>%                                # name of the data
  car::leveneTest(tasks_done ~ motivation,  # continuous DV ~ 2-group IV
                  data = .,                 # pipe the data from above
                  center = mean)            # default is median
```

```
Levene's Test for Homogeneity of Variance (center = mean)
      Df F value  Pr(>F)  
group  1  4.8287 0.03994 *
      20                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

<br><br>

--

.dcoral[**Conclusion**]

--

Evidence of a violation of HOV was found, *F*(1, 20) = 4.83, *p* = .040. Also, the samples are both small and unequal in size.

--

> Choose to do the separate variance t-test (`var.equal = FALSE`)

---
### Example 2) Run a Separate Variance t-Test

.dcoral[**Run the SEPARATE variance t-test**]

--

```r
df2_long %>%                        # name of the data
  t.test(tasks_done ~ motivation,   # continuous DV ~ 2-group IV
         data = .,                  # pipe the data from above
         var.equal = FALSE)         # do the separate-variance version
```

```
	Welch Two Sample t-test

data:  tasks_done by motivation
t = -2.9456, df = 17.518, p-value = 0.008833
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -6.601413 -1.098587
sample estimates:
mean in group Collective       mean in group Self 
                    9.75                    13.60 
```

--

.dcoral[**Conclusion**]

--

Evidence of a difference in performance was found, *t*(17.52) = -2.95, *p* = .009.
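---
### Example 2) Checking Welch's df by Hand

The slides note that Welch's *df* are more complex than the pooled version. A minimal base R sketch of that calculation is shown here, using the Welch-Satterthwaite approximation (the formula R's `t.test()` applies when `var.equal = FALSE`). The object names (`v_coll`, `v_self`, `df_sv`, and so on) are illustrative only.

```r
# Scores from the example slide
self       <- c(11, 17, 14, 12, 11, 15, 13, 12, 15, 16)
collective <- c(10, 15, 4, 8, 9, 14, 6, 15, 7, 11, 13, 5)

n_self <- length(self)
n_coll <- length(collective)

# Each group's contribution to the squared standard error: s^2 / n
v_self <- var(self) / n_self
v_coll <- var(collective) / n_coll

# Separate-variance standard error and test statistic (Collective - Self,
# matching the alphabetical group order in the t.test() output)
se_sv  <- sqrt(v_coll + v_self)
t_stat <- (mean(collective) - mean(self)) / se_sv

# Welch-Satterthwaite approximation to the degrees of freedom
df_sv <- (v_coll + v_self)^2 /
  (v_coll^2 / (n_coll - 1) + v_self^2 / (n_self - 1))

p_value <- 2 * pt(-abs(t_stat), df = df_sv)

round(c(t = t_stat, df = df_sv, p = p_value), 4)
```

The resulting *df* falls between `min(n_1, n_2) - 1` and `n_1 + n_2 - 2`, as stated on the Separate Variance Test slide, and should agree with the Welch output on the previous slide.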
---
### Example 2) Writing Methods & Results

.dcoral[**Methods Section - analysis**]

--

To test if performance on clerical tasks is affected by the type of motivation, an .nicegreen[**independent samples t-test**] was performed, with type of motivation *(self vs. collective)* functioning as the .nicegreen[independent variable] and number of tasks completed as the .nicegreen[dependent variable].

--

.nicegreen[**Levene's test**] assessed the assumption of homogeneity of variance to determine whether a .nicegreen[pooled] or .nicegreen[separate variance] test should be performed.

<br>

--

.dcoral[**Results Section**]

--

The individually motivated participants completed more clerical tasks .dcoral[(*n* = 10, *M* = 13.60, *SD* = 2.12)], compared to the participants being motivated by their group's collective results .dcoral[(*n* = 12, *M* = 9.75, *SD* = 3.89)].

--

The assumption of homogeneity of variances was tested and evidence of a violation was found.dcoral[, *F*(1, 20) = 4.83, *p* = .040], thus the separate variance test was conducted.

--

Statistically significant evidence was found to support the claim that type of motivation affects performance. Individual motivation resulted in a mean of .coral[3.85] additional tasks completed compared to group motivation.dcoral[, *t*(17.52) = 2.95, *p* = .009, 95% *CI* [1.10, 6.60]].
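
---
### Example 2) Reporting the Difference as Self minus Collective

The write-up reports a positive difference and interval, whereas the formula-based `t.test()` output orders the groups alphabetically (Collective minus Self) and so prints negative values. One way to obtain the reported orientation, sketched here in base R, is to pass the two score vectors directly with `Self` first. The object names (`welch_fit`, `mean_diff`) are illustrative only.

```r
# Scores from the example slide
self       <- c(11, 17, 14, 12, 11, 15, 13, 12, 15, 16)
collective <- c(10, 15, 4, 8, 9, 14, 6, 15, 7, 11, 13, 5)

# x - y is the direction of the difference, so Self goes first
welch_fit <- t.test(x = self, y = collective, var.equal = FALSE)

# Mean difference, t statistic, and 95% CI in the Self - Collective direction
mean_diff <- unname(welch_fit$estimate[1] - welch_fit$estimate[2])
t_stat    <- unname(welch_fit$statistic)
ci_95     <- as.numeric(welch_fit$conf.int)

round(c(diff = mean_diff, t = t_stat), 2)
round(ci_95, 2)
```

Only the signs change relative to the earlier output; the magnitude of *t*, the *df*, and the *p*-value are the same in either direction.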