Two Independent Sample Means: t Test

class: center, middle, inverse, title-slide

# Two Independent Sample Means: t Test
## Cohen Chapter 7 .small[EDUC/PSY 6600]

---

class: center, middle

## “We cannot solve problems by using the same kind of thinking that we used when we created them.”

### -- Albert Einstein

---

## Introduction

> Same continuous (interval, ratio) .bluer[Dependent Variable (DV)] compared across 2 independent (random) samples

### .dcoral[Research Questions]

* Is there a significant .nicegreen[difference] between the 2 group means?

* Do the 2 samples come from normal distributions with .nicegreen[different] means?

### .dcoral[Also called]

* Independent-groups design
* Between-subjects design
* Between-groups design
* Randomized-groups design

---

### Hypothesis Test Steps

.pull-left[
1. .nicegreen[State the **Hypotheses**] 
 
2. .nicegreen[Select the **Statistical Test** & Significance Level] 
 
3. .nicegreen[Select random **samples** and collect data] 
 
4. .nicegreen[Find the **Region of Rejection**] 
 
5. .nicegreen[Calculate the **Test Statistic**] 
 
6. .nicegreen[Make the **Statistical Decision**]
 
]

.pull-right[

* Null: no mean difference .dcoral[*difference = 0*]
* Alternative: there *is* a mean difference
 
* α level, *default = .05*
* One vs. Two tails, *default = 2 tails*
 
* 2 simple random samples, *independent*
* 1 sample *divided* into independent groups
 
* Based on α & # of tails
* *only really needed when working by hand*
 
* Examples include: z, t, F, χ²
* .dcoral[*by computer, rely on the p-value instead*]
 
* big p-value: "no evidence of a difference"
* tiny p-value: "evidence of a difference"
* .coral[make sure it is **in context**]
]
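
---

### Finding the Region of Rejection in R

When working by hand, step 4 comes down to looking up a critical value for the chosen α and number of tails. The sketch below does that lookup with base R's `qt()`. The value `df = 8` is only an illustrative assumption (two groups of five under a pooled test), not something fixed by the steps above.

```r
alpha <- 0.05   # significance level (the default named on the previous slide)
df    <- 8      # assumed degrees of freedom, for illustration only

# Two-tailed test: split alpha across both tails
t_crit_two <- qt(1 - alpha / 2, df = df)

# One-tailed test: all of alpha sits in a single tail
t_crit_one <- qt(1 - alpha, df = df)

round(c(two_tailed = t_crit_two, one_tailed = t_crit_one), 3)
# about 2.306 (two-tailed) and 1.860 (one-tailed)
```

With two tails, reject the null when the absolute value of the test statistic exceeds the two-tailed critical value.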

---

### Test Statistic Format

`$$\text{Test Statistic} = \frac{\text{Estimate} - \text{Hypothesis}}{SE}$$`
--

.pull-left[

**For a Single MEAN**

.dcoral[
`$$\text{Test Statistic} = \frac{\text{Estimate}_{\text{Mean}} - \text{Hypothesis}_\text{Mean}}{SE_{\text{Mean}}}$$`
]

* Assume the population's SD...

`$$z = \frac{\bar{x} - \mu_0}{\frac{\sigma}{\sqrt{N}}}$$`
* Use the sample's SD...

`$$t=\frac{\bar{x} - \mu_0}{\frac{s}{\sqrt{N}}}$$`

`$$df = N - 1$$`
]

.pull-right[

**For the DIFFERENCE in Two Means**

.nicegreen[
`$$\text{Test Statistic} = \frac{\text{Estimate}_{\text{Difference}} - \text{Hypothesis}_\text{Difference}}{SE_{\text{Difference}}}$$`
]

`$$t = \frac{(\overline{x_1} - \overline{x_2}) - 0}{SED}$$`
> Note: use *z* instead of *t* **IF** N's > 100

* Null Hypothesis is .nicegreen["No Difference"]

`$$H_0:\overline{D}=0$$`
`$$H_0:\mu_1-\mu_2=0$$`

* The .coral[Standard Error for the Difference (SED)] & .coral[degrees of freedom] are more complex

]

---

### Homogeneity of Variance (HOV)

> **HOV**: both populations have the SAME spread

**Levene's Test** is used to check for evidence that HOV is violated (Null = nothing going on...HOV is fine)

$$
H_0: \sigma_1 = \sigma_2
$$

$$
H_1: \sigma_1 \ne \sigma_2
$$

> Note: Levene's doesn't work as well for small samples

.pull-left[
.dcoral[**IF p-value is BIGGER than alpha**]

- *F*(1, 98) = 0.78, *p* = .377

Does .nicegreen[**NOT**] provide evidence that HOV is violated

]

.pull-right[
.dcoral[**IF p-value is SMALLER than alpha**]

- *F*(1, 77) = 4.58, *p* = .036

.nicegreen[**DOES**] provide evidence that HOV is violated

]

---

.pull-left[

### .dcoral[Pooled Variance Test]

**Requirement**

* groups have the **SAME SD's** .bluer[(HOV)]

**Must be requested in R's `t.test()`** *(not the default)*

* `var.equal = TRUE`

**Degrees of Freedom**

`$$df_{pooled}=n_1+n_2-2$$`

**Standard Error**

`$$SE_{pooled}=\sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}$$`

> first, pool the variances

`$$s_p^2=\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}$$`

]

.pull-right[

### .dcoral[Separate Variance Test]

**Requirement**

* groups may have **different SD's**

**Default in R's `t.test()`**

* `var.equal = FALSE`

**Degrees of Freedom**

`$$\min(n_1,n_2)-1 \le df_{SV} \le n_1+n_2-2$$`

**Standard Error**

`$$SE_{SV}=\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}$$`

.nicegreen[
- Use is somewhat controversial
- Welch's *df* are more complex
- Easier to compute the SE by hand
- Use with unequal N's, especially small
]

]
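
---

### Welch's Degrees of Freedom

The bounds given for the separate variance *df* come from the Welch-Satterthwaite approximation, which is the formula R's `t.test()` applies when `var.equal = FALSE`. It is written here in the same notation as the standard error formulas for reference. It is the standard textbook form rather than something taken from these slides.

`$$df_{SV}=\frac{\left(\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1-1}+\frac{\left(s_2^2/n_2\right)^2}{n_2-1}}$$`

The result is usually not a whole number, which is why the software reports fractional degrees of freedom for this test.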

---

## Confidence Intervals

> .bluer[We construct the confidence interval for the DIFFERENCE between the 2 means]

**General Form for All Confidence Intervals**

`$$\text{Point Estimate} \quad \pm \quad CV_{\text{Estimate}} \times SE_{\text{Estimate}}$$`

**General Form for CIs for the Difference in Means**

`$$(\; \overline{x_1} - \overline{x_2} \;) \quad \pm \quad t_{CV} \times SE_{Diff}$$`

.pull-left[

### .dcoral[Pooled Variance CI]

* groups have the **SAME SD's** .bluer[(HOV)]

`$$(\; \overline{x_1} - \overline{x_2} \;) \quad \pm \quad t_{CV} \times \sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}$$`
]

.pull-right[

### .dcoral[Separate Variance CI]

* groups may have **different SD's**

`$$(\; \overline{x_1} - \overline{x_2} \;) \quad \pm \quad t_{CV} \times \sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}$$`

]
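
---

### Pooled Variance CI by Hand (sketch)

The pooled formula can be checked numerically from summary statistics alone. The helper `ci_diff_pooled()` below is a hypothetical function written for this slide, not part of any package. The numbers passed to it are the rounded group means and SDs from the Example 1 summary table later in this deck, so the result should agree with the `t.test()` interval only up to rounding.

```r
# Pooled-variance CI for a difference in means, from summary statistics.
# ci_diff_pooled() is a hypothetical helper, not a package function.
ci_diff_pooled <- function(m1, s1, n1, m2, s2, n2, conf = 0.95) {
  df   <- n1 + n2 - 2
  sp2  <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df   # pooled variance
  se   <- sqrt(sp2 * (1 / n1 + 1 / n2))              # SE of the difference
  t_cv <- qt(1 - (1 - conf) / 2, df = df)            # two-tailed critical value
  (m1 - m2) + c(lower = -1, upper = 1) * t_cv * se
}

# Rounded summary statistics from Example 1 (drug vs. placebo)
ci <- ci_diff_pooled(m1 = 3.80, s1 = 4.32, n1 = 5,
                     m2 = 7.80, s2 = 3.27, n2 = 5)
round(ci, 2)
# roughly -9.59 to 1.59
```

Because the interval contains 0, it leads to the same decision as a two-tailed test at α = .05.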

---

### Example 1) A Fully Randomized Experiment

In order to assess the efficacy of a new antidepressant drug, .dcoral[**10**] clinically depressed participants are **randomly assigned** to one of **TWO groups**.

* Assume that prior to introducing the treatments, the experimenter confirmed that the level of depression in the 2 groups was equal.

After 6 months, all participants are rated by a psychiatrist (**blind to participant assignment**) on their level of depression.

.pull-left[
.dcoral[Five] participants are assigned to Group 1, which is administered the .nicegreen[**antidepressant drug**] for 6 months.

.center[
.coral[
11, 1, 3, 4, 0
]
]

]

.pull-right[
The other .dcoral[five] participants are assigned to Group 2, which is administered a .nicegreen[**placebo**] during the same 6 month period.

.center[
.coral[
11, 11, 5, 8, 4
]
]

]

> **Research Question**: Is there evidence that the antidepressant drug reduces depression more than a placebo?

---

### Example 1) Entering Small Dataset by Hand

.pull-leftbig[
.dcoral[**Load Packages**]

```r
library(tidyverse)
library(furniture)
library(car)
```

.dcoral[**Enter the data**]

> Two people per row, but the rows are NOT actual pairs

```r
df1_wide <- data.frame(drug = c(11, 1, 3, 4, 0),
 placebo = c(11, 11, 5, 8, 4))
```

]

.pull-rightsmall[

.dcoral[**View the data frame**]

```r
df1_wide
```

```
  drug placebo
1   11      11
2    1      11
3    3       5
4    4       8
5    0       4
```

]

---

### Example 1) Pivoting Wide-to-Long

.pull-leftbig[

.dcoral[**Restructure the data frame**]

> We want one line per person

```r
df1_long <- df1_wide %>% 
  tidyr::pivot_longer(cols = c(drug, placebo),
                      names_to = "pill",                        # new grouping column
                      names_transform = list(pill = as.factor), # store the group as a factor
                      values_to = "depression")                 # new outcome column
```

]

.pull-rightsmall[

.dcoral[**View the data frame**]

```r
df1_long
```

```
# A tibble: 10 x 2
   pill    depression
   <fct>        <dbl>
 1 drug            11
 2 placebo         11
 3 drug             1
 4 placebo         11
 5 drug             3
 6 placebo          5
 7 drug             4
 8 placebo          8
 9 drug             0
10 placebo          4
```

]

---

### Example 1) Exploratory Data Analysis

.pull-left[

.dcoral[**Summarize:** *N, M, SD*]

```r
df1_long %>% 
  dplyr::group_by(pill) %>% 
  furniture::table1(depression,
                    digits = 2,
                    na.rm = FALSE,
                    output = "markdown")
```

|            |    drug     |   placebo   |
|------------|-------------|-------------|
|            |    n = 5    |    n = 5    |
| depression |             |             |
|            | 3.80 (4.32) | 7.80 (3.27) |

.nicegreen[Note: Normality is nearly impossible to assess in very small samples]

]

.pull-right[

.dcoral[**Visualize:** *boxplots*]

```r
df1_long %>% 
  ggplot(aes(x = pill,
             y = depression)) +
  geom_boxplot()
```

]

---

### Example 1) Levene's Test for HOV

.dcoral[**BEFORE the t-test, check for violations of HOV**]

```r
df1_long %>%                            # name of the data
  car::leveneTest(depression ~ pill,   # continuous DV ~ 2-group IV
                  data = .,            # pipe the data from above
                  center = mean)       # default is median
```

```
Levene's Test for Homogeneity of Variance (center = mean)
      Df F value Pr(>F)
group  1  0.0526 0.8244
       8               
```

.dcoral[**Conclusion**]

No evidence of a violation of HOV was found, *F*(1, 8) = 0.05, *p* = .824.

> Choose to do the pooled variance t-test (`var.equal = TRUE`)
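
---

### Example 1) Levene's Test by Hand (sketch)

With `center = mean`, Levene's test is a one-way ANOVA run on each score's absolute deviation from its own group mean. The base R sketch below rebuilds that calculation for the Example 1 data, re-entered as plain vectors so the block runs on its own. It is meant to show what `car::leveneTest()` is doing, not to replace it.

```r
# Example 1 data, as entered in df1_wide
dep  <- c(11, 1, 3, 4, 0, 11, 11, 5, 8, 4)
pill <- factor(rep(c("drug", "placebo"), each = 5))

# Absolute deviation of each score from its own group mean
abs_dev <- abs(dep - ave(dep, pill, FUN = mean))

# One-way ANOVA on the absolute deviations
lev <- anova(lm(abs_dev ~ pill))
lev
```

The *F* value and its degrees of freedom should match the `leveneTest()` output above (*F* of about 0.05 on 1 and 8 *df*).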

---

### Example 1) Run a Pooled Variance t-Test

.dcoral[**Run the POOLED variance t-test**]

```r
df1_long %>%                    # name of the data
  t.test(depression ~ pill,    # continuous DV ~ 2-group IV        
         data = .,             # pipe the data from above
         var.equal = TRUE)     # do the pooled-variance version
```

```

Two Sample t-test

data:  depression by pill
t = -1.6496, df = 8, p-value = 0.1376
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -9.591763  1.591763
sample estimates:
   mean in group drug mean in group placebo 
                  3.8                   7.8 
```

.dcoral[**Conclusion**]

No evidence of a difference in depression was found, *t*(8) = 1.65, *p* = .138.
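
---

### Example 1) Pooled t-Test by Hand (sketch)

To connect the output with the pooled variance formulas, the sketch below recomputes the test statistic step by step in base R. The data are re-entered as two vectors so the block is self-contained, and the variable names are chosen for this slide only.

```r
drug    <- c(11, 1, 3, 4, 0)
placebo <- c(11, 11, 5, 8, 4)
n1 <- length(drug)
n2 <- length(placebo)

df_pooled <- n1 + n2 - 2
sp2       <- ((n1 - 1) * var(drug) + (n2 - 1) * var(placebo)) / df_pooled  # pooled variance
se_pooled <- sqrt(sp2 * (1 / n1 + 1 / n2))                                 # SE of the difference

t_stat <- ((mean(drug) - mean(placebo)) - 0) / se_pooled   # estimate minus hypothesis, over SE
p_two  <- 2 * pt(-abs(t_stat), df = df_pooled)              # two-tailed p-value

round(c(t = t_stat, df = df_pooled, p = p_two), 4)
```

These values should reproduce the *t*, *df*, and p-value printed by `t.test()` above.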

---

### Example 1) Writing Methods & Results

.dcoral[**Methods Section - analysis**]

To test the effectiveness of the drug at reducing depression, an .nicegreen[**independent samples t-test**] was performed, with pill type *(antidepressant drug vs. placebo)* functioning as the .nicegreen[independent variable] and depression score as the .nicegreen[dependent variable].

.nicegreen[**Levene's test**] was used to assess the assumption of homogeneity of variance and to determine whether a .nicegreen[pooled] or .nicegreen[separate variance] test should be performed.

--

.dcoral[**Results Section**]

After six months, the .nicegreen[five] participants taking the drug scored numerically lower on the depression scale .dcoral[(*M* = 3.80, *SD* = 4.32)], compared to their .nicegreen[five] counterparts taking the placebo .dcoral[(*M* = 7.80, *SD* = 3.27)].

The assumption of homogeneity of variances was tested and satisfied.dcoral[, *F*(1, 8) = 0.05, *p* = .824], thus the pooled test was conducted.

No evidence was found to support the claim that this antidepressant changes depression.dcoral[, *t*(8) = 1.65, *p* = .138].

---

## Assumptions of a 2-sample t-Test

.large[**1. BOTH Samples were drawn at .dcoral[RANDOM] -AND- are .dcoral[INDEPENDENT] of each other** *(at least as representative as possible)*]

- Nothing can be done to fix NON-representative samples & this can .bluer[**NOT**] be tested statistically
- If not independent samples, use a *Paired-Samples t-Test*

.large[**2. The Continuous DV follows a .dcoral[NORMAL] distribution in BOTH populations**]

- .bluer[**NOT**] as important if the sample is large, due to the **Central Limit Theorem**
- .bluer[**CAN**] statistically test:
 - Visual inspection of a .nicegreen[histogram], .nicegreen[boxplot], and/or .nicegreen[QQ plot] *(straight 45 degree line)*
 - Calculate the Skewness & Kurtosis... less clear guidelines
 - Conduct .nicegreen[Shapiro-Wilk] test *(p < .05 → not normal)*

.large[**3. .dcoral[Homogeneity of Variance] (HOV)**]

- BOTH populations have the .bluer[SAME] spreads (*SD*s)
- Use .dcoral[Levene's Test] to assess for HOV (Null Hypothesis is HOV is valid)
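
---

### Checking Normality in R (sketch)

As one way to carry out the statistical check named under assumption 2, the sketch below runs base R's `shapiro.test()` separately within each group. It reuses the Example 1 scores purely for illustration. With only five scores per group the test has very little power, so a large p-value here says little about normality.

```r
# Example 1 scores, re-entered so the block runs on its own
dep  <- c(11, 1, 3, 4, 0, 11, 11, 5, 8, 4)
pill <- rep(c("drug", "placebo"), each = 5)

# Shapiro-Wilk test within each group (null hypothesis: the population is normal)
sw <- lapply(split(dep, pill), shapiro.test)

# Collect the W statistic and p-value for each group
sapply(sw, function(res) c(W = unname(res$statistic), p = res$p.value))
```

A p-value below α is evidence against normality in that group. Visual checks such as QQ plots remain useful alongside the test.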

---

## Violating the Assumptions

.pull-left[
.dcoral[**Equal Size Groups**] violations "hurt" less

Only violate .nicegreen[HOV]
- Small effect: *p* off by `$\pm.02$`

Only violate .nicegreen[Normality]
- Small effect: *p* off by `$\pm.02$`
- .bluer[**HOWEVER**: If samples are highly skewed or are skewed in opposite directions p-values can be **very** inaccurate!]

Violate .nicegreen[Both]
- Moderate effects if N is large, p-value can be inaccurate
- Large effects if N is small, p-value can be **very** inaccurate

]

.pull-right[
.dcoral[**UN-equal Size Groups**] violations have "BIG" impacts

Only violate .nicegreen[HOV]
- Large effect

Only violate .nicegreen[Normality]
- Large effect

Violate .nicegreen[Both]
- Huge effect

> .bluer[p-values can be **very** inaccurate with unequal *n*s AND violations of assumptions, especially when *N* is small]

]

---

## Alternatives for when you Violate Assumptions

Violation of .dcoral[Normality] or your DV is .dcoral[ordinal]...

> None of these are covered in this class

- Two Sample Wilcoxon test *(aka, .nicegreen[Mann-Whitney U Test])*

- Sample re-use methods

- Rely on empirical, rather than theoretical, probability distributions

* Exact statistical methods 
    * Permutation and randomization tests
    * Bootstrapping
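
---

### Alternatives in R (sketch)

Although these methods are outside the scope of the class, a short illustration may help make the list concrete. The sketch below applies base R's `wilcox.test()` and a simple hand-rolled permutation test to the Example 1 scores. The seed and the number of shuffles (5000) are arbitrary choices made for this slide.

```r
drug    <- c(11, 1, 3, 4, 0)
placebo <- c(11, 11, 5, 8, 4)

# Mann-Whitney U (Wilcoxon rank-sum) test; exact = FALSE because the data contain ties
mw <- wilcox.test(drug, placebo, exact = FALSE)

# Permutation test on the difference in means
set.seed(6600)                               # arbitrary seed, for reproducibility
scores    <- c(drug, placebo)
obs_diff  <- mean(drug) - mean(placebo)      # observed difference
perm_diff <- replicate(5000, {
  shuffled <- sample(scores)                 # re-deal the 10 scores at random
  mean(shuffled[1:5]) - mean(shuffled[6:10])
})

# Two-tailed p-value: share of shuffles at least as extreme as the observed difference
# (the small tolerance guards against floating-point ties)
p_perm <- mean(abs(perm_diff) >= abs(obs_diff) - 1e-12)

c(mann_whitney_p = mw$p.value, permutation_p = p_perm)
```

The permutation p-value comes from an empirical distribution built out of the data themselves, rather than from the theoretical *t* distribution.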

---

## Random Assignment & Limits on Conclusions

Random assignment to groups DECREASES experimenter biases & confounding
- Cases are enumerated and numbers drawn to assign groups

> Randomization does NOT ensure **equality** of group characteristics, but does allow for group differences to be **attributed** to the randomized factor.

**True Experiment**

- Random assignment & manipulation of IV
- Even better if it is .nicegreen[double blind]

**Quasi-experiment**

- Either randomization or manipulation, but not both
- Could randomize groups or clusters

**Non-experiment or Observational Study**

- Neither randomization nor manipulation
- Participants self-select or form naturally occurring groups
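
---

### Random Assignment in R (sketch)

The "enumerate the cases and draw numbers" procedure can be mimicked with base R's `sample()`. The sketch below assumes ten participants and two equal-sized groups, as in Example 1. The ID numbers, group labels, and seed are all hypothetical.

```r
set.seed(6600)                 # arbitrary seed so the assignment can be reproduced
ids <- 1:10                    # enumerate the cases (hypothetical participant IDs)

# Shuffle a balanced set of labels: exactly five participants end up in each group
group <- sample(rep(c("drug", "placebo"), each = 5))

assignment <- data.frame(id = ids, group = group)
table(assignment$group)        # confirm the group sizes
```

Shuffling a fixed set of labels guarantees equal group sizes, whereas flipping a coin per participant would not.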

---

### Example 2) A Fully Randomized Experiment

An industrial psychologist is investigating the effects of different types of **motivation** on the performance of simulated clerical **tasks**.

The **ten** participants in the .coral[“individual motivation”] sample are told that they will be rewarded according to how many tasks .dcoral[**they**] successfully complete.

The **12** participants in the .coral[“group motivation”] sample are told that they will be rewarded according to the average number of tasks completed .dcoral[**by all the participants**] in their sample.

The number of tasks completed by each participant is as follows:

* Individual Motivation (`Self`): .nicegreen[11, 17, 14, 12, 11, 15, 13, 12, 15, 16]
* Group Motivation (`Collective`): .nicegreen[10, 15, 4, 8, 9, 14, 6, 15, 7, 11, 13, 5]

> **Research Question**: Is there evidence that performance on clerical tasks is affected by the type of motivation?

---

### Example 2) Entering Small Dataset by Hand

> Do NOT worry about this...I will do this part in the assignments when needed

```r
df_self <- data.frame(motivation = "Self",
 tasks_done = c(11, 17, 14, 12, 11, 15, 13, 12, 15, 16))
```

```r
df_coll <- data.frame(motivation = "Collective",
 tasks_done = c(10, 15, 4, 8, 9, 14, 6, 15, 7, 11, 13, 5))
```

```r
df2_long <- dplyr::bind_rows(df_self, df_coll) %>%   # stack the two groups
 dplyr::mutate(motivation = factor(motivation))
```

```r
tibble::glimpse(df2_long)
```

```
Rows: 22
Columns: 2
$ motivation <fct> Self, Self, Self, Self, Self, Self, Self, Self, Self, Se...
$ tasks_done <dbl> 11, 17, 14, 12, 11, 15, 13, 12, 15, 16, 10, 15, 4, 8, 9,...
```

---

### Example 2) Exploratory Data Analysis

.pull-left[

.dcoral[**Summarize:** *N, M, SD*]

```r
df2_long %>% 
  dplyr::group_by(motivation) %>% 
  furniture::table1(tasks_done,
                    digits = 2,
                    output = "markdown")
```

|            | Collective  |     Self     |
|------------|-------------|--------------|
|            |   n = 12    |    n = 10    |
| tasks_done |             |              |
|            | 9.75 (3.89) | 13.60 (2.12) |

]

.pull-right[

.dcoral[**Visualize:** *boxplots*]

```r
df2_long %>% 
  ggplot(aes(x = motivation,
             y = tasks_done)) +
  geom_boxplot()
```

]

---

### Example 2) Levene's Test for HOV

.dcoral[**BEFORE the t-test, check for violations of HOV**]

```r
df2_long %>%                                 # name of the data
  car::leveneTest(tasks_done ~ motivation,   # continuous DV ~ 2-group IV
                  data = .,                  # pipe the data from above
                  center = mean)             # default is median
```

```
Levene's Test for Homogeneity of Variance (center = mean)
      Df F value  Pr(>F)  
group  1  4.8287 0.03994 *
      20                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

.dcoral[**Conclusion**]

Evidence of a violation of HOV was found, *F*(1, 20) = 4.83, *p* = .040.
Also, samples are both small and unequal in size.

> Choose to do the separate variance t-test (`var.equal = FALSE`)

---

### Example 2) Run a Separate Variance t-Test

.dcoral[**Run the SEPARATE variance t-test**]

```r
df2_long %>%                        # name of the data
  t.test(tasks_done ~ motivation,   # continuous DV ~ 2-group IV        
         data = .,                  # pipe the data from above
         var.equal = FALSE)         # do the separate-variance version
```

```

Welch Two Sample t-test

data:  tasks_done by motivation
t = -2.9456, df = 17.518, p-value = 0.008833
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -6.601413 -1.098587
sample estimates:
mean in group Collective       mean in group Self 
                    9.75                    13.60 
```

.dcoral[**Conclusion**]

Evidence of a difference in performance was found, *t*(17.52) = 2.95, *p* = .009.
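
---

### Example 2) Separate Variance t-Test by Hand (sketch)

The fractional degrees of freedom in the output come from the Welch-Satterthwaite approximation. The sketch below recomputes the standard error, *df*, and *t* in base R from the raw scores, re-entered as two vectors so the block is self-contained. The variable names are chosen for this slide only.

```r
self <- c(11, 17, 14, 12, 11, 15, 13, 12, 15, 16)
coll <- c(10, 15, 4, 8, 9, 14, 6, 15, 7, 11, 13, 5)

v_coll <- var(coll) / length(coll)   # s^2 / n for the Collective group
v_self <- var(self) / length(self)   # s^2 / n for the Self group

se_sv <- sqrt(v_coll + v_self)       # separate-variance SE of the difference

# Welch-Satterthwaite degrees of freedom
df_sv <- (v_coll + v_self)^2 /
  (v_coll^2 / (length(coll) - 1) + v_self^2 / (length(self) - 1))

t_stat <- (mean(coll) - mean(self)) / se_sv     # Collective minus Self, as in the output
p_two  <- 2 * pt(-abs(t_stat), df = df_sv)      # two-tailed p-value

round(c(t = t_stat, df = df_sv, p = p_two), 4)
```

The *df* falls between min(*n*) - 1 = 9 and *n*1 + *n*2 - 2 = 20, as the bounds on the separate variance slide require.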

---

### Example 2) Writing Methods & Results

.dcoral[**Methods Section - analysis**]

To test if performance on clerical tasks is affected by the type of motivation, an .nicegreen[**independent samples t-test**] was performed, with type of motivation *(self vs. collective)* functioning as the .nicegreen[independent variable] and number of tasks completed as the .nicegreen[dependent variable].

.nicegreen[**Levene's test**] was used to assess the assumption of homogeneity of variance and to determine whether a .nicegreen[pooled] or .nicegreen[separate variance] test should be performed.

--

.dcoral[**Results Section**]

The individually motivated participants completed more clerical tasks .dcoral[(*n* = 10, *M* = 13.60, *SD* = 2.12)], compared to the participants being motivated by their group's collective results .dcoral[(*n* = 12, *M* = 9.75, *SD* = 3.89)].

The assumption of homogeneity of variances was tested and evidence of a violation was found.dcoral[, *F*(1, 20) = 4.83, *p* = .040], thus the separate variance test was conducted.

Statistically significant evidence was found to support the claim that type of motivation affects performance.  Thus, individual motivation does result in a mean .coral[3.85] additional tasks completed compared to group motivation.dcoral[, *t*(17.52) = 2.95, *p* = .009, 95% *CI* [1.10, 6.60]].