Matched t-Tests

class: center, middle, inverse, title-slide

# Matched t-Tests
## Cohen Chapter 11 <br><br> .small[EDUC/PSY 6600]

---

class: center, middle

## "…we are suffering from a plethora of surmise, conjecture, and hypothesis. The difficulty is to detach the framework of fact *– of absolute undeniable fact –* from the embellishments of theorists and reporters."

#### - Sherlock Holmes, *Silver Blaze*

---
## Compare Two Means

### Independent Groups *t*-Test (Chap. 7)

- Assumes two .bluer["Simple Random Samples"] (SRS)

- Each sample is at least **representative** of its corresponding population

- All members of **EACH** of the samples are .dcoral[**INDEPENDENT**] of all others

- .nicegreen[**There is no connection between members of the two samples**]

### Dependent Groups *t*-Test (Chap. 11)

- The assumption of .dcoral[**independence**] has been VIOLATED, resulting in a .nicegreen[**DEPENDENCY**] across groups

- The variance of the DV is smaller because the groups consist of the same or closely matched cases

---

## "All models are wrong, some models are useful."

The first record of .nicegreen[George Box] saying .dcoral[**"all models are wrong"**] is in a 1976 paper published in the *Journal of the American Statistical Association*. The 1976 paper contains the aphorism twice. The two sections of the paper that contain the aphorism are copied below.

<br>

--
**Parsimony**

Since all models are wrong the scientist cannot obtain a "correct" one by excessive elaboration. On the contrary following .nicegreen[William of Occam] he should seek an economical description of natural phenomena. Just as the ability to devise .dcoral[**simple but evocative models**] is the signature of the great scientist so .bluer[overelaboration] and .bluer[overparameterization] is often the mark of mediocrity.

**Worrying Selectively**

Since all models are wrong the scientist must be alert to .dcoral[**what is importantly wrong**]. It is inappropriate to be concerned about mice when there are tigers abroad.

--
.centered[.dcoral[.large[
**IF** there is correlation, **THEN** your model should leverage it!
]]]

---

.bluer[**Observational - Longitudinal Repeated Measures: Collect Same Measure Before-and-After**]

Dr. Filburn wishes to assess the **effectiveness** of a leadership workshop for 60 middle managers. The 60 managers are rated by their immediate supervisors on the Leadership Rating Form, .coral[**BEFORE**] **and** .nicegreen[**AFTER**] the workshop.

.bluer[**Observational - Repeated Measures: Simultaneously Collected Different Measures**]

Dr. Clarke is interested in determining if workers are more concerned with job security **or** pay. He gains the cooperation of 30 individuals who work in different settings and asks each employee to rate his or her concern about **BOTH** .coral[**SALARY LEVEL**] **and** .nicegreen[**JOB SECURITY**], on a scale from 1 to 10.

.bluer[**Observational - Pairs: Pre-existing**]

Dr. Gale questions whether husbands **and** wives with infertility problems feel equally anxious. She recruits 24 infertile couples and then administers the Infertility Anxiety Measure (IAM) to both the .coral[**HUSBANDS**] **and** the .nicegreen[**WIVES**].

.bluer[**Experimental - Matched Pairs: Randomized within dyads**]

Dr. Smith has developed a new strategy for teaching fractions to second graders.  **Before** assigning the 30 students to receive group instruction **EITHER** by the .coral[**OLD WAY**] **OR** .nicegreen[**NEW WAY**], the researcher **first MATCHES** the students into pairs with **similar prior math achievement**.  One randomly selected student in **each PAIR** receives the new method while the other is taught by the old method.
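
---

### Sketch: Randomizing Within Matched Pairs

The matched-pairs scenario (Dr. Smith) can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the roster, the `prior` scores, and the seed are all made up, and pairing adjacent students after sorting is just one simple way to match.

```r
# Hypothetical roster: 30 students with a made-up prior math achievement score
set.seed(6600)                                   # arbitrary seed, for reproducibility
students <- data.frame(id    = 1:30,
                       prior = round(rnorm(30, mean = 50, sd = 10)))

# 1) MATCH: sort by the matching variable and pair up adjacent students
students      <- students[order(students$prior), ]
students$pair <- rep(1:15, each = 2)

# 2) RANDOMIZE within each pair: one student gets "New", the other gets "Old"
students$method <- unlist(lapply(1:15, function(i) sample(c("New", "Old"))))

head(students)
```

Every pair ends up with exactly one student per method, so the two "groups" are balanced on prior achievement by design.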

---
## Comparing Means: Matched or Paired *t*-Test

> What is the difference from an independent groups *t*-Test?

.pull-left[
**1) Incorporation of Correlation**

- eliminates variance from .dcoral[extraneous factors]

- The stronger the correlation, the smaller the variability in the difference scores

- The denominator of the *t*-statistic is smaller

.nicegreen[**observed *t* value =  MORE EXTREME**]

]

.pull-right[
**2) Reduction in Degrees of Freedom**

Sample size is the number of .dcoral[PAIRS] not the number of observations.

- The degrees of freedom are halved: `$n - 1$` (with *n* pairs) instead of `$2n - 2$`

.nicegreen[**critical *t* = little more extreme**]

]

.dcoral[.large[
> **IF** there is correlation...  
>   
> **THEN** a matched/paired *t*-Test is more **powerful** than ignoring the correlation and erroneously performing an independent-groups *t*-Test.

]]
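
---

### Sketch: Same Data, Two Analyses

To make the power argument concrete, here is a minimal sketch with made-up scores for six people measured twice. The numbers are hypothetical and chosen so that the two time points are strongly correlated; only base R's `t.test()` is used.

```r
# Hypothetical scores for 6 people measured twice (made-up numbers)
pre  <- c(10, 12, 14, 16, 18, 20)
post <- pre + c(1, 2, 1, 3, 2, 1)      # everyone improves a little

cor(pre, post)                          # strong correlation between time points

# Ignores the pairing: treats the two columns as unrelated groups
t_indep  <- t.test(post, pre, var.equal = TRUE)

# Uses the pairing: a one-sample test on the difference scores
t_paired <- t.test(post, pre, paired = TRUE)

c(independent = unname(t_indep$statistic), paired = unname(t_paired$statistic))
c(independent = unname(t_indep$parameter), paired = unname(t_paired$parameter))
```

With these numbers the paired test gives *t* = 5 on 5 degrees of freedom, while the independent-groups version gives a *t* of roughly 0.75 on 10 degrees of freedom: the loss of degrees of freedom is small compared to the gain from the smaller denominator.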

---

## Observational Studies:

.dcoral[**Pre-existing Pairs**] - naturally *related*, *correlated*, or *dependent* based on the .nicegreen[**nature of the situation**]

- married couples made up of husbands & wives
- parent-child dyads
   
--

.dcoral[**Repeated Measures**] - a single sample where each person has .nicegreen[**two measurements**]

.pull-left[
- **Before-&-After** study of the same measure  
   
- Same measure under **different stimuli**
   
- Different measures administered **successively**  
   
- Different measures administered **simultaneously**  
]

.pull-right[
- math achievement pre-test & post-test

- balance *'sway'* measured in the dark & light

- Depression inventory & Anxiety inventory

- Conner's ADHD Scale: sub-scores for inattention & hyperactivity

]

---
### Observational Studies: Before-&-After Design

.large[
> There is .nicegreen[**No control group**] and .nicegreen[**Only 2 time points**]
]

- .dcoral[History] – Experiences outside the study may affect the measurements before and after a treatment

- .dcoral[Maturity] – Biological changes in participants affect the measurements before and after a treatment

- .dcoral[Attrition] – Any individual that leaves the study before a post-measurement can be taken is excluded

- .dcoral[Regression to the mean] – People who score extremely high or low on some measurement tend to score closer to the average next time, regardless of the treatment they receive.

- .dcoral[Assuming all change is TRUE change] - measurements have random fluctuation/error

- .dcoral[Ceiling and Floor effects] - If before or after scores are skewed, then the change scores will not be normally distributed

---
### Repeated-Measures: Successive designs non-longitudinal

- **Cross-over designs**: each participant gets **BOTH** conditions
   - Order effects? (like fatigue or learning)
   - IF so, .dcoral[counterbalance] order
    
--

- .dcoral[**Counterbalancing**]: randomly assign half of the cases to each order
    - Half get: A then B
    - Half get: B then A
    - May not eliminate **carry-over effects**
    - May need a **Wash-out period**
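
---

### Sketch: Counterbalancing the Order

The counterbalancing idea above could be carried out along these lines. The sample size, condition labels, and seed are hypothetical; the only point is that the two orders are shuffled so that exactly half of the participants get each one.

```r
# Hypothetical study: 20 participants, two conditions (A and B)
set.seed(11)                                    # arbitrary seed, for reproducibility
n <- 20

# Build a balanced set of orders, then shuffle who gets which
orders <- sample(rep(c("A then B", "B then A"), each = n / 2))

assignment <- data.frame(id = 1:n, order = orders)
table(assignment$order)                         # 10 participants per order
```

Randomizing the order does not remove carry-over effects; it only balances them across the two conditions.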

---

## Experimental Studies: Cross-over Design

.nicegreen[**Why Bother?**]

Both "groups" are made up of the same participants, so in a way .nicegreen[each subject acts as their own control].

- minimizes the risk of confounding

- requires .dcoral[fewer study participants] than 2 independent groups

<br>

.nicegreen[**Potential Problems**]

Each study .dcoral[participant is required to do twice as much].  This can lead to .nicegreen[fatigue and attrition].

- hard to know **how long** the .dcoral[wash-out period] should be

- risk of **lingering** .dcoral[carry-over effects]; effects might never fully wash out

- entire process takes .dcoral[more time!]

---

.pull-left[
### Simple Randomization

> Use an Independent groups *t*-Test

For example, a lot of outcomes are gender and age specific. Therefore, matching individuals on these 2 variables will help .dcoral[improve the validity of the study by reducing bias].
]

.pull-right[
### Matching Pairs Design

> Use a Matched-Pairs *t*-Test

]

---

## Experimental Studies: The Matched Pairs Design

**Potential Problems**

- What characteristics are .dcoral[relevant] to match on?  How .dcoral[many] variables should you match on? Can they be .dcoral[reliably measured]?

- Does your sample contain participants that .dcoral[actually match well]?  The more variables you match on, the higher the risk some people will be poorly matched up or not even match anyone.

- You must exclude people who do not have a .nicegreen[*'good match'*], thus reducing your usable sample size and statistical power.

- You cannot randomize subjects in rolling-enrollment.  You must .dcoral[wait] till all subjects are enrolled and you have gathered the matching variables BEFORE you can randomize into groups.

.large[
> .dcoral[Picking the wrong matching variables is problematic as it is **IRREVERSIBLE**. In other words, we CANNOT explore alternative causal hypotheses since the design is definitive and CANNOT be changed.]
]
---
.pull-left[
### .dcoral[Direct Difference Approach]

- **Variables**: subtract each pair of values
    + Person 1: `$D = x_1 - x_2$`
    + Person 2: `$D = x_1 - x_2$`
    + Person 3: `$D = x_1 - x_2$`
    + `$\dots$`
    
   
- **Summary Stats**: for the .dcoral[DIFFERENCES]
    + M: `$\overline{D}$`
    + SD: `$s_D$`
    + *correlation is just descriptive*
    
- **Test Statistic**:  .dcoral[*regular* One-Sample t-Test]    
    +  Degrees of Freedom: `$df = n - 1$`
    
`$$t=\frac{\overline{D}-\mu_0}{\frac{s_D}{\sqrt{n}}}$$`

]

.pull-right[
### .nicegreen[Correlation Approach]

- **Variables**: keep the values separate
  + Person 1: `$x_1$`, `$x_2$`
  + Person 2: `$x_1$`, `$x_2$`
  + Person 3: `$x_1$`, `$x_2$`
  + `$\dots$`
    
- **Summary Stats**: for .nicegreen[EACH VARIABLE]
    + M: `$\overline{x_1}$`, `$\overline{x_2}$`
    + SD: `$s_1$`, `$s_2$`
    + COR: `$r$`
    
- **Test Statistic**:   .nicegreen[*adjusted* Two-Sample t-Test]   
    + Degrees of Freedom: `$df = n - 1$`

`$$t=\frac{\overline{D}-\mu_0}{\sqrt{\frac{s_1^2 + s_2^2}{n}- \frac{2rs_1s_2}{n}}}$$`

]
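
---

### Sketch: Both Approaches Give the Same *t*

The two formulas are algebraically the same test, because `$s_D^2 = s_1^2 + s_2^2 - 2rs_1s_2$`. A quick numerical check, using made-up paired scores (the vectors `x1` and `x2` are hypothetical) and `$\mu_0 = 0$`:

```r
# Hypothetical paired scores (made-up numbers)
x1 <- c(11, 14, 15, 19, 20, 21)
x2 <- c(10, 12, 14, 16, 18, 20)
n  <- length(x1)                      # number of PAIRS

# Direct difference approach: one-sample t on the differences
D        <- x1 - x2
t_direct <- (mean(D) - 0) / (sd(D) / sqrt(n))

# Correlation approach: two-sample t, adjusted for the correlation
s1 <- sd(x1)
s2 <- sd(x2)
r  <- cor(x1, x2)
t_corr <- (mean(x1) - mean(x2) - 0) /
  sqrt((s1^2 + s2^2) / n - (2 * r * s1 * s2) / n)

c(direct = t_direct, correlation = t_corr)   # identical; df = n - 1 for both
```

In practice the direct difference approach is simpler; the correlation approach is mainly useful when only summary statistics are available.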

---
class: inverse, center, middle

# Let's Apply This to the Cancer Dataset

### Matched *t* Tests

---
# Read in the Data

```r
library(tidyverse)    # Loads several very helpful 'tidy' packages
library(haven)        # Read in SPSS datasets
library(furniture)    # Nice tables (by our own Tyson Barrett)
library(psych)        # Lots of nice tid-bits
```

```r
cancer_raw <- haven::read_spss("cancer.sav")
```

### And Clean It

```r
cancer_clean <- cancer_raw %>% 
  dplyr::rename_all(tolower) %>% 
  dplyr::mutate(id = factor(id)) %>% 
  dplyr::mutate(trt = factor(trt,
                             labels = c("Placebo", 
                                        "Aloe Juice"))) %>% 
  dplyr::mutate(stage = factor(stage))
```

---

## The Cancer Dataset

- `id` indicates the participant number
- `totalcin` as the first measurement, a.k.a. the **pre-test**
- `totalcw6` as the last measurement, a.k.a. the **post-test**

> IGNORE all other variables and other time points for now...

<div id="htmlwidget-39189cf8416f35c1cb29" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-39189cf8416f35c1cb29">{"x":{"filter":"none","data":[["1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25"],["1","5","6","9","11","15","21","26","31","35","39","41","45","2","12","14","16","22","24","34","37","42","44","50","58"],["Placebo","Placebo","Placebo","Placebo","Placebo","Placebo","Placebo","Placebo","Placebo","Placebo","Placebo","Placebo","Placebo","Placebo","Aloe Juice","Aloe Juice","Aloe Juice","Aloe Juice","Aloe Juice","Aloe Juice","Aloe Juice","Aloe Juice","Aloe Juice","Aloe Juice","Aloe Juice"],[52,77,60,61,59,69,67,56,61,51,46,65,67,46,56,42,44,27,68,77,86,73,67,60,54],[124,160,136.5,179.6,175.8,167.6,186,158,212.8,189,149,157,186,163.8,227.2,162.6,261.4,225.4,226,164,140,181.5,187,164,172.8],["2","1","4","1","2","1","1","3","1","1","4","1","1","2","4","1","2","1","4","2","1","0","1","2","4"],[6,9,7,6,6,6,6,6,6,6,7,6,8,7,6,4,6,6,12,5,6,8,5,6,7],[6,6,9,7,7,6,11,11,9,4,8,6,8,16,10,6,11,7,11,7,7,11,7,8,8],[6,10,17,9,16,6,11,15,6,8,11,9,9,9,11,8,11,6,12,13,7,16,7,16,10],[7,9,19,3,13,11,10,15,8,7,11,6,10,10,9,7,14,6,9,12,7,null,7,null,8]],"container":"<table class=\"display\">\n  <thead>\n    <tr>\n      <th> <\/th>\n      <th>id<\/th>\n      <th>trt<\/th>\n      <th>age<\/th>\n      <th>weighin<\/th>\n      <th>stage<\/th>\n      <th>totalcin<\/th>\n      <th>totalcw2<\/th>\n      <th>totalcw4<\/th>\n      <th>totalcw6<\/th>\n    <\/tr>\n  <\/thead>\n<\/table>","options":{"pageLength":5,"columnDefs":[{"className":"dt-right","targets":[3,4,6,7,8,9]},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false,"lengthMenu":[5,10,25,50,100]}},"evals":[],"jsHooks":[]}</script>

---

### Example 1) Data Wrangling - Create a "DIFFERENCE" score

> .bluer[One line PER PERSON...time points side-by-side on the same line]

```r
cancer_new <- cancer_clean %>% 
  dplyr::mutate(totalc_diff = totalcw6 - totalcin) %>%    # gain score = difference
  dplyr::filter(complete.cases(totalcin, totalcw6)) %>%   # Requires complete data
  dplyr::select(id, totalcin, totalcw6, totalc_diff)
```

<div id="htmlwidget-ba85203fd8a9f1ef8207" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-ba85203fd8a9f1ef8207">{"x":{"filter":"none","data":[["1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23"],["1","5","6","9","11","15","21","26","31","35","39","41","45","2","12","14","16","22","24","34","37","44","58"],[52,77,60,61,59,69,67,56,61,51,46,65,67,46,56,42,44,27,68,77,86,67,54],[124,160,136.5,179.6,175.8,167.6,186,158,212.8,189,149,157,186,163.8,227.2,162.6,261.4,225.4,226,164,140,187,172.8],[6,9,7,6,6,6,6,6,6,6,7,6,8,7,6,4,6,6,12,5,6,5,7],[7,9,19,3,13,11,10,15,8,7,11,6,10,10,9,7,14,6,9,12,7,7,8],[1,0,12,-3,7,5,4,9,2,1,4,0,2,3,3,3,8,0,-3,7,1,2,1]],"container":"<table class=\"display\">\n  <thead>\n    <tr>\n      <th> <\/th>\n      <th>id<\/th>\n      <th>age<\/th>\n      <th>weighin<\/th>\n      <th>totalcin<\/th>\n      <th>totalcw6<\/th>\n      <th>totalc_diff<\/th>\n    <\/tr>\n  <\/thead>\n<\/table>","options":{"pageLength":5,"columnDefs":[{"className":"dt-right","targets":[2,3,4,5,6]},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false,"lengthMenu":[5,10,25,50,100]}},"evals":[],"jsHooks":[]}</script>

---

### Example 1) Data Wrangling - Restructure to "LONG" format

> .bluer[One line PER TIME POINT per person...each person will have TWO lines]

```r
cancer_long <- cancer_new %>% 
  tidyr::pivot_longer(cols = c(totalcin, totalcw6),
                      names_to = c(".value", "time"),
                      names_pattern = "(.*)(..)") %>% 
  dplyr::mutate(time = factor(time) %>% 
                  forcats::fct_recode("Intake" = "in", "Week 6" = "w6"))
```

<div id="htmlwidget-b68f8b6318bea24a369b" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-b68f8b6318bea24a369b">{"x":{"filter":"none","data":[["1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45","46"],["1","1","5","5","6","6","9","9","11","11","15","15","21","21","26","26","31","31","35","35","39","39","41","41","45","45","2","2","12","12","14","14","16","16","22","22","24","24","34","34","37","37","44","44","58","58"],[52,52,77,77,60,60,61,61,59,59,69,69,67,67,56,56,61,61,51,51,46,46,65,65,67,67,46,46,56,56,42,42,44,44,27,27,68,68,77,77,86,86,67,67,54,54],[124,124,160,160,136.5,136.5,179.6,179.6,175.8,175.8,167.6,167.6,186,186,158,158,212.8,212.8,189,189,149,149,157,157,186,186,163.8,163.8,227.2,227.2,162.6,162.6,261.4,261.4,225.4,225.4,226,226,164,164,140,140,187,187,172.8,172.8],["Intake","Week 6","Intake","Week 6","Intake","Week 6","Intake","Week 6","Intake","Week 6","Intake","Week 6","Intake","Week 6","Intake","Week 6","Intake","Week 6","Intake","Week 6","Intake","Week 6","Intake","Week 6","Intake","Week 6","Intake","Week 6","Intake","Week 6","Intake","Week 6","Intake","Week 6","Intake","Week 6","Intake","Week 6","Intake","Week 6","Intake","Week 6","Intake","Week 6","Intake","Week 6"],[6,7,9,9,7,19,6,3,6,13,6,11,6,10,6,15,6,8,6,7,7,11,6,6,8,10,7,10,6,9,4,7,6,14,6,6,12,9,5,12,6,7,5,7,7,8]],"container":"<table class=\"display\">\n  <thead>\n    <tr>\n      <th> <\/th>\n      <th>id<\/th>\n      <th>age<\/th>\n      <th>weighin<\/th>\n      <th>time<\/th>\n      <th>totalc<\/th>\n    <\/tr>\n  <\/thead>\n<\/table>","options":{"pageLength":5,"columnDefs":[{"className":"dt-right","targets":[2,3,5]},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false,"lengthMenu":[5,10,25,50,100]}},"evals":[],"jsHooks":[]}</script>

---

### Example 1) Both Time points: Histograms & Boxplots

.pull-left[

**Separate Histograms**
 
> Ignores Pairing!

]

.pull-right[

**Side-by-Side Boxplots**

> Ignores pairing!

]

---

### Example 1) Both Time points: Line Plot and Scatter Plot

.pull-left[

**Line Plot for Paired t-Test**
 
> Is there a Mean Difference?

]

.pull-right[

**Scatter Plot for Correlation/Regression**
 
> Is there a Correlation?

]

---

### Example 1) Change Scores: Histograms & Boxplots

.pull-left[

**Histogram of Change in Scores**

]

.pull-right[

**Boxplot of Change in Scores**

]

---
### Example 1) Summary Statistics

.pull-left[
**Means and Standard Deviations**

```r
cancer_new %>% 
  furniture::table1("Pre"    = totalcin, 
                    "Post"   = totalcw6,
                    "Change" = totalc_diff,
                    digits = 2)
```

```

--------------------------
        Mean/Count (SD/%)
        n = 23           
 Pre                     
        6.48 (1.56)      
 Post                    
        9.48 (3.49)      
 Change                  
        3.00 (3.68)      
--------------------------
```
]

.pull-right[

**Pearson's Product-Moment Correlation**

```r
cancer_new %>% 
  cor.test(~ totalcin + totalcw6,
           data = .)
```

```

Pearson's product-moment correlation

data:  totalcin and totalcw6
t = 0.45064, df = 21, p-value = 0.6569
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.3275491  0.4902901
sample estimates:
       cor 
0.09786664 
```

]
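
---

### Example 1) Checking the SD of the Change Scores

As a rough check, the standard deviation of the change scores can be recovered from the two separate standard deviations and the correlation. The values below are the rounded summary statistics from the output above, so the result only agrees up to rounding:

`$$s_D = \sqrt{s_1^2 + s_2^2 - 2rs_1s_2} = \sqrt{1.56^2 + 3.49^2 - 2(.098)(1.56)(3.49)} \approx \sqrt{13.55} \approx 3.68$$`

Because the correlation here is close to zero, the subtraction removes very little, and `$s_D$` is almost as large as it would be for unrelated samples.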

---

### Example 1) Direct Difference Method (Ch 7. 1-sample test)

```r
cancer_new %>% 
  dplyr::pull(totalc_diff) %>% 
  t.test(mu = 0)
```

```

One Sample t-test

data:  .
t = 3.9092, df = 22, p-value = 0.0007524
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 1.408469 4.591531
sample estimates:
mean of x 
        3 
```

.dcoral[**Interpretation**]

> The 23 participants' oral condition was measured at intake .nicegreen[(*M* = 6.48, *SD* = 1.56)] and re-evaluated six weeks later .nicegreen[(*M* = 9.48, *SD* = 3.49)].  A paired-samples *t*-Test on the repeated measures .nicegreen[(*r* = .098, *p* = .657)] found this to be a statistically significant deterioration, .dcoral[*t*(22) = 3.91, *p* < .001, 95% *CI* [1.41, 4.59]].
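
---

### Example 1) Direct Difference Method, By Hand

The test statistic and confidence interval can also be rebuilt from the summary statistics alone. This sketch plugs in the rounded values from the table (*n* = 23, mean change = 3.00, *SD* = 3.68), so it matches the `t.test()` output only up to rounding; the object names are made up for illustration.

```r
# Summary statistics copied from the slides (rounded to 2 decimals)
n     <- 23
D_bar <- 3.00       # mean of the change scores
s_D   <- 3.68       # SD of the change scores

se     <- s_D / sqrt(n)               # standard error of the mean difference
t_stat <- (D_bar - 0) / se            # test statistic against mu_0 = 0
t_crit <- qt(0.975, df = n - 1)       # two-sided critical value, alpha = .05

round(c(t     = t_stat,
        lower = D_bar - t_crit * se,
        upper = D_bar + t_crit * se), 2)
```

Rounded to two decimals, this gives *t* = 3.91 with a 95% interval of 1.41 to 4.59, in line with the software output.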

---

### Example 1) Paired t-Test Method (adjust indept t-test)

.pull-left[

```r
cancer_long %>% 
  t.test(totalc ~ time,  # DV_cont ~ IV_time
         data = .,
         paired = TRUE)
```
]

.pull-right[

```

Paired t-test

data:  totalc by time
t = -3.9092, df = 22, p-value = 0.0007524
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -4.591531 -1.408469
sample estimates:
mean of the differences 
                     -3 
```
]

.dcoral[**Interpretation**]

.dcoral[**NOTE** *We usually report positive values for the test statistic (t), as well as the confidence interval values.*]
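
---

### Example 1) Paired t-Test Method (two vectors)

The same test can be run by handing `t.test()` the two columns directly. This may also be the safer route on newer R installations: as far as we know, R 4.4.0 and later no longer accept `paired = TRUE` together with a formula, so treat that version detail as an assumption to verify. The scores below are the 23 complete cases copied from the data table.

```r
# Intake and week-6 scores for the 23 complete cases (copied from the data table)
totalcin <- c(6, 9, 7, 6, 6, 6, 6, 6, 6, 6, 7, 6, 8, 7, 6, 4, 6, 6, 12, 5, 6, 5, 7)
totalcw6 <- c(7, 9, 19, 3, 13, 11, 10, 15, 8, 7, 11, 6, 10, 10, 9, 7, 14, 6, 9, 12, 7, 7, 8)

# Listing the post-test first defines the difference as "post - pre",
# so the t statistic and the confidence limits come out positive
res <- t.test(totalcw6, totalcin, paired = TRUE)
res
```

Swapping the order of the two vectors only flips the signs of the *t* statistic and the interval; the *p*-value is unchanged.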

---

### Example 1) 1-Sided Alternative and CIs

.pull-left[
.nicegreen[**Two-sided Alternative**]

```r
cancer_long %>% 
  t.test(totalc ~ time, 
         data = .,
         alternative = "two.sided",
         paired = TRUE)
```

.nicegreen[**One-sided Alternative**]

```r
cancer_long %>% 
  t.test(totalc ~ time,
         data = .,
         alternative = "less",
         paired = TRUE)
```

.dcoral[**NOTE** You MUST use the 2-sided test to get a meaningful Confidence Interval!]

]

.pull-right[

```

Paired t-test

data:  totalc by time
t = -3.9092, df = 22, p-value = 0.0003762
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
     -Inf -1.68223
sample estimates:
mean of the differences 
                     -3 
```

]
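
---

### Example 1) Where the One-Sided *p*-Value Comes From

When the observed difference is in the hypothesized direction, the one-sided *p*-value is half of the two-sided one. A small sketch using the rounded *t* statistic and degrees of freedom from the output (the object names are made up for illustration):

```r
t_obs <- -3.9092                       # t statistic from the output
df    <- 22

p_two  <- 2 * pt(-abs(t_obs), df)      # two-sided: area in BOTH tails
p_less <- pt(t_obs, df)                # one-sided ("less"): lower tail only

c(two.sided = p_two, less = p_less)    # one-sided p is half the two-sided p
```

Choosing the direction after seeing the data defeats the purpose; the one-sided alternative has to be specified in advance.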

---

### Example 1) Effect Size: Cohen's d for MATCHED samples

> This will be a .dcoral[**biased**] estimate of the population effect size when based on sample data

.pull-left[
Step 1) Compute Cohen's d as **IF** we had .nicegreen[independent samples]

`$$d_{orig} = \frac{\overline{D}}{SD_{diff}}\tag{8.2}$$`

```r
d = 3/(3.68)   
d
```

```
[1] 0.8152174
```

]

.pull-right[

Step 2) .nicegreen[Adjust] for the correlation between measures (lack of independence)

`$$d_{matched} = d_{orig} \sqrt{\frac{1}{2(1 - \rho)}}\tag{11.5}$$`

```r
d*sqrt(1/(2*(1 - .098)))
```

```
[1] 0.6069531
```

]

.dcoral[**Interpretation**]

> After accounting for correlation between repeated-measures, the oral condition increased by 0.61 standard deviations.

---

### Example 1) Effect Size: Hedges' g for MATCHED samples

> This will be a LESS biased estimate of the population effect size when based on sample data

`$$g_{matched} = t \sqrt{\frac{1}{n}}\tag{11.6}$$`
--

```r
3.9092*sqrt(1/23)
```

```
[1] 0.8151245
```

.dcoral[**Interpretation**]

After accounting for correlation between repeated-measures, the oral condition increased by 0.82 standard deviations.

.nicegreen[**Read the paragraph at the end of page 351 (after formula 11.6)!**]

**.center[Please correctly label your effect size!]**

---
class: inverse, center, middle

# Questions?

---
class: inverse, center, middle

# Next Topic

### One-Way ANOVAs