Center and Spread

class: center, middle, inverse, title-slide

# Center and Spread
## Cohen Chapter 3 <br><br> .small[EDUC/PSY 6600]

---

class: center, middle

## "You can, for example, never foretell what any one man will do, but you can say with precision what an average number will be up to. *Individuals vary*, but percentages remain constant. So says the statistician." 
### -- Sherlock Holmes, *The Sign of Four*

---
background-image: url(figures/fig_dist_examples.png)
background-position: 50% 90%
background-size: 750px

# Distribution Examples

---
background-image: url(figures/fig_3centers.png)
background-position: 50% 70%
background-size: 1000px

# Three Measures of Center

---

## Areas in Distribution Plot

.pull-left[

Frequency Histogram
<img src="figures/textbook_fig_3.2.PNG" width="833" style="display: block; margin: auto;" />

]

.pull-right[
Frequency Polygon
<img src="figures/textbook_fig_3.3.PNG" width="991" style="display: block; margin: auto;" />
]

---
background-image: url(figures/fulcrum.png)
background-position: 50% 80%
background-size: 850px

# Mean vs. Median

.large[.large[
.nicegreen[Median]: the middle value; half of the values fall on each side; resistant to skew; the "typical" value

.dcoral[Mean]: the "balance" point; pulled toward the long tail (the direction of the skew); not necessarily typical

<br><br><br>
]]

.large[If the distribution is symmetric: mean = median]
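
---

# Mean vs. Median in R

A minimal sketch using made-up values chosen only for illustration (not from the textbook): for symmetric values the mean and median agree, while one long right tail pulls the mean above the median.

```r
# Hypothetical example values, for illustration only
symmetric    <- c(2, 4, 6, 8, 10)
right_skewed <- c(2, 3, 3, 4, 4, 5, 30)   # one value far out in the right tail

mean(symmetric)       # 6
median(symmetric)     # 6: symmetric, so mean = median

mean(right_skewed)    # 51 / 7, about 7.29: pulled toward the tail
median(right_skewed)  # 4: still the "typical" value
```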

---
background-image: url(figures/fig_dist_income_2010.png)
background-position: 50% 70%
background-size: 1000px

---
# Distributions and Numbers

.pull-left[
.large[
- The MEDIAN is **resistant** & doesn't change much
- The MEAN is **influenced** & changes more!
- Average does NOT mean typical
- Average moves when we remove the high point
]]

.pull-right[
<img src="ch3_center_spread_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />
]

---
# Distributions and Numbers

.pull-left[
.large[
- The MEDIAN is **resistant** & doesn't change much
- The MEAN is **influenced** & changes more!
- Average does NOT mean typical
- Average moves when we remove the high point
- Median doesn't move when we remove the high point
]]

.pull-right[
<img src="ch3_center_spread_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />
]
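
---

# Removing the High Point in R

The same idea, sketched with hypothetical values (not the data plotted on the previous slide): dropping the single highest point moves the mean a lot, but the median only a little.

```r
# Hypothetical data with one unusually high point
x <- c(10, 12, 13, 14, 15, 16, 60)

x_no_high <- x[-which.max(x)]   # drop the single largest value

c(mean = mean(x), median = median(x))                   # mean 20, median 14
c(mean = mean(x_no_high), median = median(x_no_high))   # mean about 13.33, median 13.5
```

The mean drops by more than 6 points; the median moves by only half a point.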

---
background-image: url(figures/fig_three_spreads.jpg)
background-position: 50% 70%
background-size: 1100px

# Three Measures of Spread

---
# Best Summary of the Data?

.huge[
"... the perfect estimator does not exist." -- Rand Wilcox, 2001
]

.pull-left[
.large[
## .bluer[Median and SIQR]

Skewed data or outliers
]]

.pull-right[
.large[
## .nicegreen[Mean and SD]

Symmetrical and no outliers
]]

<br>

.large[.large[
A .dcoral[graph gives the best overall picture of a distribution]
]]
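
---

# Center and Spread Pairs in R

A sketch with hypothetical right-skewed values showing both summary pairs. The semi-interquartile range (SIQR) is half the IQR. Note the assumption: `IQR()` uses R's default quartile method (type 7), so hand-calculated quartiles may differ slightly.

```r
# Hypothetical right-skewed values, for illustration only
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 20)

# Resistant pair: median and semi-interquartile range (SIQR)
median(x)       # 3
IQR(x) / 2      # SIQR = (Q3 - Q1) / 2 = (4 - 2.25) / 2 = 0.875

# Non-resistant pair: mean and standard deviation
mean(x)         # 4.7, pulled up by the value 20
sd(x)           # sample SD (divides by n - 1), also inflated by the value 20

# Range, the simplest (and least resistant) measure of spread
diff(range(x))  # 20 - 1 = 19
```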

---

## Properties of the Mean and SD

.pull-left[

Add 10 to Every Value

]

.pull-right[

Multiply Every Value by 10

]
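
---

## Properties of the Mean and SD in R

A quick check with hypothetical scores. Adding a constant shifts the mean by that constant and leaves the SD unchanged. Multiplying by a positive constant multiplies both the mean and the SD by that constant (and the variance by its square).

```r
x <- c(2, 4, 6, 8)   # hypothetical scores: mean 5

# Add 10 to every value: the mean shifts by 10, the SD is unchanged
mean(x + 10)                        # 15
all.equal(sd(x + 10), sd(x))        # TRUE

# Multiply every value by 10: the mean and the SD are both multiplied by 10
mean(x * 10)                        # 50
all.equal(sd(x * 10), 10 * sd(x))   # TRUE
```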

---
# Skewness

$$
\text{Skewness} = \frac{N}{N - 2}\frac{\sum_{i=1}^N (X_i - \bar{X})^3}{(N - 1)s^3}
$$

.pull-left[
.large[
- Degree of .dcoral[symmetry]
- Can detect **visually**
- Skewness statistic
    - Based on cubed deviations from the mean
    - Divide it by the SE of skewness to get a ratio
    - A ratio beyond `$\pm 2$` is a sign of skewed data

]]

.pull-right[
.large[

- Interpreting skewness statistic
    - Pos value = positive skew
    - Neg value = negative skew
    - Zero = no skew
]]

---
# Skewness
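
A minimal sketch of the skewness formula in base R, on hypothetical data. The standard-error formula below is a commonly used one (it is an assumption here, not taken from the slides); dividing the statistic by it gives the ratio compared against `$\pm 2$`.

```r
# Sample skewness, written from the skewness formula shown earlier:
# N / (N - 2) * sum((X - mean)^3) / ((N - 1) * s^3)
skewness_stat <- function(x) {
  x <- x[!is.na(x)]                # drop missing values
  N <- length(x)
  s <- sd(x)                       # sample SD (N - 1 in the denominator)
  (N / (N - 2)) * sum((x - mean(x))^3) / ((N - 1) * s^3)
}

# Standard error of skewness (commonly used formula; assumed, not from the slides)
se_skewness <- function(N) {
  sqrt(6 * N * (N - 1) / ((N - 2) * (N + 1) * (N + 3)))
}

x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 20)   # hypothetical right-skewed values

skew <- skewness_stat(x)
skew                             # positive value = positive (right) skew
skew / se_skewness(length(x))    # ratio beyond +/- 2 suggests skewed data
```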

---
# Kurtosis

$$
\text{Kurtosis} = \frac{N(N + 1)}{(N - 2)(N - 3)}\frac{\sum_{i=1}^N (X_i - \bar{X})^4}{(N - 1)s^4} - 3 \frac{(N - 1)^2}{(N - 2)(N - 3)}
$$

.pull-left[
.large[
- Degree of .dcoral[flatness] (or peakedness) of a distribution
- Harder to detect visually
- Kurtosis statistic
    - Based on deviations from the mean, raised to the 4th power
    - Divide it by the SE of kurtosis to get a ratio
    - A ratio beyond `$\pm 2$` is a sign of problems with kurtosis
]]

.pull-right[
.large[
- Interpreting kurtosis statistic
    - Pos value = peaked (leptokurtic)
    - Neg value = flat (platykurtic)
    - Zero = normal (mesokurtic)
]]
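
---

# Kurtosis in R

A minimal sketch of the kurtosis formula in base R, applied to two hypothetical data sets: one evenly spread (flat) and one piled up in the middle with a few extremes (peaked). As with skewness, the standard-error formula is a commonly used one and is an assumption here, not taken from the slides.

```r
# Sample (excess) kurtosis, written from the kurtosis formula shown earlier
kurtosis_stat <- function(x) {
  x <- x[!is.na(x)]                # drop missing values
  N <- length(x)
  s <- sd(x)                       # sample SD (N - 1 in the denominator)
  first  <- (N * (N + 1)) / ((N - 2) * (N - 3)) *
    sum((x - mean(x))^4) / ((N - 1) * s^4)
  second <- 3 * (N - 1)^2 / ((N - 2) * (N - 3))
  first - second                   # near 0 for normal-shaped data
}

# Standard error of kurtosis (commonly used formula; assumed, not from the slides)
se_kurtosis <- function(N) {
  se_skew <- sqrt(6 * N * (N - 1) / ((N - 2) * (N + 1) * (N + 3)))
  2 * se_skew * sqrt((N^2 - 1) / ((N - 3) * (N + 5)))
}

flat   <- 1:20                     # hypothetical, evenly spread: platykurtic
peaked <- c(rep(0, 18), -10, 10)   # hypothetical, piled up in the middle: leptokurtic

kurtosis_stat(flat)                                   # negative
kurtosis_stat(peaked)                                 # positive
kurtosis_stat(peaked) / se_kurtosis(length(peaked))   # ratio beyond +/- 2 flags kurtosis
```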

---
background-image: url(figures/fig_kurtosis.png)
background-position: 50% 70%
background-size: 1000px

# Kurtosis

---
## [Are the Skewness and Kurtosis Useful Statistics?](https://www.spcforexcel.com/knowledge/basic-statistics/are-skewness-and-kurtosis-useful-statistics)

The .coral[skewness] and .coral[kurtosis] statistics appear to be very dependent on the sample size. In fact, even several hundred data points didn't give very good estimates of the true kurtosis and skewness. Smaller sample sizes can give results that are very misleading.

.large[
.nicegreen[

> "In short, skewness and kurtosis are practically **worthless**."

> "The statistics for skewness and kurtosis simply do not provide any useful information beyond that already given by the measures of location *(center)* and dispersion *(spread)*."

]
]

So don't put much weight on the skewness and kurtosis values you see. The more data you have, the better you can describe the shape of a distribution, but in general there is little reason to pay much attention to these two statistics.

.coral[ **Just look at the histogram. It often gives you all the information you need.** ]

---
background-image: url(figures/fig_5sum_2.png)
background-position: 50% 50%

# Five-Number Summary

---
background-image: url(figures/fig_5sum_3.png)
background-position: 50% 50%

# Five-Number Summary - Median

---
background-image: url(figures/fig_5sum_4.png)
background-position: 50% 50%

# Five-Number Summary - Quartiles

---
background-image: url(figures/fig_5sum_5.png)
background-position: 50% 50%

# Boxplots (Modified) - Lines

---
background-image: url(figures/fig_5sum_6.png)
background-position: 50% 50%

# Boxplots (Modified) - IQR and SIQR
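
---

# Five-Number Summary and Fences in R

A sketch on hypothetical values. `fivenum()` returns Tukey's hinges, which can differ slightly from the quartiles that `quantile()` reports with R's default method (type 7). In a modified boxplot, points beyond 1.5 × IQR from the quartiles are drawn individually; this 1.5 multiplier is also the default in ggplot2's `stat_boxplot()`.

```r
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 20)   # hypothetical values

fivenum(x)   # Tukey's five numbers: 1, 2, 3, 4, 20 (min, lower hinge, median, upper hinge, max)
quantile(x)  # R's default quartiles (type 7): Q1 = 2.25, Q3 = 4

q    <- quantile(x, c(0.25, 0.75))
iqr  <- IQR(x)          # Q3 - Q1 = 1.75
siqr <- iqr / 2         # semi-interquartile range = 0.875

# Fences for a modified boxplot: 1.5 * IQR beyond the quartiles
lower_fence <- q[[1]] - 1.5 * iqr   # -0.375
upper_fence <- q[[2]] + 1.5 * iqr   #  6.625

# Points outside the fences are drawn individually as outliers
x[x < lower_fence | x > upper_fence]   # 20
```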

---
background-image: url(figures/fig_boxplot_hist.png)
background-position: 50% 70%
background-size: 1000px

# Boxplot vs. Histogram

---
# Boxplots by Group

---
# Density Plots
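
A density plot is a smoothed histogram: the area under each curve equals 1, so groups of different sizes can be compared directly. This sketch assumes ggplot2 (loaded by `tidyverse`) and uses simulated ages, not the cancer data.

```r
library(ggplot2)

# Hypothetical example: simulated ages for two groups
set.seed(6600)
df <- data.frame(
  group = rep(c("A", "B"), each = 50),
  age   = c(rnorm(50, mean = 55, sd = 8), rnorm(50, mean = 62, sd = 12))
)

p_density <- ggplot(df, aes(x = age, fill = group)) +
  geom_density(alpha = 0.4) +     # semi-transparent so overlapping curves stay visible
  labs(x = "Age in Years", y = "Density")

p_density
```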

---
# Quantile-Quantile (Q-Q) Plot
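
A Q-Q plot compares the sample quantiles with those expected from a normal distribution: points close to the line suggest approximate normality, while a systematic curve suggests skew. This sketch assumes ggplot2 3.0.0 or later (for `stat_qq_line()`) and uses simulated right-skewed data; base R's `qqnorm()` and `qqline()` draw the same kind of plot.

```r
library(ggplot2)

set.seed(6600)
df <- data.frame(score = rexp(40))   # hypothetical right-skewed scores

p_qq <- ggplot(df, aes(sample = score)) +
  stat_qq() +          # sample quantiles vs. theoretical normal quantiles
  stat_qq_line() +     # reference line through the first and third quartiles
  labs(x = "Theoretical (Normal) Quantiles", y = "Sample Quantiles")

p_qq
```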

---
class: inverse, center, middle

# Interactive Apps

[Describing and Exploring Quantitative Variables](https://istats.shinyapps.io/EDA_quantitative/)

[Mean versus Median](https://istats.shinyapps.io/MeanvsMedian/)

---
class: inverse, center, middle

# Let's Apply This To the Cancer Dataset <br> (on Canvas)

---
# Read in the Data

```r
library(tidyverse)    # Loads several very helpful 'tidy' packages
library(haven)        # Read in SPSS datasets
library(furniture)    # Nice tables (by our own Tyson Barrett)
library(psych)        # Lots of nice tidbits
```

```r
cancer_raw <- haven::read_sav("cancer.sav")
```

--
### And Clean It

```r
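cancer_clean <- cancer_raw %>% 
  dplyr::rename_all(tolower) %>%                 # make all variable names lowercase
  dplyr::mutate(id = factor(id)) %>%             # ID is a label, not a quantity
  dplyr::mutate(trt = factor(trt,                # label the treatment codes
                             levels = c(0, 1),
                             labels = c("Placebo", 
                                        "Aloe Juice"))) %>% 
  dplyr::mutate(stage = factor(stage))           # treat stage as categorical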
```

---
## Frequency Tables with `furniture::tableF()`

.pull-left[

```r
cancer_clean %>%
  furniture::tableF(age, n = 8)
```

```

----------------------------------
 age Freq CumFreq Percent CumPerc
 27  1    1       4.00%   4.00%  
 42  1    2       4.00%   8.00%  
 44  1    3       4.00%   12.00% 
 46  2    5       8.00%   20.00% 
 ... ...  ...     ...     ...    
 68  1    20      4.00%   80.00% 
 69  1    21      4.00%   84.00% 
 73  1    22      4.00%   88.00% 
 77  2    24      8.00%   96.00% 
 86  1    25      4.00%   100.00%
----------------------------------
```
]

.pull-right[

```r
cancer_clean %>%
  furniture::tableF(trt)
```

```

-----------------------------------------
 trt        Freq CumFreq Percent CumPerc
 Placebo    14   14      56.00%  56.00% 
 Aloe Juice 11   25      44.00%  100.00%
-----------------------------------------
```
]

---
## Extensive Descriptive Stats: `psych::describe()`

```r
cancer_clean %>% 
  dplyr::select(age, weighin, totalcin, totalcw2, totalcw4, totalcw6) %>%
  psych::describe()
```

```
         vars  n   mean    sd median trimmed   mad min   max range  skew
age         1 25  59.64 12.93   60.0   59.95 11.86  27  86.0  59.0 -0.31
weighin     2 25 178.28 31.98  172.8  176.57 21.05 124 261.4 137.4  0.73
totalcin    3 25   6.52  1.53    6.0    6.33  0.00   4  12.0   8.0  1.80
totalcw2    4 25   8.28  2.54    8.0    8.10  2.97   4  16.0  12.0  1.01
totalcw4    5 25  10.36  3.47   10.0   10.19  2.97   6  17.0  11.0  0.49
totalcw6    6 23   9.48  3.49    9.0    9.21  2.97   3  19.0  16.0  0.77
         kurtosis   se
age         -0.01 2.59
weighin      0.07 6.40
totalcin     4.30 0.31
totalcw2     1.14 0.51
totalcw4    -1.00 0.69
totalcw6     0.53 0.73
```

---
## Brief Descriptive Stats: `furniture::table1()`

.pull-left[
Defaults

```r
cancer_clean %>%
  furniture::table1(trt, age, weighin)
```

```

---------------------------------
               Mean/Count (SD/%)
               n = 25           
 trt                            
    Placebo    14 (56%)         
    Aloe Juice 11 (44%)         
 age                            
               59.6 (12.9)      
 weighin                        
               178.3 (32.0)     
---------------------------------
```
]

.pull-right[
Add Text

```r
cancer_clean %>%
  furniture::table1("Treatment" = trt, 
                    "Age, years" = age, 
                    "Weight, lbs" = weighin)
```

```

---------------------------------
               Mean/Count (SD/%)
               n = 25           
 Treatment                      
    Placebo    14 (56%)         
    Aloe Juice 11 (44%)         
 Age, years                     
               59.6 (12.9)      
 Weight, lbs                    
               178.3 (32.0)     
---------------------------------
```
]

---
## Stratified Stats: `furniture::table1()`

.pull-left[
Defaults, but increase the number of digits

```r
cancer_clean %>%
  dplyr::group_by(trt) %>%        
  furniture::table1(age, weighin,
                    digits = 2)
```

```

---------------------------------------
                    trt 
         Placebo        Aloe Juice    
         n = 14         n = 11        
 age                                  
         59.79 (8.98)   59.45 (17.22) 
 weighin                              
         167.51 (23.01) 191.99 (37.37)
---------------------------------------
```
]

.pull-right[
Add Text

```r
cancer_clean %>%
  dplyr::group_by("Treatment" = trt) %>%        
  furniture::table1("Age, years" = age, 
                    "Weight, lbs" = weighin,
                    total = TRUE)
```

```

----------------------------------------------------
                               Treatment 
             Total        Placebo      Aloe Juice  
             n = 25       n = 14       n = 11      
 Age, years                                        
             59.6 (12.9)  59.8 (9.0)   59.5 (17.2) 
 Weight, lbs                                       
             178.3 (32.0) 167.5 (23.0) 192.0 (37.4)
----------------------------------------------------
```
]

---
## Boxplot, full sample: `geom_boxplot()`

```r
cancer_clean %>%
  ggplot(aes(x = "Full Sample",   # x = "quoted text"
             y = age)) +          # y = contin_var (no quotes)
  geom_boxplot()
```

---
## Boxplots, by groups - (1) fill color

```r
cancer_clean %>%
  ggplot(aes(x = "Full Sample",           # x = "quoted text"
             y = age,                     # y = contin_var (no quotes)
             fill = trt)) +               # fill = group_var (no quotes) 
  geom_boxplot()
```

---
## Boxplots, by groups - (2) x-axis breaks

```r
cancer_clean %>%
  ggplot(aes(x = trt,             # x = group_var (no quotes)  
             y = age)) +          # y = contin_var (no quotes)
  geom_boxplot()
```

---
## Boxplots, by groups - (3) separate panels

```r
cancer_clean %>%
  ggplot(aes(x = "Full Sample",   # x = "quoted text"
             y = age)) +          # y = contin_var (no quotes)
  geom_boxplot() +
  facet_grid(. ~ trt)             # . ~ group_var (no quotes)
```

---
## Boxplot for a Subset - 1 requirement

```r
cancer_clean %>%                # Less than 172 pounds at baseline
  dplyr::filter(weighin < 172) %>%
  ggplot(aes(x = "Weight at Baseline < 172", 
             y = age)) +
  geom_boxplot()
```

---
## Boxplot for a Subset - 2 requirements

```r
cancer_clean %>%           # At least 150 pounds AND not in Aloe group
  dplyr::filter(weighin >= 150 & trt == "Placebo") %>%
  ggplot(aes(x = "Placebo and at least 150 Pounds", 
             y = age)) +
  geom_boxplot()
```

---
## Boxplot for a Subset - 2 requirements (`%in%`)

```r
cancer_clean %>%          # In Aloe group, but only stages 2-4
  dplyr::filter(trt == "Aloe Juice" & stage %in% c(2, 3, 4)) %>%
  ggplot(aes(x = "On Aloe Juice and Stage 2-4", 
             y = weighin)) +
  geom_boxplot()
```

---
## Boxplot for Repeated Measures

.pull-left[

```r
cancer_clean %>%
  tidyr::pivot_longer(cols = c(totalcw2, 
                               totalcw4, 
                               totalcw6),
                      names_to = "week",
                      names_pattern = "totalcw(.)",
                      values_to = "condition") %>%
  ggplot(aes(x = week, 
             y = condition)) +
  geom_boxplot()
```
]

.pull-right[
<img src="ch3_center_spread_files/figure-html/unnamed-chunk-31-1.png" style="display: block; margin: auto;" />
]

---
## Boxplot: COMPLICATED!

.pull-left[

```r
cancer_clean %>%
  dplyr::filter(weighin > 130 & 
                  stage %in% c(2, 4)) %>%
  tidyr::pivot_longer(cols = c(totalcw2, 
                               totalcw4, 
                               totalcw6),
                      names_to = "week",
                      names_pattern = "totalcw(.)",
                      values_to = "condition") %>%
  ggplot(aes(x = week, 
             y = condition, 
             fill = stage)) +
  geom_boxplot() +
  facet_grid(. ~ trt)
```
]

.pull-right[
<img src="ch3_center_spread_files/figure-html/unnamed-chunk-33-1.png" style="display: block; margin: auto;" />
]

---

## Alternative: Violin Plots

.pull-left[

```r
cancer_clean %>%
  ggplot(aes(x = trt,           
             y = age)) +     
  geom_violin(fill = "gray") +
  geom_boxplot(fill = "white",
               alpha = .75,
               width = .25) +
  stat_summary(fun = mean,
               geom = "point",
               size = 5) +
  theme_bw() +
  labs(x = NULL,
       y = "Age in Years") +
  theme(legend.position = "none") 
```
]

.pull-right[
<img src="ch3_center_spread_files/figure-html/unnamed-chunk-35-1.png" width="100%" style="display: block; margin: auto;" />
]

---
class: inverse, center, middle

# Questions?

---
class: inverse, center, middle

# Next Topic

### Standard and Normal