3 ONE SAMPLE t-TEST: for the MEAN
Using the t.test() function
library(tidyverse) # pipes (%>%), dplyr, and ggplot2 used below
library(psych)     # lots of nice tidbits
library(car)       # Companion to Applied Regression
3.1 Exploratory Data Analysis: i.e. the eyeball method
Is the baseline weight more than 165 pounds?
3.1.1 Mean and SD
cancer_clean %>%
  furniture::table1(weighin,
                    na.rm = FALSE)
───────────────────────────
         Mean/Count (SD/%)
         n = 25
 weighin 178.3 (32.0)
───────────────────────────
32 / sqrt(25)
[1] 6.4
Since the standard deviation (\(s_X\)) is 32.0, the standard error for the mean (SEM = SE = \(s_{\overline{X}}\)) is 6.4. So even though the observed average of 178.3 is a higher number than 165, it may or may not be statistically significant.
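As a rough check before any formal test, the gap between the sample mean and 165 can be expressed in standard-error units. This sketch uses only the rounded values reported above:

\[
s_{\overline{X}} = \frac{s_X}{\sqrt{n}} = \frac{32.0}{\sqrt{25}} = 6.4,
\qquad
\frac{\overline{X} - 165}{s_{\overline{X}}} = \frac{178.3 - 165}{6.4} \approx 2.08
\]

A gap of roughly two standard errors is borderline, which is why the eyeball method alone cannot settle the question.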
3.2 Assumptions
3.2.1 Random Sampling
The Sample was drawn at random (at least as representative as possible)
- Nothing can be done to fix NON-representative samples!
- Cannot be tested for with any statistical test
3.2.2 Normality
A variable is said to follow the normal distribution if it resembles the normal curve. Specifically, it is symmetrical, unimodal, and bell shaped.
The continuous variable has a NORMAL distribution in the population
- Not as important if the sample is large (Central Limit Theorem)
- IF the sample is far from normal &/or small, might want to use a different method
Options for judging normality:
- Visualization of the sample's distribution
  - Histogram, but it is sensitive to binning choices (number or width)
  - Boxplot, which shows the median instead of the mean as the central line
  - QQ plot (points should follow the straight \(45^\circ\) line), but it is sensitive to outliers!
- Calculate Skewness and Kurtosis
  - Divide each value by its standard error (SE)
  - A result beyond \(\pm 2\) indicates issues
- Formal Inferential Tests for Normality
  - Null-hypothesis: population is normally distributed
  - A \(p \lt .05\) indicates non-normality
  - For smaller samples, use the Shapiro-Wilk Test
  - For larger samples, use the Kolmogorov-Smirnov Test
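The two formal tests listed above are both available in base R. The following is a minimal sketch on simulated data; the vector `sim_weight` is hypothetical and is NOT the `cancer_clean` data. Note the assumption flagged in the comments: when the mean and SD passed to `ks.test()` are estimated from the same sample, the Kolmogorov-Smirnov p-value is only approximate.

```r
# Hypothetical data: 200 simulated weights (not the cancer_clean data)
set.seed(123)
sim_weight <- rnorm(n = 200, mean = 178, sd = 32)

# Shapiro-Wilk test (base R), usually preferred for smaller samples
sw_result <- shapiro.test(sim_weight)

# Kolmogorov-Smirnov test (base R) against a normal curve
# Assumption: the mean and SD are estimated from this same sample,
# so the reported p-value should be read as approximate
ks_result <- ks.test(sim_weight, "pnorm",
                     mean = mean(sim_weight),
                     sd   = sd(sim_weight))

# For both tests, a small p-value suggests non-normality
sw_result$p.value
ks_result$p.value
```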
cancer_clean %>%
  ggplot(aes(weighin)) +
  geom_histogram(binwidth = 12) +
  geom_vline(xintercept = 165,   # Add a thick red line at the hypothesized mean of 165 pounds
             color = "red",
             size = 1)
The histogram is not truly normal, but it is fairly unimodal and somewhat bell shaped. There are mild concerns regarding the values above the mean.
cancer_clean %>%
  ggplot(aes(sample = weighin)) +   # make sure to include "sample = "
  geom_qq() +                       # layer on the dots
  stat_qq_line()                    # layer on the line
The Q-Q Plot displays a fairly linear pattern, but there are mild concerns at the higher values.
cancer_clean %>%
  dplyr::select(age, weighin) %>%   # we have to select MORE than one variable
  psych::describe()
vars n mean sd median trimmed mad min max range skew
age 1 25 59.64 12.93 60.0 59.95 11.86 27 86.0 59.0 -0.31
weighin 2 25 178.28 31.98 172.8 176.57 21.05 124 261.4 137.4 0.73
kurtosis se
age -0.01 2.59
weighin 0.07 6.40
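The skew is \(0.73\), which indicates a moderate positive skew (a longer tail at the higher weights), while the kurtosis is \(0.07\), which is close to \(0\). Because psych::describe() reports excess kurtosis, a value near \(0\) is what a normal distribution would produce. This reflects that the distribution is about as peaked as a truly normal distribution, but not perfectly symmetrical.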
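The rule of dividing each value by its standard error can be sketched by hand. The standard errors used here are the common large-sample approximations \(\sqrt{6/n}\) for skewness and \(\sqrt{24/n}\) for kurtosis; they are an assumption of this sketch and are not reported in the output above (the se column there is the standard error of the mean).

\[
\frac{\text{skew}}{SE_{\text{skew}}} \approx \frac{0.73}{\sqrt{6/25}} = \frac{0.73}{0.49} \approx 1.49,
\qquad
\frac{\text{kurtosis}}{SE_{\text{kurtosis}}} \approx \frac{0.07}{\sqrt{24/25}} = \frac{0.07}{0.98} \approx 0.07
\]

Under these approximations both ratios fall within \(\pm 2\), so neither statistic flags a serious departure from normality.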
cancer_clean %>%
  dplyr::pull(weighin) %>%   # extract the continuous variable
  shapiro.test()             # test for normality (from base R)
Shapiro-Wilk normality test
data: .
W = 0.93899, p-value = 0.1403
The Shapiro-Wilk test yielded NO evidence that weight is not normally distributed at baseline, \(W = .939, p = .140\).
3.3 Inference
Formal Statistical Test: One-Sample t-Test for the Mean
Use the t.test() function for a single sample.
Before you can run the t Test, you must separate out or ‘PULL’ your variable out of the dataset.
Use the dplyr::pull(continuous_variable) step before running the t Test.
Inside the function you need to specify one option:
- the null-hypothesis value: mu = ## (replace ## with your number)
You MAY want to specify some or all of the following options; each can be left at its default or overridden:
- Number of tails:
  - alternative = "two.sided" (default) allows for a 2-sided alternative
  - alternative = "less" only allows: population mean < mu
  - alternative = "greater" only allows: population mean > mu
- Confidence level:
  - conf.level = 0.95 (default) computes the 95% confidence interval
  - conf.level = 0.90 changes to a 90% confidence interval
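The options above can be tried out without any dataset. This is a minimal sketch using base R only; the `weights` vector is made up for illustration and the object names are hypothetical.

```r
# Hypothetical weights (in pounds), made up for illustration only
weights <- c(150, 162, 171, 168, 190, 204, 158, 175, 181, 166)

# All defaults: only the required null-hypothesis value is given
res_default <- t.test(weights, mu = 165)

# Override both options: one-sided test and a 90% confidence level
res_custom <- t.test(weights,
                     mu          = 165,
                     alternative = "greater",
                     conf.level  = 0.90)

res_default$alternative   # "two.sided"
res_custom$alternative    # "greater"
```

Storing the result in an object, as done here, makes it possible to pull out single pieces such as `res_custom$p.value` or `res_custom$conf.int` later on.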
3.3.1 All Defaults
Is there evidence the population mean weight is DIFFERENT than 165?
cancer_clean %>%
  dplyr::pull(weighin) %>%   # pull the continuous variable out
  t.test(mu = 165)           # specify the null hypothesis value
One Sample t-test
data: .
t = 2.0765, df = 24, p-value = 0.04872
alternative hypothesis: true mean is not equal to 165
95 percent confidence interval:
165.0807 191.4793
sample estimates:
mean of x
178.28
There is evidence that cancer patients weigh more (N = 25, M = 178.28) than the historic average of 165 pounds, \(t(24) = 2.077, p = .049\), 95% CI [165.08, 191.48].
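The reported interval can be approximated by hand from the summary statistics. This sketch assumes the tabled critical value \(t_{.975}(24) \approx 2.064\) and uses the standard error of 6.40 from the descriptive output, so the limits differ from the R output in the second decimal because of rounding.

\[
\overline{X} \pm t_{.975}(24) \times s_{\overline{X}} = 178.28 \pm 2.064 \times 6.40 = 178.28 \pm 13.21
\]

This gives limits of about 165.07 and 191.49. The lower limit sits just above 165, which agrees with a p-value just below .05.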
3.3.2 Confidence Level other than 95%
Find a 99% confidence interval for the population mean weight.
cancer_clean %>%
  dplyr::pull(weighin) %>%       # pull the continuous variable out
  t.test(mu = 165,               # specify the null hypothesis value
         conf.level = 0.99)      # over-ride the default of 95% CI
One Sample t-test
data: .
t = 2.0765, df = 24, p-value = 0.04872
alternative hypothesis: true mean is not equal to 165
99 percent confidence interval:
160.3927 196.1673
sample estimates:
mean of x
178.28
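The 99% confidence interval for the population mean weight (N = 25, M = 178.28) is [160.39, 196.17]. The test itself is unchanged, \(t(24) = 2.077, p = .049\), but because this wider interval contains 165, the difference from the historic average of 165 pounds is not significant at the .01 level.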
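The same 99% interval can be rebuilt from the summary statistics with `qt()`, which returns quantiles of the t distribution. This is a sketch under the assumption that the rounded values from the output above (n = 25, M = 178.28, SD = 31.98) are close enough; the object names are hypothetical.

```r
# Summary statistics copied (rounded) from the output above
n <- 25
m <- 178.28
s <- 31.98

se     <- s / sqrt(n)                  # standard error of the mean
t_crit <- qt(0.995, df = n - 1)        # leaves 0.5% in each tail
ci_99  <- m + c(-1, 1) * t_crit * se   # lower and upper limits

round(ci_99, 2)
```

Changing `0.995` to `0.975` in the call to `qt()` gives the critical value for the default 95% interval instead.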
3.3.3 One-Sided Test, instead of Two
Is there evidence the population mean weight is GREATER than 165?
cancer_clean %>%
  dplyr::pull(weighin) %>%         # pull the continuous variable out
  t.test(mu = 165,                 # specify the null hypothesis value
         alternative = "greater")  # over-ride the default two-sided test
One Sample t-test
data: .
t = 2.0765, df = 24, p-value = 0.02436
alternative hypothesis: true mean is greater than 165
95 percent confidence interval:
167.3384 Inf
sample estimates:
mean of x
178.28
Notice that one end of the confidence interval is Inf for infinity. This always happens when you specify a one-tail test, because R then reports a one-sided confidence bound instead of a two-sided interval. You should NOT report it as a conventional confidence interval when you specify a one-sided alternative.
There is evidence that cancer patients weigh more (N = 25, M = 178.28) than the historic average of 165 pounds, \(t(24) = 2.077, p = .024\).
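When the sample mean falls on the side named in the alternative, the one-sided p-value is half of the two-sided p-value, because only one tail of the t distribution is counted:

\[
p_{\text{one-sided}} = \frac{p_{\text{two-sided}}}{2} = \frac{.04872}{2} = .02436
\]

The t statistic and the degrees of freedom are the same in both versions of the test; only the tail area changes.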
3.3.4 Restrict to a Subsample
Do the patients with stage 3 and 4 cancer weigh more than 165 pounds at intake, on average?
cancer_clean %>%
  dplyr::filter(stage %in% c("3", "4")) %>%   # select a sub-sample
  dplyr::pull(weighin) %>%                    # pull the continuous variable out
  t.test(mu = 165)                            # specify the null hypothesis value
One Sample t-test
data: .
t = 0.82627, df = 5, p-value = 0.4463
alternative hypothesis: true mean is not equal to 165
95 percent confidence interval:
137.0283 219.4717
sample estimates:
mean of x
178.25
There is NO evidence that stage three and four cancer patients (n = 6, M = 178.25) weigh more than the historic average of 165 pounds, \(t(5) = 0.826, p = .446\).
3.4 Ihno example
From Barry H. Cohen’s Explaining Psychological Statistics, page 196.
To review, we can easily run a one-sample t-test with a few simple lines of code.
data_ihno %>%
  dplyr::pull(hr_base) %>%
  t.test(mu = 72.5)
One Sample t-test
data: .
t = -0.71525, df = 99, p-value = 0.4761
alternative hypothesis: true mean is not equal to 72.5
95 percent confidence interval:
71.63194 72.90806
sample estimates:
mean of x
72.27
We can repeat this process to test any number of variables against a specified population parameter:
data_ihno %>%
  dplyr::pull(hr_pre) %>%
  t.test(mu = 72.5)
One Sample t-test
data: .
t = 2.6309, df = 99, p-value = 0.009878
alternative hypothesis: true mean is not equal to 72.5
95 percent confidence interval:
72.83183 74.86817
sample estimates:
mean of x
73.85
data_ihno %>%
  dplyr::pull(hr_post) %>%
  t.test(mu = 72.5)
One Sample t-test
data: .
t = 0.63295, df = 99, p-value = 0.5282
alternative hypothesis: true mean is not equal to 72.5
95 percent confidence interval:
71.85954 73.74046
sample estimates:
mean of x
72.8
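Rather than copying the same three lines for every variable, the repeated tests could be run in one pass. The following is a sketch in base R, not the method used in this chapter; `hr_data` is a hypothetical data frame of simulated values standing in for `data_ihno`, so its results will not match the output above.

```r
# Hypothetical data frame standing in for data_ihno (simulated values)
set.seed(42)
hr_data <- data.frame(hr_base = rnorm(100, mean = 72, sd = 3),
                      hr_pre  = rnorm(100, mean = 74, sd = 5),
                      hr_post = rnorm(100, mean = 73, sd = 5))

# Run the same one-sample t-test on each column
results <- lapply(hr_data, function(x) t.test(x, mu = 72.5))

# Collect the t statistic and p-value for each variable
summary_table <- data.frame(
  variable = names(results),
  t        = sapply(results, function(r) unname(r$statistic)),
  p        = sapply(results, function(r) r$p.value)
)

summary_table
```

Keep in mind that running several tests on the same sample raises the chance of at least one false positive, so the individual p-values should be read with that in mind.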