3 Summary Statistics
Using the psych::describe() function
The describe() function from the psych package returns an extensive listing of basic summary statistics for every variable in a dataset (Revelle 2020).
vars: the order number of each variable in the table
n: how many non-missing values there are
mean: the average, or arithmetic mean
sd: the standard deviation
median: the 50th percentile, or Q2
trimmed: the mean after removing the top and bottom 10% of values
mad: the median absolute deviation (from the median); do not worry about this one
min: the minimum, or lowest value
max: the maximum, or highest value
range: the full range of values, max - min
skew: skewness (no SE for skewness is given)
kurtosis: kurtosis (no SE for kurtosis is given)
se: the standard error of the MEAN, not of the skewness or kurtosis
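To make these definitions concrete, here is a minimal base R sketch that computes most of the same quantities by hand for a single variable. The vector x and all object names are hypothetical, chosen only for illustration. Skewness and kurtosis are left out because their exact formula depends on an option of describe(). Note that base R's mad() rescales the median absolute deviation by a constant of 1.4826 by default.

```r
# Hypothetical example data: ten observed values plus one missing value
x <- c(1, 3, 4, 5, 6, 7, 8, 9, 10, 47, NA)

n       <- sum(!is.na(x))                          # number of non-missing values
avg     <- mean(x, na.rm = TRUE)                   # arithmetic mean
std_dev <- sd(x, na.rm = TRUE)                     # standard deviation
med     <- median(x, na.rm = TRUE)                 # 50th percentile (Q2)
trimmed <- mean(x, trim = 0.10, na.rm = TRUE)      # mean without the lowest and highest 10%
mad_val <- mad(x, na.rm = TRUE)                    # median absolute deviation, scaled by 1.4826
rng     <- max(x, na.rm = TRUE) - min(x, na.rm = TRUE)  # max - min
se_mean <- std_dev / sqrt(n)                       # standard error of the mean

# Collect the pieces into a one-row summary
summary_row <- data.frame(n = n, mean = avg, sd = std_dev, median = med,
                          trimmed = trimmed, mad = mad_val,
                          range = rng, se = se_mean)
print(summary_row)
```

In this made-up example the single large value (47) pulls the mean up to 10, while the trimmed mean and the median are both 6.5, which is why the trimmed mean is a useful companion to the ordinary mean.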
3.1 All Variables in a Dataset
# A tibble: 9 x 13
   vars     n  mean    sd median trimmed   mad   min   max range   skew
  <int> <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
1     1    25    13  7.36     13      13  8.90     1    25    24      0
2     2    25  1.44 0.507      1    1.43     0     1     2     1  0.227
3     3    25  59.6  12.9     60    60.0  11.9    27    86    59 -0.307
4     4    25  178.  32.0   173.    177.  21.1   124  261.  137.  0.730
5     5    25  2.88  1.24      2    2.81  1.48     1     5     4  0.726
6     6    25  6.52  1.53      6    6.33     0     4    12     8   1.80
7     7    25  8.28  2.54      8    8.10  2.97     4    16    12   1.01
8     8    25  10.4  3.47     10    10.2  2.97     6    17    11  0.487
9     9    23  9.48  3.49      9    9.21  2.97     3    19    16  0.770
# ... with 2 more variables: kurtosis <dbl>, se <dbl>
NOTE: The names of categorical variables (factors) are followed by an asterisk (*) to indicate that their summary statistics should not be interpreted, since the variable is not continuous or on an interval scale.
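As a small illustration of the asterisk convention, the sketch below passes a whole data frame to psych::describe(). The toy data frame and its column names are hypothetical, not part of the cancer dataset, and the psych package is assumed to be installed.

```r
# Hypothetical toy data: one continuous variable and one factor
toy <- data.frame(
  age = c(34, 51, 47, 62, 29),
  trt = factor(c("A", "B", "A", "B", "A"))
)

# Summarize every column, including the factor
out <- psych::describe(toy)

# The factor's row name carries a trailing asterisk
rownames(out)

print(out)
```

The statistics in the row for trt are computed from the factor's underlying integer codes, so they should be skipped when reading the table.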
3.2 A Subset of Variables in a Dataset
It is better to avoid calculating summary statistics for categorical variables in the first place. Before calling describe(), restrict the dataset to the continuous variables of interest with a dplyr::select(var1, var2, ..., var12) step.
# Keep only the continuous variables of interest, then summarize them
cancer_clean %>%
  dplyr::select(age, weighin, totalcin, totalcw2, totalcw4, totalcw6) %>%
  psych::describe()
# A tibble: 6 x 13
   vars     n  mean    sd median trimmed   mad   min   max range   skew
  <int> <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
1     1    25  59.6  12.9     60    60.0  11.9    27    86    59 -0.307
2     2    25  178.  32.0   173.    177.  21.1   124  261.  137.  0.730
3     3    25  6.52  1.53      6    6.33     0     4    12     8   1.80
4     4    25  8.28  2.54      8    8.10  2.97     4    16    12   1.01
5     5    25  10.4  3.47     10    10.2  2.97     6    17    11  0.487
6     6    23  9.48  3.49      9    9.21  2.97     3    19    16  0.770
# ... with 2 more variables: kurtosis <dbl>, se <dbl>
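When a dataset has many columns, listing each variable by name can become tedious. One possible shortcut, sketched here on a hypothetical toy data frame, is to let dplyr::select() keep every numeric column with the where(is.numeric) helper (this assumes dplyr 1.0.0 or later and the psych package are installed). Treat it as a starting point rather than a replacement for choosing variables deliberately: categorical variables stored as numeric codes would still pass through.

```r
library(dplyr)

# Hypothetical toy data: one factor and two continuous variables
toy <- data.frame(
  id     = factor(c("a", "b", "c", "d")),
  age    = c(34, 51, 47, 62),
  weight = c(150.5, 182.0, 171.3, 199.8)
)

# Keep only the columns stored as numeric
numeric_only <- toy %>%
  dplyr::select(where(is.numeric))

# The factor column has been dropped
names(numeric_only)

# Summarize the remaining columns
numeric_only %>%
  psych::describe()
```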