3 Summary Statistics
Using the psych::describe() function
The describe() function from the psych package returns an extensive listing of basic summary statistics for every variable in a dataset (Revelle 2020).
Column    Description
vars      the number (order) of the variable in this table
n         how many non-missing values there are
mean      the average, or arithmetic mean
sd        the standard deviation
median    the 50th percentile, or Q2
trimmed   the mean after removing the top and bottom 10% of values
mad       the median absolute deviation (from the median); you do NOT need to worry about this one
min       the minimum, or lowest, value
max       the maximum, or highest, value
range     the full range of values, max - min
skew      skewness (no SE for skewness is given)
kurtosis  kurtosis (no SE for kurtosis is given)
se        the standard error of the MEAN, not of the skewness or kurtosis
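To see what a few of these columns actually compute, the sketch below runs describe() on one variable from R's built-in mtcars dataset (an illustrative choice, not the dataset used in this chapter) and recomputes trimmed, mad, range, and se by hand with base R. It assumes the psych package is installed and relies on describe()'s default of trimming 10% from each end.

```r
library(psych)

# mtcars is a built-in R dataset, used here only for illustration
x <- mtcars$mpg
d <- psych::describe(x)   # one-row summary; default trim = 0.1

# Recompute several columns by hand to see what they mean
n_manual       <- sum(!is.na(x))             # non-missing values
trimmed_manual <- mean(x, trim = 0.1)        # drop lowest/highest 10% first
mad_manual     <- mad(x)                     # median absolute deviation (scaled by 1.4826)
range_manual   <- max(x) - min(x)            # max - min
se_manual      <- sd(x) / sqrt(n_manual)     # standard error of the MEAN

print(d)
print(c(trimmed = trimmed_manual, mad = mad_manual,
        range = range_manual, se = se_manual))
```

If the hand-computed values match the corresponding columns of d, the definitions in the table are being applied as described. Note that mad() rescales by a constant of 1.4826 by default, which is why it is larger than the raw median of the absolute deviations.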
3.1 All Variables in a Dataset
# A tibble: 9 x 13
vars n mean sd median trimmed mad min max range skew
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 25 13 7.36 13 13 8.90 1 25 24 0
2 2 25 1.44 0.507 1 1.43 0 1 2 1 0.227
3 3 25 59.6 12.9 60 60.0 11.9 27 86 59 -0.307
4 4 25 178. 32.0 173. 177. 21.1 124 261. 137. 0.730
5 5 25 2.88 1.24 2 2.81 1.48 1 5 4 0.726
6 6 25 6.52 1.53 6 6.33 0 4 12 8 1.80
7 7 25 8.28 2.54 8 8.10 2.97 4 16 12 1.01
8 8 25 10.4 3.47 10 10.2 2.97 6 17 11 0.487
9 9 23 9.48 3.49 9 9.21 2.97 3 19 16 0.770
# ... with 2 more variables: kurtosis <dbl>, se <dbl>
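NOTE: The names of categorical variables (factors) are followed by an asterisk (*) to indicate that their summary statistics should not be interpreted, since the variable is not continuous or measured on an interval scale. (The asterisk is part of the row names, which are dropped when the result is printed as a tibble, as in the output above.)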
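A quick way to see the asterisk is to describe a dataset that contains a factor. The sketch below uses R's built-in iris data (chosen only for illustration), whose Species column is a factor, and prints the row names of the result, which is where the asterisk should appear. It assumes the psych package is installed.

```r
library(psych)

# iris is a built-in dataset; Species is a factor (categorical) variable
d <- psych::describe(iris)

# The asterisk is attached to the row name of each factor variable
print(rownames(d))
```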
3.2 A Subset of Variables in a Dataset
It is better to avoid calculating summary statistics for categorical variables at all, by first restricting the dataset to only the continuous variables with a dplyr::select() step.
Make sure to use a dplyr::select(var1, var2, ..., var12) step to select only the variables of interest.
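library(dplyr)  # provides the %>% pipe

cancer_clean %>%
  # keep only the continuous variables of interest, then summarize them
  dplyr::select(age, weighin, totalcin, totalcw2, totalcw4, totalcw6) %>%
  psych::describe()
# A tibble: 6 x 13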
vars n mean sd median trimmed mad min max range skew
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 25 59.6 12.9 60 60.0 11.9 27 86 59 -0.307
2 2 25 178. 32.0 173. 177. 21.1 124 261. 137. 0.730
3 3 25 6.52 1.53 6 6.33 0 4 12 8 1.80
4 4 25 8.28 2.54 8 8.10 2.97 4 16 12 1.01
5 5 25 10.4 3.47 10 10.2 2.97 6 17 11 0.487
6 6 23 9.48 3.49 9 9.21 2.97 3 19 16 0.770
# ... with 2 more variables: kurtosis <dbl>, se <dbl>
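If a dataset has many continuous variables, typing every name can be tedious. As an alternative sketch (not the approach used above), the where(is.numeric) selection helper keeps every numeric column. This assumes dplyr 1.0.0 or later and psych are installed, and again uses the built-in iris data purely for illustration.

```r
library(dplyr)
library(psych)

# Keep only the numeric columns instead of listing them by name
# (where() is available in selections from dplyr 1.0.0 onward)
d <- iris %>%
  dplyr::select(where(is.numeric)) %>%
  psych::describe()

# The factor column Species is gone, so no row name carries an asterisk
print(rownames(d))
```

The trade-off is that where(is.numeric) only checks how a column is stored: an ID number or a category recorded as numeric codes would still be kept, so naming the variables explicitly, as in the example above, remains the safer habit.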