Encyclopedia of Quantitative Methods in R, vol. 1: Data Wrangling

9 Scatterplots

This chapter draws scatterplots with the ggplot2::geom_point() function.

library(tidyverse)       # loads the tidyverse packages, including ggplot2 and dplyr

9.1 Two continuous variables

data_ihno %>% 
  ggplot() +
  aes(x = mathquiz,
      y = statquiz) +
  geom_point()

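# Color the points by phobia score, grouped into three intervals
data_ihno %>% 
  dplyr::mutate(phobia_cut3 = cut(phobia, 
                                  breaks = c(0, 2, 4, 10),      # cut points for the three groups
                                  include.lowest = TRUE)) %>%   # keep scores of 0 in the first group
  ggplot() +
  aes(x = mathquiz,
      y = statquiz,
      color = phobia_cut3) +
  geom_point()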
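
To see how cut() forms the three phobia groups, here is a minimal sketch on a handful of made-up scores. The vector phobia_demo and its values are hypothetical and are not taken from data_ihno; only the breaks and include.lowest settings match the example above.

```r
# Illustrative scores only (hypothetical, NOT taken from data_ihno)
phobia_demo <- c(0, 2, 3, 4, 7, 10)

# Same breaks as in the plot: intervals are closed on the right,
# and include.lowest = TRUE also closes the first interval on the left
phobia_groups <- cut(phobia_demo,
                     breaks = c(0, 2, 4, 10),
                     include.lowest = TRUE)

levels(phobia_groups)   # "[0,2]"  "(2,4]"  "(4,10]"
table(phobia_groups)    # two scores fall in each group
```

Without include.lowest = TRUE, a score of exactly 0 would fall outside the first interval and become NA, so that point would be dropped from the colored plot.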

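# One panel per phobia group; geom_count() sizes each point
# by the number of observations that share the same location
data_ihno %>% 
  dplyr::mutate(phobia_cut3 = cut(phobia, 
                                  breaks = c(0, 2, 4, 10),
                                  include.lowest = TRUE)) %>% 
  ggplot() +
  aes(x = mathquiz,
      y = statquiz) +
  geom_count() +
  facet_grid(. ~ phobia_cut3)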
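
geom_count() differs from geom_point() in that it counts the observations at each (x, y) location and maps that count to point size, which helps when several students share the same pair of scores. A rough sketch of the tally it displays, assuming a small made-up data frame (data_demo and its values are hypothetical, not the real data_ihno):

```r
library(dplyr)

# Hypothetical quiz scores with repeated (x, y) pairs; NOT the real data_ihno
data_demo <- data.frame(
  mathquiz = c(30, 30, 30, 35, 35, 40),
  statquiz = c(7, 7, 7, 8, 8, 9)
)

# Number of observations at each (mathquiz, statquiz) location:
# the quantity that geom_count() maps to point size
overlap_counts <- data_demo %>%
  dplyr::count(mathquiz, statquiz)

overlap_counts
```

With geom_point(), the three students at (30, 7) would be drawn on top of each other as a single dot. With geom_count(), that location is drawn as a larger dot.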