Getting Started with R

class: center, middle, inverse, title-slide

# Getting Started with R
## Basics of R .small[EDUC/PSY 6600]
### Spring 2021

---

# Download Software

Directions: [Encyclopedia, Vol. 0: Install Software](https://cehs-research.github.io/eBook_install/install-software.html)

---

# `R` vs. `RStudio`

.huge[If `R` is like a car's engine, then `RStudio` is the steering wheel, the pedals, and the comfortable seat]

--
.pull-left[.huge[.center[
`R`

[www.r-project.org](https://cran.cnr.berkeley.edu/)
]]]

.pull-right[.huge[.center[
`RStudio`

[www.rstudio.com](https://www.rstudio.com/products/rstudio/download/)
]]]

---
# Focus on what is needed in `R`

.center[.Huge[
".dcoral[Success] is neither magical nor mysterious. Success is the natural consequence of consistently applying the .nicegreen[basic fundamentals]." --- Jim Rohn
]]

.footnote[https://www.brainyquote.com/quotes/jim_rohn_122132?src=t_fundamentals]

---
# Intro to `R`

.pull-left[.huge[
Why Use `R`?
]
.large[
- .coral[**Free**]ly available
- Updated regularly, so almost always up-to-date
- Best .bluer[data visualizations]
- .nicegreen[Syntax oriented] (easy to reproduce analyses)
- You can make your .gray1[own functionality] (e.g., `table1()`)
]]

.pull-right[
.huge[`table1()` produces 👇] 
 
![](figures/table1_fig.png)
]

---
count: false
# Intro to `R`

.pull-right[.huge[
Any Issues?
]
.large[
- Learning curve
- Too extensive (people often focus on too much)
- Advisors sometimes don't use it

The pros heavily outweigh the cons *if* you are willing to use it. It can save you tons of time in the long run.
]]

---

## .coral[Objects] are Nouns

Virtual objects are like physical objects (e.g., a car is good to travel in, a table not so much)

`R` uses virtual objects

- a `vector` is values concatenated together
- a `data.frame` is a group of vectors of the same length

.pull-left[

.large[.bluer[Vector]]

concatenate or combine

```r
c(1, 3, 5, 7, 11, 13)
```

```
> [1]  1  3  5  7 11 13
```

]

.pull-right[

.large[.bluer[Data Frame]]

named vectors of the same length

```r
data.frame(x = c(1, 3, 5, 7),
           y = c(2, 4, 6, 8))
```

```
>   x y
> 1 1 2
> 2 3 4
> 3 5 6
> 4 7 8
```
]

---

## .coral[Objects]: save into global environment

.pull-left[

.large[Assignment symbol: .coral[<-]]

.large[.bluer[Constant]]

```r
fred <- 13
```

```r
fred
```

```
> [1] 13
```

.large[.bluer[Vector]]

```r
odds <- c(1, 3, 5, 7, 11, 13)
```

```r
odds
```

```
> [1]  1  3  5  7 11 13
```

]

.pull-right[

.large[.bluer[Data Frame]]

```r
df <- data.frame(x = c(1, 3, 5, 7),
                 y = c(2, 4, 6, 8))
```

```r
df
```

```
>   x y
> 1 1 2
> 2 3 4
> 3 5 6
> 4 7 8
```

]
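
---

## .coral[Objects]: reuse what you saved

A minimal sketch, not from the original slides: once an object is saved, it can be reused by name. The names `n_odds`, `doubled`, `n_rows`, and `n_cols` are made up for illustration.

```r
# Recreate the objects from the previous slide
odds <- c(1, 3, 5, 7, 11, 13)
df <- data.frame(x = c(1, 3, 5, 7),
                 y = c(2, 4, 6, 8))

n_odds  <- length(odds)  # how many values the vector holds
doubled <- odds * 2      # arithmetic is applied to every value
n_rows  <- nrow(df)      # number of rows in the data frame
n_cols  <- ncol(df)      # number of columns (one per vector)

doubled
```

Saving intermediate results under their own names keeps the originals (`odds`, `df`) unchanged.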

---

## .coral[Class] of a variable: .nicegreen[numeric (dbl)]

.pull-leftbig[

```r
df <- data.frame(id = c(100, 101, 102, 103),
                 name = c("Joe", "Jill", "Meg", "Pat"),
                 age = c(10, 9, 10, 11),
                 trt = as.factor(c("A", "B", "A", "B")))
```

```r
df
```

```
>    id name age trt
> 1 100  Joe  10   A
> 2 101 Jill   9   B
> 3 102  Meg  10   A
> 4 103  Pat  11   B
```

]

.pull-rightsmall[

.large[dataframe.coral[$]variable]

```r
df$id
```

```
> [1] 100 101 102 103
```

.large[numbers are not quoted]

```r
class(df$id)
```

```
> [1] "numeric"
```

]

---

## .coral[Class] of a variable: .nicegreen[numeric (dbl)]

.pull-leftbig[

```r
df
```

```
>    id name age trt
> 1 100  Joe  10   A
> 2 101 Jill   9   B
> 3 102  Meg  10   A
> 4 103  Pat  11   B
```

]

.pull-rightsmall[

.large[dataframe.coral[$]variable]

```r
df$age
```

```
> [1] 10  9 10 11
```

.large[numbers are not quoted]

```r
class(df$age)
```

```
> [1] "numeric"
```

]

---

## .coral[Class] of a variable: .nicegreen[character (chr)]

.pull-leftbig[

```r
df
```

```
>    id name age trt
> 1 100  Joe  10   A
> 2 101 Jill   9   B
> 3 102  Meg  10   A
> 4 103  Pat  11   B
```

]

.pull-rightsmall[

.large[dataframe.coral[$]variable]

```r
df$name
```

```
> [1] "Joe"  "Jill" "Meg"  "Pat"
```

.large[text is quoted]

```r
class(df$name)
```

```
> [1] "character"
```

]

---

## .coral[Class] of a variable: .nicegreen[factor (fct)]

.pull-leftbig[

```r
df
```

```
>    id name age trt
> 1 100  Joe  10   A
> 2 101 Jill   9   B
> 3 102  Meg  10   A
> 4 103  Pat  11   B
```

]

.pull-rightsmall[

.large[dataframe.coral[$]variable]

```r
df$trt
```

```
> [1] A B A B
> Levels: A B
```

.large[factors have levels]

```r
class(df$trt)
```

```
> [1] "factor"
```

]
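
---

## .coral[Class] of a variable: check them all at once

A hedged sketch, not from the original slides: instead of asking for one class at a time, `sapply()` can apply `class()` to every variable. `stringsAsFactors = FALSE` is set explicitly here as an assumption, so that `name` stays character even on R versions older than 4.0.

```r
df <- data.frame(id = c(100, 101, 102, 103),
                 name = c("Joe", "Jill", "Meg", "Pat"),
                 age = c(10, 9, 10, 11),
                 trt = as.factor(c("A", "B", "A", "B")),
                 stringsAsFactors = FALSE)

# Apply class() to each column; the result is a named character vector
classes <- sapply(df, class)
classes

# Only factors have levels
trt_levels <- levels(df$trt)
trt_levels
```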

---
## .coral[Functions] are Verbs

.pull-left[.large[

Generally looks like:

`stuff(arg1, arg2)`
...or...

`pkg::stuff(arg1, arg2)`

- `pkg` package the function is from
- `stuff` function's name
- `()` surround the arguments
- `arg1` and `arg2` arguments or options
]]

.pull-right[
.large[Example: "mean" function

- `mean()` 
- 1 argument required: a variable
- returns variable's average
]

```r
## This is a comment 
mean(df$age)
```

```
> [1] 10
```

...does the same thing...

```r
base::mean(df$age)
```

```
> [1] 10
```

]
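
---

## .coral[Functions]: arguments by position or by name

A small sketch, not from the original slides, of how `arg1` and `arg2` can be supplied. It uses base R's `round(x, digits)`; the object names (`avg`, `by_position`, `by_name`, `swapped`) are illustrative only.

```r
avg <- mean(c(10, 9.5, 11))

# Arguments matched by position: first x, then digits
by_position <- round(avg, 1)

# The same call with arguments matched by name
by_name <- round(x = avg, digits = 1)

# Named arguments can be given in any order
swapped <- round(digits = 1, x = avg)

by_position
```

Naming arguments is more typing, but it makes the call easier to read later.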

---

## .coral[Missing Values] are lack of data

- `NA` represents missing or blank values
- there are no quotes around the `NA`

.pull-leftbig[

```r
df <- data.frame(id = c(100, 101, 102, 103),
                 name = c("Joe", "Jill", "Meg", "Pat"),
                 age = c(10, 9.5, NA, 11),
                 trt = as.factor(c("A", "B", "A", "B")))
```

```r
df
```

```
>    id name  age trt
> 1 100  Joe 10.0   A
> 2 101 Jill  9.5   B
> 3 102  Meg   NA   A
> 4 103  Pat 11.0   B
```

]

.pull-rightsmall[

```r
mean(df$age)
```

```
> [1] NA
```

default:
- keeps missing values in
- an `NA` in the data causes an `NA` mean

```r
mean(df$age, na.rm = TRUE)
```

```
> [1] 10.16667
```

]
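
---

## .coral[Missing Values]: count them first

A minimal sketch, not from the original slides: before dropping missing values with `na.rm = TRUE`, it can help to see how many there are. `is.na()` is base R; the object names are illustrative.

```r
age <- c(10, 9.5, NA, 11)

# TRUE wherever a value is missing
is.na(age)

n_missing  <- sum(is.na(age))    # TRUE counts as 1
n_observed <- sum(!is.na(age))   # values actually used by the mean

# The mean below is based on n_observed values, not on length(age)
mean_age <- mean(age, na.rm = TRUE)
```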

---

## .coral[Functions]: what package is it from?

.pull-leftsmall[

Function in Base R

```r
dim(df)
```

```
> [1] 4 4
```

```r
names(df)
```

```
> [1] "id"   "name" "age"  "trt"
```

]

.pull-rightbig[

Function from a specific package

```r
tibble::glimpse(df)
```

```
> Rows: 4
> Columns: 4
> $ id   <dbl> 100, 101, 102, 103
> $ name <chr> "Joe", "Jill", "Meg", "Pat"
> $ age  <dbl> 10.0, 9.5, NA, 11.0
> $ trt  <fct> A, B, A, B
```

]
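
---

## .coral[Functions]: ask `R` where a function lives

One possible way to check, not from the original slides: `utils::find()` reports which attached package provides a name. This sketch assumes a default R session, where `base` and `stats` are attached.

```r
# Each call returns the search-path entry that provides the function
where_dim    <- utils::find("dim")
where_median <- utils::find("median")

where_dim
where_median
```

Typing `?dim` also opens the help page, which shows the package name in its top-left corner.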

---

## The .coral[Tidyverse]

https://rworkshop.uni.lu/lectures/lecture05_dplyr.html#1

```r
library(tidyverse)
```

---
## The .coral[Pipe] links steps

The symbol is typed .nicegreen[`%>%`]

.pull-left[

```r
mean(df$age, na.rm = TRUE)
```

```
> [1] 10.16667
```

```r
df$age %>% mean(na.rm = TRUE)
```

```
> [1] 10.16667
```

]

.pull-right[

]
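
---

## The .coral[Pipe]: nested vs. piped

A hedged sketch, not from the original slides, of why the pipe helps once there is more than one step. It assumes the `magrittr` package is installed (it provides `%>%` and is loaded by `library(tidyverse)`).

```r
library(magrittr)  # provides %>%

age <- c(10, 9.5, NA, 11)

# Nested: read from the inside out
nested <- round(mean(age, na.rm = TRUE), 1)

# Piped: read left to right, one step at a time
piped <- age %>%
  mean(na.rm = TRUE) %>%
  round(1)

identical(nested, piped)
```

The pipe passes the result on its left as the first argument of the function on its right.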

---

## Build a .coral[Pipeline]
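
A minimal sketch of a pipeline, assuming the `dplyr` package is installed. This is an illustration using the small `df` from the earlier slides, not the original slide content; `by_trt`, `n`, and `mean_age` are made-up names.

```r
library(dplyr)

df <- data.frame(id = c(100, 101, 102, 103),
                 name = c("Joe", "Jill", "Meg", "Pat"),
                 age = c(10, 9.5, NA, 11),
                 trt = as.factor(c("A", "B", "A", "B")))

by_trt <- df %>%
  dplyr::filter(!is.na(age)) %>%           # step 1: keep rows with an age
  dplyr::group_by(trt) %>%                 # step 2: split by treatment
  dplyr::summarise(n = dplyr::n(),         # step 3: count rows per group
                   mean_age = mean(age))   #         and average the ages

by_trt
```

Each line is one verb, and the result of each step feeds the next.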

---
# Important first steps in `R`

.large[.large[
1. Read in Data
2. Quickly assess the data
3. Clean the data
4. Analyze the data
]]

.large[We'll show each of these over the next few weeks, starting with reading in, assessing, and cleaning the data.]

---
# Read in the data

.huge[Data comes in various files:]

.pull-left[.large[
- CSV
- tab-delimited
- SPSS
- Excel
]]
.pull-right[.large[
- SAS
- Stata
- Mplus
- etc.
]]

--
### `R` can read all types

.huge[Generally, it all works in a similar way]

---
# Read in data: FORMAT

.large[Text: .bluer[Comma-separated]]

```r
my_data <- read.csv("my_data_file.csv")
```

.large[Text: .bluer[tab-delimited]]

```r
my_data <- read.delim("my_data_file.txt")
```

.large[.bluer[Excel]]: .xls, .xlsx

```r
library(readxl)
my_data <- readxl::read_excel("my_data_file.xlsx")
```

.large[.bluer[SPSS]]: .sav

```r
library(haven)
my_data <- haven::read_spss("my_data_file.sav")
```
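
---
# Read in data: a quick check

A self-contained sketch, not from the original slides: it writes a tiny CSV to a temporary file and reads it back, so it runs without any data file on hand. The temporary file and the `small` data frame are stand-ins for a real file such as `my_data_file.csv`.

```r
# Stand-in data written to a temporary CSV file
small <- data.frame(id = c(100, 101), age = c(10, 9.5))
tmp <- tempfile(fileext = ".csv")
write.csv(small, tmp, row.names = FALSE)

# Read it back the same way as any other CSV
my_data <- read.csv(tmp)

# Confirm the size and variable names look right
dim(my_data)
names(my_data)

unlink(tmp)  # remove the temporary file
```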

---
background-image: url(figures/fig_inho_data_desc.png)
background-position: 50% 90%
background-size: 800px

# Let's use the data from the book

---
# Steps for .dcoral[Preparing] the Data for Analysis

.large[.large[
1. .coral[Get the data]
2. .nicegreen[Prep the data]
    - Variable Labels
    - Value Labels
    - Missing Values
3. .bluer[Compute] new variables and values (fill in missing codes, recode scores, categorize/group values, combine)
4. Get .dcoral[descriptives] using `tableF()` and `table1()` from the `furniture` package to check what is going on
]]

---
# Step 1: Get the data

The data are in an Excel file (.xls) saved in the same folder as the .Rmd file.

```r
library(readxl)
d <- readxl::read_excel("Ihno_dataset.xls")
d
```

```
> # A tibble: 100 x 18
>    Sub_num Gender Major Reason Exp_cond Coffee Num_cups Phobia Prevmath Mathquiz
>      <dbl>  <dbl> <dbl>  <dbl>    <dbl>  <dbl>    <dbl>  <dbl>    <dbl>    <dbl>
>  1       1      1     1      3        1      1        0      1        3       43
>  2       2      1     1      2        1      0        0      1        4       49
>  3       3      1     1      1        1      0        0      4        1       26
>  4       4      1     1      1        1      0        0      4        0       29
>  5       5      1     1      1        1      0        1     10        1       31
>  6       6      1     1      1        2      1        1      4        1       20
>  7       7      1     1      1        2      0        0      4        2       13
>  8       8      1     1      3        2      1        2      4        1       23
>  9       9      1     1      1        2      0        0      4        1       38
> 10      10      1     1      1        2      1        2      5        0       NA
> # ... with 90 more rows, and 8 more variables: Statquiz <dbl>, Exp_sqz <dbl>,
> #   Hr_base <dbl>, Hr_pre <dbl>, Hr_post <dbl>, Anx_base <dbl>, Anx_pre <dbl>,
> #   Anx_post <dbl>
```

---
# Step 2: Prep the data

```r
library(tidyverse)

d_clean <- d %>%
  dplyr::mutate(MajorF = factor(Major,
                                levels = c(1, 2, 3, 4, 5),
                                labels = c("Psychology", "Premed",
                                           "Biology", "Sociology",
                                           "Economics"))) %>%
  dplyr::mutate(Coffee = factor(Coffee))

d_clean
```

```
> # A tibble: 100 x 19
>    Sub_num Gender Major Reason Exp_cond Coffee Num_cups Phobia Prevmath Mathquiz
>      <dbl>  <dbl> <dbl>  <dbl>    <dbl> <fct>     <dbl>  <dbl>    <dbl>    <dbl>
>  1       1      1     1      3        1 1             0      1        3       43
>  2       2      1     1      2        1 0             0      1        4       49
>  3       3      1     1      1        1 0             0      4        1       26
>  4       4      1     1      1        1 0             0      4        0       29
>  5       5      1     1      1        1 0             1     10        1       31
>  6       6      1     1      1        2 1             1      4        1       20
>  7       7      1     1      1        2 0             0      4        2       13
>  8       8      1     1      3        2 1             2      4        1       23
>  9       9      1     1      1        2 0             0      4        1       38
> 10      10      1     1      1        2 1             2      5        0       NA
> # ... with 90 more rows, and 9 more variables: Statquiz <dbl>, Exp_sqz <dbl>,
> #   Hr_base <dbl>, Hr_pre <dbl>, Hr_post <dbl>, Anx_base <dbl>, Anx_pre <dbl>,
> #   Anx_post <dbl>, MajorF <fct>
```
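
---
# Step 2: What `factor()` is doing

A small sketch, not from the original slides, of the `levels`/`labels` pairing used above, shown on a short stand-in vector instead of the full dataset. The codes in `major` are hypothetical.

```r
# Hypothetical numeric codes, in the same 1-5 scheme as Major
major <- c(1, 3, 2, 5, 1)

major_f <- factor(major,
                  levels = c(1, 2, 3, 4, 5),
                  labels = c("Psychology", "Premed",
                             "Biology", "Sociology",
                             "Economics"))

# Each code is replaced by the label in the same position
as.character(major_f)

# Levels with no cases (here Sociology) are kept with a count of 0
table(major_f)
```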

---
# Step 3: Compute new variables/values

```r
d_clean <- d_clean %>%
  dplyr::mutate(newVar = Mathquiz / 2)
```
Note: the output below shows only `Mathquiz` and `newVar`, and only the first few rows, for display. All the other variables are still in `d_clean`.

```r
d_clean %>% dplyr::select(Mathquiz, newVar) %>% head()
```

```
> # A tibble: 6 x 2
>   Mathquiz newVar
>      <dbl>  <dbl>
> 1       43   21.5
> 2       49   24.5
> 3       26   13
> 4       29   14.5
> 5       31   15.5
> 6       20   10
```
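
---
# Step 3: Categorize values (sketch)

A hedged sketch, not from the original slides, of one way to group scores into categories with `dplyr::case_when()`. It assumes `dplyr` is installed; the `scores` data frame, the `quiz_group` variable, and the cutoff of 30 are made up for illustration.

```r
library(dplyr)

# Stand-in data with one missing score
scores <- data.frame(Mathquiz = c(43, 49, 26, NA, 13))

scores <- scores %>%
  dplyr::mutate(quiz_group = dplyr::case_when(
    Mathquiz >= 30 ~ "high",   # hypothetical cutoff
    Mathquiz <  30 ~ "low"))   # rows matching no condition stay NA

scores
```

A missing score matches neither condition, so its group is also `NA`.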

---
# Step 4: Get descriptives

.pull-left[

```r
library(furniture)

furniture::tableF(d_clean, MajorF)
```

```
> 
> -----------------------------------------
>  MajorF     Freq CumFreq Percent CumPerc
>  Psychology 29   29      29.00%  29.00% 
>  Premed     25   54      25.00%  54.00% 
>  Biology    21   75      21.00%  75.00% 
>  Sociology  15   90      15.00%  90.00% 
>  Economics  10   100     10.00%  100.00%
> -----------------------------------------
```
]

.pull-right[

```r
furniture::tableF(d_clean, Phobia)
```

```
> 
> -------------------------------------
>  Phobia Freq CumFreq Percent CumPerc
>  0      12   12      12.00%  12.00% 
>  1      15   27      15.00%  27.00% 
>  2      12   39      12.00%  39.00% 
>  3      16   55      16.00%  55.00% 
>  4      21   76      21.00%  76.00% 
>  5      11   87      11.00%  87.00% 
>  6      1    88      1.00%   88.00% 
>  7      4    92      4.00%   92.00% 
>  8      4    96      4.00%   96.00% 
>  9      1    97      1.00%   97.00% 
>  10     3    100     3.00%   100.00%
> -------------------------------------
```
]

---
# Step 4: Get descriptives

```r
d_clean %>% 
  dplyr::group_by(MajorF) %>% 
  furniture::table1(Mathquiz, Phobia, Coffee)
```

```
> Using dplyr::group_by() groups: MajorF
```

```
> 
> -------------------------------------------------------------------
>                                  MajorF 
>           Psychology Premed     Biology     Sociology   Economics 
>           n = 25     n = 21     n = 19      n = 12      n = 8     
>  Mathquiz                                                         
>           29.6 (8.9) 31.0 (8.2) 24.2 (10.4) 28.0 (10.4) 35.4 (6.3)
>  Phobia                                                           
>           3.6 (2.4)  3.0 (2.5)  3.7 (2.6)   3.2 (3.0)   1.8 (1.4) 
>  Coffee                                                           
>     0     17 (68%)   11 (52.4%) 10 (52.6%)  8 (66.7%)   4 (50%)   
>     1     8 (32%)    10 (47.6%) 9 (47.4%)   4 (33.3%)   4 (50%)   
> -------------------------------------------------------------------
```

---
class: inverse, center, middle

# Questions?

---
class: inverse, center, middle

# Next Topics

### More Data Manipulation
### Understanding Data via Figures