Part 1 - R Basics

October 8, 2020

First things first

Set up your machine to run R
- Install R (on a Mac and on a PC)
- Install RStudio (on a Mac and on a PC)

Some things you should know

Typical uses for R
- Data analysis
- Data visualization
- Reshaping data
- Automation
Mental models and programming
- Moving from Excel/SPSS/Stata to computational programming tools like R
What all R programmers do
- Google for answers
- Borrow code
- Ask friends and pros for help
Useful references (see Reference Materials in the syllabus for more)
- help()
- ??
- R-bloggers
- Stack Overflow

Quick Tour of RStudio

Open and save (locally) an R Notebook
Run code from the notebook and the console
Make sure you can install packages
- 2 ways to install packages
  - Command line/Console
  - RStudio GUI

install.packages('devtools')

Libraries, functions, data

Libraries (or packages) are collections of functions (and datasets)
Over 10,000 libraries on CRAN

install.packages('tidyverse')
install.packages(c('ggthemes', 'officer'))
library(tidyverse)
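
Reinstalling a package every time a notebook runs is slow. One common pattern, sketched here under the assumption that you want to install only what is missing, uses base R's requireNamespace(). The helper name install_if_missing is hypothetical and not part of any package.

```r
# Hypothetical helper: install a package only if it is not already available.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report (invisibly) whether the package can now be loaded.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# 'stats' ships with R, so nothing is downloaded here.
install_if_missing('stats')
```

Remember that install.packages() is needed once per machine, while library() is needed once per R session.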

Libraries, functions, data

Functions perform operations in R

foo <- c(1,2,4)
foo %>% min()
## [1] 1
foo %>% mean()
## [1] 2.333333
foo %>% max()
## [1] 4
foo %>% sd()
## [1] 1.527525
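
The %>% pipe used above passes the value on its left as the first argument of the function on its right. A minimal sketch of that equivalence, loading magrittr directly (the same pipe is also available after library(tidyverse)):

```r
library(magrittr)  # provides %>%

foo <- c(1, 2, 4)

# These two calls do the same thing.
piped  <- foo %>% mean()
nested <- mean(foo)
identical(piped, nested)

# Pipes read left to right, which helps when several steps are chained.
chained <- foo %>% sqrt() %>% sum() %>% round(2)
chained
```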

Libraries, functions, data

Function arguments
- Let’s look at arguments in the glm and min functions
- Arguments are inputs
- Arguments are comma separated

help(glm)
help(min)

# A function with one argument, love, which defaults to TRUE
cat_function <- function(love = TRUE) {
    if (love) {
        print('I love cats!')
    } else {
        print('I am not a cool person.')
    }
}
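
Arguments can be left at their defaults, passed by name, or passed by position. A small sketch, using a hypothetical variant called cat_message that returns its text instead of printing it:

```r
# Hypothetical variant of cat_function that returns the message.
cat_message <- function(love = TRUE) {
  if (love) {
    'I love cats!'
  } else {
    'I am not a cool person.'
  }
}

cat_message()              # no argument given, so the default love = TRUE is used
cat_message(love = FALSE)  # argument passed by name
cat_message(FALSE)         # argument passed by position

# formals() lists a function's arguments and their defaults.
names(formals(cat_message))
```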

Libraries, functions, data

Function arguments
- Let’s look at arguments in the glm and min functions
- Arguments are inputs
- Arguments are comma separated

foo <- c(1,2,NA, 4)
foo %>% min()
## [1] NA
foo %>% mean()
## [1] NA
foo %>% max()
## [1] NA
foo %>% sd()
## [1] NA

Libraries, functions, data

Function arguments
- Let’s look at arguments in the glm and min functions
- Arguments are inputs
- Arguments are comma separated

foo <- c(1,2,NA, 4)
foo %>% min(na.rm= TRUE)
## [1] 1
foo %>% mean(na.rm= TRUE)
## [1] 2.333333
foo %>% max(na.rm= TRUE)
## [1] 4
foo %>% sd(na.rm= TRUE)
## [1] 1.527525
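
Before dropping missing values it often helps to count them. A short sketch using base R's is.na(), which is one reasonable way to check and not the only one:

```r
foo <- c(1, 2, NA, 4)

# is.na() flags each element as missing or not.
is.na(foo)

# TRUE counts as 1, so the sum is the number of missing values.
n_missing <- sum(is.na(foo))
n_missing

# Two equivalent ways to ignore the missing value.
mean(foo, na.rm = TRUE)
mean(foo[!is.na(foo)])
```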

Libraries, functions, data

Oh my, so many types of data!
- Vectors, tibbles, data frames, matrices
- Integers, numerics (doubles), characters, factors
Confirm data type

c('Washington', 'Oregon', 'Idaho') %>% class()
## [1] "character"
c('Washington', 'Oregon', 'Idaho') %>% is.character()
## [1] TRUE
c('Washington', 'Oregon', 'Idaho') %>% is.factor()
## [1] FALSE

Coerce data type

c('Washington', 'Oregon', 'Idaho') %>% as.factor()
## [1] Washington Oregon     Idaho     
## Levels: Idaho Oregon Washington
c('Washington', 'Oregon', 'Idaho') %>% as.factor() %>% class()
## [1] "factor"
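
Coercion can surprise you. The sketch below shows two cases worth knowing: a factor stores integer codes for its levels, and text that is not a number becomes NA. The variable names are illustrative only.

```r
states <- c('Washington', 'Oregon', 'Idaho')

# A factor stores integer codes that point at its sorted levels.
states_factor <- as.factor(states)
levels(states_factor)
codes <- as.integer(states_factor)
codes

# Coercing back to character recovers the original labels.
as.character(states_factor)

# Text that is not a number becomes NA (R also raises a warning).
mixed <- suppressWarnings(as.numeric(c('1', '2', 'three')))
mixed
```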

Libraries, functions, data

Oh my, so many types of data!
- Vectors, tibbles, data frames, matrices
Vectors contain 1 or more values of the same type in a fixed order
Call a specific element by its location in a vector

c('Washington', 'Oregon', 'Idaho')
## [1] "Washington" "Oregon"     "Idaho"
1:10
##  [1]  1  2  3  4  5  6  7  8  9 10
rep(1:2, times = 2)
## [1] 1 2 1 2
seq(from = 0, to = 100, by = 10)
##  [1]   0  10  20  30  40  50  60  70  80  90 100
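
rep() and seq() take other arguments too. A brief sketch of two that come up often, each = and length.out =:

```r
# each = repeats every element before moving on to the next one.
doubled <- rep(1:2, each = 2)
doubled

# length.out = asks for a number of evenly spaced values instead of a step size.
quarters <- seq(from = 0, to = 1, length.out = 5)
quarters

# length() counts the elements in a vector.
n_tens <- length(seq(from = 0, to = 100, by = 10))
n_tens
```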

Libraries, functions, data

Oh my, so many types of data!
- Vectors, tibbles, data frames, matrices
Vectors contain 1 or more values of the same type in a fixed order
Call a specific element by its location in a vector

c('Washington', 'Oregon', 'Idaho')[2]
## [1] "Oregon"
seq(from = 0, to = 100, by = 10)[6]
## [1] 50
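
Square brackets accept more than a single position. R counts from 1, and the sketch below shows a few other common ways to index (the variable names states and tens are illustrative):

```r
states <- c('Washington', 'Oregon', 'Idaho')
tens <- seq(from = 0, to = 100, by = 10)

# Several positions at once
first_and_third <- states[c(1, 3)]

# A negative index drops that position
without_first <- states[-1]

# A logical test keeps only the elements where it is TRUE
big_tens <- tens[tens > 70]

# Asking for a position past the end returns NA
past_end <- states[4]
```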

Libraries, functions, data

Oh my, so many types of data!
- Vectors, tibbles, data frames, matrices
Tibbles look a little like spreadsheets

tibble(x = c(1:3), y = c(4:6), z = c('Washington', 'Oregon', 'Idaho'))
## # A tibble: 3 x 3
##       x     y z         
##   <int> <int> <chr>     
## 1     1     4 Washington
## 2     2     5 Oregon    
## 3     3     6 Idaho

Libraries, functions, data

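Call a specific variable by name or location in a data frame or tibble
- Variables (columns) are vectors
- Use $ between the table name and the variable name
- pull() is the pipe-friendly function equivalent of $
- Columns (or rows) can also be called by index number
- cars is a built-in data frame, so cars[,1] returns a vector; on a tibble the same call returns a one-column tibble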

cars$speed %>% head(7)
## [1]  4  4  7  7  8  9 10

cars %>% pull(speed) %>% head(7)
## [1]  4  4  7  7  8  9 10

cars[,1] %>% head(7)
## [1]  4  4  7  7  8  9 10
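
Index-based calls behave differently on data frames and tibbles. A minimal sketch, assuming the tibble package is installed, that converts the built-in cars data frame to a tibble (cars_tbl is an illustrative name):

```r
library(tibble)

cars_tbl <- as_tibble(cars)

# On a data frame, [, 1] simplifies to a plain vector.
df_column <- cars[, 1]
class(df_column)

# On a tibble, [, 1] stays a tibble with one column.
tbl_column <- cars_tbl[, 1]
class(tbl_column)

# [[ ]] and $ return the vector in both cases.
identical(cars_tbl[[1]], cars$speed)
```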

Libraries, functions, data

Oh my, so many types of data!
- Vectors, tibbles, data frames, matrices
Data frames are base R's counterpart to tibbles: they print without the dimension and column-type header and without truncating long output

data.frame(x = c(1:3), y = c(4:6), z = c('Washington', 'Oregon', 'Idaho'))
##   x y          z
## 1 1 4 Washington
## 2 2 5     Oregon
## 3 3 6      Idaho
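
One difference to watch for: before R 4.0.0, data.frame() converted character columns to factors by default, while tibble() never does. The sketch below sets stringsAsFactors explicitly so the result does not depend on the R version, and assumes the tibble package is installed:

```r
library(tibble)

df <- data.frame(x = 1:3, y = 4:6,
                 z = c('Washington', 'Oregon', 'Idaho'),
                 stringsAsFactors = FALSE)
class(df$z)

# Converting between the two is a single function call in each direction.
tbl <- as_tibble(df)
class(tbl)

back <- as.data.frame(tbl)
class(back)
```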

Libraries, functions, data

Oh my, so many types of data!
- Vectors, tibbles, data frames, matrices
Matrices and tibbles have similar shapes
Matrices hold data of a single type; each tibble column can have its own type
For the work in this class, tibbles are the way to go

matrix(data = 1:6, nrow = 3, ncol = 2)
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
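
A short sketch of how matrices are indexed and what the single-type rule means in practice (the names m and mixed are illustrative):

```r
m <- matrix(data = 1:6, nrow = 3, ncol = 2)

# dim() returns the number of rows, then the number of columns.
dim(m)

# Index with [row, column]; leave one side empty to take a whole row or column.
m[2, 2]
m[, 1]

# Mixing numbers and text forces every cell to become character.
mixed <- cbind(1:3, c('a', 'b', 'c'))
typeof(mixed)
```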

Let’s apply what we learned to real data

You are an analyst at a large global health organization. Organizational leadership needs to report publicly on the COVID-19 data that you’ve collected. Over the next month, you will be asked to prepare findings that you observe in the data.