Part 1 - R Basics

October 6, 2018

First things first

Set up your machine to run R
- Install R (on a Mac and on a PC)
- Install RStudio (on a Mac and on a PC)

Some things you should know

Typical uses for R
- Data analysis
- Data visualization
- Reshaping data
- Automation
Mental models and programming
- Excel/SPSS/Stata -> R/Python/Julia/JavaScript
What all R programmers do
- Google for answers
- Borrow code
- Ask friends and pros for help
Useful references (see Reference Materials in the syllabus for more)
- help() opens the documentation for a function you know by name
- ?? searches the help pages by keyword
- R-bloggers
- Stack Overflow
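
The built-in help tools can be tried straight from the console. A minimal sketch, using mean() and sd() from base R as arbitrary examples (any function name works the same way):

```r
# Open the documentation page for a function you already know by name.
help(mean)
?sd          # shorthand for help(sd)

# When you only know a keyword, search the help pages instead.
# Run this one interactively:  ??"standard deviation"

# args() shows a function's arguments without opening the help page.
args(sd)
## function (x, na.rm = FALSE) 
## NULL
```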

Quick Tour of RStudio

Open an R Notebook and save it locally
Make sure you can install packages
- 2 ways to install packages
  - Command line/Console
  - RStudio GUI

install.packages('devtools')

Libraries, functions, data

Packages (often called libraries) are collections of functions (and datasets)
CRAN hosts over 10,000 packages

install.packages('tidyverse')
install.packages(c('ggthemes', 'rmarkdown'))
library(tidyverse)
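
Before installing, it can help to check which packages are already on the machine. The sketch below is one possible way to do that; the variable names (needed, already_installed, to_install) are made up for illustration, and the install step is commented out because it needs a network connection.

```r
# Packages installed in the slides above.
needed <- c("tidyverse", "ggthemes", "rmarkdown")

# installed.packages() returns a matrix with one row per installed package.
already_installed <- rownames(installed.packages())

# Keep only the names that are not installed yet.
to_install <- setdiff(needed, already_installed)
to_install

# Install whatever is missing (uncomment to run).
# if (length(to_install) > 0) install.packages(to_install)
```

install.packages() is needed once per machine; library() is needed in every new R session.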

Libraries, functions, data

Functions perform operations in R

foo <- c(1,2,4)
foo %>% min()
## [1] 1
foo %>% mean()
## [1] 2.333333
foo %>% max()
## [1] 4
foo %>% sd()
## [1] 1.527525
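
The %>% operator (from the magrittr package, loaded with tidyverse) passes the value on its left to the function on its right as the first argument. As a rough sketch, the piped calls above correspond to the plain calls below, which are written without the pipe so they run in base R with no packages loaded:

```r
foo <- c(1, 2, 4)

# foo %>% min() means min(foo)
min(foo)
## [1] 1

# foo %>% sd() means sd(foo)
sd(foo)
## [1] 1.527525

# A chain such as foo %>% sqrt() %>% sum() nests from the inside out.
sum(sqrt(foo))
## [1] 4.414214
```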

Libraries, functions, data

Function arguments
- Let’s look at arguments in the lm and min functions
- Arguments are inputs
- Arguments are comma separated

help(lm)
help(min)

cat_function <- function(love = TRUE) {
    # Print a message that depends on the logical argument love
    if (love) {
        print('I love cats!')
    } else {
        print('I am not a cool person.')
    }
}
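
To see how arguments and defaults behave, the function above can be called in a few different ways. This is a small sketch that repeats the definition so it runs on its own:

```r
cat_function <- function(love = TRUE) {
  if (love) {
    print('I love cats!')
  } else {
    print('I am not a cool person.')
  }
}

cat_function()              # no argument given, so love uses its default, TRUE
## [1] "I love cats!"
cat_function(love = FALSE)  # argument matched by name
## [1] "I am not a cool person."
cat_function(FALSE)         # argument matched by position
## [1] "I am not a cool person."

# formals() lists a function's arguments and their default values.
names(formals(cat_function))
## [1] "love"
```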

Libraries, functions, data

Function arguments
- Let’s look at arguments in the lm and min functions
- Arguments are inputs
- Arguments are comma separated

foo <- c(1,2,NA, 4)
foo %>% min()
## [1] NA
foo %>% mean()
## [1] NA
foo %>% max()
## [1] NA
foo %>% sd()
## [1] NA

Libraries, functions, data

Function arguments
- By default, min(), mean(), max() and sd() return NA when any value is missing
- Set the na.rm argument to TRUE to drop missing values before computing

foo <- c(1,2,NA, 4)
min(foo, na.rm = TRUE)
## [1] 1
mean(foo, na.rm = TRUE)
## [1] 2.333333
max(foo, na.rm = TRUE)
## [1] 4
sd(foo, na.rm = TRUE)
## [1] 1.527525
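
It is often worth checking how many values are missing before removing them. A minimal sketch using is.na() from base R:

```r
foo <- c(1, 2, NA, 4)

is.na(foo)         # TRUE marks the missing value
## [1] FALSE FALSE  TRUE FALSE
sum(is.na(foo))    # how many values are missing
## [1] 1

# na.rm = TRUE gives the same result as dropping the NA values by hand.
mean(foo[!is.na(foo)])
## [1] 2.333333
```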

Libraries, functions, data

Oh my, so many types of data!
- Data frames, matrices, vectors
- Integers, numbers, characters, factors
Confirm data type

c('foo', 'moo', 'boo') %>% class()
## [1] "character"
c('foo', 'moo', 'boo') %>% is.character()
## [1] TRUE
c('foo', 'moo', 'boo') %>% is.factor()
## [1] FALSE

Coerce data type

c('foo', 'moo', 'boo') %>% as.factor()
## [1] foo moo boo
## Levels: boo foo moo
c('foo', 'moo', 'boo') %>% as.factor() %>% class()
## [1] "factor"
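
Coercion works between other types too, with one common pitfall for factors. The sketch below uses made-up values purely for illustration:

```r
# Character strings that look like numbers can be coerced to numeric.
as.numeric(c('1', '2', '4'))
## [1] 1 2 4

# Strings that are not numbers become NA (R also raises a warning).
suppressWarnings(as.numeric('foo'))
## [1] NA

# Pitfall: as.numeric() on a factor returns the internal level codes.
f <- as.factor(c('10', '20', '10'))
as.numeric(f)
## [1] 1 2 1

# Convert to character first to recover the original numbers.
as.numeric(as.character(f))
## [1] 10 20 10
```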

Libraries, functions, data

Oh my, so many types of data!
- Data frames, matrices, vectors
Data frames look a little like spreadsheets

tibble(x = 1:3, y = 4:6, z = c('foo', 'boo', 'moo'))
## # A tibble: 3 x 3
##       x     y z    
##   <int> <int> <chr>
## 1     1     4 foo  
## 2     2     5 boo  
## 3     3     6 moo
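
A few base R functions describe the shape and contents of a data frame. In this sketch a base data.frame() stands in for the tibble above so that no packages are needed; the name df is arbitrary.

```r
# Same three columns as the example above, built with base R.
df <- data.frame(x = 1:3, y = 4:6, z = c('foo', 'boo', 'moo'),
                 stringsAsFactors = FALSE)

dim(df)            # number of rows, then number of columns
## [1] 3 3
names(df)          # column names
## [1] "x" "y" "z"
sapply(df, class)  # data type of each column
##           x           y           z 
##   "integer"   "integer" "character" 
```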

Libraries, functions, data

Select a specific variable by name or by position in the data frame
- Put $ between the data frame name and the variable name
- Index by number with [row, column]; leave the row blank to select a whole column

cars$speed %>% head(7)
## [1]  4  4  7  7  8  9 10

cars[,1] %>% head(7)
## [1]  4  4  7  7  8  9 10
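
The same bracket notation also selects rows, single cells, and rows that meet a condition. A short sketch on the built-in cars data:

```r
cars[['speed']][1:3]    # same column as cars$speed, selected by name
## [1] 4 4 7
cars[1:3, 'dist']       # rows 1 to 3 of the dist column
## [1]  2 10  4
cars[1, ]               # the whole first row (column index left empty)
##   speed dist
## 1     4    2
nrow(cars[cars$speed > 20, ])   # number of rows where speed is above 20
## [1] 7
```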

Libraries, functions, data

Oh my, so many types of data!
- Data frames, matrices, vectors
Matrices and data frames have similar shapes
A matrix holds data of a single type; a data frame can hold a different type in each column
When choosing between a matrix and a data frame in this class, use a data frame

matrix(data = 1:6, nrow = 3, ncol = 2)
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
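
Matrix elements are selected with [row, column], and mixing types shows the single-type rule in action. A small sketch; the names m and mixed are arbitrary:

```r
m <- matrix(data = 1:6, nrow = 3, ncol = 2)

dim(m)     # 3 rows, 2 columns
## [1] 3 2
m[2, 2]    # row 2, column 2
## [1] 5
m[, 1]     # the whole first column
## [1] 1 2 3

# Mixing numbers and text coerces every value in the matrix to character.
mixed <- matrix(c(1, 'foo', 3, 'boo'), nrow = 2)
class(mixed[1, 1])
## [1] "character"
```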

Libraries, functions, data

Oh my, so many types of data!
- Data frames, matrices, vectors
Vectors contain 1 or more values of the same type, in order

c('foo', 'moo', 'boo')
## [1] "foo" "moo" "boo"
1:10
##  [1]  1  2  3  4  5  6  7  8  9 10
rep(1:2, times = 2)
## [1] 1 2 1 2
rep(c(1,2), times = 2)
## [1] 1 2 1 2
seq(from = 0, to = 100, by = 10)
##  [1]   0  10  20  30  40  50  60  70  80  90 100
seq(0, 100, 10)
##  [1]   0  10  20  30  40  50  60  70  80  90 100
cars$speed
##  [1]  4  4  7  7  8  9 10 10 10 11 11 12 12 12 12 13 13 13 13 14 14 14 14
## [24] 15 15 15 16 16 17 17 17 18 18 18 18 19 19 19 20 20 20 20 20 22 23 24
## [47] 24 24 24 25
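
A few more vector behaviors are worth trying at the console. This is an illustrative sketch using only base R:

```r
# Every element of a vector has the same type; mixed input is coerced.
c(1, 'foo', TRUE)
## [1] "1"    "foo"  "TRUE"

length(1:10)          # number of elements
## [1] 10
1:3 * 2               # arithmetic is applied element by element
## [1] 2 4 6
rep(1:2, each = 2)    # repeat each value, rather than the whole vector
## [1] 1 1 2 2
```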

Libraries, functions, data

Oh my, so many types of data!
- Data frames, matrices, vectors
Select a specific element by its position in a vector (positions start at 1)

c('foo', 'moo', 'boo')[2]
## [1] "moo"
seq(from = 0, to = 100, by = 10)[6]
## [1] 50
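
Square brackets accept more than a single position. A brief sketch of the common variations, with x as an arbitrary name for the sequence used above:

```r
x <- seq(from = 0, to = 100, by = 10)

x[1]          # R counts positions from 1, not 0
## [1] 0
x[c(2, 6)]    # several positions at once
## [1] 10 50
x[-1]         # a negative index drops that position
##  [1]  10  20  30  40  50  60  70  80  90 100
x[x > 70]     # a logical test keeps the elements where it is TRUE
## [1]  80  90 100
```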

Let’s apply what we learned to real data

Exercise - 5 minutes
- What type of data object is candi?
- What are the min() and mean() for amount?
- What is the max() for election_year?

To answer the questions, import the political contributions dataset

candi <- read_csv('https://goo.gl/GTRqZs') %>% as.data.frame()

Let’s apply what we learned to real data

Exercise - 5 minutes
- What type of data object is candi?
- What are the min() and mean() for amount?
- What is the max() for election_year?

candi %>% class()
## [1] "data.frame"
candi$amount %>% min(na.rm = TRUE)
## [1] -225000
candi$amount %>% mean(na.rm = TRUE)
## [1] 358.1753
candi$election_year %>% max(na.rm = TRUE)
## [1] 2023
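
The same steps can be practiced offline on any data frame. As a sketch, here they are applied to the built-in cars data instead of candi, written as plain calls so no packages or download are needed:

```r
class(cars)                      # what type of object is it?
## [1] "data.frame"
min(cars$speed, na.rm = TRUE)    # smallest value in a column
## [1] 4
mean(cars$speed, na.rm = TRUE)   # average of a column
## [1] 15.4
max(cars$speed, na.rm = TRUE)    # largest value in a column
## [1] 25

# summary() reports several of these statistics in one call.
summary(cars$speed)
```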