October 5, 2019
install.packages('devtools')
install.packages('tidyverse')
install.packages(c('ggthemes', 'officer'))
library(tidyverse)
foo <- c(1,2,4)
foo %>% min()
## [1] 1
foo %>% mean()
## [1] 2.333333
foo %>% max()
## [1] 4
foo %>% sd()
## [1] 1.527525
Help pages for the lm and min functions
help(lm)
help(min)
cat_function <- function(love = TRUE) {
  if (love == TRUE) {
    print('I love cats!')
  } else {
    print('I am not a cool person.')
  }
}
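A short usage sketch for cat_function(): calling it with no arguments falls back on the default love = TRUE, and passing love = FALSE overrides it. The definition is repeated here only so the block runs on its own.

```r
cat_function <- function(love = TRUE) {
  if (love == TRUE) {
    print('I love cats!')
  } else {
    print('I am not a cool person.')
  }
}

cat_function()              # uses the default, love = TRUE
## [1] "I love cats!"
cat_function(love = FALSE)  # overrides the default
## [1] "I am not a cool person."
```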
foo <- c(1,2,NA, 4)
foo %>% min()
## [1] NA
foo %>% mean()
## [1] NA
foo %>% max()
## [1] NA
foo %>% sd()
## [1] NA
min(foo, na.rm = TRUE)
## [1] 1
mean(foo, na.rm = TRUE)
## [1] 2.333333
max(foo, na.rm = TRUE)
## [1] 4
sd(foo, na.rm = TRUE)
## [1] 1.527525
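Before reaching for na.rm = TRUE, it can help to check how many values are actually missing. A minimal sketch in base R, reusing the foo vector from above:

```r
foo <- c(1, 2, NA, 4)

is.na(foo)               # TRUE where a value is missing
## [1] FALSE FALSE  TRUE FALSE
sum(is.na(foo))          # count of missing values
## [1] 1
mean(foo[!is.na(foo)])   # same result as mean(foo, na.rm = TRUE)
## [1] 2.333333
```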
c('foo', 'moo', 'boo') %>% class()
## [1] "character"
c('foo', 'moo', 'boo') %>% is.character()
## [1] TRUE
c('foo', 'moo', 'boo') %>% is.factor()
## [1] FALSE
c('foo', 'moo', 'boo') %>% as.factor()
## [1] foo moo boo
## Levels: boo foo moo
c('foo', 'moo', 'boo') %>% as.factor() %>% class()
## [1] "factor"
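To see what the conversion to a factor changes, the sketch below inspects the levels and the integer codes behind them. The name bar is a hypothetical variable introduced only for this illustration.

```r
bar <- as.factor(c('foo', 'moo', 'boo'))  # bar is a hypothetical name

levels(bar)         # levels are sorted alphabetically by default
## [1] "boo" "foo" "moo"
as.integer(bar)     # each value is stored as the position of its level
## [1] 2 3 1
as.character(bar)   # convert back to a character vector
## [1] "foo" "moo" "boo"
```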
tibble(x = 1:3, y = 4:6, z = c('foo', 'boo', 'moo'))
## # A tibble: 3 x 3
## x y z
## <int> <int> <chr>
## 1 1 4 foo
## 2 2 5 boo
## 3 3 6 moo
Use $ between the data frame name and the variable name
pull() is the function form of $
cars$speed %>% head(7)
## [1] 4 4 7 7 8 9 10
cars %>% pull(speed) %>% head(7)
## [1] 4 4 7 7 8 9 10
cars[,1] %>% head(7)
## [1] 4 4 7 7 8 9 10
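The ways of extracting a column can be checked against each other with identical(). This sketch uses base R only and the built-in cars data frame. The names by_dollar, by_name and by_position are hypothetical. Note that cars is a plain data frame; on a tibble, `[, 1]` returns a one-column tibble rather than a vector.

```r
by_dollar   <- cars$speed       # $ with the variable name
by_name     <- cars[['speed']]  # double brackets with the name as a string
by_position <- cars[, 1]        # row/column brackets with the column position

identical(by_dollar, by_name)
## [1] TRUE
identical(by_dollar, by_position)
## [1] TRUE
```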
matrix(data = 1:6, nrow = 3, ncol = 2)
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
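A matrix is indexed with [row, column]. The sketch below reuses the matrix above, stored under the hypothetical name m, and shows that values fill down the columns first.

```r
m <- matrix(data = 1:6, nrow = 3, ncol = 2)  # m is a hypothetical name

m[2, 1]   # row 2, column 1
## [1] 2
m[, 2]    # all of column 2
## [1] 4 5 6
dim(m)    # rows, then columns
## [1] 3 2
```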
c('foo', 'moo', 'boo')
## [1] "foo" "moo" "boo"
1:10
## [1] 1 2 3 4 5 6 7 8 9 10
rep(1:2, times = 2)
## [1] 1 2 1 2
rep(c(1,2), times = 2)
## [1] 1 2 1 2
seq(from = 0, to = 100, by = 10)
## [1] 0 10 20 30 40 50 60 70 80 90 100
seq(0, 100, 10)
## [1] 0 10 20 30 40 50 60 70 80 90 100
cars$speed
## [1] 4 4 7 7 8 9 10 10 10 11 11 12 12 12 12 13 13 13 13 14 14 14 14
## [24] 15 15 15 16 16 17 17 17 18 18 18 18 19 19 19 20 20 20 20 20 22 23 24
## [47] 24 24 24 25
c('foo', 'moo', 'boo')[2]
## [1] "moo"
seq(from = 0, to = 100, by = 10)[6]
## [1] 50
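Square brackets accept more than a single position. As a small sketch, the vector from above is stored under the hypothetical name tens and indexed by several positions, by a negative position, and by a condition.

```r
tens <- seq(from = 0, to = 100, by = 10)  # tens is a hypothetical name

tens[c(1, 6)]     # several positions at once
## [1]  0 50
tens[-1]          # everything except the first element
## [1]  10  20  30  40  50  60  70  80  90 100
tens[tens > 70]   # elements that meet a condition
## [1]  80  90 100
```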
Exercise - 5 minutes
- What type of data object is contr?
- What are the min() and mean() for amount?
- What is the max() for election_year?
To answer the questions, import the political contributions dataset
contr <- read_csv('https://bit.ly/2lQySrQ') %>% as.data.frame()
contr %>% class()
## [1] "data.frame"
contr$amount %>% min(na.rm = TRUE)
## [1] 0
contr$amount %>% mean(na.rm = TRUE)
## [1] 241.1717
contr$election_year %>% max(na.rm = TRUE)
## [1] 2023
head() shows the top rows of a data frame
- n is the head() argument that sets the number of rows; it defaults to 6
tail() shows the bottom rows of a data frame
summary() shows summary statistics for all variables in a dataset
ls() shows all variables in a data frame
- Use ls() with nothing between the parentheses to see all objects in your workspace
str() shows the variable type and selected values for every variable in a data frame
dim() tells you the dimensions of your dataset
- nrow() reports the number of rows only
- ncol() reports the number of columns only
Exercise - 5 minutes
- How many observations (or rows) are in contr?
- What is the median() value for amount?
- How many variables are in contr?
Hint: There are multiple ways to answer these questions with the functions you know
contr %>% dim()
## [1] 45000 22
contr$amount %>% median(na.rm = TRUE)
## [1] 100
contr %>% ncol()
## [1] 22
Other methods to answer questions
contr %>% summary()
contr %>% ls()
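The same inspection functions can be tried without downloading anything. This sketch runs them on the built-in cars data frame, which is an assumption made for illustration and not the exercise dataset.

```r
dim(cars)           # rows, then columns
## [1] 50  2
nrow(cars)          # rows only
## [1] 50
ncol(cars)          # columns only
## [1] 2
head(cars, n = 3)   # first 3 rows instead of the default 6
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
ls(cars)            # variable names, sorted alphabetically
## [1] "dist"  "speed"
```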
table() shows the distribution of values in a vector
length() tells you the number of elements in a vector
unique() shows the unique values in a vector
summary() shows descriptive statistics for a vector
- summary() can run on a vector or a data frame
Exercise - 7 minutes
- How many ‘DEMOCRAT’ values are there in party?
- Which value in contributor_state is most frequent?
- Is ‘Mayoral Race’ a value in the type variable?
- How many distinct contributor_zip values are there?
Hint: There are multiple ways to answer these questions with the functions you know
table(contr$party)[1]
## DEMOCRAT
##    14748
table(contr$contributor_state) %>% tail()
##
##    VA    VT    WA    WI    WV    WY
##   119     8 38394    19    27     3
contr$type %>% unique()
## [1] NA          "Candidate"
contr$contributor_zip %>% unique() %>% length()
## [1] 1845
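When a vector has many distinct values, scanning a table by eye is error-prone. As a hedged sketch in base R, sorting the table or using which.max() surfaces the most frequent value directly, and %in% checks whether a value is present. The states vector is made up for illustration.

```r
states <- c('WA', 'OR', 'WA', 'CA', 'WA', 'OR')  # hypothetical values

counts <- table(states)
sort(counts, decreasing = TRUE)   # most frequent value first
names(which.max(counts))          # label of the most frequent value
## [1] "WA"
'WA' %in% states                  # is a given value present?
## [1] TRUE
length(unique(states))            # number of distinct values
## [1] 3
```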
Make sure…
min() on a character string
<-
%>%
Exercise - 10 minutes
- How many rows and columns are there in crime?
- What is the most recent occurred_date?
- Which neighborhood sees the most incident activity?
- How many ‘THEFT-BICYCLE’ incidents are there in crime_subcategory?
- What are the earliest and latest reported_time values?
Begin the exercise by importing the crime dataset in RStudio
crime <- read_csv('https://bit.ly/2mcZLq4') %>% as.data.frame()
crime %>% dim()
## [1] 101141 13
crime$occurred_date %>% max(na.rm = TRUE)
## [1] "2019-03-20"
crime$neighborhood %>% table() %>% sort() %>% tail()
## .
##          UNIVERSITY         SLU/CASCADE          QUEEN ANNE
##                3524                4276                4671
##        CAPITOL HILL           NORTHGATE DOWNTOWN COMMERCIAL
##                5091                5186                8813
crime$crime_subcategory %>% table() %>% tail(5)
## .
##  THEFT-BICYCLE THEFT-BUILDING THEFT-SHOPLIFT       TRESPASS         WEAPON
##           1411           3855           9225           2505            917
crime$reported_time %>% min(na.rm = TRUE)
## [1] 0
crime$reported_time %>% max(na.rm = TRUE)
## [1] 2359
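The earliest and latest values can also be fetched in one call with range(). A minimal sketch on a made-up vector in the style of reported_time; the times values are hypothetical and not taken from the crime dataset.

```r
times <- c(1430, 0, 2359, 815, NA)  # hypothetical values

range(times, na.rm = TRUE)   # minimum and maximum together
## [1]    0 2359
```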