flights %>%
  mutate(arrival_delay = arr_delay / 60) %>%
  filter(arrival_delay >= 2) %>%
  select(flight, carrier, origin, dest, arrival_delay) %>%
  arrange(desc(arrival_delay))
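# Use %in% for set membership. carrier == c('UA', 'AA', 'DL') recycles the
# vector row by row, drops matching rows, and warns:
# "longer object length is not a multiple of shorter object length"
flights %>%
  filter(carrier %in% c('UA', 'AA', 'DL')) %>%
  group_by(carrier) %>%
  summarize(cnt = n())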
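The warning "longer object length is not a multiple of shorter object length" is a sign that `==` recycled the right-hand vector instead of testing membership. The following is a minimal base R sketch of the difference; the `carrier` vector here is made-up toy data standing in for `flights$carrier`, and `wanted` is an illustrative name.

```r
# Toy carrier codes standing in for flights$carrier (illustrative data)
carrier <- c("UA", "DL", "AA", "UA", "AA", "DL", "B6")
wanted  <- c("UA", "AA", "DL")

# `==` compares position by position against UA, AA, DL, UA, AA, DL, UA
# (and warns, because 7 is not a multiple of 3)
by_equality <- suppressWarnings(carrier == wanted)

# `%in%` asks, for each element, whether it appears anywhere in `wanted`
by_membership <- carrier %in% wanted

sum(by_equality)    # 4 rows kept by the recycled comparison
sum(by_membership)  # 6 rows kept by the membership test
```

Inside `filter()`, the recycled comparison would keep only the rows that happen to line up with the repeating pattern, so the counts per carrier come out too low.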
# %in% tests membership; month == c(7, 8, 9) would recycle the vector
# and keep only part of the July, August, and September rows
flights %>%
  filter(month %in% c(7, 8, 9)) %>%
  mutate(month_name = ifelse(month == 7, 'JULY',
                      ifelse(month == 8, 'AUGUST',
                      ifelse(month == 9, 'SEPTEMBER', NA)))) %>%
  group_by(month_name) %>%
  summarise(cnt = n())
flights %>%
  mutate(arrival_delay_hour = arr_delay / 60) %>%
  filter(dep_delay <= 0 & arrival_delay_hour >= 2) %>%
  select(flight, carrier, dest, origin, dep_delay, arrival_delay_hour) %>%
  arrange(desc(arrival_delay_hour))
flights %>%
  filter(hour >= 0 & hour <= 6) %>%
  group_by(hour) %>%
  summarize(cnt = n())
Sort flights to find the most delayed flights.
Find the flights that left earliest.
Sort flights to find the fastest flights.
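One possible approach to the three sorting exercises above is sketched here with `arrange()`. The `toy` tibble is made-up data that only mimics a few `flights` columns, and two readings are assumptions: "left earliest" is taken to mean the most negative `dep_delay`, and "fastest" is taken to mean the highest average speed (`distance / air_time`).

```r
library(dplyr)

# Tiny stand-in for nycflights13::flights (values are made up)
toy <- tibble(
  flight    = c(1, 2, 3, 4),
  dep_delay = c(-5, 120, 15, NA),
  air_time  = c(60, 300, 120, 90),
  distance  = c(400, 2400, 600, 810)
)

# Largest departure delay first; rows with NA always sort last
most_delayed <- arrange(toy, desc(dep_delay))

# Most negative delay first, i.e. furthest ahead of schedule
left_earliest <- arrange(toy, dep_delay)

# Highest average speed first
fastest <- arrange(toy, desc(distance / air_time))
```

The same calls should work on `flights` itself by replacing `toy`; sorting by `air_time` alone is an alternative reading of "fastest".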
Brainstorm as many ways as possible to select dep_time, dep_delay, arr_time, and arr_delay from flights.
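A few of the possible answers are sketched below. The `toy` tibble is a one-row, made-up stand-in with the relevant `flights` column names, and the sketch assumes a dplyr version that provides the tidyselect helper `all_of()`.

```r
library(dplyr)

# One-row stand-in for flights with the relevant column names (made-up values)
toy <- tibble(
  year = 2013, dep_time = 517, sched_dep_time = 515, dep_delay = 2,
  arr_time = 830, sched_arr_time = 819, arr_delay = 11, carrier = "UA"
)

# 1. Bare column names
by_name <- select(toy, dep_time, dep_delay, arr_time, arr_delay)

# 2. A character vector of names
by_string <- select(toy, all_of(c("dep_time", "dep_delay", "arr_time", "arr_delay")))

# 3. A regular expression anchored at both ends, so sched_* columns are excluded
by_regex <- select(toy, matches("^(dep|arr)_(time|delay)$"))

# 4. Prefix helpers; the sched_* columns do not start with dep_ or arr_
by_prefix <- select(toy, starts_with("dep_"), starts_with("arr_"))
```

All four return the same four columns in the same order for this layout.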
What happens if you include the name of a variable multiple times in a select() call?
R/ Nothing special happens: select() ignores the repeats and includes the variable only once, at the position of its first mention.
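A quick check of this behaviour, using a made-up one-row tibble (`toy`) in place of `flights`:

```r
library(dplyr)

toy <- tibble(dep_time = 517, dep_delay = 2, arr_time = 830)

# dep_delay is named twice but appears once, where it was first mentioned
dup <- select(toy, dep_delay, dep_time, dep_delay)
names(dup)
# "dep_delay" "dep_time"

# This is what makes everything() useful for moving columns to the front:
# arr_time is matched again by everything(), yet it is not duplicated
front <- select(toy, arr_time, everything())
names(front)
# "arr_time" "dep_time" "dep_delay"
```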
Currently dep_time and sched_dep_time are convenient to look at, but hard to compute with because they’re not really continuous numbers. Convert them to a more convenient representation of number of minutes since midnight.
# dep_time is stored as HHMM (e.g. 517 = 5:17), so split it with %/% and %%.
# The final %% 1440 maps a 2400 reading (midnight) to 0 minutes.
flights %>%
  transmute(dep_time,
            dep_time_hour = dep_time %/% 100,
            dep_time_minute = dep_time %% 100,
            dep_time_minutes_smn = (dep_time_hour * 60 + dep_time_minute) %% 1440,
            sched_dep_time,
            sched_dep_time_hour = sched_dep_time %/% 100,
            sched_dep_time_minute = sched_dep_time %% 100,
            sched_dep_time_minutes_smn = (sched_dep_time_hour * 60 + sched_dep_time_minute) %% 1440)
Which plane (tailnum) has the worst on-time record?
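One way to approach this question is sketched below, under stated assumptions: "worst on-time record" is read as the highest mean arrival delay, planes with very few flights are excluded by a hypothetical threshold (`n >= 2` here, which would need to be larger on the real data), and `toy` is made-up data standing in for `flights`.

```r
library(dplyr)

# Made-up stand-in for flights: one row per flight
toy <- tibble(
  tailnum   = c("N1", "N1", "N1", "N2", "N2", "N2", "N3", NA),
  arr_delay = c(10, 20, NA, 40, 50, 60, 300, 5)
)

worst <- toy %>%
  filter(!is.na(tailnum), !is.na(arr_delay)) %>%  # drop unknown planes and flights with no recorded arrival delay
  group_by(tailnum) %>%
  summarise(
    n              = n(),
    mean_arr_delay = mean(arr_delay),
    prop_on_time   = mean(arr_delay <= 0)         # share of flights arriving on time
  ) %>%
  filter(n >= 2) %>%                              # hypothetical minimum number of flights
  arrange(desc(mean_arr_delay))

worst$tailnum[1]
```

Without the minimum-flights filter, a plane with a single very late flight (N3 in the toy data) would top the ranking; ranking by `prop_on_time` instead is an equally defensible definition.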