Econometria I
library(dplyr)
library(nycflights13)
5.2.4 Exercises
1. Find all flights that
-1. Had an arrival delay of two or more hours
flights %>%
  filter(arr_delay >= 120)
-2. Flew to Houston (IAH or HOU)
flights %>%
  filter(dest == "IAH" | dest == "HOU")
-3. Were operated by United, American, or Delta
flights %>%
  filter(carrier == "UA" | carrier == "AA" | carrier == "DL")
-4. Departed in summer (July, August, and September)
flights %>%
  filter(month == 7 | month == 8 | month == 9)
-5. Arrived more than two hours late, but didn’t leave late
flights %>%
  filter(arr_delay > 120, dep_delay <= 0)
-6. Were delayed by at least an hour, but made up over 30 minutes in flight
flights %>%
  filter(dep_delay >= 60, dep_delay - arr_delay > 30)
-7. Departed between midnight and 6am (inclusive)
flights %>%
  filter(dep_time <= 600 | dep_time == 2400)
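The chained `|` comparisons in items 2 to 4 above can probably be written more compactly. The following is a minimal sketch, not the only way to do it: it uses `%in%` and dplyr's `between()`, and the object names (`houston`, `big_three`, `summer`, `summer_between`) are made up for illustration.

```r
library(dplyr)
library(nycflights13)

# %in% tests membership in a set, replacing several `==` joined by `|`.
houston <- flights %>%
  filter(dest %in% c("IAH", "HOU"))

big_three <- flights %>%
  filter(carrier %in% c("UA", "AA", "DL"))

summer <- flights %>%
  filter(month %in% 7:9)

# between() is inclusive on both ends, so this keeps months 7, 8 and 9.
summer_between <- flights %>%
  filter(between(month, 7, 9))

# Sanity checks: the shorthand should keep as many rows as the long form.
nrow(houston) == nrow(filter(flights, dest == "IAH" | dest == "HOU"))
nrow(summer) == nrow(summer_between)
```

Note that `filter()` drops rows where the condition evaluates to `NA`, so cancelled flights (missing `dep_time`) never appear in the result for item 7.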
5.3.1 Exercises
2. Sort flights to find the most delayed flights. Find the flights that left earliest.
flights %>%
  arrange(desc(dep_delay))
flights %>%
  arrange(dep_delay)  # earliest relative to the scheduled departure
3. Sort flights to find the fastest flights.
flights %>%
  arrange(desc(distance / air_time * 60))
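To make the ranking easier to inspect, the speed can be stored in its own column before sorting. This is a sketch that assumes "fastest" means highest average ground speed. It relies on `distance` being in miles and `air_time` in minutes, as the nycflights13 documentation states. The names `speed_mph` and `fastest` are illustrative.

```r
library(dplyr)
library(nycflights13)

fastest <- flights %>%
  # Flights without a recorded air time have no defined speed.
  filter(!is.na(air_time)) %>%
  # miles per minute * 60 = miles per hour
  mutate(speed_mph = distance / air_time * 60) %>%
  # desc() puts the highest speed first; a plain arrange() would list the slowest.
  arrange(desc(speed_mph)) %>%
  select(carrier, flight, origin, dest, distance, air_time, speed_mph)

head(fastest)
```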
5.4.1 Exercises
1. Brainstorm as many ways as possible to select dep_time, dep_delay, arr_time, and arr_delay from flights.
flights %>%
  select(dep_time, dep_delay, arr_time, arr_delay)
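Since the exercise asks for several ways, here are a few more that should return the same four columns. This is a sketch using the tidyselect helpers re-exported by dplyr (`all_of()`, `starts_with()`, `matches()`). The object names are illustrative, and the positional version assumes the column order of the current nycflights13 release.

```r
library(dplyr)
library(nycflights13)

target <- c("dep_time", "dep_delay", "arr_time", "arr_delay")

# Bare column names, as in the solution above.
by_name <- select(flights, dep_time, dep_delay, arr_time, arr_delay)

# A character vector of names, wrapped in all_of().
by_string <- select(flights, all_of(target))

# Prefix matching: "sched_dep_time" does not start with "dep_", so it is skipped.
by_prefix <- select(flights, starts_with("dep_"), starts_with("arr_"))

# A regular expression anchored at both ends.
by_regex <- select(flights, matches("^(dep|arr)_(time|delay)$"))

# Column positions: concise but fragile if the column order ever changes.
by_position <- select(flights, 4, 6, 7, 9)

names(by_prefix)
```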
2. What happens if you include the name of a variable multiple times in a select() call?
flights %>%
  select(year, month, day, year, year)
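The call above runs without error: `select()` keeps each column once, at the position of its first mention, and ignores the repeats. A quick check (the name `repeated` is illustrative):

```r
library(dplyr)
library(nycflights13)

repeated <- flights %>%
  select(year, month, day, year, year)

# Only three columns come back, despite year being named three times.
names(repeated)
# returns "year" "month" "day"
```

This is also why `select(flights, arr_delay, everything())` can move a column to the front without duplicating it.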
5.5.2 Exercises
1. Currently dep_time and sched_dep_time are convenient to look at, but hard to compute with because they’re not really continuous numbers. Convert them to a more convenient representation of number of minutes since midnight.
flights_real_time <- flights %>%
  mutate(dep_time_mins = (dep_time %/% 100 * 60 + dep_time %% 100) %% 1440,
         sched_dep_time_mins = (sched_dep_time %/% 100 * 60 +
                                  sched_dep_time %% 100) %% 1440)
flights_real_time %>%
  select(dep_time, dep_time_mins, sched_dep_time, sched_dep_time_mins)
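Because the same arithmetic is applied to two columns, it may be clearer to wrap it in a small function. The helper name `hhmm_to_minutes` is hypothetical, not part of dplyr. The final `%% 1440` is there because midnight is recorded as 2400, which would otherwise become 1440 instead of 0.

```r
library(dplyr)
library(nycflights13)

# Hypothetical helper: convert an HHMM clock value to minutes since midnight.
hhmm_to_minutes <- function(hhmm) {
  hours   <- hhmm %/% 100  # integer division strips the minutes
  minutes <- hhmm %% 100   # remainder keeps the minutes
  (hours * 60 + minutes) %% 1440  # wrap 2400 (midnight) back to 0
}

hhmm_to_minutes(c(1, 517, 2359, 2400))
# returns 1, 317, 1439, 0

# Apply the helper to both columns at once; .names sets the output suffix.
flights_mins <- flights %>%
  mutate(across(c(dep_time, sched_dep_time), hhmm_to_minutes,
                .names = "{.col}_mins"))
```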
5.7.1 Exercises
2. Which plane (tailnum) has the worst on-time record?
flights %>%
  filter(!is.na(tailnum)) %>%
  group_by(tailnum) %>%
  summarise(arr_delay = mean(arr_delay, na.rm = TRUE)) %>%
  filter(min_rank(desc(arr_delay)) <= 1)
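A plane with a single, very late flight can top a ranking by mean delay, so it may be worth requiring a minimum number of flights. The sketch below is one possible reading of "on-time record", not the only one. The cutoff of 20 flights and the names `min_flights`, `worst_planes` and `prop_late` are assumptions made for illustration.

```r
library(dplyr)
library(nycflights13)

min_flights <- 20  # assumed cutoff; the exercise does not specify one

worst_planes <- flights %>%
  # Drop flights with no tail number or no recorded arrival delay.
  filter(!is.na(tailnum), !is.na(arr_delay)) %>%
  group_by(tailnum) %>%
  summarise(
    n_flights      = n(),
    mean_arr_delay = mean(arr_delay),     # average minutes late
    prop_late      = mean(arr_delay > 0)  # share of flights arriving late
  ) %>%
  filter(n_flights >= min_flights) %>%
  arrange(desc(mean_arr_delay))

head(worst_planes)
```

Sorting by `prop_late` instead of `mean_arr_delay` gives a ranking based on how often a plane is late rather than by how much.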