Oscar Padilla

Student ID: 13000285

Econometrics I

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(nycflights13)

5.2.4 Exercises

1. Find all flights that

- 1. Had an arrival delay of two or more hours

flights %>% 
    filter(arr_delay >= 120)

- 2. Flew to Houston (IAH or HOU)

flights %>% 
    filter(dest=="IAH" | dest=="HOU")

- 3. Were operated by United, American, or Delta

flights %>% 
    filter(carrier=="UA" | carrier=="AA" | carrier=="DL")
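
The chained `|` comparisons used for the Houston and carrier filters can also be written with `%in%`, which stays readable as the list of values grows. A minimal sketch of that alternative; the names `houston` and `big_three` are illustrative and not part of the exercise:

```r
library(dplyr)
library(nycflights13)

# Flights to either Houston airport
houston <- flights %>%
  filter(dest %in% c("IAH", "HOU"))

# Flights operated by United, American, or Delta
big_three <- flights %>%
  filter(carrier %in% c("UA", "AA", "DL"))
```

Both filters should return the same rows as their `|` counterparts.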

- 4. Departed in summer (July, August, and September)

flights %>% 
    filter(month==7 | month==8 | month==9)
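
Because the three months are consecutive, dplyr's `between()` can express the same condition as a closed range. A short sketch, with `summer` as an illustrative name:

```r
library(dplyr)
library(nycflights13)

# between(x, left, right) is shorthand for x >= left & x <= right
summer <- flights %>%
  filter(between(month, 7, 9))
```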

- 5. Arrived more than two hours late, but didn’t leave late

flights %>% 
    filter(arr_delay > 120, dep_delay <= 0)

- 6. Were delayed by at least an hour, but made up over 30 minutes in flight

flights %>% 
    filter(dep_delay >= 60, dep_delay - arr_delay > 30)

- 7. Departed between midnight and 6am (inclusive)

flights %>% 
    filter(dep_time <= 600 | dep_time == 2400)
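
The separate `dep_time == 2400` test is there because midnight departures appear in this data as 2400 rather than 0. One alternative, sketched below, folds 2400 onto 0 with the modulo operator so that a single comparison covers the whole window; `early` is an illustrative name:

```r
library(dplyr)
library(nycflights13)

# 2400 %% 2400 is 0, so midnight passes the <= 600 test;
# every other dep_time is left unchanged by the modulo
early <- flights %>%
  filter(dep_time %% 2400 <= 600)
```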

5.3.1 Exercises

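2. Sort flights to find the most delayed flights. Find the flights that left earliest.

flights %>% 
  arrange(desc(dep_delay))  # most delayed departures first
flights %>% 
  arrange(dep_delay)        # earliest departures relative to schedule first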

3. Sort flights to find the fastest flights.

flights %>% 
  arrange(desc(distance / air_time * 60))  # speed in miles per hour, fastest first

5.4.1 Exercises

1. Brainstorm as many ways as possible to select dep_time, dep_delay, arr_time, and arr_delay from flights.

flights %>% 
  select(dep_time, dep_delay, arr_time, arr_delay)
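
Listing the bare column names is only one option. The sketch below collects a few other approaches that should return the same four columns, using tidyselect helpers available in dplyr; the object names (`wanted`, `by_name`, and so on) are illustrative:

```r
library(dplyr)
library(nycflights13)

wanted <- c("dep_time", "dep_delay", "arr_time", "arr_delay")

# Bare column names
by_name <- select(flights, dep_time, dep_delay, arr_time, arr_delay)

# A character vector of names
by_vector <- select(flights, all_of(wanted))

# Name prefixes: sched_dep_time and sched_arr_time start with "sched", so they are skipped
by_prefix <- select(flights, starts_with("dep_"), starts_with("arr_"))

# A regular expression anchored at both ends
by_pattern <- select(flights, matches("^(dep|arr)_(time|delay)$"))
```

Selecting by column position also works, but it breaks silently if the column order changes, so the name-based forms are usually safer.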

2. What happens if you include the name of a variable multiple times in a select() call?

flights %>% 
  select(year, month, day, year, year)
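
To make the answer explicit, the column names of the result can be inspected: `select()` ignores the repeated names, so each variable appears once, at the position of its first mention. A small check, with `dupes` as an illustrative name:

```r
library(dplyr)
library(nycflights13)

dupes <- flights %>%
  select(year, month, day, year, year)

names(dupes)
# "year" "month" "day"
```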

5.5.2 Exercises

1. Currently dep_time and sched_dep_time are convenient to look at, but hard to compute with because they’re not really continuous numbers. Convert them to a more convenient representation of number of minutes since midnight.

flights_real_time <- flights %>% 
  mutate(
    dep_time_mins = (dep_time %/% 100 * 60 + dep_time %% 100) %% 1440,
    sched_dep_time_mins = (sched_dep_time %/% 100 * 60 + sched_dep_time %% 100) %% 1440
  )
flights_real_time %>% 
  select(dep_time, dep_time_mins, sched_dep_time, sched_dep_time_mins)
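
The formula splits each HHMM value into hours (`%/% 100`) and minutes (`%% 100`), and the final `%% 1440` maps midnight, stored as 2400, back to 0. The arithmetic can be checked on a few hand-picked values; `to_minutes` is a hypothetical helper introduced only for this illustration:

```r
# Hypothetical helper: convert an HHMM clock reading to minutes since midnight
to_minutes <- function(hhmm) {
  (hhmm %/% 100 * 60 + hhmm %% 100) %% 1440
}

to_minutes(c(1, 517, 2359, 2400))
# 1 317 1439 0
```

Wrapping the conversion in a function would also avoid repeating the expression for each time column.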

5.7.1 Exercises

2. Which plane (tailnum) has the worst on-time record?

flights %>%
  group_by(tailnum) %>%
  # na.rm = TRUE: flights that never arrived have a missing arr_delay
  summarise(arr_delay = mean(arr_delay, na.rm = TRUE)) %>%
  filter(min_rank(desc(arr_delay)) <= 1)
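
A mean taken over very few flights is noisy, so a plane with a single badly delayed flight can top this ranking. One possible refinement, sketched here, ranks only planes with a minimum number of recorded arrivals. The cutoff of 20 (`min_flights`) is an assumption chosen for illustration, not something the exercise specifies:

```r
library(dplyr)
library(nycflights13)

min_flights <- 20  # assumed cutoff, chosen only for illustration

worst <- flights %>%
  # Drop rows without a tail number or without a recorded arrival delay
  filter(!is.na(tailnum), !is.na(arr_delay)) %>%
  group_by(tailnum) %>%
  summarise(mean_arr_delay = mean(arr_delay), n = n()) %>%
  # Keep planes with enough flights for the mean to be meaningful
  filter(n >= min_flights) %>%
  filter(min_rank(desc(mean_arr_delay)) == 1)
```

Changing `min_flights` may change which plane comes out on top, which is itself a sign of how sensitive the answer is to sample size.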