Administrative:

Please indicate

  • Who you collaborated with:
  • Roughly how much time you spent on this HW so far: 3 hours
  • The URL of your published RPubs document:
  • What gave you the most trouble: overwriting my data frames and then getting errors afterward
  • Any comments you have:

Question 1:

Plot a “time series” of the proportion of flights that were delayed by > 30 minutes on each day. i.e.

  • the x-axis should be some notion of time
  • the y-axis should be the proportion.

Using this plot, describe the seasonality of when delays of over 30 minutes tend to occur.

delayed_per_day <- flights %>%
  mutate(over_30 = dep_delay > 30) %>%
  group_by(date) %>%
  summarise(p_delayed = mean(over_30, na.rm = TRUE))

ggplot(data=delayed_per_day, aes(x=date, y=p_delayed)) + geom_point()
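The daily proportion works because a comparison such as dep_delay > 30 yields TRUE/FALSE/NA, and the mean of a logical vector is the share of TRUE values. na.rm = TRUE drops flights with a missing dep_delay (for example, cancellations) instead of returning NA for the whole day. Here is a minimal check on a hypothetical two-day toy table (toy_flights is made up, not the real flights data):

```r
library(dplyr)

# Hypothetical toy data standing in for `flights`: two days, with one
# flight on the second day missing its dep_delay
toy_flights <- tibble(
  date = as.Date(c("2011-01-01", "2011-01-01", "2011-01-01",
                   "2011-01-02", "2011-01-02")),
  dep_delay = c(45, 5, 31, NA, 10)
)

# Mean of a logical = proportion of TRUE; NA comparisons are dropped
toy_delayed <- toy_flights %>%
  mutate(over_30 = dep_delay > 30) %>%
  group_by(date) %>%
  summarise(p_delayed = mean(over_30, na.rm = TRUE))

print(toy_delayed)
```

On the first day, 2 of 3 flights left more than 30 minutes late. On the second day, the NA is ignored, so the proportion is 0 rather than NA.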

Question 2:

Some people prefer flying on older planes. Even though they aren’t as nice, they tend to have more room. Which airlines should these people favor?

Airplanes built before 2000 will be considered “old” (see http://www.independenttraveler.com/travel-tips/travelers-ed/the-airplane-seat-narrow-cramped-and-about-to-get-worse).

planes <- planes %>% mutate(old_plane = year < 2000)

flights_temp <- left_join(flights, planes, by = "plane")
flights_temp <- flights_temp %>%
  select(plane, old_plane, carrier) %>%
  group_by(carrier) %>%
  summarise(p_old = mean(old_plane, na.rm = TRUE)) %>%
  arrange(desc(p_old))

ggplot(data = flights_temp, aes(x = reorder(carrier, -p_old), y = p_old)) + geom_bar(stat = "identity") + labs(x = "carrier")
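Here is a minimal sketch of the join-then-average logic on hypothetical toy tables. The toy_planes and toy_flights tables, the tail numbers, and the carriers are all made up. It illustrates two properties of p_old. First, it is a share of flights, not of planes, so a plane that flies often counts once per flight. Second, flights whose plane has no known build year drop out through na.rm = TRUE instead of being counted as new planes.

```r
library(dplyr)

# Hypothetical toy tables shaped like `planes` and `flights`
toy_planes <- tibble(
  plane = c("N1", "N2", "N3"),
  year  = c(1995, 2005, NA)   # N3 has an unknown build year
)
toy_flights <- tibble(
  carrier = c("AA", "AA", "AA", "WN", "WN"),
  plane   = c("N1", "N2", "N3", "N1", "N1")
)

p_old_by_carrier <- toy_flights %>%
  left_join(toy_planes, by = "plane") %>%
  mutate(old_plane = year < 2000) %>%          # NA when year is unknown
  group_by(carrier) %>%
  summarise(p_old = mean(old_plane, na.rm = TRUE)) %>%
  arrange(desc(p_old))                         # most old-plane flights first

print(p_old_by_carrier)
```

In this toy data, WN flies only the 1995 plane (p_old = 1). AA has one old flight, one new flight, and one flight with an unknown year, so its p_old is 0.5.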

Question 3:

  • What states did Southwest Airlines’ flight paths tend to fly to?
  • What states did Southwest Airlines’ flights tend to fly to?

For example, Southwest Airlines Flight 60 to Dallas consists of a single flight path, but since it flew 299 times in 2013, it would be counted as 299 flights.

Southwest’s airline carrier code is WN.

southwest_flights <- flights %>% filter(carrier == "WN") %>% select(date, flight, dest)

southwest_flights <- left_join(southwest_flights, airports, by = c("dest" = "iata"))

sw_flightcount <- southwest_flights %>% group_by(state) %>% tally() %>% rename(num_swflights_per_state = n)

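All entries with a missing (NA) state are flights to ECP, which has no state in the airports table after the join. ECP is Northwest Florida Beaches International Airport, which serves Panama City, Florida.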
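Because ECP is in Florida, one option is to fill in its state by hand before counting, so those flights are not lumped into an NA bar. This is a minimal sketch on hypothetical toy rows (toy_sw and its destinations are made up). On the real data, the same mutate() would be applied to southwest_flights.

```r
library(dplyr)

# Hypothetical joined rows: ECP had no state in the airports table
toy_sw <- tibble(
  dest  = c("DAL", "ECP", "ECP", "MDW"),
  state = c("TX", NA, NA, "IL")
)

# Assign FL only to ECP rows whose state is missing; other rows are unchanged
toy_sw_fixed <- toy_sw %>%
  mutate(state = ifelse(dest == "ECP" & is.na(state), "FL", state))

toy_sw_fixed %>% count(state)
```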

ggplot(data = sw_flightcount, aes(x = state, y = num_swflights_per_state)) + geom_bar(stat = "identity")

The plot above shows how many Southwest flights went to each state.

sw_flightpath_count <- southwest_flights %>% group_by(flight, state) %>% tally()
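Grouping by (flight, state) and tallying counts how many times each flight path flew. To count how many distinct flight paths go to each state, one approach is to keep one row per path with distinct() and then count per state. This is a minimal sketch on hypothetical toy rows (toy_sw is made up). It treats a path as a flight number plus a destination, following the Flight 60 to Dallas example.

```r
library(dplyr)

# Hypothetical Southwest rows after the airports join:
# flight 60 to DAL flew 3 times, flight 12 to DAL once, flight 7 to MDW twice
toy_sw <- tibble(
  flight = c(60, 60, 60, 12, 7, 7),
  dest   = c("DAL", "DAL", "DAL", "DAL", "MDW", "MDW"),
  state  = c("TX", "TX", "TX", "TX", "IL", "IL")
)

# One row per (flight, dest) path, then count paths in each state
paths_per_state <- toy_sw %>%
  distinct(flight, dest, state) %>%
  count(state, name = "num_paths")

print(paths_per_state)
```

TX receives 4 flights but only 2 paths, and IL receives 2 flights but only 1 path. That gap is exactly the distinction the question draws.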

Question 4:

For each carrier, I want to know what proportion of its flights to/from Houston in July go to each region (Northeast, South, West, Midwest). Consider the month() function from the lubridate package.
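One possible sketch, not a definitive answer, uses lubridate's month() to keep July and base R's built-in state.abb and state.region vectors as a state-to-region lookup. Their regions are Northeast, South, North Central, and West, and North Central corresponds to the Midwest. DC is not in those vectors, so it would map to NA. The sketch assumes each flight's destination state has already been joined on, as in Question 3, and that the destination's region is the one of interest. The toy_flights table is hypothetical.

```r
library(dplyr)
library(lubridate)

# Built-in lookup: 50 state abbreviations and their census regions
region_lookup <- tibble(
  state  = state.abb,
  region = as.character(state.region)   # "North Central" = Midwest
)

# Hypothetical flights with destination state already attached
toy_flights <- tibble(
  date    = as.Date(c("2011-07-04", "2011-07-10", "2011-07-15", "2011-08-01")),
  carrier = c("AA", "AA", "WN", "WN"),
  state   = c("NY", "CA", "TX", "IL")
)

region_props <- toy_flights %>%
  filter(month(date) == 7) %>%               # keep July flights only
  left_join(region_lookup, by = "state") %>%
  count(carrier, region) %>%                 # flights per carrier and region
  group_by(carrier) %>%
  mutate(prop = n / sum(n)) %>%              # share within each carrier
  ungroup()

print(region_props)
```

The August flight is filtered out. AA splits evenly between the Northeast and the West, and WN's single July flight goes to the South.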