ANSWER
(1) Many flights leave before their scheduled departure time. (2) With more bins, the peaks in departure delay are clearly visible.
Create a new data frame that includes flights headed to SFO in February, and save this data frame as sfo_feb_flights. How many flights meet these criteria?
ANSWER
sfo_feb_flights <- nycflights %>%
  filter(dest == "SFO", month == 2)
paste("Total flights in the criteria are : ", count(sfo_feb_flights))
Total flights in the criteria are : 68
Describe the distribution of the arrival delays of these flights using a histogram and appropriate summary statistics
ANSWER
sfo_feb_flights <- nycflights %>%
  filter(dest == "SFO", month == 2)
ggplot(data = sfo_feb_flights, aes(x = arr_delay)) +
  geom_histogram(bins = 15)
print("Comparing carriers, American Airlines (AA) has the worst arrival delays and Virgin America (VX) has the best")
paste("In February, the median arrival delay of flights headed to San Francisco was", median(sfo_feb_flights$arr_delay), "minutes")
Comparing carriers, American Airlines (AA) has the worst arrival delays and Virgin America (VX) has the best
In February, the median arrival delay of flights headed to San Francisco was -11 minutes
Calculate the median and interquartile range for arr_delay of flights in the sfo_feb_flights data frame, grouped by carrier. Which carrier has the most variable arrival delays?
ANSWER
bycarrier_sfo_feb_flights <- group_by(sfo_feb_flights, carrier)
summarise(bycarrier_sfo_feb_flights, Median = median(arr_delay), IQR = IQR(arr_delay))
  carrier Median   IQR
1 AA         5    17.5
2 B6       -10.5  12.2
3 DL       -15    22
4 UA       -10    22
5 VX       -22.5  21.2
Delta Air Lines (DL) and United Airlines (UA) have the most variable arrival delays; they are tied with the largest IQR, 22.
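The same grouped comparison can be sketched in base R. This is only an illustration: the `toy` data frame and its carriers "A" and "B" are made up and are not taken from nycflights. It shows how the IQR per carrier picks out the most variable group, and how a tie (as with DL and UA above) would be reported.

```r
# Hypothetical arrival delays (minutes) for two made-up carriers.
toy <- data.frame(
  carrier   = rep(c("A", "B"), each = 5),
  arr_delay = c(-10, -5, 0, 5, 10, -40, -20, 0, 20, 40)
)

# Median and IQR of arr_delay within each carrier.
median_by_carrier <- tapply(toy$arr_delay, toy$carrier, median)
iqr_by_carrier    <- tapply(toy$arr_delay, toy$carrier, IQR)

# Carrier(s) with the largest IQR; comparing against max() keeps ties.
most_variable <- names(which(iqr_by_carrier == max(iqr_by_carrier)))

print(median_by_carrier)
print(iqr_by_carrier)
print(most_variable)
```

Both toy carriers have the same median, so only the IQR separates them; that is why spread has to be checked separately from the center.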
Suppose you really dislike departure delays and you want to schedule your travel in a month that minimizes your potential departure delay leaving NYC. One option is to choose the month with the lowest mean departure delay. Another option is to choose the month with the lowest median departure delay. What are the pros and cons of these two choices?
ANSWER
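If you dislike departure delays, it is better to choose based on the mean, because the mean is pulled up by extreme delays and so gives some idea of how bad the worst delays can be. The median gives a better picture of the typical delay because it is not affected by outliers, but that is also its drawback: the flight you choose could turn out to be one of the most delayed ones while you are expecting a delay close to the median.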
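A small numeric sketch may make the trade-off concrete. The `delays` vector below is hypothetical (it is not drawn from nycflights); it contains one very long delay so that the mean and the median disagree.

```r
# Hypothetical departure delays (minutes) for one month, with one extreme value.
delays <- c(-5, -3, -2, 0, 1, 2, 4, 6, 8, 240)

mean_delay   <- mean(delays)    # pulled upward by the single 240-minute delay
median_delay <- median(delays)  # middle of the sorted values, unaffected by the outlier

cat("mean:", mean_delay, "median:", median_delay, "\n")
```

Here the median describes the typical flight, while the mean is the only one of the two that reflects the rare, very long delay.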
If you were selecting an airport simply based on on time departure percentage, which NYC airport would you choose to fly out of?
nycflights <- nycflights %>%
  mutate(dep_type = ifelse(dep_delay < 5, "on time", "delayed"))
nycflights %>%
  group_by(origin) %>%
  summarise(ot_dep_rate = sum(dep_type == "on time") / n()) %>%
  arrange(desc(ot_dep_rate))
ANSWER
origin ot_dep_rate
1 LGA 0.728
2 JFK 0.694
3 EWR 0.637
I would choose LaGuardia (LGA), which has the highest on-time departure rate (72.8%).
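As a rough cross-check of the rate calculation, the sketch below does the same thing in base R on made-up data. The `toy` data frame and the origins "X" and "Y" are hypothetical. It relies on the fact that the mean of a logical vector is the share of TRUE values, which is equivalent to `sum(dep_type == "on time") / n()` above, and it reuses the same `dep_delay < 5` cutoff.

```r
# Hypothetical departure delays (minutes) for two made-up origins.
toy <- data.frame(
  origin    = c("X", "X", "X", "X", "Y", "Y", "Y", "Y"),
  dep_delay = c(-2, 0, 3, 30, 4, 10, 25, 60)
)

# A flight counts as on time when it departs less than 5 minutes late.
toy$on_time <- toy$dep_delay < 5

# Share of on-time flights per origin, highest first.
ot_rate <- sort(tapply(toy$on_time, toy$origin, mean), decreasing = TRUE)

print(ot_rate)
```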
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.