My first step is to set the working directory to the designated folder. Once this is done, I will then use the read.csv function to download the data files into the R Environment, and create names for the files so I can reference them.
setwd("C:/Users/jhass/OneDrive - University of Cincinnati/College Class Materials/3rd Year/Fall '20/Descriptive Analytics & Data Visualization/Homeworks/Homework 4")
CVG.Flights <- read.csv("CVG_Flights.csv", header = TRUE, stringsAsFactors = FALSE, na.strings = "")
airlines <- read.csv("airlines.csv", header = TRUE, stringsAsFactors = FALSE, na.strings = "")
airports <- read.csv("airports.csv", header = TRUE, stringsAsFactors = FALSE, na.strings = "")
Next, I will load the package “tidyverse,” which will be needed for some of the codes.
library(tidyverse)
## -- Attaching packages --------------------------------------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.3 v dplyr 1.0.2
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0
## -- Conflicts ------------------------------------------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
I will use the nrow() function to check the number of rows in each of the data files. This will provide information on how many records (or entries) there are for each file.
nrow(CVG.Flights)
## [1] 7763
nrow(airlines)
## [1] 14
nrow(airports)
## [1] 322
The nrow() function shows me that there are 7763 rows for the CVG.Flights dataset, 14 rows for the airlines dataset, and 322 rows for the airports dataset.
I will use the ncol() function to check the number of columns in each of the data files. This will provide information on the number of categories (or variables) for each file.
ncol(CVG.Flights)
## [1] 27
ncol(airlines)
## [1] 2
ncol(airports)
## [1] 7
The ncol() functions shows me that there are 27 columns for the CVG.Flights dataset, 2 columns for the airlines dataset, and 7 columns for the airports dataset.
My next step is to merge the data frames into one new data frame. To do this, I will use the left_join() function three times. First, I will merge CVG.Flights and airlines by their columns named “AIRLINE” and “IATA_CODE”, which will provide the Airline name in this data frame based on their shared codes. Then, I will merge this new data frame with the airports data frame by their columns named “ORIGIN_AIRPORT” and “IATA_CODE”. Finally, I will merge this newest data frame with the airports data frame again, this time based on their columns named “DESTINATION_AIRPORT” and “IATA_CODE”.
flights_w_airlines <- left_join(CVG.Flights, airlines, by = c("AIRLINE" = "IATA_CODE"))
flights_w_origins <- left_join(flights_w_airlines, airports, by = c("ORIGIN_AIRPORT" = "IATA_CODE"))
flights_merged <- left_join(flights_w_origins, airports, by = c("DESTINATION_AIRPORT" = "IATA_CODE"))
After merging the data frames three times, I now have a new data frame called “flights_merged” with 40 columns providing more information on all the flight records.
Using the head() and tail() functions, the first and last six rows will be printed, respectively. This can help visualize a subset of the new data frame to ensure there weren’t any mistakes.
head(flights_merged)
## FLIGHT_DATE AIRLINE FLIGHT_NUMBER ORIGIN_AIRPORT DESTINATION_AIRPORT
## 1 1/1/2015 MQ 3355 CVG MIA
## 2 1/1/2015 MQ 3597 CVG DFW
## 3 1/1/2015 OO 4811 BNA CVG
## 4 1/1/2015 DL 2315 CVG ATL
## 5 1/1/2015 F9 1308 IAD CVG
## 6 1/1/2015 DL 1502 CVG TPA
## SCHEDULED_DEPARTURE DEPARTURE_TIME DEPARTURE_DELAY TAXI_OUT WHEELS_OFF
## 1 6:00:00 AM 5:56:00 AM -4 15 6:11:00 AM
## 2 6:00:00 AM <NA> NA NA <NA>
## 3 6:45:00 AM 6:37:00 AM -8 16 6:53:00 AM
## 4 7:10:00 AM 7:08:00 AM -2 36 7:44:00 AM
## 5 7:15:00 AM 7:05:00 AM -10 11 7:16:00 AM
## 6 7:20:00 AM 7:19:00 AM -1 9 7:28:00 AM
## SCHEDULED_TIME ELAPSED_TIME AIR_TIME DISTANCE WHEELS_ON TAXI_IN
## 1 153 150 123 948 8:14:00 AM 12
## 2 165 NA NA 812 <NA> NA
## 3 72 66 45 230 8:38:00 AM 5
## 4 102 101 62 373 8:46:00 AM 3
## 5 95 91 75 388 8:31:00 AM 5
## 6 141 116 105 773 9:13:00 AM 2
## SCHEDULED_ARRIVAL ARRIVAL_TIME ARRIVAL_DELAY DIVERTED CANCELLED
## 1 8:33:00 AM 8:26:00 AM -7 0 0
## 2 7:45:00 AM <NA> NA 0 1
## 3 8:57:00 AM 8:43:00 AM -14 0 0
## 4 8:52:00 AM 8:49:00 AM -3 0 0
## 5 8:50:00 AM 8:36:00 AM -14 0 0
## 6 9:41:00 AM 9:15:00 AM -26 0 0
## CANCELLATION_REASON AIR_SYSTEM_DELAY SECURITY_DELAY AIRLINE_DELAY
## 1 <NA> NA NA NA
## 2 B NA NA NA
## 3 <NA> NA NA NA
## 4 <NA> NA NA NA
## 5 <NA> NA NA NA
## 6 <NA> NA NA NA
## LATE_AIRCRAFT_DELAY WEATHER_DELAY AIRLINE.y
## 1 NA NA American Eagle Airlines Inc.
## 2 NA NA American Eagle Airlines Inc.
## 3 NA NA Skywest Airlines Inc.
## 4 NA NA Delta Air Lines Inc.
## 5 NA NA Frontier Airlines Inc.
## 6 NA NA Delta Air Lines Inc.
## AIRPORT.x CITY.x STATE.x
## 1 Cincinnati/Northern Kentucky International Airport Covington KY
## 2 Cincinnati/Northern Kentucky International Airport Covington KY
## 3 Nashville International Airport Nashville TN
## 4 Cincinnati/Northern Kentucky International Airport Covington KY
## 5 Washington Dulles International Airport Chantilly VA
## 6 Cincinnati/Northern Kentucky International Airport Covington KY
## COUNTRY.x LATITUDE.x LONGITUDE.x
## 1 USA 39.04614 -84.66217
## 2 USA 39.04614 -84.66217
## 3 USA 36.12448 -86.67818
## 4 USA 39.04614 -84.66217
## 5 USA 38.94453 -77.45581
## 6 USA 39.04614 -84.66217
## AIRPORT.y CITY.y STATE.y
## 1 Miami International Airport Miami FL
## 2 Dallas/Fort Worth International Airport Dallas-Fort Worth TX
## 3 Cincinnati/Northern Kentucky International Airport Covington KY
## 4 Hartsfield-Jackson Atlanta International Airport Atlanta GA
## 5 Cincinnati/Northern Kentucky International Airport Covington KY
## 6 Tampa International Airport Tampa FL
## COUNTRY.y LATITUDE.y LONGITUDE.y
## 1 USA 25.79325 -80.29056
## 2 USA 32.89595 -97.03720
## 3 USA 39.04614 -84.66217
## 4 USA 33.64044 -84.42694
## 5 USA 39.04614 -84.66217
## 6 USA 27.97547 -82.53325
tail(flights_merged)
## FLIGHT_DATE AIRLINE FLIGHT_NUMBER ORIGIN_AIRPORT DESTINATION_AIRPORT
## 7758 3/10/2015 DL 803 CVG MCO
## 7759 3/10/2015 DL 2011 ATL CVG
## 7760 3/10/2015 EV 4638 EWR CVG
## 7761 3/10/2015 DL 1705 CVG LGA
## 7762 3/10/2015 F9 734 DEN CVG
## 7763 3/10/2015 OO 5643 DEN CVG
## SCHEDULED_DEPARTURE DEPARTURE_TIME DEPARTURE_DELAY TAXI_OUT WHEELS_OFF
## 7758 9:40:00 AM 9:36:00 AM -4 12 9:48:00 AM
## 7759 9:40:00 AM 9:48:00 AM 8 10 9:58:00 AM
## 7760 9:41:00 AM 9:31:00 AM -10 15 9:46:00 AM
## 7761 9:45:00 AM 9:45:00 AM 0 13 9:58:00 AM
## 7762 10:00:00 AM 10:09:00 AM 9 16 10:25:00 AM
## 7763 10:00:00 AM 10:02:00 AM 2 16 10:18:00 AM
## SCHEDULED_TIME ELAPSED_TIME AIR_TIME DISTANCE WHEELS_ON TAXI_IN
## 7758 129 127 110 757 1:38:00 AM 5
## 7759 85 74 59 373 1:57:00 AM 5
## 7760 136 125 103 569 1:29:00 AM 7
## 7761 121 100 73 585 1:11:00 AM 14
## 7762 152 156 132 1069 1:37:00 AM 8
## 7763 159 159 134 1069 1:32:00 AM 9
## SCHEDULED_ARRIVAL ARRIVAL_TIME ARRIVAL_DELAY DIVERTED CANCELLED
## 7758 11:49:00 AM 11:43:00 AM -6 0 0
## 7759 11:05:00 AM 11:02:00 AM -3 0 0
## 7760 11:57:00 AM 11:36:00 AM -21 0 0
## 7761 11:46:00 AM 11:25:00 AM -21 0 0
## 7762 2:32:00 PM 2:45:00 PM 13 0 0
## 7763 2:39:00 PM 2:41:00 PM 2 0 0
## CANCELLATION_REASON AIR_SYSTEM_DELAY SECURITY_DELAY AIRLINE_DELAY
## 7758 <NA> NA NA NA
## 7759 <NA> NA NA NA
## 7760 <NA> NA NA NA
## 7761 <NA> NA NA NA
## 7762 <NA> NA NA NA
## 7763 <NA> NA NA NA
## LATE_AIRCRAFT_DELAY WEATHER_DELAY AIRLINE.y
## 7758 NA NA Delta Air Lines Inc.
## 7759 NA NA Delta Air Lines Inc.
## 7760 NA NA Atlantic Southeast Airlines
## 7761 NA NA Delta Air Lines Inc.
## 7762 NA NA Frontier Airlines Inc.
## 7763 NA NA Skywest Airlines Inc.
## AIRPORT.x CITY.x STATE.x
## 7758 Cincinnati/Northern Kentucky International Airport Covington KY
## 7759 Hartsfield-Jackson Atlanta International Airport Atlanta GA
## 7760 Newark Liberty International Airport Newark NJ
## 7761 Cincinnati/Northern Kentucky International Airport Covington KY
## 7762 Denver International Airport Denver CO
## 7763 Denver International Airport Denver CO
## COUNTRY.x LATITUDE.x LONGITUDE.x
## 7758 USA 39.04614 -84.66217
## 7759 USA 33.64044 -84.42694
## 7760 USA 40.69250 -74.16866
## 7761 USA 39.04614 -84.66217
## 7762 USA 39.85841 -104.66700
## 7763 USA 39.85841 -104.66700
## AIRPORT.y CITY.y STATE.y
## 7758 Orlando International Airport Orlando FL
## 7759 Cincinnati/Northern Kentucky International Airport Covington KY
## 7760 Cincinnati/Northern Kentucky International Airport Covington KY
## 7761 LaGuardia Airport (Marine Air Terminal) New York NY
## 7762 Cincinnati/Northern Kentucky International Airport Covington KY
## 7763 Cincinnati/Northern Kentucky International Airport Covington KY
## COUNTRY.y LATITUDE.y LONGITUDE.y
## 7758 USA 28.42889 -81.31603
## 7759 USA 39.04614 -84.66217
## 7760 USA 39.04614 -84.66217
## 7761 USA 40.77724 -73.87261
## 7762 USA 39.04614 -84.66217
## 7763 USA 39.04614 -84.66217
This helps confirm that the data frames were successfully merged and didn’t encounter any problems.
I will use the colSums() function and the is.na() function within it to see the number of missing values (which have values of NA) for each of the columns.
colSums(is.na(flights_merged))
## FLIGHT_DATE AIRLINE FLIGHT_NUMBER ORIGIN_AIRPORT
## 0 0 0 0
## DESTINATION_AIRPORT SCHEDULED_DEPARTURE DEPARTURE_TIME DEPARTURE_DELAY
## 0 0 370 370
## TAXI_OUT WHEELS_OFF SCHEDULED_TIME ELAPSED_TIME
## 378 378 0 392
## AIR_TIME DISTANCE WHEELS_ON TAXI_IN
## 392 0 381 381
## SCHEDULED_ARRIVAL ARRIVAL_TIME ARRIVAL_DELAY DIVERTED
## 0 381 392 0
## CANCELLED CANCELLATION_REASON AIR_SYSTEM_DELAY SECURITY_DELAY
## 0 7385 5814 5814
## AIRLINE_DELAY LATE_AIRCRAFT_DELAY WEATHER_DELAY AIRLINE.y
## 5814 5814 5814 0
## AIRPORT.x CITY.x STATE.x COUNTRY.x
## 0 0 0 0
## LATITUDE.x LONGITUDE.x AIRPORT.y CITY.y
## 0 0 0 0
## STATE.y COUNTRY.y LATITUDE.y LONGITUDE.y
## 0 0 0 0
This step shows that there are missing values in the DEPARTURE_TIME, DEPARTURE_DELAY, TAXI_OUT, WHEELS_OFF, ELAPSED_TIME, AIR_TIME, WHEELS_ON, TAXI_IN, ARRIVAL_TIME, ARRIVAL_DELAY, CANCELLATION_REASON, AIR_SYSTEM_DELAY, SECURITY_DELAY, AIRLINE_DELAY, LATE_AIRCRAFT_DELAY, and WEATHER_DELAY variables.
Since cancelled flights has a value of “1” in the “CANCELLED” category, taking a sum of the records equal to 1 in this category will amount to a count of cancelled flights. I will use the sum() function to do so, and then divide it by the number of total records, found by using the nrow() function, which gives the proportion of flights that were cancelled. Multiplying this by 100 gives the percentage of cancelled flights.
sum(flights_merged$CANCELLED == 1)
## [1] 378
(sum(flights_merged$CANCELLED == 1)) / (nrow(flights_merged))
## [1] 0.04869252
(sum(flights_merged$CANCELLED == 1)) / (nrow(flights_merged))*100
## [1] 4.869252
This shows us that roughly 4.87% of flights were cancelled.
I will define the CANCELLATION_REASON category as a factor to be able to use the levels() function to see the number of cancellation reasons.
flights_merged$CANCELLATION_REASON <- as.factor(flights_merged$CANCELLATION_REASON)
levels(flights_merged$CANCELLATION_REASON)
## [1] "A" "B" "C"
This shows us that there are three different cancellation reasons, “A”,“B”, and “C”.
I will use the sum() function as well as the is.na() function within it to find the number of records with missing values.
sum(is.na(flights_merged$DEPARTURE_TIME))
## [1] 370
This step shows that there are 370 records with missing values in the DEPARTURE_TIME category. This is very close to the number of flights that were cancelled, which was 378, which may explain why many of these flights had missing values in the DEPARTURE_TIME category.
I will use define a new category, called TimeDifference, as the difference between the SCHEDULED_TIME category and the ELAPSED_TIME category.
flights_merged$TimeDifference <- flights_merged$SCHEDULED_TIME - flights_merged$ELAPSED_TIME
head(flights_merged)
## FLIGHT_DATE AIRLINE FLIGHT_NUMBER ORIGIN_AIRPORT DESTINATION_AIRPORT
## 1 1/1/2015 MQ 3355 CVG MIA
## 2 1/1/2015 MQ 3597 CVG DFW
## 3 1/1/2015 OO 4811 BNA CVG
## 4 1/1/2015 DL 2315 CVG ATL
## 5 1/1/2015 F9 1308 IAD CVG
## 6 1/1/2015 DL 1502 CVG TPA
## SCHEDULED_DEPARTURE DEPARTURE_TIME DEPARTURE_DELAY TAXI_OUT WHEELS_OFF
## 1 6:00:00 AM 5:56:00 AM -4 15 6:11:00 AM
## 2 6:00:00 AM <NA> NA NA <NA>
## 3 6:45:00 AM 6:37:00 AM -8 16 6:53:00 AM
## 4 7:10:00 AM 7:08:00 AM -2 36 7:44:00 AM
## 5 7:15:00 AM 7:05:00 AM -10 11 7:16:00 AM
## 6 7:20:00 AM 7:19:00 AM -1 9 7:28:00 AM
## SCHEDULED_TIME ELAPSED_TIME AIR_TIME DISTANCE WHEELS_ON TAXI_IN
## 1 153 150 123 948 8:14:00 AM 12
## 2 165 NA NA 812 <NA> NA
## 3 72 66 45 230 8:38:00 AM 5
## 4 102 101 62 373 8:46:00 AM 3
## 5 95 91 75 388 8:31:00 AM 5
## 6 141 116 105 773 9:13:00 AM 2
## SCHEDULED_ARRIVAL ARRIVAL_TIME ARRIVAL_DELAY DIVERTED CANCELLED
## 1 8:33:00 AM 8:26:00 AM -7 0 0
## 2 7:45:00 AM <NA> NA 0 1
## 3 8:57:00 AM 8:43:00 AM -14 0 0
## 4 8:52:00 AM 8:49:00 AM -3 0 0
## 5 8:50:00 AM 8:36:00 AM -14 0 0
## 6 9:41:00 AM 9:15:00 AM -26 0 0
## CANCELLATION_REASON AIR_SYSTEM_DELAY SECURITY_DELAY AIRLINE_DELAY
## 1 <NA> NA NA NA
## 2 B NA NA NA
## 3 <NA> NA NA NA
## 4 <NA> NA NA NA
## 5 <NA> NA NA NA
## 6 <NA> NA NA NA
## LATE_AIRCRAFT_DELAY WEATHER_DELAY AIRLINE.y
## 1 NA NA American Eagle Airlines Inc.
## 2 NA NA American Eagle Airlines Inc.
## 3 NA NA Skywest Airlines Inc.
## 4 NA NA Delta Air Lines Inc.
## 5 NA NA Frontier Airlines Inc.
## 6 NA NA Delta Air Lines Inc.
## AIRPORT.x CITY.x STATE.x
## 1 Cincinnati/Northern Kentucky International Airport Covington KY
## 2 Cincinnati/Northern Kentucky International Airport Covington KY
## 3 Nashville International Airport Nashville TN
## 4 Cincinnati/Northern Kentucky International Airport Covington KY
## 5 Washington Dulles International Airport Chantilly VA
## 6 Cincinnati/Northern Kentucky International Airport Covington KY
## COUNTRY.x LATITUDE.x LONGITUDE.x
## 1 USA 39.04614 -84.66217
## 2 USA 39.04614 -84.66217
## 3 USA 36.12448 -86.67818
## 4 USA 39.04614 -84.66217
## 5 USA 38.94453 -77.45581
## 6 USA 39.04614 -84.66217
## AIRPORT.y CITY.y STATE.y
## 1 Miami International Airport Miami FL
## 2 Dallas/Fort Worth International Airport Dallas-Fort Worth TX
## 3 Cincinnati/Northern Kentucky International Airport Covington KY
## 4 Hartsfield-Jackson Atlanta International Airport Atlanta GA
## 5 Cincinnati/Northern Kentucky International Airport Covington KY
## 6 Tampa International Airport Tampa FL
## COUNTRY.y LATITUDE.y LONGITUDE.y TimeDifference
## 1 USA 25.79325 -80.29056 3
## 2 USA 32.89595 -97.03720 NA
## 3 USA 39.04614 -84.66217 6
## 4 USA 33.64044 -84.42694 1
## 5 USA 39.04614 -84.66217 4
## 6 USA 27.97547 -82.53325 25
tail(flights_merged)
## FLIGHT_DATE AIRLINE FLIGHT_NUMBER ORIGIN_AIRPORT DESTINATION_AIRPORT
## 7758 3/10/2015 DL 803 CVG MCO
## 7759 3/10/2015 DL 2011 ATL CVG
## 7760 3/10/2015 EV 4638 EWR CVG
## 7761 3/10/2015 DL 1705 CVG LGA
## 7762 3/10/2015 F9 734 DEN CVG
## 7763 3/10/2015 OO 5643 DEN CVG
## SCHEDULED_DEPARTURE DEPARTURE_TIME DEPARTURE_DELAY TAXI_OUT WHEELS_OFF
## 7758 9:40:00 AM 9:36:00 AM -4 12 9:48:00 AM
## 7759 9:40:00 AM 9:48:00 AM 8 10 9:58:00 AM
## 7760 9:41:00 AM 9:31:00 AM -10 15 9:46:00 AM
## 7761 9:45:00 AM 9:45:00 AM 0 13 9:58:00 AM
## 7762 10:00:00 AM 10:09:00 AM 9 16 10:25:00 AM
## 7763 10:00:00 AM 10:02:00 AM 2 16 10:18:00 AM
## SCHEDULED_TIME ELAPSED_TIME AIR_TIME DISTANCE WHEELS_ON TAXI_IN
## 7758 129 127 110 757 1:38:00 AM 5
## 7759 85 74 59 373 1:57:00 AM 5
## 7760 136 125 103 569 1:29:00 AM 7
## 7761 121 100 73 585 1:11:00 AM 14
## 7762 152 156 132 1069 1:37:00 AM 8
## 7763 159 159 134 1069 1:32:00 AM 9
## SCHEDULED_ARRIVAL ARRIVAL_TIME ARRIVAL_DELAY DIVERTED CANCELLED
## 7758 11:49:00 AM 11:43:00 AM -6 0 0
## 7759 11:05:00 AM 11:02:00 AM -3 0 0
## 7760 11:57:00 AM 11:36:00 AM -21 0 0
## 7761 11:46:00 AM 11:25:00 AM -21 0 0
## 7762 2:32:00 PM 2:45:00 PM 13 0 0
## 7763 2:39:00 PM 2:41:00 PM 2 0 0
## CANCELLATION_REASON AIR_SYSTEM_DELAY SECURITY_DELAY AIRLINE_DELAY
## 7758 <NA> NA NA NA
## 7759 <NA> NA NA NA
## 7760 <NA> NA NA NA
## 7761 <NA> NA NA NA
## 7762 <NA> NA NA NA
## 7763 <NA> NA NA NA
## LATE_AIRCRAFT_DELAY WEATHER_DELAY AIRLINE.y
## 7758 NA NA Delta Air Lines Inc.
## 7759 NA NA Delta Air Lines Inc.
## 7760 NA NA Atlantic Southeast Airlines
## 7761 NA NA Delta Air Lines Inc.
## 7762 NA NA Frontier Airlines Inc.
## 7763 NA NA Skywest Airlines Inc.
## AIRPORT.x CITY.x STATE.x
## 7758 Cincinnati/Northern Kentucky International Airport Covington KY
## 7759 Hartsfield-Jackson Atlanta International Airport Atlanta GA
## 7760 Newark Liberty International Airport Newark NJ
## 7761 Cincinnati/Northern Kentucky International Airport Covington KY
## 7762 Denver International Airport Denver CO
## 7763 Denver International Airport Denver CO
## COUNTRY.x LATITUDE.x LONGITUDE.x
## 7758 USA 39.04614 -84.66217
## 7759 USA 33.64044 -84.42694
## 7760 USA 40.69250 -74.16866
## 7761 USA 39.04614 -84.66217
## 7762 USA 39.85841 -104.66700
## 7763 USA 39.85841 -104.66700
## AIRPORT.y CITY.y STATE.y
## 7758 Orlando International Airport Orlando FL
## 7759 Cincinnati/Northern Kentucky International Airport Covington KY
## 7760 Cincinnati/Northern Kentucky International Airport Covington KY
## 7761 LaGuardia Airport (Marine Air Terminal) New York NY
## 7762 Cincinnati/Northern Kentucky International Airport Covington KY
## 7763 Cincinnati/Northern Kentucky International Airport Covington KY
## COUNTRY.y LATITUDE.y LONGITUDE.y TimeDifference
## 7758 USA 28.42889 -81.31603 2
## 7759 USA 39.04614 -84.66217 11
## 7760 USA 39.04614 -84.66217 11
## 7761 USA 40.77724 -73.87261 21
## 7762 USA 39.04614 -84.66217 -4
## 7763 USA 39.04614 -84.66217 0
Printing the first and last six rows of the data frame shows that this new category seems to have been successfully created within the flights_merged data frame.
I will define a new data frame, called Delta_to_Cincy, as a filter of the flights_merged data frame where the values of the AIRLINE.y and ORIGIN_AIRPORT variables are equal to the stated requirements.
Delta_to_Cincy <-filter(flights_merged, AIRLINE.y == "Delta Air Lines Inc.", ORIGIN_AIRPORT == "CVG")
head(Delta_to_Cincy)
## FLIGHT_DATE AIRLINE FLIGHT_NUMBER ORIGIN_AIRPORT DESTINATION_AIRPORT
## 1 1/1/2015 DL 2315 CVG ATL
## 2 1/1/2015 DL 1502 CVG TPA
## 3 1/1/2015 DL 2359 CVG RSW
## 4 1/1/2015 DL 1741 CVG SLC
## 5 1/1/2015 DL 1221 CVG SEA
## 6 1/1/2015 DL 2367 CVG FLL
## SCHEDULED_DEPARTURE DEPARTURE_TIME DEPARTURE_DELAY TAXI_OUT WHEELS_OFF
## 1 7:10:00 AM 7:08:00 AM -2 36 7:44:00 AM
## 2 7:20:00 AM 7:19:00 AM -1 9 7:28:00 AM
## 3 7:31:00 AM 7:22:00 AM -9 27 7:49:00 AM
## 4 8:25:00 AM 8:20:00 AM -5 20 8:40:00 AM
## 5 8:40:00 AM 8:39:00 AM -1 10 8:49:00 AM
## 6 9:30:00 AM 9:27:00 AM -3 19 9:46:00 AM
## SCHEDULED_TIME ELAPSED_TIME AIR_TIME DISTANCE WHEELS_ON TAXI_IN
## 1 102 101 62 373 8:46:00 AM 3
## 2 141 116 105 773 9:13:00 AM 2
## 3 153 150 117 879 9:46:00 AM 6
## 4 244 238 214 1450 1:14:00 AM 4
## 5 306 289 270 1965 1:19:00 AM 9
## 6 165 144 122 932 1:48:00 AM 3
## SCHEDULED_ARRIVAL ARRIVAL_TIME ARRIVAL_DELAY DIVERTED CANCELLED
## 1 8:52:00 AM 8:49:00 AM -3 0 0
## 2 9:41:00 AM 9:15:00 AM -26 0 0
## 3 10:04:00 AM 9:52:00 AM -12 0 0
## 4 10:29:00 AM 10:18:00 AM -11 0 0
## 5 10:46:00 AM 10:28:00 AM -18 0 0
## 6 12:15:00 PM 11:51:00 AM -24 0 0
## CANCELLATION_REASON AIR_SYSTEM_DELAY SECURITY_DELAY AIRLINE_DELAY
## 1 <NA> NA NA NA
## 2 <NA> NA NA NA
## 3 <NA> NA NA NA
## 4 <NA> NA NA NA
## 5 <NA> NA NA NA
## 6 <NA> NA NA NA
## LATE_AIRCRAFT_DELAY WEATHER_DELAY AIRLINE.y
## 1 NA NA Delta Air Lines Inc.
## 2 NA NA Delta Air Lines Inc.
## 3 NA NA Delta Air Lines Inc.
## 4 NA NA Delta Air Lines Inc.
## 5 NA NA Delta Air Lines Inc.
## 6 NA NA Delta Air Lines Inc.
## AIRPORT.x CITY.x STATE.x
## 1 Cincinnati/Northern Kentucky International Airport Covington KY
## 2 Cincinnati/Northern Kentucky International Airport Covington KY
## 3 Cincinnati/Northern Kentucky International Airport Covington KY
## 4 Cincinnati/Northern Kentucky International Airport Covington KY
## 5 Cincinnati/Northern Kentucky International Airport Covington KY
## 6 Cincinnati/Northern Kentucky International Airport Covington KY
## COUNTRY.x LATITUDE.x LONGITUDE.x
## 1 USA 39.04614 -84.66217
## 2 USA 39.04614 -84.66217
## 3 USA 39.04614 -84.66217
## 4 USA 39.04614 -84.66217
## 5 USA 39.04614 -84.66217
## 6 USA 39.04614 -84.66217
## AIRPORT.y CITY.y STATE.y
## 1 Hartsfield-Jackson Atlanta International Airport Atlanta GA
## 2 Tampa International Airport Tampa FL
## 3 Southwest Florida International Airport Ft. Myers FL
## 4 Salt Lake City International Airport Salt Lake City UT
## 5 Seattle-Tacoma International Airport Seattle WA
## 6 Fort Lauderdale-Hollywood International Airport Ft. Lauderdale FL
## COUNTRY.y LATITUDE.y LONGITUDE.y TimeDifference
## 1 USA 33.64044 -84.42694 1
## 2 USA 27.97547 -82.53325 25
## 3 USA 26.53617 -81.75517 3
## 4 USA 40.78839 -111.97777 6
## 5 USA 47.44898 -122.30931 17
## 6 USA 26.07258 -80.15275 21
tail(Delta_to_Cincy)
## FLIGHT_DATE AIRLINE FLIGHT_NUMBER ORIGIN_AIRPORT DESTINATION_AIRPORT
## 1144 3/10/2015 DL 2498 CVG ATL
## 1145 3/10/2015 DL 2527 CVG FLL
## 1146 3/10/2015 DL 1502 CVG TPA
## 1147 3/10/2015 DL 1933 CVG LAX
## 1148 3/10/2015 DL 803 CVG MCO
## 1149 3/10/2015 DL 1705 CVG LGA
## SCHEDULED_DEPARTURE DEPARTURE_TIME DEPARTURE_DELAY TAXI_OUT WHEELS_OFF
## 1144 9:30:00 AM 9:28:00 AM -2 9 9:37:00 AM
## 1145 9:30:00 AM 9:33:00 AM 3 10 9:43:00 AM
## 1146 9:30:00 AM 9:28:00 AM -2 13 9:41:00 AM
## 1147 9:30:00 AM 9:27:00 AM -3 12 9:39:00 AM
## 1148 9:40:00 AM 9:36:00 AM -4 12 9:48:00 AM
## 1149 9:45:00 AM 9:45:00 AM 0 13 9:58:00 AM
## SCHEDULED_TIME ELAPSED_TIME AIR_TIME DISTANCE WHEELS_ON TAXI_IN
## 1144 90 80 64 373 1:41:00 AM 7
## 1145 151 145 131 932 1:54:00 AM 4
## 1146 125 127 111 773 1:32:00 AM 3
## 1147 305 278 256 1900 1:55:00 AM 10
## 1148 129 127 110 757 1:38:00 AM 5
## 1149 121 100 73 585 1:11:00 AM 14
## SCHEDULED_ARRIVAL ARRIVAL_TIME ARRIVAL_DELAY DIVERTED CANCELLED
## 1144 11:00:00 AM 10:48:00 AM -12 0 0
## 1145 12:01:00 PM 11:58:00 AM -3 0 0
## 1146 11:35:00 AM 11:35:00 AM 0 0 0
## 1147 11:35:00 AM 11:05:00 AM -30 0 0
## 1148 11:49:00 AM 11:43:00 AM -6 0 0
## 1149 11:46:00 AM 11:25:00 AM -21 0 0
## CANCELLATION_REASON AIR_SYSTEM_DELAY SECURITY_DELAY AIRLINE_DELAY
## 1144 <NA> NA NA NA
## 1145 <NA> NA NA NA
## 1146 <NA> NA NA NA
## 1147 <NA> NA NA NA
## 1148 <NA> NA NA NA
## 1149 <NA> NA NA NA
## LATE_AIRCRAFT_DELAY WEATHER_DELAY AIRLINE.y
## 1144 NA NA Delta Air Lines Inc.
## 1145 NA NA Delta Air Lines Inc.
## 1146 NA NA Delta Air Lines Inc.
## 1147 NA NA Delta Air Lines Inc.
## 1148 NA NA Delta Air Lines Inc.
## 1149 NA NA Delta Air Lines Inc.
## AIRPORT.x CITY.x STATE.x
## 1144 Cincinnati/Northern Kentucky International Airport Covington KY
## 1145 Cincinnati/Northern Kentucky International Airport Covington KY
## 1146 Cincinnati/Northern Kentucky International Airport Covington KY
## 1147 Cincinnati/Northern Kentucky International Airport Covington KY
## 1148 Cincinnati/Northern Kentucky International Airport Covington KY
## 1149 Cincinnati/Northern Kentucky International Airport Covington KY
## COUNTRY.x LATITUDE.x LONGITUDE.x
## 1144 USA 39.04614 -84.66217
## 1145 USA 39.04614 -84.66217
## 1146 USA 39.04614 -84.66217
## 1147 USA 39.04614 -84.66217
## 1148 USA 39.04614 -84.66217
## 1149 USA 39.04614 -84.66217
## AIRPORT.y CITY.y STATE.y
## 1144 Hartsfield-Jackson Atlanta International Airport Atlanta GA
## 1145 Fort Lauderdale-Hollywood International Airport Ft. Lauderdale FL
## 1146 Tampa International Airport Tampa FL
## 1147 Los Angeles International Airport Los Angeles CA
## 1148 Orlando International Airport Orlando FL
## 1149 LaGuardia Airport (Marine Air Terminal) New York NY
## COUNTRY.y LATITUDE.y LONGITUDE.y TimeDifference
## 1144 USA 33.64044 -84.42694 10
## 1145 USA 26.07258 -80.15275 6
## 1146 USA 27.97547 -82.53325 -2
## 1147 USA 33.94254 -118.40807 27
## 1148 USA 28.42889 -81.31603 2
## 1149 USA 40.77724 -73.87261 21
Using the head() and tail() functions to print the first and last siz rows shows that the new data frame is a subset of the original with only the records where Airline is Delta and Origin Airport is CVG.
I will define a new data frame, called Delta_to_Cincy_delay30, as a filter of the flights_merged data frame where the values of the AIRLINE.y, ORIGIN_AIRPORT, and DEPARTURE_DELAY variables are equal to the stated requirements.
Delta_to_Cincy_delay30 <-filter(flights_merged, AIRLINE.y == "Delta Air Lines Inc.", ORIGIN_AIRPORT == "CVG", DEPARTURE_DELAY > 30)
head(Delta_to_Cincy_delay30)
## FLIGHT_DATE AIRLINE FLIGHT_NUMBER ORIGIN_AIRPORT DESTINATION_AIRPORT
## 1 1/3/2015 DL 1741 CVG SLC
## 2 1/3/2015 DL 1833 CVG LAX
## 3 1/4/2015 DL 1854 CVG RSW
## 4 1/9/2015 DL 1503 CVG MCO
## 5 1/9/2015 DL 63 CVG BOS
## 6 1/10/2015 DL 1934 CVG ATL
## SCHEDULED_DEPARTURE DEPARTURE_TIME DEPARTURE_DELAY TAXI_OUT WHEELS_OFF
## 1 8:25:00 AM 11:41:00 AM 196 11 11:52:00 AM
## 2 4:46:00 PM 5:43:00 PM 57 7 5:50:00 PM
## 3 9:35:00 AM 10:20:00 AM 45 15 10:35:00 AM
## 4 1:55:00 PM 2:39:00 PM 44 15 2:54:00 PM
## 5 4:15:00 PM 5:32:00 PM 77 9 5:41:00 PM
## 6 2:19:00 PM 3:58:00 PM 99 8 4:06:00 PM
## SCHEDULED_TIME ELAPSED_TIME AIR_TIME DISTANCE WHEELS_ON TAXI_IN
## 1 245 222 208 1450 1:20:00 AM 3
## 2 294 269 250 1900 1:00:00 AM 12
## 3 153 156 139 879 1:54:00 AM 2
## 4 134 131 109 757 1:43:00 AM 7
## 5 121 104 90 752 1:11:00 AM 5
## 6 99 74 60 373 1:06:00 AM 6
## SCHEDULED_ARRIVAL ARRIVAL_TIME ARRIVAL_DELAY DIVERTED CANCELLED
## 1 10:30:00 AM 1:23:00 PM 173 0 0
## 2 6:40:00 PM 7:12:00 PM 32 0 0
## 3 12:08:00 PM 12:56:00 PM 48 0 0
## 4 4:09:00 PM 4:50:00 PM 41 0 0
## 5 6:16:00 PM 7:16:00 PM 60 0 0
## 6 3:58:00 PM 5:12:00 PM 74 0 0
## CANCELLATION_REASON AIR_SYSTEM_DELAY SECURITY_DELAY AIRLINE_DELAY
## 1 <NA> 0 0 173
## 2 <NA> 0 0 9
## 3 <NA> 48 0 0
## 4 <NA> 0 0 41
## 5 <NA> 0 0 0
## 6 <NA> 0 0 74
## LATE_AIRCRAFT_DELAY WEATHER_DELAY AIRLINE.y
## 1 0 0 Delta Air Lines Inc.
## 2 23 0 Delta Air Lines Inc.
## 3 0 0 Delta Air Lines Inc.
## 4 0 0 Delta Air Lines Inc.
## 5 60 0 Delta Air Lines Inc.
## 6 0 0 Delta Air Lines Inc.
## AIRPORT.x CITY.x STATE.x
## 1 Cincinnati/Northern Kentucky International Airport Covington KY
## 2 Cincinnati/Northern Kentucky International Airport Covington KY
## 3 Cincinnati/Northern Kentucky International Airport Covington KY
## 4 Cincinnati/Northern Kentucky International Airport Covington KY
## 5 Cincinnati/Northern Kentucky International Airport Covington KY
## 6 Cincinnati/Northern Kentucky International Airport Covington KY
## COUNTRY.x LATITUDE.x LONGITUDE.x
## 1 USA 39.04614 -84.66217
## 2 USA 39.04614 -84.66217
## 3 USA 39.04614 -84.66217
## 4 USA 39.04614 -84.66217
## 5 USA 39.04614 -84.66217
## 6 USA 39.04614 -84.66217
## AIRPORT.y CITY.y STATE.y
## 1 Salt Lake City International Airport Salt Lake City UT
## 2 Los Angeles International Airport Los Angeles CA
## 3 Southwest Florida International Airport Ft. Myers FL
## 4 Orlando International Airport Orlando FL
## 5 Gen. Edward Lawrence Logan International Airport Boston MA
## 6 Hartsfield-Jackson Atlanta International Airport Atlanta GA
## COUNTRY.y LATITUDE.y LONGITUDE.y TimeDifference
## 1 USA 40.78839 -111.97777 23
## 2 USA 33.94254 -118.40807 25
## 3 USA 26.53617 -81.75517 -3
## 4 USA 28.42889 -81.31603 3
## 5 USA 42.36435 -71.00518 17
## 6 USA 33.64044 -84.42694 25
tail(Delta_to_Cincy_delay30)
## FLIGHT_DATE AIRLINE FLIGHT_NUMBER ORIGIN_AIRPORT DESTINATION_AIRPORT
## 65 3/5/2015 DL 1259 CVG ATL
## 66 3/6/2015 DL 2494 CVG ATL
## 67 3/6/2015 DL 1281 CVG ATL
## 68 3/6/2015 DL 1896 CVG MCO
## 69 3/6/2015 DL 2270 CVG SLC
## 70 3/6/2015 DL 2631 CVG PHX
## SCHEDULED_DEPARTURE DEPARTURE_TIME DEPARTURE_DELAY TAXI_OUT WHEELS_OFF
## 65 6:21:00 PM 7:15:00 PM 54 19 7:34:00 PM
## 66 9:30:00 AM 10:40:00 AM 70 11 10:51:00 AM
## 67 11:45:00 AM 1:02:00 PM 77 13 1:15:00 PM
## 68 4:15:00 PM 5:20:00 PM 65 14 5:34:00 PM
## 69 7:18:00 PM 7:49:00 PM 31 7 7:56:00 PM
## 70 8:15:00 PM 8:46:00 PM 31 13 8:59:00 PM
## SCHEDULED_TIME ELAPSED_TIME AIR_TIME DISTANCE WHEELS_ON TAXI_IN
## 65 99 93 66 373 2:40:00 AM 8
## 66 90 87 70 373 1:01:00 AM 6
## 67 94 87 67 373 1:22:00 AM 7
## 68 126 133 111 757 1:25:00 AM 8
## 69 232 208 196 1450 2:12:00 AM 5
## 70 243 232 217 1569 2:36:00 AM 2
## SCHEDULED_ARRIVAL ARRIVAL_TIME ARRIVAL_DELAY DIVERTED CANCELLED
## 65 8:00:00 PM 8:48:00 PM 48 0 0
## 66 11:00:00 AM 12:07:00 PM 67 0 0
## 67 1:19:00 PM 2:29:00 PM 70 0 0
## 68 6:21:00 PM 7:33:00 PM 72 0 0
## 69 9:10:00 PM 9:17:00 PM 7 0 0
## 70 10:18:00 PM 10:38:00 PM 20 0 0
## CANCELLATION_REASON AIR_SYSTEM_DELAY SECURITY_DELAY AIRLINE_DELAY
## 65 <NA> 0 0 4
## 66 <NA> 0 0 3
## 67 <NA> 0 0 5
## 68 <NA> 7 0 0
## 69 <NA> NA NA NA
## 70 <NA> 0 0 9
## LATE_AIRCRAFT_DELAY WEATHER_DELAY AIRLINE.y
## 65 44 0 Delta Air Lines Inc.
## 66 64 0 Delta Air Lines Inc.
## 67 65 0 Delta Air Lines Inc.
## 68 65 0 Delta Air Lines Inc.
## 69 NA NA Delta Air Lines Inc.
## 70 11 0 Delta Air Lines Inc.
## AIRPORT.x CITY.x STATE.x
## 65 Cincinnati/Northern Kentucky International Airport Covington KY
## 66 Cincinnati/Northern Kentucky International Airport Covington KY
## 67 Cincinnati/Northern Kentucky International Airport Covington KY
## 68 Cincinnati/Northern Kentucky International Airport Covington KY
## 69 Cincinnati/Northern Kentucky International Airport Covington KY
## 70 Cincinnati/Northern Kentucky International Airport Covington KY
## COUNTRY.x LATITUDE.x LONGITUDE.x
## 65 USA 39.04614 -84.66217
## 66 USA 39.04614 -84.66217
## 67 USA 39.04614 -84.66217
## 68 USA 39.04614 -84.66217
## 69 USA 39.04614 -84.66217
## 70 USA 39.04614 -84.66217
## AIRPORT.y CITY.y STATE.y
## 65 Hartsfield-Jackson Atlanta International Airport Atlanta GA
## 66 Hartsfield-Jackson Atlanta International Airport Atlanta GA
## 67 Hartsfield-Jackson Atlanta International Airport Atlanta GA
## 68 Orlando International Airport Orlando FL
## 69 Salt Lake City International Airport Salt Lake City UT
## 70 Phoenix Sky Harbor International Airport Phoenix AZ
## COUNTRY.y LATITUDE.y LONGITUDE.y TimeDifference
## 65 USA 33.64044 -84.42694 6
## 66 USA 33.64044 -84.42694 3
## 67 USA 33.64044 -84.42694 7
## 68 USA 28.42889 -81.31603 -7
## 69 USA 40.78839 -111.97777 24
## 70 USA 33.43417 -112.00806 11
Using the head() and tail() functions to print the first and last siz rows shows that the new data frame is a subset of the original with only the records where Airline is Delta and Origin Airport is CVG.
I will use the group_by() function and the summarize() function within it to take the mean of the DEPARTURE_DELAY category grouped by the AIRLINE.y levels. I will use the na.rm function to only take the mean of the records that have a value, and I will name this resulting tibble as “Delay_Airline”. Additionally, I will arrange the data frame in order from highest to lowest delay times.
Delay_Airline <- group_by(flights_merged, AIRLINE.y) %>%
summarize(Delay = mean(DEPARTURE_DELAY, na.rm = TRUE)) %>%
as.data.frame %>%
arrange(desc(Delay))
## `summarise()` ungrouping output (override with `.groups` argument)
Delay_Airline
## AIRLINE.y Delay
## 1 Frontier Airlines Inc. 23.88212
## 2 American Eagle Airlines Inc. 19.05938
## 3 Skywest Airlines Inc. 11.06769
## 4 Atlantic Southeast Airlines 10.36280
## 5 Delta Air Lines Inc. 6.08734
Printing out this data frame out shows the average delay time at each of the airlines. From this we can see that the average delay time is highest at Frontier Airlines Inc., followed by American Eagle Airlines Inc., Skywest Airlines Inc., and Atlantic Southeast Airlines, with Delta Air Lines Inc. having the lowest average delay time.
I will use the group_by() function and the summarize() function within it to take the mean of the DEPARTURE_DELAY category grouped by the ORIGIN_AIRPORT levels. I will use the na.rm function to only take the mean of the records that have a value, and I will name this resulting tibble as “Delay_Origin”. Additionally, I will arrange the data frame in order from highest to lowest delay times.
Delay_Origin <- group_by(flights_merged, ORIGIN_AIRPORT) %>%
summarize(Delay = mean(DEPARTURE_DELAY, na.rm = TRUE)) %>%
as.data.frame %>%
arrange(desc(Delay))
## `summarise()` ungrouping output (override with `.groups` argument)
head(Delay_Origin)
## ORIGIN_AIRPORT Delay
## 1 TTN 74.33333
## 2 RDU 70.75000
## 3 PIT 33.50000
## 4 JFK 27.37097
## 5 LAS 26.03846
## 6 LGA 25.55319
tail(Delay_Origin)
## ORIGIN_AIRPORT Delay
## 33 TPA -0.4576271
## 34 RSW -1.7083333
## 35 SEA -2.5000000
## 36 SAN -2.7500000
## 37 BDL -4.0000000
## 38 STL -4.4137931
Printing out the first and last 6 records of this tibble using the head() and tail() functions shows the origin airports with the highest and lowest average delay times. We can see that TTN, RDU, and PIT have the highest average delay times, and TPA, RSW, and SEA have the lowest average delay times.