Load data into a data frame from SQL Server database that includes data about arriving flights in the United States. The source code to populate the tables can be found here: Flights DB
Once we load the data from the database into R, tidying/cleaning of the data needs to be performed.
Using data visualizations, use GG Plot to compare the ‘On-Time’ and ‘Delayed’ flights.
First we install the required libraries/packages
package 'odbc' successfully unpacked and MD5 sums checked
package 'RODBC' successfully unpacked and MD5 sums checked
package 'sqldf' successfully unpacked and MD5 sums checked
package 'ggplot2' successfully unpacked and MD5 sums checked
package 'plotly' successfully unpacked and MD5 sums checked
package 'dplyr' successfully unpacked and MD5 sums checked
package 'hrbrthemes' successfully unpacked and MD5 sums checked
Installing viridis [0.6.2] ...
OK [linked cache]
Now we establish a connection to the database.
con <-odbcDriverConnect(connection ="Driver= {SQL Server Native Client 11.0};server=localhost;database=DBex;trusted_connection=yes")
Next, we query the data that we want:
queryresult <-sqlQuery(con, "select Airlines.Airline_Name, Arriving_Flights.City, Arriving_Flights.Flight_Status, Arriving_Flights.Flight_Time_minuts, Arriving_Flights.ID FROM Arriving_Flights INNER JOIN Airlines ON Arriving_Flights.ID = Airlines.ID;")
Now that we have the data loaded, we can commence the cleaning process.
Flight time minutes are mislabeled and should be renamed the number of flights.
We also need to convert character variables into factor in order to conduct analysis
We also do not want the ID column.
queryresult$ID <-NULLnames(queryresult)[4] ="Number of Flights"
Since the data is now considered tidy, we can now compare the ‘On-Time’ and delayed groups between the two airlines.
Alaska <-subset(queryresult, Airline_Name =='Alaska')AM.West <-subset(queryresult, Airline_Name =='AM West')Alaska %>%ggplot( aes(x=City, y=`Number of Flights`, group=Flight_Status, color=Flight_Status)) +geom_line() +coord_flip() +scale_color_viridis(discrete =TRUE) +ggtitle("Alaska Airlines: On-Time vs Delayed") +theme_ipsum() +ylab("Number of Flights")AM.West |>ggplot( aes(x=City, y=`Number of Flights`, group=Flight_Status, color=Flight_Status)) +geom_line() +coord_flip() +scale_color_viridis(discrete =TRUE) +ggtitle("AM West Airlines: On-Time vs Delayed") +theme_ipsum() +ylab("Number of Flights")
Number of Flights for Alaska airlines: On-Time vs Delayed
Number of Flights for AM West airlines: On-Time vs Delayed
Flight Status comparison for AM West and Alaska Airlines.
Results
From the graphs we can see that a few takeaways are immediately apparent:
They were a significantly greater amount of flights that were on time rather than delayed– when looking at either airline (not surprising).
For AM West: the number of on-time flights and delayed flights are comparable, except when looking at Phoenix where there is a large number of on-time flights. This could be due to the airline getting a lot of traffic in this area. A similar observation can be seen with Alaska airlines and traveling to Seattle.
Due to a large number of flights going to Seattle (for Alaska) and Phoenix (for AM West), the number of delayed flights that travel to these cities is also fairly large.