Rpubs Link: http://rpubs.com/umais/data607_assignment5
Github Link: https://github.com/umais/DATA-607/tree/master/Week-5Assignment
In this assignment we will be using the TidyR and Dplyer packages to perform some analysis on the arrival delays for two Airlines.
The data set that we will be using will be from MySQL data source. First step would be to use the RMySQL package to connect to the MySQL database and retrieve the data.
mydb = dbConnect(MySQL(), user='root', password='Welcome@1', dbname='flights', host='localhost')
rs = dbSendQuery(mydb, "SELECT * FROM AirlineArrival;")
df=fetch(rs, n=-1)
head(df)
## Airline ArrivalStatus LosAngeles Phoenix San_Diego San_Francisco Seattle
## 1 Alaska On Time 497 221 212 503 1841
## 2 Alaska Delayed 62 12 20 102 305
## 3 AM West On Time 694 4840 383 320 201
## 4 AM West Delayed 117 415 65 129 61
As we can see from the results that the data set is in a wide format. What we would like to do in our first step is transform the data in to a long format so that we have the following columns Airline,ArrivalStatus,Cities, NumberOfFlights. We can achive this using the tidyR function called gather.
#Gather Function from tidyR
airlines2<- gather(df,Cities,NumberOfFlights,LosAngeles:Seattle)
head(airlines2)
## Airline ArrivalStatus Cities NumberOfFlights
## 1 Alaska On Time LosAngeles 497
## 2 Alaska Delayed LosAngeles 62
## 3 AM West On Time LosAngeles 694
## 4 AM West Delayed LosAngeles 117
## 5 Alaska On Time Phoenix 221
## 6 Alaska Delayed Phoenix 12
#Using the Filter Function from DPlyer
delayedFlights=filter(airlines2,ArrivalStatus=="Delayed")
delayedFlights
## Airline ArrivalStatus Cities NumberOfFlights
## 1 Alaska Delayed LosAngeles 62
## 2 AM West Delayed LosAngeles 117
## 3 Alaska Delayed Phoenix 12
## 4 AM West Delayed Phoenix 415
## 5 Alaska Delayed San_Diego 20
## 6 AM West Delayed San_Diego 65
## 7 Alaska Delayed San_Francisco 102
## 8 AM West Delayed San_Francisco 129
## 9 Alaska Delayed Seattle 305
## 10 AM West Delayed Seattle 61
We can see the comparison between how many flights are delayed by each ailrine
delayedFlights %>%
group_by(Airline,ArrivalStatus)%>%
summarise(total=sum(NumberOfFlights))
## Source: local data frame [2 x 3]
## Groups: Airline [?]
##
## Airline ArrivalStatus total
## <chr> <chr> <int>
## 1 Alaska Delayed 501
## 2 AM West Delayed 787