For this assignment I decided to build a normalized database to contain all the values. I performed a basic comparison between two airlines, AM West and Alaska. The data was derived from the assignment pdf.
db_conn <-odbcConnect("Assignment5", rows_at_time =1)
if(db_conn == -1){
quit("no", 1)
}
print(db_conn)
## RODBC Connection 1
## Details:
## case=nochange
## DSN=Assignment5
## Description=Assignment5
## UID=
## Trusted_Connection=Yes
## WSID=DESKTOP-NQVO0T8
## DATABASE=Assignment5
Rather than create a query to pull the data from the individual tables I wrote a database view and queried the view. In this case the view is named AirlineFlights.
datatable(df_tstsql)
I selected the fields required for analysis and I built an plot show the on-time disparity between AM West and Alaska.
select(df_tstsql,Status,NumberOfMinutes,AirlineName)
dfOntime<-filter(df_tstsql,Status=="ontime")
ggplot(dfOntime, aes(x=CityName, y=NumberOfMinutes/1000))+
geom_point(aes(color=AirlineName))
##Summary of Alaska On-time Minutes to Selected Cities
dfOntimeAlaska<-filter(df_tstsql,Status=="ontime" & AirlineName== "Alaska" )
dfNumberOfOntimeAlaskaMinutes<-select(dfOntimeAlaska,NumberOfMinutes)
summary(dfNumberOfOntimeAlaskaMinutes)
## NumberOfMinutes
## Min. : 212.0
## 1st Qu.: 221.0
## Median : 497.0
## Mean : 654.8
## 3rd Qu.: 503.0
## Max. :1841.0
##Summary of AM West On-time Minutes to Selected Cities
dfOntimeAmWest<-filter(df_tstsql,Status=="ontime" & AirlineName== "AM West" )
dfNumberOfOntimeAMWestMinutes<-select(dfOntimeAmWest,NumberOfMinutes)
summary(dfNumberOfOntimeAMWestMinutes)
## NumberOfMinutes
## Min. : 201
## 1st Qu.: 320
## Median : 383
## Mean :1288
## 3rd Qu.: 694
## Max. :4840
##Below is the plot showing the delay disparity between AM West and Alaska Airlines
select(df_tstsql,Status,NumberOfMinutes,AirlineName)
dfDelayed<-filter(df_tstsql,Status=="delayed")
ggplot(dfDelayed, aes(x=CityName, y=NumberOfMinutes/100))+geom_point(aes(color=AirlineName))
##Summary of Alaska Delayed Minutes to Selected Cities
dfDelayedAlaska<-filter(df_tstsql,Status=="delayed" & AirlineName== "Alaska" )
dfNumberOfdfDelayedAlaskaMinutes<-select(dfDelayedAlaska,NumberOfMinutes)
summary(dfNumberOfdfDelayedAlaskaMinutes)
## NumberOfMinutes
## Min. : 12.0
## 1st Qu.: 20.0
## Median : 62.0
## Mean :100.2
## 3rd Qu.:102.0
## Max. :305.0
##Summary of AM West Delayed Minutes to Selected Cities
dfDelayedAmWest<-filter(df_tstsql,Status=="delayed" & AirlineName== "AM West" )
dfNumberOfdfDelayedAmWestMinutes<-select(dfDelayedAmWest,NumberOfMinutes)
summary(dfNumberOfdfDelayedAmWestMinutes)
## NumberOfMinutes
## Min. : 61.0
## 1st Qu.: 65.0
## Median :117.0
## Mean :157.4
## 3rd Qu.:129.0
## Max. :415.0
##Analysis
##Arrivals Based on the summary and the chart, the mean and max of on time arrival is greater with AM West. This is especially true with regards to Phoenix. Phoenix appears to be an outlier. Otherwise, the median arrivals is slightly greater with Alaska.
##Delays Based on the summary and the chart, the mean and max of on time delays is greater with AM West, again this is especially true with regards to Phoenix. Phoenix appears to be an outlier. But in the case of delays Alaska’s median is lower than AM West.