We have been asked to import data that has been provided in the image below. I have created a CSV file with the data and posted it to github.
I will begin by importing the CSV from GitHub.
file <- ("https://raw.githubusercontent.com/bharbans/DATA607_HW/main/Week%205%20Assignment/week5_assignment.csv")
airlineData <- read_csv(file)
Please find the data displayed below.
reactable(airlineData)
In the tidy data format, each observation should be in its own row and each variable in its own column. I will now convert the data from wide format to long by introducing a new variable, City. I will also use the spread function to add the status as two separate columns.
airlineDataTidy <- airlineData %>%
rename(Airline = AIRLINE)
airlineDataTidy <- airlineDataTidy %>%
gather( key = "City",value = "NumberOfFlights", 3:7, factor_key = TRUE)
airlineDataTidy <- airlineDataTidy %>%
spread(key = Status,value = NumberOfFlights)
airlineDataTidy <- airlineDataTidy %>%
mutate(totalNumberofFlights = delayed + `on time` , PropOnTime = `on time`/totalNumberofFlights)
reactable(airlineDataTidy)
Below I will compare the per-city on-time performance for both airlines.
airlineDataTidy %>%
select(City, Airline, PropOnTime) %>%
arrange(desc(PropOnTime), City ) %>%
reactable()
Displayed graphically:
plot1 <-airlineDataTidy %>%
ggplot(aes(x=City,y=`on time`,color=Airline)) + geom_point() + ggtitle("City vs Number of On-Time Flights (Fig1)")
plot2 <- airlineDataTidy %>%
ggplot(aes(x=City,y=PropOnTime,color=Airline)) + geom_point() + ggtitle("City vs Proportion of Flights On Time (Fig 2)")
grid.newpage()
grid.arrange(plot1,plot2, nrow= 2)
Below I will compare the overall on-time performance for both airlines. The data shows that Alaska Airlines on average has less on time flights than AM West.
airlineDataTidy %>%
group_by(Airline) %>%
summarise( "Number On Time"=sum(`on time`),ProportionOnTime = sum(`on time`)/sum(`on time`,delayed) ) %>%
arrange(desc(ProportionOnTime)) %>%
reactable()
It appears that even though Alaska Airlines has a higher proportion of flights on time for every city, AM West has the higher proportion of on-time flights. Why is this the case? AM West has the majority of its flights departing from Phoenix which has the highest number of on-time flights. This skews the overall number of flights in favor of AM West when looking at the data as a whole.
airlineDataTidy %>%
group_by(City, Airline) %>%
summarise( propOnTimeByCity = `on time`/totalNumberofFlights, totalNumberofFlights, .groups = "keep") %>%
arrange(desc(propOnTimeByCity) ) %>%
reactable()