This assignment asks us to take the chart provided below, which describes arrival delays for two airlines across five destinations, and compare the arrival delays of the two airlines.
The chart is provided as an image, so it first needs to be converted into a file that can be read easily, and then tidied and transformed before the analysis can be performed.
My approach is to recreate the chart in Excel and export it as a .csv file, keeping the wide format of the original. I will then upload the file to my GitHub repository so that it can be read from RStudio, and use a combination of tidyr and dplyr to clean and transform the data. Once the data is in a tidy format, I will compare the arrival delays for each airline across the five cities using summary statistics.
I recreated the data from the chart in a .csv file for this assignment, which can be found in my GitHub repository (https://github.com/DRA-SPS27/DATA607-Week-5-Assignments/tree/main). A glimpse of that data is shown below:
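library(dplyr)  # provides glimpse()

# Raw .csv file in the GitHub repository (wide format, as in the original chart)
url <- "https://raw.githubusercontent.com/DRA-SPS27/DATA607-Week-5-Assignments/refs/heads/main/D.Atherley%20-%20Airline%20Delays%20(%235A).csv"
untidy_airline <- read.csv(url)
glimpse(untidy_airline)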
## Rows: 5
## Columns: 7
## $ X <chr> "ALASKA", "", "", "AM WEST", ""
## $ X.1 <chr> "on time", "delayed", "", "on time", "delayed"
## $ Los.Angeles <int> 497, 62, NA, 694, 117
## $ Phoenix <int> 221, 12, NA, 4840, 415
## $ San.Diego <int> 212, 20, NA, 383, 65
## $ San.Francisco <int> 503, 102, NA, 320, 129
## $ Seattle <int> 1841, 305, NA, 201, 61
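The glimpse shows three things that need fixing before any comparison: the first two columns have placeholder names (`X`, `X.1`), the airline name appears only on the "on time" rows, and there is a blank separator row. The sketch below is one possible way to handle this with tidyr and dplyr, not necessarily the final pipeline. So that it runs without a network connection, it rebuilds the data frame from the values printed above instead of reading the URL. The names `airline`, `status`, `city`, `flights`, `on_time`, `delay_rate`, `tidy_airline`, `delay_rates`, and `overall` are my own illustrative choices.

```r
library(dplyr)
library(tidyr)

# Rebuild the wide table from the glimpse() output above (assumption: these
# values match the .csv exactly; in practice read.csv(url) would be used).
untidy_airline <- data.frame(
  X             = c("ALASKA", "", "", "AM WEST", ""),
  X.1           = c("on time", "delayed", "", "on time", "delayed"),
  Los.Angeles   = c(497L, 62L, NA, 694L, 117L),
  Phoenix       = c(221L, 12L, NA, 4840L, 415L),
  San.Diego     = c(212L, 20L, NA, 383L, 65L),
  San.Francisco = c(503L, 102L, NA, 320L, 129L),
  Seattle       = c(1841L, 305L, NA, 201L, 61L)
)

# Long format: one row per airline, status, and city.
tidy_airline <- untidy_airline %>%
  rename(airline = X, status = X.1) %>%
  filter(status != "") %>%                  # drop the blank separator row
  mutate(airline = na_if(airline, "")) %>%  # blank airline names become NA
  fill(airline, .direction = "down") %>%    # carry the airline name downward
  pivot_longer(
    cols      = -c(airline, status),
    names_to  = "city",
    values_to = "flights"
  ) %>%
  mutate(city = gsub(".", " ", city, fixed = TRUE))  # "Los.Angeles" -> "Los Angeles"

# One row per airline and city, with the share of flights that were delayed.
delay_rates <- tidy_airline %>%
  pivot_wider(names_from = status, values_from = flights) %>%
  rename(on_time = `on time`) %>%
  mutate(delay_rate = delayed / (on_time + delayed))

# Overall delay rate per airline, pooled across all five cities.
overall <- delay_rates %>%
  group_by(airline) %>%
  summarise(
    total_delayed = sum(delayed),
    total_flights = sum(on_time + delayed),
    delay_rate    = total_delayed / total_flights
  )
```

Here `delay_rates` supports the city-by-city comparison and `overall` gives the pooled comparison for each airline. Keeping both makes it easy to check whether the per-city picture and the overall picture agree.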