This assignment is about creating a csv file and then transforming the data to provide analysis on two airlines flight perfmance on key Western airports.
My first step was to create the data file in Excel and save it as a csv file.
I then imported the csv file into r.
df <- read.csv("Book1.csv", header = TRUE)
## Warning in read.table(file = file, header = header, sep = sep, quote = quote, :
## incomplete final line found by readTableHeader on 'Book1.csv'
head(df)
## Airline Flight.Status Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 1 Alaska On Time 497 221 212 503 1841
## 2 Alaska Delayed 62 12 20 102 305
## 3 AM West On Time 694 4840 383 320 201
## 4 AM West Delayed 117 415 65 129 61
df2 <- as_tibble(df)
head(df2)
## # A tibble: 4 x 7
## Airline Flight.Status Los.Angeles Phoenix San.Diego San.Francisco Seattle
## <fct> <fct> <int> <int> <int> <int> <int>
## 1 Alaska On Time 497 221 212 503 1841
## 2 Alaska Delayed 62 12 20 102 305
## 3 AM West On Time 694 4840 383 320 201
## 4 AM West Delayed 117 415 65 129 61
Once the data was imported, it needed to be transformed in order to provide analysis. I’m flipping the data from wide to long in order to do calculations easier.
df3 <- gather(df2, "Arrival City", "Flight Count", 3:7)
head(df3, 10)
## # A tibble: 10 x 4
## Airline Flight.Status `Arrival City` `Flight Count`
## <fct> <fct> <chr> <int>
## 1 Alaska On Time Los.Angeles 497
## 2 Alaska Delayed Los.Angeles 62
## 3 AM West On Time Los.Angeles 694
## 4 AM West Delayed Los.Angeles 117
## 5 Alaska On Time Phoenix 221
## 6 Alaska Delayed Phoenix 12
## 7 AM West On Time Phoenix 4840
## 8 AM West Delayed Phoenix 415
## 9 Alaska On Time San.Diego 212
## 10 Alaska Delayed San.Diego 20
df4 <- spread(df3, "Flight.Status", "Flight Count")
head(df4,10)
## # A tibble: 10 x 4
## Airline `Arrival City` Delayed `On Time`
## <fct> <chr> <int> <int>
## 1 Alaska Los.Angeles 62 497
## 2 Alaska Phoenix 12 221
## 3 Alaska San.Diego 20 212
## 4 Alaska San.Francisco 102 503
## 5 Alaska Seattle 305 1841
## 6 AM West Los.Angeles 117 694
## 7 AM West Phoenix 415 4840
## 8 AM West San.Diego 65 383
## 9 AM West San.Francisco 129 320
## 10 AM West Seattle 61 201
df5 <- df4 %>%
mutate("Total Flights" = (`Delayed` + `On Time`),
"On Time Rate" = (`On Time` / (`On Time` + `Delayed`))
)
head(df5,15)
## # A tibble: 10 x 6
## Airline `Arrival City` Delayed `On Time` `Total Flights` `On Time Rate`
## <fct> <chr> <int> <int> <int> <dbl>
## 1 Alaska Los.Angeles 62 497 559 0.889
## 2 Alaska Phoenix 12 221 233 0.948
## 3 Alaska San.Diego 20 212 232 0.914
## 4 Alaska San.Francisco 102 503 605 0.831
## 5 Alaska Seattle 305 1841 2146 0.858
## 6 AM West Los.Angeles 117 694 811 0.856
## 7 AM West Phoenix 415 4840 5255 0.921
## 8 AM West San.Diego 65 383 448 0.855
## 9 AM West San.Francisco 129 320 449 0.713
## 10 AM West Seattle 61 201 262 0.767
Next it was time to look at the arrival delays of the two airlines by city and overall.
Alaska Airlines Total Flights and On Time Rate We can see that Alaska Airlines had 3,775 total flights and an on time rate of 88.8%
#Alaska Airlines Total Flights and On Time Rate
df5 %>%
filter(`Airline` == "Alaska") %>%
summarise(total = sum(`Total Flights`), OnTimeRate = mean(`On Time Rate`))
## # A tibble: 1 x 2
## total OnTimeRate
## <int> <dbl>
## 1 3775 0.888
AM West had nearly double the flights as 7,225 flights and an on time rate that was below Alaska at 82.2%.
#AM West Total Flights and On Time Rate
df5 %>%
filter(`Airline` == "AM West") %>%
summarise(total = sum(`Total Flights`), OnTimeRate = mean(`On Time Rate`))
## # A tibble: 1 x 2
## total OnTimeRate
## <int> <dbl>
## 1 7225 0.822
Sorting (arrange) by city allows us to easily compare the on time rates by arrival city. We see that Alaska had a better average on time rates in all five cities in this data set.
df5 %>%
arrange(`Arrival City`)
## # A tibble: 10 x 6
## Airline `Arrival City` Delayed `On Time` `Total Flights` `On Time Rate`
## <fct> <chr> <int> <int> <int> <dbl>
## 1 Alaska Los.Angeles 62 497 559 0.889
## 2 AM West Los.Angeles 117 694 811 0.856
## 3 Alaska Phoenix 12 221 233 0.948
## 4 AM West Phoenix 415 4840 5255 0.921
## 5 Alaska San.Diego 20 212 232 0.914
## 6 AM West San.Diego 65 383 448 0.855
## 7 Alaska San.Francisco 102 503 605 0.831
## 8 AM West San.Francisco 129 320 449 0.713
## 9 Alaska Seattle 305 1841 2146 0.858
## 10 AM West Seattle 61 201 262 0.767
ggplot(data = df5) +
geom_point(mapping = aes(x = `Airline`, y = `On Time Rate`)) +
facet_wrap(~ `Arrival City`, nrow = 2) +
labs(title = "On Time Rates by Airline by City") +
theme_bw()
The data clearly shows that Alaska Airlines had better on time rates at all five cities. It’s 88.8% on time rate was ahead of AM West’s on time rate of 82.2%. San Francisco had the worst on time rate overall of the five airports.