First step is to obtain the data, which was in ‘csv’ format. I manually inputted the data into an excel sheet and then saved it under a .csv.
airline_data <- read.csv("airline_arrival.csv")
airline_data
## X X.1 Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 1 ALASKA on time 497 221 212 503 1841
## 2 delayed 62 12 20 102 305
## 3 NA NA NA NA NA
## 4 AM WEST on time 694 4840 383 320 201
## 5 delayed 117 415 65 129 61
There were some problems, given that the first 2 rows had strange anomalies inside and the city names were labeled with a period in between the words. We took the dataframe and transformed it so that the cities were on the left.
long_airline_data <- as.data.frame(t(airline_data))
long_airline_data
## V1 V2 V3 V4 V5
## X ALASKA AM WEST
## X.1 on time delayed on time delayed
## Los.Angeles 497 62 <NA> 694 117
## Phoenix 221 12 <NA> 4840 415
## San.Diego 212 20 <NA> 383 65
## San.Francisco 503 102 <NA> 320 129
## Seattle 1841 305 <NA> 201 61
We take the defunct city names and change them using the trimws() function and the chartr() function to get rid of the periods.
colnames(long_airline_data)
## [1] "V1" "V2" "V3" "V4" "V5"
city_names <- rownames(long_airline_data)[3:7]
city_names <- trimws(chartr(".", " ", city_names))
We managed to navigate the data and singled out that rows 3-7 were the ones that we neneded. We took those rows, as a number and saved them as data1, data2, data4, and data5. We are able to add them together as well as to a little math to come out with the fraction.
long_airline_data[3:7, 'V1']
## [1] "497" " 221" "212" "503" "1841"
long_airline_data[3:7, 'V2']
## [1] " 62" " 12" " 20" "102" " 305"
as.numeric(as.character(long_airline_data[3:7, 'V1']))
## [1] 497 221 212 503 1841
data1 <- as.numeric(as.character(long_airline_data[3:7, 'V1']))
data2 <- as.numeric(as.character(long_airline_data[3:7, 'V2']))
data3 <- data1 + data2
data3
## [1] 559 233 232 605 2146
fraction <- round(100*(data2/data3), digits = 0)
data4 <- as.numeric(as.character(long_airline_data[3:7, 'V4']))
data5 <- as.numeric(as.character(long_airline_data[3:7, 'V5']))
data6 <- data4 + data5
data6
## [1] 811 5255 448 449 262
fraction2 <- round(100*(data5/data6), digits = 0)
Add the data on the dataframe with the new columnn names. As well as reformat the df with better text.
long_airline_data <- long_airline_data[3:7,] %>%
mutate('Alaska Total' = data3)
long_airline_data <- long_airline_data %>%
mutate('Alaska Delayed Percentage' = fraction)
long_airline_data <- long_airline_data %>%
mutate('AM West Total' = data6)
long_airline_data <- long_airline_data %>%
mutate('AM West Delayed Percentage' = fraction2)
colnames(long_airline_data)[1] = "Alaska On Time"
colnames(long_airline_data)[2] = "Alaska Delayed"
colnames(long_airline_data)[4] = "AM West On Time"
colnames(long_airline_data)[5] = "AM West Delayed"
rownames(long_airline_data) = city_names
long_airline_data[-3]
## Alaska On Time Alaska Delayed AM West On Time AM West Delayed
## Los Angeles 497 62 694 117
## Phoenix 221 12 4840 415
## San Diego 212 20 383 65
## San Francisco 503 102 320 129
## Seattle 1841 305 201 61
## Alaska Total Alaska Delayed Percentage AM West Total
## Los Angeles 559 11 811
## Phoenix 233 5 5255
## San Diego 232 9 448
## San Francisco 605 17 449
## Seattle 2146 14 262
## AM West Delayed Percentage
## Los Angeles 14
## Phoenix 8
## San Diego 15
## San Francisco 29
## Seattle 23
Alaska <- long_airline_data['Alaska Delayed Percentage']
AM_west <-long_airline_data['AM West Delayed Percentage']
Graphing was a slightly problamatic, in that there were 2 ‘y’ variables for every ‘x’ variable. So we used the technique of grabbing our data and putting it into a new df, transposing it, and then plotting the results.
new_df <- data.frame(City = city_names,
second_column = Alaska,
third_column = AM_west)
MX <- t(as.matrix(new_df[-1]))
colnames(MX) <- new_df$City
colours = c("red","blue")
barplot(MX,main="Delayed by Percentage",ylab="Percentage", xlab="City",beside = TRUE, col = colours, ylim=c(0,max(MX)*1.3))
# to add a box around the plot
box()
# add a legend
legend('topright',fill=colours,legend=c('Alaska','AM West'))
We can see with the data given that AM West has a higher percentage rate of delays compared to Alaska in all airports. Particularly in San Francisco where it’s up nearly 15 percent! Seattle is also has a large 10 percentage point difference. Overall, I would go with Alaska airlines if we’re going by delayed arrivals only.