Getting the Data into R

First step is to obtain the data, which was in ‘csv’ format. I manually inputted the data into an excel sheet and then saved it under a .csv.

airline_data <- read.csv("airline_arrival.csv")
airline_data
##         X     X.1 Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 1  ALASKA on time         497     221       212           503    1841
## 2         delayed          62      12        20           102     305
## 3                          NA      NA        NA            NA      NA
## 4 AM WEST on time         694    4840       383           320     201
## 5         delayed         117     415        65           129      61

Looking inside the Data

There were some problems, given that the first 2 rows had strange anomalies inside and the city names were labeled with a period in between the words. We took the dataframe and transformed it so that the cities were on the left.

long_airline_data <- as.data.frame(t(airline_data))
long_airline_data
##                    V1      V2   V3      V4      V5
## X              ALASKA              AM WEST        
## X.1           on time delayed      on time delayed
## Los.Angeles       497      62 <NA>     694     117
## Phoenix           221      12 <NA>    4840     415
## San.Diego         212      20 <NA>     383      65
## San.Francisco     503     102 <NA>     320     129
## Seattle          1841     305 <NA>     201      61

Manipulating the Data

We take the defunct city names and change them using the trimws() function and the chartr() function to get rid of the periods.

colnames(long_airline_data)
## [1] "V1" "V2" "V3" "V4" "V5"
city_names <- rownames(long_airline_data)[3:7]
city_names <- trimws(chartr(".", " ", city_names))

We managed to navigate the data and singled out that rows 3-7 were the ones that we neneded. We took those rows, as a number and saved them as data1, data2, data4, and data5. We are able to add them together as well as to a little math to come out with the fraction.

long_airline_data[3:7, 'V1']
## [1] "497"  " 221" "212"  "503"  "1841"
long_airline_data[3:7, 'V2']
## [1] " 62"  "  12" " 20"  "102"  " 305"
as.numeric(as.character(long_airline_data[3:7, 'V1']))
## [1]  497  221  212  503 1841
data1 <- as.numeric(as.character(long_airline_data[3:7, 'V1']))
data2 <- as.numeric(as.character(long_airline_data[3:7, 'V2']))
data3 <- data1 + data2
data3
## [1]  559  233  232  605 2146
fraction <- round(100*(data2/data3), digits = 0)
data4 <- as.numeric(as.character(long_airline_data[3:7, 'V4']))
data5 <- as.numeric(as.character(long_airline_data[3:7, 'V5']))
data6 <- data4 + data5
data6
## [1]  811 5255  448  449  262
fraction2 <- round(100*(data5/data6), digits = 0)

Adding column names

Add the data on the dataframe with the new columnn names. As well as reformat the df with better text.

long_airline_data <- long_airline_data[3:7,] %>%
  mutate('Alaska Total' = data3)
long_airline_data <- long_airline_data %>%
  mutate('Alaska Delayed Percentage' = fraction)
long_airline_data <-  long_airline_data %>%
  mutate('AM West Total' = data6)
long_airline_data <-  long_airline_data %>%
  mutate('AM West Delayed Percentage' = fraction2)

colnames(long_airline_data)[1] = "Alaska On Time"
colnames(long_airline_data)[2] = "Alaska Delayed"
colnames(long_airline_data)[4] = "AM West On Time"
colnames(long_airline_data)[5] = "AM West Delayed"

rownames(long_airline_data) = city_names

long_airline_data[-3]
##               Alaska On Time Alaska Delayed AM West On Time AM West Delayed
## Los Angeles              497             62             694             117
## Phoenix                  221             12            4840             415
## San Diego                212             20             383              65
## San Francisco            503            102             320             129
## Seattle                 1841            305             201              61
##               Alaska Total Alaska Delayed Percentage AM West Total
## Los Angeles            559                        11           811
## Phoenix                233                         5          5255
## San Diego              232                         9           448
## San Francisco          605                        17           449
## Seattle               2146                        14           262
##               AM West Delayed Percentage
## Los Angeles                           14
## Phoenix                                8
## San Diego                             15
## San Francisco                         29
## Seattle                               23
Alaska <- long_airline_data['Alaska Delayed Percentage']
AM_west <-long_airline_data['AM West Delayed Percentage']

Graphing the results

Graphing was a slightly problamatic, in that there were 2 ‘y’ variables for every ‘x’ variable. So we used the technique of grabbing our data and putting it into a new df, transposing it, and then plotting the results.

new_df <- data.frame(City = city_names,
                    second_column = Alaska, 
                    third_column = AM_west)

MX <- t(as.matrix(new_df[-1]))
colnames(MX) <- new_df$City

colours = c("red","blue")

barplot(MX,main="Delayed by Percentage",ylab="Percentage", xlab="City",beside = TRUE, col = colours, ylim=c(0,max(MX)*1.3))

# to add a box around the plot
box()

# add a legend
legend('topright',fill=colours,legend=c('Alaska','AM West'))

Analysis comparing the arrival delays of two airlines.

We can see with the data given that AM West has a higher percentage rate of delays compared to Alaska in all airports. Particularly in San Francisco where it’s up nearly 15 percent! Seattle is also has a large 10 percentage point difference. Overall, I would go with Alaska airlines if we’re going by delayed arrivals only.