Airlines and Status

Introduction

In this assignment, we have been provided with a small messy data that describes delays for two airloned across five destinations. Our goal is to

  • Create a.csv file that includes all the information provided

  • Read the information from the .csv file into R, and use tidyr and dplyr as needed to tidy transform the data.

  • Perform analysis to compare the arroval delays for two airlines.

Create .csv file, load it in R and use tidyr and dplyr for tidy transform of the data.

        X     X.1 Los.Angeles Phoenix San.diego San.Francisco Seattle
1  ALASKA on time         497     221       212           503    1841
2         delayed          62      12        20           102     305
3                          NA      NA        NA            NA      NA
4 AM WEST on time         694    4840       383           320     201
5         delayed         117     415        65           129      61

As we can see the data has missing information, empty row values and other structural issues. Our goal is to bring this dataframe into tidy data standards, which is making each column a variable.

We need to fill the rows that has no values with the appropriate values

        X     X.1 Los.Angeles Phoenix San.diego San.Francisco Seattle
1  ALASKA on time         497     221       212           503    1841
2  ALASKA delayed          62      12        20           102     305
3                          NA      NA        NA            NA      NA
4 AM WEST on time         694    4840       383           320     201
5 AM WEST delayed         117     415        65           129      61

Remove row 3 as it has NA values that we do not need

        X     X.1 Los.Angeles Phoenix San.diego San.Francisco Seattle
1  ALASKA on time         497     221       212           503    1841
2  ALASKA delayed          62      12        20           102     305
4 AM WEST on time         694    4840       383           320     201
5 AM WEST delayed         117     415        65           129      61

Assign Row numbers properly to the data frame

        X     X.1 Los.Angeles Phoenix San.diego San.Francisco Seattle
1  ALASKA on time         497     221       212           503    1841
2  ALASKA delayed          62      12        20           102     305
3 AM WEST on time         694    4840       383           320     201
4 AM WEST delayed         117     415        65           129      61

Assign Column Names properly to the data frame

  Airlines  Status Los Angeles Phoenix San Diego San Francisco Seattle
1   ALASKA on time         497     221       212           503    1841
2   ALASKA delayed          62      12        20           102     305
3  AM WEST on time         694    4840       383           320     201
4  AM WEST delayed         117     415        65           129      61

Transform the data

   Airlines  Status   Destination Flights
1    ALASKA on time   Los Angeles     497
2    ALASKA delayed   Los Angeles      62
3   AM WEST on time   Los Angeles     694
4   AM WEST delayed   Los Angeles     117
5    ALASKA on time       Phoenix     221
6    ALASKA delayed       Phoenix      12
7   AM WEST on time       Phoenix    4840
8   AM WEST delayed       Phoenix     415
9    ALASKA on time     San Diego     212
10   ALASKA delayed     San Diego      20
11  AM WEST on time     San Diego     383
12  AM WEST delayed     San Diego      65
13   ALASKA on time San Francisco     503
14   ALASKA delayed San Francisco     102
15  AM WEST on time San Francisco     320
16  AM WEST delayed San Francisco     129
17   ALASKA on time       Seattle    1841
18   ALASKA delayed       Seattle     305
 [ reached 'max' / getOption("max.print") -- omitted 2 rows ]
   Airlines   Destination delayed on time
1    ALASKA   Los Angeles      62     497
2    ALASKA       Phoenix      12     221
3    ALASKA     San Diego      20     212
4    ALASKA San Francisco     102     503
5    ALASKA       Seattle     305    1841
6   AM WEST   Los Angeles     117     694
7   AM WEST       Phoenix     415    4840
8   AM WEST     San Diego      65     383
9   AM WEST San Francisco     129     320
10  AM WEST       Seattle      61     201

Perform Analysis to compare arrival delays for the two airlines.

# A tibble: 4 x 3
# Groups:   Airlines [2]
  Airlines Status  Flights
  <chr>    <chr>     <int>
1 ALASKA   delayed     501
2 ALASKA   on time    3274
3 AM WEST  delayed     787
4 AM WEST  on time    6438

When we compare the on time and delayed distribution between the Airlines and Flights, we see that their each has similar distribituon Alaska having 86% of their flights on time and AM West 89% of their flights on time.

We can also look at the distribution of ontime and delayed Status for Destination.

There are more signifacntly more delayed flights in Alaska Airlines for Seattle destination, and there are significantly more delayed flights for Phoenix destination for AM West Airlines. This might be due to the amount of flights that both airlines offer to those particular destinations however based on this analysis it might make sense to take Alaska Airlines for Phoenix flights and AM West for Seattle.

Conclusion

Based on our analysis, we can see that the Status of Flights being two categories; On time or Delayed, are similar between the two Airlines, AM West flights being slighlty higher chance to be ontime. However, when we look at the destinations, we can see that Airlines Status of Flights are different between the two airlines for each destination.

2019-09-26