Author: Romerl Elizes

Load Libraries

## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Part I. Load Data

Data will be loaded from the CSV file I created for the project. I will parse the data as a csv file and execute summary function to show some important information of the raw data.

##   Airline  Status Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 1  ALASKA on time         497     221       212           503    1841
## 2  ALASKA delayed          62      12        20           102     305
## 3 AM WEST on time         694    4840       383           320     201
## 4 AM WEST delayed         117     415        65           129      61

Part II. Perform Basic Comparisons using TIDYR, DPLYR, and GGPLOT

Since I loaded the data successfully, I decided to use a combination of tidyr, dplyr, and ggplot in order to show graphical comparisons of specific questions I would like to ask: a comparison of delays vs. on time statuses by Airline and by City, a comparison of percent delays by airlines by city, and a comparison of total percent delays by city regardless of airlines.

A. Show Graph of Delays and On-Time Comparisons

Solution: Using a combination of tidyr and dplyr functions, I was able to arrange my data to show flights as a column in the new data frame. To help ease my graphing of the fields, I combined Airline and Status fields into one field. Using the new data frame I created, I created a bar chart that shows a visual comparison of delay and on-time statuses for each airline by city. Delayed statuses are colored a darker color while on-time statuses are colored a lighter color. Alaska Airlines uses color variations of blue while American West uses color variations of red.

Analysis: By analyzing the chart, you can see that Phoenix and Seattle encompass the most flights for the two airlines. Both airlines appear to have the least amount of delays vs. total on-time flights in Phoenix. American West appears to have a commanding presence in Phoenix as well. The city with the second most flights appears to be Seattle. A more important gauge is to find the percentage of delays vs. on flight statuses as they will give a more accurate picture of air flight delays between the two airlines for the listed cities.

##      AirlineStatus          City Flights
## 1   ALASKA-delayed   Los.Angeles      62
## 2   ALASKA-delayed       Phoenix      12
## 3   ALASKA-delayed     San.Diego      20
## 4   ALASKA-delayed San.Francisco     102
## 5   ALASKA-delayed       Seattle     305
## 6   ALASKA-on time   Los.Angeles     497
## 7   ALASKA-on time       Phoenix     221
## 8   ALASKA-on time     San.Diego     212
## 9   ALASKA-on time San.Francisco     503
## 10  ALASKA-on time       Seattle    1841
## 11 AM WEST-delayed   Los.Angeles     117
## 12 AM WEST-delayed       Phoenix     415
## 13 AM WEST-delayed     San.Diego      65
## 14 AM WEST-delayed San.Francisco     129
## 15 AM WEST-delayed       Seattle      61
## 16 AM WEST-on time   Los.Angeles     694
## 17 AM WEST-on time       Phoenix    4840
## 18 AM WEST-on time     San.Diego     383
## 19 AM WEST-on time San.Francisco     320
## 20 AM WEST-on time       Seattle     201
## Scale for 'fill' is already present. Adding another scale for 'fill',
## which will replace the existing scale.

B. Find out Percent Delays of Each Airlines by City

Solution: Using a combination of tidyr and dplyr functions, I was able to arrange my data to show flights as a column in the new data frame. I used mutate to add two new calculated fields: AlaskaPercentDelay and AMWestPercentDelay. To help ease my graphing of the fields, I combined Airline and Status fields into one field. Using the new data frame I created, I created a bar chart that shows a visual comparison of the percentage delays for each airline by city.

Analysis: As I indicated in the last chart and now in this chart, both airlines have the lowest percentage of delays in Phoenix at 5.43% for Alaska and 8.57% for American West. Conversely, both airlines have the highest percentage of delays in San Francisco at 20.28% for Alaska and 40.31% for American West. While American West enjoys the best flight delay percentage at Phoenix, it is certainly not enjoying the worst flight delay percentage at San Francisco.

## Warning: attributes are not identical across measure variables;
## they will be dropped
##            City  ALT ALD  AMT AMD
## 2   Los.Angeles  497  62  694 117
## 3       Phoenix  221  12 4840 415
## 4     San.Diego  212  20  383  65
## 5 San.Francisco  503 102  320 129
## 6       Seattle 1841 305  201  61
##            City  ALT ALD  AMT AMD AlaskaPercentDelay AMWestPercentDelay
## 1   Los.Angeles  497  62  694 117          12.474849           16.85879
## 2       Phoenix  221  12 4840 415           5.429864            8.57438
## 3     San.Diego  212  20  383  65           9.433962           16.97128
## 4 San.Francisco  503 102  320 129          20.278330           40.31250
## 5       Seattle 1841 305  201  61          16.567083           30.34826

C. Find comparisons of Percent Delays of Each City Regardless of Airlines.

Solution: Using the calculations above, I created a mutate function that calculates total percent delay for each city regarldess of airline. I created a chart that shows the percent delays for each city regardless of airline.

Analysis: As in the charts above, the least percent delay of 8.44% was found to be in Phoenix and the worst percent of 28.07% delay was found to be in San Francisco. That’s pretty chilling. More than a quarter of all flights at San Francisco is going to be delayed. Los Angeles has a much larger population than San Francisco and yet it enjoys a lesser percent delay at 15.03%.

##            City  ALT ALD  AMT AMD PercentDelay
## 1   Los.Angeles  497  62  694 117    15.029387
## 2       Phoenix  221  12 4840 415     8.437068
## 3     San.Diego  212  20  383  65    14.285714
## 4 San.Francisco  503 102  320 129    28.068044
## 5       Seattle 1841 305  201  61    17.923604

References

[DAT] Data Wrangling with dplyr and tidyr Cheat Sheet Retrieved from website: https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf

[GGP] ggplot2 barplots : Quick start guide - R software and data visualization Retrieved from website: http://www.sthda.com/english/wiki/ggplot2-barplots-quick-start-guide-r-software-and-data-visualization

[GRO] **** Retrieved from website: https://stackoverflow.com/questions/37667630/grouped-bar-chart-r-using-counts-table

[HTM] HTML Color Codes Retrieved from website: https://htmlcolorcodes.com/