The chart describes arrival delays for two airlines across five destinations. Yourtask is to:(1)Create a .CSV file (or optionally, a MySQL database!) that includes all of the information above. You’re encouraged to use a “wide” structure similar to how the information appears above, so that you can practice tidying and transformations as described below.(2)Read the information from your .CSV file into R, and use tidyranddplyr as needed to tidy and transform your data. (3)Perform analysis to compare the arrival delays for the two airlines.(4)Your codeshould be in an R Markdown file, posted to rpubs.com, and should include narrative descriptions of your data cleanup work, analysis,and conclusions.Please include in your homework submission:TheURLto the .Rmd file in your GitHub repository.and TheURL for your rpubs.com web page.
library(RCurl)
## Loading required package: bitops
library(tidyr)
##
## Attaching package: 'tidyr'
## The following object is masked from 'package:RCurl':
##
## complete
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library (knitr)
Numbersense <- read.table(text = getURL("https://raw.githubusercontent.com/Emahayz/Data-607-Class/master/Numbersense.txt"), header = T, sep = ",")
Remove all the NA’s
Numbersense1 <-Numbersense %>%
filter(!is.na(Phoenix)) %>%
rename(Airline=X,Arrival=X.1,"Los Angeles"=Los.Angeles,"San Diego"=San.Diego,"San Francisco"=San.Francisco)
Clean the Data by replacing the missing values (NA)
Numbersense1$Airline<- as.character(Numbersense1$Airline)
Numbersense1$Airline[Numbersense1$Airline==""]<- c("ALASKA","AM WEST")
View the tidy data
Numbersense1
## Airline Arrival Los Angeles Phoenix San Diego San Francisco Seattle
## 1 ALASKA on time 497 221 212 503 1841
## 2 ALASKA delayed 62 12 20 102 305
## 3 AM WEST on time 694 4840 383 320 201
## 4 AM WEST delayed 117 415 65 129 61
Transforming the data to long format
write.csv(Numbersense1, file = "Numbersense.csv")
AirlineData <- Numbersense1 %>% gather(City, Frequency, "Los Angeles", "Phoenix", "San Diego", "San Francisco", "Seattle")
AirlineData
## Airline Arrival City Frequency
## 1 ALASKA on time Los Angeles 497
## 2 ALASKA delayed Los Angeles 62
## 3 AM WEST on time Los Angeles 694
## 4 AM WEST delayed Los Angeles 117
## 5 ALASKA on time Phoenix 221
## 6 ALASKA delayed Phoenix 12
## 7 AM WEST on time Phoenix 4840
## 8 AM WEST delayed Phoenix 415
## 9 ALASKA on time San Diego 212
## 10 ALASKA delayed San Diego 20
## 11 AM WEST on time San Diego 383
## 12 AM WEST delayed San Diego 65
## 13 ALASKA on time San Francisco 503
## 14 ALASKA delayed San Francisco 102
## 15 AM WEST on time San Francisco 320
## 16 AM WEST delayed San Francisco 129
## 17 ALASKA on time Seattle 1841
## 18 ALASKA delayed Seattle 305
## 19 AM WEST on time Seattle 201
## 20 AM WEST delayed Seattle 61
write.csv(AirlineData, file = "AirlineData.csv")
Analyze the AirlineData for the flights with no delays using visualization
ggplot() + geom_bar(aes(y = Frequency, x = City,fill =Airline), data = AirlineData,stat="identity")+
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
labs( x="City", y="Flights on Time ")
Transform the data to wide format
AirlineWide <- tidyr::spread(AirlineData, Arrival, Frequency)
AirlineWide
## Airline City delayed on time
## 1 ALASKA Los Angeles 62 497
## 2 ALASKA Phoenix 12 221
## 3 ALASKA San Diego 20 212
## 4 ALASKA San Francisco 102 503
## 5 ALASKA Seattle 305 1841
## 6 AM WEST Los Angeles 117 694
## 7 AM WEST Phoenix 415 4840
## 8 AM WEST San Diego 65 383
## 9 AM WEST San Francisco 129 320
## 10 AM WEST Seattle 61 201
Show summary statistics for Airline delays
Airline_delay <- AirlineWide %>% dplyr::group_by(Airline) %>%
dplyr::summarise(Max=max(delayed),
Min=min(delayed),
Average=mean(delayed),
Std.Dev=sd(delayed),
Median=median(delayed))
View Summary statistics for delayed flights
kable(Airline_delay)
| Airline | Max | Min | Average | Std.Dev | Median |
|---|---|---|---|---|---|
| ALASKA | 305 | 12 | 100.2 | 120.0175 | 62 |
| AM WEST | 415 | 61 | 157.4 | 147.1625 | 117 |
From the visualization, the data shows that AM West Airline has the highest number of on time flights for Phoenix while Alaska Airline has the highest number of on time flights for Seattle. The summary statistics for both airline shows that AM West has more delays than Alaska Airline with Average delay of about 157.