The chart describes arrival delays for two airlines across five destinations. Yourtask is to:(1)Create a .CSV file (or optionally, a MySQL database!) that includes all of the information above. You’re encouraged to use a “wide” structure similar to how the information appears above, so that you can practice tidying and transformations as described below.(2)Read the information from your .CSV file into R, and use tidyranddplyr as needed to tidy and transform your data. (3)Perform analysis to compare the arrival delays for the two airlines.(4)Your codeshould be in an R Markdown file, posted to rpubs.com, and should include narrative descriptions of your data cleanup work, analysis,and conclusions.Please include in your homework submission:TheURLto the .Rmd file in your GitHub repository.and TheURL for your rpubs.com web page.

library(RCurl) 
## Loading required package: bitops
library(tidyr)
## 
## Attaching package: 'tidyr'
## The following object is masked from 'package:RCurl':
## 
##     complete
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library (knitr)
Numbersense <- read.table(text = getURL("https://raw.githubusercontent.com/Emahayz/Data-607-Class/master/Numbersense.txt"), header = T, sep = ",")

Remove all the NA’s

Numbersense1 <-Numbersense %>%  
  filter(!is.na(Phoenix)) %>% 
  rename(Airline=X,Arrival=X.1,"Los Angeles"=Los.Angeles,"San Diego"=San.Diego,"San Francisco"=San.Francisco)

Clean the Data by replacing the missing values (NA)

Numbersense1$Airline<- as.character(Numbersense1$Airline)
Numbersense1$Airline[Numbersense1$Airline==""]<- c("ALASKA","AM WEST")

View the tidy data

Numbersense1
##   Airline Arrival Los Angeles Phoenix San Diego San Francisco Seattle
## 1  ALASKA on time         497     221       212           503    1841
## 2  ALASKA delayed          62      12        20           102     305
## 3 AM WEST on time         694    4840       383           320     201
## 4 AM WEST delayed         117     415        65           129      61

Transforming the data to long format

write.csv(Numbersense1, file = "Numbersense.csv")
AirlineData <- Numbersense1 %>% gather(City, Frequency, "Los Angeles", "Phoenix", "San Diego", "San Francisco", "Seattle")
AirlineData
##    Airline Arrival          City Frequency
## 1   ALASKA on time   Los Angeles       497
## 2   ALASKA delayed   Los Angeles        62
## 3  AM WEST on time   Los Angeles       694
## 4  AM WEST delayed   Los Angeles       117
## 5   ALASKA on time       Phoenix       221
## 6   ALASKA delayed       Phoenix        12
## 7  AM WEST on time       Phoenix      4840
## 8  AM WEST delayed       Phoenix       415
## 9   ALASKA on time     San Diego       212
## 10  ALASKA delayed     San Diego        20
## 11 AM WEST on time     San Diego       383
## 12 AM WEST delayed     San Diego        65
## 13  ALASKA on time San Francisco       503
## 14  ALASKA delayed San Francisco       102
## 15 AM WEST on time San Francisco       320
## 16 AM WEST delayed San Francisco       129
## 17  ALASKA on time       Seattle      1841
## 18  ALASKA delayed       Seattle       305
## 19 AM WEST on time       Seattle       201
## 20 AM WEST delayed       Seattle        61
write.csv(AirlineData, file = "AirlineData.csv")

Analyze the AirlineData for the flights with no delays using visualization

ggplot() + geom_bar(aes(y = Frequency, x = City,fill =Airline), data = AirlineData,stat="identity")+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  labs( x="City", y="Flights on Time ")

Transform the data to wide format

AirlineWide <- tidyr::spread(AirlineData, Arrival, Frequency)
AirlineWide
##    Airline          City delayed on time
## 1   ALASKA   Los Angeles      62     497
## 2   ALASKA       Phoenix      12     221
## 3   ALASKA     San Diego      20     212
## 4   ALASKA San Francisco     102     503
## 5   ALASKA       Seattle     305    1841
## 6  AM WEST   Los Angeles     117     694
## 7  AM WEST       Phoenix     415    4840
## 8  AM WEST     San Diego      65     383
## 9  AM WEST San Francisco     129     320
## 10 AM WEST       Seattle      61     201

Show summary statistics for Airline delays

Airline_delay <- AirlineWide %>% dplyr::group_by(Airline) %>%
  dplyr::summarise(Max=max(delayed), 
                   Min=min(delayed),
                   Average=mean(delayed),
                   Std.Dev=sd(delayed),
                   Median=median(delayed))

View Summary statistics for delayed flights

kable(Airline_delay)
Airline Max Min Average Std.Dev Median
ALASKA 305 12 100.2 120.0175 62
AM WEST 415 61 157.4 147.1625 117

Conclusion

From the visualization, the data shows that AM West Airline has the highest number of on time flights for Phoenix while Alaska Airline has the highest number of on time flights for Seattle. The summary statistics for both airline shows that AM West has more delays than Alaska Airline with Average delay of about 157.