R Markdown
In order to tidy and transforming the given dataset a csv file was created using all the information with a wide structure.The CSV file is stored in GitHub.
[link] (https://github.com/maliat-hossain/Air-lane-Data-607)
## Warning: package 'wesanderson' was built under R version 4.0.4
csv file:
Flight_Information1<-read.csv("https://raw.githubusercontent.com/maliat-hossain/Flight-Information-Dataset/main/Flight%20Information1.csv",sep = ",")
Flight_Information1## X X.1 Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 1 Alaska On Time 497 221 212 503 1841
## 2 Delayed 62 12 20 102 305
## 3 NA NA NA NA NA
## 4 AmWest On Time 694 4840 383 320 201
## 5 Delayed 117 415 65 129 61
Column names are added to first columns.Empty cells were filled with the needed value.
colnames(Flight_Information1)[1:2] <- c("AirlineName","Status")
Flight_Information1[[1]][2] <- Flight_Information1[[1]][1]
Flight_Information1[[1]][5] <- Flight_Information1[[1]][4]
Flight_Information1 %>%
knitr::kable(caption = "Table 1:Flight Information Table")%>%
kableExtra::kable_styling(bootstrap_options = "striped")| AirlineName | Status | Los.Angeles | Phoenix | San.Diego | San.Francisco | Seattle |
|---|---|---|---|---|---|---|
| Alaska | On Time | 497 | 221 | 212 | 503 | 1841 |
| Alaska | Delayed | 62 | 12 | 20 | 102 | 305 |
| NA | NA | NA | NA | NA | ||
| AmWest | On Time | 694 | 4840 | 383 | 320 | 201 |
| AmWest | Delayed | 117 | 415 | 65 | 129 | 61 |
Gather function is implemented to tidy the data.
Airline data has been converted to numeric data to perform analysis,unnecessary punctuation marks were removed to make data more accessible for analysis.
Flight_Information1 <- filter(Flight_Information1,Status!="")
i <- 1
while(i <= length(Flight_Information1$city)) {
Flight_Information1 $city[i] <- gsub("\\."," ",Flight_Information1 $city[i])
Flight_Information1$TotalFlights[i] <- gsub(",","",Flight_Information1$TotalFlights[i])
i <- i + 1
}
Flight_Information1$TotalFlights <- as.numeric(Flight_Information1$TotalFlights)
Flight_Information1 %>%
knitr::kable(caption = "Table 2:Flight Information Preparation for Analysis")%>%
kableExtra::kable_styling(bootstrap_options = "striped")| AirlineName | Status | city | TotalFlights |
|---|---|---|---|
| Alaska | On Time | Los Angeles | 497 |
| Alaska | Delayed | Los Angeles | 62 |
| AmWest | On Time | Los Angeles | 694 |
| AmWest | Delayed | Los Angeles | 117 |
| Alaska | On Time | Phoenix | 221 |
| Alaska | Delayed | Phoenix | 12 |
| AmWest | On Time | Phoenix | 4840 |
| AmWest | Delayed | Phoenix | 415 |
| Alaska | On Time | San Diego | 212 |
| Alaska | Delayed | San Diego | 20 |
| AmWest | On Time | San Diego | 383 |
| AmWest | Delayed | San Diego | 65 |
| Alaska | On Time | San Francisco | 503 |
| Alaska | Delayed | San Francisco | 102 |
| AmWest | On Time | San Francisco | 320 |
| AmWest | Delayed | San Francisco | 129 |
| Alaska | On Time | Seattle | 1841 |
| Alaska | Delayed | Seattle | 305 |
| AmWest | On Time | Seattle | 201 |
| AmWest | Delayed | Seattle | 61 |
The rate of On Time and Delayed flights are calculated. The comparison of the rate of delay and on time flights bteween two airlines is necessary to quantify their performances.
## `summarise()` regrouping output by 'AirlineName' (override with `.groups` argument)
i <- 1
StatusRate <- vector()
while(i <= length(Flight_Information1$TotalFlights)){
ap <- Flight_Information1$AirlineName[i]
ct <- Flight_Information1$city[i]
StatusRate[i] <- round(Flight_Information1$TotalFlights[i]/Flight_Information_Analysis1$Total.Airlines1[Flight_Information_Analysis1$AirlineName==ap & Flight_Information_Analysis1$city == ct],3)
i <- i + 1
}
Flight_Information1 <- cbind(Flight_Information1,StatusRate)
Flight_Information1%>%
knitr::kable(caption = "Table 3:Flight Information Scenario")%>%
kableExtra::kable_styling(bootstrap_options = "striped")| AirlineName | Status | city | TotalFlights | StatusRate |
|---|---|---|---|---|
| Alaska | On Time | Los Angeles | 497 | 0.889 |
| Alaska | Delayed | Los Angeles | 62 | 0.111 |
| AmWest | On Time | Los Angeles | 694 | 0.856 |
| AmWest | Delayed | Los Angeles | 117 | 0.144 |
| Alaska | On Time | Phoenix | 221 | 0.948 |
| Alaska | Delayed | Phoenix | 12 | 0.052 |
| AmWest | On Time | Phoenix | 4840 | 0.921 |
| AmWest | Delayed | Phoenix | 415 | 0.079 |
| Alaska | On Time | San Diego | 212 | 0.914 |
| Alaska | Delayed | San Diego | 20 | 0.086 |
| AmWest | On Time | San Diego | 383 | 0.855 |
| AmWest | Delayed | San Diego | 65 | 0.145 |
| Alaska | On Time | San Francisco | 503 | 0.831 |
| Alaska | Delayed | San Francisco | 102 | 0.169 |
| AmWest | On Time | San Francisco | 320 | 0.713 |
| AmWest | Delayed | San Francisco | 129 | 0.287 |
| Alaska | On Time | Seattle | 1841 | 0.858 |
| Alaska | Delayed | Seattle | 305 | 0.142 |
| AmWest | On Time | Seattle | 201 | 0.767 |
| AmWest | Delayed | Seattle | 61 | 0.233 |
On time flights are extracted
OnTime_Flights1.1<-filter(Flight_Information1,Status =="On Time")
OnTime_Flights1.1 %>%
knitr::kable(caption = "Table 4:Flight Information Scenario with On Time Flights ")%>%
kableExtra::kable_styling(bootstrap_options = "striped")| AirlineName | Status | city | TotalFlights | StatusRate |
|---|---|---|---|---|
| Alaska | On Time | Los Angeles | 497 | 0.889 |
| AmWest | On Time | Los Angeles | 694 | 0.856 |
| Alaska | On Time | Phoenix | 221 | 0.948 |
| AmWest | On Time | Phoenix | 4840 | 0.921 |
| Alaska | On Time | San Diego | 212 | 0.914 |
| AmWest | On Time | San Diego | 383 | 0.855 |
| Alaska | On Time | San Francisco | 503 | 0.831 |
| AmWest | On Time | San Francisco | 320 | 0.713 |
| Alaska | On Time | Seattle | 1841 | 0.858 |
| AmWest | On Time | Seattle | 201 | 0.767 |
Delayed flights are extracted
Delayed_Flights1.1<-filter(Flight_Information1,Status=="Delayed")
Delayed_Flights1.1 %>%
knitr::kable(caption = "Table 5:Flight Information Scenario with Delayed Flights ")%>%
kableExtra::kable_styling(bootstrap_options = "striped")| AirlineName | Status | city | TotalFlights | StatusRate |
|---|---|---|---|---|
| Alaska | Delayed | Los Angeles | 62 | 0.111 |
| AmWest | Delayed | Los Angeles | 117 | 0.144 |
| Alaska | Delayed | Phoenix | 12 | 0.052 |
| AmWest | Delayed | Phoenix | 415 | 0.079 |
| Alaska | Delayed | San Diego | 20 | 0.086 |
| AmWest | Delayed | San Diego | 65 | 0.145 |
| Alaska | Delayed | San Francisco | 102 | 0.169 |
| AmWest | Delayed | San Francisco | 129 | 0.287 |
| Alaska | Delayed | Seattle | 305 | 0.142 |
| AmWest | Delayed | Seattle | 61 | 0.233 |
Graphs for OnTime and Delayed Flights
ggplot(data=Delayed_Flights1.1, aes(x=city,y=StatusRate,fill=AirlineName)) +
scale_fill_manual(values = wes_palette("FantasticFox1",5)[4:5])+
geom_bar(stat="identity", position=position_dodge()) +
ggtitle(" Analizing Rate of Delay by Airlines and City") +
xlab("city") +
ylab("rate") ggplot(data=OnTime_Flights1.1, aes(x=city,y=StatusRate,fill=AirlineName)) +
scale_fill_manual(values = wes_palette("Darjeeling2",5)[1:3])+
geom_bar(stat="identity", position=position_dodge()) +
ggtitle(" Analizing Rate of OnTime Flights by Airlines and City") +
xlab("city") +
ylab("rate")