First, we pull the data from my personal GitHub repository. The data was created in Excel and uploaded to GitHub as a CSV file.
Data <- read.csv("https://raw.githubusercontent.com/sokkarbishoy/DATA607/main/Flights%20wk5.csv")
print(Data)
## X X.1 Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 1 ALASKA on time 497 221 212 503 1841
## 2 ALASKA delayed 62 12 20 102 305
## 3 AM WEST on time 694 4840 383 320 201
## 4 AM WEST delayed 117 415 65 129 61
Load packages
In the code below, I load the tidyverse, which includes the tidyr and dplyr packages.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidyr)
library(dplyr)
# Pivot the five destination columns into a single variable called Destination
Data2 <- Data %>%
  pivot_longer(cols = c('Los.Angeles', 'Phoenix', 'San.Diego', 'San.Francisco', 'Seattle'), names_to = "Destination", values_to = "Frequency")
head(Data2)
## # A tibble: 6 × 4
## X X.1 Destination Frequency
## <chr> <chr> <chr> <int>
## 1 ALASKA "on time" Los.Angeles 497
## 2 ALASKA "on time" Phoenix 221
## 3 ALASKA "on time" San.Diego 212
## 4 ALASKA "on time" San.Francisco 503
## 5 ALASKA "on time" Seattle 1841
## 6 ALASKA "delayed " Los.Angeles 62
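The reshaping step above lists every destination column by name. As a minimal sketch of an alternative, the columns can also be selected by excluding the two identifier columns, which avoids retyping the city names. The data frame `flights_wide` below is a hypothetical stand-in rebuilt by hand from the printed table, so the example runs without downloading the CSV.

```r
library(tidyr)

# Hypothetical stand-in for Data, typed in from the printed output above.
flights_wide <- data.frame(
  X = c("ALASKA", "ALASKA", "AM WEST", "AM WEST"),
  X.1 = c("on time", "delayed", "on time", "delayed"),
  Los.Angeles = c(497L, 62L, 694L, 117L),
  Phoenix = c(221L, 12L, 4840L, 415L),
  San.Diego = c(212L, 20L, 383L, 65L),
  San.Francisco = c(503L, 102L, 320L, 129L),
  Seattle = c(1841L, 305L, 201L, 61L)
)

# Pivot every column except the two identifier columns X and X.1.
flights_long <- pivot_longer(
  flights_wide,
  cols = !c(X, X.1),
  names_to = "Destination",
  values_to = "Frequency"
)

dim(flights_long)
```

The result should have the same 20 rows and 4 columns as Data2, since 4 rows times 5 destination columns are stacked into one.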
In the code below, I rename the two unnamed columns (imported as X and X.1) to Airline and Status. The "." in the destination names is replaced with a space in the step after that.
colnames(Data2)[colnames(Data2) %in% c("X", "X.1")] <- c("Airline", "Status")
head(Data2)
## # A tibble: 6 × 4
## Airline Status Destination Frequency
## <chr> <chr> <chr> <int>
## 1 ALASKA "on time" Los.Angeles 497
## 2 ALASKA "on time" Phoenix 221
## 3 ALASKA "on time" San.Diego 212
## 4 ALASKA "on time" San.Francisco 503
## 5 ALASKA "on time" Seattle 1841
## 6 ALASKA "delayed " Los.Angeles 62
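The renaming above assigns the new names in the order the matching columns appear in the data frame. A possible alternative, sketched here with dplyr's `rename()`, pairs each new name with its old name explicitly. The small data frame `flights_long` is a hypothetical stand-in for Data2 built from the first rows of the output.

```r
library(dplyr)

# Hypothetical stand-in for Data2 before renaming (first two rows only).
flights_long <- data.frame(
  X = c("ALASKA", "ALASKA"),
  X.1 = c("on time", "on time"),
  Destination = c("Los.Angeles", "Phoenix"),
  Frequency = c(497L, 221L)
)

# rename() takes new_name = old_name pairs, so each mapping is explicit
# and does not depend on column order.
flights_renamed <- rename(flights_long, Airline = X, Status = X.1)

names(flights_renamed)
```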
To replace the "." in the destination names with a space, I used the following code.
Data2$Destination <- gsub("\\.", " ", Data2$Destination)
head(Data2)
## # A tibble: 6 × 4
## Airline Status Destination Frequency
## <chr> <chr> <chr> <int>
## 1 ALASKA "on time" Los Angeles 497
## 2 ALASKA "on time" Phoenix 221
## 3 ALASKA "on time" San Diego 212
## 4 ALASKA "on time" San Francisco 503
## 5 ALASKA "on time" Seattle 1841
## 6 ALASKA "delayed " Los Angeles 62
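The output above prints the Status value as "delayed " with a trailing space, which suggests the CSV cell contains extra whitespace. A minimal sketch of one way to clean this with base R's `trimws()` follows. The vector `status` is a hypothetical stand-in for Data2$Status.

```r
# Hypothetical stand-in for Data2$Status; "delayed " carries a trailing space
# as shown in the printed tibble.
status <- c("on time", "on time", "delayed ", "delayed ")

# trimws() removes leading and trailing whitespace from each string.
status_clean <- trimws(status)

unique(status_clean)
```

Applied to the real data, this would look like `Data2$Status <- trimws(Data2$Status)`. Trimming first means later comparisons such as `Status == "delayed"` match as expected.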
str(Data2)
## tibble [20 × 4] (S3: tbl_df/tbl/data.frame)
## $ Airline : chr [1:20] "ALASKA" "ALASKA" "ALASKA" "ALASKA" ...
## $ Status : chr [1:20] "on time" "on time" "on time" "on time" ...
## $ Destination: chr [1:20] "Los Angeles" "Phoenix" "San Diego" "San Francisco" ...
## $ Frequency : int [1:20] 497 221 212 503 1841 62 12 20 102 305 ...
summary(Data2)
## Airline Status Destination Frequency
## Length:20 Length:20 Length:20 Min. : 12.00
## Class :character Class :character Class :character 1st Qu.: 92.75
## Mode :character Mode :character Mode :character Median : 216.50
## Mean : 550.00
## 3rd Qu.: 435.50
## Max. :4840.00
average_flights <- mean(Data2$Frequency)
average_flights
## [1] 550
We can create a ggplot to highlight the destinations and how often flights to each of them are delayed.
ggplot(Data2, aes(x= Destination, y= Frequency, fill = Status))+
geom_bar(stat = "identity")+
labs(title = "Flights by Destination and Status",
x = "Destination",
y = "Flights",
fill = "Status")
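The stacked bars show counts, so busy destinations dominate the chart. As a complement, the share of delayed flights per destination can be computed directly. This is a sketch under assumptions: `flights_long` is a hypothetical copy of Data2 typed in from the output above, with the Status values written without the trailing space.

```r
library(dplyr)

# Hypothetical copy of Data2, rebuilt from the printed values.
flights_long <- data.frame(
  Airline = rep(c("ALASKA", "AM WEST"), each = 10),
  Status = rep(rep(c("on time", "delayed"), each = 5), times = 2),
  Destination = rep(c("Los Angeles", "Phoenix", "San Diego",
                      "San Francisco", "Seattle"), times = 4),
  Frequency = c(497, 221, 212, 503, 1841,   # ALASKA on time
                62, 12, 20, 102, 305,       # ALASKA delayed
                694, 4840, 383, 320, 201,   # AM WEST on time
                117, 415, 65, 129, 61)      # AM WEST delayed
)

# Total flights, delayed flights, and delayed share for each destination,
# sorted from the highest delayed share to the lowest.
delay_by_destination <- flights_long %>%
  group_by(Destination) %>%
  summarise(
    Flights = sum(Frequency),
    Delayed = sum(Frequency[Status == "delayed"]),
    DelayShare = Delayed / Flights
  ) %>%
  arrange(desc(DelayShare))

delay_by_destination
```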
The next plot compares the total number of flights for the two airlines and the number of delayed flights for each.
It appears that AM WEST has a higher number of flights and thus more delayed flights.
ggplot(Data2, aes(x= Airline, y = Frequency, fill = Status))+
geom_bar(stat = "identity")
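Because the two airlines operate different numbers of flights, raw delayed counts are hard to compare on their own. A minimal sketch with base R's `xtabs()` and `prop.table()` puts the counts next to the row-wise shares. As before, `flights_long` is a hypothetical copy of Data2 typed in from the output, with the trailing space in Status removed.

```r
# Hypothetical copy of Data2, rebuilt from the printed values.
flights_long <- data.frame(
  Airline = rep(c("ALASKA", "AM WEST"), each = 10),
  Status = rep(rep(c("on time", "delayed"), each = 5), times = 2),
  Destination = rep(c("Los Angeles", "Phoenix", "San Diego",
                      "San Francisco", "Seattle"), times = 4),
  Frequency = c(497, 221, 212, 503, 1841,   # ALASKA on time
                62, 12, 20, 102, 305,       # ALASKA delayed
                694, 4840, 383, 320, 201,   # AM WEST on time
                117, 415, 65, 129, 61)      # AM WEST delayed
)

# Cross-tabulate total flights by airline and status.
counts <- xtabs(Frequency ~ Airline + Status, data = flights_long)

# margin = 1 divides each row by its row total, giving shares per airline.
shares <- prop.table(counts, margin = 1)

counts
round(shares, 3)
```

The `counts` table reproduces the totals behind the bar chart, while `shares` expresses the same numbers relative to each airline's own flight volume.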