# loading the required packages with the nycflights13 dataset
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.4.4 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.0
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycflights13)
library(ggplot2)
# remove NAs in the column arr_delay (cleaning data)
flights_nona <- flights |>
  filter(!is.na(arr_delay))
# filter the data to only the flights in June
flights_nona6 <- flights_nona |>
  filter(month == 6)
# specify needed columns used for the visualization in flights_nona
flights_nona6 <- flights_nona6 |>
  select(year, month, day, arr_delay, tailnum)
head(flights_nona6)
Data Cleaning (Removing NA’s and Filtering to June)
# remove NAs in the column visib (cleaning data)
weather_nona <- weather |>
  filter(!is.na(visib))
# filter the data to only the weather in June
weather_nona6 <- weather_nona |>
  filter(month == 6)
# specify needed columns used for the visualization in weather_nona
weather_nona6 <- weather_nona6 |>
  select(year, month, day, hour, visib)
head(weather_nona6)
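The cleaning steps above (drop missing values, then keep June) can be sanity-checked on a tiny example. This is a minimal sketch in base R so it runs without extra packages. The `toy_weather` data frame and its values are made up for illustration and are not taken from nycflights13.

```r
# Toy stand-in for the weather table (all values are invented)
toy_weather <- data.frame(
  month = c(5, 6, 6, 6),
  day   = c(31, 1, 1, 2),
  hour  = c(23, 0, 1, 0),
  visib = c(10, NA, 8, 10)
)

# Same two conditions as the pipeline above: no missing visib, June only
toy_june <- subset(toy_weather, !is.na(visib) & month == 6)

# Keep only the columns needed later
toy_june <- toy_june[, c("month", "day", "hour", "visib")]

nrow(toy_june)         # number of rows that survive both filters
anyNA(toy_june$visib)  # FALSE when no missing values remain
```

Checking the row count and `anyNA()` after cleaning is a cheap way to confirm that the filters did what was intended before moving on to the summaries.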
Grouping Information & Calculating Means
by_day_flights <- flights_nona6 |>
  group_by(day) |> # group the flights by day of the month
  summarise(
    count = n(), # counts the flights for each day
    delay = mean(arr_delay) # calculates the mean arrival delay for each day
  )
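A per-day summary should produce exactly one row per day. An expression that returns several values per group, such as a bare column name inside `summarise()`, breaks that shape. The sketch below shows the intended result in base R on made-up data; `toy_flights` is a hypothetical stand-in, not the real flights table.

```r
# Toy stand-in for the June flights (all values are invented)
toy_flights <- data.frame(
  day       = c(1, 1, 2, 2, 2),
  arr_delay = c(10, 20, -5, 0, 35)
)

# Per-day flight count and mean arrival delay
counts <- tapply(toy_flights$arr_delay, toy_flights$day, length)
delays <- tapply(toy_flights$arr_delay, toy_flights$day, mean)

by_day <- data.frame(
  day   = as.numeric(names(delays)),
  count = as.vector(counts),
  delay = as.vector(delays)
)

# One row per day is the shape the later merge and plot rely on
nrow(by_day) == length(unique(toy_flights$day))
```

If the row count of the summary does not match the number of distinct days, something in the summary step is returning more than one value per group.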
delay <- filter(by_day_flights, count > 20)
by_day_weather <- weather_nona6 |>
  group_by(day) |> # group the weather by each day in June
  summarise(
    count = n(),
    vision = mean(visib) # find the mean visibility for each day in June
  )
head(by_day_weather)
junenycflights <- merge(by_day_flights, by_day_weather, by = "day") # merging the datasets
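Both daily summaries carry a column named `count`, so it helps to know what `merge()` does with it. The sketch below uses two small made-up data frames (`flights_daily` and `weather_daily` are hypothetical names) to show the default behavior: only days present in both inputs are kept, and the shared non-key column is given `.x` and `.y` suffixes.

```r
# Toy daily summaries (all values are invented)
flights_daily <- data.frame(
  day   = c(1, 2, 3),
  count = c(900, 950, 870),
  delay = c(4.2, 12.5, -1.0)
)
weather_daily <- data.frame(
  day    = c(1, 2, 4),
  count  = c(72, 72, 72),
  vision = c(10, 6.5, 9.8)
)

# Default merge is an inner join on the key column
merged <- merge(flights_daily, weather_daily, by = "day")

names(merged)  # the shared "count" column appears as count.x and count.y
nrow(merged)   # only days found in both inputs remain
```

In this toy case days 3 and 4 drop out because each appears in only one input. With real data, comparing `nrow()` before and after the merge shows whether any days were lost.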
Creating the Visualization
graph <- ggplot(junenycflights, aes(vision, delay, color = vision)) +
  geom_point(aes(size = vision), alpha = .1) +
  scale_color_gradient(low = "navy", high = "lightpink") +
  scale_size_area() +
  theme_bw() +
  labs(
    x = "Overall Visibility of Weather",
    y = "Average Flight Delay (Minutes)",
    caption = "FAA Aircraft registry",
    title = "Arrival Delays and Visibility for Flights in June 2013"
  )
graph
The graph shows an extremely slight correlation between visibility and average arrival delay during June 2013. The x-axis shows each day's mean visibility, which nycflights13 records in miles and which runs up to 10 in this data. The y-axis shows the average number of minutes by which flights arrived late on that day. Each point is one day in June: lighter points are days with better visibility, and darker points are days with lower visibility.
As the graph shows, most of the points sit on the right side, which means that most days in June had good visibility. A few outliers on the left side of the graph are days that experienced poor visibility.
So let's answer the big question: does weather visibility affect whether a flight is delayed? Sort of. The days with the latest average arrivals had slightly lower visibility than the days when flights arrived on time or only slightly late. In addition, the days when flights arrived on time sat at the maximum visibility of 10, which suggests that flights in better visibility are more likely to arrive on time.
But because the visual pattern is not strong, we cannot conclude that visibility directly determines whether a flight arrives late. What we can say is that there is enough evidence to suggest that visibility can be a contributing factor in whether a flight's arrival is delayed.
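One way to put a number on a "slight" relationship is a Pearson correlation between daily mean visibility and daily mean delay. The sketch below is an illustration only: `toy` is a made-up data frame, not the June 2013 results, so the value it prints says nothing about the real data.

```r
# Toy daily summary (all values are invented)
toy <- data.frame(
  vision = c(10, 10, 9.5, 8, 6, 10),
  delay  = c(2, -3, 5, 12, 30, 0)
)

# Pearson correlation: the sign gives the direction, the magnitude the strength
r <- cor(toy$vision, toy$delay)
r

# cor.test() adds a p-value and confidence interval for the same statistic
result <- cor.test(toy$vision, toy$delay)
result$p.value
```

A negative value would match the pattern described above, with lower visibility going with longer delays. Even a strong correlation across a single month of daily averages would not show that visibility causes delays, which is in line with the cautious conclusion above.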