NYFlights13 Assignment

Author

Renato Chavez

Published

February 28, 2023

This is my first visualization on my own! I will be using the pre-built data set nycflights13.

Load the libraries

library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0     ✔ purrr   1.0.1
✔ tibble  3.1.8     ✔ dplyr   1.1.0
✔ tidyr   1.3.0     ✔ stringr 1.5.0
✔ readr   2.1.4     ✔ forcats 1.0.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
library(nycflights13)
library(RColorBrewer)
data(flights)

First, I removed the rows with NA values in the departure time (dep_time) or the arrival delay (arr_delay).

flights_nona <- flights %>% 
    filter(!is.na(dep_time) & !is.na(arr_delay))
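
To show what this filter keeps, here is a minimal sketch on a made-up table. The toy values are my own and are not real flight records; only the filter condition is the same as above.

```r
library(dplyr)

# Hypothetical toy data standing in for flights (not real records)
toy <- tibble(
  dep_time  = c(517, NA, 1230, 2359),
  arr_delay = c(11, NA, NA, -4)
)

# Same condition as above: keep rows where both columns are present
toy_nona <- toy %>%
  filter(!is.na(dep_time) & !is.na(arr_delay))

nrow(toy_nona)  # 2 rows survive (the first and the last)
```

A row is dropped as soon as either column is missing, so the third row goes even though it has a departure time.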

Using group_by() and summarise()

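# One row per aircraft (tailnum) and month
by_tailnum <- flights_nona %>%
  group_by(tailnum, month) %>%
  summarise(count = n(),                 # number of flights in the group
            deptime = mean(dep_time),    # mean of the hhmm-coded departure times
            delay = mean(arr_delay))     # mean arrival delay in minutes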
`summarise()` has grouped output by 'tailnum'. You can override using the
`.groups` argument.
# Keep aircraft-month groups with more than 20 flights, then January to March only
delay <- filter(by_tailnum, count > 20)
delay$month <- as.factor(delay$month)
delay2 <- delay %>%
  filter(month %in% c(1, 2, 3))
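
One caveat about deptime: dep_time is stored as an hhmm number, so a plain mean can land on values that are not real clock times. Below is a minimal sketch of one possible alternative, assuming a hypothetical helper hhmm_to_min() (my own name, not part of nycflights13) that converts hhmm to minutes since midnight before averaging. The two departure times are made up for illustration.

```r
library(dplyr)

# Hypothetical helper (not part of nycflights13):
# hhmm-coded time -> minutes since midnight
hhmm_to_min <- function(x) (x %/% 100) * 60 + (x %% 100)

# Two made-up departures, at 9:59 and 10:01
toy <- tibble(dep_time = c(959, 1001))

res <- toy %>%
  summarise(
    mean_hhmm = mean(dep_time),              # 980, which reads as "9:80"
    mean_min  = mean(hhmm_to_min(dep_time))  # 600 minutes, i.e. 10:00
  )

res
```

I did not use this conversion for the plot in this assignment, so its x-axis shows the plain mean of the hhmm values.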

Delay according to the departure times of the flights

ggplot(delay2, aes(deptime, delay, color = month)) + 
  geom_point(aes(size = count), alpha = 1/2) + 
  ggtitle("Delay of flights according to the time of departure") + 
  xlab("Departure time (hhmm)") + 
  ylab("Delay time (minutes)") + 
  geom_smooth() + 
  scale_size_area() 
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Description

In this visualization, we can see the average arrival delay plotted against the average departure time. Each point is one aircraft (tail number) in one month, sized by its number of flights. Notice that many of the aircraft departing between 10 am and 4 pm tend to have delays of up to 30 minutes. This is interesting considering that it is a very convenient time to fly, because one does not have to wake up very early or stay up late at night. I filtered the data frame to the first three months of the year (January through March) so that it is easier to visualize. I would also like to highlight that, while the period between 10 am and 4 pm has more delays, it also has a considerable concentration of flights that arrive earlier than scheduled (a negative delay).
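
A reading like this one could also be checked with numbers instead of by eye. The sketch below is only an illustration of the idea: the toy table is a made-up stand-in with the same deptime and delay columns as delay2, and the 1000 to 1600 window and the column names midday, share_early, and median_delay are my own choices.

```r
library(dplyr)

# Hypothetical stand-in for delay2: one row per aircraft and month (made-up values)
toy <- tibble(
  deptime = c(830, 1100, 1345, 1520, 1900),
  delay   = c(-5, 12, -8, 25, 40)
)

# Compare the 10 am to 4 pm window against the rest of the day
window_summary <- toy %>%
  mutate(midday = deptime >= 1000 & deptime < 1600) %>%
  group_by(midday) %>%
  summarise(
    n            = n(),              # points in the group
    share_early  = mean(delay < 0),  # share with a negative (early) average delay
    median_delay = median(delay)     # typical average delay in minutes
  )

window_summary
```

Running the same summary on delay2 instead of toy would give the actual shares for the plot above.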