NYC Flights Homework

Author

Kittim

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycflights13)
library(RColorBrewer)
data(flights)
glimpse(airlines)
Rows: 16
Columns: 2
$ carrier <chr> "9E", "AA", "AS", "B6", "DL", "EV", "F9", "FL", "HA", "MQ", "O…
$ name    <chr> "Endeavor Air Inc.", "American Airlines Inc.", "Alaska Airline…
glimpse(AirPassengers)
 Time-Series [1:144] from 1949 to 1961: 112 118 132 129 121 135 148 148 136 119 ...
AirPassengers
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1949 112 118 132 129 121 135 148 148 136 119 104 118
1950 115 126 141 135 125 149 170 170 158 133 114 140
1951 145 150 178 163 172 178 199 199 184 162 146 166
1952 171 180 193 181 183 218 230 242 209 191 172 194
1953 196 196 236 235 229 243 264 272 237 211 180 201
1954 204 188 235 227 234 264 302 293 259 229 203 229
1955 242 233 267 269 270 315 364 347 312 274 237 278
1956 284 277 317 313 318 374 413 405 355 306 271 306
1957 315 301 356 348 355 422 465 467 404 347 305 336
1958 340 318 362 348 363 435 491 505 404 359 310 337
1959 360 342 406 396 420 472 548 559 463 407 362 405
1960 417 391 419 461 472 535 622 606 508 461 390 432
airmiles
Time Series:
Start = 1937 
End = 1960 
Frequency = 1 
 [1]   412   480   683  1052  1385  1418  1634  2178  3362  5948  6109  5981
[13]  6753  8003 10566 12528 14760 16769 19819 22362 25340 25343 29269 30514
flights
# A tibble: 336,776 × 19
    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
 1  2013     1     1      517            515         2      830            819
 2  2013     1     1      533            529         4      850            830
 3  2013     1     1      542            540         2      923            850
 4  2013     1     1      544            545        -1     1004           1022
 5  2013     1     1      554            600        -6      812            837
 6  2013     1     1      554            558        -4      740            728
 7  2013     1     1      555            600        -5      913            854
 8  2013     1     1      557            600        -3      709            723
 9  2013     1     1      557            600        -3      838            846
10  2013     1     1      558            600        -2      753            745
# ℹ 336,766 more rows
# ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#   hour <dbl>, minute <dbl>, time_hour <dttm>

Removing the NA observations

flights_nona <- flights %>%
  filter(!is.na(origin), !is.na(tailnum), !is.na(dep_time))  # drop rows missing origin, tail number, or departure time
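Before dropping rows, it can help to see how many missing values each filtered column actually has, and how many rows the filter removes. The following is a minimal sketch of that check. The names `na_counts`, `flights_check`, and `rows_dropped` are illustrative and not part of the original analysis.

```r
library(dplyr)
library(nycflights13)

# Count missing values in each column used by the NA filter
na_counts <- flights %>%
  summarise(across(c(origin, tailnum, dep_time), ~ sum(is.na(.x))))
na_counts

# Apply the same conditions and compare row counts before and after
# (flights_check is a hypothetical name used only for this check)
flights_check <- flights %>%
  filter(!is.na(origin), !is.na(tailnum), !is.na(dep_time))

rows_dropped <- nrow(flights) - nrow(flights_check)
rows_dropped
```

If `rows_dropped` is small relative to the size of `flights`, the filter is unlikely to change the comparison between airports much.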

Grouping the data by origin and summarizing flights by count

This gives us a closer look at the airports in New York and the number of flights originating from each one in 2013.

by_origin <- flights_nona %>%
  group_by(origin) %>%  # group flights by departure airport
  summarise(count = n())
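The same per-airport tally can also be written with `dplyr::count()`, which combines the grouping and counting steps. This is a sketch of an equivalent alternative, not a replacement for the code above. The name `by_origin_alt` is illustrative, and `flights_nona` is rebuilt here only so the block runs on its own.

```r
library(dplyr)
library(nycflights13)

# Rebuild the NA-filtered data so this example is self-contained
flights_nona <- flights %>%
  filter(!is.na(origin), !is.na(tailnum), !is.na(dep_time))

# count() groups by origin and tallies rows in one step;
# sort = TRUE puts the airport with the most flights first
by_origin_alt <- flights_nona %>%
  count(origin, name = "count", sort = TRUE)

by_origin_alt
```

Because the counts are sorted in descending order, the first row shows the airport with the most departures.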

Creating a plot to show the number of flights originating from each of the three airports

ggplot(data = by_origin, aes(x = origin, y = count, fill = origin)) +
  geom_col(alpha = 0.7) +  # one bar per airport, height = number of flights
  labs(x = "origin", y = "count",
       title = "Flights departing from various airports in New York")

The plot shows the number of flights that originated from each of the three New York area airports in 2013. EWR has the highest count, which suggests it was the busiest of the three by number of departures. This volume may also imply that EWR serves a larger geographical area, although flight counts alone do not establish this.
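To put the comparison in numbers, each airport's share of all departures can be computed from the same counts. This is a minimal sketch under the same NA filter. The name `origin_share` and the column `airport_name` are illustrative, and the join assumes the `airports` table in `nycflights13` has matching `faa` codes for the three origins.

```r
library(dplyr)
library(nycflights13)

origin_share <- flights %>%
  filter(!is.na(origin), !is.na(tailnum), !is.na(dep_time)) %>%
  count(origin, name = "count") %>%
  # share of all departures accounted for by each airport
  mutate(share = count / sum(count)) %>%
  # attach the full airport name from the airports lookup table
  left_join(select(airports, faa, airport_name = name),
            by = c("origin" = "faa")) %>%
  arrange(desc(count))

origin_share
```

The shares show how large the gap between airports is relative to the total, which the bar heights alone only hint at. They describe departure volume only and say nothing about the geographical area an airport serves.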