library(tidyverse)
library(nycflights13)
library(psych)
head(flights)
## # A tibble: 6 × 19
## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
## <int> <int> <int> <int> <int> <dbl> <int> <int>
## 1 2013 1 1 517 515 2 830 819
## 2 2013 1 1 533 529 4 850 830
## 3 2013 1 1 542 540 2 923 850
## 4 2013 1 1 544 545 -1 1004 1022
## 5 2013 1 1 554 600 -6 812 837
## 6 2013 1 1 554 558 -4 740 728
## # ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
## # tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
## # hour <dbl>, minute <dbl>, time_hour <dttm>
jan1 <- flights |>
  filter(month == 1 & day == 1)
jan1
## # A tibble: 842 × 19
## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
## <int> <int> <int> <int> <int> <dbl> <int> <int>
## 1 2013 1 1 517 515 2 830 819
## 2 2013 1 1 533 529 4 850 830
## 3 2013 1 1 542 540 2 923 850
## 4 2013 1 1 544 545 -1 1004 1022
## 5 2013 1 1 554 600 -6 812 837
## 6 2013 1 1 554 558 -4 740 728
## 7 2013 1 1 555 600 -5 913 854
## 8 2013 1 1 557 600 -3 709 723
## 9 2013 1 1 557 600 -3 838 846
## 10 2013 1 1 558 600 -2 753 745
## # ℹ 832 more rows
## # ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
## # tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
## # hour <dbl>, minute <dbl>, time_hour <dttm>
jan1 <- flights |>
  filter(month == 1 & day == 1)
jan1 |>
  ggplot() +
  geom_point(aes(x = sched_dep_time, y = dep_delay, color = origin)) +
  xlab("Scheduled Departure Time") +
  ylab("Departure Delay") +
  ggtitle("Scatterplot of Departure Delays for Each Departure Time on January 1, 2013")
## Warning: Removed 4 rows containing missing values (`geom_point()`).
jan1 <- flights |>
  filter(month == 1 & day == 1)
jan1 |>
  ggplot() +
  geom_smooth(aes(x = sched_dep_time, y = dep_delay, color = origin)) +
  xlab("Scheduled Departure Time") +
  ylab("Departure Delay") +
  ggtitle("Smoothed Departure Delays for Each Departure Time on January 1, 2013")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 4 rows containing non-finite values (`stat_smooth()`).
Your assignment is to create one plot to visualize one aspect of this dataset. The plot may be any type we have covered so far in this class (bar graphs, scatterplots, boxplots, histograms, treemaps, heatmaps, streamgraphs, or alluvials).
This scatterplot shows a subset of the flights dataset: the scheduled departure time and departure delay for all domestic flights that departed from the New York City metropolitan area on January 1, 2013. The legend shows three colors, one for each of the three origin airports: EWR (red, Newark Liberty International Airport), JFK (green, John F. Kennedy International Airport), and LGA (blue, LaGuardia Airport). The plot also includes labels for the x-axis (“Scheduled Departure Time”), the y-axis (“Departure Delay”), and a title (“Scatterplot of Departure Delays for Each Departure Time on January 1, 2013”). The larger departure delays appear to cluster between 3:00 and 6:00 pm (1500 to 1800 on the x-axis, since sched_dep_time is stored in HHMM format). Judging from the scatterplot, and especially from the geom_smooth curves, EWR shows the greatest delays and LGA the smallest.
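The visual impressions above (which origin has the largest delays, and which part of the day they fall in) could be checked numerically. The following is a minimal sketch, not part of the original assignment: the names by_origin and by_hour are introduced here for illustration, and it assumes the hour column holds the scheduled departure hour, as described in the nycflights13 documentation. No results are asserted; the tables simply give numbers to compare against the plots.

```r
library(dplyr)
library(nycflights13)

# Same subset as in the plots above: all flights on January 1, 2013
jan1 <- flights |>
  filter(month == 1 & day == 1)

# Delay summary per origin airport (by_origin is an illustrative name).
# Rows with a missing dep_delay are counted separately and then
# excluded from the mean and median via na.rm = TRUE.
by_origin <- jan1 |>
  group_by(origin) |>
  summarise(
    n_flights    = n(),
    n_missing    = sum(is.na(dep_delay)),
    mean_delay   = mean(dep_delay, na.rm = TRUE),
    median_delay = median(dep_delay, na.rm = TRUE)
  )
by_origin

# Mean delay per scheduled departure hour, largest first
# (by_hour is an illustrative name).
by_hour <- jan1 |>
  group_by(hour) |>
  summarise(
    n_flights  = n(),
    mean_delay = mean(dep_delay, na.rm = TRUE)
  ) |>
  arrange(desc(mean_delay))
by_hour
```

If the reading of the plots is right, EWR should have the largest mean_delay in by_origin, and hours 15 through 18 should sit near the top of by_hour. The median is included because a few very long delays can pull the mean upward.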