HW 7, Michael Simms

Load the libraries and view the “flights” dataset

library(tidyverse)
library(nycflights13)
library(psych)
head(flights)
## # A tibble: 6 × 19
##    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
##   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
## 1  2013     1     1      517            515         2      830            819
## 2  2013     1     1      533            529         4      850            830
## 3  2013     1     1      542            540         2      923            850
## 4  2013     1     1      544            545        -1     1004           1022
## 5  2013     1     1      554            600        -6      812            837
## 6  2013     1     1      554            558        -4      740            728
## # ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
## #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
## #   hour <dbl>, minute <dbl>, time_hour <dttm>

Now create one data visualization with this dataset

jan1 <- flights |>
  filter(month == 1 & day == 1)
jan1 
## # A tibble: 842 × 19
##     year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
##    <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
##  1  2013     1     1      517            515         2      830            819
##  2  2013     1     1      533            529         4      850            830
##  3  2013     1     1      542            540         2      923            850
##  4  2013     1     1      544            545        -1     1004           1022
##  5  2013     1     1      554            600        -6      812            837
##  6  2013     1     1      554            558        -4      740            728
##  7  2013     1     1      555            600        -5      913            854
##  8  2013     1     1      557            600        -3      709            723
##  9  2013     1     1      557            600        -3      838            846
## 10  2013     1     1      558            600        -2      753            745
## # ℹ 832 more rows
## # ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
## #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
## #   hour <dbl>, minute <dbl>, time_hour <dttm>
jan1 |>
  ggplot() +
  geom_point(aes(x = sched_dep_time, y = dep_delay, color = origin)) +
  xlab("Scheduled Departure Time") +
  ylab("Departure Delay") +
  ggtitle("Scatterplot of Departure Delays for Each Departure Time on January 1, 2013")
## Warning: Removed 4 rows containing missing values (`geom_point()`).
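
The warning above means four of the January 1 rows have no value to plot. A minimal sketch for inspecting them, assuming the missing variable is dep_delay (sched_dep_time is always recorded); the name missing_delay is only illustrative:

```r
library(dplyr)
library(nycflights13)

# Same January 1 subset used for the plots
jan1 <- flights |>
  filter(month == 1, day == 1)

# Rows with no recorded departure delay; these are the points ggplot2 drops
missing_delay <- jan1 |>
  filter(is.na(dep_delay)) |>
  select(origin, carrier, flight, sched_dep_time, dep_time, dep_delay)

missing_delay
```

If dep_time is also missing in these rows, the flights most likely never departed, so dropping them from a delay plot seems reasonable.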

jan1 |>
  ggplot() +
  geom_smooth(aes(x = sched_dep_time, y = dep_delay, color = origin)) +
  xlab("Scheduled Departure Time") +
  ylab("Departure Delay") +
  ggtitle("Smoothed Departure Delays by Scheduled Departure Time on January 1, 2013")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 4 rows containing non-finite values (`stat_smooth()`).
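
Because sched_dep_time is stored as an HHMM integer (for example, 1530 means 3:30 pm), the x-axis jumps from 1559 to 1600 and is harder to read as clock time. One possible variation, sketched here with the dataset's hour and minute columns, plots the same data against decimal hours. The column name sched_dep_hour, the axis breaks, and the label wording are assumptions for illustration, not part of the original plot:

```r
library(dplyr)
library(ggplot2)
library(nycflights13)

# Convert the scheduled departure time to decimal hours (e.g. 15.5 = 3:30 pm)
jan1_hours <- flights |>
  filter(month == 1, day == 1) |>
  mutate(sched_dep_hour = hour + minute / 60)

p <- ggplot(jan1_hours, aes(x = sched_dep_hour, y = dep_delay, color = origin)) +
  geom_point(na.rm = TRUE) +  # na.rm = TRUE drops the missing delays without a warning
  scale_x_continuous(breaks = seq(0, 24, by = 3)) +
  labs(
    x = "Scheduled Departure Time (hour of day)",
    y = "Departure Delay (minutes)",
    color = "Origin",
    title = "Departure Delays by Scheduled Departure Hour on January 1, 2013"
  )

p
```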

Your assignment is to create one plot to visualize one aspect of this dataset. The plot may be any type we have covered so far in this class (bar graphs, scatterplots, boxplots, histograms, treemaps, heatmaps, streamgraphs, or alluvials).

Requirements for the plot:

  1. Include at least one dplyr command (filter, arrange, summarize, group_by, select, mutate, ...)
  2. Include labels for the x- and y-axes
  3. Include a title
  4. Your plot must incorporate at least 2 colors
  5. Include a legend that indicates what the colors represent
  6. Write a brief paragraph that describes the visualization you have created and at least one aspect of the plot that you would like to highlight.

Paragraph Summary

This scatterplot shows a subset of the flights dataset: the scheduled departure time and departure delay for every domestic flight that departed from the New York City metropolitan area on January 1, 2013. The legend maps one color to each of the three origins: EWR (red, Newark Liberty International Airport), JFK (green, John F. Kennedy International Airport), and LGA (blue, LaGuardia Airport). The plot also includes labels for the x-axis (“Scheduled Departure Time”) and y-axis (“Departure Delay”), and a title (“Scatterplot of Departure Delays for Each Departure Time on January 1, 2013”). Scheduled departure times are in 24-hour HHMM format and delays are in minutes. The larger departure delays appear to cluster between about 3:00 and 6:00 pm (1500 to 1800 on the x-axis). The geom_smooth version of the plot supports this reading and suggests that EWR had the greatest delays that day and LGA the smallest.
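
The two observations in the summary come from reading the plots by eye. As a rough check, they could be compared against grouped summaries such as the sketch below. The object names delay_by_origin and delay_by_hour are illustrative, and the mean is only one of several reasonable ways to measure delay:

```r
library(dplyr)
library(nycflights13)

jan1 <- flights |>
  filter(month == 1, day == 1)

# Average and median departure delay (minutes) per origin airport
delay_by_origin <- jan1 |>
  group_by(origin) |>
  summarize(
    n_flights = n(),
    mean_delay = mean(dep_delay, na.rm = TRUE),
    median_delay = median(dep_delay, na.rm = TRUE)
  ) |>
  arrange(desc(mean_delay))

# Average departure delay per scheduled departure hour
delay_by_hour <- jan1 |>
  group_by(hour) |>
  summarize(
    n_flights = n(),
    mean_delay = mean(dep_delay, na.rm = TRUE)
  ) |>
  arrange(hour)

delay_by_origin
delay_by_hour
```

If the reading of the plots holds, EWR should sit at the top of delay_by_origin and the rows for hours 15 through 18 should show the largest values in delay_by_hour.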