Tutorial HW

Author

Allan Maino Vieytes

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycflights13)
library(viridis)
Loading required package: viridisLite
data(flights)
head(flights)
# A tibble: 6 × 19
   year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
  <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
1  2013     1     1      517            515         2      830            819
2  2013     1     1      533            529         4      850            830
3  2013     1     1      542            540         2      923            850
4  2013     1     1      544            545        -1     1004           1022
5  2013     1     1      554            600        -6      812            837
6  2013     1     1      554            558        -4      740            728
# ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#   hour <dbl>, minute <dbl>, time_hour <dttm>
# x=Distance,y=FT,Bubble= Total delay time,Bubble Color= Carrier 

# Include at least one dplyr command (filter, sort, summarize, group_by, select, mutate, ….)
# Include labels for the x- and y-axes
# Include a title and caption for the data source
# Your plot must incorporate at least 2 colors
# Include a legend that indicates what the colors represent
# Write a brief paragraph that describes the visualization you have created and at least one aspect of the plot that you would like to highlight.

sort( table( flights$carrier) )  # Frequency table: Most common carriers

   OO    HA    YV    F9    AS    FL    VX    WN    9E    US    MQ    AA    DL 
   32   342   601   685   714  3260  5162 12275 18460 20536 26397 32729 48110 
   EV    B6    UA 
54173 54635 58665 
flights.2 <- flights %>% # piped flights dataset to mutate function
  mutate( total.delay.time = dep_delay + arr_delay ) %>% # mutated flights dataset to create total.delay.time with the two delay variables
  select( total.delay.time, carrier, distance, air_time ) %>% # selected the variables needed for the plot
  filter( carrier %in% c("UA", "EV", "B6") ) # filtered the carrier column to keep only the three most common carriers
ggplot( data = flights.2, mapping = aes( y = total.delay.time, x = air_time, color = carrier ) ) +
  geom_point() + # plotted the points
  theme_bw() +  # applied the black-and-white theme
  scale_color_manual( values = c( "red", "purple","khaki" ), # Changed legend colors and labels
                      breaks = c("UA", "B6", "EV"), 
                      labels = c("United", "JetBlue", "ExpressJet") ) +
  labs( caption = "Source: RITA, Bureau of Transportation Statistics, https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236", 
        title = "Total Delay Time vs. Total Air Time for Selected Airlines" ) + # Adds caption and title
  ylab( "Total Delay Time (MIN)" ) + # changes y axis label
  xlab( "Total Air Time (MIN)" ) # changes x axis label
Warning: Removed 4534 rows containing missing values (`geom_point()`).
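The warning above comes from flights where a delay or the air time is missing (for example, cancelled flights), so `geom_point()` silently drops them. A minimal sketch of making that step explicit before plotting is shown here; the names `flights.3` and `n.dropped` are hypothetical and not part of the original analysis.

```r
library(dplyr)
library(nycflights13)

top.carriers <- c("UA", "EV", "B6")

# Same pipeline as above, but rows with missing values are removed explicitly
flights.3 <- flights %>%
  filter(carrier %in% top.carriers) %>%
  mutate(total.delay.time = dep_delay + arr_delay) %>%
  filter(!is.na(total.delay.time), !is.na(air_time)) %>%
  select(total.delay.time, carrier, distance, air_time)

# How many rows were dropped because of missing values
n.dropped <- sum(flights$carrier %in% top.carriers) - nrow(flights.3)
n.dropped
```

Passing `flights.3` to `ggplot()` instead of `flights.2` should draw the same points without the warning, and `n.dropped` can be compared against the count the warning reports.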

My visualization illustrates the relationship between total delay time and air time for three selected airlines: United (UA), JetBlue (B6), and ExpressJet (EV). Each point on the scatter plot represents one flight, with air time on the x-axis and total delay time (departure delay plus arrival delay) on the y-axis, both in minutes. Points are color-coded by airline. I would have preferred a bubble plot or a facet plot, but I did not find the results satisfactory. Although no firm conclusion can be drawn from the visualization, there appears to be a slight uptick in delay times for shorter flights. This may simply reflect the larger number of flights with air times between 0 and 400 minutes, which gives extreme delays more chances to appear. Overall, I would like to explore the dataset further and create more distinctive visualizations.
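One way to check whether the apparent uptick for shorter flights is just a matter of there being more short flights is to bin air time and compare the flight count with a typical delay in each bin. The sketch below is only an illustration: the 100-minute bin width and the names `delay.by.bin`, `air.time.bin`, `n.flights`, and `median.delay` are assumptions, not part of the original analysis.

```r
library(dplyr)
library(nycflights13)

delay.by.bin <- flights %>%
  filter(carrier %in% c("UA", "EV", "B6")) %>%
  mutate(total.delay.time = dep_delay + arr_delay) %>%
  filter(!is.na(total.delay.time), !is.na(air_time)) %>%
  # Group air time into 100-minute bins: (0,100], (100,200], ...
  mutate(air.time.bin = cut(air_time, breaks = seq(0, 700, by = 100))) %>%
  group_by(carrier, air.time.bin) %>%
  summarize(n.flights = n(),
            median.delay = median(total.delay.time),
            .groups = "drop")

delay.by.bin
```

If the median delay stays roughly flat across bins while `n.flights` is much larger for the short bins, that would support the idea that the taller column of points on the left of the plot is mostly a sample-size effect.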