# x = Distance, y = FT, Bubble = Total delay time, Bubble Color = Carrier
# Include at least one dplyr command (filter, sort, summarize, group_by, select, mutate, …)
# Include labels for the x- and y-axes
# Include a title and caption for the data source
# Your plot must incorporate at least 2 colors
# Include a legend that indicates what the colors represent
# Write a brief paragraph that describes the visualization you have created and at least one aspect of the plot that you would like to highlight.

library(dplyr)         # mutate(), select(), filter(), and the %>% pipe
library(ggplot2)       # ggplot(), geom_point(), labs(), ...
library(nycflights13)  # assumed source of the flights dataset

sort( table( flights$carrier ) ) # Frequency table: Most common carriers
   OO    HA    YV    F9    AS    FL    VX    WN    9E    US    MQ    AA    DL
   32   342   601   685   714  3260  5162 12275 18460 20536 26397 32729 48110
   EV    B6    UA
54173 54635 58665

# piped flights dataset to mutate function
flights.2 <- flights %>%
  # mutated flights dataset to create total.delay.time with the two delay variables
  mutate( total.delay.time = dep_delay + arr_delay ) %>%
  # selected the variables needed for the plot
  select( total.delay.time, carrier, distance, air_time ) %>%
  # filtered the carrier column to only contain the top three most common carriers
  filter( carrier %in% c("UA", "EV", "B6") )
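
The three carriers above were read off the frequency table and typed in by hand. As a minimal sketch, the same filter could be derived from the data instead, so the pipeline stays correct if the dataset changes. This assumes `flights` comes from the `nycflights13` package and that dplyr 1.0.0 or later is installed (for `slice_max()`); the names `top.carriers` and `n.flights` are introduced here for illustration only.

```r
library(dplyr)
library(nycflights13)  # assumption: flights is the nycflights13 dataset

# Count flights per carrier and keep the three carriers with the most flights
top.carriers <- flights %>%
  count(carrier, name = "n.flights") %>%
  slice_max(n.flights, n = 3) %>%
  pull(carrier)

# Same pipeline as before, but filtering on the computed vector
flights.2 <- flights %>%
  mutate(total.delay.time = dep_delay + arr_delay) %>%
  select(total.delay.time, carrier, distance, air_time) %>%
  filter(carrier %in% top.carriers)
```

Note that `slice_max()` keeps ties by default, so it can return more than three carriers if two of them have exactly the same count.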

ggplot( data = flights.2, mapping = aes( y = total.delay.time, x = air_time, color = carrier ) ) +
  geom_point() + # plotted the points
  theme_bw() + # added a white theme
  scale_color_manual( values = c( "red", "purple", "khaki" ), # changed legend colors and labels
                      breaks = c("UA", "B6", "EV"),
                      labels = c("United", "JetBlue", "ExpressJet") ) +
  labs( caption = "Source: RITA, Bureau of Transportation Statistics, https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236",
        title = "Total Delay Time vs. Total Air Time for Selected Airlines" ) + # adds caption and title
  ylab( "Total Delay Time (MIN)" ) + # changes y axis label
  xlab( "Total Air Time (MIN)" ) # changes x axis label

My visualization illustrates the relationship between total delay time (departure delay plus arrival delay) and total air time for three selected airlines - United (UA), JetBlue (B6), and ExpressJet (EV). Each point on the scatter plot represents a flight, with the x-axis showing the total air time and the y-axis showing the total delay time, both in minutes. Each point is color-coded by airline. I would have preferred to do a bubble plot or a facet plot, but I did not find the results satisfactory. Although no firm conclusion can be drawn from the visualization, there is a slight uptick in delay times for shorter flights. This may simply be because there are more flights with air times between 0 and 400 minutes, so more extreme delays show up in that range. Overall, I would like to explore the dataset further and create more distinctive visualizations.
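
One way to check whether the uptick for shorter flights is just an effect of flight volume is to bin air time and compare flight counts against delay summaries per bin. The following is a rough sketch, not part of the original analysis: it assumes `flights` is the `nycflights13` dataset, and the 100-minute bin width and the names `delay.by.bin`, `air.time.bin`, `n.flights`, `median.delay`, and `p95.delay` are arbitrary choices made for illustration.

```r
library(dplyr)
library(nycflights13)  # assumption: flights is the nycflights13 dataset

delay.by.bin <- flights %>%
  filter(carrier %in% c("UA", "EV", "B6")) %>%
  mutate(total.delay.time = dep_delay + arr_delay) %>%
  # drop flights with missing delay or air time (e.g. cancelled flights)
  filter(!is.na(total.delay.time), !is.na(air_time)) %>%
  # 100-minute bins; the upper limit is rounded up to cover the longest flight
  mutate(air.time.bin = cut(
    air_time,
    breaks = seq(0, ceiling(max(air_time) / 100) * 100, by = 100)
  )) %>%
  group_by(air.time.bin) %>%
  summarize(
    n.flights    = n(),
    median.delay = median(total.delay.time),
    p95.delay    = quantile(total.delay.time, 0.95, names = FALSE),
    .groups = "drop"
  )

delay.by.bin
```

If the median and 95th-percentile delays are similar across bins while `n.flights` is much larger for the short bins, that would support the idea that the uptick comes from the number of flights and not from longer delays on short routes.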