Build a scatter plot that shows the relationship between the month and the number of flights taken
# Scatter plot of monthly flight counts; point color comes from the `color` column
ggplot(flights_by_month,
       aes(x = factor(month, levels = 1:12, labels = month.abb),  # month.abb is R's built-in "Jan".."Dec"
           y = flights)) +
  geom_point(aes(color = color)) +
  labs(x = "Months",
       y = "Number of Flights",
       title = "Number of Flights by Month (2023)",
       caption = "RITA, Bureau of Transportation Statistics, https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236") +
  # The names here must match the values stored in the `color` column
  scale_color_manual(values = c("> 37000" = "red", "< 37000" = "purple"))
The scatter plot shows the number of flights taken from New York’s three largest airports in each month of 2023. Purple points mark months with fewer than 37,000 flights, and red points mark months with more than 37,000. Before making this visualization, I expected June, July, August, and maybe December to have the most flights, since people are typically on vacation during those months; however, the busiest months turned out to be March, April, and May. That surprised me, which is why I highlighted those three months in red. With help from ChatGPT, I used the mutate function to add the color column that the plot maps to point color.
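The mutate step itself is not shown above, so here is a minimal sketch of how the color column could be built. It assumes flights_by_month has a month column and a flights column, and it uses dplyr's if_else, which may differ from the code I actually ran. The flight counts below are made up for illustration and are not the real 2023 figures.

```r
library(dplyr)

# Hypothetical stand-in for flights_by_month; these counts are invented
# for illustration only.
flights_by_month <- data.frame(
  month   = 1:12,
  flights = c(36000, 34500, 39000, 37500, 38600, 35900,
              36500, 36800, 35000, 36900, 34000, 36200)
)

# Label each month by whether it is above the 37,000-flight threshold.
# The two labels must match the names passed to scale_color_manual().
flights_by_month <- flights_by_month %>%
  mutate(color = if_else(flights > 37000, "> 37000", "< 37000"))
```

With this version, a month with exactly 37,000 flights would fall into the "< 37000" group, so the labels are only exact if no month sits right on the threshold.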