Creating the scatter plot showing distance vs. arrival delay by airline
p1 <- flights_clean |>
  ggplot(aes(x = distance, y = arr_delay, color = name)) +
  geom_point(alpha = 0.5) +
  labs(
    x = "Flight Distance (miles)",
    y = "Arrival Delay (minutes)",
    title = "Relationship Between Flight Distance and Arrival Delay",
    color = "Airline",
    caption = "Data Source: nycflights23 dataset"
  ) +
  scale_color_manual(values = c(
    "American Airlines Inc." = "blue",
    "Delta Air Lines Inc." = "red",
    "United Air Lines Inc." = "green",
    "JetBlue Airways" = "purple",
    "Southwest Airlines Co." = "orange"
  )) +
  theme_minimal()
p1
Description of visualization
I created a scatter plot of the relationship between flight distance and arrival delay for flights departing from New York City airports. Each point represents one flight. The x-axis shows flight distance in miles and the y-axis shows arrival delay in minutes. Color distinguishes the airlines, which makes them easy to compare.
This visualization helps examine whether longer flights experience greater arrival delays. Most observations lie close to the horizontal band around zero delay, which indicates that many flights arrive near their scheduled time. Some flights show large delays and appear higher on the plot.
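The visual impression described above can also be checked numerically. The following is a minimal sketch, assuming flights_clean has the numeric columns distance and arr_delay used in the plot. The small toy_flights data frame and the helper summarise_delay_pattern() are hypothetical and only stand in for the real data so that the block runs on its own. The 15-minute window for "near schedule" is an arbitrary choice, not something taken from the analysis.

```r
# Hypothetical stand-in for flights_clean; values are invented for illustration only.
toy_flights <- data.frame(
  distance  = c(200, 500, 750, 1000, 1400, 2500),
  arr_delay = c(-5, 10, 0, 120, -12, 8)
)

# Hypothetical helper: summarises how arrival delay relates to distance.
summarise_delay_pattern <- function(df, window = 15) {
  # Keep only rows where both variables are present.
  ok <- complete.cases(df[, c("distance", "arr_delay")])
  d <- df[ok, ]
  list(
    # Pearson correlation between distance and arrival delay.
    correlation = cor(d$distance, d$arr_delay),
    # Share of flights arriving within `window` minutes of schedule.
    share_near_schedule = mean(abs(d$arr_delay) <= window)
  )
}

result <- summarise_delay_pattern(toy_flights)
print(result)
```

With the real data, the call would be summarise_delay_pattern(flights_clean). A correlation close to zero would suggest that longer flights are not systematically more delayed, while a high share near schedule would match the dense band of points around zero delay.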
I used theme_minimal() from ggplot2. This theme removes unnecessary background elements and keeps the visualization simple, which improves readability. The scatter plot code also maps color to airline, which produces a clear legend and helps viewers compare delays quickly. Plotting every flight as an individual point lets the viewer see both overall trends and unusual flights that stand apart from the main distribution of delays.