library(ggplot2)

# Horizontal bar chart: one bar per airline; bar length and fill both encode average distance
ggplot(filtered_airlines, aes(x = avg_dist, y = name)) +
  geom_col(aes(fill = avg_dist)) +
  scale_fill_continuous() +
  labs(
    x = "Average Distance in miles",
    y = "Airlines",
    title = "Side-by-Side Comparison of Distance by Airline",
    caption = "New York flights package data"
  )
My data visualization is a side-by-side comparison of average distance for 4 different airlines. Limiting the plot to 4 airlines matters because the flights 3 dataset originally contains around 14 airlines, so I filtered flights 3 into the filtered_airlines dataset, which includes only 4 of them. Because the first airline in flights 3 was a major outlier, I sliced rows two through five rather than one through four, which leaves values that are closer to one another. The dplyr slice command made my graph much easier to view: without it, the y-axis would be so crowded with airlines that it would be very difficult to read each one individually. Additionally, I added labels to the x and y axes, a title, and a caption naming the data source. Finally, I mapped the fill color to average distance, so the bars are shaded along a gradient between two colors, and the plot includes a legend that shows what each color represents in terms of distance in miles.
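The slicing step described above can be sketched roughly as follows. This is a minimal illustration, not the original code: it assumes flights 3 is stored in a data frame called flights3 with one row per airline and the columns name and avg_dist. The airline names and distances are invented stand-ins so the example runs on its own.

```r
library(dplyr)

# Hypothetical stand-in for the flights 3 data: one row per airline.
# The names and distances are invented for illustration only.
flights3 <- data.frame(
  name     = c("Airline A", "Airline B", "Airline C",
               "Airline D", "Airline E", "Airline F"),
  avg_dist = c(4983, 1400, 1350, 1200, 1100, 600)
)

# Keep rows 2 through 5, which skips the outlier in row 1
# and leaves four airlines with comparable distances.
filtered_airlines <- flights3 %>%
  slice(2:5)

print(filtered_airlines)
```

Using slice(2:5) instead of slice(1:4) keeps the number of airlines at four while dropping the first row, so the color gradient and the x-axis are not stretched by a single extreme value.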