library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.0 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.1 ✔ tibble 3.1.8
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycflights13)
library(viridis)
## Loading required package: viridisLite
data("flights")
head(flights)
## # A tibble: 6 × 19
## year month day dep_time sched_dep…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
## <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr>
## 1 2013 1 1 517 515 2 830 819 11 UA
## 2 2013 1 1 533 529 4 850 830 20 UA
## 3 2013 1 1 542 540 2 923 850 33 AA
## 4 2013 1 1 544 545 -1 1004 1022 -18 B6
## 5 2013 1 1 554 600 -6 812 837 -25 DL
## 6 2013 1 1 554 558 -4 740 728 12 UA
## # … with 9 more variables: flight <int>, tailnum <chr>, origin <chr>,
## # dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>,
## # time_hour <dttm>, and abbreviated variable names ¹sched_dep_time,
## # ²dep_delay, ³arr_time, ⁴sched_arr_time, ⁵arr_delay
visits_by_dest <- flights %>%
  group_by(dest) %>%
  summarize(visits = n())
visits_by_dest <- visits_by_dest %>%
  arrange(desc(visits)) %>%
  head(10)
JFKplot <- visits_by_dest %>%
  # convert dest to a factor whose levels run from most to fewest visits
  mutate(dest = factor(dest, levels = dest[order(visits, decreasing = TRUE)])) %>%
  ggplot(aes(x = visits, y = dest, fill = dest)) +
  geom_bar(stat = "identity") +
  # order the bars from fewest visits to most
  scale_y_discrete(limits = rev(visits_by_dest$dest)) +
  labs(title = "Top 10 Destinations from NY in 2013",
       x = "Number of Visits",
       y = "Destination",
       fill = "Destination") +
  coord_flip() # flip the chart so destinations run along the horizontal axis
JFKplot
theme_update(plot.title = element_text(hjust = 0.5))
dest_colors <- viridis_pal()(length(unique(visits_by_dest$dest)))
names(dest_colors) <- unique(visits_by_dest$dest)
JFKplot <- visits_by_dest %>%
  mutate(dest = factor(dest, levels = dest[order(visits, decreasing = TRUE)])) %>%
  ggplot(aes(x = visits, y = dest, fill = dest)) +
  geom_bar(stat = "identity") +
  scale_y_discrete(limits = rev(visits_by_dest$dest)) +
  labs(title = "Top 10 Destinations from NY in 2013",
       x = "Number of Visits",
       y = "Destination",
       fill = "Destination") +
  coord_flip() +
  scale_fill_manual(values = dest_colors)
JFKplot
jfk_dest <- flights %>%
  filter(origin == "JFK") %>%
  count(dest) %>%
  arrange(desc(n)) %>%
  head(20)
library(treemap)
treemap(jfk_dest, index = "dest", vSize = "n",
        vColor = "n", type = "manual",
        title = "Top 20 Destinations from JFK Airport",
        palette = "RdYlBu")
The visualization I created for this assignment is a bar graph of the top 10 destinations from NY in 2013. While bar graphs are simple, they are easy to understand and represent this kind of count data well. A few notable aspects of the plot are the colors, the labels, and the bar order. Rather than keeping the default colors shown in the first version of the plot, I used the viridis palette, which makes the chart more visually appealing. I also centered the title to make the graph look symmetrical, and I rearranged the order of the destinations on the x-axis and in the legend so that they are listed from least traveled to most traveled.
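As a possible alternative, the same ordering and styling could be sketched with `fct_reorder()` from forcats and `geom_col()`, which avoids setting the axis limits by hand. The names `top_dest` and `alt_plot` are hypothetical, and this is only one way to get a similar chart under the assumption that the tidyverse and nycflights13 packages are installed. It is not the code used for the plot above.

```r
library(dplyr)
library(forcats)
library(ggplot2)
library(nycflights13)

# Top 10 destinations across all three NYC origin airports
top_dest <- flights %>%
  count(dest, name = "visits") %>%
  slice_max(visits, n = 10, with_ties = FALSE) %>%
  # Order the factor levels by visits (ascending) so bars run least to most
  mutate(dest = fct_reorder(dest, visits))

alt_plot <- ggplot(top_dest, aes(x = dest, y = visits, fill = dest)) +
  geom_col() +                      # same as geom_bar(stat = "identity")
  scale_fill_viridis_d() +          # discrete viridis palette built into ggplot2
  labs(title = "Top 10 Destinations from NY in 2013",
       x = "Destination",
       y = "Number of Visits",
       fill = "Destination") +
  theme(plot.title = element_text(hjust = 0.5))  # center the title for this plot only

alt_plot
```

Because the ordering lives in the factor levels here, the axis and the legend both follow the same least-to-most order, and the centered title applies only to this plot instead of changing the global theme.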
In addition, I created a treemap of the top 20 destinations from JFK airport. The treemap is visually pleasing and straightforward, and I originally planned to choose it over the bar graph. However, a treemap has no x- or y-axis, so it would not fulfill the requirements for the assignment.