This assignment is to select a Tidy Tuesday posting and use the data to demonstrate some of the methods of visualization you learned in the Datacamp course on best practices.
Which Tidy Tuesday item did you select (date)? What is it about? Which variables are you using for your demonstration? You will need a quantitative variable and a categorical variable.
Place your answer here.
I chose the Tidy Tuesday of 2021-01-05, Transit Cost Project. https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-01-05/readme.md
This dataset has information about transit system construction projects, such as cost, length, and start year. The data dictionary is below. I chose the number of stations as my quantitative variable and year as my categorical variable.
Load the libraries you need and the data from Tidy Tuesday.
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.0.6 v dplyr 1.0.4
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
transit_cost <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-01-05/transit_cost.csv')
##
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_character(),
## e = col_double(),
## rr = col_double(),
## length = col_double(),
## tunnel = col_double(),
## stations = col_double(),
## cost = col_double(),
## year = col_double(),
## ppp_rate = col_double(),
## cost_km_millions = col_double()
## )
## i Use `spec()` for the full column specifications.
transit_cost <- transit_cost %>%
  filter(start_year > 2016) # keep only recent projects
transit_cost$year <- as.factor(transit_cost$year) # treat year as a categorical variable
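One caveat on the filter above: the column specification lists `start_year` under the default `col_character()`, so `start_year > 2016` is evaluated as a text comparison rather than a numeric one. A minimal sketch of a numeric alternative follows, using a small made-up vector (the values in `start_year_toy` are hypothetical and not taken from the dataset). Any entry that is not a plain year becomes `NA` and is dropped.

```r
# Hypothetical toy values standing in for a character start_year column.
start_year_toy <- c("2015", "2017", "2020", "not a year", NA)

# Convert to numeric; text that is not a number becomes NA (warning suppressed).
start_year_num <- suppressWarnings(as.numeric(start_year_toy))

# Keep only entries that parsed as a year and fall after 2016.
keep <- !is.na(start_year_num) & start_year_num > 2016
recent_years <- start_year_toy[keep]
print(recent_years)
```

With the real data, the same idea could be written as `filter(as.numeric(start_year) > 2016)`. Whether it changes the rows kept depends on what non-year entries the column contains.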
Create a histogram for your quantitative variable. Display three alternative choices of bin width.
# Place your code here.
ggplot(transit_cost, aes(x = stations)) + geom_histogram(binwidth = 3, color = "white") + labs(title = "Distribution of Stations Variable") # binwidth = 3
## Warning: Removed 1 rows containing non-finite values (stat_bin).
ggplot(transit_cost, aes(x = stations)) + geom_histogram(binwidth = 6, color = "white") + labs(title = "Distribution of Stations Variable") # binwidth = 6
## Warning: Removed 1 rows containing non-finite values (stat_bin).
ggplot(transit_cost, aes(x = stations)) + geom_histogram(binwidth = 10, color = "white") + labs(title = "Distribution of Stations Variable") # binwidth = 10
## Warning: Removed 1 rows containing non-finite values (stat_bin).
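The three histograms above differ only in `binwidth`, so they can also be generated in a loop. The sketch below is one way to do that, assuming a small made-up data frame (`toy` is hypothetical and stands in for `transit_cost`). It also drops the missing value first, which avoids the `stat_bin` warning, and records the bin width in each title so the plots can be told apart.

```r
library(ggplot2)

# Hypothetical toy data standing in for the stations column (one NA on purpose).
toy <- data.frame(stations = c(2, 3, 5, 8, 8, 10, 12, 15, 21, 30, NA))

# Drop missing values up front so stat_bin has nothing to remove.
toy_complete <- toy[!is.na(toy$stations), , drop = FALSE]

bin_widths <- c(3, 6, 10)

# Build one histogram per bin width, recording the width in the title.
histograms <- lapply(bin_widths, function(bw) {
  ggplot(toy_complete, aes(x = stations)) +
    geom_histogram(binwidth = bw, color = "white") +
    labs(title = paste("Distribution of Stations, binwidth =", bw))
})
names(histograms) <- paste0("binwidth_", bin_widths)

# Printing a list element draws that plot, e.g. histograms$binwidth_6
```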
Create a density plot with a rug for your quantitative variable. Show three different alternatives for the parameter bw.
# Place your code here.
ggplot(transit_cost, aes(x = stations)) + geom_density(bw = 5) + geom_rug(alpha = 0.1) + labs(title = "Distribution of Stations Variable") # bw = 5
## Warning: Removed 1 rows containing non-finite values (stat_density).
ggplot(transit_cost, aes(x = stations)) + geom_density(bw = 10) + geom_rug(alpha = 0.1) + labs(title = "Distribution of Stations Variable") # bw = 10
## Warning: Removed 1 rows containing non-finite values (stat_density).
ggplot(transit_cost, aes(x = stations)) + geom_density(bw = 15) + geom_rug(alpha = 0.1) + labs(title = "Distribution of Stations Variable") # bw = 15
## Warning: Removed 1 rows containing non-finite values (stat_density).
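In `geom_density()`, a numeric `bw` is the standard deviation of the smoothing kernel, the same meaning it has in `stats::density()`. One way to choose the three alternatives is to start from the rule-of-thumb bandwidth (`bw.nrd0()`) and try values below and above it. A minimal base R sketch on made-up values (the `stations_toy` vector is hypothetical):

```r
# Hypothetical toy values standing in for the stations column.
stations_toy <- c(2, 3, 5, 8, 8, 10, 12, 15, 21, 30, NA)
x <- stations_toy[!is.na(stations_toy)]

# Rule-of-thumb bandwidth, the default used by density().
bw_default <- bw.nrd0(x)

# Candidate bandwidths: half, equal to, and double the default.
bw_candidates <- bw_default * c(0.5, 1, 2)

# Fit a kernel density estimate for each candidate.
fits <- lapply(bw_candidates, function(b) density(x, bw = b))

# Each fit records the bandwidth that was actually used.
bw_used <- sapply(fits, function(f) f$bw)
print(round(bw_used, 2))
```

The resulting values could then be passed to `geom_density(bw = ...)` in place of hand-picked numbers.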
Create an appropriate visualization of the relationship between your categorical and quantitative variables.
# Place your code here.
ggplot(transit_cost, aes(x = year, y = stations, fill = year)) + geom_boxplot(alpha = 0.1) + geom_jitter(width = 0.25, alpha = 0.25) + coord_flip() + labs(title = "Stations on Transit Lines Each Year")
## Warning: Removed 1 rows containing non-finite values (stat_boxplot).
## Warning: Removed 1 rows containing missing values (geom_point).
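When a boxplot is overlaid with jittered points, the boxplot's own outlier markers duplicate points that the jitter layer already draws. The sketch below shows one way to avoid the double plotting, on made-up data (the `toy` data frame is hypothetical). It makes three changes:

- `outlier.shape = NA` hides the boxplot outliers.
- `height = 0` keeps the jitter from shifting the station counts themselves.
- `show.legend = FALSE` drops the legend that only repeats the axis labels.

```r
library(ggplot2)

# Hypothetical toy data: a factor year and a station count per project.
toy <- data.frame(
  year = factor(c(2017, 2017, 2017, 2018, 2018, 2018, 2019, 2019, 2019)),
  stations = c(4, 6, 30, 3, 9, 12, 5, 7, 8)
)

p <- ggplot(toy, aes(x = year, y = stations, fill = year)) +
  # Hide boxplot outliers; the jitter layer shows every point already.
  geom_boxplot(alpha = 0.1, outlier.shape = NA, show.legend = FALSE) +
  # Jitter only along the year axis so station counts stay exact.
  geom_jitter(width = 0.25, height = 0, alpha = 0.25, show.legend = FALSE) +
  coord_flip() +
  labs(title = "Stations on Transit Lines Each Year")

# print(p) draws the plot
```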
Provide an alternative answer to Problem 5.
# Place your code here.
ggplot(transit_cost, aes(x = year, y = stations, fill = year)) + geom_violin() + coord_flip() + labs(title = "Stations on Transit Lines Each Year")
## Warning: Removed 1 rows containing non-finite values (stat_ydensity).
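A violin plot estimates a density for each group, so its shape is only as trustworthy as the number of observations behind it. A quick per-group count and median can flag thinly populated years before reading too much into the shapes. A minimal base R sketch on made-up data (the `toy` data frame is hypothetical):

```r
# Hypothetical toy data: a factor year and a station count per project.
toy <- data.frame(
  year = factor(c(2017, 2017, 2017, 2018, 2018, 2019)),
  stations = c(4, 6, 30, 3, NA, 5)
)

# Count non-missing station values per year.
n_per_year <- tapply(toy$stations, toy$year, function(v) sum(!is.na(v)))

# Median stations per year, ignoring missing values.
median_per_year <- tapply(toy$stations, toy$year, median, na.rm = TRUE)

print(n_per_year)
print(median_per_year)
```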