I used the summarize function to count the number of active tobacco retailers in each zip code. I chose zip code because other identifiers, like neighborhood name, are written inconsistently: Long Island City, for example, appears as both “Long Is City” and “Long Island City”. Zip codes are consistent as far as I can tell. I then plotted this data as a column plot in two ways: ordered by zip code and ordered by number of active tobacco retailers. Zip code 11368 has the second highest number of active tobacco retailers, at 73. It covers all of Corona except for a couple of blocks, plus Flushing Meadows Park and Willets Point, so the only populated part of this zip code is Corona. The only zip code with more active tobacco retailers is 11385 (Ridgewood and Glendale), with 104.
Load standard libraries
library(tidyverse)
library(tidycensus)
library(RSocrata)
library(sf)
library(knitr)
library(DT)
library(viridis)
library(scales)
options(scipen = 999)
Use read.socrata to import the dataset of active tobacco retailers in NYC from NYC Open Data
raw_active_tobacco_retailers <- read.socrata("https://data.cityofnewyork.us/resource/adw8-wvxb.csv")
Filter to Queens
queens_tobacco_retailers <- raw_active_tobacco_retailers %>%
  filter(Address.Borough == "Queens")
queens_tobacco_retailers %>%
  arrange(desc(Address.ZIP)) %>%
  datatable()
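The inconsistent neighborhood spellings mentioned in the introduction can be checked with something like the sketch below. It runs on a small toy data frame rather than the real dataset, and the column name Address.City is an assumption (the actual neighborhood column may be named differently); the rows are illustrative only.

```r
library(dplyr)

# Toy stand-in for queens_tobacco_retailers.
# Address.City is an assumed column name; the rows are illustrative only.
toy_retailers <- tibble(
  Address.City = c("Long Is City", "Long Island City", "Long Island City",
                   "Corona", "Corona"),
  Address.ZIP  = c(11101, 11101, 11101, 11368, 11368)
)

# For each zip code, count how many distinct neighborhood spellings appear,
# then keep only the zip codes with more than one spelling
spelling_check <- toy_retailers %>%
  group_by(Address.ZIP) %>%
  summarise(
    n_spellings = n_distinct(Address.City),
    spellings   = paste(sort(unique(Address.City)), collapse = " / ")
  ) %>%
  filter(n_spellings > 1)

print(spelling_check)
```

Any zip code that shows up in spelling_check has more than one spelling of its neighborhood name, which is the reason for grouping by zip code instead.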
Calculate summary statistics for each zip code in Queens
queens_tobacco_stats <- queens_tobacco_retailers %>%
  group_by(Address.ZIP) %>%
  summarise(active_tobacco_retailers = n()) %>%
  mutate(zip_code = as.character(Address.ZIP))
queens_tobacco_stats %>%
  datatable()
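To pull out the zip codes with the most retailers without reading them off the plot, slice_max from dplyr is one option. This is a minimal sketch on a toy table shaped like queens_tobacco_stats: the first two rows use the counts reported in the introduction, and the remaining rows (including their zip codes) are made-up placeholders.

```r
library(dplyr)

# Toy stand-in for queens_tobacco_stats.
# Rows 1-2 use the counts reported in the text; rows 3-4 are placeholders.
toy_stats <- tibble(
  zip_code = c("11385", "11368", "00001", "00002"),
  active_tobacco_retailers = c(104, 73, 10, 5)
)

# Keep the two zip codes with the highest retailer counts
top_two <- toy_stats %>%
  slice_max(active_tobacco_retailers, n = 2)

print(top_two)
```

On the real data, the same call on queens_tobacco_stats should return the zip codes discussed in the introduction, assuming the dataset has not changed since the counts were taken.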
Summarize for Queens on the Borough level
queens_boro_tobacco_stats <- queens_tobacco_retailers %>%
  group_by(Address.Borough) %>%
  summarise(active_tobacco_retailers = n())
queens_boro_tobacco_stats %>%
  datatable()
Create a bar plot of active tobacco retailers in each zip code in Queens, ordered by zip code
ggplot() +
  geom_col(data = queens_tobacco_stats,
           mapping = aes(x = zip_code, y = active_tobacco_retailers)) +
  labs(x = "Zip Code",
       y = "Number of Active Tobacco Retailers",
       title = "Active Tobacco Retailers in Queens, by Zip Code",
       caption = "Source: NYC Open Data") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
The same bar plot, reordered by number of active tobacco retailers
ggplot() +
  geom_col(data = queens_tobacco_stats,
           mapping = aes(x = reorder(zip_code, active_tobacco_retailers),
                         y = active_tobacco_retailers)) +
  labs(x = "Zip Code",
       y = "Number of Active Tobacco Retailers",
       title = "Active Tobacco Retailers in Queens, by Zip Code",
       caption = "Source: NYC Open Data") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))