Methods

I used the summarize function to count the number of active tobacco retailers in each zip code. I chose zip code because other identifiers, like neighborhood name, are written inconsistently: Long Island City, for example, appears as both “Long Is City” and “Long Island City”. Zip code is consistent as far as I can tell. I then plotted this data as a column plot in two ways: ordered by zip code and ordered by number of active tobacco retailers. Zip code 11368 covers all of Corona except for a couple of blocks, along with Flushing Meadows Park and Willets Point; in other words, the only populated part of this zip code is Corona. It has the second-highest number of active tobacco retailers, at 73. The only zip code with more is 11385 (Ridgewood and Glendale), with 104 active retailers.
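To illustrate why zip code is the safer grouping key, here is a minimal sketch on made-up rows. The `toy_retailers` data frame and its `city` and `zip` columns are hypothetical and are not columns of the real dataset. Grouping by the free-text neighborhood label splits one place into two groups, while grouping by zip code keeps it together.

```r
library(dplyr)

# Hypothetical toy data: the same neighborhood spelled two ways
toy_retailers <- tibble(
  city = c("Long Is City", "Long Island City", "Long Island City", "Corona"),
  zip  = c("11101", "11101", "11101", "11368")
)

# Grouping by the free-text name yields three groups for two places
by_city <- toy_retailers %>%
  group_by(city) %>%
  summarise(active_tobacco_retailers = n())

# Grouping by zip code yields one group per place
by_zip <- toy_retailers %>%
  group_by(zip) %>%
  summarise(active_tobacco_retailers = n())
```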

Script

Load Libraries

Load standard libraries

library(tidyverse)
library(tidycensus)
library(RSocrata)
library(sf)
library(knitr)
library(DT)
library(viridis)
library(scales)
options(scipen = 999)

Import and Filter Dataset

Use read.socrata to import the dataset of active tobacco retailers in NYC from NYC Open Data

raw_active_tobacco_retailers <- read.socrata("https://data.cityofnewyork.us/resource/adw8-wvxb.csv")

Filter to Queens

queens_tobacco_retailers <- raw_active_tobacco_retailers %>% 
  filter(Address.Borough == "Queens")

queens_tobacco_retailers %>% 
  arrange(desc(Address.ZIP)) %>% 
  datatable()
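Because the filter above matches the string "Queens" exactly, it may be worth confirming how the borough is spelled in the data before relying on the result. The following is a minimal sketch on hypothetical toy rows. The spellings in `toy_raw` are made up for illustration, and the real dataset's values may differ.

```r
library(dplyr)

# Hypothetical toy rows; the borough spellings here are made up for illustration
toy_raw <- tibble(
  Address.Borough = c("Queens", "QUEENS", "Brooklyn", "Queens"),
  Address.ZIP     = c(11368, 11385, 11201, 11101)
)

# List the distinct spellings and how often each occurs
borough_spellings <- toy_raw %>%
  count(Address.Borough)

# An exact match keeps only rows spelled "Queens"
exact_match <- toy_raw %>%
  filter(Address.Borough == "Queens")

# A case-insensitive match also keeps "QUEENS"
any_case_match <- toy_raw %>%
  filter(tolower(Address.Borough) == "queens")
```

If the two filtered tables have different row counts, the exact match is silently dropping rows.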

Statistics

Calculate summary statistics for each zip code in Queens

queens_tobacco_stats <- queens_tobacco_retailers %>% 
  group_by(Address.ZIP) %>% 
  summarise(active_tobacco_retailers = n()) %>% 
  mutate(zip_code = as.character(Address.ZIP))

queens_tobacco_stats %>% 
  datatable()

Summarize for Queens on the Borough level

queens_boro_tobacco_stats <- queens_tobacco_retailers %>% 
  group_by(Address.Borough) %>% 
  summarise(active_tobacco_retailers = n())

queens_boro_tobacco_stats %>% 
  datatable()
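As a sanity check on the two summaries above, the per-zip counts should add up to the borough total. The sketch below shows the idea on hypothetical toy rows (`toy_queens` stands in for queens_tobacco_retailers and its values are made up). It also uses `count()`, which is dplyr shorthand for `group_by()` followed by `summarise(n())`.

```r
library(dplyr)

# Hypothetical toy rows standing in for queens_tobacco_retailers
toy_queens <- tibble(
  Address.Borough = rep("Queens", 5),
  Address.ZIP     = c(11368, 11368, 11385, 11385, 11385)
)

# Count rows per zip code
toy_zip_stats <- toy_queens %>%
  count(Address.ZIP, name = "active_tobacco_retailers")

# Count rows for the whole borough
toy_boro_stats <- toy_queens %>%
  count(Address.Borough, name = "active_tobacco_retailers")

# The per-zip counts should add up to the borough total
stopifnot(
  sum(toy_zip_stats$active_tobacco_retailers) ==
    toy_boro_stats$active_tobacco_retailers
)
```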

Plot

Create a bar plot of active tobacco retailers in each zip code in Queens, ordered by zip code

ggplot() +
  geom_col(data = queens_tobacco_stats,
           mapping = aes(x = zip_code, y = active_tobacco_retailers)) +
  labs(x = "Zip Code",
       y = "Number of Active Tobacco Retailers",
       title = "Active Tobacco Retailers in Queens, by Zip Code",
       caption = "Source: NYC Open Data") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

The same bar plot, reordered by number of active tobacco retailers

ggplot() +
  geom_col(data = queens_tobacco_stats,
           mapping = aes(x = reorder(zip_code, active_tobacco_retailers),
                         y = active_tobacco_retailers)) +
  labs(x = "Zip Code",
       y = "Number of Active Tobacco Retailers",
       title = "Active Tobacco Retailers in Queens, by Zip Code",
       caption = "Source: NYC Open Data") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
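The reordering in the second plot relies on `reorder()`, which turns the zip codes into a factor whose levels are sorted by the counts. The sketch below shows this on three values. The counts for 11368 and 11385 come from the Methods text, while the count for 11101 is a made-up placeholder.

```r
# Toy values: 73 and 104 are from the Methods text, 20 is a hypothetical placeholder
zip_code <- c("11368", "11385", "11101")
active_tobacco_retailers <- c(73, 104, 20)

# reorder() returns a factor whose levels are sorted by the second argument
ordered_zip <- reorder(zip_code, active_tobacco_retailers)
levels(ordered_zip)

# Negating the counts reverses the order, so the largest count comes first
descending_zip <- reorder(zip_code, -active_tobacco_retailers)
levels(descending_zip)
```

ggplot2 draws discrete axes in factor-level order, so passing the negated counts inside `aes()` would place the tallest column on the left instead of the right.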