Analysis of Active Tobacco Retail Dealer Licenses in Queens
Data source: NYC Open Data
Import Libraries:
library(tidyverse)
library(tidycensus)
library(RSocrata)
library(ggcharts)
library(ggblanket)
library(knitr)
library(DT)
library(sf)
library(scales)
library(viridis)
Import the data directly into R from its NYC Open Data URL using read.socrata():
OpenData <- read.socrata("https://data.cityofnewyork.us/resource/adw8-wvxb.csv")
Select the columns needed, filter to the borough of Queens, and rename them. I kept the ZIP code so that I can compare the number of active tobacco retailers across Queens.
# Columns to keep from the raw dataset
columns <- c("Business.Name", "Address.Borough", "Address.ZIP",
             "Latitude", "Longitude", "Location")
OpenDataTabacco <- OpenData %>%
  select(all_of(columns)) %>%
  filter(Address.Borough == "Queens") %>%
  rename("NAME" = "Business.Name",
         "BOROUGH" = "Address.Borough",
         "ZIPcode" = "Address.ZIP")
This produces a data frame with the name, borough, ZIP code, latitude, longitude, and location. I kept the last three columns because I tried to make a map, but I could not get a spatial join to work.
OpenDataTabacco %>%
datatable()
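For reference, here is a minimal sketch of what a spatial join with sf could look like. It does not use the real data: the three retailer points and the two rectangular zones are hypothetical stand-ins (a real analysis would need actual ZIP code boundary polygons, which are not part of this dataset), and the helper make_box() is made up for illustration.

```r
library(sf)

# Hypothetical retailer points (coordinates invented for illustration)
retailers <- data.frame(
  NAME = c("Shop A", "Shop B", "Shop C"),
  Longitude = c(-73.80, -73.90, -73.70),
  Latitude = c(40.72, 40.75, 40.60)
)

# Turn the plain data frame into point geometries (WGS84 longitude/latitude)
retailers_sf <- st_as_sf(retailers,
                         coords = c("Longitude", "Latitude"),
                         crs = 4326)

# Hypothetical helper: build a closed rectangular polygon from its corners
make_box <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmax, ymin), c(xmax, ymax), c(xmin, ymax), c(xmin, ymin)
  )))
}

# Two made-up zones standing in for ZIP code polygons
zones <- st_sf(
  zone = c("West", "East"),
  geometry = st_sfc(
    make_box(-73.95, 40.70, -73.85, 40.80),
    make_box(-73.85, 40.70, -73.75, 40.80),
    crs = 4326
  )
)

# Left join: each point gets the attributes of the zone it falls within;
# points outside every zone get NA
joined <- st_join(retailers_sf, zones, join = st_within)
```

The key requirement is that both layers share the same coordinate reference system before st_join() is called; with real boundary files, st_transform() may be needed first.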
To explore the dataset a bit more, I made a scatter plot. Even though it is not as useful as a map, it at least shows the approximate location of the active tobacco retailers across Queens.
# Longitude goes on the x-axis and latitude on the y-axis so the points are oriented like a map
ggplot(OpenDataTabacco, aes(x = Longitude, y = Latitude)) +
  geom_point(size = .5) +
  scale_x_continuous(labels = comma) +
  scale_y_continuous(labels = comma) +
  labs(x = "Longitude", y = "Latitude",
       title = "Location of Active Tobacco Retailers in Queens",
       caption = "Source: NYC Open Data")
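A scatter plot of coordinates reads more like a map when longitude is on the horizontal axis and the aspect ratio is adjusted for latitude. The sketch below shows one way to do that with coord_quickmap() from ggplot2; the points in toy_points are invented for illustration and are not taken from the dataset.

```r
library(ggplot2)

# Hypothetical coordinates, roughly in the Queens area
toy_points <- data.frame(
  Longitude = c(-73.80, -73.90, -73.85, -73.75),
  Latitude = c(40.72, 40.75, 40.70, 40.68)
)

# coord_quickmap() picks an aspect ratio that approximates a map projection
# over a small area, so distances are not visibly stretched
p <- ggplot(toy_points, aes(x = Longitude, y = Latitude)) +
  geom_point(size = 1) +
  coord_quickmap() +
  labs(x = "Longitude", y = "Latitude",
       title = "Toy example: points drawn with a map-like aspect ratio")
p
```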
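Count the number of active retailers in each ZIP code:
OpenDataTabacco_Stat <- OpenDataTabacco %>%
  group_by(ZIPcode) %>%
  summarise(ActiveTabacco = n()) %>%             # one row per ZIP code with its retailer count
  mutate(ZIP_code = as.character(ZIPcode))       # character version for use as a discrete axis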
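As a side note, the group_by() plus summarise(n()) pattern can also be written with dplyr's count(). The sketch below runs on a small invented table (toy is hypothetical, not the real data) so the result is easy to check by eye.

```r
library(dplyr)

# Hypothetical ZIP codes: three rows for 11377, two for 11354, one for 11101
toy <- tibble(ZIPcode = c(11354, 11354, 11101, 11377, 11377, 11377))

# count() groups by ZIPcode and tallies rows in one step;
# name sets the count column, sort = TRUE puts the largest count first
toy_counts <- toy %>%
  count(ZIPcode, name = "ActiveTabacco", sort = TRUE) %>%
  mutate(ZIP_code = as.character(ZIPcode))

toy_counts
```

Both forms give one row per ZIP code; count() is simply shorter when the only summary needed is the number of rows.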
I plotted a bar chart that compares the number of active tobacco retailers across Queens by ZIP code.
# Bars are ordered from the fewest to the most retailers
ggplot(data = OpenDataTabacco_Stat,
       aes(x = reorder(ZIP_code, ActiveTabacco),
           y = ActiveTabacco)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.1)) +
  labs(x = "ZIP code", y = "Active tobacco retailers",
       title = "Active Tobacco Retailers in Queens by ZIP Code",
       caption = "Source: NYC Open Data")