Introduction

New York City, originally New Amsterdam, was founded on a land deal and has been driven by profit and commerce ever since. It has weathered many booms and busts. With that history in mind, I analyze the impact of U.S. recessions on businesses in NYC. My father is a small business owner, and I have friends who have started businesses in NYC. Having lived in this city since birth, I would like to understand how long a recession might affect its economic climate.

The data sources I would use are:

The National Bureau of Economic Research (NBER) web site on U.S. Business Cycle Expansions and Contractions: https://www.nber.org/cycles/. I had initially decided to use Wikipedia https://en.wikipedia.org/wiki/List_of_recessions_in_the_United_States as the source for recession dates, but NBER’s site proved to be more reliable. A copy of the web page was downloaded on May 6, 2020 and is available here: https://raw.githubusercontent.com/logicalschema/DATA607/master/FinalProject/nber.html
NYC Open Data: https://data.cityofnewyork.us/Business/Legally-Operating-Businesses/w7w3-xahh as a source for data from at least 1998 for licenses of businesses in NYC. The data contains the types of industry and information about their licenses. A download of the data was made to csv and is available here: https://github.com/logicalschema/DATA607/raw/master/FinalProject/Legally_Operating_Businesses.csv.gz

Based on the data, I would like to see how long NYC could be affected by a financial downturn.

Let’s start by loading the libraries we will be using.

library(knitr)
library(xml2)
library(stringr)
library(dplyr)
library(rvest)
library(lubridate)
library(tidyverse)
library(summarytools)
library(ggplot2)
library(leaflet)

# For summarytools package
opts_chunk$set(results = 'asis',      
                comment = NA,
                prompt  = FALSE,
                cache   = FALSE)

st_options(plain.ascii = FALSE,        
            style        = "rmarkdown", 
            footnote     = NA,          
            subtitle.emphasis = FALSE)  


# Function to remove HTML tags from x
cleanHTML <- function(x) {
  return(gsub("<.*?>", "", x))
}
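
As a quick check of this helper, the sketch below applies it to a made-up string. The pattern `<.*?>` is non-greedy, so each tag is removed on its own rather than everything between the first `<` and the last `>`. The input string is illustrative and is not taken from the NBER page.

```r
# Same helper as above, repeated so the snippet runs on its own
cleanHTML <- function(x) {
  gsub("<.*?>", "", x)
}

# Illustrative input only
sample_html <- "<b>CUNY School of Professional Studies</b><br>New York, NY 10001"

cleaned <- cleanHTML(sample_html)
print(cleaned)
```

Note that the tags are dropped without leaving a separator behind, which is why any splitting on `<br>` has to happen before the tags are stripped.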

Import

This section will go over how the data was imported for use in R.

NYC Zip Codes

NYC consists of five boroughs: the Bronx, Brooklyn, Manhattan (aka New York), Queens, and Staten Island. Addresses in Queens often follow the old village system, where the city is given as a village such as Jackson Heights, Elmhurst, or Forest Hills rather than the borough. I decided that I needed a one-to-one mapping of zip code to borough to better represent the data. For this, I consulted the web site https://www.nycbynatives.com/nyc_info/new_york_city_zip_codes.php, which breaks down zip codes by borough. This page was downloaded and saved on my GitHub.

zipHTML <- read_html("https://raw.githubusercontent.com/logicalschema/DATA607/master/FinalProject/neighborhoods.html")
tbls <- html_nodes(zipHTML, "table")
zip <- as.data.frame(html_table(tbls))
zip <- zip[, -c(3)]
zipcode <- zip$X1
zipcode <- c(zipcode, zip$X4)
borough <- zip$X2
borough <- c(borough, zip$X5)
zipcodes <- cbind(zipcode, borough)
zipcodes <- as.data.frame(zipcodes)
zipcodes <- distinct(zipcodes, zipcode, .keep_all = TRUE)
zipcodes$borough <- str_replace(zipcodes$borough, 'Staten', 'Staten Island')

# zipcodes dataframe matches zip codes with the appropriate borough
zipcodes

The dataframe zipcodes maps zip codes to their corresponding borough. It will be used during the import of the NYC OpenData to match valid zip codes with their borough.
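
The reshaping step can be seen more easily on a toy table. This is only a sketch: it assumes, as the code above does, that the scraped table holds two side-by-side (zip code, borough) pairs in columns X1/X2 and X4/X5, and the rows below are illustrative rather than scraped.

```r
# Toy stand-in for the scraped table: two (zip, borough) pairs per row.
# Column names follow the code above; the rows are illustrative.
zip <- data.frame(
  X1 = c("10001", "10451", "11201"),
  X2 = c("Manhattan", "Bronx", "Brooklyn"),
  X4 = c("10301", "11372", "10001"),
  X5 = c("Staten", "Queens", "Manhattan"),
  stringsAsFactors = FALSE
)

# Stack the left and right pairs into one long table
zipcodes <- data.frame(
  zipcode = c(zip$X1, zip$X4),
  borough = c(zip$X2, zip$X5),
  stringsAsFactors = FALSE
)

# Keep one row per zip code so the mapping is one-to-one
zipcodes <- zipcodes[!duplicated(zipcodes$zipcode), ]

# Expand the abbreviated label only where it matches exactly
zipcodes$borough[zipcodes$borough == "Staten"] <- "Staten Island"

print(zipcodes)
```

Matching the abbreviated label exactly, instead of with a pattern replacement, keeps the step safe to run more than once.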

Here is a map view of NYC using the leaflet library.

# https://rstudio.github.io/leaflet/
# https://maps.nyc.gov/tiles/leaflet-xyz.html
# https://rpubs.com/jhofman/nycmaps

m <- leaflet() %>%
  addTiles() %>%  
  addMarkers(lng=-73.989850, lat=40.748590, popup="<b>CUNY School of Professional Studies</b><br>119 W 31st Street<br>New York, NY 10001") %>% 
  setView(lng = -73.98925, lat = 40.75039, zoom = 11)
m  # Print the map

NYC Data

The NYC OpenData platform lets you download its data in XML and CSV formats, which makes it a very handy source. I downloaded the CSV and placed it on my GitHub.

# read_csv can decompress gzipped csv files on the fly
# licenseData <- read_csv("https://github.com/logicalschema/DATA607/raw/master/FinalProject/Legally_Operating_Businesses.csv.gz")

# Reading the Legally_Operating_Businesses.csv file
licenseData <- read.csv("Legally_Operating_Businesses.csv", 
                    sep = ",", 
                    header = TRUE)


# A view of what was imported
head(licenseData, 10)

The imported table licenseData has these as columns: DCA.License.Number, License.Type, License.Expiration.Date, License.Status, License.Creation.Date, Industry, Business.Name, Business.Name.2, Address.Building, Address.Street.Name, Secondary.Address.Street.Name, Address.City, Address.State, Address.ZIP, Contact.Phone.Number, Address.Borough, Borough.Code, Community.Board, Council.District, BIN, BBL, NTA, Census.Tract, Detail, Longitude, Latitude, Location

I will tidy up the data and keep only the columns I need. At the end of this section, I use zipcodes to populate a new Address.Borough column holding the borough of each business address. Zip codes that are not valid NYC zip codes get NA.

# Remove unnecessary columns from the imported data 
data <- subset(licenseData, select = -c(Contact.Phone.Number, 
                                        Address.Borough, 
                                        Borough.Code, 
                                        Community.Board, 
                                        Council.District, 
                                        BIN, 
                                        BBL, 
                                        NTA, 
                                        Census.Tract, 
                                        Detail, 
                                        Longitude, 
                                        Latitude, 
                                        Location)
               )

# Convert License.Creation.Date and License.Expiration.Date to Date types
data$License.Expiration.Date <- as.Date(data$License.Expiration.Date, format = "%m/%d/%Y")
data$License.Creation.Date <- as.Date(data$License.Creation.Date, format = "%m/%d/%Y")

# Reorder the columns
data <- data[c(6,4,5,3,2,1,7,8,9,10,11,12,13,14)]

# Order the rows by the License Creation.Date
data <- data[order(data$License.Creation.Date),]


# Zip codes to Boroughs
tempzips <- data$Address.ZIP


for (value in zipcodes$zipcode){
    tempborough <-  zipcodes %>% filter(zipcode == value) %>% select(borough)
    tempzips <- str_replace_all(tempzips, as.character(value), as.character(tempborough))
}

# Set NA to any zip codes that are not found in NYC
tempzips[!(tempzips %in% c("Bronx","Brooklyn", "Manhattan", "Queens", "Staten Island"))] <- NA

# Add the boroughs as Address.Borough to the NYC data
data <- cbind(data, "Address.Borough" = tempzips)

# A view of the data
head(data, 10)

# Remove before finalizing project
write.csv(data,"data.csv", row.names = FALSE)
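
The loop above rewrites the whole zip code vector once for every NYC zip code. An alternative sketch, not what the report ran, is a single vectorized lookup with base R's match(). For plain five-digit zip codes it should yield the same Address.Borough values, with NA wherever the zip code is not in the lookup table. The data frames below are small made-up stand-ins for zipcodes and data.

```r
# Toy lookup table (stand-in for the zipcodes dataframe)
zipcodes <- data.frame(
  zipcode = c("10001", "11201", "10451"),
  borough = c("Manhattan", "Brooklyn", "Bronx"),
  stringsAsFactors = FALSE
)

# Toy license rows (stand-in for the data dataframe)
licenses <- data.frame(
  Business.Name = c("A", "B", "C", "D"),
  Address.ZIP = c("11201", "07302", "10001", ""),
  stringsAsFactors = FALSE
)

# match() returns the row of zipcodes for each license zip, or NA if absent
idx <- match(as.character(licenses$Address.ZIP), zipcodes$zipcode)

# Indexing with NA gives NA, so non-NYC or blank zip codes become NA
licenses$Address.Borough <- zipcodes$borough[idx]

print(licenses)
```

Because match() compares whole strings, a zip code can never be rewritten as part of a longer value.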

The NYC license data is now ready for analysis. The recession dates are imported in the NBER section.

OpenData Format

The following is a description of the data fields from the https://data.cityofnewyork.us/Business/Legally-Operating-Businesses/w7w3-xahh web site.

Column Name	Description	Type
DCA License Number	An identification number issued to businesses/individuals to operate legally for the duration of their license term.	Plain Text
License Type	DCA offers two license types: Business. License is issued to an entity/organization based on their address. Individual. License is issued to an individual person.	Plain Text
License Expiration Date	Expiration date of DCA License.	Date & Time
License Status		Plain Text
License Creation Date		Date & Time
Industry		Plain Text
Business Name	The legal business name as filed with the New York State Secretary of State or County Clerk or, if individual, the person’s first name and last name.
Business Name 2	If applicable, the Doing-Business-As (DBA)/trade name.
Address Building	The building number of the business’s address.	Plain Text
Address Street Name	The street name of the business’s address.	Plain Text
Secondary Address Street Name	The cross-street of the business’s address.	Plain Text
Address City	The city where the business is located.	Plain Text
Address State	The state where the business is located.	Plain Text
Address ZIP	The zip code where the business is located.	Plain Text
Contact Phone Number	Contact telephone number for legally operating business.	Plain Text
Address Borough	The borough where the business is located.	Plain Text
Borough Code	Provides the following information for each listed license category: Amusement Device: device name(s); Cabaret / Catering Establishment: capacity of largest room, number of additional rooms; Games of Chance: type of game; Garage / Parking Lot: number of vehicle and bicycle spaces; Sidewalk Cafe: type
Community Board		Plain Text
Council District		Plain Text
BIN		Plain Text
BBL		Plain Text
NTA		Plain Text
Census Tract		Plain Text
Detail		Plain Text
Longitude		Plain Text
Latitude		Plain Text
Location		Location

NBER

I used rvest to obtain the information from NBER’s page. I was mainly interested in the recession data.

As mentioned before, the page was downloaded and stored in my GitHub account. NBER does not present its data as a clean HTML table: the peaks and troughs of expansions and contractions are individual <td> elements in a single row instead of one row per cycle. I define a recession period as a peak to its corresponding trough in the NBER data. Some text wrangling is needed before the data can be used.

nberHTML <- read_html("https://raw.githubusercontent.com/logicalschema/DATA607/master/FinalProject/nber.html")

# Grab the td HTML nodes that have the nowrap attribute
tableData <- nberHTML %>%  html_nodes("td[nowrap]")

# We only need the first two elements representing the peaks and troughs of recessions
tableData <- head(tableData, 2) %>% str_replace_all("[\n]" , "")
tableData

[1] "
June 1857(II)
October 1860(III)
April 1865(I)
June 1869(II)
October 1873(III)
March 1882(I)
March 1887(II)
July 1890(III)
January 1893(I)
December 1895(IV)
June 1899(III)
September 1902(IV)
May 1907(II)
January 1910(I)
January 1913(I)
August 1918(III)
January 1920(I)
May 1923(II)
October 1926(III)
August 1929(III)
May 1937(II)
February 1945(I)
November 1948(IV)
July 1953(II)
August 1957(III)
April 1960(II)
December 1969(IV)
November 1973(IV)
January 1980(I)
July 1981(III)
July 1990(III)
March 2001(I)
December 2007 (IV)
"
[2] " December 1854 (IV)
December 1858 (IV)
June 1861 (III)
December 1867 (I)
December 1870 (IV)
March 1879 (I)
May 1885 (II)
April 1888 (I)
May 1891 (II)
June 1894 (II)
June 1897 (II)
December 1900 (IV)
August 1904 (III)
June 1908 (II)
January 1912 (IV)
December 1914 (IV)
March 1919 (I)
July 1921 (III)
July 1924 (III)
November 1927 (IV)
March 1933 (I)
June 1938 (II)
October 1945 (IV)
October 1949 (IV)
May 1954 (II)
April 1958 (II)
February 1961 (I)
November 1970 (IV)
March 1975 (I)
July 1980 (III)
November 1982 (IV)
March 1991(I)
November 2001 (IV)
June 2009 (II)

The above gives us the elements for peaks and troughs. Each entry is separated by a line break (<br>) and needs some cleaning before it can be loaded into a dataframe.

peaks <- head(tableData, 1) 
peaks <- peaks %>% str_split("<br>")
peaks <- cleanHTML(peaks)
peaks <- data.frame(peak=unlist(strsplit(as.character(peaks),",")))

troughs <- tail(tableData, 1) 
troughs <- troughs %>% str_split("<br>")
troughs <- cleanHTML(troughs)
troughs <- data.frame(trough=unlist(strsplit(as.character(troughs),",")))

# Combine the peaks and troughs
resessionData <- cbind(peaks, troughs)

# Remove (I... IV) and trailing space
resessionData$peak <- str_replace(resessionData$peak, "\\(.*\\)", "")
resessionData$peak <- str_replace_all(resessionData$peak, '\"', '')

resessionData$trough <- str_replace(resessionData$trough, "\\(.*\\)", "")
resessionData$trough <- str_replace_all(resessionData$trough, '\"', '')

# Replace multiple spaces with one space 
resessionData$peak <- str_replace_all(resessionData$peak, '([ ]+)', ' ')
resessionData$trough <- str_replace_all(resessionData$trough, '([ ]+)', ' ')

# Remove leading and trailing whitespace
resessionData$peak <- str_trim(resessionData$peak)
resessionData$trough <- str_trim(resessionData$trough)


#Remove the first and last rows as they were irrelevant 
resessionData <- resessionData[-1, ]
resessionData <- head(resessionData, -1)

# Creating a new column where the peaks and troughs are converted to Date variables
start <-  as.Date(paste(resessionData$peak, "1", sep = " "), format = "%B %Y %d")
end <- as.Date(paste(resessionData$trough, "1", sep = " "), format = "%B %Y %d")
resessionData <- cbind(resessionData, start = start, end = end)
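
One caveat with the conversion above is that `%B` in as.Date() follows the session locale, so English month names may fail to parse on a non-English system. A minimal sketch that avoids this uses the built-in month.name vector instead. The helper name parse_month_year and the sample cell are illustrative assumptions, not part of the report.

```r
# Illustrative cell content in the same shape as the NBER entries
raw_cell <- "March 2001(I)<br>December 2007 (IV)"

# Split on <br>, drop the quarter marker in parentheses, trim whitespace
entries <- strsplit(raw_cell, "<br>", fixed = TRUE)[[1]]
entries <- trimws(sub("\\(.*\\)", "", entries))

# Hypothetical helper: "Month YYYY" -> Date on the first of that month,
# looked up in month.name so the result does not depend on the locale
parse_month_year <- function(x) {
  parts <- strsplit(x, "\\s+")
  month <- vapply(parts, function(p) match(p[1], month.name), integer(1))
  year <- vapply(parts, function(p) as.integer(p[2]), integer(1))
  as.Date(sprintf("%04d-%02d-01", year, month))
}

start <- parse_month_year(entries)
print(start)
```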

Here is a look at the NBER data tidied up.

resessionData

It looks like the only recessions usable with the NYC OpenData are those of 2001 and 2007 to 2009, specifically March 2001 to November 2001 and December 2007 to June 2009. OpenData did not have consistently valid data until 1998.
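
To make use of these two periods, each license creation date can be tagged with the recession it falls in, if any. The sketch below is one possible way to do that. The helper recession_label and the sample dates are assumptions for illustration. It follows the convention above of dating each peak and trough to the first of its month, so a license created later in the trough month is not counted unless the end date is moved to the end of that month.

```r
# The two recession periods named above, dated to the first of each month
recessions <- data.frame(
  label = c("2001", "2007-2009"),
  start = as.Date(c("2001-03-01", "2007-12-01")),
  end = as.Date(c("2001-11-01", "2009-06-01")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the recession label for each date, or NA
recession_label <- function(dates, periods = recessions) {
  out <- rep(NA_character_, length(dates))
  for (i in seq_len(nrow(periods))) {
    inside <- !is.na(dates) &
      dates >= periods$start[i] &
      dates <= periods$end[i]
    out[inside] <- periods$label[i]
  }
  out
}

# Made-up creation dates
created <- as.Date(c("2000-12-31", "2001-06-15", "2008-09-15", "2010-01-02"))
labels <- recession_label(created)
print(labels)
```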

Analysis

In this section we will see what insights we can glean from our NYC license data.

This is a view of the frequency for the Industry variable of the license data.

freq(data$Industry, order = "freq", plain.ascii = FALSE)

Frequencies

data$Industry
Type: Factor

	Freq	% Valid	% Valid Cum.	% Total	% Total Cum.
Home Improvement Salesperson	33363	16.40	16.40	16.40	16.40
Home Improvement Contractor	30377	14.93	31.33	14.93	31.33
Tobacco Retail Dealer	24534	12.06	43.39	12.06	43.39
Secondhand Dealer - General	12661	6.22	49.62	6.22	49.62
Electronics Store	10718	5.27	54.89	5.27	54.89
Tow Truck Driver	7770	3.82	58.71	3.82	58.71
Stoop Line Stand	6585	3.24	61.94	3.24	61.94
Sightseeing Guide	6400	3.15	65.09	3.15	65.09
General Vendor	6281	3.09	68.18	3.09	68.18
Pedicab Driver	6076	2.99	71.16	2.99	71.16
Electronic & Appliance Service	5308	2.61	73.77	2.61	73.77
Laundries	4670	2.30	76.07	2.30	76.07
Laundry	4310	2.12	78.19	2.12	78.19
Locksmith	4101	2.02	80.20	2.02	80.20
Debt Collection Agency	4064	2.00	82.20	2.00	82.20
Process Server Individual	3987	1.96	84.16	1.96	84.16
Ticket Seller	3428	1.69	85.84	1.69	85.84
Electronic Cigarette Dealer	3182	1.56	87.41	1.56	87.41
Sidewalk Cafe	3135	1.54	88.95	1.54	88.95
Laundry Jobber	3053	1.50	90.45	1.50	90.45
Garage	2525	1.24	91.69	1.24	91.69
Dealer In Products	1972	0.97	92.66	0.97	92.66
Secondhand Dealer - Auto	1710	0.84	93.50	0.84	93.50
Amusement Device Portable	1530	0.75	94.25	0.75	94.25
Tow Truck Company	1193	0.59	94.84	0.59	94.84
Parking Lot	1090	0.54	95.38	0.54	95.38
Employment Agency	961	0.47	95.85	0.47	95.85
Pawnbroker	952	0.47	96.32	0.47	96.32
Amusement Device Temporary	794	0.39	96.71	0.39	96.71
Auctioneer	760	0.37	97.08	0.37	97.08
Pedicab Business	660	0.32	97.40	0.32	97.40
Motion Picture Projectionist	652	0.32	97.73	0.32	97.73
Newsstand	544	0.27	97.99	0.27	97.99
Special Sale	538	0.26	98.26	0.26	98.26
Horse Drawn Driver	465	0.23	98.49	0.23	98.49
Process Serving Agency	352	0.17	98.66	0.17	98.66
Cabaret	333	0.16	98.82	0.16	98.82
Amusement Device Permanent	289	0.14	98.96	0.14	98.96
Garage and Parking Lot	259	0.13	99.09	0.13	99.09
Games of Chance	235	0.12	99.21	0.12	99.21
Car Wash	184	0.09	99.30	0.09	99.30
Gaming Cafe	148	0.07	99.37	0.07	99.37
Scrap Metal Processor	141	0.07	99.44	0.07	99.44
Pool or Billiard Room	139	0.07	99.51	0.07	99.51
Horse Drawn Cab Owner	138	0.07	99.58	0.07	99.58
Catering Establishment	124	0.06	99.64	0.06	99.64
Bingo Game Operator	120	0.06	99.70	0.06	99.70
Tow Truck Exemption	118	0.06	99.75	0.06	99.75
Storage Warehouse	105	0.05	99.81	0.05	99.81
Auction House Premises	98	0.05	99.85	0.05	99.85
Scale Dealer Repairer	70	0.03	99.89	0.03	99.89
Locksmith Apprentice	55	0.03	99.92	0.03	99.92
Amusement Arcade	54	0.03	99.94	0.03	99.94
Sightseeing Bus	52	0.03	99.97	0.03	99.97
General Vendor Distributor	18	0.01	99.98	0.01	99.98
Commercial Lessor	14	0.01	99.98	0.01	99.98
Booting Company	12	0.01	99.99	0.01	99.99
Secondhand Dealer - Firearms	12	0.01	100.00	0.01	100.00
Ticket Seller Business	10	0.00	100.00	0.00	100.00
<NA>	0			0.00	100.00
Total	203429	100.00	100.00	100.00	100.00

Here is a summary of the data.

summary(data)

                         Industry      License.Status   License.Creation.Date
 Home Improvement Salesperson:33363   Active  : 74165   Min.   :1977-01-24   
 Home Improvement Contractor :30377   Inactive:129264   1st Qu.:2008-07-11   
 Tobacco Retail Dealer       :24534                     Median :2012-05-04   
 Secondhand Dealer - General :12661                     Mean   :2011-10-16   
 Electronics Store           :10718                     3rd Qu.:2016-03-24   
 Tow Truck Driver            : 7770                     Max.   :2020-04-24   
 (Other)                     :84006                                          
 License.Expiration.Date     License.Type      DCA.License.Number
 Min.   :2010-01-02      Business  :130091   1374839-DCA:     2  
 1st Qu.:2014-04-30      Individual: 73338   2003600-DCA:     2  
 Median :2018-03-31                          0002902-DCA:     1  
 Mean   :2017-08-08                          0006840-DCA:     1  
 3rd Qu.:2021-02-28                          0010669-DCA:     1  
 Max.   :2022-12-15                          0010699-DCA:     1  
 NA's   :35                                  (Other)    :203421  
                Business.Name            Business.Name.2   Address.Building
 T-MOBILE NORTHEAST LLC:   609                   :170329          : 74084  
 RADIOSHACK CORPORATION:   432   T-MOBILE        :   293   1      :   487  
 DUANE READE           :   292   T-Mobile        :   188   200    :   454  
 SP PLUS CORPORATION   :   220   AT&T MOBILITY   :   182   2      :   347  
 DUANE READE INC       :   209   VERIZON WIRELESS:   141   10     :   330  
 SPRINT SPECTRUM L.P.  :   207   T-Mobile 4110   :   131   50     :   310  
 (Other)               :201460   (Other)         : 32165   (Other):127417  
    Address.Street.Name  Secondary.Address.Street.Name        Address.City  
              : 73338                   :201395        BROOKLYN     :50438  
 BROADWAY     :  3769   6 AVENUE        :    49        NEW YORK     :37681  
 3RD AVE      :  1835   8 AVENUE        :    48        BRONX        :24700  
 5TH AVE      :  1667   BROADWAY        :    34        STATEN ISLAND:11223  
 JAMAICA AVE  :  1591   LEXINGTON AVENUE:    27        JAMAICA      : 5038  
 ROOSEVELT AVE:  1189   3 AVENUE        :    24        FLUSHING     : 4763  
 (Other)      :120040   (Other)         :  1852        (Other)      :69586  
 Address.State     Address.ZIP          Address.Borough 
 NY     :191494   11385  :  2580   Bronx        :24766  
 NJ     :  5465   11214  :  2460   Brooklyn     :50550  
 PA     :  1069   11220  :  2389   Manhattan    :36444  
        :   831   11368  :  2204   Queens       :48156  
 CT     :   610   11235  :  2175   Staten Island:11247  
 CA     :   411   11218  :  2014   NA's         :32266  
 (Other):  3549   (Other):189607

Here is an additional summary of the license data. Note the earliest license creation date for this data is January 24, 1977 and the latest is April 24, 2020.

print(dfSummary(data[, c(1:5)], graph.magnif = 0.75), method = 'render')

Data Frame Summary

Dimensions: 203429 x 5
Duplicates: 89678

Variable	Stats / Values	Freqs (% of Valid)	Valid	Missing
Industry [factor]	1. Amusement Arcade, 2. Amusement Device Permanen, 3. Amusement Device Portable, 4. Amusement Device Temporar, 5. Auction House Premises, 6. Auctioneer, 7. Bingo Game Operator, 8. Booting Company, 9. Cabaret, 10. Car Wash, [ 49 others ]	54 (0.0%), 289 (0.1%), 1530 (0.8%), 794 (0.4%), 98 (0.0%), 760 (0.4%), 120 (0.1%), 12 (0.0%), 333 (0.2%), 184 (0.1%), 199255 (98.0%)	203429 (100%)	0 (0%)
License.Status [factor]	1. Active, 2. Inactive	74165 (36.5%), 129264 (63.5%)	203429 (100%)	0 (0%)
License.Creation.Date [Date]	min: 1977-01-24, med: 2012-05-04, max: 2020-04-24, range: 43y 3m 0d	6977 distinct values	203429 (100%)	0 (0%)
License.Expiration.Date [Date]	min: 2010-01-02, med: 2018-03-31, max: 2022-12-15, range: 12y 11m 13d	1578 distinct values	203394 (99.98%)	35 (0.02%)
License.Type [factor]	1. Business, 2. Individual	130091 (63.9%), 73338 (36.0%)	203429 (100%)	0 (0%)

2001: Tobacco and Tow Trucks

Looking at the top 10 industries for new licenses in 2001 for the five boroughs:

“Tobacco Retail Dealer” licenses were the most frequent across the five boroughs.
“Home Improvement Salesperson” and “Home Improvement Contractor” were top industries for Brooklyn and Staten Island.
“Tow Truck Driver” was in the top 10 for every borough except Manhattan.
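
The code behind these observations is not shown in the report. A minimal sketch of how such a per-borough ranking could be computed with dplyr follows. It assumes columns named as in the data dataframe built during import, and the toy rows are invented for illustration.

```r
library(dplyr)

# Invented rows using the same column names as the tidied license data
toy <- data.frame(
  License.Creation.Date = as.Date(c("2001-02-10", "2001-05-01", "2001-07-15",
                                    "2001-09-30", "2002-01-05", "2001-03-03")),
  Industry = c("Tobacco Retail Dealer", "Tobacco Retail Dealer",
               "Tow Truck Driver", "Tobacco Retail Dealer",
               "Garage", "Tow Truck Driver"),
  Address.Borough = c("Bronx", "Bronx", "Bronx", "Queens", "Queens", "Queens"),
  stringsAsFactors = FALSE
)

top_n_per_borough <- 10

top_2001 <- toy %>%
  # Keep licenses created in 2001 that have a known borough
  filter(!is.na(Address.Borough),
         format(License.Creation.Date, "%Y") == "2001") %>%
  # Count licenses per borough and industry (count column is named n)
  count(Address.Borough, Industry) %>%
  # Rank industries within each borough and keep the top ones
  arrange(Address.Borough, desc(n)) %>%
  group_by(Address.Borough) %>%
  filter(row_number() <= top_n_per_borough) %>%
  ungroup()

print(top_2001)
```

Changing the year in the filter would give the same view for the 2007 to 2009 period.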

2007 to 2009

2019

Visuals

This section has some graphical representations of the data.

New Licenses Created by Year

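The following graph shows the number of licenses created each year. Note that the code keeps only licenses whose status is Active, so inactive licenses are not counted.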

# Histogram of licenses created by year
freq <- data %>% filter(License.Status == "Active") %>%
  mutate(year = as.numeric(format(License.Creation.Date, '%Y'))) %>%
  group_by(year) %>% 
  tally()

# Basic barplot
p <- ggplot(data=freq, aes(x=year, y=n)) + 
  geom_bar(stat="identity", fill="steelblue") +
  labs(title = "Frequency of Licenses by Year") + 
  xlab("Year") +
  ylab("Count")
p
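
To relate this chart to the recession periods identified earlier, the affected years could be shaded behind the bars. The sketch below shows one way to do that with annotate(). The yearly counts are made-up placeholders, and shading whole calendar years (2001 and 2007 through 2009) is a coarse assumption, since the recessions did not start or end on year boundaries.

```r
library(ggplot2)

# Made-up yearly counts, for illustration only
yearly <- data.frame(
  year = 1998:2012,
  n = c(120, 140, 150, 180, 210, 230, 260, 300, 340, 360, 420, 480, 510, 560, 600)
)

p <- ggplot(yearly, aes(x = year, y = n)) +
  # Shade calendar years that overlap a recession (drawn first, behind bars)
  annotate("rect", xmin = 2000.5, xmax = 2001.5, ymin = -Inf, ymax = Inf,
           fill = "grey70", alpha = 0.5) +
  annotate("rect", xmin = 2006.5, xmax = 2009.5, ymin = -Inf, ymax = Inf,
           fill = "grey70", alpha = 0.5) +
  geom_col(fill = "steelblue") +
  labs(title = "Licenses by Year with Recession Years Shaded",
       x = "Year", y = "Count")

p
```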

Bar Graph by Industry

The following graph shows the number of licenses by industry, with each bar colored by borough.

theme_set(theme_classic())

# Histogram on a Categorical variable
g <- ggplot(data, aes(Industry)) + coord_flip()
g + geom_bar(aes(fill=Address.Borough), width = 0.5) + 
  theme(axis.text.x = element_text(angle=65, vjust=0.6)) + 
  labs(title="Histogram on Industry", 
       subtitle="Industry across NYC Borough")

Circular Bargraph

The following graph depicts the frequency of licenses by industry for each of the 5 boroughs.

# https://www.r-graph-gallery.com/295-basic-circular-barplot.html was used to assist in making the circular histogram
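# Use the factor level order so each label lines up with the bar ggplot draws
# for that industry (unique() returns order of first appearance instead)
industries <- levels(factor(data$Industry))

data1 <- data.frame(
  id = seq_along(industries),
  # Placeholder label heights, as in the gallery template
  value = sample(seq(10, 100), length(industries), replace = TRUE),
  individual = industries
)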

# ----- This section prepare a dataframe for labels ---- #
# Get the name and the y position of each label
label_data <- data1
 
# calculate the ANGLE of the labels
number_of_bar <- nrow(label_data)
angle <-  90 - 360 * (label_data$id-0.5) / number_of_bar     

# calculate the alignment of labels: right or left
# If I am on the left part of the plot, my labels have currently an angle < -90
label_data$hjust<-ifelse( angle < -90, 1, 0)
 
# flip angle BY to make them readable
label_data$angle<-ifelse(angle < -90, angle+180, angle)
# ----- ------------------------------------------- ---- #

 

theme_set(theme_classic())

# Histogram on a Categorical variable
g <- ggplot(data, aes(Industry)) + ylim(-40,120) + coord_polar()
g + geom_bar(aes(fill=Address.Borough), width = 0.5) + 

  # Add the labels, using the label_data dataframe that we have created before
  geom_text(data=label_data, 
            aes(x=id, y=value+10, label=individual, hjust=hjust), 
            color="black", fontface="bold",alpha=0.6, size=2.5, angle= label_data$angle, inherit.aes = FALSE ) +
  
 theme(
    axis.text = element_blank(),
    axis.title = element_blank(),
    panel.grid = element_blank()
  )

Warning: Removed 209 rows containing missing values (geom_bar).

Shiny App

A small app was constructed to understand the data by year and borough. It is located here: https://logicalschema.shinyapps.io/NYCLicenses/

Conclusion

I was not able to establish a causal relationship between recessions and the license data. Only two recessions fall within the scope of the NYC Open Data. Overall, it seemed that NYC’s license applications climbed within a year of a recession. Licenses for businesses such as contractors and those associated with real estate flourished during downturns. In the future, I would like to cross-reference minority-owned businesses by borough and review average mortgage applications.

I did enjoy learning how to use Shiny but was unable to reach conclusive observations about the data.

Data 607 Final Project

Sung Lee

5/5/2020
