Assignment on RPubs
Rmd on Github

Introduction

New York City, originally New Amsterdam, was founded on a land deal and has been driven by profit and commerce ever since. It has weathered many booms and busts. With that history in mind, I set out to analyze the impact of U.S. recessions on businesses in NYC. My father is a small business owner and I have friends who have started businesses in NYC. Having lived in this city since birth, I would like to understand how long a recession might affect its economic climate.

The data sources I use are:

  1. The National Bureau of Economic Research (NBER) web site on U.S. Business Cycle Expansions and Contractions: https://www.nber.org/cycles/. Though I had initially decided to use Wikipedia (https://en.wikipedia.org/wiki/List_of_recessions_in_the_United_States) as a source for recession dates, NBER’s site proved to be more reliable. A copy of the web page was downloaded on May 6, 2020 and is available here: https://raw.githubusercontent.com/logicalschema/DATA607/master/FinalProject/nber.html

  2. NYC Open Data: https://data.cityofnewyork.us/Business/Legally-Operating-Businesses/w7w3-xahh as a source of NYC business license data going back to at least 1998. The data contains each license’s industry type and other license information. The data was downloaded as a csv and is available here: https://github.com/logicalschema/DATA607/raw/master/FinalProject/Legally_Operating_Businesses.csv.gz

Based on the data, I would like to see how long NYC might be affected by a financial downturn.

Let’s start by loading the libraries we will use.

library(knitr)
library(xml2)
library(stringr)
library(dplyr)
library(rvest)
library(lubridate)
library(tidyverse)
library(summarytools)
library(ggplot2)
library(leaflet)

# For summarytools package
opts_chunk$set(results = 'asis',      
                comment = NA,
                prompt  = FALSE,
                cache   = FALSE)

st_options(plain.ascii = FALSE,        
            style        = "rmarkdown", 
            footnote     = NA,          
            subtitle.emphasis = FALSE)  


# Function to remove HTML tags from x
cleanHTML <- function(x) {
  return(gsub("<.*?>", "", x))
}
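
As a quick check of the helper above, here is a small usage sketch. The input string is a made-up example, not taken from the NBER page. It shows that stripping tags also removes the <br> separators, which is why the cells are split on <br> before cleaning later on.

```r
# Same helper as above, repeated so the snippet runs on its own
cleanHTML <- function(x) {
  return(gsub("<.*?>", "", x))
}

# Hypothetical input: two table-cell entries separated by a <br> tag
sample_cell <- "<td nowrap>June 1857(II)<br>October 1860(III)</td>"

# Every tag is removed, so the two entries end up joined together
cleaned <- cleanHTML(sample_cell)
print(cleaned)
```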

Import

This section will go over how the data was imported for use in R.

NYC Zip Codes

NYC consists of five boroughs: the Bronx, Brooklyn, Manhattan (aka New York), Queens, and Staten Island. Addresses in Queens often follow its old village system, where the city is listed as a neighborhood such as Jackson Heights, Elmhurst, or Forest Hills rather than as Queens. I decided that I needed a one-to-one mapping of zip code to borough to better represent the data. For this, I consulted the web site https://www.nycbynatives.com/nyc_info/new_york_city_zip_codes.php, which breaks down zip codes by borough. This page was downloaded and saved on my GitHub.

zipHTML <- read_html("https://raw.githubusercontent.com/logicalschema/DATA607/master/FinalProject/neighborhoods.html")
tbls <- html_nodes(zipHTML, "table")
zip <- as.data.frame(html_table(tbls))
zip <- zip[, -c(3)]
zipcode <- zip$X1
zipcode <- c(zipcode, zip$X4)
borough <- zip$X2
borough <- c(borough, zip$X5)
zipcodes <- cbind(zipcode, borough)
zipcodes <- as.data.frame(zipcodes)
zipcodes <- distinct(zipcodes, zipcode, .keep_all = TRUE)
zipcodes$borough <- str_replace(zipcodes$borough, 'Staten', 'Staten Island')

# zipcodes dataframe matches zip codes with the appropriate borough
zipcodes

The dataframe zipcodes maps zip codes to their corresponding borough. It will be used during the import of the NYC OpenData to match valid zip codes with their borough.

Here is a map view of NYC using the leaflet library.
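
To illustrate how such a lookup table can be applied, here is a minimal sketch using a named vector. The three-row table zipcodes_demo and the vector business_zips are hypothetical stand-ins, not the real data; the idea is only that a zip code missing from the table comes back as NA.

```r
# Hypothetical three-row stand-in for the zipcodes dataframe
zipcodes_demo <- data.frame(
  zipcode = c("10001", "11201", "10301"),
  borough = c("Manhattan", "Brooklyn", "Staten Island"),
  stringsAsFactors = FALSE
)

# Named vector: names are zip codes, values are boroughs
borough_by_zip <- setNames(zipcodes_demo$borough, zipcodes_demo$zipcode)

# Look up business zip codes; anything not in the table comes back as NA
business_zips <- c("11201", "07030", "10001")
boroughs <- unname(borough_by_zip[business_zips])
print(boroughs)
```

Indexing by name requires an exact match, so a zip code with extra characters would also be treated as not found.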

# https://rstudio.github.io/leaflet/
# https://maps.nyc.gov/tiles/leaflet-xyz.html
# https://rpubs.com/jhofman/nycmaps

m <- leaflet() %>%
  addTiles() %>%  
  addMarkers(lng=-73.989850, lat=40.748590, popup="<b>CUNY School of Professional Studies</b><br>119 W 31st Street<br>New York, NY 10001") %>% 
  setView(lng = -73.98925, lat = 40.75039, zoom = 11)
m  # Print the map

NYC Data

The NYC OpenData Platform allows you to download their data in xml and csv formats. It is a very handy source. The csv was downloaded and I placed it on my Github.

# read_csv can decompress gzip-compressed csv files on the fly
# licenseData <- read_csv("https://github.com/logicalschema/DATA607/raw/master/FinalProject/Legally_Operating_Businesses.csv.gz")

# Reading the Legally_Operating_Businesses.csv file
licenseData <- read.csv("Legally_Operating_Businesses.csv", 
                    sep = ",", 
                    header = TRUE)


# A view of what was imported
head(licenseData, 10)

The imported table licenseData has these as columns: DCA.License.Number, License.Type, License.Expiration.Date, License.Status, License.Creation.Date, Industry, Business.Name, Business.Name.2, Address.Building, Address.Street.Name, Secondary.Address.Street.Name, Address.City, Address.State, Address.ZIP, Contact.Phone.Number, Address.Borough, Borough.Code, Community.Board, Council.District, BIN, BBL, NTA, Census.Tract, Detail, Longitude, Latitude, Location

I will tidy up the data and take a subset of the information I need. The original Address.Borough column is dropped with the other unneeded columns. At the end of this section, I will use the data from zipcodes to populate a new Address.Borough column that holds the borough of the business address. For invalid zip codes, it will have NA.

# Remove unnecessary columns from the imported data 
data <- subset(licenseData, select = -c(Contact.Phone.Number, 
                                        Address.Borough, 
                                        Borough.Code, 
                                        Community.Board, 
                                        Council.District, 
                                        BIN, 
                                        BBL, 
                                        NTA, 
                                        Census.Tract, 
                                        Detail, 
                                        Longitude, 
                                        Latitude, 
                                        Location)
               )

# Convert License.Creation.Date and License.Expiration.Date to Date types
data$License.Expiration.Date <- as.Date(data$License.Expiration.Date, format = "%m/%d/%Y")
data$License.Creation.Date <- as.Date(data$License.Creation.Date, format = "%m/%d/%Y")

# Reorder the columns
data <- data[c(6,4,5,3,2,1,7,8,9,10,11,12,13,14)]

# Order the rows by the License Creation.Date
data <- data[order(data$License.Creation.Date),]


# Zip codes to Boroughs: look up each business ZIP in the zipcodes table.
# match() returns NA for any ZIP without an exact entry, so no loop is needed.
tempzips <- zipcodes$borough[match(as.character(data$Address.ZIP),
                                   as.character(zipcodes$zipcode))]

# Set NA to any zip codes that are not found in NYC
tempzips[!(tempzips %in% c("Bronx","Brooklyn", "Manhattan", "Queens", "Staten Island"))] <- NA

# Add the boroughs as Address.Borough to the NYC data
data <- cbind(data, "Address.Borough" = tempzips)

# A view of the data
head(data, 10)
# Remove before finalizing project
write.csv(data,"data.csv", row.names = FALSE)

The NYC license data is now ready for analysis. The recession data is imported in the NBER section.

OpenData Format

The following is a description of the data fields from the https://data.cityofnewyork.us/Business/Legally-Operating-Businesses/w7w3-xahh web site.

Column Name | Description | Type
--- | --- | ---
DCA License Number | An identification number issued to businesses/individuals to operate legally for the duration of their license term. | Plain Text
License Type | DCA offers two license types. Business: license is issued to an entity/organization based on their address. Individual: license is issued to an individual person. | Plain Text
License Expiration Date | Expiration date of DCA License. | Date & Time
License Status | | Plain Text
License Creation Date | | Date & Time
Industry | | Plain Text
Business Name | The legal business name as filed with the New York State Secretary of State or County Clerk or, if an individual, the person’s first name and last name. |
Business Name 2 | If applicable, the Doing-Business-As (DBA)/trade name. |
Address Building | The building number of the business’s address. | Plain Text
Address Street Name | The street name of the business’s address. | Plain Text
Secondary Address Street Name | The cross-street of the business’s address. | Plain Text
Address City | The city where the business is located. | Plain Text
Address State | The state where the business is located. | Plain Text
Address ZIP | The zip code where the business is located. | Plain Text
Contact Phone Number | Contact telephone number for legally operating business. | Plain Text
Address Borough | The borough where the business is located. | Plain Text
Borough Code | |
Community Board | | Plain Text
Council District | | Plain Text
BIN | | Plain Text
BBL | | Plain Text
NTA | | Plain Text
Census Tract | | Plain Text
Detail | Provides the following information for each listed license category: Amusement Device: device name(s); Cabaret / Catering Establishment: capacity of largest room, number of additional rooms; Games of Chance: type of game; Garage / Parking Lot: number of vehicle and bicycle spaces; Sidewalk Cafe: type | Plain Text
Longitude | | Plain Text
Latitude | | Plain Text
Location | | Location

NBER

I used rvest to obtain the information from NBER’s page. I was mainly interested in the recession data.

As mentioned before, the page was downloaded and stored in my Github account. NBER did not make a clean HTML table to represent their data. The peaks and troughs of expansions and contractions are represented as individual <td> elements in one row instead of multiple rows. I am defining a recession period as a peak to its corresponding trough using the NBER data. Some text wrangling is needed for our data.

nberHTML <- read_html("https://raw.githubusercontent.com/logicalschema/DATA607/master/FinalProject/nber.html")

# Grab the td HTML nodes that have the nowrap attribute
tableData <- nberHTML %>%  html_nodes("td[nowrap]")

# We only need the first two elements representing the peaks and troughs of recessions
tableData <- head(tableData, 2) %>% str_replace_all("[\n]" , "")
tableData
[1] "
June 1857(II)
October 1860(III)
April 1865(I)
June 1869(II)
October 1873(III)
March 1882(I)
March 1887(II)
July 1890(III)
January 1893(I)
December 1895(IV)
June 1899(III)
September 1902(IV)
May 1907(II)
January 1910(I)
January 1913(I)
August 1918(III)
January 1920(I)
May 1923(II)
October 1926(III)
August 1929(III)
May 1937(II)
February 1945(I)
November 1948(IV)
July 1953(II)
August 1957(III)
April 1960(II)
December 1969(IV)
November 1973(IV)
January 1980(I)
July 1981(III)
July 1990(III)
March 2001(I)
December 2007 (IV)
"
[2] " December 1854 (IV)
December 1858 (IV)
June 1861 (III)
December 1867 (I)
December 1870 (IV)
March 1879 (I)
May 1885 (II)
April 1888 (I)
May 1891 (II)
June 1894 (II)
June 1897 (II)
December 1900 (IV)
August 1904 (III)
June 1908 (II)
January 1912 (IV)
December 1914 (IV)
March 1919 (I)
July 1921 (III)
July 1924 (III)
November 1927 (IV)
March 1933 (I)
June 1938 (II)
October 1945 (IV)
October 1949 (IV)
May 1954 (II)
April 1958 (II)
February 1961 (I)
November 1970 (IV)
March 1975 (I)
July 1980 (III)
November 1982 (IV)
March 1991(I)
November 2001 (IV)
June 2009 (II)

"

The above gives us the elements for peaks and troughs. Within each element, the entries are separated by <br> tags and need some cleaning before they can be imported into a dataframe.

peaks <- head(tableData, 1) 
peaks <- peaks %>% str_split("<br>")
peaks <- cleanHTML(peaks)
peaks <- data.frame(peak=unlist(strsplit(as.character(peaks),",")))

troughs <- tail(tableData, 1) 
troughs <- troughs %>% str_split("<br>")
troughs <- cleanHTML(troughs)
troughs <- data.frame(trough=unlist(strsplit(as.character(troughs),",")))

# Combine the peaks and troughs
resessionData <- cbind(peaks, troughs)

# Remove (I... IV) and trailing space
resessionData$peak <- str_replace(resessionData$peak, "\\(.*\\)", "")
resessionData$peak <- str_replace_all(resessionData$peak, '\"', '')

resessionData$trough <- str_replace(resessionData$trough, "\\(.*\\)", "")
resessionData$trough <- str_replace_all(resessionData$trough, '\"', '')

# Replace multiple spaces with one space 
resessionData$peak <- str_replace_all(resessionData$peak, '([ ]+)', ' ')
resessionData$trough <- str_replace_all(resessionData$trough, '([ ]+)', ' ')

# Remove leading and trailing whitespace
resessionData$peak <- str_trim(resessionData$peak)
resessionData$trough <- str_trim(resessionData$trough)


#Remove the first and last rows as they were irrelevant 
resessionData <- resessionData[-1, ]
resessionData <- head(resessionData, -1)

# Creating a new column where the peaks and troughs are converted to Date variables
start <-  as.Date(paste(resessionData$peak, "1", sep = " "), format = "%B %Y %d")
end <- as.Date(paste(resessionData$trough, "1", sep = " "), format = "%B %Y %d")
resessionData <- cbind(resessionData, start = start, end = end)
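
The core of the wrangling above can be seen on a small example. This is a simplified sketch in base R, not the exact pipeline used here: raw_cell is a hypothetical string in the same shape as the NBER peaks cell, and the date parsing assumes an English locale for month names.

```r
# Hypothetical cell contents in the same shape as the NBER peaks column
raw_cell <- "June 1857(II)<br>October 1860(III)<br>December 2007 (IV)"

# Split on the <br> separators, then drop the quarter marker in parentheses
entries <- strsplit(raw_cell, "<br>", fixed = TRUE)[[1]]
entries <- trimws(gsub("\\(.*\\)", "", entries))

# Append a day so as.Date can parse "Month Year" (assumes English month names for %B)
peak_dates <- as.Date(paste(entries, "1"), format = "%B %Y %d")
print(peak_dates)
```

Each peak is pinned to the first day of its month, the same convention used for the start and end columns above.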

Here is a look at the NBER data tidied up.

resessionData

It looks like the only recessions that overlap the NYC OpenData are those of 2001 and 2007: March 2001 to November 2001 and December 2007 to June 2009. OpenData did not have consistently valid data until 1998.
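
As a rough sketch of how these two periods could be used, the snippet below computes each recession's length in months and flags whether a given date falls inside either period. The helpers months_between and in_recession are hypothetical names introduced for illustration, and the test dates are made up.

```r
# The two recessions that overlap the license data, as first-of-month dates
recessions <- data.frame(
  start = as.Date(c("2001-03-01", "2007-12-01")),
  end   = as.Date(c("2001-11-01", "2009-06-01"))
)

# Whole months from peak to trough (hypothetical helper, not from the original code)
months_between <- function(from, to) {
  year_diff  <- as.integer(format(to, "%Y")) - as.integer(format(from, "%Y"))
  month_diff <- as.integer(format(to, "%m")) - as.integer(format(from, "%m"))
  12L * year_diff + month_diff
}
recessions$months <- months_between(recessions$start, recessions$end)

# TRUE when a date falls between any peak and its trough (inclusive)
in_recession <- function(dates) {
  vapply(seq_along(dates), function(i) {
    any(dates[i] >= recessions$start & dates[i] <= recessions$end)
  }, logical(1))
}

flags <- in_recession(as.Date(c("2001-06-15", "2005-01-01", "2008-09-15")))
print(recessions)
print(flags)
```

Because the trough is stored as the first day of its month, dates later in the trough month fall outside the flagged period under this convention.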

Analysis

In this section we will see what insights we can glean from our NYC license data.

This is a view of the frequency for the Industry variable of the license data.

freq(data$Industry, order = "freq", plain.ascii = FALSE)

Frequencies

data$Industry
Type: Factor

Industry | Freq | % Valid | % Valid Cum. | % Total | % Total Cum.
--- | --- | --- | --- | --- | ---
Home Improvement Salesperson | 33363 | 16.40 | 16.40 | 16.40 | 16.40
Home Improvement Contractor | 30377 | 14.93 | 31.33 | 14.93 | 31.33
Tobacco Retail Dealer | 24534 | 12.06 | 43.39 | 12.06 | 43.39
Secondhand Dealer - General | 12661 | 6.22 | 49.62 | 6.22 | 49.62
Electronics Store | 10718 | 5.27 | 54.89 | 5.27 | 54.89
Tow Truck Driver | 7770 | 3.82 | 58.71 | 3.82 | 58.71
Stoop Line Stand | 6585 | 3.24 | 61.94 | 3.24 | 61.94
Sightseeing Guide | 6400 | 3.15 | 65.09 | 3.15 | 65.09
General Vendor | 6281 | 3.09 | 68.18 | 3.09 | 68.18
Pedicab Driver | 6076 | 2.99 | 71.16 | 2.99 | 71.16
Electronic & Appliance Service | 5308 | 2.61 | 73.77 | 2.61 | 73.77
Laundries | 4670 | 2.30 | 76.07 | 2.30 | 76.07
Laundry | 4310 | 2.12 | 78.19 | 2.12 | 78.19
Locksmith | 4101 | 2.02 | 80.20 | 2.02 | 80.20
Debt Collection Agency | 4064 | 2.00 | 82.20 | 2.00 | 82.20
Process Server Individual | 3987 | 1.96 | 84.16 | 1.96 | 84.16
Ticket Seller | 3428 | 1.69 | 85.84 | 1.69 | 85.84
Electronic Cigarette Dealer | 3182 | 1.56 | 87.41 | 1.56 | 87.41
Sidewalk Cafe | 3135 | 1.54 | 88.95 | 1.54 | 88.95
Laundry Jobber | 3053 | 1.50 | 90.45 | 1.50 | 90.45
Garage | 2525 | 1.24 | 91.69 | 1.24 | 91.69
Dealer In Products | 1972 | 0.97 | 92.66 | 0.97 | 92.66
Secondhand Dealer - Auto | 1710 | 0.84 | 93.50 | 0.84 | 93.50
Amusement Device Portable | 1530 | 0.75 | 94.25 | 0.75 | 94.25
Tow Truck Company | 1193 | 0.59 | 94.84 | 0.59 | 94.84
Parking Lot | 1090 | 0.54 | 95.38 | 0.54 | 95.38
Employment Agency | 961 | 0.47 | 95.85 | 0.47 | 95.85
Pawnbroker | 952 | 0.47 | 96.32 | 0.47 | 96.32
Amusement Device Temporary | 794 | 0.39 | 96.71 | 0.39 | 96.71
Auctioneer | 760 | 0.37 | 97.08 | 0.37 | 97.08
Pedicab Business | 660 | 0.32 | 97.40 | 0.32 | 97.40
Motion Picture Projectionist | 652 | 0.32 | 97.73 | 0.32 | 97.73
Newsstand | 544 | 0.27 | 97.99 | 0.27 | 97.99
Special Sale | 538 | 0.26 | 98.26 | 0.26 | 98.26
Horse Drawn Driver | 465 | 0.23 | 98.49 | 0.23 | 98.49
Process Serving Agency | 352 | 0.17 | 98.66 | 0.17 | 98.66
Cabaret | 333 | 0.16 | 98.82 | 0.16 | 98.82
Amusement Device Permanent | 289 | 0.14 | 98.96 | 0.14 | 98.96
Garage and Parking Lot | 259 | 0.13 | 99.09 | 0.13 | 99.09
Games of Chance | 235 | 0.12 | 99.21 | 0.12 | 99.21
Car Wash | 184 | 0.09 | 99.30 | 0.09 | 99.30
Gaming Cafe | 148 | 0.07 | 99.37 | 0.07 | 99.37
Scrap Metal Processor | 141 | 0.07 | 99.44 | 0.07 | 99.44
Pool or Billiard Room | 139 | 0.07 | 99.51 | 0.07 | 99.51
Horse Drawn Cab Owner | 138 | 0.07 | 99.58 | 0.07 | 99.58
Catering Establishment | 124 | 0.06 | 99.64 | 0.06 | 99.64
Bingo Game Operator | 120 | 0.06 | 99.70 | 0.06 | 99.70
Tow Truck Exemption | 118 | 0.06 | 99.75 | 0.06 | 99.75
Storage Warehouse | 105 | 0.05 | 99.81 | 0.05 | 99.81
Auction House Premises | 98 | 0.05 | 99.85 | 0.05 | 99.85
Scale Dealer Repairer | 70 | 0.03 | 99.89 | 0.03 | 99.89
Locksmith Apprentice | 55 | 0.03 | 99.92 | 0.03 | 99.92
Amusement Arcade | 54 | 0.03 | 99.94 | 0.03 | 99.94
Sightseeing Bus | 52 | 0.03 | 99.97 | 0.03 | 99.97
General Vendor Distributor | 18 | 0.01 | 99.98 | 0.01 | 99.98
Commercial Lessor | 14 | 0.01 | 99.98 | 0.01 | 99.98
Booting Company | 12 | 0.01 | 99.99 | 0.01 | 99.99
Secondhand Dealer - Firearms | 12 | 0.01 | 100.00 | 0.01 | 100.00
Ticket Seller Business | 10 | 0.00 | 100.00 | 0.00 | 100.00
<NA> | 0 | | | 0.00 | 100.00
Total | 203429 | 100.00 | 100.00 | 100.00 | 100.00
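
For readers unfamiliar with how the percentage and cumulative columns are derived, here is a minimal base R sketch on a hypothetical six-element sample. It is not how summarytools is implemented internally; it only reproduces the same arithmetic.

```r
# Hypothetical sample of the Industry column
industry <- c("Garage", "Laundry", "Garage", "Locksmith", "Garage", "Laundry")

# Counts sorted from most to least frequent
counts <- sort(table(industry), decreasing = TRUE)

freq_table <- data.frame(
  Industry = names(counts),
  Freq     = as.integer(counts),
  Pct      = round(100 * as.integer(counts) / sum(counts), 2),
  stringsAsFactors = FALSE
)

# Cumulative percentage: running total of counts divided by the overall total
freq_table$CumPct <- round(100 * cumsum(freq_table$Freq) / sum(freq_table$Freq), 2)
print(freq_table)
```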

Here is a summary of the data.

summary(data)
                         Industry      License.Status   License.Creation.Date
 Home Improvement Salesperson:33363   Active  : 74165   Min.   :1977-01-24   
 Home Improvement Contractor :30377   Inactive:129264   1st Qu.:2008-07-11   
 Tobacco Retail Dealer       :24534                     Median :2012-05-04   
 Secondhand Dealer - General :12661                     Mean   :2011-10-16   
 Electronics Store           :10718                     3rd Qu.:2016-03-24   
 Tow Truck Driver            : 7770                     Max.   :2020-04-24   
 (Other)                     :84006                                          
 License.Expiration.Date     License.Type      DCA.License.Number
 Min.   :2010-01-02      Business  :130091   1374839-DCA:     2  
 1st Qu.:2014-04-30      Individual: 73338   2003600-DCA:     2  
 Median :2018-03-31                          0002902-DCA:     1  
 Mean   :2017-08-08                          0006840-DCA:     1  
 3rd Qu.:2021-02-28                          0010669-DCA:     1  
 Max.   :2022-12-15                          0010699-DCA:     1  
 NA's   :35                                  (Other)    :203421  
                Business.Name            Business.Name.2   Address.Building
 T-MOBILE NORTHEAST LLC:   609                   :170329          : 74084  
 RADIOSHACK CORPORATION:   432   T-MOBILE        :   293   1      :   487  
 DUANE READE           :   292   T-Mobile        :   188   200    :   454  
 SP PLUS CORPORATION   :   220   AT&T MOBILITY   :   182   2      :   347  
 DUANE READE INC       :   209   VERIZON WIRELESS:   141   10     :   330  
 SPRINT SPECTRUM L.P.  :   207   T-Mobile 4110   :   131   50     :   310  
 (Other)               :201460   (Other)         : 32165   (Other):127417  
    Address.Street.Name  Secondary.Address.Street.Name        Address.City  
              : 73338                   :201395        BROOKLYN     :50438  
 BROADWAY     :  3769   6 AVENUE        :    49        NEW YORK     :37681  
 3RD AVE      :  1835   8 AVENUE        :    48        BRONX        :24700  
 5TH AVE      :  1667   BROADWAY        :    34        STATEN ISLAND:11223  
 JAMAICA AVE  :  1591   LEXINGTON AVENUE:    27        JAMAICA      : 5038  
 ROOSEVELT AVE:  1189   3 AVENUE        :    24        FLUSHING     : 4763  
 (Other)      :120040   (Other)         :  1852        (Other)      :69586  
 Address.State     Address.ZIP          Address.Borough 
 NY     :191494   11385  :  2580   Bronx        :24766  
 NJ     :  5465   11214  :  2460   Brooklyn     :50550  
 PA     :  1069   11220  :  2389   Manhattan    :36444  
        :   831   11368  :  2204   Queens       :48156  
 CT     :   610   11235  :  2175   Staten Island:11247  
 CA     :   411   11218  :  2014   NA's         :32266  
 (Other):  3549   (Other):189607                        

Here is an additional summary of the license data. Note the earliest license creation date for this data is January 24, 1977 and the latest is April 24, 2020.

print(dfSummary(data[, c(1:5)], graph.magnif = 0.75), method = 'render')

Data Frame Summary


Dimensions: 203429 x 5
Duplicates: 89678

No | Variable | Stats / Values | Freqs (% of Valid) | Valid | Missing
--- | --- | --- | --- | --- | ---
1 | Industry [factor] | 1. Amusement Arcade; 2. Amusement Device Permanen; 3. Amusement Device Portable; 4. Amusement Device Temporar; 5. Auction House Premises; 6. Auctioneer; 7. Bingo Game Operator; 8. Booting Company; 9. Cabaret; 10. Car Wash; [ 49 others ] | 54 (0.0%); 289 (0.1%); 1530 (0.8%); 794 (0.4%); 98 (0.0%); 760 (0.4%); 120 (0.1%); 12 (0.0%); 333 (0.2%); 184 (0.1%); 199255 (98.0%) | 203429 (100%) | 0 (0%)
2 | License.Status [factor] | 1. Active; 2. Inactive | 74165 (36.5%); 129264 (63.5%) | 203429 (100%) | 0 (0%)
3 | License.Creation.Date [Date] | min: 1977-01-24; med: 2012-05-04; max: 2020-04-24; range: 43y 3m 0d | 6977 distinct values | 203429 (100%) | 0 (0%)
4 | License.Expiration.Date [Date] | min: 2010-01-02; med: 2018-03-31; max: 2022-12-15; range: 12y 11m 13d | 1578 distinct values | 203394 (99.98%) | 35 (0.02%)
5 | License.Type [factor] | 1. Business; 2. Individual | 130091 (63.9%); 73338 (36.0%) | 203429 (100%) | 0 (0%)

2001: Tobacco and Tow Trucks

Looking at the top 10 industries for new licenses in 2001 for the five boroughs:

  • “Tobacco Retail Dealer” licenses were the most frequent across the five boroughs.
  • “Home Improvement Salesperson” and “Home Improvement Contractor” were top industries for Brooklyn and Staten Island.
  • “Tow Truck Driver” was in the top 10 for all the boroughs except Manhattan.
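
A ranking like the one summarized above could be produced along the following lines. This is a sketch under stated assumptions, not the code behind the bullets: the licenses table is a hypothetical six-row sample using the same column names as data, and top_industries is a helper name introduced here for illustration.

```r
# Hypothetical sample with the same column names as the license data
licenses <- data.frame(
  License.Creation.Date = as.Date(c("2001-02-10", "2001-05-01", "2001-07-15",
                                    "2001-09-30", "2002-01-05", "2001-03-03")),
  Industry = c("Tobacco Retail Dealer", "Tobacco Retail Dealer", "Tow Truck Driver",
               "Tobacco Retail Dealer", "Garage", "Tow Truck Driver"),
  Address.Borough = c("Queens", "Queens", "Queens", "Bronx", "Bronx", "Bronx"),
  stringsAsFactors = FALSE
)

# Top n industries per borough for licenses created in a given year
top_industries <- function(df, year, n = 10) {
  keep <- which(format(df$License.Creation.Date, "%Y") == as.character(year) &
                  !is.na(df$Address.Borough))
  in_year <- df[keep, ]
  counts <- as.data.frame(table(Borough = in_year$Address.Borough,
                                Industry = in_year$Industry),
                          stringsAsFactors = FALSE)
  counts <- counts[counts$Freq > 0, ]
  # Sort by borough, then by descending count, and rank within each borough
  counts <- counts[order(counts$Borough, -counts$Freq), ]
  rank <- ave(counts$Freq, counts$Borough, FUN = seq_along)
  counts[rank <= n, ]
}

top2001 <- top_industries(licenses, 2001, n = 1)
print(top2001)
```

Ties are kept in their original order, so when two industries have the same count the one listed first is arbitrary.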

2007 to 2009

2019

Visuals

This section has some graphical representations of the data.

New Licenses Created by Year

The following graph shows the number of currently active licenses by the year in which they were created.

# Bar chart of active licenses by creation year
freq <- data %>% filter(License.Status == "Active") %>%
  mutate(year = as.numeric(format(License.Creation.Date, '%Y'))) %>%
  group_by(year) %>% 
  tally()

# Basic barplot
p <- ggplot(data=freq, aes(x=year, y=n)) + 
  geom_bar(stat="identity", fill="steelblue") +
  labs(title = "Frequency of Licenses by Year") + 
  xlab("Year") +
  ylab("Count")
p
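
To put numbers on the year-to-year movement that the bar chart shows, one option is to compute the percent change between consecutive years. The sketch below uses a hypothetical vector of creation dates in place of data$License.Creation.Date.

```r
# Hypothetical creation dates standing in for data$License.Creation.Date
created <- as.Date(c("2007-03-01", "2007-08-15", "2008-01-10",
                     "2008-05-20", "2008-11-02", "2009-04-04"))

# Tally licenses per calendar year
per_year <- as.data.frame(table(year = format(created, "%Y")),
                          stringsAsFactors = FALSE)
per_year$year <- as.integer(per_year$year)

# Percent change from the previous row; the first year has no baseline
per_year$pct_change <- c(NA, round(100 * diff(per_year$Freq) / head(per_year$Freq, -1), 1))
print(per_year)
```

Years with no licenses do not appear as rows, so with sparse data a change may span more than one calendar year.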

Bar Graph by Industry

The following graph shows the number of licenses per industry, broken down by borough.

theme_set(theme_classic())

# Histogram on a Categorical variable
g <- ggplot(data, aes(Industry)) + coord_flip()
g + geom_bar(aes(fill=Address.Borough), width = 0.5) + 
  theme(axis.text.x = element_text(angle=65, vjust=0.6)) + 
  labs(title="Histogram on Industry", 
       subtitle="Industry across NYC Borough") 

Circular Bargraph

The following graph depicts the frequency of licenses by industry for each of the 5 boroughs.

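# https://www.r-graph-gallery.com/295-basic-circular-barplot.html was used to assist in making the circular histogram
# value is a random placeholder carried over from the gallery example; it only sets the height of each label
data1 <- data.frame(
  id=seq(1,59),
  value=sample( seq(10,100), 59, replace=T)
)

# Use the factor level order so that label i lines up with bar i on the discrete axis
data1 <- cbind(data1, "individual" = levels(factor(data$Industry)))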

# ----- This section prepare a dataframe for labels ---- #
# Get the name and the y position of each label
label_data <- data1
 
# calculate the ANGLE of the labels
number_of_bar <- nrow(label_data)
angle <-  90 - 360 * (label_data$id-0.5) / number_of_bar     

# calculate the alignment of labels: right or left
# If I am on the left part of the plot, my labels have currently an angle < -90
label_data$hjust<-ifelse( angle < -90, 1, 0)
 
# flip angle BY to make them readable
label_data$angle<-ifelse(angle < -90, angle+180, angle)
# ----- ------------------------------------------- ---- #

 

theme_set(theme_classic())

# Histogram on a Categorical variable
g <- ggplot(data, aes(Industry)) + ylim(-40,120) + coord_polar()
g + geom_bar(aes(fill=Address.Borough), width = 0.5) + 

  # Add the labels, using the label_data dataframe that we have created before
  geom_text(data=label_data, 
            aes(x=id, y=value+10, label=individual, hjust=hjust), 
            color="black", fontface="bold",alpha=0.6, size=2.5, angle= label_data$angle, inherit.aes = FALSE ) +
  
 theme(
    axis.text = element_blank(),
    axis.title = element_blank(),
    panel.grid = element_blank()
  )
Warning: Removed 209 rows containing missing values (geom_bar).
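
The label arithmetic used above can be checked in isolation. The helper label_layout below is a hypothetical wrapper around the same formulas, shown for four bars so the numbers are easy to follow.

```r
# Hypothetical helper wrapping the label arithmetic used above
label_layout <- function(n_bars) {
  id <- seq_len(n_bars)
  # Angle at the center of each bar, starting near the top and going clockwise
  angle <- 90 - 360 * (id - 0.5) / n_bars
  data.frame(
    id    = id,
    # Labels on the left half are right-aligned and rotated by 180 degrees
    hjust = ifelse(angle < -90, 1, 0),
    angle = ifelse(angle < -90, angle + 180, angle)
  )
}

layout <- label_layout(4)
print(layout)
```

With four bars the raw angles are 45, -45, -135 and -225 degrees; the last two fall on the left half, so they are flipped to 45 and -45 and right-aligned to stay readable.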

Shiny App

A small app was constructed to understand the data by year and borough. It is located here: https://logicalschema.shinyapps.io/NYCLicenses/

Conclusion

I was not able to reach a conclusion with regard to a causal relationship between recessions and license data. Only two recessions fall within the scope of the NYC OpenData. Overall, it seemed that NYC’s license applications climbed within a year of a recession. Licenses for businesses such as contractors and those associated with real estate flourished during downturns. In the future, I would like to cross-reference minority-owned businesses by borough and review average mortgage applications.

I did enjoy learning how to use Shiny but was unable to reach conclusive observations about the data.