Summary

This document lays out the process of identifying opportunities to expand SNAP retail offerings using open data on business licenses from the City of Philadelphia’s Department of Licenses and Inspections and SNAP data from the United States Department of Agriculture. Because this document was produced for the West Philadelphia Promise Zone, it highlights that geography. However, the code below could easily be modified to accommodate a different geography.

USDA data on SNAP retailers is not always consistent with the reality on the ground. Some stores listed as accepting SNAP are no longer open. Other stores are listed in the USDA database as no longer registered as SNAP retailers, but, when visited in person, appear to accept SNAP. This analysis compares business license registrations to SNAP retailer registrations in an attempt to corroborate the USDA dataset. By comparing currently-registered Philadelphia businesses to historic SNAP registration, we can identify: 1) Current business licenses matched to current SNAP registrations, 2) Current business licenses matched to expired SNAP registrations, 3) Current business licenses with no matching SNAP registration, or 4) Current SNAP registrations with no matching business license.

This document lays out a straightforward methodology for importing, wrangling, and joining business license and SNAP data. It also produces a set of maps to highlight SNAP access in the Promise Zone, as well as datasets summarizing likely current SNAP retailers in the Promise Zone and identifying retailers who do not currently accept snap but likely could be convinced to do so.

Business licenses matched to current SNAP registrations indicate a likely active SNAP retailer. Business licenses matched to an expired SNAP license indicate a possible opportunity to expand SNAP retail access: in many of these cases, a new food retail location has replaced an old one, or a SNAP registration has simply been allowed to expire. Current SNAP registrations with no matching business license are also worth investigating, but there are so few of these that they are a low priority.

Finally, current business licenses with no matching SNAP registration are worth further examination. It would be very helpful to identify a connection between the type of license(s) that a business has and the potential for it to accept SNAP. Unfortunately, I have not been able to find a consistent pattern; there is no Philadelphia business license common to all SNAP retailers. Additionally, my initial attempts to join NAICS codes to business licenses has not yielded any useful findings. Therefore, it would be extremely beneficial if some way could be found of identifying those retailers with the potential to accept SNAP who have never been registered in the USDA’s database.

Steps

1. Step Up Workspace

First we need to set up our workspace. For this project, we only need two things: 1) to call the R packages that we’ll need for everything below, and 2) to specify the working directory–the place on our computer where all of our files will go.

If you run this analysis yourself, the library will not change. However, you will have to set the filepath for the working directory so that it’s appropriate for your own computer.

library(tidyverse, quietly = T) #This is a standard package in R. You'll need it for almost anything you do in R.
library(sf, quietly = T) #This package allows you to work with spatial data. Most of our data are spatial, so this will be important.
library(mapview, quietly = T) #This package allows us to map things quickly and easily.
library(tidygeocoder, quietly = T) #This package will help us geocode data for which we don't have latitude/longitude coordinates. This will allow us to make those data spatial like all the others.
library(lubridate, quietly = T) #This is to calculate license durations later on.
library(downloadthis, quietly = T) #For embedding downloadable .csv's later in the document
require("knitr") #This and the following line of code will set out working directory. Mine is a folder where I keep all of my business data work. 
#IMPORTANT: MAKE SURE YOU SET THIS PROPERLY. If you don't, none of your data will load. This should be one of the first places you look when you troubleshoot your code.
opts_knit$set(root.dir = "C:/Users/Nissim.Lebovits/OneDrive - City of Philadelphia/Desktop/Transition Documents/Data/R Scripts and Datasets/WEO")

2. Download and Clean Business License Data

Next, we’ll import our business license data and clean it. Before importing your data, you’ll want to download it as a .csv (a spreadsheet, basically) from the following OpenData Philly link: https://www.opendataphilly.org/dataset/licenses-and-inspections-business-licenses

There are other filetypes we could use for this download, but the .csv does job. It contains the data that we need and, in contrast to a shapefile (more on this later), doesn’t take up too much memory. So, download the .csv file and make sure that it’s stored in the same folder as you’ve indicated in your working directory (see step 1).

Once we’ve downloaded it, we’ll import the .csv into R and clean it. This means making sure that the data include only what we’re looking for (so, for this step, getting rid of rental licenses), that there are no errors in the data, and that it’s formatted in a way that will be most useful for us.

pz <- read_sf("C:/Users/Nissim.Lebovits/OneDrive - City of Philadelphia/Desktop/Transition Documents/Data/R Scripts and Datasets/General Boundaries/Shapefiles/PZ_Shapefile",
              "PZ_Boundaries",
              stringsAsFactors = FALSE) |>
  st_transform(crs = st_crs("EPSG:4326"))


phl_licenses = read.csv("./Business Licenses/business_licenses.csv") |>
  filter(licensestatus == "Active",
         !is.na(lng),
         !is.na(lat),
         licensetype != "Rental") |> #remove rental licenses
  dplyr::select(-c(the_geom, the_geom_webmercator,
                   unit_type, unit_num,
                   numberofunits, owneroccupied,
                   geocode_x, geocode_y, council_district))

#filter for only in the PZ
phl_licenses_sf = st_as_sf(phl_licenses, coords = c("lng", "lat"), crs = st_crs("EPSG:4326"))

pz_licenses_sf = phl_licenses_sf[pz, ]

pz_licenses_clean = pz_licenses_sf |>
  group_by(opa_account_num) |>
  mutate(all_licenses = paste(licensetype, collapse = " | "),
            license_duration = if (is.na(expirationdate) | expirationdate == "" | expirationdate == " ") {
              as.duration(interval(initialissuedate, today()))
            } else {
              as.duration(interval(initialissuedate, expirationdate))
            }) |>
  dplyr::select(business_name, legalname, address, zip, censustract, opa_account_num,
                initialissuedate, expirationdate, all_licenses, license_duration, posse_jobid)
            
pz_licenses_clean$address = pz_licenses_clean$address |>
  tolower()|>
  str_replace_all("street", "st") |>
  str_replace_all("avenue", "ave") |>
  str_replace_all("  ", " ") |>
  str_remove_all("[[:punct:]]")

pz_licenses_clean = pz_licenses_clean[!duplicated(pz_licenses_clean$opa_account_num),]

3. Import and Clean SNAP Data

We’ll import SNAP data that we’ve downloaded from the USDA here: https://www.fns.usda.gov/snap/retailer/data. You’ll want to click on “Download currently authorized SNAP retailers”. Once we’ve downloaded it, we’ll read it into R and then do some cleaning. That includes: 1) formatting the address column and cleaning the data in i so it’s compatible with our business license dataset 2) renaming columns so that they will be clearer when we join it with the business license dataset 3) filtering for active licenses only 4) combining all the historic SNAP data at each address

#now import SNAP data, paste() addresses
phl_snap_points = read.csv("C:/Users/Nissim.Lebovits/OneDrive - City of Philadelphia/Desktop/Transition Documents/Data/R Scripts and Datasets/SNAP/SNAP_Data.csv") |>
  filter(State == "PA",
         City == "Philadelphia") 

#create single address column for matching
phl_snap_points = phl_snap_points |>
  mutate(full_address = tolower(paste(phl_snap_points$Street.Number, phl_snap_points$Street.Name, sep = " ")))

#convert to sf for filtering for pz
phl_snap_points_clean = phl_snap_points |>
  filter(!is.na(Latitude),
         !is.na(Longitude))

phl_snap_points_sf = st_as_sf(phl_snap_points_clean,
                              coords = c("Longitude", "Latitude"),
                              crs = st_crs("EPSG:4326"))

#filter for pz
pz_snap_points = phl_snap_points_sf[pz, ]|>
  as.data.frame() |>
  dplyr::select(Store.Name,
                End.Date,
                full_address,
                -geometry) |>
  rename(snap_business_name = Store.Name,
         snap_end_date = End.Date,
         snap_address = full_address)

pz_snap_points$snap_address = pz_snap_points$snap_address |>
  tolower() |>
  str_replace_all("street", "st") |>
  str_replace_all("avenue", "ave") |>
  str_replace_all("  ", " ") |>
  str_remove_all("[[:punct:]]")

pz_snap_points$snap_end_date[pz_snap_points$snap_end_date == " "] = "active"

pz_snap_points = pz_snap_points |>
  group_by(snap_address) |>
  summarize(historic_snap_names = paste(snap_business_name, collapse = " | "),
            historic_snap_dates = paste(snap_end_date, collapse = " | "))

4. Combine Business License and SNAP Data

We can now combine the Lancaster Avenue business license and SNAP data to create one dataset. We can simply join the two using their address columns. Then, we can create some new columns based on the combined dataset: 1) a snap_status column will tell us if the site accepts, used to accept, or has never accepted SNAP 2) a business_status column will tell us whether the business is open, closed, or possibly an error

We’ll also clean the dataset further to allow us to geocode any sites without lat/long coordinates. This will allow us to map the data correctly to confirm that everything looks the way it should.

joined = pz_licenses_clean |>
  full_join(pz_snap_points, 
            by = c(address = "snap_address"))

joined$snap_status = case_when(
  is.na(joined$historic_snap_dates) ~ "no snap ever",
  str_detect(joined$historic_snap_dates, "active") ~ "active snap",
  TRUE ~ "inactive snap"
)

joined$business_status = case_when(
  is.na(joined$business_name) & joined$snap_status == "inactive snap" ~ "closed",
  is.na(joined$business_name) & joined$snap_status == "active snap" ~ "mismatch?",
  TRUE ~ "open"
)

joined$state = "PA"
joined$city = "Philadelphia"

#still need to geocode any snap sites with no business license attached to them
#should be a simple ifelse statement

for_geocode = joined[is.na(joined$business_name), ]

for_geocode = geocode(for_geocode,
               street = "address",
              city = "city",
              state = "state")

for_geocode = for_geocode |>
                dplyr::select(-geometry) |>
                st_as_sf(coords = c("long", "lat"), crs = st_crs("EPSG:4326"))

joined = rbind(joined[!is.na(joined$business_name), ], for_geocode)

5. Produce Map

Here, we’ll currently map all open businesses in the Promise Zone that either currently offer SNAP or used to offer SNAP. These data can be used to target outreach to increase retailer acceptance of SNAP.

joined |>
  filter(snap_status %in% c("active snap", "inactive snap"),
         business_status == "open") |>
  mapview(zcol = "snap_status", legend = T)

6. Embed Full Dataset for Download

Finally, we’ll pass the data through a function that will allow readers to download a simple Excel sheet containing the raw (non-spatial) data on the businesses shown in the map above.

joined |>
  filter(snap_status %in% c("active snap", "inactive snap"),
         business_status == "open") |>
    download_this(
    output_name = "Promise Zone SNAP Retail_Active and Inactive",
    output_extension = ".xlsx",
    button_label = "Download Promise Zone SNAP Retail Data",
    button_type = "warning",
    has_icon = TRUE,
    icon = "fa fa-save")