Welcome to a companion piece to the post “Inspection scores of Berkeley restaurants: An exercise in data manipulation and geospatial data visualization.” If you would like to relive my data cleaning saga or otherwise audit my work, you’re in the right place.

Loading packages

Let’s begin by loading the packages I used in this analysis. If any of these packages are missing from your environment, you can install it by passing its name as a string to install.packages(), for example install.packages('dplyr', dependencies = TRUE).
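If you would rather not check package by package, a short loop along these lines would install whatever is missing (a convenience sketch of my own, not part of the original analysis):

packages_used = c('dplyr', 'ggmap', 'ggplot2', 'kableExtra', 'knitr', 'leaflet',
                  'leaflet.extras', 'readxl', 'sf', 'stringr', 'tidyr')
packages_missing = packages_used[!packages_used %in% rownames(installed.packages())]
if (length(packages_missing) > 0) install.packages(packages_missing, dependencies = TRUE)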

library('dplyr')
library('ggmap')
library('ggplot2')
library('kableExtra')
library('knitr')
library('leaflet')
library('leaflet.extras')
library('readxl')
library('sf')
library('stringr')
library('tidyr')

Importing data

The Berkeley restaurant inspections data

Let’s review my blurb about the restaurant inspections data from the main post.

“The City of Berkeley publishes the results of its most recent restaurant inspections through its Open Data portal. For this exercise, I will examine the file available on February 25, 2019. The file’s records/rows are the most recent inspections of hundreds of Berkeley restaurants and food vendors, and among its features/columns are inspection scores along criteria such as equipment contamination, food source, and personal hygiene. The most recent of the inspections that contributed to this file took place on February 21, 2019.”

I originally worked with a download from the City of Berkeley Open Data portal that I imported into R locally. The portal regularly updates the file provided; thus, the file downloaded by someone following along with this supplement in R may differ from the file I downloaded on February 25, 2019. Here, I provide code to import data from a copy of the original file that I uploaded to GitHub.

inspections = read.csv("https://raw.githubusercontent.com/ninopierre/berkeley-restaurant-inspections/master/Restaurant_Inspections.csv", stringsAsFactors = FALSE, header=TRUE) %>% 
  as_tibble() %>%
  rename(Inspection_ID = ï..W3ALCD) # change the inspection ID variable's abstruse name (the ï.. prefix is a byte-order-mark artifact of read.csv)

Here are two quick looks at the imported data.

inspections %>%
  head(5) %>%
  kable('html') %>%
  kable_styling(position="center") %>%
  scroll_box(width = "100%") 
Inspection_ID Doing_Business_As Restaurant_Address Inspection_Date Major_Violation_Improper_Holding_Temperature Minor_Violation_Improper_Holding_Temperature Major_Violation_Inadequate_Cooking Minor_Violation_Inadequate_Cooking Major_Violation_Personal_Hygiene Minor_Violation_Personal_Hygiene Major_Violation_Contaminated_Equipment Minor_Violation_Contaminated_Equipment Major_Violation_Unsafe_Food_Source Minor_Violation_Unsafe_Food_Source InDbDate
FA0000793 LIK LIQUORS 2495 SACRAMENTO ST BERKELEY, CA (37.862346, -122.28105) 10/16/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
FA0000193 FAT APPLE’S INC. 1346 M L KING JR WY BERKELEY, CA 10/16/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
FA0000568 SEABREEZE MARKET & DELI 598 UNIVERSITY AVE BERKELEY, CA (37.866427, -122.305095) 11/29/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
FA0001366 ALCHEMY COOPERATIVE INC. 1741 ALCATRAZ AVE BERKELEY, CA (37.848597, -122.272524) 12/13/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
FA0000353 CHAAT CAFE 1902 UNIVERSITY AVE BERKELEY, CA (37.871509, -122.272955) 10/24/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
inspections %>% glimpse()
## Observations: 751
## Variables: 15
## $ Inspection_ID                                <chr> "FA0000793", "FA0...
## $ Doing_Business_As                            <chr> "LIK LIQUORS", "F...
## $ Restaurant_Address                           <chr> "2495 SACRAMENTO ...
## $ Inspection_Date                              <chr> "10/16/2018", "10...
## $ Major_Violation_Improper_Holding_Temperature <int> 0, 0, 0, 0, 0, 0,...
## $ Minor_Violation_Improper_Holding_Temperature <int> 0, 0, 0, 0, 0, 1,...
## $ Major_Violation_Inadequate_Cooking           <int> 0, 0, 0, 0, 0, 0,...
## $ Minor_Violation_Inadequate_Cooking           <int> 0, 0, 0, 0, 0, 0,...
## $ Major_Violation_Personal_Hygiene             <int> 0, 0, 0, 0, 0, 0,...
## $ Minor_Violation_Personal_Hygiene             <int> 0, 0, 0, 0, 0, 0,...
## $ Major_Violation_Contaminated_Equipment       <int> 0, 0, 0, 0, 0, 0,...
## $ Minor_Violation_Contaminated_Equipment       <int> 0, 0, 0, 0, 0, 0,...
## $ Major_Violation_Unsafe_Food_Source           <int> 0, 0, 0, 0, 0, 0,...
## $ Minor_Violation_Unsafe_Food_Source           <int> 0, 0, 0, 0, 0, 0,...
## $ InDbDate                                     <chr> "02/25/2019 05:00...

The Berkeley city council district polygon data

Recall that the analysis groups the inspection data by each inspected restaurant’s city council district.

I downloaded the district polygons from the “City Council District Boundaries” page of the City of Berkeley Open Data portal. The data are spread across four related files with different extensions (.dbf, .prj, .shp, .shx), all of which must sit in the same directory before a proper import into R is possible.

I uploaded copies of the polygon data files I used to GitHub; download the files from that repository if you’d like to follow along interactively.

With the .shp file and its supporting files in place, let’s import the polygon information with st_read. You might find typing file.choose() into the console helpful for identifying the file path to the shapefile.

district_polygons_shp_file = "path_on_your_device.shp"

districts = st_read(district_polygons_shp_file)
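It is also worth confirming the imported layer’s coordinate reference system, since leaflet expects longitude/latitude coordinates (EPSG:4326); this check is my addition, and if it revealed a projected CRS, st_transform(districts, 4326) would reproject the polygons.

st_crs(districts)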

A quick leaflet plot permits a sanity check of the imported data.

district_palette <- colorFactor(
  palette = 'Dark2',
  domain = districts$district
)

leaflet() %>% 
  addProviderTiles("CartoDB") %>% 
  addPolygons(data = districts,
              label = ~district,
              color = ~district_palette(district),
              weight = 2
              )

Data saved from a geocode

The analysis involved parsing coordinate information from strings already provided in the inspections data file; notice the coordinate information at the end of most of these restaurant address strings.

inspections %>%
  select(Doing_Business_As, Restaurant_Address) %>%
  head(8) %>%
  kable('html') %>%
  kable_styling(position="center") %>%
  scroll_box(width = "100%") 
Doing_Business_As Restaurant_Address
LIK LIQUORS 2495 SACRAMENTO ST BERKELEY, CA (37.862346, -122.28105)
FAT APPLE’S INC. 1346 M L KING JR WY BERKELEY, CA
SEABREEZE MARKET & DELI 598 UNIVERSITY AVE BERKELEY, CA (37.866427, -122.305095)
ALCHEMY COOPERATIVE INC. 1741 ALCATRAZ AVE BERKELEY, CA (37.848597, -122.272524)
CHAAT CAFE 1902 UNIVERSITY AVE BERKELEY, CA (37.871509, -122.272955)
BERKELEY BOWL KITCHEN 2020 OREGON ST BERKELEY, CA (37.857544, -122.267497)
SHATTUCK MARKET 2441 SHATTUCK AVE BERKELEY, CA (37.865051, -122.267421)
BREADS OF INDIA 2448 SACRAMENTO ST BERKELEY, CA (37.862293, -122.281218)

However, it is clear that not every restaurant address string contained coordinate information.

As I describe later, for the cases missing coordinate information, I geocoded modified versions of their restaurant address strings to obtain the missing coordinates. This method queries the Google Maps API, requires a registered key, and, because high volumes of geocode queries can result in charges, should not be performed needlessly. Once I had obtained what I needed from the API, I exported the output from R so that I would not have to repeat the queries in the future. That file is also available on GitHub for your use, in case you would rather not register a key or accrue query charges.
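The export itself was nothing more than a one-liner along these lines (a minimal sketch; the file name matches the copy on GitHub, and lacking_online_coords is the geocoded data frame built later in this supplement):

write.csv(lacking_online_coords, "geocoded_coords.csv", row.names = FALSE)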

If you’d like to import the copy of the data that I uploaded to GitHub, the following code will do it.

geocoded_coords = read.csv("https://raw.githubusercontent.com/ninopierre/berkeley-restaurant-inspections/master/geocoded_coords.csv",  stringsAsFactors = FALSE, header=TRUE)

The first few rows of the imported data should look like this.

geocoded_coords  %>%
  head() %>%
  kable('html') %>%
  kable_styling(position="center") %>%
  scroll_box(width = "100%") 
Doing_Business_As Restaurant_Address lon lat
ASHBY SUPERMARKET 1099/1643 2948 MARTIN LUTHER KING JR WY BERKELEY, CA -122.2714 37.85440
SUSHI CALIFORNIA 2033 MARTIN LUTHER KING JR WY BERKELEY, CA -122.2728 37.87115
CITY LEE MARKET 2700 MARTIN LUTHER KING JR WAY BERKELEY, CA -122.2720 37.85960
#1 GAS 1099/1616 1900 MARTIN LUTHER KING JR WY BERKELEY, CA -122.2736 37.87306
FAT APPLE’S INC. 1346 MARTIN LUTHER KING JR WY BERKELEY, CA -122.2743 37.88157
CHLOE CAFE 2080 MARTIN LUTHER KING JR WAY BERKELEY, CA -122.2731 37.87075

Learning the extent of missing data

I counted how many items in the inspections data file were imported as NA.

sum(is.na(inspections))
## [1] 0
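For a per-column view (an extra check of mine, not strictly necessary given the total of 0 above), colSums reports the NA count of each of the 15 columns, and each should report 0:

colSums(is.na(inspections))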

While finding 0 entries imported as NA was a relief, I still planned to be thorough in my approach to data cleaning.

Removing repeated restaurant-address combinations

Because I wanted only one record per restaurant per address, I checked whether any restaurant-address combinations were repeated.

repeated_restaurant_address_combos = inspections %>%
  group_by(Doing_Business_As, Restaurant_Address) %>%
  summarize(count = n()) %>%
  filter(count>1)

repeated_restaurant_address_combos %>% 
  kable('html', caption="Restaurant-address combinations found to have multiple records") %>%
  kable_styling(position="center")
Restaurant-address combinations found to have multiple records
Doing_Business_As Restaurant_Address count
MI TIERRA FOODS #2 2082 SAN PABLO AVE BERKELEY, CA (37.868162, -122.291919) 3
SAFEWAY COMMUNITY MARKETS #2451 1850 SOLANO AVE BERKELEY, CA (37.891404, -122.278607) 4
SAFEWAY COMMUNITY MARKETS #2453 1550 SHATTUCK AVE BERKELEY, CA (37.878899, -122.269196) 4

Three restaurant-address combinations accounted for eleven data points instead of three data points.

I examined records with these restaurant-address combinations more closely, sorting the records by restaurant name, address, inspection date, and then inspection ID.

repeated_locations = inspections %>%
  filter(Doing_Business_As %in% repeated_restaurant_address_combos$Doing_Business_As) %>%
  arrange(Doing_Business_As, Restaurant_Address, Inspection_Date, Inspection_ID)

repeated_locations %>%
  kable('html') %>%
  kable_styling(full_width = T) %>%
  scroll_box(width = "100%", height = "500px") 
Inspection_ID Doing_Business_As Restaurant_Address Inspection_Date Major_Violation_Improper_Holding_Temperature Minor_Violation_Improper_Holding_Temperature Major_Violation_Inadequate_Cooking Minor_Violation_Inadequate_Cooking Major_Violation_Personal_Hygiene Minor_Violation_Personal_Hygiene Major_Violation_Contaminated_Equipment Minor_Violation_Contaminated_Equipment Major_Violation_Unsafe_Food_Source Minor_Violation_Unsafe_Food_Source InDbDate
FA0000280 MI TIERRA FOODS #2 2082 SAN PABLO AVE BERKELEY, CA (37.868162, -122.291919) 08/14/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
FA0000280 MI TIERRA FOODS #2 2082 SAN PABLO AVE BERKELEY, CA (37.868162, -122.291919) 10/26/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
FA0000280 MI TIERRA FOODS #2 2082 SAN PABLO AVE BERKELEY, CA (37.868162, -122.291919) 12/06/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
FA0001775 SAFEWAY COMMUNITY MARKETS #2451 1850 SOLANO AVE BERKELEY, CA (37.891404, -122.278607) 09/27/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
FA0001776 SAFEWAY COMMUNITY MARKETS #2451 1850 SOLANO AVE BERKELEY, CA (37.891404, -122.278607) 09/27/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
FA0001777 SAFEWAY COMMUNITY MARKETS #2451 1850 SOLANO AVE BERKELEY, CA (37.891404, -122.278607) 09/27/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
FA0001778 SAFEWAY COMMUNITY MARKETS #2451 1850 SOLANO AVE BERKELEY, CA (37.891404, -122.278607) 09/27/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
FA0001779 SAFEWAY COMMUNITY MARKETS #2453 1550 SHATTUCK AVE BERKELEY, CA (37.878899, -122.269196) 09/28/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
FA0001780 SAFEWAY COMMUNITY MARKETS #2453 1550 SHATTUCK AVE BERKELEY, CA (37.878899, -122.269196) 09/28/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
FA0001781 SAFEWAY COMMUNITY MARKETS #2453 1550 SHATTUCK AVE BERKELEY, CA (37.878899, -122.269196) 09/28/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
FA0001782 SAFEWAY COMMUNITY MARKETS #2453 1550 SHATTUCK AVE BERKELEY, CA (37.878899, -122.269196) 09/28/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM

Notes and plans in light of this output

  • MI TIERRA FOODS #2: All three inspection records at restaurants named MI TIERRA FOODS #2 had the same address and scores; however, they had different inspection dates. I planned to drop records with this restaurant-address combination except for the most recent record (2 records dropped).

  • SAFEWAY COMMUNITY MARKETS #2451: All four inspection records at establishments named SAFEWAY COMMUNITY MARKETS #2451 had the same address, date, and scores. I planned to drop all but one record with this restaurant-address combination (3 records dropped).

  • SAFEWAY COMMUNITY MARKETS #2453: All four inspection records at establishments named SAFEWAY COMMUNITY MARKETS #2453 had the same address, date, and scores. I planned to drop all but one record with this restaurant-address combination (3 records dropped).

The following code results in a data set that counts every restaurant-address location only once.

inspections_uniquelocations = inspections %>%
  arrange(Doing_Business_As, Restaurant_Address, Inspection_Date, Inspection_ID) %>%
  group_by(Doing_Business_As, Restaurant_Address) %>%
  filter(row_number()==n()) %>% # keep only the last record within each restaurant-address group
  ungroup()
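One caveat of my own about the sort above: Inspection_Date was imported as a character string in MM/DD/YYYY format, so the alphabetical sort is chronological only when the dates being compared fall within the same year. That happens to hold for the three repeated combinations here (all of their inspection dates fall in 2018), but a more defensive variant would parse the dates first; a sketch:

inspections_uniquelocations = inspections %>%
  mutate(Inspection_Date_parsed = as.Date(Inspection_Date, format = "%m/%d/%Y")) %>%
  arrange(Doing_Business_As, Restaurant_Address, Inspection_Date_parsed, Inspection_ID) %>%
  group_by(Doing_Business_As, Restaurant_Address) %>%
  filter(row_number()==n()) %>%
  ungroup() %>%
  select(-Inspection_Date_parsed)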

I then examined how the frequencies of restaurant-address combinations changed from the original data set to the new data set.

table((inspections %>%
  group_by(Doing_Business_As, Restaurant_Address) %>%
  summarize(count = n()))$count) %>% 
  kable('html', caption="A frequency table of restaurant-address combination counts in the original data set: 740 restaurant-address combinations appeared once; 3 combinations appeared multiple times") %>%
  kable_styling(position="center")
A frequency table of restaurant-address combination counts in the original data set: 740 restaurant-address combinations appeared once; 3 combinations appeared multiple times
Var1 Freq
1 740
3 1
4 2
table((inspections_uniquelocations %>%
  group_by(Doing_Business_As, Restaurant_Address) %>%
  summarize(count = n()))$count) %>% 
  kable('html', caption="A frequency table of restaurant-address combination counts in the new data set: 0 restaurant-address combinations appeared more than once") %>%
  kable_styling(position="center")
A frequency table of restaurant-address combination counts in the new data set: 0 restaurant-address combinations appeared more than once
Var1 Freq
1 743

The distribution of counts for restaurant-address combinations changed to what I sought: one per restaurant per address. Also, the number of records dropped from 751 to 743 as I expected.
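A direct row count confirms the arithmetic of 751 - (2 + 3 + 3) = 743.

nrow(inspections)
## [1] 751
nrow(inspections_uniquelocations)
## [1] 743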

Acquiring coordinates

Parsing coordinates from Restaurant_Address strings

Most records appeared to have coordinates embedded in the Restaurant_Address column.

inspections %>%
  select(Doing_Business_As, Restaurant_Address) %>%
  head(8) %>%
  kable('html') %>%
  kable_styling(position="center") %>%
  scroll_box(width = "100%") 
Doing_Business_As Restaurant_Address
LIK LIQUORS 2495 SACRAMENTO ST BERKELEY, CA (37.862346, -122.28105)
FAT APPLE’S INC. 1346 M L KING JR WY BERKELEY, CA
SEABREEZE MARKET & DELI 598 UNIVERSITY AVE BERKELEY, CA (37.866427, -122.305095)
ALCHEMY COOPERATIVE INC. 1741 ALCATRAZ AVE BERKELEY, CA (37.848597, -122.272524)
CHAAT CAFE 1902 UNIVERSITY AVE BERKELEY, CA (37.871509, -122.272955)
BERKELEY BOWL KITCHEN 2020 OREGON ST BERKELEY, CA (37.857544, -122.267497)
SHATTUCK MARKET 2441 SHATTUCK AVE BERKELEY, CA (37.865051, -122.267421)
BREADS OF INDIA 2448 SACRAMENTO ST BERKELEY, CA (37.862293, -122.281218)

Since the entire string wouldn’t be helpful for simple point-mapping, I used the pattern I observed in the Restaurant_Address strings to parse each address’s latitude and longitude where coordinates were available.

coords_pattern = "(\\d+\\.\\d+),\\s(-\\d+\\.\\d+)" # \\. matches a literal decimal point
inspections_uniquelocations = inspections_uniquelocations %>%
  mutate(lat = str_match(Restaurant_Address, coords_pattern)[,2] %>% as.numeric(),
         lon = str_match(Restaurant_Address, coords_pattern)[,3] %>% as.numeric()
         )
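To see the pattern in action on a single address string (a demonstration of my own): str_match returns a one-row matrix containing the full match followed by the two captured groups, here "37.862346" and "-122.28105".

str_match("2495 SACRAMENTO ST BERKELEY, CA (37.862346, -122.28105)", coords_pattern)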

Notice the new coordinate columns below.

inspections_uniquelocations %>%
  select(Doing_Business_As, Restaurant_Address, lat, lon) %>%
  head() %>% 
  kable('html', caption=NULL) %>%
  kable_styling(position="center")
Doing_Business_As Restaurant_Address lat lon
#1 GAS 1099/1616 1900 M L KING JR WY BERKELEY, CA NA NA
#POKI 3075 TELEGRAPH AVE BERKELEY, CA (37.854242, -122.259905) 37.85424 -122.2599
24 HOUR FITNESS #583 2072 ADDISON ST BERKELEY, CA (37.871082, -122.268954) 37.87108 -122.2690
24 HOUR FITNESS #704 1775 SOLANO AVE BERKELEY, CA (37.891296, -122.280682) 37.89130 -122.2807
2900 COLLEGE AVE CAFFE LLC 2900 COLLEGE AVE BERKELEY, CA (37.858356, -122.253179) 37.85836 -122.2532
7-ELEVEN #16192 1099/1615 2887 COLLEGE AVE BERKELEY, CA (37.858756, -122.253219) 37.85876 -122.2532

Coordinates were extracted where they were available, but the (a.b, -c.d) pattern clearly was not detected in all records.

I determined that the data set currently lacked coordinates for 12 locations (two pieces of missing data each).

sum(is.na(inspections_uniquelocations))
## [1] 24
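Equivalently, counting the missing latitudes alone gives the number of affected locations directly.

sum(is.na(inspections_uniquelocations$lat))
## [1] 12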

I examined the restaurant address strings for records where this pattern was not detected.

inspections_uniquelocations %>%
  filter(is.na(lat)) %>%
  select(Doing_Business_As, Restaurant_Address) %>% 
  kable('html', caption="Records where the `(a.b, -c.d)` pattern was not detected in the `Restaurant_Address` string") %>%
  kable_styling(position="center")
Records where the (a.b, -c.d) pattern was not detected in the Restaurant_Address string
Doing_Business_As Restaurant_Address
#1 GAS 1099/1616 1900 M L KING JR WY BERKELEY, CA
ASHBY SUPERMARKET 1099/1643 2948 M L KING JR WY BERKELEY, CA
BAIANO PIZZERIA 1916 M L KING JR WAY BERKELEY, CA
CAFE NOSDOS 1930 M L KING JR WAY BERKELEY, CA
CHLOE CAFE 2080 M L KING JR WAY BERKELEY, CA
CITY LEE MARKET 2700 M L KING JR WAY BERKELEY, CA
FAT APPLE’S INC. 1346 M L KING JR WY BERKELEY, CA
GOLD LEAF CAFE 1947 M L KING JR WAY BERKELEY, CA
NEIGHBORS MARKET 1343 M L KING JR WAY BERKELEY, CA
SECRET SCOOP 1922 M L KING JR WAY BERKELEY, CA
SUSHI CALIFORNIA 2033 M L KING JR WY BERKELEY, CA
XTRA OIL CO/CHEVRON/THE ALAM 1201 THE BERKELEY, CA

This printout showed me that there was no second coordinate pattern that my regex failed to capture; the parsing failures occurred

  • not because the addresses used multiple coordinate formats but

  • because the coordinate information was simply absent.

Also, it was very curious that so many of the records that lacked coordinate information were located on Martin Luther King, Jr. Way.

I decided to fill in the NAs by geocoding; however, before I did so, for future reference, I created a variable that would distinguish records by whether the original data set lacked coordinate information.

inspections_uniquelocations = inspections_uniquelocations %>%
  mutate(lacked_online_coords = is.na(lat))
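A quick tabulation of the new variable should report 731 FALSE values and 12 TRUE values, consistent with the 743 records and 12 missing-coordinate locations found above.

table(inspections_uniquelocations$lacked_online_coords)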

Acquiring missing coordinates. Original approach: Geocoding

Note: readers following along in R who would like to skip registering a Google Maps API key and avoid possible fees should use the code in the “alternate approach” section that follows instead of the code in this section.

To acquire the missing coordinates, I used the mutate_geocode function from the ggmap package. This method queries the Google Maps API, requires a registered key, and can result in charges if query volume is large.

I first created a data frame that had

  • rows only for restaurants that were missing coordinate information (geocoding for all records is unnecessary and could result in increased charges)

  • columns for only the restaurant name and the restaurant address, the latter being the input for the geocoding.

lacking_online_coords = inspections_uniquelocations %>%
  filter(lacked_online_coords==TRUE) %>%
  select(Doing_Business_As, Restaurant_Address) %>%
  as.data.frame()

Let’s inspect that data frame.

lacking_online_coords %>%
  rmarkdown::paged_table()

Since Google Maps might not recognize the locations described by those Restaurant_Address strings as written (in particular, “1201 THE\nBERKELEY” in the 12th record actually refers to “1201 The Alameda, Berkeley”), I performed some string manipulation for clarity.

lacking_online_coords$Restaurant_Address = lacking_online_coords$Restaurant_Address %>% 
  str_replace_all("M L KING", "MARTIN LUTHER KING") %>%
  str_replace_all("\\n", " ") %>%
  str_replace_all("1201 THE", "1201 THE ALAMEDA")

Here’s what the strings looked like afterward.

lacking_online_coords%>%
  rmarkdown::paged_table()

Now armed with strings more similar to what I’d type into a Google Maps search, I proceeded with the geocode. Then I filled in the coordinate NAs in the main inspections tibble with the newly acquired coordinates, which involved a left join.

register_google(key=my_key) # my_key is a string containing your own registered Google Maps API key

lacking_online_coords = lacking_online_coords %>% 
  mutate_geocode(Restaurant_Address) # geocode yields columns lat and lon

inspections_allcoords = left_join(x=inspections_uniquelocations, 
                                  y=lacking_online_coords,
                                  by = "Doing_Business_As"
                                  ) %>%
  mutate(lat = coalesce(lat.x, lat.y),
         lon = coalesce(lon.x, lon.y)) %>%
  select(-lat.x, -lon.x, -lat.y, -lon.y, -Restaurant_Address.y) %>% # dropped extra variables generated by the join
  rename(Restaurant_Address = Restaurant_Address.x)

Acquiring missing coordinates. Alternate approach to sidestep key registration and possible fees: Joining with saved geocoded data

Earlier in the supplement, I provided code to import the output of the approach above. Here, I left-join those saved coordinates onto the inspections data.

inspections_allcoords = left_join(x=inspections_uniquelocations, 
                                  y=geocoded_coords, 
                                  by = "Doing_Business_As"
                                  ) %>%
  mutate(lat = coalesce(lat.x,lat.y),
         lon = coalesce(lon.x,lon.y)) %>%
  select(-lat.x, -lon.x, -lat.y, -lon.y, -Restaurant_Address.y) %>% # dropped extra variables generated by the join
  rename(Restaurant_Address = Restaurant_Address.x) # restore the address column's name, as in the original approach

Inspecting final coordinate data

At this point, the data set contained no more missing coordinate data.

sum(is.na(inspections_allcoords))
## [1] 0

At this point, the first six rows looked like this (with newly complete coordinate data in the rightmost columns).

inspections_allcoords %>% 
  head() %>%
  rmarkdown::paged_table()

I mapped the points as a quality check.

attention_palette = colorFactor(c('dodgerblue', 'maroon'),
                                domain = c(FALSE, TRUE))

leaflet() %>%
  addProviderTiles("CartoDB") %>%
  addCircleMarkers(data = inspections_allcoords,
                   lng = ~lon,
                   lat = ~lat,
                   label = ~Doing_Business_As,
                   radius = 2,
                   color = ~attention_palette(lacked_online_coords),
                   fillOpacity = 0.2) %>%
  addLegend(data = inspections_allcoords,
            pal = attention_palette,
            values = c(FALSE, TRUE),
            title = "Whether a geocode was necessary"
            ) %>%
  addResetMapButton()

The locations of the points made sense: increased density of points could be seen along the highly commercial segments of San Pablo, Shattuck, Solano, Telegraph, and University Avenues.

In the map above, I used color to distinguish the locations that lacked coordinate information in the original file. All of those locations, even the single location whose street address was not on Martin Luther King, Jr. Way, curiously fell along the same line.