Welcome to a companion piece to the post “Inspection scores of Berkeley restaurants: An exercise in data manipulation and geospatial data visualization.” If you would like to relive my data cleaning saga or otherwise audit my work, you’re in the right place.

Loading packages

Let’s begin by loading the packages I used in this analysis. If any of these packages are missing from your environment, you can install it by passing its name as a string to install.packages(), for example install.packages('dplyr', dependencies = TRUE).
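If you would rather not check package by package, a short loop along these lines would install whatever is missing (a convenience sketch of my own, not part of the original analysis):

packages_used = c('dplyr', 'ggmap', 'ggplot2', 'kableExtra', 'knitr', 'leaflet',
                  'leaflet.extras', 'readxl', 'sf', 'stringr', 'tidyr')
packages_missing = packages_used[!packages_used %in% rownames(installed.packages())]
if (length(packages_missing) > 0) install.packages(packages_missing, dependencies = TRUE)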

library('dplyr')
library('ggmap')
library('ggplot2')
library('kableExtra')
library('knitr')
library('leaflet')
library('leaflet.extras')
library('readxl')
library('sf')
library('stringr')
library('tidyr')

Importing data

The Berkeley restaurant inspections data

Let’s review my blurb about the restaurant inspections data from the main post.

“The City of Berkeley publishes the results of its most recent restaurant inspections through its Open Data portal. For this exercise, I will examine the file available on February 25, 2019. The file’s records/rows are the most recent inspections of hundreds of Berkeley restaurants and food vendors, and among its features/columns are inspection scores along criteria such as equipment contamination, food source, and personal hygiene. The most recent of the inspections that contributed to this file took place on February 21, 2019.”

I originally worked with a download from the City of Berkeley Open Data portal that I imported into R locally. The portal regularly updates the file provided; thus, the file downloaded by someone following along with this supplement in R may differ from the file I downloaded on February 25, 2019. Here, I provide code to import data from a copy of the original file that I uploaded to GitHub.

inspections = read.csv("https://raw.githubusercontent.com/ninopierre/berkeley-restaurant-inspections/master/Restaurant_Inspections.csv", stringsAsFactors = FALSE, header=TRUE) %>% 
  as_tibble() %>%
  rename(Inspection_ID = ï..W3ALCD) # change the inspection ID variable's abstruse name (the ï.. prefix is a byte-order-mark artifact of read.csv)

Here are two quick looks at the imported data.

inspections %>%
  head(5) %>%
  kable('html') %>%
  kable_styling(position="center") %>%
  scroll_box(width = "100%") 
Inspection_ID Doing_Business_As Restaurant_Address Inspection_Date Major_Violation_Improper_Holding_Temperature Minor_Violation_Improper_Holding_Temperature Major_Violation_Inadequate_Cooking Minor_Violation_Inadequate_Cooking Major_Violation_Personal_Hygiene Minor_Violation_Personal_Hygiene Major_Violation_Contaminated_Equipment Minor_Violation_Contaminated_Equipment Major_Violation_Unsafe_Food_Source Minor_Violation_Unsafe_Food_Source InDbDate
FA0000793 LIK LIQUORS 2495 SACRAMENTO ST BERKELEY, CA (37.862346, -122.28105) 10/16/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
FA0000193 FAT APPLE’S INC. 1346 M L KING JR WY BERKELEY, CA 10/16/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
FA0000568 SEABREEZE MARKET & DELI 598 UNIVERSITY AVE BERKELEY, CA (37.866427, -122.305095) 11/29/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
FA0001366 ALCHEMY COOPERATIVE INC. 1741 ALCATRAZ AVE BERKELEY, CA (37.848597, -122.272524) 12/13/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
FA0000353 CHAAT CAFE 1902 UNIVERSITY AVE BERKELEY, CA (37.871509, -122.272955) 10/24/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
inspections %>% glimpse()
## Observations: 751
## Variables: 15
## $ Inspection_ID                                <chr> "FA0000793", "FA0...
## $ Doing_Business_As                            <chr> "LIK LIQUORS", "F...
## $ Restaurant_Address                           <chr> "2495 SACRAMENTO ...
## $ Inspection_Date                              <chr> "10/16/2018", "10...
## $ Major_Violation_Improper_Holding_Temperature <int> 0, 0, 0, 0, 0, 0,...
## $ Minor_Violation_Improper_Holding_Temperature <int> 0, 0, 0, 0, 0, 1,...
## $ Major_Violation_Inadequate_Cooking           <int> 0, 0, 0, 0, 0, 0,...
## $ Minor_Violation_Inadequate_Cooking           <int> 0, 0, 0, 0, 0, 0,...
## $ Major_Violation_Personal_Hygiene             <int> 0, 0, 0, 0, 0, 0,...
## $ Minor_Violation_Personal_Hygiene             <int> 0, 0, 0, 0, 0, 0,...
## $ Major_Violation_Contaminated_Equipment       <int> 0, 0, 0, 0, 0, 0,...
## $ Minor_Violation_Contaminated_Equipment       <int> 0, 0, 0, 0, 0, 0,...
## $ Major_Violation_Unsafe_Food_Source           <int> 0, 0, 0, 0, 0, 0,...
## $ Minor_Violation_Unsafe_Food_Source           <int> 0, 0, 0, 0, 0, 0,...
## $ InDbDate                                     <chr> "02/25/2019 05:00...

The Berkeley city council district polygon data

Recall that the analysis groups the inspection data by each inspected restaurant’s city council district.

I downloaded the district polygons from the “City Council District Boundaries” page of the City of Berkeley Open Data portal. The data are spread across four related files with different extensions (.dbf, .prj, .shp, .shx), all of which must sit in the same directory before a proper import into R is possible.

I uploaded copies of the polygon data files I used to GitHub; download the files from that repository if you’d like to follow along interactively.

With the .shp file and its supporting files in place, let’s import the polygon information with st_read. You might find typing file.choose() into the console helpful for identifying the file path to the shapefile.

district_polygons_shp_file = "path_on_your_device.shp"

districts = st_read(district_polygons_shp_file)
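It is also worth confirming the imported layer’s coordinate reference system, since leaflet expects longitude/latitude coordinates (EPSG:4326); this check is my addition, and if it revealed a projected CRS, st_transform(districts, 4326) would reproject the polygons.

st_crs(districts)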

A quick leaflet plot permits a sanity check of the imported data.

district_palette <- colorFactor(
  palette = 'Dark2',
  domain = districts$district
)

leaflet() %>% 
  addProviderTiles("CartoDB") %>% 
  addPolygons(data = districts,
              label = ~district,
              color = ~district_palette(district),
              weight = 2
              )

Data saved from a geocode

The analysis involved parsing coordinate information from strings already provided in the inspections data file; notice the coordinate information at the end of most of these restaurant address strings.

inspections %>%
  select(Doing_Business_As, Restaurant_Address) %>%
  head(8) %>%
  kable('html') %>%
  kable_styling(position="center") %>%
  scroll_box(width = "100%") 
Doing_Business_As Restaurant_Address
LIK LIQUORS 2495 SACRAMENTO ST BERKELEY, CA (37.862346, -122.28105)
FAT APPLE’S INC. 1346 M L KING JR WY BERKELEY, CA
SEABREEZE MARKET & DELI 598 UNIVERSITY AVE BERKELEY, CA (37.866427, -122.305095)
ALCHEMY COOPERATIVE INC. 1741 ALCATRAZ AVE BERKELEY, CA (37.848597, -122.272524)
CHAAT CAFE 1902 UNIVERSITY AVE BERKELEY, CA (37.871509, -122.272955)
BERKELEY BOWL KITCHEN 2020 OREGON ST BERKELEY, CA (37.857544, -122.267497)
SHATTUCK MARKET 2441 SHATTUCK AVE BERKELEY, CA (37.865051, -122.267421)
BREADS OF INDIA 2448 SACRAMENTO ST BERKELEY, CA (37.862293, -122.281218)

However, it is clear that not every restaurant address string contained coordinate information.

As I describe later, for the cases missing coordinate information, I geocoded modified versions of their restaurant address strings to obtain the missing coordinates. This method queries the Google Maps API, requires a registered key, and, because high volumes of geocode queries can result in charges, should not be performed needlessly. Once I had obtained what I needed from the API, I exported the output from R so that I would not have to repeat the queries in the future. That file is also available on GitHub for your use, in case you would rather not register a key or accrue query charges.
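The export itself was nothing more than a one-liner along these lines (a minimal sketch; the file name matches the copy on GitHub, and lacking_online_coords is the geocoded data frame built later in this supplement):

write.csv(lacking_online_coords, "geocoded_coords.csv", row.names = FALSE)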

If you’d like to import the copy of the data that I uploaded to GitHub, the following code will do it.

geocoded_coords = read.csv("https://raw.githubusercontent.com/ninopierre/berkeley-restaurant-inspections/master/geocoded_coords.csv",  stringsAsFactors = FALSE, header=TRUE)

The first few rows of the imported data should look like this.

geocoded_coords  %>%
  head() %>%
  kable('html') %>%
  kable_styling(position="center") %>%
  scroll_box(width = "100%") 
Doing_Business_As Restaurant_Address lon lat
ASHBY SUPERMARKET 1099/1643 2948 MARTIN LUTHER KING JR WY BERKELEY, CA -122.2714 37.85440
SUSHI CALIFORNIA 2033 MARTIN LUTHER KING JR WY BERKELEY, CA -122.2728 37.87115
CITY LEE MARKET 2700 MARTIN LUTHER KING JR WAY BERKELEY, CA -122.2720 37.85960
#1 GAS 1099/1616 1900 MARTIN LUTHER KING JR WY BERKELEY, CA -122.2736 37.87306
FAT APPLE’S INC. 1346 MARTIN LUTHER KING JR WY BERKELEY, CA -122.2743 37.88157
CHLOE CAFE 2080 MARTIN LUTHER KING JR WAY BERKELEY, CA -122.2731 37.87075

Learning the extent of missing data

I counted how many items in the inspections data file were imported as NA.

sum(is.na(inspections))
## [1] 0
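For a per-column view (an extra check of mine, not strictly necessary given the total of 0 above), colSums reports the NA count of each of the 15 columns, and each should report 0:

colSums(is.na(inspections))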

While finding 0 entries imported as NA was a relief, I still planned to be thorough in my approach to data cleaning.

Removing repeated restaurant-address combinations

Because I wanted only one record per restaurant per address, I checked whether any restaurant-address combinations were repeated.

repeated_restaurant_address_combos = inspections %>%
  group_by(Doing_Business_As, Restaurant_Address) %>%
  summarize(count = n()) %>%
  filter(count>1)

repeated_restaurant_address_combos %>% 
  kable('html', caption="Restaurant-address combinations found to have multiple records") %>%
  kable_styling(position="center")
Restaurant-address combinations found to have multiple records
Doing_Business_As Restaurant_Address count
MI TIERRA FOODS #2 2082 SAN PABLO AVE BERKELEY, CA (37.868162, -122.291919) 3
SAFEWAY COMMUNITY MARKETS #2451 1850 SOLANO AVE BERKELEY, CA (37.891404, -122.278607) 4
SAFEWAY COMMUNITY MARKETS #2453 1550 SHATTUCK AVE BERKELEY, CA (37.878899, -122.269196) 4

Three restaurant-address combinations accounted for eleven data points instead of three data points.

I examined records with these restaurant-address combinations more closely, sorting the records by restaurant name, address, inspection date, and then inspection ID.

repeated_locations = inspections %>%
  filter(Doing_Business_As %in% repeated_restaurant_address_combos$Doing_Business_As) %>%
  arrange(Doing_Business_As, Restaurant_Address, Inspection_Date, Inspection_ID)

repeated_locations %>%
  kable('html') %>%
  kable_styling(full_width = T) %>%
  scroll_box(width = "100%", height = "500px") 
Inspection_ID Doing_Business_As Restaurant_Address Inspection_Date Major_Violation_Improper_Holding_Temperature Minor_Violation_Improper_Holding_Temperature Major_Violation_Inadequate_Cooking Minor_Violation_Inadequate_Cooking Major_Violation_Personal_Hygiene Minor_Violation_Personal_Hygiene Major_Violation_Contaminated_Equipment Minor_Violation_Contaminated_Equipment Major_Violation_Unsafe_Food_Source Minor_Violation_Unsafe_Food_Source InDbDate
FA0000280 MI TIERRA FOODS #2 2082 SAN PABLO AVE BERKELEY, CA (37.868162, -122.291919) 08/14/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
FA0000280 MI TIERRA FOODS #2 2082 SAN PABLO AVE BERKELEY, CA (37.868162, -122.291919) 10/26/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
FA0000280 MI TIERRA FOODS #2 2082 SAN PABLO AVE BERKELEY, CA (37.868162, -122.291919) 12/06/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
FA0001775 SAFEWAY COMMUNITY MARKETS #2451 1850 SOLANO AVE BERKELEY, CA (37.891404, -122.278607) 09/27/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
FA0001776 SAFEWAY COMMUNITY MARKETS #2451 1850 SOLANO AVE BERKELEY, CA (37.891404, -122.278607) 09/27/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
FA0001777 SAFEWAY COMMUNITY MARKETS #2451 1850 SOLANO AVE BERKELEY, CA (37.891404, -122.278607) 09/27/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
FA0001778 SAFEWAY COMMUNITY MARKETS #2451 1850 SOLANO AVE BERKELEY, CA (37.891404, -122.278607) 09/27/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
FA0001779 SAFEWAY COMMUNITY MARKETS #2453 1550 SHATTUCK AVE BERKELEY, CA (37.878899, -122.269196) 09/28/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
FA0001780 SAFEWAY COMMUNITY MARKETS #2453 1550 SHATTUCK AVE BERKELEY, CA (37.878899, -122.269196) 09/28/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
FA0001781 SAFEWAY COMMUNITY MARKETS #2453 1550 SHATTUCK AVE BERKELEY, CA (37.878899, -122.269196) 09/28/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
FA0001782 SAFEWAY COMMUNITY MARKETS #2453 1550 SHATTUCK AVE BERKELEY, CA (37.878899, -122.269196) 09/28/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM

Notes and plans in light of this output

  • MI TIERRA FOODS #2: All three inspection records at restaurants named MI TIERRA FOODS #2 had the same address and scores; however, they had different inspection dates. I planned to drop records with this restaurant-address combination except for the most recent record (2 records dropped).

  • SAFEWAY COMMUNITY MARKETS #2451: All four inspection records at establishments named SAFEWAY COMMUNITY MARKETS #2451 had the same address, date, and scores. I planned to drop all but one record with this restaurant-address combination (3 records dropped).

  • SAFEWAY COMMUNITY MARKETS #2453: All four inspection records at establishments named SAFEWAY COMMUNITY MARKETS #2453 had the same address, date, and scores. I planned to drop all but one record with this restaurant-address combination (3 records dropped).

The following code results in a data set that counts every restaurant-address location only once.

inspections_uniquelocations = inspections %>%
  arrange(Doing_Business_As, Restaurant_Address, Inspection_Date, Inspection_ID) %>%
  group_by(Doing_Business_As, Restaurant_Address) %>%
  filter(row_number()==n()) %>% # keep only the last record within each restaurant-address group
  ungroup()
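One caveat of my own about the sort above: Inspection_Date was imported as a character string in MM/DD/YYYY format, so the alphabetical sort is chronological only when the dates being compared fall within the same year. That happens to hold for the three repeated combinations here (all of their inspection dates fall in 2018), but a more defensive variant would parse the dates first; a sketch:

inspections_uniquelocations = inspections %>%
  mutate(Inspection_Date_parsed = as.Date(Inspection_Date, format = "%m/%d/%Y")) %>%
  arrange(Doing_Business_As, Restaurant_Address, Inspection_Date_parsed, Inspection_ID) %>%
  group_by(Doing_Business_As, Restaurant_Address) %>%
  filter(row_number()==n()) %>%
  ungroup() %>%
  select(-Inspection_Date_parsed)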

I then examined how the frequencies of restaurant-address combinations changed from the original data set to the new data set.

table((inspections %>%
  group_by(Doing_Business_As, Restaurant_Address) %>%
  summarize(count = n()))$count) %>% 
  kable('html', caption="A frequency table of restaurant-address combination counts in the original data set: 740 restaurant-address combinations appeared once; 3 combinations appeared multiple times") %>%
  kable_styling(position="center")
A frequency table of restaurant-address combination counts in the original data set: 740 restaurant-address combinations appeared once; 3 combinations appeared multiple times
Var1 Freq
1 740
3 1
4 2
table((inspections_uniquelocations %>%
  group_by(Doing_Business_As, Restaurant_Address) %>%
  summarize(count = n()))$count) %>% 
  kable('html', caption="A frequency table of restaurant-address combination counts in the new data set: 0 restaurant-address combinations appeared more than once") %>%
  kable_styling(position="center")
A frequency table of restaurant-address combination counts in the new data set: 0 restaurant-address combinations appeared more than once
Var1 Freq
1 743

The distribution of counts for restaurant-address combinations changed to what I sought: one per restaurant per address. Also, the number of records dropped from 751 to 743 as I expected.
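A direct row count confirms the arithmetic of 751 - (2 + 3 + 3) = 743.

nrow(inspections)
## [1] 751
nrow(inspections_uniquelocations)
## [1] 743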

Acquiring coordinates

Parsing coordinates from Restaurant_Address strings

Most records appeared to have coordinates embedded in the Restaurant_Address column.

inspections %>%
  select(Doing_Business_As, Restaurant_Address) %>%
  head(8) %>%
  kable('html') %>%
  kable_styling(position="center") %>%
  scroll_box(width = "100%") 
Doing_Business_As Restaurant_Address
LIK LIQUORS 2495 SACRAMENTO ST BERKELEY, CA (37.862346, -122.28105)
FAT APPLE’S INC. 1346 M L KING JR WY BERKELEY, CA
SEABREEZE MARKET & DELI 598 UNIVERSITY AVE BERKELEY, CA (37.866427, -122.305095)
ALCHEMY COOPERATIVE INC. 1741 ALCATRAZ AVE BERKELEY, CA (37.848597, -122.272524)
CHAAT CAFE 1902 UNIVERSITY AVE BERKELEY, CA (37.871509, -122.272955)
BERKELEY BOWL KITCHEN 2020 OREGON ST BERKELEY, CA (37.857544, -122.267497)
SHATTUCK MARKET 2441 SHATTUCK AVE BERKELEY, CA (37.865051, -122.267421)
BREADS OF INDIA 2448 SACRAMENTO ST BERKELEY, CA (37.862293, -122.281218)

Since the entire string wouldn’t be helpful for simple point-mapping, I used the pattern I observed in the Restaurant_Address strings to parse each address’s latitude and longitude where coordinates were available.

coords_pattern = "(\\d+\\.\\d+),\\s(-\\d+\\.\\d+)" # \\. matches a literal decimal point
inspections_uniquelocations = inspections_uniquelocations %>%
  mutate(lat = str_match(Restaurant_Address, coords_pattern)[,2] %>% as.numeric(),
         lon = str_match(Restaurant_Address, coords_pattern)[,3] %>% as.numeric()
         )
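To see the pattern in action on a single address string (a demonstration of my own): str_match returns a one-row matrix containing the full match followed by the two captured groups, here "37.862346" and "-122.28105".

str_match("2495 SACRAMENTO ST BERKELEY, CA (37.862346, -122.28105)", coords_pattern)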

Notice the new coordinate columns below.

inspections_uniquelocations %>%
  select(Doing_Business_As, Restaurant_Address, lat, lon) %>%
  head() %>% 
  kable('html', caption=NULL) %>%
  kable_styling(position="center")
Doing_Business_As Restaurant_Address lat lon
#1 GAS 1099/1616 1900 M L KING JR WY BERKELEY, CA NA NA
#POKI 3075 TELEGRAPH AVE BERKELEY, CA (37.854242, -122.259905) 37.85424 -122.2599
24 HOUR FITNESS #583 2072 ADDISON ST BERKELEY, CA (37.871082, -122.268954) 37.87108 -122.2690
24 HOUR FITNESS #704 1775 SOLANO AVE BERKELEY, CA (37.891296, -122.280682) 37.89130 -122.2807
2900 COLLEGE AVE CAFFE LLC 2900 COLLEGE AVE BERKELEY, CA (37.858356, -122.253179) 37.85836 -122.2532
7-ELEVEN #16192 1099/1615 2887 COLLEGE AVE BERKELEY, CA (37.858756, -122.253219) 37.85876 -122.2532

Coordinates were extracted where they were available, but the (a.b, -c.d) pattern clearly was not detected in all records.

I determined that the data set currently lacked coordinates for 12 locations (two pieces of missing data each).

sum(is.na(inspections_uniquelocations))
## [1] 24
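Equivalently, counting the missing latitudes alone gives the number of affected locations directly.

sum(is.na(inspections_uniquelocations$lat))
## [1] 12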

I examined the restaurant address strings for records where this pattern was not detected.

inspections_uniquelocations %>%
  filter(is.na(lat)) %>%
  select(Doing_Business_As, Restaurant_Address) %>% 
  kable('html', caption="Records where the `(a.b, -c.d)` pattern was not detected in the `Restaurant_Address` string") %>%
  kable_styling(position="center")
Records where the (a.b, -c.d) pattern was not detected in the Restaurant_Address string
Doing_Business_As Restaurant_Address
#1 GAS 1099/1616 1900 M L KING JR WY BERKELEY, CA
ASHBY SUPERMARKET 1099/1643 2948 M L KING JR WY BERKELEY, CA
BAIANO PIZZERIA 1916 M L KING JR WAY BERKELEY, CA
CAFE NOSDOS 1930 M L KING JR WAY BERKELEY, CA
CHLOE CAFE 2080 M L KING JR WAY BERKELEY, CA
CITY LEE MARKET 2700 M L KING JR WAY BERKELEY, CA
FAT APPLE’S INC. 1346 M L KING JR WY BERKELEY, CA
GOLD LEAF CAFE 1947 M L KING JR WAY BERKELEY, CA
NEIGHBORS MARKET 1343 M L KING JR WAY BERKELEY, CA
SECRET SCOOP 1922 M L KING JR WAY BERKELEY, CA
SUSHI CALIFORNIA 2033 M L KING JR WY BERKELEY, CA
XTRA OIL CO/CHEVRON/THE ALAM 1201 THE BERKELEY, CA

This printout showed me that there was no second coordinate pattern that my regex failed to capture; the parsing failures occurred

  • not because the addresses used multiple coordinate formats but

  • because the coordinate information was simply absent.

Also, it was very curious that so many of the records that lacked coordinate information were located on Martin Luther King, Jr. Way.

I decided to fill in the NAs by geocoding; however, before I did so, for future reference, I created a variable that would distinguish records by whether the original data set lacked coordinate information.

inspections_uniquelocations = inspections_uniquelocations %>%
  mutate(lacked_online_coords = is.na(lat))
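A quick tabulation of the new variable should report 731 FALSE values and 12 TRUE values, consistent with the 743 records and 12 missing-coordinate locations found above.

table(inspections_uniquelocations$lacked_online_coords)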

Acquiring missing coordinates. Original approach: Geocoding

Note: readers following along in R who would like to skip registering a Google Maps API key and avoid possible fees should use the code in the “alternate approach” section that follows instead of the code in this section.

To acquire the missing coordinates, I used the mutate_geocode function from the ggmap package. This method queries the Google Maps API, requires a registered key, and can result in charges if query volume is large.

I first created a data frame that had

  • rows only for restaurants that were missing coordinate information (geocoding for all records is unnecessary and could result in increased charges)

  • columns for only the restaurant name and the restaurant address, the latter being the input for the geocoding.

lacking_online_coords = inspections_uniquelocations %>%
  filter(lacked_online_coords==TRUE) %>%
  select(Doing_Business_As, Restaurant_Address) %>%
  as.data.frame()

Let’s inspect that data frame.

lacking_online_coords %>%
  rmarkdown::paged_table()

Since Google Maps might not recognize the locations described by those Restaurant_Address strings as written (in particular, “1201 THE\nBERKELEY” in the 12th record actually refers to “1201 The Alameda, Berkeley”), I performed some string manipulation for clarity.

lacking_online_coords$Restaurant_Address = lacking_online_coords$Restaurant_Address %>% 
  str_replace_all("M L KING", "MARTIN LUTHER KING") %>%
  str_replace_all("\\n", " ") %>%
  str_replace_all("1201 THE", "1201 THE ALAMEDA")

Here’s what the strings looked like afterward.

lacking_online_coords%>%
  rmarkdown::paged_table()

Now armed with strings more similar to what I’d type into a Google Maps search, I proceeded with the geocode. Then I filled in the coordinate NAs in the main inspections tibble with the newly acquired coordinates, which involved a left join.

register_google(key=my_key) # my_key is a string containing your own registered Google Maps API key

lacking_online_coords = lacking_online_coords %>% 
  mutate_geocode(Restaurant_Address) # geocode yields columns lat and lon

inspections_allcoords = left_join(x=inspections_uniquelocations, 
                                  y=lacking_online_coords,
                                  by = "Doing_Business_As"
                                  ) %>%
  mutate(lat = coalesce(lat.x, lat.y),
         lon = coalesce(lon.x, lon.y)) %>%
  select(-lat.x, -lon.x, -lat.y, -lon.y, -Restaurant_Address.y) %>% # dropped extra variables generated by the join
  rename(Restaurant_Address = Restaurant_Address.x)

Acquiring missing coordinates. Alternate approach to sidestep key registration and possible fees: Joining with saved geocoded data

Earlier in the supplement, I provided code to import the output of the approach above. Here, I left-join those saved coordinates onto the inspections data.

inspections_allcoords = left_join(x=inspections_uniquelocations, 
                                  y=geocoded_coords, 
                                  by = "Doing_Business_As"
                                  ) %>%
  mutate(lat = coalesce(lat.x,lat.y),
         lon = coalesce(lon.x,lon.y)) %>%
  select(-lat.x, -lon.x, -lat.y, -lon.y, -Restaurant_Address.y) %>% # dropped extra variables generated by the join
  rename(Restaurant_Address = Restaurant_Address.x) # restore the address column's name, as in the original approach

Inspecting final coordinate data

At this point, the data set contained no more missing coordinate data.

sum(is.na(inspections_allcoords))
## [1] 0

At this point, the first six rows looked like this (with newly complete coordinate data in the rightmost columns).

inspections_allcoords %>% 
  head() %>%
  rmarkdown::paged_table()

I mapped the points as a quality check.

attention_palette = colorFactor(c('dodgerblue', 'maroon'),
                                domain = c(FALSE, TRUE))

leaflet() %>%
  addProviderTiles("CartoDB") %>%
  addCircleMarkers(data = inspections_allcoords,
                   lng = ~lon,
                   lat = ~lat,
                   label = ~Doing_Business_As,
                   radius = 2,
                   color = ~attention_palette(lacked_online_coords),
                   fillOpacity = 0.2) %>%
  addLegend(data = inspections_allcoords,
            pal = attention_palette,
            values = c(FALSE, TRUE),
            title = "Whether a geocode was necessary"
            ) %>%
  addResetMapButton()

The locations of the points made sense: increased density of points could be seen along the highly commercial segments of San Pablo, Shattuck, Solano, Telegraph, and University Avenues.

In the map above, I used color to distinguish the locations that lacked coordinate information in the original file. All of those locations, even the single location whose street address was not on Martin Luther King, Jr. Way, curiously fell along the same line.