I have witnessed quite a lot of restaurant turnover in my 8+ years as a Berkeley resident.

  • At the end of last year, I selected a particular Shattuck Avenue restaurant for one of my birthday week gatherings. It was only as the last of us were leaving the restaurant that I noticed a faded 8.5" x 11" posting by the door announcing that that evening was the restaurant’s last day in business (!). Beyond the faded sheet of paper that I didn’t notice until after the meal, there were no other clues that this was the restaurant’s final meal service; in retrospect, everything seemed so business-as-usual about the restaurant’s last supper. I just happened to place my reservation for the last possible evening, I suppose.

  • I spend much of my time near UC Berkeley, where the snacking landscape has transitioned dramatically: boba shops have filled in where the froyo shops of my undergraduate years have folded. The extent of the boba shop takeover is nothing short of meme-tastic. Please take the following memes from the Facebook group UC Berkeley Memes for Edgy Teens as evidence that I’m not using the highly scientific term “meme-tastic” loosely. At all. In the slightest.

(Also, the five memes take a bit to scroll through so, in case you simply are in no mood for memes, allow me to point out that you can quickly navigate through this page with the bar on the left.)

Despite Berkeley’s landmark excise tax on distributors of sugar-sweetened beverages…

October 2018 | >1.5K reacts | 884+ comments | 10+ shares | Source: UC Berkeley Memes for Edgy Teens

September 2018 | >3.7K reacts | >1.3K comments | 33+ shares | Source: UC Berkeley Memes for Edgy Teens

April 2017 | >1.4K reacts | >1.3K comments | 3+ shares | Source: UC Berkeley Memes for Edgy Teens

November 2018 | 710+ reacts | 265+ comments | 3+ shares | Source: UC Berkeley Memes for Edgy Teens

October 2018 | 765+ reacts | 649+ comments | 6+ shares | Source: UC Berkeley Memes for Edgy Teens

One might suppose from these memes that TZONE, Sweetheart, and Quickly may not be the height of boba craftsmanship around UC Berkeley.

The restaurant inspection data set

Just as the final meme speaks to the ideas of chaos/law, good/evil, and Berkeley-specific nourishment, so will I today! To an extent.

The City of Berkeley publishes the results of its most recent restaurant inspections through its Open Data portal. For this exercise, I will examine the file available on February 25, 2019. The file’s records/rows are the most recent inspections of hundreds of Berkeley restaurants and food vendors, and among its features/columns are inspection scores along criteria such as equipment contamination, food source, and personal hygiene. The most recent of the inspections that contributed to this file took place on February 21, 2019.

Allow me to offer a preview of this file’s contents.

library(dplyr)  # provides %>% and as_tibble(); tbl_df() is deprecated
inspections = read.csv(inspections_file, stringsAsFactors = FALSE) %>% 
  as_tibble()

Help yourself to the first five rows of the file–

inspections %>%
  head(5) %>%
  kable('html') %>%
  kable_styling(position="center") %>%
  scroll_box(width = "100%")
Inspection_ID Doing_Business_As Restaurant_Address Inspection_Date Major_Violation_Improper_Holding_Temperature Minor_Violation_Improper_Holding_Temperature Major_Violation_Inadequate_Cooking Minor_Violation_Inadequate_Cooking Major_Violation_Personal_Hygiene Minor_Violation_Personal_Hygiene Major_Violation_Contaminated_Equipment Minor_Violation_Contaminated_Equipment Major_Violation_Unsafe_Food_Source Minor_Violation_Unsafe_Food_Source InDbDate
FA0000793 LIK LIQUORS 2495 SACRAMENTO ST BERKELEY, CA (37.862346, -122.28105) 10/16/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
FA0000193 FAT APPLE’S INC. 1346 M L KING JR WY BERKELEY, CA 10/16/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
FA0000568 SEABREEZE MARKET & DELI 598 UNIVERSITY AVE BERKELEY, CA (37.866427, -122.305095) 11/29/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
FA0001366 ALCHEMY COOPERATIVE INC. 1741 ALCATRAZ AVE BERKELEY, CA (37.848597, -122.272524) 12/13/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM
FA0000353 CHAAT CAFE 1902 UNIVERSITY AVE BERKELEY, CA (37.871509, -122.272955) 10/24/2018 0 0 0 0 0 0 0 0 0 0 02/25/2019 05:00:02 AM

–as well as to an overview (a glimpse if you will) of the file.

inspections %>% 
  glimpse()
## Observations: 751
## Variables: 15
## $ Inspection_ID                                <chr> "FA0000793", "FA0...
## $ Doing_Business_As                            <chr> "LIK LIQUORS", "F...
## $ Restaurant_Address                           <chr> "2495 SACRAMENTO ...
## $ Inspection_Date                              <chr> "10/16/2018", "10...
## $ Major_Violation_Improper_Holding_Temperature <int> 0, 0, 0, 0, 0, 0,...
## $ Minor_Violation_Improper_Holding_Temperature <int> 0, 0, 0, 0, 0, 1,...
## $ Major_Violation_Inadequate_Cooking           <int> 0, 0, 0, 0, 0, 0,...
## $ Minor_Violation_Inadequate_Cooking           <int> 0, 0, 0, 0, 0, 0,...
## $ Major_Violation_Personal_Hygiene             <int> 0, 0, 0, 0, 0, 0,...
## $ Minor_Violation_Personal_Hygiene             <int> 0, 0, 0, 0, 0, 0,...
## $ Major_Violation_Contaminated_Equipment       <int> 0, 0, 0, 0, 0, 0,...
## $ Minor_Violation_Contaminated_Equipment       <int> 0, 0, 0, 0, 0, 0,...
## $ Major_Violation_Unsafe_Food_Source           <int> 0, 0, 0, 0, 0, 0,...
## $ Minor_Violation_Unsafe_Food_Source           <int> 0, 0, 0, 0, 0, 0,...
## $ InDbDate                                     <chr> "02/25/2019 05:00...

The glimpse output indicates 751 observations, but I recognize only 743 observations in my restaurant-level visualizations. As I note in the forthcoming data cleaning supplement, I dropped eight rows: three restaurant-address combinations are repeated across 11 rows, and I decided to keep only the record of the most recent inspection for each restaurant-address combination, bringing the number of restaurant observations from 751 to 743.
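The actual deduplication lives in the supplement, so what follows is only a minimal sketch of one way to keep the most recent inspection per restaurant-address combination in base R. The toy rows are invented and `keep_latest_inspection` is a hypothetical helper; only the column names are taken from the file.

```r
# Invented toy rows that reuse the file's column names.
toy <- data.frame(
  Doing_Business_As  = c("CAFE A", "CAFE A", "CAFE B", "CAFE C", "CAFE C", "CAFE C"),
  Restaurant_Address = c("1 MAIN ST", "1 MAIN ST", "2 OAK ST",
                         "3 ELM ST", "3 ELM ST", "3 ELM ST"),
  Inspection_Date    = c("10/16/2018", "01/05/2019", "11/29/2018",
                         "02/01/2019", "12/13/2018", "09/30/2018"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep one row per restaurant-address pair, the latest inspection.
keep_latest_inspection <- function(df) {
  # Inspection_Date is stored as text (mm/dd/yyyy), so parse it before sorting.
  parsed <- as.Date(df$Inspection_Date, format = "%m/%d/%Y")
  # Sort by name, then address, then newest inspection first.
  df <- df[order(df$Doing_Business_As, df$Restaurant_Address, -as.numeric(parsed)), ]
  # After sorting, the first row of each pair is its most recent inspection.
  df[!duplicated(df[c("Doing_Business_As", "Restaurant_Address")]), ]
}

deduped <- keep_latest_inspection(toy)
nrow(toy) - nrow(deduped)  # number of dropped rows
```

On the real file the same idea would collapse the 11 repeated rows into 3, which accounts for the eight dropped rows.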

Here are the locations of all 743 inspected restaurants; feel free to zoom around and pan about. Hovering over a circle marker reveals that point’s restaurant name.

#see supplement for data cleaning/prep

leaflet() %>%
  addProviderTiles("CartoDB") %>%
  addCircleMarkers(data = inspections_allcoords,
                   lng = ~lon,
                   lat = ~lat,
                   label = ~Doing_Business_As,
                   radius = 2,
                   color = "gold",
                   fillOpacity = 0.1) %>%
  addResetMapButton()

The locations of the points made sense: increased density of points can be seen along the highly commercial segments of San Pablo, Shattuck, Solano, Telegraph, and University Avenues.

When I zoom into the main UC Berkeley campus area (between the areas labeled “Northside” and “Southside”), I notice that campus eateries are not represented in the file obtained from the City of Berkeley Open Data portal. Restaurants and other food vendors affiliated with UC Berkeley may not appear in this city-produced data set because they seem to be regulated by a non-city entity, the Office of Environment, Health, & Safety at the University of California, Berkeley.

The University restaurant inspection data are difficult to reconcile with the city restaurant inspection data; I will thus restrict this exercise to only the 743 vendors inspected by the City of Berkeley.

Data cleaning

My data cleaning saga will be documented in another RPubs page that will be found in this collection.

Status: coming soon! So stay tuned if thou so chooseth. Documentation in that forthcoming supplementary page awaits anyone interested in that level of gory detail in R.

For now, however, let’s talk Berkeley restaurant inspection strikes.

Types and severity of restaurant inspection strikes

As we gathered from our head and glimpse previews above of the city restaurant inspection data, inspectors recorded whether Berkeley restaurants were found in minor as well as major violation of five particular offenses:

  • Contaminated equipment

  • Improper holding temperature

  • Inadequate cooking

  • Poor personal hygiene

  • Unsafe food source.

Let’s visualize how restaurant violations varied across type and severity.

#see supplement for data cleaning/prep

type_and_severity_counts = by_violation_type_and_severity %>%
  filter(type_and_severity_violation_observed==1) %>%
  count(violation_type, violation_severity) %>%
  complete(violation_type, violation_severity, fill=list(n=0))

type_and_severity_counts %>%
  ggplot(aes(x=violation_type, y=n, fill=violation_severity, label=violation_type)) +
  geom_bar(position='stack', stat='identity') +
  scale_fill_manual(values = c('Major violation' = 'maroon', 'Minor violation' = 'tomato')) +
  geom_text(aes(violation_type, 0.7), hjust = 0,
              size = 5) +
  geom_text(data=type_and_severity_counts[type_and_severity_counts$n != 0,], 
            aes(label=n, y= c(44.5, 81.5,72.5))
            ) +
  scale_y_continuous(breaks=seq(0,100, 10)) +
  coord_flip() +
  scale_x_discrete(limits = rev(levels(by_violation_type_and_severity$violation_type))) +
  labs(x = "Type of violation", fill = "Severity of violation", y= "Number of recorded violations", 
       title = "Restaurant inspection violations across violation type and severity", 
       subtitle = "At Berkeley restaurants' most recent health inspections through February 21, 2019, \nviolations of only two types were recorded.",
       caption = "Data source: City of Berkeley Open Data, 2019") +
  theme(legend.position = "bottom",
        axis.text.y=element_blank(),
        axis.ticks = element_blank())

Of the 135 violations recorded at these inspections of 743 restaurants, 49 were for contaminated equipment and 86 were for improper holding temperature.
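The long table `by_violation_type_and_severity` is built in the supplement. As a rough illustration only, assuming the wide `Major_Violation_*` and `Minor_Violation_*` columns shown in the glimpse output, a reshape-and-tally in base R might look like the sketch below. The three toy rows are invented, and `count_violations` is a hypothetical helper rather than the supplement's code.

```r
# Invented toy rows; only the column-name pattern is taken from the file.
toy <- data.frame(
  Major_Violation_Improper_Holding_Temperature = c(0L, 1L, 0L),
  Minor_Violation_Improper_Holding_Temperature = c(1L, 0L, 1L),
  Major_Violation_Contaminated_Equipment       = c(0L, 0L, 0L),
  Minor_Violation_Contaminated_Equipment       = c(0L, 1L, 0L)
)

# Hypothetical helper: one row per (type, severity) with the number of recorded violations.
count_violations <- function(df) {
  violation_cols <- grep("^(Major|Minor)_Violation_", names(df), value = TRUE)
  data.frame(
    violation_type     = sub("^(Major|Minor)_Violation_", "", violation_cols),
    violation_severity = sub("_Violation_.*$", "", violation_cols),
    n                  = unname(colSums(df[violation_cols])),
    stringsAsFactors   = FALSE
  )
}

counts <- count_violations(toy)
counts

# Totals per violation type, summing over severity.
totals <- tapply(counts$n, counts$violation_type, sum)
totals
```

Run against the real columns, the per-type totals would be the 49 and 86 reported above, summing to 135.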

Distribution of violation count

Restaurants need not incur violations at their inspections (thankfully), so let’s also visually compare the number of restaurants that incurred no violations with the numbers that incurred one or more.

#see supplement for data cleaning/prep

violation_count_frequencies %>%
  ggplot(aes(x=violation_count, y=violation_count_frequency, label= violation_count_frequency)) +
  geom_col(fill='gold', width=0.6) +
  labs(x = "Number of violations", y= "Number of restaurants", 
       caption = "Data source: City of Berkeley Open Data, 2019") +
  geom_text(nudge_y = 25)  + 
  theme(axis.ticks = element_blank()) 

Recall from the final meme above that a memer selected three boba shops to represent the chaotic alignment (Asha Tea House, Quickly, and U-Cha) and three to represent the evil alignment (Quickly, Sweetheart Cafe & Tea, and TZONE). Recall further that, in reference to an Ariana Grande single, another of the memes above compared Sweetheart Cafe & Tea to the sort of ex that demonstrates the painful dimensions of dating and relationships. I wonder whether any of these five boba shops committed violations at their most recent restaurant inspections; are any of them “chaotic”, “evil”, or “painful” enough to fall into that right bar that represents the 27 vendors that committed multiple violations?

#see supplement for data cleaning/prep

by_violation_type_and_severity %>% 
  group_by(Doing_Business_As, Restaurant_Address) %>% 
  summarize(violation_count = sum(type_and_severity_violation_observed) %>% 
              as.factor()
            ) %>% 
  filter(str_detect(Doing_Business_As, "ASHA TEA|HEART CAFE & TEA|QUICKLY|TZONE|U-CHA")) %>%
  kable('html') %>%
  kable_styling(position="center") %>%
  scroll_box(width = "100%") 
Doing_Business_As Restaurant_Address violation_count
ASHA TEA HOUSE 2086 UNIVERSITY AVE BERKELEY, CA (37.872066, -122.2687) 0
CAL QUICKLY 2505-G HEARST AVE BERKELEY, CA (37.875072, -122.260226) 0
SWEET HEART CAFE & TEA 2523 DURANT AVE BERKELEY, CA (37.867878, -122.258483) 0
TZONE 2328 TELEGRAPH AVE BERKELEY, CA (37.868157, -122.259068) 0
U-CHA 2199 BANCROFT WAY BERKELEY, CA (37.867822, -122.266003) 0

Oh okay; so much for chaos and evil .-.

Variation in restaurant inspection data across regions of Berkeley

Since I was able to acquire coordinate information for each inspected restaurant (the process involved the packages stringr and ggmap), I grew curious about whether inspection scores varied spatially across Berkeley. Choropleth maps could facilitate a visual approach to this investigation. I selected Berkeley city council districts as the polygons by which I’d compute summaries of the restaurant inspection data. Explain that choice, Niño-Pierre.

Selecting city council districts as choropleth map polygons

The Berkeley Open Data Portal offers several options for polygon sets that partition Berkeley. Below are links to current pages that display those polygons and that have links for acquiring the polygon data. They are ordered from the most polygons to the fewest:

  • Zoning Districts - not useful for this exercise; most blocks get their own polygons

  • Census Block Polygons 2010 - also not helpful; most blocks get their own polygons

  • Census Block Group Polygons 2010 - still too fine for aggregations to provide useful summary values

  • Zipcodes - a good number of polygons (9) for the number of records in the data set (743); perimeter-to-area ratio seems a bit high; borders seem a bit arbitrary compared to how regions of Berkeley come to mind for me, a local for 8+ years

  • Council Districts - a good number of polygons (8) for the number of records in the data set (743); polygons appear more compact than the zip code polygons; borders seem to fashion a more intuitive partitioning of Berkeley
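The compactness comparison above was made by eye. One way it could be quantified, offered here as an assumption rather than as part of the original analysis, is the Polsby-Popper score, 4πA/P², which equals 1 for a circle and approaches 0 for long, thin shapes. The sketch below uses a hypothetical helper, `polsby_popper`, on two invented shapes; real district polygons are in longitude/latitude and would need to be projected to a planar coordinate system first.

```r
# Polsby-Popper compactness: 4 * pi * area / perimeter^2.
# Hypothetical helper; x and y are the vertices of a simple polygon in planar units.
polsby_popper <- function(x, y) {
  # Index of each vertex's successor, wrapping back to the first vertex.
  nxt <- c(seq_along(x)[-1], 1)
  area <- abs(sum(x * y[nxt] - x[nxt] * y)) / 2            # shoelace formula
  perimeter <- sum(sqrt((x[nxt] - x)^2 + (y[nxt] - y)^2))  # sum of edge lengths
  4 * pi * area / perimeter^2
}

# Two invented shapes with the same area (1 square unit).
square <- polsby_popper(c(0, 1, 1, 0), c(0, 0, 1, 1))
sliver <- polsby_popper(c(0, 4, 4, 0), c(0, 0, 0.25, 0.25))
c(square = square, sliver = sliver)
```

The square scores higher than the sliver even though both cover the same area, which is the sense in which a lower perimeter-to-area ratio reads as "more compact."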

Onward with aggregation by district!

#see supplement for data cleaning/prep

leaflet() %>% 
  addProviderTiles("CartoDB") %>% 
  addPolygons(data =inspections_bydistrict_untransf,
              label = ~district,
              color='black'
              ) %>%
  addCircleMarkers(data=inspections_allcoords,
                   lng = ~lon,
                   lat = ~lat,
                   label = ~Doing_Business_As,
                   radius = 2,
                   color = "gold",
                   fillOpacity = 0.2) %>%
  addResetMapButton()

(A forthcoming supplement documents the data cleaning process, which includes handling of the point that fell far from any of the district polygons.)

Choropleth maps

I included a point layer above the polygon layer to help facilitate surface-level quality checks of the gray shade computed for the polygons.

Choropleth maps: General properties

Restaurant count

Restaurants per area

Violation count

Violations per area

Restaurants’ mean violation count

Restaurants that committed violations

Restaurants that committed violations per area

Restaurants that committed violations (%)

Restaurants that committed multiple violations

Restaurants that committed multiple violations per area

Restaurants that committed multiple violations (%)

Major violations

Major violations per area

Major violations (%)

Choropleth maps: Properties specific to violation types

Recall that

  • violations of only two types were observed

  • the only major violation type observed was improper holding temperature.

Minor holding temperature violation

Minor holding temperature violation (%)

Holding temperature violation

Holding temperature violation (%)

Contaminated equipment violation

Contaminated equipment violation (%)

Corresponding bar plots

While choropleth maps are helpful for building an intuition about geospatial trends, they can be misleading if consumers of these maps are unaware of certain biases.

When polygons vary greatly in area, raw counts of particular variables may be misleading. More events could be observed in some polygons merely because those polygons are bigger or have more people; thus it could be good form to somehow standardize raw counts, e.g. by dividing counts by variables such as area or population. I attempted to discourage this sort of bias in the visualizations above by offering multiple options per variable: raw counts, counts per area, and counts per number of district restaurants.
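A minimal sketch of that standardization, using invented district summaries (the district labels, counts, and areas below are illustrative assumptions, not values from the data set):

```r
# Invented district summaries; none of these numbers come from the inspection file.
districts <- data.frame(
  district    = c("A", "B", "C"),
  restaurants = c(200L, 50L, 10L),
  violations  = c(30L, 15L, 0L),
  area_sq_km  = c(2.0, 4.0, 8.0)
)

# Raw counts standardized two ways: by district area and by district restaurant count.
districts$violations_per_sq_km      <- districts$violations / districts$area_sq_km
districts$violations_per_restaurant <- districts$violations / districts$restaurants
districts
```

In this toy table, district A leads on raw violations and on violations per area, but district B leads on violations per restaurant, so the ranking depends on which denominator is used.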

Zone issues relating to the modifiable areal unit problem: aggregate measurements can vary greatly based on the polygons used for aggregation (e.g. election results, which aggregate vote data, can be greatly influenced by gerrymandering).

  • I engaged with this idea earlier when deciding whether I’d split Berkeley by city council districts or by zip codes. As I mentioned above, I felt that the zip code polygons appeared more arbitrary and appeared to have higher perimeter-to-area ratios than those of the city council district polygons, which I found to be more compact and more intuitive.

  • Neither set of polygons, however, is free of MAUP-related zone issues: both have some borders drawn along highly commercial streets where many of the restaurant points lie, and, even if a point could reasonably belong to either of two bordering polygons, generation of these choropleth maps required that each restaurant point be assigned to only one district polygon. Closer inspection reveals some arbitrary assignment of borderline restaurant points; the interactive map below shows that University Avenue (orange, dark green, and fuchsia) and Telegraph Avenue (brown and gray) restaurants experienced an undeniable amount of arbitrary district assignment.
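To make the zone issue concrete, here is a toy one-dimensional sketch: eight invented vendors along a single street, summarized under two different border placements. The positions, violation flags, and the helper `share_by_zone` are all hypothetical.

```r
# Invented positions (in km) of eight vendors along one street,
# with a flag for whether each had any violation.
position      <- c(0.2, 0.6, 0.9, 1.1, 1.4, 1.9, 2.3, 2.8)
has_violation <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE)

# Hypothetical helper: share of vendors with a violation in each zone,
# where the street is split into "west" and "east" at the given border.
share_by_zone <- function(border) {
  zone <- ifelse(position < border, "west", "east")
  tapply(has_violation, zone, mean)
}

border_at_1.0 <- share_by_zone(1.0)
border_at_1.5 <- share_by_zone(1.5)
border_at_1.0
border_at_1.5
```

The vendors and their violations are identical in both runs; only the border moves, yet the west zone's share falls from two thirds to two fifths while the east zone's share rises from one fifth to one third.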

#see supplement for data cleaning/prep

district_palette <- colorFactor(
  palette = 'Dark2',
  domain =inspections_bydistrict_untransf$district
)


leaflet() %>% 
  addProviderTiles("CartoDB") %>% 
  addPolygons(data =inspections_bydistrict_untransf,
              label = ~district,
              color = ~district_palette(district),
              weight = 2
              ) %>%
  addLegend(data =inspections_bydistrict_untransf, "topright", pal = district_palette, values = ~district,
    title = "District",
    opacity = 0.5
    ) %>%
  addCircleMarkers(data=inspections_allcoords %>% 
                     mutate(district = inspections_buffered_districts$district),
                   lng = ~lon,
                   lat = ~lat,
                   label = ~Doing_Business_As,
                   radius = 1.5,
                   color = ~district_palette(district),
                   fillOpacity = 1,
                   stroke=F) %>%
  addResetMapButton()

Also, larger polygons in a choropleth map are said to receive more attention from, or carry more “visual weight” with, their audiences. To mitigate this particular phenomenon’s effects, I offer below bar plots that correspond to the choropleth maps above. In the bar plots, each district occupies the same amount of visual space: a row (filled with a variable amount of color) in each horizontal bar plot.

Bar plots: General properties

Maroon is the fill color when bar plots are violation-oriented; otherwise the fill color is gold.

Restaurant count

Restaurants per area

Violation count

Violations per area

Restaurants’ mean violation count

Restaurants that committed violations

Restaurants that committed violations per area

Restaurants that committed violations (%)

Restaurants that committed multiple violations

Restaurants that committed multiple violations per area

Restaurants that committed multiple violations (%)

Major violations

Major violations per area

Major violations (%)

Bar plots: Properties specific to violation types

Recall that

  • violations of only two types were observed

  • the only major violation type observed was improper holding temperature.

Minor holding temperature violation

Minor holding temperature violation (%)

Holding temperature violation

Holding temperature violation (%)

Contaminated equipment violation

Contaminated equipment violation (%)

Remarks in light of visualizations

District 4 (Central Berkeley), which contains highly commercial segments of Shattuck Avenue and University Avenue, is unmatched in both restaurant density and restaurant count, boasting almost 200 vendors and sitting over 75 inspected vendors ahead of the district with the second-most restaurants.

In this analysis, that district, District 7 (UC Berkeley, Southside), has its actual restaurant count and density underestimated; its actual restaurant count and density draw

  • not only from the inspected restaurants immediately to the south of the main UC Berkeley campus (including the notable Durant Avenue concentration of vendors known as “the Asian Ghetto,” a reference to which an unblinking eye might notice among the memes on this page)

  • but also from the many vendors absent from this data set (absent from the city-produced file because these vendors are regulated by the University and not by the city).

District 7 (UC Berkeley, Southside) has a high restaurant-to-area ratio. This ratio is so high that, when districts are compared by their numbers of restaurants that committed multiple violations, District 7 seems relatively less concerning once this variable is standardized by district restaurant count. The change relative to other districts is not as dramatic when this variable is standardized by district area.

In contrast to the restaurant-rich Districts 4 (Central Berkeley) and 7 (UC Berkeley, Southside), the highly residential District 6 (Northeast Berkeley, Northside) has very few restaurants, which are all clustered by its southern border. One could make the argument that all of District 6’s restaurants could be meaningfully binned into the neighboring District 7 polygon. None of District 6’s restaurants were found in violation of any inspection criteria; thus, District 6 has bars of length 0 whenever any bar plots are violation-oriented (feature maroon bars). Despite District 6’s relatively small number of restaurants (all far from its centroid), it occupies a sizable portion of the visual space of the choropleth maps.

Even when standardizing for district area and district restaurant counts, Berkeley city council districts to the south tended to have both

  • more restaurants found in violation of particular inspection criteria (restaurant-level) and

  • greater violation counts (violation-level).

Such an observation could have health implications if city inspections have equal ability to detect violations across all regions of Berkeley (e.g. if inspections aren’t merely stricter or more observant in Berkeley city council districts to the south).

Final thoughts

I conclude by pondering a question.

Is it perfectly fine if a hospital distributes food but a city restaurant inspector finds the food-distributing hospital in violation of any restaurant inspection criteria?

inspections_allcoords %>%
  filter(str_detect(Doing_Business_As, "ALTA BATES MEDICAL") & 
           str_detect(Restaurant_Address, "2450 ASHBY AVE")) %>%
  select(Doing_Business_As, any_violations, total_violations, Minor_Violation_Improper_Holding_Temperature, Restaurant_Address, Inspection_Date) %>%
  kable('html') %>%
  kable_styling(full_width = T) %>%
  scroll_box(width = "100%") 
Doing_Business_As any_violations total_violations Minor_Violation_Improper_Holding_Temperature Restaurant_Address Inspection_Date
ALTA BATES MEDICAL CTR FOOD SERV TRUE 1 TRUE 2450 ASHBY AVE BERKELEY, CA (37.856488, -122.257329) 01/22/2019

I mean, if any customers fell ill from the Alta Bates Summit Medical Center’s food, they likely would be aware of a place to seek medical attention. Actually, I take that back; this could be problematic; I hear it now.

Unfortunately, this particular data point echoes the trend described earlier about Berkeley city council districts to the south having more food vendors in violation of restaurant inspection criteria.

A parting gift~

People who are regular or prospective customers of Berkeley food vendors might find this useful: an interactive map that indicates the number of violations recorded in this February 2019 file for restaurants inspected by the City of Berkeley.

#see supplement for data cleaning/prep

violation_palette <- colorFactor(c("mediumturquoise", "tomato", "maroon"),
                                 domain = 0:2)

leaflet() %>% 
  addProviderTiles("CartoDB") %>% 
  addCircleMarkers(data=inspections_allcoords %>% 
                     mutate(district = inspections_buffered_districts$district),
                   lng = ~lon,
                   lat = ~lat,
                   label = ~Doing_Business_As,
                   radius = 4,
                   color = ~violation_palette(total_violations),
                   fillOpacity = 0.65,
                   stroke=F) %>%
  addLegend(data=inspections_allcoords, "topright", pal = violation_palette, values = ~total_violations,
    title = "Number of violations",
    opacity = 1
    ) %>%
  addResetMapButton()