How I analyzed food truck inspection data

Story Idea

This project dates back to last summer when I just started my internship at WCVB. Mike Beaudet, one of the investigative reporters at the station, mentioned a hype in the mobile food industry and asked me to check the city’s sanitary inspection data. He wanted to know whether food trucks were as clean as brick-and-mortar restaurants.

Data Identification & Acquisition

Mike already had the list of all food trucks in Boston from Boston Inspectional Services (ISD), and I knew the city had an interactive database of food establishment sanitary inspection results, called “Mayor’s Food Court.”

At first, I hand-searched each truck’s name in Mayor’s Food Court and entered all of their previous inspection results in a spreadsheet. After spending dozens of hours hand-searching and entering data, I realized that it would never work. I wasn’t sure how much more time I would have to spend. And I was sure that the spreadsheet would be dirty as I often made typos.

Only after wasting dozens of hours, I was able to think of another option of asking ISD Public Records Accessor to give me the raw data, based on which Mayor’s Food Court was made. Lisa Timberlake, the Director of Publicity for ISD, directed us to Analyze Boston, the City of Boston’s open data hub. The dataset was already available online with other datasets, including crime and fire incident reports. I learned that it was the go-to website for city data.

Finally, I had two tables - the list of all food trucks and all food establishment inspection results - in my hands.

Data Manipulation

# Import all food establishment inspection result data obtained from Analyze Boston. When importing, prepare fields for future analysis.
all_food_establishments <- 
  read_csv("all_food_establishments.csv", 
           col_types = cols(expdttm = col_datetime(format = "%Y-%m-%d %H:%M:%S"), 
                            issdttm = col_datetime(format = "%Y-%m-%d %H:%M:%S"), 
                            licenseno = col_character(), 
                            property_id = col_character(), 
                            resultdttm = col_datetime(format = "%Y-%m-%d %H:%M:%S"), 
                            violdttm = col_datetime(format = "%Y-%m-%d %H:%M:%S")))

# Import the list of food trucks obtained from ISD Public Records Accessor.
food_truck_list <- 
  read_csv("/Volumes/YunChoi/WCVB/FoodTruck/food_truck_list.csv", 
           col_types = cols(`Expire Date` = col_datetime(format = "%m/%d/%Y %H:%M"), 
                            `Grace Expiration Date` = col_datetime(format = "%m/%d/%Y %H:%M"), 
                            `Issued Date` = col_datetime(format = "%m/%d/%Y %H:%M"), 
                            `Last Renewed` = col_datetime(format = "%m/%d/%Y %H:%M"), 
                            `License #` = col_character(), 
                            `Renewal Date` = col_datetime(format = "%m/%d/%Y %H:%M")))

# Select only columns necessary for analysis from both tables. Rename columns that has a white space in their name. Otherwise, you have to wrap their names with quotation marks whenever you use them.
food_truck_list <- food_truck_list %>%
  select(licenseno = `License #`, businessname = 'Business Name', 
         namefirst = 'First Name', namelast = 'Last Name', 
         issdttm = 'Issued Date', renewal_date = 'Renewal Date', 
         expdttm = 'Expire Date', last_renewed = 'Last Renewed')

all_food_establishments <- all_food_establishments %>%
  select(licenseno, businessname, namefirst, namelast, issdttm, expdttm, 
         licstatus, licensecat, result, resultdttm, violation, viollevel, 
         violdesc, violdttm, violstatus, statusdate, comments)

Both of the two tables had the “license number” and “business name” columns. License numbers were all consistent 6-digit numbers, but business names were inconsistent. Some business names were written in all capital letters, and others were not. So, I decided to use the license number column to filter the large inspection dataset.

# Leave food truck-related data only from all food truck inspection data by filtering rows with any of food truck license numbers. Assign it to "food_truck_data".
food_truck_data <- all_food_establishments %>%
  filter(licenseno %in% food_truck_list$licenseno)
# Save rows that don't have any of food truck license numbers as "non-food_truck_data" for future comparision between food trucks and brick-and-mortar restaurants.
non_food_truck_data <- all_food_establishments %>% 
  filter(licenseno %nin% food_truck_list$licenseno)

To compare food trucks and brick-and-mortar restaurants, I had to make sure that both datasets had the same timeframe. I assumed they would have different timeframes as food trucks were a relatively new type compared to regular restaurants. I selected the “restultdttm (inspection result date and time)” column and arranged it in chronological order to see the earliest inspection dates of each type.

food_truck_data %>%
  select(resultdttm) %>%
  arrange(resultdttm)

## # A tibble: 3,421 x 1
##    resultdttm         
##    <dttm>             
##  1 2007-04-01 12:24:00
##  2 2010-04-23 10:32:42
##  3 2010-04-23 10:32:42
##  4 2010-04-23 14:36:55
##  5 2010-05-28 09:20:55
##  6 2010-07-23 11:16:01
##  7 2010-07-23 11:16:01
##  8 2010-07-23 11:16:01
##  9 2010-08-06 12:52:36
## 10 2010-08-06 12:52:36
## # … with 3,411 more rows

non_food_truck_data %>%
  select(resultdttm) %>%
  arrange(resultdttm)

## # A tibble: 584,901 x 1
##    resultdttm         
##    <dttm>             
##  1 2006-04-04 08:49:18
##  2 2006-04-04 08:49:18
##  3 2006-04-04 08:49:18
##  4 2006-04-04 08:49:18
##  5 2006-04-04 08:49:18
##  6 2006-04-04 08:49:18
##  7 2006-04-04 08:49:18
##  8 2006-04-04 08:49:18
##  9 2006-04-04 08:49:18
## 10 2006-08-28 12:12:52
## # … with 584,891 more rows

As expected, two data sets had different timeframes. In the food truck data, the earliest inspection date was “2007-04-01”, but the second earliest inspection date was “2010-04-23”, which means food trucks hadn’t been regularly inspected until “2010-04-23” - unlike brick-and-mortar restaurants, which had been regularly inspected since “2006-04-04”.

I filtered “non_food_truck_data” again to filter out inspections conducted before “2010-04-23”, the date when food truck inspections actually began.

non_food_truck_data <- non_food_truck_data %>% 
  filter(violdttm >= "2010-04-23")

food_truck_data <- food_truck_data %>% 
  filter(violdttm >= "2010-04-23")

Data Analysis

With two datasets with the same timeframe, I started analyzing them. Below were distinct values in the “result” column, which were not that informative.

# See unique values in the "result" column.
sort(unique(all_food_establishments$result))

##  [1] "DATAERR"    "Fail"       "Failed"     "HE_Closure" "HE_Fail"   
##  [6] "HE_FailExt" "HE_FAILNOR" "HE_Filed"   "HE_Hearing" "HE_Hold"   
## [11] "HE_Misc"    "HE_NotReq"  "HE_OutBus"  "HE_Pass"    "HE_TSOP"   
## [16] "HE_VolClos" "Pass"       "PassViol"

I contacted Lisa again to understand the codes. Lisa emailed me back with explanations, but I can’t find the email as my company email account automatically erased old emails. But I do remember that only “HE_Pass” means a business passed inspection without any problems. Other codes specify what results a failed inspection brought. For example, “HE_TSOP” means a business was temporarily suspended because of a failed inspection.

Food trucks

I analyzed how many inspections had done and what results they had brought.

## [1] 758

## # A tibble: 7 x 3
##   result     count percent
##   <chr>      <int>   <dbl>
## 1 HE_Fail      291  38.4  
## 2 HE_Pass      273  36.0  
## 3 HE_Filed     121  16.0  
## 4 HE_TSOP       28   3.69 
## 5 HE_Hearing    23   3.03 
## 6 HE_FailExt    21   2.77 
## 7 HE_Closure     1   0.132

There had been 758 food truck inspections during the timeframe, and food truck inspections resulted in more failures than passes.

I filtered only failed inspections and printed it out. Then, I identified the most serious violators who had a long history of failing inspections with critical violations. Most serious violators included Indulge India, The Bacon Truck, Roxy’s Gourmet Grilled Cheese, Chubby Chickpea Mobile, TONY’S, Momogoose 3/4 and IQ Company.

To see their most current sanitary grade, I hand-searched them in Mayor’s Food Truck and learned that all of them were retaining an A grade. I couldn’t understand.

How come they have an A grade despite their hideous past records? If these trucks are getting an A, then who’s getting a B or C?

To better understand the letter grading system, I went to ISD website (https://www.boston.gov/departments/inspectional-services/how-we-grade-restaurants-and-food-trucks), perused how this system worked and found some serious loopholes as below:

Business owners can get as many inspections as they want until they get the grade they want if they pay for extra inspections.
Due to the nature of this “ordering” system, business owners know when inspectors are coming.
Food establishments that got an A grade don’t have to get inspected for a year.

The system was allowing a restaurant owner to pay to get an A grade and then evade inspections for one year.

Based on the abovementioned findings, I drafted questions for an interview with Dion Irish, ISD Commissioner, which Mike gad scheduled early on. Also, I sent the list of serious violators and their schedules to camera crew and told them to watch whether the chefs do something gross.

A few days before the interview, the camera crew brought amazing videos of food truck chefs that grossed us out. Mike and I decided to show the videos to Mr. Irish when reviewing my questions before the interview.

The interview went great. We were able to get the Commissioner’s response to the videos, and Mike asked great questions, better-phrasing questions I drafted.

While putting the story together (I drafted the storyline.), I realized a good story is born when everyone in a newsroom is doing their job. When a data person digs data well, a camera person brings a good video and a reporter report well. My story wouldn’t have been that strong if the camera crew didn’t bring the amazing videos or if Mike just read out the questions in the way I phrased.

Brick-and-mortar Restaurants

Early on, I knew the grading system was a joke, but I still thought I would be able to find something interesting from the comparison between food truck and brick-and-mortar restaurant inspection results.

## [1] 74785

## # A tibble: 13 x 3
##    result     count  percent
##    <chr>      <int>    <dbl>
##  1 HE_Fail    29929 40.0    
##  2 HE_Pass    24879 33.3    
##  3 HE_Filed   13569 18.1    
##  4 HE_FailExt  4041  5.40   
##  5 HE_Hearing  1746  2.33   
##  6 HE_TSOP      422  0.564  
##  7 HE_OutBus     87  0.116  
##  8 HE_Closure    82  0.110  
##  9 HE_VolClos    11  0.0147 
## 10 HE_FAILNOR     7  0.00936
## 11 HE_NotReq      6  0.00802
## 12 HE_Misc        5  0.00669
## 13 Failed         1  0.00134

There had been 74,785 brick-and-mortar restaurant inspections. Unlike my expectations, the regular restaurants had a higher failure rate than food trucks, which wouldn’t help bolster our story. But I noticed that food trucks were seven times more likely to be temporarily suspended. (3.6 percent of food truck inspections led to temporary suspensions while only 0.56 percent of regular restaurant inspections brought the same result.) So, I looked into why food trucks were more likely to be shut down.

Temporarily Suspended Food Trucks

I found 12 trucks had been temporarily suspended since Jan. 1, 2018. I searched the failed inspection results in Mayor’s Food Court and found the 12 trucks had had a common violation: Failure to provide water for workers to wash their hands. This was later used in the script.

food_truck_inspection %>%
  filter(result == "HE_TSOP", resultdttm >= "2018-01-01")

## # A tibble: 12 x 4
##    licenseno businessname                        result resultdttm         
##    <chr>     <chr>                               <chr>  <dttm>             
##  1 330456    BM CART1:BON ME YELLOW CART         HE_TS… 2018-03-16 11:12:27
##  2 78489     CAPTAIN MARDEN'S SEAFOODS           HE_TS… 2019-01-23 14:48:59
##  3 333496    Kebabish                            HE_TS… 2018-03-16 12:14:51
##  4 116142    Moyzilla                            HE_TS… 2019-06-07 13:39:45
##  5 128072    Pennypackers 2                      HE_TS… 2018-03-21 12:50:25
##  6 157072    Quebrada Baking Company             HE_TS… 2018-03-19 09:46:47
##  7 80632     Roxy's Gourmet Grilled Cheese No. 1 HE_TS… 2018-02-28 11:43:53
##  8 80632     Roxy's Gourmet Grilled Cheese No. 1 HE_TS… 2018-04-24 09:19:48
##  9 74860     Roxy's Gourmet Grilled Cheese No. 2 HE_TS… 2018-03-07 10:49:59
## 10 74860     Roxy's Gourmet Grilled Cheese No. 2 HE_TS… 2018-03-12 09:34:30
## 11 97097     The Bacon Truck                     HE_TS… 2019-01-29 14:50:19
## 12 87070     TONY'S                              HE_TS… 2018-12-10 11:25:52

Given that all of the 12 trucks were suspended for failing to provide water for workers to wash their hands, it was obvious that the two types of food establishments had different challenges. So, I also looked into the most common violations of each type.

Most common violations - Food Trucks

# Group by violation descriptions and their levels. Then, arrange the most common results in that order. 
food_truck_data %>%
  filter(!is.na(violstatus)) %>%
  group_by(violdesc, viollevel) %>%
  summarise(count = n()) %>%
  arrange(desc(count))

## # A tibble: 103 x 3
## # Groups:   violdesc [102]
##    violdesc                               viollevel count
##    <chr>                                  <chr>     <int>
##  1 PIC Performing Duties                  ***         242
##  2 Improper Maintenance of Walls/Ceilings *           240
##  3 Cold Holding                           ***         205
##  4 Non-Food Contact Surfaces              *           141
##  5 Non-Food Contact Surfaces Clean        *           139
##  6 Dishwashng Facilities                  *           121
##  7 Hot and Cold Water                     ***         117
##  8 Installed and Maintained               *           114
##  9 Food Contact Surfaces Design           *           105
## 10 Food Protection                        *           100
## # … with 93 more rows

Three of the 10 most common violations of food trucks were Level 3. “PIC Performing Duties” means an owner didn’t fulfill their operational duties - like posting food allergy warnings. “Cold Holding” means an operator failed to store ingredients at a temperature cold enough to prevent them from going bad. I also looked into the most common critical violations.

# Do the same thing. But this time, filter out Level 1 violations.  
food_truck_data %>%
  filter(!is.na(violstatus), viollevel != "*") %>%
  group_by(violdesc, viollevel) %>%
  summarise(count = n()) %>%
  arrange(desc(count))

## # A tibble: 44 x 3
## # Groups:   violdesc [43]
##    violdesc                            viollevel count
##    <chr>                               <chr>     <int>
##  1 PIC Performing Duties               ***         242
##  2 Cold Holding                        ***         205
##  3 Hot and Cold Water                  ***         117
##  4 Hot Holding                         ***          88
##  5 Food Contact Surfaces Clean         ***          70
##  6 Location  Accessible                ***          63
##  7 Adequate Handwashing/Where/When/How ***          53
##  8 PIC Knowledge                       ***          52
##  9 Food Contact Surfaces Clean         **           49
## 10 Separation/Sanitizer Criteria       ***          32
## # … with 34 more rows

Most common violations - Brick-and-mortar Restaurants

# Group by violation descriptions and their levels. Then, arrange the most common results in that order. 
non_food_truck_data %>%
  filter(!is.na(violstatus)) %>%
  group_by(violdesc, viollevel) %>%
  summarise(count = n()) %>%
  arrange(desc(count))

## # A tibble: 354 x 3
## # Groups:   violdesc [353]
##    violdesc                               viollevel count
##    <chr>                                  <chr>     <int>
##  1 Non-Food Contact Surfaces Clean        *         32680
##  2 Improper Maintenance of Walls/Ceilings *         31189
##  3 Non-Food Contact Surfaces              *         26934
##  4 Improper Maintenance of Floors         *         25995
##  5 Food Protection                        *         22468
##  6 Hand Cleaner  Drying  Tissue Signage   *         16423
##  7 Premises Maintained                    *         16310
##  8 Installed and Maintained               *         13781
##  9 Wiping Cloths  Clean  Sanitize         *         13021
## 10 Food Contact Surfaces Design           *         12725
## # … with 344 more rows

Unlike food trucks - three of 10 most common violations of which were Level 3 - all 10 most common violations of brick-and-mortar restaurants were Level 1. This proved my assumption that the two types of restaurants face different challenges.

# Do the same thing. But this time, filter out Level 1 violations.  
non_food_truck_data %>%
  filter(!is.na(violstatus), viollevel != "*") %>%
  group_by(violdesc, viollevel) %>%
  summarise(count = n()) %>%
  arrange(desc(count))

## # A tibble: 161 x 3
## # Groups:   violdesc [160]
##    violdesc                                    viollevel count
##    <chr>                                       <chr>     <int>
##  1 Food Contact Surfaces Clean                 **        12585
##  2 Insects  Rodents  Animals                   **        11330
##  3 Cold Holding                                ***       10524
##  4 PIC Performing Duties                       ***        9876
##  5 Food Contact Surfaces Clean                 ***        6876
##  6 PIC Knowledge                               ***        6428
##  7 Location  Accessible                        ***        6023
##  8 Separation  Segregation Cross Contamination ***        4622
##  9 Hot Holding                                 ***        4609
## 10 Consumer Advisories                         ***        3178
## # … with 151 more rows

I couldn’t believe my eyes, seeing “Insects Rodents Animals” was the second common violations for regular restaurants. And it was not even a Level 3 violation. It was a shocking finding, but I moved on as we were focusing on food trucks.

If I had an opportunity to do another story related to this topic, I would like to focus on restaurants where insects, rodents or other animals were found. I would like to learn which creature has been most commonly found, which neighborhoods have suffered the most and whether the numbers are increasing or decreasing.