This project dates back to last summer when I just started my internship at WCVB. Mike Beaudet, one of the investigative reporters at the station, mentioned a hype in the mobile food industry and asked me to check the city’s sanitary inspection data. He wanted to know whether food trucks were as clean as brick-and-mortar restaurants.
Mike already had the list of all food trucks in Boston from Boston Inspectional Services (ISD), and I knew the city had an interactive database of food establishment sanitary inspection results, called “Mayor’s Food Court.”
At first, I hand-searched each truck’s name in Mayor’s Food Court and entered all of their previous inspection results in a spreadsheet. After spending dozens of hours hand-searching and entering data, I realized that it would never work. I wasn’t sure how much more time I would have to spend. And I was sure that the spreadsheet would be dirty as I often made typos.
Only after wasting dozens of hours, I was able to think of another option of asking ISD Public Records Accessor to give me the raw data, based on which Mayor’s Food Court was made. Lisa Timberlake, the Director of Publicity for ISD, directed us to Analyze Boston, the City of Boston’s open data hub. The dataset was already available online with other datasets, including crime and fire incident reports. I learned that it was the go-to website for city data.
Finally, I had two tables - the list of all food trucks and all food establishment inspection results - in my hands.
# Import all food establishment inspection result data obtained from Analyze Boston. When importing, prepare fields for future analysis.
all_food_establishments <-
read_csv("all_food_establishments.csv",
col_types = cols(expdttm = col_datetime(format = "%Y-%m-%d %H:%M:%S"),
issdttm = col_datetime(format = "%Y-%m-%d %H:%M:%S"),
licenseno = col_character(),
property_id = col_character(),
resultdttm = col_datetime(format = "%Y-%m-%d %H:%M:%S"),
violdttm = col_datetime(format = "%Y-%m-%d %H:%M:%S")))
# Import the list of food trucks obtained from ISD Public Records Accessor.
food_truck_list <-
read_csv("/Volumes/YunChoi/WCVB/FoodTruck/food_truck_list.csv",
col_types = cols(`Expire Date` = col_datetime(format = "%m/%d/%Y %H:%M"),
`Grace Expiration Date` = col_datetime(format = "%m/%d/%Y %H:%M"),
`Issued Date` = col_datetime(format = "%m/%d/%Y %H:%M"),
`Last Renewed` = col_datetime(format = "%m/%d/%Y %H:%M"),
`License #` = col_character(),
`Renewal Date` = col_datetime(format = "%m/%d/%Y %H:%M")))
# Select only columns necessary for analysis from both tables. Rename columns that has a white space in their name. Otherwise, you have to wrap their names with quotation marks whenever you use them.
food_truck_list <- food_truck_list %>%
select(licenseno = `License #`, businessname = 'Business Name',
namefirst = 'First Name', namelast = 'Last Name',
issdttm = 'Issued Date', renewal_date = 'Renewal Date',
expdttm = 'Expire Date', last_renewed = 'Last Renewed')
all_food_establishments <- all_food_establishments %>%
select(licenseno, businessname, namefirst, namelast, issdttm, expdttm,
licstatus, licensecat, result, resultdttm, violation, viollevel,
violdesc, violdttm, violstatus, statusdate, comments)
Both of the two tables had the “license number” and “business name” columns. License numbers were all consistent 6-digit numbers, but business names were inconsistent. Some business names were written in all capital letters, and others were not. So, I decided to use the license number column to filter the large inspection dataset.
# Leave food truck-related data only from all food truck inspection data by filtering rows with any of food truck license numbers. Assign it to "food_truck_data".
food_truck_data <- all_food_establishments %>%
filter(licenseno %in% food_truck_list$licenseno)
# Save rows that don't have any of food truck license numbers as "non-food_truck_data" for future comparision between food trucks and brick-and-mortar restaurants.
non_food_truck_data <- all_food_establishments %>%
filter(licenseno %nin% food_truck_list$licenseno)
To compare food trucks and brick-and-mortar restaurants, I had to make sure that both datasets had the same timeframe. I assumed they would have different timeframes as food trucks were a relatively new type compared to regular restaurants. I selected the “restultdttm (inspection result date and time)” column and arranged it in chronological order to see the earliest inspection dates of each type.
food_truck_data %>%
select(resultdttm) %>%
arrange(resultdttm)
## # A tibble: 3,421 x 1
## resultdttm
## <dttm>
## 1 2007-04-01 12:24:00
## 2 2010-04-23 10:32:42
## 3 2010-04-23 10:32:42
## 4 2010-04-23 14:36:55
## 5 2010-05-28 09:20:55
## 6 2010-07-23 11:16:01
## 7 2010-07-23 11:16:01
## 8 2010-07-23 11:16:01
## 9 2010-08-06 12:52:36
## 10 2010-08-06 12:52:36
## # … with 3,411 more rows
non_food_truck_data %>%
select(resultdttm) %>%
arrange(resultdttm)
## # A tibble: 584,901 x 1
## resultdttm
## <dttm>
## 1 2006-04-04 08:49:18
## 2 2006-04-04 08:49:18
## 3 2006-04-04 08:49:18
## 4 2006-04-04 08:49:18
## 5 2006-04-04 08:49:18
## 6 2006-04-04 08:49:18
## 7 2006-04-04 08:49:18
## 8 2006-04-04 08:49:18
## 9 2006-04-04 08:49:18
## 10 2006-08-28 12:12:52
## # … with 584,891 more rows
As expected, two data sets had different timeframes. In the food truck data, the earliest inspection date was “2007-04-01”, but the second earliest inspection date was “2010-04-23”, which means food trucks hadn’t been regularly inspected until “2010-04-23” - unlike brick-and-mortar restaurants, which had been regularly inspected since “2006-04-04”.
I filtered “non_food_truck_data” again to filter out inspections conducted before “2010-04-23”, the date when food truck inspections actually began.
non_food_truck_data <- non_food_truck_data %>%
filter(violdttm >= "2010-04-23")
food_truck_data <- food_truck_data %>%
filter(violdttm >= "2010-04-23")
With two datasets with the same timeframe, I started analyzing them. Below were distinct values in the “result” column, which were not that informative.
# See unique values in the "result" column.
sort(unique(all_food_establishments$result))
## [1] "DATAERR" "Fail" "Failed" "HE_Closure" "HE_Fail"
## [6] "HE_FailExt" "HE_FAILNOR" "HE_Filed" "HE_Hearing" "HE_Hold"
## [11] "HE_Misc" "HE_NotReq" "HE_OutBus" "HE_Pass" "HE_TSOP"
## [16] "HE_VolClos" "Pass" "PassViol"
I contacted Lisa again to understand the codes. Lisa emailed me back with explanations, but I can’t find the email as my company email account automatically erased old emails. But I do remember that only “HE_Pass” means a business passed inspection without any problems. Other codes specify what results a failed inspection brought. For example, “HE_TSOP” means a business was temporarily suspended because of a failed inspection.
I analyzed how many inspections had done and what results they had brought.
## [1] 758
## # A tibble: 7 x 3
## result count percent
## <chr> <int> <dbl>
## 1 HE_Fail 291 38.4
## 2 HE_Pass 273 36.0
## 3 HE_Filed 121 16.0
## 4 HE_TSOP 28 3.69
## 5 HE_Hearing 23 3.03
## 6 HE_FailExt 21 2.77
## 7 HE_Closure 1 0.132
There had been 758 food truck inspections during the timeframe, and food truck inspections resulted in more failures than passes.
I filtered only failed inspections and printed it out. Then, I identified the most serious violators who had a long history of failing inspections with critical violations. Most serious violators included Indulge India, The Bacon Truck, Roxy’s Gourmet Grilled Cheese, Chubby Chickpea Mobile, TONY’S, Momogoose 3/4 and IQ Company.
To see their most current sanitary grade, I hand-searched them in Mayor’s Food Truck and learned that all of them were retaining an A grade. I couldn’t understand.
How come they have an A grade despite their hideous past records? If these trucks are getting an A, then who’s getting a B or C?
To better understand the letter grading system, I went to ISD website (https://www.boston.gov/departments/inspectional-services/how-we-grade-restaurants-and-food-trucks), perused how this system worked and found some serious loopholes as below:
The system was allowing a restaurant owner to pay to get an A grade and then evade inspections for one year.
Based on the abovementioned findings, I drafted questions for an interview with Dion Irish, ISD Commissioner, which Mike gad scheduled early on. Also, I sent the list of serious violators and their schedules to camera crew and told them to watch whether the chefs do something gross.
A few days before the interview, the camera crew brought amazing videos of food truck chefs that grossed us out. Mike and I decided to show the videos to Mr. Irish when reviewing my questions before the interview.
The interview went great. We were able to get the Commissioner’s response to the videos, and Mike asked great questions, better-phrasing questions I drafted.
While putting the story together (I drafted the storyline.), I realized a good story is born when everyone in a newsroom is doing their job. When a data person digs data well, a camera person brings a good video and a reporter report well. My story wouldn’t have been that strong if the camera crew didn’t bring the amazing videos or if Mike just read out the questions in the way I phrased.
Early on, I knew the grading system was a joke, but I still thought I would be able to find something interesting from the comparison between food truck and brick-and-mortar restaurant inspection results.
## [1] 74785
## # A tibble: 13 x 3
## result count percent
## <chr> <int> <dbl>
## 1 HE_Fail 29929 40.0
## 2 HE_Pass 24879 33.3
## 3 HE_Filed 13569 18.1
## 4 HE_FailExt 4041 5.40
## 5 HE_Hearing 1746 2.33
## 6 HE_TSOP 422 0.564
## 7 HE_OutBus 87 0.116
## 8 HE_Closure 82 0.110
## 9 HE_VolClos 11 0.0147
## 10 HE_FAILNOR 7 0.00936
## 11 HE_NotReq 6 0.00802
## 12 HE_Misc 5 0.00669
## 13 Failed 1 0.00134
There had been 74,785 brick-and-mortar restaurant inspections. Unlike my expectations, the regular restaurants had a higher failure rate than food trucks, which wouldn’t help bolster our story. But I noticed that food trucks were seven times more likely to be temporarily suspended. (3.6 percent of food truck inspections led to temporary suspensions while only 0.56 percent of regular restaurant inspections brought the same result.) So, I looked into why food trucks were more likely to be shut down.
I found 12 trucks had been temporarily suspended since Jan. 1, 2018. I searched the failed inspection results in Mayor’s Food Court and found the 12 trucks had had a common violation: Failure to provide water for workers to wash their hands. This was later used in the script.
food_truck_inspection %>%
filter(result == "HE_TSOP", resultdttm >= "2018-01-01")
## # A tibble: 12 x 4
## licenseno businessname result resultdttm
## <chr> <chr> <chr> <dttm>
## 1 330456 BM CART1:BON ME YELLOW CART HE_TS… 2018-03-16 11:12:27
## 2 78489 CAPTAIN MARDEN'S SEAFOODS HE_TS… 2019-01-23 14:48:59
## 3 333496 Kebabish HE_TS… 2018-03-16 12:14:51
## 4 116142 Moyzilla HE_TS… 2019-06-07 13:39:45
## 5 128072 Pennypackers 2 HE_TS… 2018-03-21 12:50:25
## 6 157072 Quebrada Baking Company HE_TS… 2018-03-19 09:46:47
## 7 80632 Roxy's Gourmet Grilled Cheese No. 1 HE_TS… 2018-02-28 11:43:53
## 8 80632 Roxy's Gourmet Grilled Cheese No. 1 HE_TS… 2018-04-24 09:19:48
## 9 74860 Roxy's Gourmet Grilled Cheese No. 2 HE_TS… 2018-03-07 10:49:59
## 10 74860 Roxy's Gourmet Grilled Cheese No. 2 HE_TS… 2018-03-12 09:34:30
## 11 97097 The Bacon Truck HE_TS… 2019-01-29 14:50:19
## 12 87070 TONY'S HE_TS… 2018-12-10 11:25:52
Given that all of the 12 trucks were suspended for failing to provide water for workers to wash their hands, it was obvious that the two types of food establishments had different challenges. So, I also looked into the most common violations of each type.
# Group by violation descriptions and their levels. Then, arrange the most common results in that order.
food_truck_data %>%
filter(!is.na(violstatus)) %>%
group_by(violdesc, viollevel) %>%
summarise(count = n()) %>%
arrange(desc(count))
## # A tibble: 103 x 3
## # Groups: violdesc [102]
## violdesc viollevel count
## <chr> <chr> <int>
## 1 PIC Performing Duties *** 242
## 2 Improper Maintenance of Walls/Ceilings * 240
## 3 Cold Holding *** 205
## 4 Non-Food Contact Surfaces * 141
## 5 Non-Food Contact Surfaces Clean * 139
## 6 Dishwashng Facilities * 121
## 7 Hot and Cold Water *** 117
## 8 Installed and Maintained * 114
## 9 Food Contact Surfaces Design * 105
## 10 Food Protection * 100
## # … with 93 more rows
Three of the 10 most common violations of food trucks were Level 3. “PIC Performing Duties” means an owner didn’t fulfill their operational duties - like posting food allergy warnings. “Cold Holding” means an operator failed to store ingredients at a temperature cold enough to prevent them from going bad. I also looked into the most common critical violations.
# Do the same thing. But this time, filter out Level 1 violations.
food_truck_data %>%
filter(!is.na(violstatus), viollevel != "*") %>%
group_by(violdesc, viollevel) %>%
summarise(count = n()) %>%
arrange(desc(count))
## # A tibble: 44 x 3
## # Groups: violdesc [43]
## violdesc viollevel count
## <chr> <chr> <int>
## 1 PIC Performing Duties *** 242
## 2 Cold Holding *** 205
## 3 Hot and Cold Water *** 117
## 4 Hot Holding *** 88
## 5 Food Contact Surfaces Clean *** 70
## 6 Location Accessible *** 63
## 7 Adequate Handwashing/Where/When/How *** 53
## 8 PIC Knowledge *** 52
## 9 Food Contact Surfaces Clean ** 49
## 10 Separation/Sanitizer Criteria *** 32
## # … with 34 more rows
# Group by violation descriptions and their levels. Then, arrange the most common results in that order.
non_food_truck_data %>%
filter(!is.na(violstatus)) %>%
group_by(violdesc, viollevel) %>%
summarise(count = n()) %>%
arrange(desc(count))
## # A tibble: 354 x 3
## # Groups: violdesc [353]
## violdesc viollevel count
## <chr> <chr> <int>
## 1 Non-Food Contact Surfaces Clean * 32680
## 2 Improper Maintenance of Walls/Ceilings * 31189
## 3 Non-Food Contact Surfaces * 26934
## 4 Improper Maintenance of Floors * 25995
## 5 Food Protection * 22468
## 6 Hand Cleaner Drying Tissue Signage * 16423
## 7 Premises Maintained * 16310
## 8 Installed and Maintained * 13781
## 9 Wiping Cloths Clean Sanitize * 13021
## 10 Food Contact Surfaces Design * 12725
## # … with 344 more rows
Unlike food trucks - three of 10 most common violations of which were Level 3 - all 10 most common violations of brick-and-mortar restaurants were Level 1. This proved my assumption that the two types of restaurants face different challenges.
# Do the same thing. But this time, filter out Level 1 violations.
non_food_truck_data %>%
filter(!is.na(violstatus), viollevel != "*") %>%
group_by(violdesc, viollevel) %>%
summarise(count = n()) %>%
arrange(desc(count))
## # A tibble: 161 x 3
## # Groups: violdesc [160]
## violdesc viollevel count
## <chr> <chr> <int>
## 1 Food Contact Surfaces Clean ** 12585
## 2 Insects Rodents Animals ** 11330
## 3 Cold Holding *** 10524
## 4 PIC Performing Duties *** 9876
## 5 Food Contact Surfaces Clean *** 6876
## 6 PIC Knowledge *** 6428
## 7 Location Accessible *** 6023
## 8 Separation Segregation Cross Contamination *** 4622
## 9 Hot Holding *** 4609
## 10 Consumer Advisories *** 3178
## # … with 151 more rows
I couldn’t believe my eyes, seeing “Insects Rodents Animals” was the second common violations for regular restaurants. And it was not even a Level 3 violation. It was a shocking finding, but I moved on as we were focusing on food trucks.
If I had an opportunity to do another story related to this topic, I would like to focus on restaurants where insects, rodents or other animals were found. I would like to learn which creature has been most commonly found, which neighborhoods have suffered the most and whether the numbers are increasing or decreasing.