1 | INTRODUCTION

Research question

NYC has a high rate of dining out and is known for its volume and variety of restaurants. Since July 2010, restaurants in NYC have been required to post letter grades that reflect the results of inspections by the NYC Health Department. These letter grades are based on numerical scoring of the number and severity of food safety violations identifed during inspections.

As members of the dining public, we are interested in whether the cuisine a restaurant serves has a relationship with the violation points (and corresponding grade) the restaurant receives. More specifically, do the scores of cuisines that are most prevalent and most inspected differ significantly from one another?

While it would be interesting to factor in price, indicators of menu pricing by restaurant are hard to come by. Accordingly, price won’t be considered in this exercise.

knitr::opts_chunk$set(error = TRUE)

# Load a few libraries
if (!require('dplyr')) install.packages('dplyr')
if (!require('tidyr')) install.packages('tidyr')
if (!require('stringr')) install.packages('stringr')
if (!require('readr')) install.packages('readr')
if (!require('magrittr')) install.packages('magrittr')
if (!require('forcats')) install.packages('forcats')
if (!require('purrr')) install.packages('purrr')
if (!require('kableExtra')) install.packages('kableExtra')
if (!require('ggplot2')) install.packages('ggplot2')
if (!require('ggthemes')) install.packages('ggthemes')
if (!require('psych')) install.packages('psych')
if (!require('openintro')) install.packages('openintro')
if (!require('agricolae')) install.packages('agricolae')

2 | DATA

Data collection

The dataset used for this project was collected by the NYC Department of Health as a product of inspections conducted between 2010 and 2017. The Department of Health inspects restaurants unannounced and annually to check for compliance with food handling, food temperature, personal hygiene, and vermin control regulations, among others. Each inspection yields a score based on the number and severity of violations; this, in turn, can map to an immediate or future grade.

In terms of scale, a lower score indicates better compliance with food safety regulations; a higher score, worse:

** * Grade A: 0-13 points * Grade B: 14-27 points * Grade C: 28 or more points**

If a restaurant does not earn an A when first inspected it remains ungraded until the next inspection, which is also unannounced and typically occurs within a month following.

The lack of a point ceiling doesn’t mean restaurants automatically receive a C grade. The Department of Health closes restaurants that operate unsafely and will not allow them to reopen until hazardous conditions have been corrected.

The dataset includes information on restaurants and their locations, when inspections took place, violations and the points assessed for them, grades resulting from the inspections (if any), and information on adjudication. It was sourced through the NYCOpenData initiative and is available to the public online (see references section for a link).

As the dataset is voluminous – containing over 400,000 observations – we will focus on the last few years of inspections (2016 and 2017 calendar years) and the most inspected cuisines (which does not mean the most violations). We assume that the Department of Health is administering inspections without bias, so for the purpose of this analysis we can treat inspections as representative of the total populations of restaurants (i.e. inspections likely correspond to the number of dining establishments).

We ingest, clean, and prepare data for analysis below.

# Load the initial data
insp_raw <- read_csv(inspections.source, 
                         col_names = T,
                         col_types = NULL, 
                         na = character(),
                         quoted_na = T,
                         quote = "\"",
                         comment = "",
                         trim_ws = T,
                         skip = 0,
                         n_max= Inf
                         )

# Subset columns
col_excl <- c(1, 4:7, 12, 16:18)
insp_clean1 <- insp_raw[-col_excl]

# Rename columns
names <- c("name", "boro", "cuisine", "date", "action", "violation", "flag", "score", "grade")
colnames(insp_clean1) <- names

# Parse dates
insp_clean2 <- separate(insp_clean1, date, c("month", "date", "year"), sep = "/")

# Assign column data types
insp_clean2$month <- as.integer(insp_clean2$month)
insp_clean2$date <- as.integer(insp_clean2$date)
insp_clean2$year <- as.integer(insp_clean2$year)
insp_clean2$flag <- as.factor(insp_clean2$flag)

# Set factor levels for flag
insp_clean2$flag <- factor(insp_clean2$flag, levels = c("Critical", "Not Critical", "Not Applicable"))

# Set NA to 0 for scores
insp_clean2$score[is.na(insp_clean2$score)] <- 0
insp_clean2$score <- as.integer(insp_clean2$score)

# Filter out inspections before 2016
insp_clean3 <- insp_clean2 %>%
  filter(year >= 2016)

# Clean up the cuisine column
insp_clean3$cuisine <- insp_clean3$cuisine %>% 
  str_trim() %>% 
  str_replace_all("Deli.*", "Deli") %>% 
  str_replace_all("Latin.*", "Latin") %>% 
  str_replace_all("Ice.*", "Ice cream") %>%
  str_replace_all("Sandwiches.*", "Buffet") %>% 
  str_replace_all("Caf.*", "Cafe") %>%
  str_replace_all("Bottled.*", "Beverage") %>%
  str_replace_all("Juice.*", "Juice") %>% 
  str_replace_all("Not Listed.*", "NA") %>% 
  str_replace_all("Soups & Sandwiches", "Soup/Sandwich") %>%
  str_replace_all("Vietnamese.*", "SEAsian")

# Map violation codes to descriptions
insp_clean3$violation <- insp_clean3$violation %>% 
  str_trim() %>% 
  str_replace_all("^02.", "food temp") %>% 
  str_replace_all("^03.", "food source") %>% 
  str_replace_all("^04.", "food protect") %>% 
  str_replace_all("^05.", "facility design") %>%
  str_replace_all("^06.", "pers hygiene") %>% 
  str_replace_all("^07.", "other critical") %>% 
  str_replace_all("^08.", "vermin / garbage") %>% 
  str_replace_all("^09.", "food source") %>% 
  str_replace_all("^10.", "facility maint") %>% 
  str_replace_all("^15.", "tobacco / smoking") %>% 
  str_replace_all("^16.", "trans fat / labeling") %>% 
  str_replace_all("^18.", "admin / docs") %>% 
  str_replace_all("^20.", "signage") %>% 
  str_replace_all("^22.", "nuisance / misc") %>% 
  str_replace_all("^99.", "other general")

# Simplify description of action taken
insp_clean3$action <- insp_clean3$action %>%
  str_trim() %>% 
  str_replace_all("^Violations were cited.*", "violations") %>% 
  str_replace_all("^No violations.*", "no violations") %>% 
  str_replace_all("^Establishment Closed.*", "closed") %>% 
  str_replace_all("^Establishment re-opened by DOHMH", "re-opened") %>% 
  str_replace_all("^Establishment re-closed by DOHMH", "re-closed")

# Output the cleaned and prepared data to web-accessible CSV files hosted on github
write.csv(insp_clean3, file = "inspections.csv", row.names = F)

We initiate analysis with the fresh dataset made accessible via the web.

# For a more manageable dataset, read in the data cleaned and prepared above from gthub
insp <- read.csv("https://raw.githubusercontent.com/JeremyOBrien16/DATA-606/master/Final%20Project/inspections.csv")

# Reassert flag as factor
insp$flag <- as.factor(insp$flag)
insp$flag <- factor(insp$flag, levels = c("Critical", "Not Critical", "Not Applicable"))

Cases

Each case represents a food safety inspection administered by the NYC Health Department between January 2010 and August 2017. For our analysis over 2016 and 2017 there are 195,132 inspections in total. As explained above, each inspection produces a score but not necessarily a grade.

Variables

The response variable is the inspection score, which is numerical.

Restaurant grades represent an ordinal shorthand for the dining public. Grades are translated from points scored during inspections. Since every inspection can produce points, every case has a score. Scores have the added benefit of being numerical.

# Count inspections by cuisine
insp_cuisine <- insp %>% 
  select(cuisine) %>% 
  group_by(cuisine) %>% 
  summarise(total = n()) %>% 
  arrange(desc(total))
insp_cuisine$cuisine <- factor(insp_cuisine$cuisine, levels = insp_cuisine$cuisine)

# Create name vector of top cuisines by inspections
c5 <- as.vector(head(insp_cuisine$cuisine, 5))
c20 <- as.vector(head(insp_cuisine$cuisine, 20))

# Bar chart of top 20 cuisines by inspections
insp_cuisine %>% 
  filter(cuisine %in% c20) %>% 
  ggplot(aes(x = fct_rev(cuisine), y = total)) +
  geom_bar(stat = "identity", width = .5) +
  ggtitle("Top 20 Cuisines by Inspections") +
  coord_flip() +
  scale_fill_brewer(palette = "RdBu") +
  theme_economist() +
  theme(plot.title = element_text(hjust = 0.5),
        axis.title.y = element_blank(),
        axis.title.x = element_blank(),
        axis.text.x = element_text(vjust = 0.5, size = 11),
        legend.position = "none")

The explanatory variable is cuisine, which is categorical. There are 83 possible categories of cuisine in the dataset. We will focus on the most prevalent five categories: American, Chinese, Pizza, Cafe, and Latin.

# Subset inspections for five cuisines: American, Chinese, Pizza, Cafe, and Latin
insp_c5 <- insp %>% 
  filter(cuisine %in% c5)

Type of study

This is an observational study based on empirical data collected for enforcement purposes. It was not designed as an experiment.

3 | SCOPE OF INFERENCE

Generalizability

The population of interest is the restaurants of NYC.

An assumption is that the Department of Health is consistent, timely, and unbiased in the administration of inspections - i.e. that it does not inspect some restaurants more or less than others either based on neighborhood, cuisine, time of year, or other factors.

Since scores are the product of inspectors’ judgement, it is conceivable that they may be effected by preferences, biases, inducements, and even corruption - for instance, effecting which restaurants are chosen for inspection and when, whether they are unannounced, the level of scrutiny, etc.

If we assume consistency, timeliness, and lack of bias, we can generalize findings to the population.

Causality

While we can treat the violation scores and grading system as an empirical signal of food safety, we should not make causal inferences. Scores and grades are the product of inspector evaluations and there is considerable room for inconsistent interpretation, selective enforcement, score inflation, and even temporal factors (i.e. overworked inspectors during a busy season may be lax or draconian).

4 | EXPLORATORY DATA ANALYSIS

# A grade cutoff for boxplot
cutoff <- data.frame(yintercept = 14, cutoff = factor(14))

# Boxplot of five cuisines
insp_c5 %>% 
  ggplot(aes(x = cuisine, y = score)) + 
  geom_boxplot(outlier.size=2) +
  ggtitle("Top 5 Cuisines by Scores") +
  geom_hline(aes(yintercept = yintercept, linetype = cutoff), data = cutoff, color = "red", size = 1.5) +
  geom_text(aes(0, 12, label = "A Grade", hjust = -.1, vjust = -1, color = "red")) +
    theme_economist() +
  theme(plot.title = element_text(hjust = 0.5),
        axis.title.y = element_blank(),
        axis.title.x = element_blank(),
        axis.text.x = element_text(vjust = 0.5, size = 11),
        legend.position = "none")

# Summary stats on score for the five top cuisines
insp_Amr <- insp_c5 %>% filter(cuisine %in% c("American")) %>% select(score) %>% describe()
insp_Chi <- insp_c5 %>% filter(cuisine %in% c("Chinese")) %>% select(score) %>% describe()
insp_Piz <- insp_c5 %>% filter(cuisine %in% c("Pizza")) %>% select(score) %>% describe()
insp_Caf <- insp_c5 %>% filter(cuisine %in% c("Cafe")) %>% select(score) %>% describe()
insp_Ltn <- insp_c5 %>% filter(cuisine %in% c("Latin")) %>% select(score) %>% describe()
insp_c5_stats <- as.data.frame(rbind(insp_Amr, insp_Chi, insp_Piz, insp_Caf, insp_Ltn))
rownames(insp_c5_stats) <- c("American", "Chinese", "Pizza", "Cafe", "Latin")
kable(insp_c5_stats)

	vars	n	mean	sd	median	trimmed	mad	min	max	range	skew	kurtosis	se
American	1	43152	17.43340	13.11186	13	15.73452	8.8956	-1	137	138	1.891151	6.330112	0.0631195
Chinese	1	20383	19.80930	14.27297	16	17.85693	10.3782	-1	120	121	1.703715	4.615018	0.0999724
Pizza	1	8970	17.69810	13.40581	13	15.95819	8.8956	-1	106	107	1.901941	5.958992	0.1415458
Cafe	1	8863	15.60961	13.15831	12	13.65774	7.4130	0	87	87	1.821250	4.398140	0.1397687
Latin	1	8524	19.54329	14.58333	16	17.65352	8.8956	0	139	139	2.307180	9.957826	0.1579556

All cuisines are right-skewed – Chinese least of all, and Latin most of all – with a host of outliers exerting pull on means. The red horizontal line represents the violation point cutoff - below the line restaurants receive an A, above it a B, C, or no grade. Interestingly, the medianhovers just below the cutoff for American, Pizza, and Cafe, and just above it for Chinese and Latin – such that more than half of inspections for those cuisines would not merit A grades. Kurtosis scores are all above 0, meaning distributions are leptokurtic with steeper peaks than normal.

# Restaurant inspections by cuisine for each borough
insp_c5 %>% 
  ggplot(aes(boro)) +
  geom_bar(aes(fill = cuisine), width = .5) +
  ggtitle("Inspections by Cuisine") +
  scale_fill_brewer(palette = "RdBu") +
  theme_economist() +
  theme(plot.title = element_text(hjust = 0.5),
        axis.title.y = element_blank(),
        axis.title.x = element_blank(),
        axis.text.x = element_text(vjust = 0.5, size = 11),
        legend.position = "right",
        legend.title = element_blank(),
        legend.text = element_text(size = 11))

# Restaurant inspection proportions by cuisine for each borough
insp_c5 %>% 
  ggplot(aes(boro, fill = cuisine)) +
  geom_bar(position = "fill") +
  ggtitle("Proportion of Inspections by Cuisine") +
  scale_fill_brewer(palette = "RdBu") +
  theme_economist() +
  theme(plot.title = element_text(hjust = 0.5),
        axis.title.y = element_blank(),
        axis.title.x = element_blank(),
        axis.text.x = element_text(vjust = 0.5, size = 11),
        legend.position = "right",
        legend.title = element_blank(),
        legend.text = element_text(size = 11))

Each cuisine is represented in inspections within each boroughs, but in different proportions. In Manhattan and Staten Island, American restaurants are far and away the most inspected. In Queens Chinese restaurants eke out just ahead of American, and in Brooklyn just behind. In Brooklyn cafe inspections are relatively more common than in other boroughs; in the Bronx, Latin restaurants’. Staten Island and the Bronx see the highest proportion of pizza establishment inspections.

# Restaurant inspections by cuisine for each borough 
insp_c5 %>% 
  select(boro, cuisine, flag) %>% 
  group_by(boro) %>% 
  ggplot(aes(boro)) +
  geom_bar(aes(fill = flag), width = .5) +
  ggtitle("Critical Violations by Cuisine") +
  scale_fill_brewer(palette = "RdBu") +
  theme_economist() +
  theme(plot.title = element_text(hjust = 0.5),
        axis.title.y = element_blank(),
        axis.title.x = element_blank(),
        axis.text.x = element_text(vjust = 0.5, size = 11),
        legend.position = "right",
        legend.title = element_blank(),
        legend.text = element_text(size = 11))

# Restaurant inspection proportions by cuisine for each borough
insp_c5 %>% 
  ggplot(aes(boro, fill = flag)) +
  geom_bar(position = "fill") +
  ggtitle("Proportion of Critical Violations by Cuisine") +

    scale_fill_brewer(palette = "RdBu") +
  theme_economist() +
  theme(plot.title = element_text(hjust = 0.5),
        axis.title.y = element_blank(),
        axis.title.x = element_blank(),
        axis.text.x = element_text(vjust = 0.5, size = 11),
        legend.position = "right",
        legend.title = element_blank(),
        legend.text = element_text(size = 11))

Critical violations are fairly consistent across boroughs, ranging just above half of all inspections.

5 | INFERENCE

Hypothesis

H0: The cuisine a restaurant serves does not have a relationship with the score it receives when inspected.

HA: The cuisine a restaurant serves has a relationship with the score it receives whe inspected.

ANOVA

Before conducting a one-way ANOVA looking at the top five cuisines between 2016 and 2017 and their violation scores we confirm that the conditions for inference are satisifed:

Normality: There is some skew and kurtosis in the distributions of each cuisine, but ANOVA is typically robust to both with sufficiently large sample, which we have in spade. We’ll treat this conditions as satified.
Homoscedasticity: The standard deviation is not markedly different between groups - the Bartlett test (below) concurs.
Sample size: This condition is met by the large number of observations for each cuisine.
Independent observations: One of our initial assumptions is that the Department of Health has ensured there isn’t bias between or within the groups. We’ll treat this condition as met.

bartlett.test(data = insp_c5, score ~ cuisine)

## 
##  Bartlett test of homogeneity of variances
## 
## data:  score by cuisine
## Bartlett's K-squared = 318.92, df = 4, p-value < 2.2e-16

We proceed with ANOVA.

# Fit an ANOVA model to the data
insp_c5_fit <- aov(data = insp_c5, score ~ cuisine)

# Summarize the findings
summary(insp_c5_fit)

##                Df   Sum Sq Mean Sq F value Pr(>F)    
## cuisine         4   152309   38077   207.1 <2e-16 ***
## Residuals   89887 16529595     184                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Visualize the findings
plot(insp_c5_fit)

The p-value is below the significance level of .05, so we can dimiss the null hypothesis and conclude that there are significant differences between cuisines.

The flat LOWESS (locally weighted scatterplot smoothing) line in the residuals vs. fitted plot indicates that the predictor is sound, and the relatively flat lowess line in the scale-location plot supports homogeneity of variance. The normal Q-Q plot does show pronounced right-side skew.

A Tukey Honest Significant Difference test can help us to identify between which cuisines there are differences in means, testing each pair and providing statistics.

# Tukey HSD test
tukey <- as.data.frame(TukeyHSD(insp_c5_fit)$cuisine)
tukey$pair <- rownames(tukey)
kable(tukey[,1:4])

	diff	lwr	upr	p adj
Cafe-American	-1.8237852	-2.2551698	-1.3924007	0.0000000
Chinese-American	2.3759036	2.0615177	2.6902896	0.0000000
Latin-American	2.1098913	1.6714481	2.5483345	0.0000000
Pizza-American	0.2647066	-0.1645381	0.6939513	0.4450073
Chinese-Cafe	4.1996889	3.7290369	4.6703409	0.0000000
Latin-Cafe	3.9336765	3.3725105	4.4948425	0.0000000
Pizza-Cafe	2.0884918	1.5344829	2.6425007	0.0000000
Latin-Chinese	-0.2660123	-0.7431425	0.2111178	0.5488382
Pizza-Chinese	-2.1111971	-2.5798886	-1.6425056	0.0000000
Pizza-Latin	-1.8451847	-2.4047075	-1.2856620	0.0000000

# Plot TukeyHSD pairs
tukey %>% 
  ggplot(aes(`p adj`)) +
  geom_hline(yintercept = 0, lty = "11", color = "grey30") +
  geom_errorbar(aes(pair, ymin = lwr, ymax = upr), width = 0.2) +
  geom_point(aes(pair, diff)) +
  ggtitle("Tukey Honest Significant Difference Test") +
  coord_flip() +
  scale_fill_brewer(palette = "RdBu") +
  theme_economist() +
  theme(plot.title = element_text(hjust = 0.5),
        axis.title.y = element_blank(),
        axis.title.x = element_blank(),
        axis.text.x = element_text(vjust = 0.5, size = 11),
        legend.position = "right",
        legend.title = element_blank(),
        legend.text = element_text(size = 11))

We find differences below the significance level of .05 for every pairing except Pizza and American cuisine, and Latin and Chinese cuisine.

6 | CONCLUSION

The ANOVA test we conducted yielded a p-value below .05, allowing us to dismiss the null hypothesis that the cuisine a restaurant serves does not have a relastionship with the points and grades it receives from inspections. Using a post hoc Tukey HSD test, we investigated the pairwise combinations possible between the five most inspected cuisines and found a significant difference in eight of the ten combinations. Based on this analysis, we can say that there is a relationship between cuisine and score. The precise nature of that relationship requires further exploration.

We truncated the dataset for ease of computation, but this analysis could be extended to the full seven years of inspection results for all 83 cuisines. In that case, we would want to identify any longitudinal trends (i.e. score inflation / deflation, covariance in shrinking / growth of cuisine types, etc.) and account for them.

Our starting assumption was that the Department of Health adminsters inspection consistently, in a timely fashion, and without bias. Given that we’ve encountered significant differences between cuisines, it worth questioning this assumption and investigating ways bias may creep into inspection patterns and outputs – via geography, temporal clustering, types of violations, etc.

Another facet of our assumptions was that inspections represent the total population of restaurants. For a more extensive analysis, it would be advisable to deduplicate inspections by restaurants to identify patterns in inspection and re-inspection, between the large number of outliers, and whether there is covariance with certain types of violations.

Additionally, further exploring the relationship between cuisine and borough – the higher prevalence of Chinese restaurants in Queens and cafes in Brooklyn – could yield insight into how inpsections play for the long tail.

7 | REFERENCES

https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/43nn-pn8j http://www1.nyc.gov/site/doh/services/restaurant-grades.page http://www1.nyc.gov/assets/doh/downloads/pdf/rii/how-we-score-grade.pdf https://www1.nyc.gov/assets/doh/downloads/pdf/about/healthcode/health-code-chapter23.pdf

DATA 606 Data Project | NYC Restaurant Grades

Jeremy O’Brien

May 19, 2018