Introduction

The motivation for this study is to better understand New York City’s food establishment inspection and restaurant letter grading program. As a New York City resident, I typically decide whether to patronize a food establishment based on the letter grade displayed in its window. Ninety-five percent of the time, I choose an establishment with an ‘A’ letter grade.

This preference rests on the assumption that an ‘A’ letter grade means the establishment had no violations that would turn me away. Conversely, I assume a ‘C’ letter grade means the conditions are less than desirable.

New York City is known for its restaurants. Many people from around the globe visit and look forward to sampling delicious food. Anyone who patronizes New York City food establishments would benefit from understanding the grading system, so that they can make an informed decision about whether to choose an establishment based on its letter grade.

New York City Letter Grading Program

New York City’s inspection and letter grading program data set contains a listing of active restaurants and college cafeterias that have been inspected. The program started on July 27, 2010. The program allows for dual inspections where restaurants are allowed to improve before receiving a letter grade.

An initial inspection is followed by a re-inspection no less than 7 days later for restaurants that don’t receive an ‘A’ grade on their initial inspection. A score of less than 14 results in a letter grade of ‘A’ on either the initial inspection or the re-inspection; a re-inspection score of 14-27 results in a ‘B’, and a re-inspection score greater than 27 results in a ‘C’.
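
The scoring thresholds described above can be sketched as a small R function. This is only an illustration: the name `score_to_grade` is hypothetical, scores are assumed to be whole numbers, and the sketch ignores the timing rule that a ‘B’ or ‘C’ is issued only at re-inspection.

```r
# Hypothetical helper: map an inspection score to the letter grade implied
# by the thresholds described in the text (assumes whole-number scores).
score_to_grade <- function(score) {
  cut(score,
      breaks = c(-Inf, 14, 28, Inf),  # below 14 = A, 14 to 27 = B, 28 and up = C
      labels = c("A", "B", "C"),
      right  = FALSE)                 # intervals are closed on the left
}

grades <- score_to_grade(c(2, 13, 14, 27, 28))
as.character(grades)
```

Using `cut()` keeps the thresholds in one place, so they are easy to adjust if the grading rules are read differently.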

Data Collection

The data were compiled from several New York City Department of Health and Mental Hygiene administrative systems. There are some data quality issues related to data entry or data transfer.

Each row represents a violation, with 18 variables describing the inspection. There are 1,000 cases, some of which are incomplete. Removing the cases with missing values left 516 cases.

The data set contains every sustained or not yet adjudicated violation citation from every full or special program inspection conducted up to three years prior to the most recent inspection for restaurants and college cafeterias.

Scope of inference

These data and findings are of interest to restaurant owners and those who are considering opening food establishments in New York City. This information can be used to examine the relationship between the inspection and the letter grade received.

Type of study

This is an observational study. The response variable is the grade the food establishment receives. The response variable is categorical. The explanatory variable is the inspection type, whether the inspection is initial or a re-inspection. The inspection type is a categorical variable.

Explore the data

library(jsonlite)
library(stringr)
library(plyr)   # load plyr before dplyr so dplyr's functions are not masked
library(dplyr)
library(tm)
library(qdap)
library(wordcloud)
library(RColorBrewer)
library(descr)
library(tidyr)
library(ggplot2)
# Read the data (the JSON endpoint is parsed directly into a data frame)
data <- fromJSON("https://data.cityofnewyork.us/resource/xx67-kt59.json")
head(data)
##   cuisine_description                   dba         record_date  boro
## 1              Bakery MORRIS PARK BAKE SHOP 2016-12-14T06:01:41 BRONX
## 2              Bakery MORRIS PARK BAKE SHOP 2016-12-14T06:01:41 BRONX
## 3              Bakery MORRIS PARK BAKE SHOP 2016-12-14T06:01:41 BRONX
## 4              Bakery MORRIS PARK BAKE SHOP 2016-12-14T06:01:41 BRONX
## 5              Bakery MORRIS PARK BAKE SHOP 2016-12-14T06:01:41 BRONX
## 6              Bakery MORRIS PARK BAKE SHOP 2016-12-14T06:01:41 BRONX
##       inspection_date building zipcode score      phone          street
## 1 2016-02-18T00:00:00     1007   10462    10 7188924968 MORRIS PARK AVE
## 2 2016-02-18T00:00:00     1007   10462    10 7188924968 MORRIS PARK AVE
## 3 2015-02-09T00:00:00     1007   10462     6 7188924968 MORRIS PARK AVE
## 4 2014-03-03T00:00:00     1007   10462     2 7188924968 MORRIS PARK AVE
## 5 2013-10-10T00:00:00     1007   10462  <NA> 7188924968 MORRIS PARK AVE
## 6 2013-09-11T00:00:00     1007   10462     6 7188924968 MORRIS PARK AVE
##   grade  critical_flag    camis
## 1     A       Critical 30075445
## 2     A   Not Critical 30075445
## 3     A       Critical 30075445
## 4     A   Not Critical 30075445
## 5  <NA> Not Applicable 30075445
## 6     A       Critical 30075445
##                                                        action
## 1             Violations were cited in the following area(s).
## 2             Violations were cited in the following area(s).
## 3             Violations were cited in the following area(s).
## 4             Violations were cited in the following area(s).
## 5 No violations were recorded at the time of this inspection.
## 6             Violations were cited in the following area(s).
##   violation_code
## 1            04L
## 2            08A
## 3            06C
## 4            10F
## 5           <NA>
## 6            04L
##                                                                                                                                                                                                                                                             violation_description
## 1                                                                                                                                                                                                 Evidence of mice or live mice present in facility's food and/or non-food areas.
## 2                                                                                                                                              Facility not vermin proof. Harborage or conditions conducive to attracting vermin to the premises and/or allowing vermin to exist.
## 3                                                                                                                                                      Food not protected from potential source of contamination during storage, preparation, transportation, display or service.
## 4 Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit.
## 5                                                                                                                                                                                                                                                                            <NA>
## 6                                                                                                                                                                                                 Evidence of mice or live mice present in facility's food and/or non-food areas.
##            grade_date                          inspection_type
## 1 2016-02-18T00:00:00    Cycle Inspection / Initial Inspection
## 2 2016-02-18T00:00:00    Cycle Inspection / Initial Inspection
## 3 2015-02-09T00:00:00    Cycle Inspection / Initial Inspection
## 4 2014-03-03T00:00:00    Cycle Inspection / Initial Inspection
## 5                <NA> Trans Fat / Second Compliance Inspection
## 6 2013-09-11T00:00:00         Cycle Inspection / Re-inspection
# Explore the data
class(data) # explore class
## [1] "data.frame"
dim(data)   # view dimension
## [1] 1000   18
names(data) # view variable names
##  [1] "cuisine_description"   "dba"                  
##  [3] "record_date"           "boro"                 
##  [5] "inspection_date"       "building"             
##  [7] "zipcode"               "score"                
##  [9] "phone"                 "street"               
## [11] "grade"                 "critical_flag"        
## [13] "camis"                 "action"               
## [15] "violation_code"        "violation_description"
## [17] "grade_date"            "inspection_type"
str(data) # understand the structure
## 'data.frame':    1000 obs. of  18 variables:
##  $ cuisine_description  : chr  "Bakery" "Bakery" "Bakery" "Bakery" ...
##  $ dba                  : chr  "MORRIS PARK BAKE SHOP" "MORRIS PARK BAKE SHOP" "MORRIS PARK BAKE SHOP" "MORRIS PARK BAKE SHOP" ...
##  $ record_date          : chr  "2016-12-14T06:01:41" "2016-12-14T06:01:41" "2016-12-14T06:01:41" "2016-12-14T06:01:41" ...
##  $ boro                 : chr  "BRONX" "BRONX" "BRONX" "BRONX" ...
##  $ inspection_date      : chr  "2016-02-18T00:00:00" "2016-02-18T00:00:00" "2015-02-09T00:00:00" "2014-03-03T00:00:00" ...
##  $ building             : chr  "1007" "1007" "1007" "1007" ...
##  $ zipcode              : chr  "10462" "10462" "10462" "10462" ...
##  $ score                : chr  "10" "10" "6" "2" ...
##  $ phone                : chr  "7188924968" "7188924968" "7188924968" "7188924968" ...
##  $ street               : chr  "MORRIS PARK AVE" "MORRIS PARK AVE" "MORRIS PARK AVE" "MORRIS PARK AVE" ...
##  $ grade                : chr  "A" "A" "A" "A" ...
##  $ critical_flag        : chr  "Critical" "Not Critical" "Critical" "Not Critical" ...
##  $ camis                : chr  "30075445" "30075445" "30075445" "30075445" ...
##  $ action               : chr  "Violations were cited in the following area(s)." "Violations were cited in the following area(s)." "Violations were cited in the following area(s)." "Violations were cited in the following area(s)." ...
##  $ violation_code       : chr  "04L" "08A" "06C" "10F" ...
##  $ violation_description: chr  "Evidence of mice or live mice present in facility's food and/or non-food areas." "Facility not vermin proof. Harborage or conditions conducive to attracting vermin to the premises and/or allowing vermin to exi"| __truncated__ "Food not protected from potential source of contamination during storage, preparation, transportation, display or service." "Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly m"| __truncated__ ...
##  $ grade_date           : chr  "2016-02-18T00:00:00" "2016-02-18T00:00:00" "2015-02-09T00:00:00" "2014-03-03T00:00:00" ...
##  $ inspection_type      : chr  "Cycle Inspection / Initial Inspection" "Cycle Inspection / Initial Inspection" "Cycle Inspection / Initial Inspection" "Cycle Inspection / Initial Inspection" ...
sapply(data, function(x) sum(is.na(x))) # count the number of NAs
##   cuisine_description                   dba           record_date 
##                     0                     0                     0 
##                  boro       inspection_date              building 
##                     0                     0                     0 
##               zipcode                 score                 phone 
##                     0                    66                     0 
##                street                 grade         critical_flag 
##                     0                   470                     0 
##                 camis                action        violation_code 
##                     0                     0                    10 
## violation_description            grade_date       inspection_type 
##                    10                   470                     0
# Create dataset with needed variables
sudData <- select(data, camis, cuisine_description, dba, boro, zipcode, record_date, inspection_date, score, grade, critical_flag, action, violation_code,
violation_description, inspection_type)

glimpse(sudData)
## Observations: 1,000
## Variables: 14
## $ camis                 <chr> "30075445", "30075445", "30075445", "300...
## $ cuisine_description   <chr> "Bakery", "Bakery", "Bakery", "Bakery", ...
## $ dba                   <chr> "MORRIS PARK BAKE SHOP", "MORRIS PARK BA...
## $ boro                  <chr> "BRONX", "BRONX", "BRONX", "BRONX", "BRO...
## $ zipcode               <chr> "10462", "10462", "10462", "10462", "104...
## $ record_date           <chr> "2016-12-14T06:01:41", "2016-12-14T06:01...
## $ inspection_date       <chr> "2016-02-18T00:00:00", "2016-02-18T00:00...
## $ score                 <chr> "10", "10", "6", "2", NA, "6", "6", "32"...
## $ grade                 <chr> "A", "A", "A", "A", NA, "A", "A", NA, NA...
## $ critical_flag         <chr> "Critical", "Not Critical", "Critical", ...
## $ action                <chr> "Violations were cited in the following ...
## $ violation_code        <chr> "04L", "08A", "06C", "10F", NA, "04L", "...
## $ violation_description <chr> "Evidence of mice or live mice present i...
## $ inspection_type       <chr> "Cycle Inspection / Initial Inspection",...
noNa <- na.omit(sudData)
sapply(noNa, function(x) sum(is.na(x))) # count the number of NAs
##                 camis   cuisine_description                   dba 
##                     0                     0                     0 
##                  boro               zipcode           record_date 
##                     0                     0                     0 
##       inspection_date                 score                 grade 
##                     0                     0                     0 
##         critical_flag                action        violation_code 
##                     0                     0                     0 
## violation_description       inspection_type 
##                     0                     0
head(noNa)
##      camis cuisine_description                   dba  boro zipcode
## 1 30075445              Bakery MORRIS PARK BAKE SHOP BRONX   10462
## 2 30075445              Bakery MORRIS PARK BAKE SHOP BRONX   10462
## 3 30075445              Bakery MORRIS PARK BAKE SHOP BRONX   10462
## 4 30075445              Bakery MORRIS PARK BAKE SHOP BRONX   10462
## 6 30075445              Bakery MORRIS PARK BAKE SHOP BRONX   10462
## 7 30075445              Bakery MORRIS PARK BAKE SHOP BRONX   10462
##           record_date     inspection_date score grade critical_flag
## 1 2016-12-14T06:01:41 2016-02-18T00:00:00    10     A      Critical
## 2 2016-12-14T06:01:41 2016-02-18T00:00:00    10     A  Not Critical
## 3 2016-12-14T06:01:41 2015-02-09T00:00:00     6     A      Critical
## 4 2016-12-14T06:01:41 2014-03-03T00:00:00     2     A  Not Critical
## 6 2016-12-14T06:01:41 2013-09-11T00:00:00     6     A      Critical
## 7 2016-12-14T06:01:41 2013-09-11T00:00:00     6     A      Critical
##                                            action violation_code
## 1 Violations were cited in the following area(s).            04L
## 2 Violations were cited in the following area(s).            08A
## 3 Violations were cited in the following area(s).            06C
## 4 Violations were cited in the following area(s).            10F
## 6 Violations were cited in the following area(s).            04L
## 7 Violations were cited in the following area(s).            04N
##                                                                                                                                                                                                                                                                                  violation_description
## 1                                                                                                                                                                                                                      Evidence of mice or live mice present in facility's food and/or non-food areas.
## 2                                                                                                                                                                   Facility not vermin proof. Harborage or conditions conducive to attracting vermin to the premises and/or allowing vermin to exist.
## 3                                                                                                                                                                           Food not protected from potential source of contamination during storage, preparation, transportation, display or service.
## 4                      Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit.
## 6                                                                                                                                                                                                                      Evidence of mice or live mice present in facility's food and/or non-food areas.
## 7 Filth flies or food/refuse/sewage-associated (FRSA) flies present in facility\032s food and/or non-food areas. Filth flies include house flies, little house flies, blow flies, bottle flies and flesh flies. Food/refuse/sewage-associated flies include fruit flies, drain flies and Phorid flies.
##                         inspection_type
## 1 Cycle Inspection / Initial Inspection
## 2 Cycle Inspection / Initial Inspection
## 3 Cycle Inspection / Initial Inspection
## 4 Cycle Inspection / Initial Inspection
## 6      Cycle Inspection / Re-inspection
## 7      Cycle Inspection / Re-inspection
tail(noNa)
##        camis cuisine_description             dba   boro zipcode
## 989 40364449              German GOTTSCHEER HALL QUEENS   11385
## 990 40364449              German GOTTSCHEER HALL QUEENS   11385
## 991 40364449              German GOTTSCHEER HALL QUEENS   11385
## 992 40364449              German GOTTSCHEER HALL QUEENS   11385
## 995 40364449              German GOTTSCHEER HALL QUEENS   11385
## 996 40364449              German GOTTSCHEER HALL QUEENS   11385
##             record_date     inspection_date score grade critical_flag
## 989 2016-12-14T06:01:41 2015-10-22T00:00:00    26     B      Critical
## 990 2016-12-14T06:01:41 2015-10-22T00:00:00    26     B      Critical
## 991 2016-12-14T06:01:41 2015-10-22T00:00:00    26     B      Critical
## 992 2016-12-14T06:01:41 2015-10-22T00:00:00    26     B  Not Critical
## 995 2016-12-14T06:01:41 2015-04-23T00:00:00     8     A      Critical
## 996 2016-12-14T06:01:41 2015-04-23T00:00:00     8     A  Not Critical
##                                              action violation_code
## 989 Violations were cited in the following area(s).            04A
## 990 Violations were cited in the following area(s).            04L
## 991 Violations were cited in the following area(s).            06C
## 992 Violations were cited in the following area(s).            08A
## 995 Violations were cited in the following area(s).            06D
## 996 Violations were cited in the following area(s).            10F
##                                                                                                                                                                                                                                                               violation_description
## 989                                                                                                                                                                                                          Food Protection Certificate not held by supervisor of food operations.
## 990                                                                                                                                                                                                 Evidence of mice or live mice present in facility's food and/or non-food areas.
## 991                                                                                                                                                      Food not protected from potential source of contamination during storage, preparation, transportation, display or service.
## 992                                                                                                                                              Facility not vermin proof. Harborage or conditions conducive to attracting vermin to the premises and/or allowing vermin to exist.
## 995                                                                                                                                  Food contact surface not properly washed, rinsed and sanitized after each use and following any activity when contamination may have occurred.
## 996 Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit.
##                      inspection_type
## 989 Cycle Inspection / Re-inspection
## 990 Cycle Inspection / Re-inspection
## 991 Cycle Inspection / Re-inspection
## 992 Cycle Inspection / Re-inspection
## 995 Cycle Inspection / Re-inspection
## 996 Cycle Inspection / Re-inspection
noNa <- noNa[noNa$grade != 'Z',] # remove misc grade coding
noNa <- noNa[noNa$inspection_type != 'Cycle Inspection / Reopening Inspection',]
table(noNa$inspection_type)
## 
## Cycle Inspection / Initial Inspection 
##                                   267 
##      Cycle Inspection / Re-inspection 
##                                   241
table(noNa$grade)
## 
##   A   B   C 
## 438  58  12
noNa$grade2 <- as.integer(factor(noNa$grade, levels=c('C','B','A'))) # convert to numeric
table(noNa$grade2)
## 
##   1   2   3 
##  12  58 438
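
The two one-way tables above can also be combined into a single two-way table of grade by inspection type. The sketch below does not re-download the data. It rebuilds the counts from this report's own output, and the split is an inference: the t-test output later in the report shows a mean numeric grade of exactly 3 for initial inspections, which implies all 267 of them are ‘A’, leaving 171 ‘A’, 58 ‘B’ and 12 ‘C’ among the 241 re-inspections.

```r
# Counts reconstructed from the report's own output (an inference, not a re-run):
# 267 initial inspections, all graded A; 241 re-inspections split 171/58/12.
inspection_type <- rep(c("Cycle Inspection / Initial Inspection",
                         "Cycle Inspection / Re-inspection"),
                       times = c(267, 241))
grade <- c(rep("A", 267),
           rep(c("A", "B", "C"), times = c(171, 58, 12)))

# Two-way table of inspection type by grade
grade_by_type <- table(inspection_type,
                       grade = factor(grade, levels = c("A", "B", "C")))
grade_by_type

# Row proportions: share of each grade within an inspection type
round(prop.table(grade_by_type, margin = 1), 3)
```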

Visualize the data

# Frequency 
counts <- table(noNa$grade) 
barplot(counts, col="red", main = "Restaurant Grade Frequency",
        xlab = "Grade", ylab = "Frequency")

type <- table (noNa$cuisine_description)
barplot(type, col="red", main = "Cuisine Description Frequency",
        xlab = "Cuisine Description", ylab = "Frequency")

summary(noNa)
##     camis           cuisine_description     dba           
##  Length:508         Length:508          Length:508        
##  Class :character   Class :character    Class :character  
##  Mode  :character   Mode  :character    Mode  :character  
##                                                           
##                                                           
##                                                           
##      boro             zipcode          record_date       
##  Length:508         Length:508         Length:508        
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##  inspection_date       score              grade          
##  Length:508         Length:508         Length:508        
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##  critical_flag         action          violation_code    
##  Length:508         Length:508         Length:508        
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##  violation_description inspection_type        grade2     
##  Length:508            Length:508         Min.   :1.000  
##  Class :character      Class :character   1st Qu.:3.000  
##  Mode  :character      Mode  :character   Median :3.000  
##                                           Mean   :2.839  
##                                           3rd Qu.:3.000  
##                                           Max.   :3.000
# Analyze violation description
noNa$violation_description <- tolower(noNa$violation_description) 
noNa$violation_description <- tm::removeNumbers(noNa$violation_description)
# collapse repeated spaces into a single space
noNa$violation_description <- str_replace_all(noNa$violation_description, " {2,}", " ")
noNa$violation_description <- str_replace_all(noNa$violation_description, pattern = "[[:punct:]]", " ")
noNa$violation_description <- tm::removeWords(x = noNa$violation_description, stopwords(kind = "SMART"))
crp <- Corpus(VectorSource(noNa$violation_description)) # convert into corpus
freq_terms(text.var = noNa$violation_description, top = 10) # find the 10 most frequent words
##    WORD        FREQ
## 1  food         598
## 2  contact      295
## 3  surface      292
## 4  properly     291
## 5  improperly   270
## 6  maintained   181
## 7  equipment    176
## 8  vermin       162
## 9  flies        144
## 10 constructed  128
pal = brewer.pal(9,"Blues")
wordcloud(words = noNa$violation_description, scale=c(5,0.5), max.words=100, random.order=FALSE, colors = brewer.pal(8, "Dark2")) # violation description

wordcloud(words = noNa$cuisine_description, max.words=100, random.order=FALSE, colors = brewer.pal(8, "Dark2")) # cuisine description
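
For readers without `tm` or `qdap` installed, the cleaning and counting steps above can be approximated in base R. This is a minimal sketch, not the method used in this report: the two descriptions are copied from the output shown earlier, and the short `stop_words` vector is a hypothetical stand-in for the much longer SMART stop word list.

```r
# Two violation descriptions copied from the inspection output above
descriptions <- c(
  "Evidence of mice or live mice present in facility's food and/or non-food areas.",
  "Facility not vermin proof. Harborage or conditions conducive to attracting vermin to the premises and/or allowing vermin to exist."
)

clean <- tolower(descriptions)
clean <- gsub("[[:punct:]]", " ", clean)    # punctuation becomes a space
clean <- gsub("[[:digit:]]+", " ", clean)   # drop numbers
clean <- trimws(gsub("\\s+", " ", clean))   # collapse repeated whitespace

# Split into words and drop a small, hypothetical stop word list
words <- unlist(strsplit(clean, " ", fixed = TRUE))
stop_words <- c("of", "or", "in", "s", "and", "to", "the", "not", "non")
words <- words[!words %in% stop_words]

# Term frequencies, most frequent first
term_counts <- sort(table(words), decreasing = TRUE)
head(term_counts, 5)
```

Replacing punctuation with a space before collapsing whitespace keeps words such as "and/or" from being fused together.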

Data Analysis

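Because the data are not a random sample and the response variable is categorical with more than two levels, the letter grade is recoded as a number (C = 1, B = 2, A = 3) and a two-sample t-test is used to compare the mean grade between the two inspection types.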

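\[H_0 : \mu_{\text{initial}} - \mu_{\text{re-inspection}} = 0\] Inspection type does not have any effect on the grade a restaurant receives.

\[H_a : \mu_{\text{initial}} - \mu_{\text{re-inspection}} \neq 0\] Inspection type does have an effect on the grade a restaurant receives.

Here \(\mu\) denotes the mean numeric grade for each inspection type.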

T Test

boxplot(noNa$grade2 ~ noNa$inspection_type,
        col=(c("gold","red")),
        main = "Letter Grade by Inspection Type",
        xlab = "Inspection Type", ylab = "Letter Grade")

t.test(noNa$grade2 ~ noNa$inspection_type) # grade vs inspection type
## 
##  Welch Two Sample t-test
## 
## data:  noNa$grade2 by noNa$inspection_type
## t = 9.2595, df = 240, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.2678631 0.4126348
## sample estimates:
## mean in group Cycle Inspection / Initial Inspection 
##                                            3.000000 
##      mean in group Cycle Inspection / Re-inspection 
##                                            2.659751
Letter grade and inspection type

Based on the results of the t-test, there is a statistically significant difference in mean letter grade between the two inspection types at the 5% significance level: the p-value is much smaller than .05 (p < 2.2e-16). The null hypothesis is rejected. The 95% confidence interval for the difference in means runs from about .27 to .41.
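
The test statistic can be checked by hand from the counts already reported. The sketch below rebuilds the numeric grades from the grade and inspection type tables, under the inference that all 267 initial inspections are ‘A’ (their reported mean is exactly 3). It then compares `t.test()` with the Welch formula, in which the standard error is \(\sqrt{s_1^2/n_1 + s_2^2/n_2}\).

```r
# Numeric grades (C = 1, B = 2, A = 3) rebuilt from the reported counts.
# Assumption: all 267 initial inspections are A, as implied by their mean of 3.
initial      <- rep(3, 267)
reinspection <- rep(c(3, 2, 1), times = c(171, 58, 12))

# Welch two-sample t-test, R's default when var.equal is not set
res <- t.test(initial, reinspection)

# Same statistic computed by hand from the group means and variances
se     <- sqrt(var(initial) / length(initial) +
               var(reinspection) / length(reinspection))
t_stat <- (mean(initial) - mean(reinspection)) / se

res$statistic   # t statistic from t.test
t_stat          # hand-computed value, should agree
res$conf.int    # 95% confidence interval for the difference in means
```

Because the initial group has no variation at all, the whole standard error comes from the re-inspection group, which is why the degrees of freedom equal 241 - 1 = 240.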

This is consistent with the box plot above, which shows that every graded initial inspection in the sample carries a letter grade of ‘A’. Establishments that receive an ‘A’ during the initial inspection are placed on a 12-month inspection schedule.

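Because the p-value is so small, its exact value cannot be read from the output. R prints ‘< 2.2e-16’ for any p-value below the machine epsilon of double-precision arithmetic (about 2.2e-16), so the output gives only an upper bound rather than the p-value itself.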
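
How R treats very small p-values can be checked directly. The sketch below uses only base R; the variable names `eps` and `tiny` are illustrative.

```r
# Machine epsilon: the smallest positive x such that 1 + x != 1 in double precision
eps <- .Machine$double.eps
eps

# Far smaller positive numbers can still be stored
tiny <- 1e-300
tiny > 0
.Machine$double.xmin   # smallest normalized positive double

# format.pval() reports anything below eps as "< eps" by default
format.pval(tiny)
```

The value 2.2e-16 is therefore a printing threshold, not a limit on how small a stored number can be.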