The motivation for this study is to better understand New York City’s food establishment inspection and restaurant letter grading program. As a New York City resident, I typically decide whether to patronize a food establishment based on the letter grade displayed in its window. Ninety-five percent of the time, I choose an establishment with an ‘A’ letter grade.
This preference rests on the assumption that an ‘A’ letter grade means the establishment had no violations that would turn me away. Conversely, I assume a ‘C’ letter grade means the conditions are less than desirable.
New York City is known for its restaurants. Many people from around the globe visit and look forward to sampling its food. Anyone who patronizes New York City food establishments would benefit from understanding the grading system so that they can make informed decisions based on the letter grade.
New York City’s inspection and letter grading program data set contains a listing of active restaurants and college cafeterias that have been inspected. The program started on July 27, 2010. The program allows for dual inspections where restaurants are allowed to improve before receiving a letter grade.
An initial inspection is followed by a re-inspection no less than 7 days later for restaurants that don’t receive an ‘A’ grade on their initial inspection. A score below 14 results in a letter grade of ‘A’ at either the initial inspection or the re-inspection; a re-inspection score of 14-27 results in a ‘B’, and a re-inspection score greater than 27 results in a ‘C’.
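The score thresholds described above can be sketched as a small helper. This is only an illustration: the function name `score_to_grade` and the sample scores are assumptions, not part of the original analysis, and the mapping applies to re-inspection scores (at an initial inspection only an ‘A’ is issued).

```r
# Hypothetical helper: map inspection scores to letter grades using the
# thresholds stated in the text (< 14 -> A, 14-27 -> B, > 27 -> C).
score_to_grade <- function(score) {
  cut(score,
      breaks = c(-Inf, 13, 27, Inf),   # intervals (-Inf,13], (13,27], (27,Inf)
      labels = c("A", "B", "C"),
      right = TRUE)
}

# Made-up scores chosen to sit on both sides of each threshold
scores <- c(2, 10, 13, 14, 26, 27, 28, 32)
grades <- score_to_grade(scores)
print(data.frame(score = scores, grade = grades))
```

Because scores are whole numbers, using 13 and 27 as upper bounds is equivalent to "less than 14" and "greater than 27".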
The data were compiled from several New York City Department of Health and Mental Hygiene administrative systems. There are some data quality issues related to data entry or data transfer.
Each row represents a violation, with 18 variables about the inspection. There are 1,000 cases, some of which are incomplete. Cases with missing values were removed, which left 516 cases. Removing miscellaneous grade codes and reopening inspections left 508 cases for the analysis.
The data set contains every sustained or not yet adjudicated violation citation from every full or special program inspection conducted up to three years prior to the most recent inspection for restaurants and college cafeterias.
These data and findings are of interest to restaurant owners and those who are considering opening food establishments in New York City. This information can be used to examine the relationship between the inspection type and the letter grade received.
This is an observational study. The response variable is the grade the food establishment receives. The response variable is categorical. The explanatory variable is the inspection type, whether the inspection is initial or a re-inspection. The inspection type is a categorical variable.
library(jsonlite)
library(stringr)
library(dplyr)
library(tm)
library(qdap)
library(wordcloud)
library(RColorBrewer)
library(descr)
library(plyr)
library(tidyr)
library(ggplot2)
# Read the data
data <- fromJSON("https://data.cityofnewyork.us/resource/xx67-kt59.json")
head(data)
## cuisine_description dba record_date boro
## 1 Bakery MORRIS PARK BAKE SHOP 2016-12-14T06:01:41 BRONX
## 2 Bakery MORRIS PARK BAKE SHOP 2016-12-14T06:01:41 BRONX
## 3 Bakery MORRIS PARK BAKE SHOP 2016-12-14T06:01:41 BRONX
## 4 Bakery MORRIS PARK BAKE SHOP 2016-12-14T06:01:41 BRONX
## 5 Bakery MORRIS PARK BAKE SHOP 2016-12-14T06:01:41 BRONX
## 6 Bakery MORRIS PARK BAKE SHOP 2016-12-14T06:01:41 BRONX
## inspection_date building zipcode score phone street
## 1 2016-02-18T00:00:00 1007 10462 10 7188924968 MORRIS PARK AVE
## 2 2016-02-18T00:00:00 1007 10462 10 7188924968 MORRIS PARK AVE
## 3 2015-02-09T00:00:00 1007 10462 6 7188924968 MORRIS PARK AVE
## 4 2014-03-03T00:00:00 1007 10462 2 7188924968 MORRIS PARK AVE
## 5 2013-10-10T00:00:00 1007 10462 <NA> 7188924968 MORRIS PARK AVE
## 6 2013-09-11T00:00:00 1007 10462 6 7188924968 MORRIS PARK AVE
## grade critical_flag camis
## 1 A Critical 30075445
## 2 A Not Critical 30075445
## 3 A Critical 30075445
## 4 A Not Critical 30075445
## 5 <NA> Not Applicable 30075445
## 6 A Critical 30075445
## action
## 1 Violations were cited in the following area(s).
## 2 Violations were cited in the following area(s).
## 3 Violations were cited in the following area(s).
## 4 Violations were cited in the following area(s).
## 5 No violations were recorded at the time of this inspection.
## 6 Violations were cited in the following area(s).
## violation_code
## 1 04L
## 2 08A
## 3 06C
## 4 10F
## 5 <NA>
## 6 04L
## violation_description
## 1 Evidence of mice or live mice present in facility's food and/or non-food areas.
## 2 Facility not vermin proof. Harborage or conditions conducive to attracting vermin to the premises and/or allowing vermin to exist.
## 3 Food not protected from potential source of contamination during storage, preparation, transportation, display or service.
## 4 Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit.
## 5 <NA>
## 6 Evidence of mice or live mice present in facility's food and/or non-food areas.
## grade_date inspection_type
## 1 2016-02-18T00:00:00 Cycle Inspection / Initial Inspection
## 2 2016-02-18T00:00:00 Cycle Inspection / Initial Inspection
## 3 2015-02-09T00:00:00 Cycle Inspection / Initial Inspection
## 4 2014-03-03T00:00:00 Cycle Inspection / Initial Inspection
## 5 <NA> Trans Fat / Second Compliance Inspection
## 6 2013-09-11T00:00:00 Cycle Inspection / Re-inspection
# Explore the data
class(data) # explore class
## [1] "data.frame"
dim(data) # view dimension
## [1] 1000 18
names(data) # view variable names
## [1] "cuisine_description" "dba"
## [3] "record_date" "boro"
## [5] "inspection_date" "building"
## [7] "zipcode" "score"
## [9] "phone" "street"
## [11] "grade" "critical_flag"
## [13] "camis" "action"
## [15] "violation_code" "violation_description"
## [17] "grade_date" "inspection_type"
str(data) # understand the structure
## 'data.frame': 1000 obs. of 18 variables:
## $ cuisine_description : chr "Bakery" "Bakery" "Bakery" "Bakery" ...
## $ dba : chr "MORRIS PARK BAKE SHOP" "MORRIS PARK BAKE SHOP" "MORRIS PARK BAKE SHOP" "MORRIS PARK BAKE SHOP" ...
## $ record_date : chr "2016-12-14T06:01:41" "2016-12-14T06:01:41" "2016-12-14T06:01:41" "2016-12-14T06:01:41" ...
## $ boro : chr "BRONX" "BRONX" "BRONX" "BRONX" ...
## $ inspection_date : chr "2016-02-18T00:00:00" "2016-02-18T00:00:00" "2015-02-09T00:00:00" "2014-03-03T00:00:00" ...
## $ building : chr "1007" "1007" "1007" "1007" ...
## $ zipcode : chr "10462" "10462" "10462" "10462" ...
## $ score : chr "10" "10" "6" "2" ...
## $ phone : chr "7188924968" "7188924968" "7188924968" "7188924968" ...
## $ street : chr "MORRIS PARK AVE" "MORRIS PARK AVE" "MORRIS PARK AVE" "MORRIS PARK AVE" ...
## $ grade : chr "A" "A" "A" "A" ...
## $ critical_flag : chr "Critical" "Not Critical" "Critical" "Not Critical" ...
## $ camis : chr "30075445" "30075445" "30075445" "30075445" ...
## $ action : chr "Violations were cited in the following area(s)." "Violations were cited in the following area(s)." "Violations were cited in the following area(s)." "Violations were cited in the following area(s)." ...
## $ violation_code : chr "04L" "08A" "06C" "10F" ...
## $ violation_description: chr "Evidence of mice or live mice present in facility's food and/or non-food areas." "Facility not vermin proof. Harborage or conditions conducive to attracting vermin to the premises and/or allowing vermin to exi"| __truncated__ "Food not protected from potential source of contamination during storage, preparation, transportation, display or service." "Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly m"| __truncated__ ...
## $ grade_date : chr "2016-02-18T00:00:00" "2016-02-18T00:00:00" "2015-02-09T00:00:00" "2014-03-03T00:00:00" ...
## $ inspection_type : chr "Cycle Inspection / Initial Inspection" "Cycle Inspection / Initial Inspection" "Cycle Inspection / Initial Inspection" "Cycle Inspection / Initial Inspection" ...
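The structure above shows that every column, including `score` and the date fields, is read in as character. A minimal sketch of converting them, assuming a toy data frame (`toy`) built from a few values shown in the `head()` output rather than the full data set:

```r
# Toy rows copied from the printed output; 'toy' is a hypothetical name.
toy <- data.frame(
  inspection_date = c("2016-02-18T00:00:00", "2015-02-09T00:00:00", "2013-10-10T00:00:00"),
  score = c("10", "6", NA),
  stringsAsFactors = FALSE
)

# Scores become integers; missing values stay NA
toy$score <- as.integer(toy$score)

# Keep only the YYYY-MM-DD part of the timestamp and parse it as a Date
toy$inspection_date <- as.Date(substr(toy$inspection_date, 1, 10))

str(toy)
```

With numeric scores and real dates, summaries such as `summary(toy$score)` or sorting by inspection date behave as expected.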
sapply(data, function(x) sum(is.na(x))) # count the number of NAs
## cuisine_description dba record_date
## 0 0 0
## boro inspection_date building
## 0 0 0
## zipcode score phone
## 0 66 0
## street grade critical_flag
## 0 470 0
## camis action violation_code
## 0 0 10
## violation_description grade_date inspection_type
## 10 470 0
# Create dataset with needed variables
sudData <- select(data, camis, cuisine_description, dba, boro, zipcode, record_date, inspection_date, score, grade, critical_flag, action, violation_code,
violation_description, inspection_type)
glimpse(sudData)
## Observations: 1,000
## Variables: 14
## $ camis <chr> "30075445", "30075445", "30075445", "300...
## $ cuisine_description <chr> "Bakery", "Bakery", "Bakery", "Bakery", ...
## $ dba <chr> "MORRIS PARK BAKE SHOP", "MORRIS PARK BA...
## $ boro <chr> "BRONX", "BRONX", "BRONX", "BRONX", "BRO...
## $ zipcode <chr> "10462", "10462", "10462", "10462", "104...
## $ record_date <chr> "2016-12-14T06:01:41", "2016-12-14T06:01...
## $ inspection_date <chr> "2016-02-18T00:00:00", "2016-02-18T00:00...
## $ score <chr> "10", "10", "6", "2", NA, "6", "6", "32"...
## $ grade <chr> "A", "A", "A", "A", NA, "A", "A", NA, NA...
## $ critical_flag <chr> "Critical", "Not Critical", "Critical", ...
## $ action <chr> "Violations were cited in the following ...
## $ violation_code <chr> "04L", "08A", "06C", "10F", NA, "04L", "...
## $ violation_description <chr> "Evidence of mice or live mice present i...
## $ inspection_type <chr> "Cycle Inspection / Initial Inspection",...
noNa <- na.omit(sudData)
sapply(noNa, function(x) sum(is.na(x))) # count the number of NAs
## camis cuisine_description dba
## 0 0 0
## boro zipcode record_date
## 0 0 0
## inspection_date score grade
## 0 0 0
## critical_flag action violation_code
## 0 0 0
## violation_description inspection_type
## 0 0
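As a small illustration of what `na.omit()` does here, the sketch below uses a made-up three-row data frame (`toy` is a hypothetical name): any row with a missing value in any column is dropped, which is why an ungraded inspection disappears from the cleaned data.

```r
# Hypothetical toy data: the second row has no score and no grade
toy <- data.frame(
  camis = c("30075445", "30075445", "30075445"),
  score = c("10", NA, "6"),
  grade = c("A", NA, "A"),
  stringsAsFactors = FALSE
)

na_per_column <- colSums(is.na(toy))   # number of NAs in each column
complete <- na.omit(toy)               # keeps only rows with no NA at all

na_per_column
nrow(complete)
```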
head(noNa)
## camis cuisine_description dba boro zipcode
## 1 30075445 Bakery MORRIS PARK BAKE SHOP BRONX 10462
## 2 30075445 Bakery MORRIS PARK BAKE SHOP BRONX 10462
## 3 30075445 Bakery MORRIS PARK BAKE SHOP BRONX 10462
## 4 30075445 Bakery MORRIS PARK BAKE SHOP BRONX 10462
## 6 30075445 Bakery MORRIS PARK BAKE SHOP BRONX 10462
## 7 30075445 Bakery MORRIS PARK BAKE SHOP BRONX 10462
## record_date inspection_date score grade critical_flag
## 1 2016-12-14T06:01:41 2016-02-18T00:00:00 10 A Critical
## 2 2016-12-14T06:01:41 2016-02-18T00:00:00 10 A Not Critical
## 3 2016-12-14T06:01:41 2015-02-09T00:00:00 6 A Critical
## 4 2016-12-14T06:01:41 2014-03-03T00:00:00 2 A Not Critical
## 6 2016-12-14T06:01:41 2013-09-11T00:00:00 6 A Critical
## 7 2016-12-14T06:01:41 2013-09-11T00:00:00 6 A Critical
## action violation_code
## 1 Violations were cited in the following area(s). 04L
## 2 Violations were cited in the following area(s). 08A
## 3 Violations were cited in the following area(s). 06C
## 4 Violations were cited in the following area(s). 10F
## 6 Violations were cited in the following area(s). 04L
## 7 Violations were cited in the following area(s). 04N
## violation_description
## 1 Evidence of mice or live mice present in facility's food and/or non-food areas.
## 2 Facility not vermin proof. Harborage or conditions conducive to attracting vermin to the premises and/or allowing vermin to exist.
## 3 Food not protected from potential source of contamination during storage, preparation, transportation, display or service.
## 4 Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit.
## 6 Evidence of mice or live mice present in facility's food and/or non-food areas.
## 7 Filth flies or food/refuse/sewage-associated (FRSA) flies present in facility's food and/or non-food areas. Filth flies include house flies, little house flies, blow flies, bottle flies and flesh flies. Food/refuse/sewage-associated flies include fruit flies, drain flies and Phorid flies.
## inspection_type
## 1 Cycle Inspection / Initial Inspection
## 2 Cycle Inspection / Initial Inspection
## 3 Cycle Inspection / Initial Inspection
## 4 Cycle Inspection / Initial Inspection
## 6 Cycle Inspection / Re-inspection
## 7 Cycle Inspection / Re-inspection
tail(noNa)
## camis cuisine_description dba boro zipcode
## 989 40364449 German GOTTSCHEER HALL QUEENS 11385
## 990 40364449 German GOTTSCHEER HALL QUEENS 11385
## 991 40364449 German GOTTSCHEER HALL QUEENS 11385
## 992 40364449 German GOTTSCHEER HALL QUEENS 11385
## 995 40364449 German GOTTSCHEER HALL QUEENS 11385
## 996 40364449 German GOTTSCHEER HALL QUEENS 11385
## record_date inspection_date score grade critical_flag
## 989 2016-12-14T06:01:41 2015-10-22T00:00:00 26 B Critical
## 990 2016-12-14T06:01:41 2015-10-22T00:00:00 26 B Critical
## 991 2016-12-14T06:01:41 2015-10-22T00:00:00 26 B Critical
## 992 2016-12-14T06:01:41 2015-10-22T00:00:00 26 B Not Critical
## 995 2016-12-14T06:01:41 2015-04-23T00:00:00 8 A Critical
## 996 2016-12-14T06:01:41 2015-04-23T00:00:00 8 A Not Critical
## action violation_code
## 989 Violations were cited in the following area(s). 04A
## 990 Violations were cited in the following area(s). 04L
## 991 Violations were cited in the following area(s). 06C
## 992 Violations were cited in the following area(s). 08A
## 995 Violations were cited in the following area(s). 06D
## 996 Violations were cited in the following area(s). 10F
## violation_description
## 989 Food Protection Certificate not held by supervisor of food operations.
## 990 Evidence of mice or live mice present in facility's food and/or non-food areas.
## 991 Food not protected from potential source of contamination during storage, preparation, transportation, display or service.
## 992 Facility not vermin proof. Harborage or conditions conducive to attracting vermin to the premises and/or allowing vermin to exist.
## 995 Food contact surface not properly washed, rinsed and sanitized after each use and following any activity when contamination may have occurred.
## 996 Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit.
## inspection_type
## 989 Cycle Inspection / Re-inspection
## 990 Cycle Inspection / Re-inspection
## 991 Cycle Inspection / Re-inspection
## 992 Cycle Inspection / Re-inspection
## 995 Cycle Inspection / Re-inspection
## 996 Cycle Inspection / Re-inspection
noNa <- noNa[noNa$grade != 'Z',] # remove misc grade coding
noNa <- noNa[noNa$inspection_type != 'Cycle Inspection / Reopening Inspection',]
table(noNa$inspection_type)
##
## Cycle Inspection / Initial Inspection
## 267
## Cycle Inspection / Re-inspection
## 241
table(noNa$grade)
##
## A B C
## 438 58 12
noNa$grade2 <- as.integer(factor(noNa$grade, levels=c('C','B','A'))) # convert to numeric
table(noNa$grade2)
##
## 1 2 3
## 12 58 438
# Frequency
counts <- table(noNa$grade)
barplot(counts, col="red", main = "Restaurant Grade Frequency",
xlab = "Grade", ylab = "Frequency")
type <- table(noNa$cuisine_description)
barplot(type, col="red", main = "Cuisine Description Frequency",
xlab = "Cuisine Description", ylab = "Frequency")
summary(noNa)
## camis cuisine_description dba
## Length:508 Length:508 Length:508
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
## boro zipcode record_date
## Length:508 Length:508 Length:508
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
## inspection_date score grade
## Length:508 Length:508 Length:508
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
## critical_flag action violation_code
## Length:508 Length:508 Length:508
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
## violation_description inspection_type grade2
## Length:508 Length:508 Min. :1.000
## Class :character Class :character 1st Qu.:3.000
## Mode :character Mode :character Median :3.000
## Mean :2.839
## 3rd Qu.:3.000
## Max. :3.000
# Analyze violation description
noNa$violation_description <- tolower(noNa$violation_description)
noNa$violation_description <- tm::removeNumbers(noNa$violation_description)
# collapse runs of whitespace into a single space
noNa$violation_description <- str_replace_all(noNa$violation_description, "\\s+", " ")
# replace punctuation with a space
noNa$violation_description <- str_replace_all(noNa$violation_description, pattern = "[[:punct:]]", " ")
# remove SMART stop words
noNa$violation_description <- tm::removeWords(x = noNa$violation_description, stopwords(kind = "SMART"))
crp <- Corpus(VectorSource(noNa$violation_description)) # convert into corpus
freq_terms(text.var = noNa$violation_description, top = 10) # find the 10 most frequent words
## WORD FREQ
## 1 food 598
## 2 contact 295
## 3 surface 292
## 4 properly 291
## 5 improperly 270
## 6 maintained 181
## 7 equipment 176
## 8 vermin 162
## 9 flies 144
## 10 constructed 128
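For readers without qdap, a rough base-R equivalent of the frequency count can be sketched as follows. The three sentences and the tiny stop-word list are made up for illustration; the report itself uses `freq_terms()` with the SMART stop-word list, so counts from this sketch are not comparable to the table above.

```r
# Hypothetical mini corpus loosely based on the violation descriptions
docs <- c("evidence of mice or live mice present in facility",
          "facility not vermin proof",
          "food not protected from contamination")

# Hypothetical mini stop-word list (the report uses tm's SMART list)
stop_words <- c("of", "or", "in", "not", "from")

# Split on anything that is not a letter, then drop empties and stop words
tokens <- unlist(strsplit(tolower(docs), "[^a-z]+"))
tokens <- tokens[nchar(tokens) > 0 & !(tokens %in% stop_words)]

# Tabulate and sort by frequency
word_freq <- sort(table(tokens), decreasing = TRUE)
head(word_freq, 3)
```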
pal = brewer.pal(9,"Blues")
wordcloud(words = noNa$violation_description, scale=c(5,0.5), max.words=100, random.order=FALSE, colors = brewer.pal(8, "Dark2")) # violation description
wordcloud(words = noNa$cuisine_description, max.words=100, random.order=FALSE, colors = brewer.pal(8, "Dark2")) # cuisine description
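Since the data are not random and the response variable is categorical with more than two levels, the letter grade was converted to a numeric scale (C = 1, B = 2, A = 3) and a two-sample t-test is used to compare the mean grade between the two inspection types.
\[H_0 : \mu_{\text{initial}} - \mu_{\text{re-inspection}} = 0\] Inspection type does not have any effect on the grade a restaurant receives. \[H_a : \mu_{\text{initial}} - \mu_{\text{re-inspection}} \neq 0\] Inspection type does have an effect on the grade a restaurant receives.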
boxplot(noNa$grade2 ~ noNa$inspection_type,
col=(c("gold","red")),
main = "Letter Grade by Inspection Type",
xlab = "Inspection Type", ylab = "Letter Grade")
t.test(noNa$grade2 ~ noNa$inspection_type) # grade vs inspection type
##
## Welch Two Sample t-test
##
## data: noNa$grade2 by noNa$inspection_type
## t = 9.2595, df = 240, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.2678631 0.4126348
## sample estimates:
## mean in group Cycle Inspection / Initial Inspection
## 3.000000
## mean in group Cycle Inspection / Re-inspection
## 2.659751
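The test output above can be reproduced from the reported counts alone. This is a sketch under one assumption, stated in the comments: because the initial-inspection mean is exactly 3, all 58 ‘B’ and 12 ‘C’ grades must belong to the re-inspection group. The vector names `initial` and `reinsp` are illustrative only.

```r
# Rebuild the numeric grades (A = 3, B = 2, C = 1) from the frequency tables.
# Assumption: every non-A grade is in the re-inspection group, which the
# initial-inspection mean of exactly 3 implies.
initial <- rep(3, 267)                               # 267 initial inspections, all 'A'
reinsp  <- rep(c(3, 2, 1), times = c(171, 58, 12))   # 241 re-inspections: A, B, C

# t.test() runs the Welch (unequal variance) test by default
res <- t.test(initial, reinsp)

res$statistic   # t statistic
res$parameter   # degrees of freedom
res$conf.int    # 95% confidence interval for the difference in means
```

Since the initial group has zero variance, the Welch degrees of freedom reduce to the re-inspection group size minus one, which matches the df of 240 in the output.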
Based on the results of the t-test, there is a statistically significant difference in mean letter grade between the two inspection types at the 5% significance level: the p-value is much smaller than .05 (p < 2.2e-16). The null hypothesis is rejected. The 95% confidence interval for the difference in mean grade (initial minus re-inspection) is .27 to .41.
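This is consistent with the box plot above, which shows that every graded initial inspection in the cleaned data has a letter grade of ‘A’. This follows from the program design: an establishment that does not earn an ‘A’ at the initial inspection is re-inspected before a letter grade is issued, so its initial inspection most likely has no grade and was dropped with the other missing values. Establishments that receive an ‘A’ during the initial inspection are placed on a 12-month inspection schedule.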
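The same pattern can be viewed as a cross-tabulation, which keeps the grade categorical instead of treating it as a number. The sketch below is not part of the original analysis: the counts are reconstructed from the frequency tables and the group means reported earlier (assuming all ‘B’ and ‘C’ grades fall in the re-inspection group), and a chi-squared test of independence is swapped in as a cross-check on the t-test.

```r
# Grade counts by inspection type, reconstructed from the reported tables.
# Assumption: all B and C grades belong to re-inspections.
grade_by_type <- matrix(c(267,  0,  0,
                          171, 58, 12),
                        nrow = 2, byrow = TRUE,
                        dimnames = list(inspection_type = c("Initial", "Re-inspection"),
                                        grade = c("A", "B", "C")))
grade_by_type

# Chi-squared test of independence between inspection type and grade
chi <- chisq.test(grade_by_type)
chi$expected    # expected counts under independence
chi$p.value
```

A small p-value here points the same way as the t-test: grade and inspection type are not independent in this data set.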
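Because the reported p-value is so small, its exact value is not shown. R prints any p-value below machine epsilon (.Machine$double.eps, about 2.2e-16) as ‘< 2.2e-16’; this is a display threshold, not the smallest value that can be stored in the system.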
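A short sketch of where the 2.2e-16 figure comes from. The value `tiny_p` is a made-up p-value used only to show the formatting behaviour; it is not the p-value of the test above.

```r
# Machine epsilon: the gap between 1 and the next larger double
eps <- .Machine$double.eps
eps

# Doubles far smaller than epsilon can still be stored
smallest_normal <- .Machine$double.xmin
smallest_normal

# A hypothetical p-value well below epsilon is stored exactly enough,
# but format.pval() reports it only as being below the threshold
tiny_p <- 1e-20
shown <- format.pval(tiny_p)
shown
```

If the exact value matters, it can be read from the test object directly (for example the `p.value` element of the list returned by `t.test()`), rather than from the printed summary.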