---
title: 'Data 606: Data Project'
author: "Bharani Nittala"
date: "`r Sys.Date()`"
always_allow_html: true
output:
  openintro::lab_report: default
  pdf_document: default
  html_document:
    includes:
      in_header: header.html
    css: ./lab.css
    highlight: pygments
    theme: cerulean
    toc: yes
    toc_float: yes
  word_document:
    toc: yes
editor_options:
  chunk_output_type: console
---

### Overview

The NYC Department of Health and Mental Hygiene (DOHMH) conducts unannounced inspections of restaurants at least once a year to check that they comply with food safety rules. Now, more than ever, these inspection results are important for evaluating the food safety measures taken by restaurants. COVID-19 forced many NYC restaurants to close starting in March 2020. Restaurants have recently started to reopen, which raises several questions: Are DOHMH inspections more frequent now? Do inspections happen more often in areas with high COVID-19 case counts? Is there a relation between inspection ratings and COVID-19 cases?  


### Data Preparation

Step 1 is to load the libraries required to extract and process the data (install any missing ones first with `install.packages()`).

```{r libraries, message=FALSE, warning=FALSE}
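knitr::opts_chunk$set(eval = TRUE, results = FALSE)  # run every chunk, hide printed text output
library(tidyverse)   # dplyr verbs and ggplot2
library(RSocrata)    # read.socrata() for NYC Open Data
library(RCurl)       # getURL() for the GitHub CSV
library(rmarkdown)   # paged_table() previews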
```


Step 2 is to extract the data from the two sources.

The raw restaurant inspection data from NYC Open Data looks like this:
```{r , message=FALSE,results="asis"}
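# NYC Open Data dataset page for the DOHMH restaurant inspection results
url <- "https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/43nn-pn8j"
nyrestaurant <- read.socrata(url)
paged_table(nyrestaurant, options = list(rows.print = 5))  # preview the first rows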
```

Health inspection data contains 26 columns and 400,120 rows. Each row is a citation for a restaurant recorded after inspection. Now, let's load the latest COVID spread data as well.  
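Because each row is a citation rather than an inspection, an inspection with several violations appears several times, so a plain `mean(score)` over rows gives it more weight. The sketch below, on a small hypothetical data frame, shows one way to average over distinct inspections instead. It assumes the restaurant identifier column `camis` from the DOHMH data set (the pipeline in this report does not select it) and assumes the score is repeated on every citation row of the same inspection.

```{r , eval=FALSE}
# Hypothetical citation-level rows: one row per violation, with the
# inspection score repeated on every row of the same inspection
citations <- data.frame(
  camis           = c(1, 1, 1, 2),
  inspection_date = as.Date(c("2020-02-01", "2020-02-01", "2020-02-01", "2020-02-03")),
  zipcode         = c("10001", "10001", "10001", "10001"),
  score           = c(30, 30, 30, 10)
)

# Row-level mean: restaurant 1's single inspection is counted three times
row_mean <- mean(citations$score)

# Keep one row per inspection (restaurant + date) before averaging by zip code
inspections <- citations[!duplicated(citations[c("camis", "inspection_date")]), ]
inspection_mean <- aggregate(score ~ zipcode, data = inspections, FUN = mean)

row_mean         # 25
inspection_mean  # zipcode 10001, score 20
```

The two averages differ only because of the repeated rows, so the choice matters most for zip codes where a few inspections produced many citations.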

```{r , message=FALSE,echo= FALSE,results="asis"}
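# Download the zip-code-level (MODZCTA) COVID-19 file from the NYC Health GitHub repository
covid_url <- getURL("https://raw.githubusercontent.com/nychealth/coronavirus-data/master/data-by-modzcta.csv") 
covid_raw <- read.csv(text = covid_url)
paged_table(covid_raw, options = list(rows.print = 5))  # preview the first rows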
```


The COVID-19 data contains 10 columns and 177 rows. Each row holds COVID-19 metrics for one modified ZIP Code Tabulation Area (column `MODIFIED_ZCTA`), which this report treats as a zip code.  

The analysis requires a zip-code-level data set for 2020 that combines the inspection data with the COVID-19 data.

```{r , message=FALSE,results="asis"}
# Keep 2020 inspections with a known zip code and only the columns needed
nyrestaurant_analysis <- nyrestaurant %>%
  filter(substr(inspection_date, 1, 4) == "2020", zipcode != "") %>%
  select(inspection_date, dba, zipcode, boro, cuisine_description, score)

# Average inspection score (lower is better) per zip code and per borough
nyrestaurant_zip <- nyrestaurant_analysis %>%
  group_by(zipcode) %>%
  summarise(scores = mean(score, na.rm = TRUE))
nyrestaurant_boro <- nyrestaurant_analysis %>%
  group_by(boro) %>%
  summarise(scores = mean(score, na.rm = TRUE))

# Zip-code-level COVID case rate and percent positive
# (rename the selected columns, not covid_raw, so the selection is kept)
covid_zip <- covid_raw %>%
  select(MODIFIED_ZCTA, COVID_CASE_RATE, PERCENT_POSITIVE) %>%
  rename(zipcode = MODIFIED_ZCTA)

# Borough-level COVID metrics: unweighted mean across the borough's zip codes
covid_boro <- covid_raw %>%
  select(BOROUGH_GROUP, COVID_CASE_RATE, PERCENT_POSITIVE) %>%
  group_by(BOROUGH_GROUP) %>%
  summarise(COVID_CASE_RATE = mean(COVID_CASE_RATE, na.rm = TRUE),
            PERCENT_POSITIVE = mean(PERCENT_POSITIVE, na.rm = TRUE)) %>%
  rename(boro = BOROUGH_GROUP)

# merge() is an inner join by default (all = FALSE), which removes
# zip codes and boroughs where COVID data is not available
master_data_zip <- merge(x = nyrestaurant_zip, y = covid_zip, by = "zipcode")
master_data_boro <- merge(x = nyrestaurant_boro, y = covid_boro, by = "boro")
```
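`merge()` performs an inner join by default (`all = FALSE`), which is what drops zip codes without COVID data. For a single key column it also matches a character `zipcode` (as in the inspection data) against an integer zip code (as `read.csv` typically parses `MODIFIED_ZCTA`), because `match()` coerces both to character. The toy example below uses hypothetical values to make both behaviours visible, and shows `all.x = TRUE` as one way to list the zip codes that were dropped.

```{r , eval=FALSE}
# Hypothetical zip-level inputs mirroring the report's column names
restaurants <- data.frame(zipcode = c("10001", "10002", "99999"),
                          scores  = c(12.5, 15.0, 20.0))
covid <- data.frame(zipcode         = c(10001L, 10002L, 10003L),
                    COVID_CASE_RATE = c(9000, 11000, 8000))

# Default inner join: only zip codes found in both tables survive,
# even though the key is character in one table and integer in the other
inner <- merge(restaurants, covid, by = "zipcode")

# Left join: keeps every restaurant zip code; missing COVID data becomes NA
left <- merge(restaurants, covid, by = "zipcode", all.x = TRUE)
dropped <- left$zipcode[is.na(left$COVID_CASE_RATE)]

inner    # rows for 10001 and 10002
dropped  # "99999"
```

Checking `dropped` on the real data is a quick way to confirm that the 175 merged rows lose only zip codes that genuinely lack COVID data.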


### Research question
Is the COVID **case rate** (rate of confirmed cases per 100,000 people, by zip code of residence) predictive of the average health inspection [grading score](https://a816-health.nyc.gov/ABCEatsRestaurants/#/faq) of NYC restaurants in that zip code?
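The grading score counts violation points, so a lower score is a better result. As a rough guide, assuming the published DOHMH thresholds (0 to 13 points for an A, 14 to 27 for a B, 28 or more for a C), the hypothetical helper `score_to_grade` below maps scores to letter grades. It ignores inspection type and re-inspection rules, which also affect the official grade.

```{r , eval=FALSE}
# Hypothetical helper: map inspection scores (violation points) to letter
# grades using the A: 0-13, B: 14-27, C: 28+ thresholds.
# Ignores inspection type and re-inspection rules.
score_to_grade <- function(score) {
  cut(score,
      breaks = c(-Inf, 13, 27, Inf),  # right-closed: (-Inf,13], (13,27], (27,Inf]
      labels = c("A", "B", "C"))
}

score_to_grade(c(0, 13, 14, 27, 28, NA))
# returns the factor A, A, B, B, C, NA
```

Applied to a zip-level mean such as 13.5 it returns B, so grades derived from averaged scores should be read loosely.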
 

### Cases
Each case represents an NYC zip code, with the inspection score averaged over the 2020 inspection records of the restaurants in it. There are *175* observations in the merged data set.


### Data collection
*DOHMH New York City Restaurant Inspection Results Health*
The data is collected by the Department of Health and Mental Hygiene (DOHMH), which maintains a repository of every sustained or not yet adjudicated violation citation from every full or special program inspection conducted up to three years prior to the most recent inspection, for restaurants and college cafeterias in an active status.

*NYC Coronavirus Disease 2019 (COVID-19) Data*
The Health Department has collected data about people who have tested positive for COVID-19 in NYC.

### Type of Study
This is an observational study.

### Data Source
*DOHMH New York City Restaurant Inspection Results Health*
This dataset and the information on the Health Department’s Restaurant Grading website come from the same data source. The Health Department’s Restaurant Grading website is here:
http://www1.nyc.gov/site/doh/services/restaurant-grades.page

The data is extracted using the R package RSocrata.

*NYC Coronavirus Disease 2019 (COVID-19) Data*
The NYC Health Department publishes daily counts of NYC residents who tested positive for SARS-CoV-2 and who were hospitalized with COVID-19, as well as deaths among COVID-19 patients, at https://www1.nyc.gov/site/doh/covid/covid-19-data.page, with the underlying files in their GitHub repository https://github.com/nychealth/coronavirus-data. This report uses the zip-code-level file `data-by-modzcta.csv` from that repository.


### Response
The response variable is the average restaurant inspection score per zip code, which is numerical.


### Explanatory
The explanatory variable is the COVID-19 case rate per zip code, which is numerical.
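With a numerical response and a numerical explanatory variable, one natural way to test whether case rate is predictive of score is a simple linear regression, `scores ~ COVID_CASE_RATE`. The sketch below wraps that fit in a hypothetical helper `fit_score_model` and demonstrates it on simulated data (not the project's data), reading off the slope, its p-value and R².

```{r , eval=FALSE}
# Hypothetical helper: regress the zip-level mean inspection score on the
# COVID case rate and return the quantities used to judge predictiveness
fit_score_model <- function(df) {
  model <- lm(scores ~ COVID_CASE_RATE, data = df)
  coefs <- summary(model)$coefficients
  list(
    slope     = coefs["COVID_CASE_RATE", "Estimate"],
    p_value   = coefs["COVID_CASE_RATE", "Pr(>|t|)"],
    r_squared = summary(model)$r.squared,
    model     = model
  )
}

# Simulated stand-in for master_data_zip: scores generated independently of case rate
set.seed(606)
simulated <- data.frame(COVID_CASE_RATE = runif(175, 4000, 15000))
simulated$scores <- rnorm(175, mean = 15, sd = 3)

fit <- fit_score_model(simulated)
fit$slope
fit$p_value
fit$r_squared
```

On the report's data this would be called as `fit_score_model(master_data_zip)`. A slope near zero with a large p-value and a small R² would be consistent with no linear relation; the usual regression conditions (linearity, independent and roughly constant-variance residuals) should be checked with `plot(fit$model)` before reading the p-value.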


### Relevant summary statistics


At the borough level, boroughs with higher COVID case rates tend to have lower mean inspection scores, that is, better grades (a lower score means fewer violation points and a better grade from the Health Department). This suggests that the Health Department's restaurant inspection scores might be associated with the COVID case rate. One possible explanation, which these data cannot confirm, is that restaurants in high-case areas are inspected more often and are therefore held to a top grade. With only five boroughs, this pattern is suggestive at best.

```{r , message=FALSE,results="asis"}

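ggplot(master_data_boro, aes(x = COVID_CASE_RATE, y = scores)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(title = "NYC restaurants inspection score vs COVID case rate, by borough",
       x = "COVID case rate (per 100,000)",
       y = "Mean inspection score (lower is better)") +
  theme(plot.title = element_text(lineheight = .8, face = "bold"))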


```
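`covid_boro` averages the zip-level case rates with equal weight, so a small zip code counts as much as a large one. A borough's overall case rate is instead the population-weighted mean of its zip-code rates, which equals total cases divided by total population, times 100,000. The sketch below shows the difference on hypothetical numbers; applying it to the real file would need a population column, assumed here to be `POP_DENOMINATOR` in `data-by-modzcta.csv` (check the file's header before relying on it).

```{r , eval=FALSE}
# Hypothetical zip-level rows for one borough: case rate per 100,000 and population
zips <- data.frame(
  BOROUGH_GROUP   = c("Queens", "Queens"),
  COVID_CASE_RATE = c(20000, 5000),
  POP_DENOMINATOR = c(10000, 90000)   # assumed population column
)

# Unweighted mean, as in covid_boro: each zip code counts equally
unweighted <- tapply(zips$COVID_CASE_RATE, zips$BOROUGH_GROUP, mean)

# Population-weighted mean: equals total cases / total population * 100,000
weighted <- sapply(split(zips, zips$BOROUGH_GROUP), function(b)
  weighted.mean(b$COVID_CASE_RATE, w = b$POP_DENOMINATOR))

unweighted  # Queens 12500
weighted    # Queens 6500
```

Here the small, high-rate zip code nearly doubles the unweighted figure, which is why borough-level comparisons are sensitive to how the rates are aggregated.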

At the zip code level, there appears to be no relation between the COVID case rate and Health Department inspection scores. 

```{r , message=FALSE,results="asis"}

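ggplot(master_data_zip, aes(x = COVID_CASE_RATE, y = scores)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(title = "NYC restaurants inspection score vs COVID case rate, by zip code",
       x = "COVID case rate (per 100,000)",
       y = "Mean inspection score (lower is better)") +
  theme(plot.title = element_text(lineheight = .8, face = "bold"))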


```
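The visual impression of "no relation" can be quantified with a Pearson correlation test, which reports the correlation coefficient, a confidence interval, and a p-value for the null hypothesis of zero correlation. The sketch below wraps `cor.test()` in a hypothetical helper `describe_relation` and runs it on simulated data, not the project's data.

```{r , eval=FALSE}
# Hypothetical helper: summarise a Pearson correlation test between x and y
describe_relation <- function(x, y) {
  test <- cor.test(x, y, method = "pearson")
  c(r       = unname(test$estimate),
    ci_low  = test$conf.int[1],
    ci_high = test$conf.int[2],
    p_value = test$p.value)
}

# Simulated stand-in for the zip-level data: scores drawn independently of case rate
set.seed(2020)
case_rate <- runif(175, 4000, 15000)
scores    <- rnorm(175, mean = 15, sd = 3)

describe_relation(case_rate, scores)
```

On the report's data this would be `describe_relation(master_data_zip$COVID_CASE_RATE, master_data_zip$scores)`; a confidence interval that straddles zero supports the reading of no linear relation.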


At the zip code level, there also appears to be no relation between COVID-19 percent positivity (`PERCENT_POSITIVE`) and Health Department inspection scores. 
```{r , message=FALSE,results="asis"}

ggplot(master_data_zip, aes(x = PERCENT_POSITIVE, y = scores)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(title = "NYC restaurants inspection score vs COVID percent positive, by zip code",
       x = "COVID percent positive",
       y = "Mean inspection score (lower is better)") +
  theme(plot.title = element_text(lineheight = .8, face = "bold"))


```
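Pearson correlation and the `lm` smoother both look for a straight-line pattern and can be pulled around by a few extreme zip codes. A Spearman rank correlation, sketched below with `cor.test(..., method = "spearman")`, checks instead for any monotonic relation and is less sensitive to outliers, so it is a reasonable robustness check for `PERCENT_POSITIVE` as well as for the case rate. The data are simulated; `exact = FALSE` requests the normal approximation for the p-value, which also avoids the warning about ties.

```{r , eval=FALSE}
# Simulated stand-in for zip-level data: a monotonic but non-linear pattern
# plus one extreme outlier (not the project's data)
set.seed(42)
percent_positive <- runif(50, 5, 25)
scores <- 10 + log(percent_positive) + rnorm(50, sd = 0.2)
scores[1] <- 60   # a single extreme zip code

pearson  <- cor(percent_positive, scores, method = "pearson")
spearman <- cor.test(percent_positive, scores, method = "spearman", exact = FALSE)

pearson             # distorted by the single outlier
spearman$estimate   # rho, computed on ranks
spearman$p.value
```

If the Pearson and Spearman results on the real data disagree noticeably, a few unusual zip codes are likely driving the fitted line.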
