Proposal

Through this project, I intend to analyze restaurants in NYC to identify how many of them receive good grades and which violations are most common. I also want to check whether there is any correlation between the quality of these restaurants and the real estate valuations of their location or neighborhood.

Datasets

To investigate this, I intend to use the two datasets below:
1. Restaurant inspection results, including violations: https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/43nn-pn8j
2. Neighborhood property sales: https://data.cityofnewyork.us/City-Government/DOF-Summary-of-Neighborhood-Sales-by-Neighborhood-/5ebm-myj7

Relevance

In the current circumstances, where there is a public health emergency, restaurants play a key role in protecting public health by complying with the health code and adhering to the standards set by the City. I want to review whether these violations are in any way related to neighborhood factors, i.e., whether being in a good neighborhood with high property valuations guarantees access to better-maintained restaurants.

propertySales <- read.csv("data/Neighborhood_sales.csv",sep = ';')
str(propertySales)
## 'data.frame':    5381 obs. of  9 variables:
##  $ ï..BOROUGH        : Factor w/ 5 levels "BRONX","BROOKLYN",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ NEIGHBORHOOD      : Factor w/ 250 levels "","AIRPORT LA GUARDIA",..: 3 3 3 43 43 43 47 73 73 73 ...
##  $ TYPE.OF.HOME      : Factor w/ 6 levels "01  ONE FAMILY HOMES",..: 2 4 6 2 4 6 2 2 4 6 ...
##  $ NUMBER.OF.SALES   : int  1 1 1 2 2 1 1 2 2 1 ...
##  $ LOWEST.SALE.PRICE : Factor w/ 1456 levels "1,000,000","1,020,000",..: 1216 31 1423 1148 30 1228 746 713 1096 720 ...
##  $ AVERAGE.SALE.PRICE: Factor w/ 5069 levels "1,000,000","1,000,600",..: 3506 344 4855 819 777 3560 1035 2810 802 1001 ...
##  $ MEDIAN.SALE.PRICE : Factor w/ 2231 levels "1,000,000","1,002,000",..: 1570 139 2131 404 376 1597 576 1342 389 548 ...
##  $ HIGHEST.SALE.PRICE: Factor w/ 1870 levels "1,000,000","1,004,493",..: 1179 158 1741 1017 718 1206 716 1592 1008 667 ...
##  $ YEAR              : int  2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
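
The `str()` output above shows two things that would get in the way of the analysis: the first column name is prefixed with `ï..`, which is most likely a UTF-8 byte-order mark read with the wrong encoding, and the price columns are factors with thousands separators rather than numbers. The following is a minimal sketch of one way to handle both, not necessarily the approach the project will take. The temporary file and its two rows are made up for illustration, and `parse_price` is a hypothetical helper name.

```r
# Write a tiny semicolon-separated file that starts with a UTF-8 BOM,
# mimicking the likely cause of the "ï..BOROUGH" column name (made-up rows).
tmp <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
body <- charToRaw("BOROUGH;AVERAGE.SALE.PRICE\nMANHATTAN;1,000,000\nBRONX;250,500\n")
writeBin(c(bom, body), tmp)

# fileEncoding = "UTF-8-BOM" tells R to drop the byte-order mark while reading.
sales <- read.csv(tmp, sep = ";", fileEncoding = "UTF-8-BOM",
                  stringsAsFactors = FALSE)

# Hypothetical helper: strip thousands separators and convert to numeric.
# as.character() comes first so that a factor yields its labels,
# not its internal integer codes.
parse_price <- function(x) as.numeric(gsub(",", "", as.character(x)))

sales$AVERAGE.SALE.PRICE <- parse_price(sales$AVERAGE.SALE.PRICE)
str(sales)
```

The same helper could be applied to the `LOWEST`, `MEDIAN`, and `HIGHEST` sale price columns, assuming they use the same comma-separated format.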
restaurantRatings <- read.csv("data/Restaurant_ratings.csv",sep = ';')
str(restaurantRatings)
## 'data.frame':    402061 obs. of  26 variables:
##  $ ï..CAMIS             : int  50038175 50001911 50064420 50044438 50087867 50084744 41717445 50074075 50053935 41020286 ...
##  $ DBA                  : Factor w/ 21629 levels "","'CESCA","'ESSEN",..: 11196 10455 1408 18401 5150 12949 8631 10402 2879 19363 ...
##  $ BORO                 : Factor w/ 6 levels "0","Bronx","Brooklyn",..: 4 3 6 4 5 4 3 6 5 4 ...
##  $ BUILDING             : Factor w/ 7309 levels "",".1-A","0",..: 5133 2265 2488 3984 5498 6412 6516 5222 6020 4130 ...
##  $ STREET               : Factor w/ 3240 levels "- JFK AIRPORT",..: 1129 604 1805 1190 2608 748 578 2336 442 3000 ...
##  $ ZIPCODE              : Factor w/ 228 levels "","10000","10001",..: 10 145 90 5 197 5 154 86 197 36 ...
##  $ PHONE                : Factor w/ 25564 levels "","__________",..: 6601 13003 22615 2662 14599 59 18280 25371 21829 820 ...
##  $ CUISINE.DESCRIPTION  : Factor w/ 84 levels "Afghan","African",..: 27 22 7 47 20 27 49 17 7 3 ...
##  $ INSPECTION.DATE      : Factor w/ 1343 levels "01/01/1900","01/02/2018",..: 696 365 1036 1238 1115 1166 905 1170 505 1299 ...
##  $ ACTION               : Factor w/ 6 levels "","Establishment Closed by DOHMH.  Violations were cited in the following area(s) and those requiring immediate ac"| __truncated__,..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ VIOLATION.CODE       : Factor w/ 105 levels "","02A","02B",..: 3 43 61 32 51 51 30 31 30 32 ...
##  $ VIOLATION.DESCRIPTION: Factor w/ 90 levels "","\"\"No Smokingâ\200\235 and/or 'Smoking Permittedâ\200\235 sign not conspicuously posted. Health warning not pr"| __truncated__,..: 45 34 6 24 22 22 20 51 20 24 ...
##  $ CRITICAL.FLAG        : Factor w/ 3 levels "","N","Y": 3 3 2 3 2 2 3 3 3 3 ...
##  $ SCORE                : int  11 10 30 16 18 40 8 17 17 58 ...
##  $ GRADE                : Factor w/ 8 levels "","A","B","C",..: 2 2 4 1 1 1 2 3 1 1 ...
##  $ GRADE.DATE           : Factor w/ 1278 levels "","01/02/2018",..: 665 350 992 1 1 1 868 1118 1 1 ...
##  $ RECORD.DATE          : Factor w/ 1 level "03/22/2020": 1 1 1 1 1 1 1 1 1 1 ...
##  $ INSPECTION.TYPE      : Factor w/ 32 levels "","Administrative Miscellaneous / Compliance Inspection",..: 12 11 12 11 11 11 12 12 11 11 ...
##  $ Latitude             : num  40.7 40.6 40.6 40.7 40.7 ...
##  $ Longitude            : num  -74 -74 -74.1 -74 -73.9 ...
##  $ Community.Board      : int  106 311 502 103 405 102 315 501 405 104 ...
##  $ Council.District     : int  4 43 50 2 34 2 47 49 30 3 ...
##  $ Census.Tract         : int  4400 28200 11202 3800 55700 6100 39400 24700 63700 12700 ...
##  $ BIN                  : int  1082859 3168791 5053471 1083485 4086738 1009110 3192156 5025023 4092500 1025043 ...
##  $ BBL                  : num  1.01e+09 3.06e+09 5.04e+09 1.00e+09 4.04e+09 ...
##  $ NTA                  : Factor w/ 194 levels "","BK09","BK17",..: 117 9 192 104 132 105 7 187 131 99 ...
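
To make the planned analysis concrete, here is a rough sketch of how the grade shares, the most common violations, and a borough-level comparison with property prices might be computed in base R. It rests on several assumptions: the rows in `inspections` and `borough_prices` are made up, only the letter grades A, B, and C are treated as grades, each restaurant is represented by its most recent graded inspection, and borough names in the two files are assumed to match once upper-cased.

```r
# Toy rows standing in for restaurantRatings; every value here is made up.
inspections <- data.frame(
  CAMIS = c(1, 1, 2, 2, 3, 4, 5),
  BORO = c("Bronx", "Bronx", "Brooklyn", "Brooklyn", "Bronx", "Queens", "Queens"),
  INSPECTION.DATE = c("01/15/2019", "06/20/2019", "03/02/2019", "03/02/2019",
                      "11/30/2018", "02/10/2020", "05/05/2019"),
  GRADE = c("B", "A", "A", "A", "C", "", "A"),
  VIOLATION.CODE = c("04L", "10F", "10F", "08A", "04L", "10F", "06D"),
  stringsAsFactors = FALSE
)
inspections$INSPECTION.DATE <- as.Date(inspections$INSPECTION.DATE,
                                       format = "%m/%d/%Y")

# Most common violation codes across all inspection rows.
top_violations <- sort(table(inspections$VIOLATION.CODE), decreasing = TRUE)

# The data has one row per violation, so keep letter-graded rows and then
# one row per restaurant (CAMIS): its most recent graded inspection.
graded <- inspections[inspections$GRADE %in% c("A", "B", "C"), ]
graded <- graded[order(graded$CAMIS, graded$INSPECTION.DATE, decreasing = TRUE), ]
latest <- graded[!duplicated(graded$CAMIS), ]

# Share of restaurants holding each grade.
grade_share <- prop.table(table(latest$GRADE))

# Share of A grades per borough, keyed in upper case to match the sales file.
a_share <- aggregate(list(share_A = latest$GRADE == "A"),
                     by = list(BOROUGH = toupper(latest$BORO)), FUN = mean)

# Hypothetical borough-level price summary (made-up numbers).
borough_prices <- data.frame(
  BOROUGH = c("BRONX", "BROOKLYN", "QUEENS"),
  median_price = c(400000, 800000, 600000)
)

# Join the two summaries and compute a rank correlation.
combined <- merge(a_share, borough_prices, by = "BOROUGH")
rho <- cor(combined$share_A, combined$median_price, method = "spearman")

print(top_violations)
print(grade_share)
print(combined)
```

With only five boroughs, a correlation at this level would be very coarse. A finer join on neighborhood would need a mapping between the sales file's `NEIGHBORHOOD` names and the inspection file's `NTA` or `ZIPCODE` values, which is not worked out here.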

Technologies

I plan to use R packages such as ggplot2, plotly, and Shiny (if time permits) to provide an interactive UI that allows the user to explore interesting attributes and their relationships.