Week 7 Assignment - NYC Restaurants

Angie Chen

December 2, 2016


Introduction

The data for Assignment 7 is the NYC restaurant inspection data published on NYC Open Data, imported below from its Socrata API endpoint. The data set provides information on NYC restaurants, including inspections, violations, grades, and adjudication data. We will use it to apply this week's tutorial material on writing functions.

Load Packages and Import Data

library(prettydoc) # document themes for R Markdown

library(RSocrata) # import data

library(tibble) # create tibbles

library(dplyr) # manipulate data

library(magrittr) # piping 

library(ggplot2) # data visualization

NYCdf <- read.socrata("https://data.cityofnewyork.us/resource/9w7m-hzhe.json")

Identify the Class of each Variable

# Create tibble of variable names and class information

NYCClass <- tibble(names(NYCdf), as.character(Map(class, NYCdf)))

# View column classes

NYCClass
## # A tibble: 18 × 2
##           `names(NYCdf)` `as.character(Map(class, NYCdf))`
##                    <chr>                             <chr>
## 1                 action                         character
## 2                   boro                         character
## 3               building                         character
## 4                  camis                         character
## 5          critical_flag                         character
## 6    cuisine_description                         character
## 7                    dba                         character
## 8                  grade                         character
## 9             grade_date            c("POSIXct", "POSIXt")
## 10       inspection_date            c("POSIXct", "POSIXt")
## 11       inspection_type                         character
## 12                 phone                         character
## 13           record_date            c("POSIXct", "POSIXt")
## 14                 score                         character
## 15                street                         character
## 16        violation_code                         character
## 17 violation_description                         character
## 18               zipcode                         character
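The class column shows c("POSIXct", "POSIXt") for the date-time fields. This happens because class() returns two values for a POSIXct vector, and as.character() deparses that pair into one string. The sketch below builds a summary that keeps only the first, most specific class of each column. It runs on a hypothetical two-column data frame, and class_summary and toy are illustrative names, not part of the assignment data.

```r
# Hypothetical two-column stand-in for NYCdf
toy <- data.frame(
  dba = c("A", "B"),
  inspection_date = as.POSIXct(c("2016-01-05 09:00:00", "2016-03-10 14:30:00"),
                               tz = "UTC"),
  stringsAsFactors = FALSE
)

# A POSIXct column carries two classes; as.character() on the list from Map()
# turns that pair into the single string 'c("POSIXct", "POSIXt")'
class(toy$inspection_date)
as.character(Map(class, toy))

# One row per column, keeping only the first (most specific) class
class_summary <- function(df) {
  data.frame(variable = names(df),
             class = vapply(df, function(x) class(x)[1], character(1),
                            USE.NAMES = FALSE),
             stringsAsFactors = FALSE)
}

class_summary(toy)
```

Naming the columns (variable, class) also avoids the long auto-generated headers seen in the printed tibble.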

Create a Function to Check for the POSIXct Class

Create a function that takes a single argument (“x”) and checks whether it is of class POSIXct. If it is, the function converts the input to a simple Date with as.Date(); if not, it returns the input with its class unchanged. Apply this function to each column of the NYC restaurant data set with Map().

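# Convert POSIXct columns to Date and return every other column unchanged.
# inherits() yields a single TRUE/FALSE, whereas class(x) == "POSIXct" has
# length 2 for date-time columns and is an error inside if() in R >= 4.2.
to_date <- function(x) {
  if (inherits(x, "POSIXct")) as.Date(x) else x
}

NewNYC <- as_tibble(Map(to_date, NYCdf))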

# show the new column classes in tibble
NewClass <- tibble(names(NewNYC), as.character(Map(class, NewNYC)))

NewClass
## # A tibble: 18 × 2
##          `names(NewNYC)` `as.character(Map(class, NewNYC))`
##                    <chr>                              <chr>
## 1                 action                          character
## 2                   boro                          character
## 3               building                          character
## 4                  camis                          character
## 5          critical_flag                          character
## 6    cuisine_description                          character
## 7                    dba                          character
## 8                  grade                          character
## 9             grade_date                               Date
## 10       inspection_date                               Date
## 11       inspection_type                          character
## 12                 phone                          character
## 13           record_date                               Date
## 14                 score                          character
## 15                street                          character
## 16        violation_code                          character
## 17 violation_description                          character
## 18               zipcode                          character
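Before running a conversion on the full download, it helps to try it on a small input. The sketch below applies an inherits()-based POSIXct-to-Date conversion to a hypothetical two-column data frame; toy and to_date are illustrative names.

```r
# Hypothetical stand-in for NYCdf: one character and one POSIXct column
toy <- data.frame(
  dba = c("A", "B"),
  inspection_date = as.POSIXct(c("2016-01-05 09:00:00", "2016-03-10 14:30:00"),
                               tz = "UTC"),
  stringsAsFactors = FALSE
)

# Convert POSIXct to Date; return anything else untouched
to_date <- function(x) {
  if (inherits(x, "POSIXct")) as.Date(x) else x
}

converted <- toy
converted[] <- lapply(converted, to_date)  # [] keeps the data.frame structure

sapply(converted, class)
converted$inspection_date
```

The character column comes back unchanged, and only the date-time column becomes Date.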

Violation Types

Identify how many restaurants in 2016 had a violation regarding mice, hair, or sewage.

## Create a new column containing mice, hair, or sewage violations

NewNYC$vio_type <- ""
NewNYC$vio_type[grep("mice", NewNYC$violation_description)] <- "mice"
NewNYC$vio_type[grep("hair", NewNYC$violation_description)] <- "hair"
NewNYC$vio_type[grep("sewage", NewNYC$violation_description)] <- "sewage"
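The three grep() assignments run in order, so a description that matches more than one pattern keeps only the last label. grep() is also case-sensitive by default. The sketch below makes both effects visible on hypothetical descriptions, not on text taken from the data; label_violations is an illustrative name. Whether either effect changes the real counts depends on how the actual violation descriptions are worded.

```r
# Hypothetical violation descriptions (not taken from the real data)
desc <- c("Evidence of mice present.",
          "Hair restraint not worn.",
          "Sewage backup; mice observed.",
          NA)

label_violations <- function(desc, ignore_case = FALSE) {
  vio_type <- rep("", length(desc))
  # Later assignments overwrite earlier ones; NA descriptions never match
  vio_type[grep("mice", desc, ignore.case = ignore_case)] <- "mice"
  vio_type[grep("hair", desc, ignore.case = ignore_case)] <- "hair"
  vio_type[grep("sewage", desc, ignore.case = ignore_case)] <- "sewage"
  vio_type
}

label_violations(desc)                      # c("mice", "", "mice", "")
label_violations(desc, ignore_case = TRUE)  # c("mice", "hair", "sewage", "")
```

With ignore_case = TRUE, the third description loses its "mice" label to "sewage". Separate logical columns built with grepl() would let a record count under more than one type.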

## Create a new column for just the year of the inspection

NewNYC$year <- as.numeric(format(NewNYC$inspection_date, "%Y"))

## Filter data set for inspection year in 2016 and violation where the
## description mentioned mice, hair, or sewage, and count the number of
## violations of each type

VioCount <- NewNYC %>%
  filter(year == 2016, vio_type != "") %>%
  count(vio_type)

VioCount
## # A tibble: 3 × 2
##   vio_type     n
##      <chr> <int>
## 1     hair  2132
## 2     mice  8283
## 3   sewage 13646
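These counts are violation records, so a restaurant cited for mice three times in 2016 adds three to the mice total. The question asks how many restaurants had such a violation, so counting distinct restaurant IDs (camis) answers it more directly. The base R sketch below runs on hypothetical records; viol, records, and restaurants are illustrative names.

```r
# Hypothetical records: restaurant ID (camis), inspection year, violation label
viol <- data.frame(
  camis    = c("101", "101", "101", "102", "103", "103", "104"),
  year     = c(2016, 2016, 2016, 2016, 2016, 2015, 2016),
  vio_type = c("mice", "mice", "mice", "mice", "hair", "hair", ""),
  stringsAsFactors = FALSE
)

v16 <- subset(viol, year == 2016 & vio_type != "")

# Violation records per type (what count() reports)
records <- table(v16$vio_type)

# Distinct restaurants per type
restaurants <- tapply(v16$camis, v16$vio_type, function(ids) length(unique(ids)))

# Distinct restaurants with at least one of the violation types
any_restaurants <- length(unique(v16$camis))

records
restaurants
any_restaurants
```

Here mice has 4 records but only 2 restaurants. In a dplyr pipeline, the analogous step would group by vio_type and summarise n_distinct(camis).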

Function - Plot Restaurants with Most Violations by Violation Type

Create a function to apply to this tibble that takes a year and a regular expression (e.g. “mice”) and returns a ggplot bar chart of the top 20 restaurants with the most violations. Make sure the restaurants are properly rank-ordered in the bar chart.

As an example, the function is called below with selectyear = 2016 and selectvio = ‘mice’.

viochart <- function(selectyear, selectvio) {
  # Count matching violations per restaurant name (dba); selectvio is used
  # as a regular expression on the violation description
  Restaurants <- NewNYC %>%
    filter(year == selectyear, grepl(selectvio, violation_description)) %>%
    count(dba, sort = TRUE) %>%
    head(20)

  # fill is set outside aes() so the bars are actually red and no legend is drawn
  ggplot(Restaurants, aes(x = reorder(dba, -n), y = n)) +
    geom_bar(stat = "identity", fill = "red") +
    labs(title = "Top 20 Restaurants with Most Violations",
         x = "Restaurant", y = "Number of Violations") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1),
          panel.grid.major = element_blank(),
          panel.grid.minor = element_blank(),
          panel.background = element_blank(),
          axis.line = element_line(colour = "black"))
}

viochart(2016, 'mice')
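The ranking step behind the chart can be checked without plotting. The sketch below runs on hypothetical records. It filters by year and by a regular expression on violation_description, counts records per restaurant name, and keeps the top n. It also shows that reorder(dba, -n) produces factor levels running from most to fewest violations, which is what puts the bars in rank order. top_restaurants is an illustrative name, and the sketch uses base R instead of dplyr so it runs on its own.

```r
# Hypothetical records standing in for NewNYC
toy <- data.frame(
  dba  = c("PIZZA PLACE", "PIZZA PLACE", "PIZZA PLACE", "TACO SPOT", "TACO SPOT",
           "TACO SPOT", "NOODLE BAR", "NOODLE BAR", "NOODLE BAR"),
  year = c(2016, 2016, 2016, 2016, 2016, 2016, 2016, 2015, NA),
  violation_description = c("Evidence of mice", "Live mice present",
                            "Evidence of mice", "Evidence of mice",
                            "Evidence of mice", "Sewage backup",
                            "Evidence of mice", "Evidence of mice",
                            "Evidence of mice"),
  stringsAsFactors = FALSE
)

top_restaurants <- function(df, selectyear, pattern, n = 20) {
  # which() drops rows where the test is NA (e.g. a missing year)
  keep <- which(df$year == selectyear & grepl(pattern, df$violation_description))
  counts <- sort(table(df$dba[keep]), decreasing = TRUE)
  ranked <- data.frame(dba = names(counts), n = as.vector(counts),
                       stringsAsFactors = FALSE)
  head(ranked, n)
}

top <- top_restaurants(toy, 2016, "mice")
top

# Factor levels ordered from most to fewest violations
levels(reorder(top$dba, -top$n))
```

Restaurants tied on n keep an arbitrary relative order, so a tie at the 20th place can change which restaurant makes the cut.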