The data we will be working with in Assignment 7 is the NYC restaurant data, which can be found here. The data set provides information on NYC restaurants, including restaurant inspections, violations, grades, and adjudication data. We will be using this data to apply what we learned from the tutorials for this week, which relates to writing functions.
library(prettydoc) # document themes for R Markdown
library(RSocrata) # import data
library(tibble) # create tibbles
library(dplyr) # manipulate data
library(magrittr) # piping
library(ggplot2) # data visualization
NYCdf <- read.socrata("https://data.cityofnewyork.us/resource/9w7m-hzhe.json")
# Create tibble of variable names and class information
NYCClass <- tibble(names(NYCdf), as.character(Map(class, NYCdf)))
# View column classes
NYCClass
## # A tibble: 18 × 2
## `names(NYCdf)` `as.character(Map(class, NYCdf))`
## <chr> <chr>
## 1 action character
## 2 boro character
## 3 building character
## 4 camis character
## 5 critical_flag character
## 6 cuisine_description character
## 7 dba character
## 8 grade character
## 9 grade_date c("POSIXct", "POSIXt")
## 10 inspection_date c("POSIXct", "POSIXt")
## 11 inspection_type character
## 12 phone character
## 13 record_date c("POSIXct", "POSIXt")
## 14 score character
## 15 street character
## 16 violation_code character
## 17 violation_description character
## 18 zipcode character
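The date columns print as `c("POSIXct", "POSIXt")` because `as.character()` deparses a class vector that has two elements. As a minimal sketch, assuming a small made-up data frame (`toy` is hypothetical and stands in for `NYCdf`), each column's classes can be collapsed into a single readable string:

```r
# Hypothetical toy data frame standing in for NYCdf
toy <- data.frame(
  dba = c("A", "B"),
  inspection_date = as.POSIXct(c("2016-01-05", "2016-02-10"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# One string per column; multiple classes are joined with "/"
col_classes <- vapply(toy, function(x) paste(class(x), collapse = "/"), character(1))
col_classes
```

`vapply()` is used here instead of `Map()` so the result is guaranteed to be a character vector with one entry per column.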
Create a function that takes a single argument (“x”) and checks whether it is of POSIXct class. If it is, have the function convert the input to a simple Date class with as.Date. If not, the function should leave the input's class unchanged. Apply this function to each column of the NYC restaurant data set by using the map function.
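# Create function - if class is POSIXct, change to Date, else leave as is
# inherits() is used because a POSIXct vector has two classes, so
# class(x) == "POSIXct" would give a length-two condition
NewNYC <- as_tibble(
  Map(function(x) if (inherits(x, "POSIXct")) as.Date(x) else x, NYCdf))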
# show the new column classes in tibble
NewClass <- tibble(names(NewNYC), as.character(Map(class, NewNYC)))
NewClass
## # A tibble: 18 × 2
## `names(NewNYC)` `as.character(Map(class, NewNYC))`
## <chr> <chr>
## 1 action character
## 2 boro character
## 3 building character
## 4 camis character
## 5 critical_flag character
## 6 cuisine_description character
## 7 dba character
## 8 grade character
## 9 grade_date Date
## 10 inspection_date Date
## 11 inspection_type character
## 12 phone character
## 13 record_date Date
## 14 score character
## 15 street character
## 16 violation_code character
## 17 violation_description character
## 18 zipcode character
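The exercise asks for a function of a single argument, so the conversion can also be written as a named function and applied column by column. The following is a minimal sketch, not the original solution: `to_date_if_posix` and `toy` are hypothetical names, and `lapply()` stands in for the map step. It relies on `inherits()` because a POSIXct vector carries two classes, which makes a comparison such as `class(x) == "POSIXct"` return two values.

```r
# Hypothetical helper: convert POSIXct vectors to Date, leave everything else untouched
to_date_if_posix <- function(x) {
  if (inherits(x, "POSIXct")) as.Date(x) else x
}

# Hypothetical toy data frame standing in for NYCdf
toy <- data.frame(
  dba = c("A", "B"),
  score = c("12", "7"),
  inspection_date = as.POSIXct(c("2016-01-05", "2016-02-10"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Apply the helper to every column and rebuild the data frame
converted <- as.data.frame(lapply(toy, to_date_if_posix), stringsAsFactors = FALSE)
sapply(converted, class)
```

Only the date-time column changes class; the character columns pass through as they were.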
Identify how many restaurants in 2016 had a violation regarding mice, hair, or sewage.
## Create a new column containing mice, hair, or sewage violations
NewNYC$vio_type <- ""
NewNYC$vio_type[grep("mice", NewNYC$violation_description)] <- "mice"
NewNYC$vio_type[grep("hair", NewNYC$violation_description)] <- "hair"
NewNYC$vio_type[grep("sewage", NewNYC$violation_description)] <- "sewage"
## Create a new column for just the year of the inspection
NewNYC$year <- as.numeric(format(NewNYC$inspection_date, "%Y"))
## Filter data set for inspection year in 2016 and violation where the
## description mentioned mice, hair, or sewage, and count the number of
## violations of each type
VioCount <- NewNYC %>%
  filter(year == 2016, vio_type != '') %>%
  group_by(vio_type) %>%
  count(vio_type)
VioCount
## # A tibble: 3 × 2
## vio_type n
## <chr> <int>
## 1 hair 2132
## 2 mice 8283
## 3 sewage 13646
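The table above counts violation records rather than restaurants. Because the question asks how many restaurants were affected, one possible variant counts distinct restaurants among the matching rows. The sketch below runs on made-up data (`toy` is hypothetical) and assumes that `camis` identifies a restaurant. It also matches case-insensitively, which differs from the case-sensitive `grep()` calls used earlier.

```r
library(dplyr)

# Hypothetical toy records; camis is assumed to identify a restaurant
toy <- data.frame(
  camis = c("1", "1", "2", "3", "3", "4"),
  year = c(2016, 2016, 2016, 2016, 2015, 2016),
  violation_description = c(
    "Evidence of mice or live mice present",
    "Evidence of mice or live mice present",
    "Hair not properly restrained",
    "Sewage disposal system improper",
    "Evidence of mice or live mice present",
    "Food not protected from contamination"
  ),
  stringsAsFactors = FALSE
)

# Keep 2016 rows mentioning any of the three terms, then count distinct restaurants
n_restaurants <- toy %>%
  filter(year == 2016,
         grepl("mice|hair|sewage", violation_description, ignore.case = TRUE)) %>%
  summarise(n = n_distinct(camis)) %>%
  pull(n)
n_restaurants
```

A restaurant with several matching records is counted once here, whereas a row count would include each record separately.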
Create a function to apply to this tibble that takes a year and a regular expression (e.g., “mice”) and returns a ggplot bar chart of the top 20 restaurants with the most violations. Make sure the restaurants are properly rank-ordered in the bar chart.
As an example, the function below is executed with year = 2016 and violation = ‘mice’.
viochart <- function(selectyear, selectvio) {
  # Count violations of the selected type per restaurant and keep the top 20
  Restaurants <- NewNYC %>%
    filter(year == selectyear, vio_type == selectvio) %>%
    group_by(dba) %>%
    count(vio_type, sort = TRUE) %>%
    head(20)
  # Bars are ordered from most to fewest violations; fill is set outside
  # aes() so the bars are drawn in red rather than mapped to a legend key
  ggplot(Restaurants,
         aes(x = reorder(dba, -n), y = n)) +
    geom_bar(stat = "identity", fill = "red") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
    labs(title = "Top 20 Restaurants with Most Violations",
         x = "Restaurant", y = "Number of Violations") +
    theme(legend.position = "none", panel.grid.major = element_blank(),
          panel.grid.minor = element_blank(),
          panel.background = element_blank(),
          axis.line = element_line(colour = "black"))
}
viochart(2016, 'mice')
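`viochart()` compares its second argument with the precomputed `vio_type` column, so it only works for the three labelled terms. As a hedged sketch of the data-preparation step for an arbitrary regular expression, the hypothetical helper `top_violators()` below matches `violation_description` directly. The helper name, its `n_top` parameter, and the `toy` data are assumptions made for illustration.

```r
library(dplyr)

# Hypothetical helper: top restaurants by number of matching violations in a year
top_violators <- function(data, selectyear, pattern, n_top = 20) {
  data %>%
    filter(year == selectyear, grepl(pattern, violation_description)) %>%
    count(dba, sort = TRUE) %>%   # one row per restaurant, largest count first
    head(n_top)
}

# Hypothetical toy records standing in for NewNYC
toy <- data.frame(
  dba = c("CAFE A", "CAFE A", "DINER B", "DINER B", "DINER B", "GRILL C"),
  year = c(2016, 2016, 2016, 2016, 2015, 2016),
  violation_description = c(
    "Evidence of mice", "Evidence of mice", "Evidence of mice",
    "Hair not restrained", "Evidence of mice", "Evidence of rats"
  ),
  stringsAsFactors = FALSE
)

top_mice <- top_violators(toy, 2016, "mice")
top_mice
```

The returned table has a `dba` column and a count column `n`, so it can be passed to `ggplot()` with `reorder(dba, -n)` on the x axis to keep the bars rank-ordered.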