This is my homework report for week 7, produced with R Markdown.
In this homework I work with the NYC restaurant inspection data and focus on writing efficient code by practicing:
Writing functions
Using iteration
For this homework assignment, I used the following packages:
library(RSocrata) # for the Socrata Open Data API (SODA) used to fetch the NYC restaurant data
library(readr) # for reading and saving files in RDS format
library(purrr) # for iteration with map and map2
library(tibble) # for as_tibble
library(stringr) # for pattern matching with str_detect
library(lubridate) # for extracting the year from dates
library(ggplot2) # for creating graphs
library(dplyr) # for performing data transformation and manipulation tasks
library(knitr) # for knitting R code to HTML files
library(magrittr) # for chaining commands with the pipe operator, %>%
nyc_restaurant <- read_rds('nyc_restaurant')
Use the map function to identify the class of each variable.
nyc_restaurant %>% map(class)
## $CAMIS
## [1] "integer"
##
## $DBA
## [1] "character"
##
## $BORO
## [1] "character"
##
## $BUILDING
## [1] "character"
##
## $STREET
## [1] "character"
##
## $ZIPCODE
## [1] "integer"
##
## $PHONE
## [1] "character"
##
## $CUISINE.DESCRIPTION
## [1] "character"
##
## $INSPECTION.DATE
## [1] "POSIXct" "POSIXt"
##
## $ACTION
## [1] "character"
##
## $VIOLATION.CODE
## [1] "character"
##
## $VIOLATION.DESCRIPTION
## [1] "character"
##
## $CRITICAL.FLAG
## [1] "character"
##
## $SCORE
## [1] "integer"
##
## $GRADE
## [1] "character"
##
## $GRADE.DATE
## [1] "POSIXct" "POSIXt"
##
## $RECORD.DATE
## [1] "POSIXct" "POSIXt"
##
## $INSPECTION.TYPE
## [1] "character"
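When only a compact overview is needed, the list output above can be collapsed to one string per column. The following is a minimal sketch on a small made-up tibble (`toy` and its values are hypothetical, not taken from the real data); it keeps just the first class of each column so that `map_chr` can return a named character vector.

```r
library(purrr)
library(tibble)

# Toy data standing in for nyc_restaurant (hypothetical values)
toy <- tibble(
  CAMIS = c(1L, 2L),
  DBA = c("A CAFE", "B DINER"),
  INSPECTION.DATE = as.POSIXct(c("2016-01-05", "2016-02-10"), tz = "UTC")
)

# class() can return several values (e.g. "POSIXct" "POSIXt"),
# so keep only the first one to get exactly one string per column
col_classes <- map_chr(toy, ~ class(.x)[1])
col_classes
```

The trade-off is that secondary classes such as `POSIXt` are dropped, so `map(class)` remains the better choice when the full class vector matters.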
Notice how the date variables are stored as date-times (class POSIXct, which inherits from POSIXt). Create a function that takes a single argument (“x”) and checks whether it is of a POSIXt class. If it is, have the function change the input to a simple Date class with as.Date. If not, the function should keep the input class as is. Apply this function to each of the columns in the NYC restaurant data set by using the map function. Be sure the final output is a tibble and not a list.
# TRUE for any date-time vector: POSIXct and POSIXlt both inherit from "POSIXt"
isPosixt <- function(x){
  inherits(x, "POSIXt")
}
# Convert date-time input to Date; return anything else unchanged.
# Note: as.Date() on a POSIXct value interprets the time in UTC by default.
convertToDate <- function(x){
  if(!isPosixt(x)){
    return(x)
  }
  return(as.Date(x))
}
# map returns a list, so convert the result back to a tibble
nyc_restaurant <- nyc_restaurant %>%
  map(convertToDate) %>%
  as_tibble()
nyc_restaurant
## # A tibble: 436,612 × 18
## CAMIS DBA BORO BUILDING
## <int> <chr> <chr> <chr>
## 1 41606387 LA CUARTA RESTAURANT BROOKLYN 782
## 2 50007091 NEW PEKING RESTAURANT BROOKLYN 1581
## 3 50012185 CASTILLO RESTAURNAT BROOKLYN 709
## 4 40750062 PANEANTICO BAKERY BROOKLYN 9124
## 5 50034621 SHI LI XIANG QUEENS 13358
## 6 50007874 vapor lounge BRONX 3758
## 7 40552965 GROUND LEVEL PUB & GRUB STATEN ISLAND 958
## 8 41650546 KING KABAB QUEENS 16709
## 9 41524468 STARBUCKS MANHATTAN 1491
## 10 41435999 YOUR HOUSE CAFE BROOKLYN 6916
## # ... with 436,602 more rows, and 14 more variables: STREET <chr>,
## # ZIPCODE <int>, PHONE <chr>, CUISINE.DESCRIPTION <chr>,
## # INSPECTION.DATE <date>, ACTION <chr>, VIOLATION.CODE <chr>,
## # VIOLATION.DESCRIPTION <chr>, CRITICAL.FLAG <chr>, SCORE <int>,
## # GRADE <chr>, GRADE.DATE <date>, RECORD.DATE <date>,
## # INSPECTION.TYPE <chr>
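As a possible alternative to mapping and then rebuilding the tibble, purrr's `modify_if` applies a function only to the columns that satisfy a predicate and returns the same container type it received. The sketch below uses a made-up tibble (`toy` is hypothetical) and lubridate's `is.POSIXt` as the predicate; it is one way to do the conversion, not necessarily the approach the assignment intends.

```r
library(purrr)
library(tibble)
library(lubridate)

# Toy data standing in for nyc_restaurant (hypothetical values)
toy <- tibble(
  DBA = c("A CAFE", "B DINER"),
  INSPECTION.DATE = as.POSIXct(c("2016-01-05 00:00:00", "2016-02-10 00:00:00"),
                               tz = "UTC")
)

# Apply as.Date only to date-time columns; other columns pass through unchanged.
# The input is a tibble, so the result is a tibble too.
toy_dates <- modify_if(toy, is.POSIXt, as.Date)
map(toy_dates, class)
```

Because the predicate is checked per column, a check for the wrong class name would silently leave every column untouched, so inspecting the classes afterwards is a cheap sanity check.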
Using this reformatted tibble, identify how many restaurants in 2016 had a violation regarding “mice”? How about “hair”? What about “sewage”? Hint: the VIOLATION.DESCRIPTION and INSPECTION.DATE variables will be useful here.
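search_patterns <- c("mice", "hair", "sewage")
search_year <- 2016
# Count the violation records in the given year whose description matches the pattern
violations_nyc <- function(search_pattern, search_year){
  nyc_restaurant %>%
    filter(year(INSPECTION.DATE) == search_year) %>%
    summarize(total_issues = sum(str_detect(tolower(VIOLATION.DESCRIPTION),
                                            search_pattern)))
}
# search_year has length 1, so map2 recycles it across the three patterns
violation_count <- map2(search_patterns, search_year, violations_nyc)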
violation_count
## [[1]]
## # A tibble: 1 × 1
## total_issues
## <int>
## 1 8283
##
## [[2]]
## # A tibble: 1 × 1
## total_issues
## <int>
## 1 2132
##
## [[3]]
## # A tibble: 1 × 1
## total_issues
## <int>
## 1 13671
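The totals above count matching violation records, so a restaurant cited several times is counted several times. If the goal is the number of distinct restaurants, one option is to count unique CAMIS identifiers instead. The sketch below runs on a made-up tibble (`toy`, `count_restaurants`, and all values are hypothetical) and also returns a named integer vector rather than a list of one-cell tibbles.

```r
library(dplyr)
library(purrr)
library(stringr)
library(lubridate)
library(tibble)

# Toy data standing in for nyc_restaurant (hypothetical values)
toy <- tibble(
  CAMIS = c(1L, 1L, 2L, 3L),
  INSPECTION.DATE = as.Date(c("2016-01-05", "2016-03-01",
                              "2016-02-10", "2015-07-04")),
  VIOLATION.DESCRIPTION = c("Evidence of mice", "Evidence of mice",
                            "Hair not restrained", "Evidence of mice")
)

# Number of distinct restaurants (by CAMIS) with a matching violation in a year
count_restaurants <- function(pattern, search_year, data = toy) {
  data %>%
    filter(year(INSPECTION.DATE) == search_year,
           str_detect(tolower(VIOLATION.DESCRIPTION), pattern)) %>%
    summarize(n = n_distinct(CAMIS)) %>%
    pull(n)
}

patterns <- c("mice", "hair", "sewage")
# set_names() names each element by its pattern, so the result is labelled
restaurant_counts <- map_int(set_names(patterns), count_restaurants,
                             search_year = 2016)
restaurant_counts
```

In the toy data, restaurant 1 has two mice records in 2016 but contributes only once to the "mice" count, which is the difference from summing `str_detect` results.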
Create a function to apply to this tibble that takes a year and a regular expression (e.g., “mice”) and returns a ggplot bar chart of the top 20 restaurants with the most violations. Make sure the restaurants are properly rank-ordered in the bar chart.
top_viols_nyc <- function(search_pattern, search_year){
  nyc_restaurant %>%
    # keep records from the requested year whose description matches the pattern
    filter(year(INSPECTION.DATE) == search_year,
           str_detect(tolower(VIOLATION.DESCRIPTION),
                      search_pattern)) %>%
    # count matching violations per restaurant name
    count(DBA) %>%
    arrange(desc(n)) %>%
    # top_n() keeps ties, so more than 20 rows are possible
    top_n(20, n) %>%
    ggplot() +
    # reorder() sorts the bars by violation count
    geom_bar(mapping = aes(x = reorder(DBA, n),
                           y = n),
             stat = "identity") +
    coord_flip() +
    theme(text = element_text(size = 7)) +
    labs(x = "Restaurant", y = "Violations") +
    ggtitle(paste0("Top 20 Restaurants with '",
                   search_pattern,
                   "' violations for ",
                   search_year))
}
top_viols_nyc("mice", 2016)
top_viols_nyc("hair", 2016)
top_viols_nyc("sewage", 2015)
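Two details of the ranking step are easy to miss: `top_n` keeps tied rows, and the bar order comes from the factor levels produced by `reorder`. The sketch below illustrates both on a made-up count table (`toy_counts` and its values are hypothetical). It uses `slice_head`, which assumes dplyr 1.0.0 or later, as one way to cap the result at an exact number of rows.

```r
library(dplyr)
library(tibble)

# Toy per-restaurant counts (hypothetical values) with a tie at n = 3
toy_counts <- tibble(
  DBA = c("A CAFE", "B DINER", "C GRILL", "D PIZZA"),
  n = c(7L, 3L, 3L, 1L)
)

# top_n() keeps every row tied with the cutoff, so asking for 2 returns 3 rows
with_ties <- top_n(toy_counts, 2, n)

# Sorting and then taking the first rows gives exactly 2 rows
top2 <- toy_counts %>%
  arrange(desc(n)) %>%
  slice_head(n = 2) %>%
  mutate(DBA = reorder(DBA, n))  # factor levels sorted by ascending n

nrow(with_ties)
levels(top2$DBA)
```

With `coord_flip()`, ascending factor levels place the restaurant with the most violations at the top of the chart. Which tied rows survive the cut depends on row order, so a stricter tie-breaking rule may be preferable for real data.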