Synopsis

This is my homework report for week 7, produced with R Markdown.

In this homework I work with the NYC restaurant inspection data and focus on writing efficient code by practicing:

  • Writing functions

  • Using iteration

Packages Required

For this homework assignment, I used the following packages:

library(RSocrata)       # for the Socrata Open Data API (SODA) that serves the NYC restaurant data
library(readr)          # for reading and saving files in RDS format
library(purrr)          # for iteration with map() and map2()
library(tibble)         # for as_tibble()
library(stringr)        # for pattern matching with str_detect()
library(lubridate)      # for extracting the year from dates with year()
library(ggplot2)        # for creating graphs
library(dplyr)          # for performing data transformation and manipulation tasks
library(knitr)          # for knitting R code to HTML files
library(magrittr)       # for chaining commands with the pipe operator, %>%

Importing the Data

nyc_restaurant <- read_rds('nyc_restaurant')

Problem 1

Use the map function to identify the class of each variable.

nyc_restaurant %>% map(class)
## $CAMIS
## [1] "integer"
## 
## $DBA
## [1] "character"
## 
## $BORO
## [1] "character"
## 
## $BUILDING
## [1] "character"
## 
## $STREET
## [1] "character"
## 
## $ZIPCODE
## [1] "integer"
## 
## $PHONE
## [1] "character"
## 
## $CUISINE.DESCRIPTION
## [1] "character"
## 
## $INSPECTION.DATE
## [1] "POSIXct" "POSIXt" 
## 
## $ACTION
## [1] "character"
## 
## $VIOLATION.CODE
## [1] "character"
## 
## $VIOLATION.DESCRIPTION
## [1] "character"
## 
## $CRITICAL.FLAG
## [1] "character"
## 
## $SCORE
## [1] "integer"
## 
## $GRADE
## [1] "character"
## 
## $GRADE.DATE
## [1] "POSIXct" "POSIXt" 
## 
## $RECORD.DATE
## [1] "POSIXct" "POSIXt" 
## 
## $INSPECTION.TYPE
## [1] "character"
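
The list printed above is long. As a minimal sketch, `map_chr()` can collapse each column's class vector into a single string, which gives one compact named character vector instead. The `toy` tibble and the name `col_classes` are hypothetical stand-ins used only so the example runs on its own; they are not part of the original data.

```r
library(purrr)
library(tibble)

# Hypothetical stand-in for nyc_restaurant with the same kinds of columns
toy <- tibble(
  CAMIS = c(41606387L, 50007091L),
  DBA = c("LA CUARTA RESTAURANT", "NEW PEKING RESTAURANT"),
  INSPECTION.DATE = as.POSIXct(c("2016-03-01", "2016-07-15"), tz = "UTC")
)

# Columns with several classes (e.g. POSIXct and POSIXt) become one string each
col_classes <- map_chr(toy, ~ paste(class(.x), collapse = ", "))
col_classes
```

`map()` always returns a list, while `map_chr()` returns a character vector and fails if any element is not a single string, which is why the classes are pasted together first.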

Problem 2

Notice how the date variables are in POSIXlt form. Create a function that takes a single argument (“x”) and checks if it is of POSIXlt class. If it is, have the function change the input to a simple Date class with as.Date. If not, then the function should keep the input class as is. Apply this function to each of the columns in the NY restaurant data set by using the map function. Be sure the final output is a tibble and not a list.

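# TRUE if x is a POSIXlt object; inherits() tests the whole class vector
checkPosixlt <- function(x){
  inherits(x, "POSIXlt")
}

# Convert POSIXlt input to Date; return any other input unchanged.
# Note: columns of class POSIXct (as reported in Problem 1) do not match
# this check and are therefore left as they are.
convertToDate <- function(x){
  if(!checkPosixlt(x)){
    return(x)
  }
  return(as.Date(x))
}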

nyc_restaurant <- nyc_restaurant %>%
                  map(convertToDate) %>% 
                  as_tibble()
nyc_restaurant
## # A tibble: 436,612 × 18
##       CAMIS                      DBA          BORO BUILDING
##       <int>                    <chr>         <chr>    <chr>
## 1  41606387     LA CUARTA RESTAURANT      BROOKLYN      782
## 2  50007091    NEW PEKING RESTAURANT      BROOKLYN     1581
## 3  50012185      CASTILLO RESTAURNAT      BROOKLYN      709
## 4  40750062        PANEANTICO BAKERY      BROOKLYN     9124
## 5  50034621             SHI LI XIANG        QUEENS    13358
## 6  50007874             vapor lounge         BRONX     3758
## 7  40552965 GROUND LEVEL  PUB & GRUB STATEN ISLAND      958
## 8  41650546               KING KABAB        QUEENS    16709
## 9  41524468                STARBUCKS     MANHATTAN     1491
## 10 41435999          YOUR HOUSE CAFE      BROOKLYN     6916
## # ... with 436,602 more rows, and 14 more variables: STREET <chr>,
## #   ZIPCODE <int>, PHONE <chr>, CUISINE.DESCRIPTION <chr>,
## #   INSPECTION.DATE <dttm>, ACTION <chr>, VIOLATION.CODE <chr>,
## #   VIOLATION.DESCRIPTION <chr>, CRITICAL.FLAG <chr>, SCORE <int>,
## #   GRADE <chr>, GRADE.DATE <dttm>, RECORD.DATE <dttm>,
## #   INSPECTION.TYPE <chr>
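
Problem 1 reports the date columns as POSIXct rather than POSIXlt, and the tibble above still lists them as `<dttm>`. Both classes inherit from POSIXt, so one possible variant tests for that shared parent class instead. This is only a sketch: the helper name `to_date_if_datetime` and the `toy` tibble are hypothetical, and `as.Date()` on a date-time depends on the time zone, so results near midnight should be checked on the real data.

```r
library(purrr)
library(tibble)

# Hypothetical helper: convert any date-time (POSIXct or POSIXlt) to Date,
# and return every other input unchanged
to_date_if_datetime <- function(x) {
  if (inherits(x, "POSIXt")) as.Date(x) else x
}

# Hypothetical stand-in for nyc_restaurant
toy <- tibble(
  DBA = c("LA CUARTA RESTAURANT", "STARBUCKS"),
  INSPECTION.DATE = as.POSIXct(c("2016-03-01", "2016-07-15"), tz = "UTC")
)

# map() returns a list, so as_tibble() restores the tibble structure
converted <- as_tibble(map(toy, to_date_if_datetime))
map(converted, class)
```

With this variant the date-time column comes back with class Date, while the character column is untouched.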

Problem 3

Using this reformatted tibble, identify how many restaurants in 2016 had a violation regarding “mice”? How about “hair”? What about “sewage”? Hint: the VIOLATION.DESCRIPTION and INSPECTION.DATE variables will be useful here.

search_patterns <- c("mice", "hair", "sewage")
search_years <- 2016

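# Count the violation records (rows, not distinct restaurants) in a given
# year whose description matches the pattern.
violations_nyc <- function(pattern, search_year){
  nyc_restaurant %>%
    filter(year(INSPECTION.DATE) == search_year) %>%
    summarize(total_issues = sum(str_detect(tolower(VIOLATION.DESCRIPTION),
                                            pattern)))
}

# search_years has length 1, so map2() recycles it across the three patterns
violation_count <- map2(search_patterns, search_years, violations_nyc)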
violation_count
## [[1]]
## # A tibble: 1 × 1
##   total_issues
##          <int>
## 1         8283
## 
## [[2]]
## # A tibble: 1 × 1
##   total_issues
##          <int>
## 1         2132
## 
## [[3]]
## # A tibble: 1 × 1
##   total_issues
##          <int>
## 1        13671
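
The totals above count matching violation records, so a restaurant cited several times is counted several times. As a sketch of how distinct restaurants could be counted instead, the example below uses `n_distinct()` on CAMIS, assuming CAMIS identifies one establishment. The function `restaurants_with` and the `toy` tibble are hypothetical and exist only to make the example self-contained.

```r
library(dplyr)
library(purrr)
library(stringr)
library(lubridate)
library(tibble)

# Hypothetical stand-in for nyc_restaurant
toy <- tibble(
  CAMIS = c(1L, 1L, 2L, 3L, 3L),
  INSPECTION.DATE = as.Date(c("2016-01-05", "2016-06-10", "2016-02-20",
                              "2015-11-30", "2016-04-01")),
  VIOLATION.DESCRIPTION = c("Evidence of mice", "Evidence of mice",
                            "Hair not restrained", "Evidence of mice",
                            "Sewage disposal system improper")
)

# Count distinct restaurants (by CAMIS) with a matching violation in a year
restaurants_with <- function(pattern, search_year, data = toy) {
  data %>%
    filter(year(INSPECTION.DATE) == search_year,
           str_detect(tolower(VIOLATION.DESCRIPTION), pattern)) %>%
    summarize(n = n_distinct(CAMIS)) %>%
    pull(n)
}

patterns <- c("mice", "hair", "sewage")

# set_names() labels each result with its pattern; map_int() returns a vector
counts <- map_int(set_names(patterns), restaurants_with, search_year = 2016)
counts
```

In the toy data restaurant 1 has two "mice" records in 2016 but is counted once, which is the difference from the row-based totals.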

Problem 4

Create a function to apply to this tibble that takes a year and a regular expression (e.g. “mice”) and returns a ggplot bar chart of the top 20 restaurants with the most violations. Make sure the restaurants are properly rank-ordered in the bar chart.

# Bar chart of the restaurants (grouped by DBA name) with the most violation
# records matching search_pattern in search_year.
top_viols_nyc <- function(search_pattern, search_year){
  nyc_restaurant %>%
    filter(year(INSPECTION.DATE) == search_year,
           str_detect(tolower(VIOLATION.DESCRIPTION),
                      search_pattern)) %>%
    count(DBA) %>%
    arrange(desc(n)) %>%
    top_n(20, n) %>%               # keeps the 20 largest counts, ties included
    ggplot() +
    geom_col(mapping = aes(x = reorder(DBA, n),   # rank-order bars by count
                           y = n)) +
    coord_flip() +
    theme(text = element_text(size = 7)) +
    labs(x = "Restaurant", y = "Violations") +
    ggtitle(paste0("Top 20 Restaurants with '",
                   search_pattern,
                   "' violations for ",
                   search_year))
}

top_viols_nyc("mice", 2016)

top_viols_nyc("hair", 2016)

top_viols_nyc("sewage", 2015)
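
The three calls above repeat the same pattern, so the iteration tools from the earlier problems apply here too. The following is a minimal sketch under stated assumptions: `plot_viols` is a simplified, hypothetical version of the plotting function (no year filter, no top-20 cut), and `toy` is a hypothetical stand-in for the restaurant data.

```r
library(dplyr)
library(purrr)
library(stringr)
library(ggplot2)
library(tibble)

# Hypothetical stand-in for one year of nyc_restaurant
toy <- tibble(
  DBA = c("A CAFE", "A CAFE", "B DINER", "B DINER", "C GRILL"),
  VIOLATION.DESCRIPTION = c("evidence of mice", "hair not restrained",
                            "evidence of mice", "evidence of mice",
                            "sewage disposal improper")
)

# Simplified, hypothetical plotting helper: one bar per restaurant name
plot_viols <- function(pattern, data = toy) {
  data %>%
    filter(str_detect(tolower(VIOLATION.DESCRIPTION), pattern)) %>%
    count(DBA) %>%
    ggplot(aes(x = reorder(DBA, n), y = n)) +
    geom_col() +
    coord_flip() +
    labs(x = "Restaurant", y = "Violations", title = pattern)
}

patterns <- c("mice", "hair", "sewage")

# Build one plot per pattern; the result is a named list of ggplot objects
plots <- map(set_names(patterns), plot_viols)
names(plots)
```

A single chart can then be drawn with `print(plots$mice)`, and `walk(plots, print)` draws all of them in turn.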