After reading in the data and converting all empty strings to NA, I remove the permits that are expired or suspended so that only actively permitted trucks and carts remain. I then split the food items into separate rows in order to analyze what foods are being served. I also exclude "cold truck" and "hot truck" from the food items listed, as they aren't actual food types.
library(tidyverse)
foodtruck_data <- read_csv("Mobile_Food_Facility_Permit.csv", col_names = TRUE, na= c("", "NA"), trim_ws = TRUE)
foodtruck_long <-
foodtruck_data %>%
select(Applicant, FacilityType, Status, FoodItems) %>%
filter(Status != "EXPIRED") %>%
filter(Status != "SUSPEND") %>%
drop_na() %>%
separate_rows(FoodItems ,sep=":|;|\\.") %>%
mutate(FoodItems = tolower(trimws(FoodItems))) %>%
filter(FoodItems != "cold truck") %>%
filter(FoodItems != "hot truck") %>%
filter(FoodItems != "")
foodtruck_long %>% filter(FacilityType == "Truck") %>% filter( grepl("taco", FoodItems) ) %>% count()
## # A tibble: 1 x 1
## n
## <int>
## 1 80
foodtruck_long %>%
count(FoodItems) %>%
top_n(10, n) %>%
arrange(desc(n))
## # A tibble: 10 x 2
## FoodItems n
## <chr> <int>
## 1 sandwiches 151
## 2 candy 148
## 3 snacks 117
## 4 burritos 109
## 5 hot dogs 106
## 6 chips 105
## 7 water 82
## 8 coffee 80
## 9 pre-packaged snacks 78
## 10 tacos 76
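To make the splitting step concrete, here is a minimal sketch on a made-up two-row data frame. The `toy` data and its values are hypothetical and are not taken from the permit file. It shows how `separate_rows()` turns one delimited `FoodItems` string into one row per item, using the same separator pattern as the pipeline above.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, for illustration only
toy <- tibble(
  Applicant = c("A", "B"),
  FoodItems = c("Tacos: Burritos; Water", "Hot Dogs. Chips")
)

toy_long <- toy %>%
  # Split on colon, semicolon, or period, giving one row per item
  separate_rows(FoodItems, sep = ":|;|\\.") %>%
  # Normalize case and surrounding whitespace so identical items group together
  mutate(FoodItems = tolower(trimws(FoodItems))) %>%
  # Drop empty pieces left behind by trailing separators
  filter(FoodItems != "")

toy_long
```

Lowercasing and trimming before counting matters here: without it, "Tacos" and " tacos" would be tallied as different food items.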
I begin by reading in the data and selecting the columns needed to look at cuisine and critical violations. Then I rename the columns to remove spaces, drop rows whose criticality is "Not Applicable" or whose cuisine is "Not Listed/Not Applicable", and spread the dataset to show the proportion of critical violations by cuisine type, arranged by percentage.
NYC_Restaurant_Inspection_Results <- read_csv("DOHMH_New_York_City_Restaurant_Inspection_Results.csv")
NYC_Restaurant_Health <-
NYC_Restaurant_Inspection_Results %>%
select("DBA", "BORO", "CUISINE DESCRIPTION", "CRITICAL FLAG")
colnames(NYC_Restaurant_Health) <- c("DBA", "BORO", "CUISINE", "CRITICALITY")
NYC_Restaurant_Health_Clean <-
NYC_Restaurant_Health %>%
filter(CRITICALITY != "Not Applicable") %>%
filter(CUISINE != "Not Listed/Not Applicable") %>%
count(CUISINE, CRITICALITY) %>%
mutate(CRITICALITY = ifelse(CRITICALITY=="Critical","Critical","NotCritical")) %>%
spread(CRITICALITY, n) %>%
mutate(PERCENTCRITICAL = Critical / (Critical + `NotCritical`)) %>%
arrange(desc(PERCENTCRITICAL))
Select the top 10 cuisines with the highest critical percentages for violations.
NYC_Restaurant_Health_Clean %>% top_n(10, PERCENTCRITICAL)
## # A tibble: 10 x 4
## CUISINE Critical NotCritical PERCENTCRITICAL
## <chr> <int> <int> <dbl>
## 1 Creole/Cajun 72 38 0.655
## 2 Bangladeshi 624 361 0.634
## 3 Californian 32 19 0.627
## 4 Creole 340 213 0.615
## 5 Vietnamese/Cambodian/Malaysia 992 642 0.607
## 6 Armenian 223 148 0.601
## 7 English 129 86 0.6
## 8 Chinese/Cuban 165 112 0.596
## 9 Filipino 397 271 0.594
## 10 Chinese/Japanese 506 346 0.594
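The proportion calculation can also be sketched with `pivot_wider()`, which current tidyr recommends over the superseded `spread()`. This is an alternative sketch, not the code used to produce the results above, and the `toy` data is hypothetical. Filling missing counts with zero means a cuisine with no critical rows gets a proportion of 0 instead of NA.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy inspections, for illustration only
toy <- tibble(
  CUISINE = c("X", "X", "X", "Y", "Y"),
  CRITICALITY = c("Critical", "Critical", "Not Critical",
                  "Not Critical", "Not Critical")
)

toy_prop <- toy %>%
  # Collapse the flag to two syntactic column names
  mutate(CRITICALITY = ifelse(CRITICALITY == "Critical", "Critical", "NotCritical")) %>%
  count(CUISINE, CRITICALITY) %>%
  # One column per flag; absent combinations become 0 rather than NA
  pivot_wider(names_from = CRITICALITY, values_from = n, values_fill = 0L) %>%
  mutate(PERCENTCRITICAL = Critical / (Critical + NotCritical)) %>%
  arrange(desc(PERCENTCRITICAL))

toy_prop
```

Whether any cuisine in the real inspection file lacks one of the two flags is not something the output above settles, so the zero fill is a precaution rather than a known fix.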
Select the 10 cuisines which correspond to the lowest critical percentages for violations.
NYC_Restaurant_Health_Clean %>% top_n(-10, PERCENTCRITICAL)
## # A tibble: 11 x 4
## CUISINE Critical NotCritical PERCENTCRITICAL
## <chr> <int> <int> <dbl>
## 1 Fruits/Vegetables 27 27 0.5
## 2 Soups 21 21 0.5
## 3 Soups & Sandwiches 263 266 0.497
## 4 Hamburgers 2286 2339 0.494
## 5 Afghan 95 101 0.485
## 6 Salads 387 417 0.481
## 7 Ice Cream, Gelato, Yogurt, Ices 1400 1512 0.481
## 8 Pancakes/Waffles 94 112 0.456
## 9 Chilean 7 10 0.412
## 10 Nuts/Confectionary 8 13 0.381
## 11 Basque 1 4 0.2
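The output above has 11 rows rather than 10 because `top_n()` keeps ties, and two cuisines share a value of 0.5 at the cutoff. If exactly 10 rows are wanted, one option is `slice_min()` with `with_ties = FALSE` (available in dplyr 1.0.0 and later). The sketch below uses hypothetical `toy` values to show the difference.

```r
library(dplyr)

# Hypothetical toy data with a tie at 0.5, for illustration only
toy <- tibble(
  CUISINE = c("A", "B", "C", "D"),
  PERCENTCRITICAL = c(0.2, 0.5, 0.5, 0.7)
)

# top_n() keeps every row tied at the cutoff, so asking for 2 can return more
with_ties <- toy %>% top_n(-2, PERCENTCRITICAL)

# slice_min() with with_ties = FALSE returns exactly n rows
no_ties <- toy %>% slice_min(PERCENTCRITICAL, n = 2, with_ties = FALSE)

nrow(with_ties)
nrow(no_ties)
```

Dropping ties makes the row count predictable, but which tied row survives then depends on row order, so keeping ties is arguably the more honest choice for a ranking like this one.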
I read in the dataset and plot viewership by episode, colored by each episode's number within its season.
simpsons_data <- read_csv("simpsons_episodes.csv")
ggplot(data = simpsons_data) +
aes(x = number_in_series, y = us_viewers_in_millions, color = number_in_season) +
geom_line() +
scale_color_distiller(palette = "RdYlGn")
Select the top 5 most-viewed titles and arrange them in descending order by viewership.
simpsons_data %>% top_n(5, us_viewers_in_millions) %>%
select(title, us_viewers_in_millions, season, number_in_season) %>%
arrange(desc(us_viewers_in_millions))
## # A tibble: 5 x 4
## title us_viewers_in_millions season number_in_season
## <chr> <dbl> <int> <int>
## 1 "Bart Gets an \"F\"" 33.6 2 1
## 2 Life on the Fast Lane 33.5 1 9
## 3 The Crepes of Wrath 31.2 1 11
## 4 Krusty Gets Busted 30.4 1 12
## 5 Homer's Night Out 30.3 1 10
Select the 5 least-viewed titles and arrange them in descending order by viewership.
simpsons_data %>% top_n(-5, us_viewers_in_millions) %>%
select(title, us_viewers_in_millions, season, number_in_season) %>%
arrange(desc(us_viewers_in_millions))
## # A tibble: 5 x 4
## title us_viewers_in_milli… season number_in_season
## <chr> <dbl> <int> <int>
## 1 My Fare Lady 2.67 26 14
## 2 How Lisa Got Her Marge Back 2.55 27 18
## 3 Orange Is the New Yellow 2.54 27 22
## 4 To Courier with Love 2.52 27 20
## 5 The Burns Cage 2.32 27 17
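The top and bottom lists suggest that viewership falls from the early seasons to the late ones. One way to check that by season, rather than episode by episode, is to average viewers within each season. The sketch below is only an illustration: it reuses the `season` and `us_viewers_in_millions` column names from the dataset, but the `toy` values are hypothetical, and it assumes some episodes may have missing viewer figures.

```r
library(dplyr)

# Hypothetical toy episodes, for illustration only
toy <- tibble(
  season = c(1, 1, 2, 2),
  us_viewers_in_millions = c(30, 32, 20, NA)
)

season_means <- toy %>%
  group_by(season) %>%
  # na.rm = TRUE skips episodes with no recorded viewership
  summarise(mean_viewers = mean(us_viewers_in_millions, na.rm = TRUE)) %>%
  arrange(season)

season_means
```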