Post by: Benson Yik Seong Toi, “Dataset: The Ramen Rater, ‘THE BIG LIST,’ 2021
Link: https://www.kaggle.com/residentmario/ramen-ratings
This dataset records ramen product reviews. To date, the data has been provided by 2,500 reviewers, and it keeps being updated with any new ramen on the market.
We can use this dataset to analyze the favorite flavor, best brand, ramen style, and more.
The Kaggle page has an informative table that can be sorted or filtered to find what we need.”
library(tidyverse)
library(readr)
library(curl)
library(ggplot2)
library(ggmap)
library(dplyr)
library(stringr)
#ramen_ratings <- read.csv("ramen-ratings.csv")
ramen_ratings <- read.csv(curl("https://raw.githubusercontent.com/brsingh7/DATA607/main/Week6/Project2C/ramen-ratings.csv"))
ramen_ratings <- ramen_ratings %>%
separate(Top.Ten,into=c("Year","Top_10_Rank"),sep=" \\#")
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 2543 rows [1, 2,
## 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
ramen_ratings$Top_10_Rank <- as.integer(ramen_ratings$Top_10_Rank)
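The warning above appears because most rows have an empty Top.Ten value, so separate() finds only one piece to split. As a minimal sketch on a toy data frame (the values are made up for illustration and only mimic the "YEAR #RANK" layout), passing fill = "right" asks tidyr to pad the missing rank with NA without warning:

```r
library(tidyr)

# Toy data (hypothetical values) mimicking the Top.Ten column: "YEAR #RANK" or empty
toy <- data.frame(Variety = c("A", "B", "C"),
                  Top.Ten = c("2016 #10", "", "2013 #1"),
                  stringsAsFactors = FALSE)

# fill = "right" pads the missing rank with NA instead of warning about it
toy_split <- separate(toy, Top.Ten, into = c("Year", "Top_10_Rank"),
                      sep = " \\#", fill = "right")
toy_split$Top_10_Rank <- as.integer(toy_split$Top_10_Rank)
toy_split
```

Rows without a top-ten entry keep an empty Year and an NA rank, which matches what the original call produces apart from the warning.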
#Identify, then fill, missing values in "Style". After identifying the varieties with an unknown style, research each one to determine its style and populate it in the df. Based on that research, Kamfen E Menm Chicken and Unif 100 Furong Shrimp are both packs.
ramen_ratings %>%
filter(Style == "")
## Review.. Brand Variety Style Country Stars Year Top_10_Rank
## 1 428 Kamfen E Menm Chicken China 3.75 NA
## 2 138 Unif 100 Furong Shrimp Taiwan 3 NA
ramen_ratings$Style <- ifelse(ramen_ratings$Variety=="E Menm Chicken"|ramen_ratings$Variety=="100 Furong Shrimp","Pack",ramen_ratings$Style)
#rename columns
colnames(ramen_ratings) = c("Review_ID","Brand","Variety","Style","Country_Name","Rating","Year_In_Top_10","Top_10_Rank")
#Convert columns
ramen_ratings <- ramen_ratings %>%
mutate_at(c("Rating","Year_In_Top_10"),as.numeric)
## Warning in mask$eval_all_mutate(quo): NAs introduced by coercion
#Favorite variety
ramen_ratings %>%
group_by(Variety) %>%
summarise(avg_rating = round(mean(Rating),2),
count=n(),
in_top10 = ifelse(Top_10_Rank!="","Yes","No")) %>%
arrange(desc(avg_rating))
## # A tibble: 2,580 × 4
## # Groups: Variety [2,413]
## Variety avg_rating count in_top10
## <chr> <dbl> <int> <chr>
## 1 2 Minute Noodles Masala Spicy 5 1 <NA>
## 2 Aloe Noodle Red Onion & Sesame Sauce 5 1 <NA>
## 3 Aloe Noodle Vegetable Sauce 5 1 <NA>
## 4 Aloe Noodle With Basil Sauce 5 1 <NA>
## 5 Aloe Thin Noodles With Camelia Oil Vegetable Sauce… 5 1 <NA>
## 6 Aloe Vera Guan Mian Cyanobacteria Noodle With Ging… 5 1 <NA>
## 7 Aloe Vera Guan Mian Original Noodle With Sesame Sa… 5 1 <NA>
## 8 Always Mi Goreng Perisa Kari Kapitan 5 1 <NA>
## 9 Arrabiata Rice Bucatini 5 1 <NA>
## 10 Atta Noodles Jhatpat Banao Befikr Khao 5 1 <NA>
## # … with 2,570 more rows
ramen_ratings %>%
group_by(Variety) %>%
summarise(avg_rating = round(mean(Rating),2),
count=n(),
in_top10 = ifelse(Top_10_Rank!="","Yes","No")) %>%
filter(avg_rating==5)
## # A tibble: 368 × 4
## # Groups: Variety [357]
## Variety avg_rating count in_top10
## <chr> <dbl> <int> <chr>
## 1 2 Minute Noodles Masala Spicy 5 1 <NA>
## 2 Aloe Noodle Red Onion & Sesame Sauce 5 1 <NA>
## 3 Aloe Noodle Vegetable Sauce 5 1 <NA>
## 4 Aloe Noodle With Basil Sauce 5 1 <NA>
## 5 Aloe Thin Noodles With Camelia Oil Vegetable Sauce… 5 1 <NA>
## 6 Aloe Vera Guan Mian Cyanobacteria Noodle With Ging… 5 1 <NA>
## 7 Aloe Vera Guan Mian Original Noodle With Sesame Sa… 5 1 <NA>
## 8 Always Mi Goreng Perisa Kari Kapitan 5 1 <NA>
## 9 Arrabiata Rice Bucatini 5 1 <NA>
## 10 Atta Noodles Jhatpat Banao Befikr Khao 5 1 <NA>
## # … with 358 more rows
ramen_ratings %>%
group_by(Variety) %>%
summarise(avg_rating = round(mean(Rating),2),
count=n(),
in_top10 = ifelse(Top_10_Rank!="","Yes","No")) %>%
filter(in_top10=="Yes") %>%
arrange(avg_rating)
## # A tibble: 37 × 4
## # Groups: Variety [37]
## Variety avg_rating count in_top10
## <chr> <dbl> <int> <chr>
## 1 Artificial Chicken 3.42 6 Yes
## 2 Ippeichan Yakisoba 4 1 Yes
## 3 Hyoubanya No Chukasoba Oriental 4.25 1 Yes
## 4 Kari Spesial 4.5 1 Yes
## 5 Kokomen Spicy Chicken 4.75 3 Yes
## 6 Shin Ramyun Black 4.88 2 Yes
## 7 Cheese Noodle 5 1 Yes
## 8 Chef Curry Laksa Flavour 5 2 Yes
## 9 Chef Gold Recipe Mi Kari Seribu Rasa 5 1 Yes
## 10 Chow Mein 5 1 Yes
## # … with 27 more rows
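The three summaries above return one row per review rather than one per variety (2,580 rows across 2,413 groups), because in_top10 is evaluated for every row, and it comes out as NA wherever Top_10_Rank is missing. A possible alternative, sketched here on toy data with the same column names (the values are hypothetical), collapses each variety to a single row and also skips unrated reviews when averaging:

```r
library(dplyr)

# Toy data (hypothetical values) with the same column names as ramen_ratings
toy <- data.frame(
  Variety     = c("A", "A", "B", "B", "C"),
  Rating      = c(5, 4, NA, 3.5, 5),
  Top_10_Rank = c(NA, 3L, NA, NA, NA)
)

variety_summary <- toy %>%
  group_by(Variety) %>%
  summarise(
    avg_rating = round(mean(Rating, na.rm = TRUE), 2),           # ignore unrated reviews
    count      = n(),
    in_top10   = ifelse(any(!is.na(Top_10_Rank)), "Yes", "No"),  # one value per variety
    .groups    = "drop"
  ) %>%
  arrange(desc(avg_rating), desc(count))  # ties on rating are broken by review count
variety_summary
```

Applied to the full data, this would change the row counts and the in_top10 values reported above, so the printed tables would need to be regenerated.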
ratings_hist <- hist(ramen_ratings$Rating, main="Ramen Ratings distribution", xlab="Rating",ylab="Count",ylim=c(0,800))
text(ratings_hist$mids,ratings_hist$counts,labels=ratings_hist$counts,adj=c(0.5,-0.5))
#Favorite Brand
ramen_ratings %>%
group_by(Brand) %>%
summarise(avg_rating = round(mean(Rating),2),
count=n()) %>%
arrange(desc(avg_rating))
## # A tibble: 355 × 3
## Brand avg_rating count
## <chr> <dbl> <int>
## 1 ChoripDong 5 1
## 2 Daddy 5 1
## 3 Daifuku 5 1
## 4 Foodmon 5 2
## 5 Higashi 5 1
## 6 Jackpot Teriyaki 5 1
## 7 Kiki Noodle 5 2
## 8 Kimura 5 1
## 9 Komforte Chockolates 5 1
## 10 MyOri 5 5
## # … with 345 more rows
#Favorite Style
ramen_ratings %>%
group_by(Style) %>%
ggplot(aes(x=Style,y=Rating)) +
geom_bar(position = "dodge",
stat = "summary",
fun = "mean") +
ggtitle("Average Rating by Ramen Style") + ylab("Avg Rating")
## Warning: Removed 3 rows containing non-finite values (stat_summary).
#Ratings by Country
ramen_ratings %>%
group_by(Country_Name) %>%
summarise(avg_rating = round(mean(Rating),2),
count=n()) %>%
arrange(desc(avg_rating))
## # A tibble: 38 × 3
## Country_Name avg_rating count
## <chr> <dbl> <int>
## 1 Brazil 4.35 5
## 2 Sarawak 4.33 3
## 3 Cambodia 4.2 5
## 4 Singapore 4.13 109
## 5 Indonesia 4.07 126
## 6 Japan 3.98 352
## 7 Myanmar 3.95 14
## 8 Fiji 3.88 4
## 9 Hong Kong 3.8 137
## 10 United States 3.75 1
## # … with 28 more rows
country_rtg <- ramen_ratings %>%
group_by(Country_Name) %>%
summarise(avg_rating = mean(Rating))
ramen_ratings <- left_join(ramen_ratings,country_rtg,by=c("Country_Name"="Country_Name"))
library("rnaturalearth")
library("rnaturalearthdata")
library("sf")
mapdata <- map_data("world")
#View(mapdata) #interactive only: opens the data viewer
mapdata <- left_join(mapdata,country_rtg,by=c("region"="Country_Name"))
map1 <- ggplot(mapdata,aes(x=long,y=lat,group=group))+
geom_polygon(aes(fill=avg_rating))
map2 <- map1 + scale_fill_gradient(name="Average Ramen Rating", low = "yellow", high = "red", na.value = "grey50") +
theme(axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank(),
axis.title.y = element_blank(),
axis.title.x = element_blank(),
rect = element_blank())
map2
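The join between mapdata and country_rtg only colors countries whose names match exactly in both tables, so any mismatch shows up as grey on the map. A small sketch of how mismatches could be listed and recoded follows. The toy tables stand in for country_rtg and the region names of map_data("world"), and the assumption that the map uses "USA" where the ratings use "United States" is for illustration only and should be checked against the real data:

```r
library(dplyr)

# Toy stand-ins (hypothetical values) for country_rtg and the map's region names
country_rtg_toy <- data.frame(Country_Name = c("Japan", "United States", "Sarawak"),
                              avg_rating   = c(3.98, 3.75, 4.33),
                              stringsAsFactors = FALSE)
map_regions <- data.frame(region = c("Japan", "USA", "Malaysia"),
                          stringsAsFactors = FALSE)

# 1. List rating countries that have no matching map region
unmatched <- anti_join(country_rtg_toy, map_regions,
                       by = c("Country_Name" = "region"))

# 2. Recode the names that have an obvious equivalent (this lookup is an assumption)
lookup <- c("United States" = "USA")
country_fixed <- country_rtg_toy %>%
  mutate(region = ifelse(Country_Name %in% names(lookup),
                         unname(lookup[Country_Name]),
                         Country_Name))
country_fixed
```

Names such as Sarawak, which is a region rather than a country, stay unmatched in this sketch. Folding them into a parent country would require re-averaging the ratings first.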
Based on the results, it’s difficult to determine the most popular variety, as 357 of the 2,413 distinct varieties (368 of the 2,580 reviews) have an average score of 5, the highest possible score. Additionally, the histogram shows a left-skewed distribution: most ratings sit near the top of the scale, reflecting a largely positive view of the ramen varieties reviewed. The same applies to Brand, as 24 brands have an average rating of 5. However, MyKuali has 24 different varieties and falls just below the brands scoring a 5, with an average rating of 4.95. Given the number of varieties in conjunction with the average score, one could argue that this is the most popular brand. Based on style, the Bar style appears to be the favorite. Interestingly, a higher average rating does not necessarily translate into a top-10 placement: for example, the variety name Artificial Chicken averages only 3.42 across 6 reviews, yet appears in the top 10. It would be interesting to look into the criteria used to get a variety into the rankings and conduct further analysis.
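The skewness read off the histogram can also be checked numerically. The sketch below computes sample skewness as the third standardized moment in base R; the helper name skewness and the toy ratings are hypothetical, and a negative result indicates a longer tail towards low ratings:

```r
# Sample skewness via the third standardized moment (base R only)
skewness <- function(x) {
  x <- x[!is.na(x)]                      # drop unrated entries
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Toy ratings (hypothetical values) clustered near the top of the 0-5 scale
toy_ratings <- c(5, 5, 4.5, 4, 4, 3.75, 3.5, 1, NA)
toy_skew <- skewness(toy_ratings)
toy_skew  # negative here, i.e. left-skewed
```

On the real data the same function could be called as skewness(ramen_ratings$Rating).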
Which countries produce the best ramen? Given the data at hand, Brazil has the best ramen, with an average rating of 4.35, followed by Sarawak at 4.33. However, both averages rest on very few reviews (5 and 3, respectively) compared with countries such as Japan (352 reviews).
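One way to keep small samples from dominating the country ranking is to require a minimum number of rated reviews before a country is ranked. This is only a sketch on toy data; the threshold min_reviews is a hypothetical choice, not something taken from the dataset:

```r
library(dplyr)

# Toy data (hypothetical values) with the same column names as ramen_ratings
toy <- data.frame(
  Country_Name = c(rep("Brazil", 2), rep("Japan", 4)),
  Rating       = c(4.5, 4.25, 4, 3.75, 4.25, NA),
  stringsAsFactors = FALSE
)
min_reviews <- 3  # hypothetical threshold

country_summary <- toy %>%
  group_by(Country_Name) %>%
  summarise(avg_rating = round(mean(Rating, na.rm = TRUE), 2),
            count      = sum(!is.na(Rating)),   # count rated reviews only
            .groups    = "drop") %>%
  filter(count >= min_reviews) %>%              # drop countries with too few reviews
  arrange(desc(avg_rating))
country_summary
```

The choice of threshold is a judgment call: a higher value gives more stable averages but removes more countries from the comparison.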