Brian_Singh_DATA607

1. Import libraries and data
2. Tidy Data
3. Data Analysis
4. Conclusions
5. References

Post by Benson Yik Seong Toi: “Dataset: The Ramen Rater, ‘THE BIG LIST,’ 2021

Link: https://www.kaggle.com/residentmario/ramen-ratings

This dataset records ramen product reviews. To date, the data has been provided by 2,500 reviewers and keeps being updated with any new ramen on the market.

We can use this dataset to analyze the favorite flavor, best brand, ramen style, and more.

The Kaggle page has an informative table that can be sorted or filtered to find what we need.”

1. Import libraries and data

library(tidyverse)  # attaches readr, dplyr, tidyr, stringr and ggplot2
library(curl)
#ramen_ratings <- read.csv("ramen-ratings.csv")
ramen_ratings <- read.csv(curl("https://raw.githubusercontent.com/brsingh7/DATA607/main/Week6/Project2C/ramen-ratings.csv"))

2. Tidy Data

ramen_ratings <- ramen_ratings %>%
  separate(Top.Ten,into=c("Year","Top_10_Rank"),sep=" \\#")

## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 2543 rows [1, 2,
## 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

ramen_ratings$Top_10_Rank <- as.integer(ramen_ratings$Top_10_Rank)
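The warning above is expected: rows that never made a top-10 list have no " #" to split on. A sketch of the same split that fills those rows with `NA` silently and converts both pieces to integers in one step is shown below. It runs on a small hypothetical data frame shaped like the `Top.Ten` column, not on the real file.

```r
library(tidyr)

# Hypothetical toy values shaped like the Top.Ten column ("YEAR #RANK" or blank)
toy <- data.frame(Top.Ten = c("2016 #10", "", "2015 #1"))

toy_split <- separate(toy, Top.Ten,
                      into = c("Year", "Top_10_Rank"),
                      sep = " \\#",
                      fill = "right",   # rows with no rank get NA without a warning
                      convert = TRUE)   # "2016" and "10" become integers
toy_split
```

With `convert = TRUE`, the separate `as.integer()` step would no longer be needed on the real data.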

# Identify and then fill missing values in "Style". The two varieties with a blank
# style (Kamfen E Menm Chicken and Unif 100 Furong Shrimp) were researched and
# found to be sold as packs.
ramen_ratings %>%
    filter(Style == "")

##   Review..  Brand           Variety Style Country Stars Year Top_10_Rank
## 1      428 Kamfen    E Menm Chicken         China  3.75               NA
## 2      138   Unif 100 Furong Shrimp        Taiwan     3               NA

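# Only overwrite the blank styles, so same-named varieties with a known style are untouched
ramen_ratings$Style <- ifelse(ramen_ratings$Style == "" &
                                ramen_ratings$Variety %in% c("E Menm Chicken", "100 Furong Shrimp"),
                              "Pack", ramen_ratings$Style)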

#rename columns
colnames(ramen_ratings) = c("Review_ID","Brand","Variety","Style","Country_Name","Rating","Year_In_Top_10","Top_10_Rank")

# Convert columns to numeric; anything that is not a number becomes NA (hence the coercion warning)
ramen_ratings <- ramen_ratings %>%
    mutate_at(c("Rating","Year_In_Top_10"),as.numeric)

## Warning in mask$eval_all_mutate(quo): NAs introduced by coercion
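The coercion warning means some `Rating` values were not numbers. Before they are silently turned into `NA`, it can help to list exactly which strings triggered the warning. The helper `find_non_numeric()` below is a hypothetical name introduced for illustration, and the toy values (including "Unrated") are assumptions, not values read from the file.

```r
# Return the distinct non-missing strings that as.numeric() cannot parse
find_non_numeric <- function(x) {
  x <- x[!is.na(x)]
  unique(x[is.na(suppressWarnings(as.numeric(x)))])
}

# Hypothetical toy ratings stored as text, as read.csv() would leave them
stars_raw <- c("3.75", "5", "Unrated", "4.25")

bad <- find_non_numeric(stars_raw)
# Mark the offending labels as NA explicitly, then convert without a warning
stars <- as.numeric(replace(stars_raw, stars_raw %in% bad, NA))
```

Run on the character `ramen_ratings$Rating` column before the conversion above, this would show which labels to treat as missing; the same labels could instead be passed to `read.csv()` through its `na.strings` argument.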

3. Data Analysis

#Favorite variety
ramen_ratings %>%
    group_by(Variety) %>%
    summarise(avg_rating = round(mean(Rating),2),
              count=n(),
              in_top10 = ifelse(Top_10_Rank!="","Yes","No")) %>%
    arrange(desc(avg_rating))

## # A tibble: 2,580 × 4
## # Groups:   Variety [2,413]
##    Variety                                             avg_rating count in_top10
##    <chr>                                                    <dbl> <int> <chr>   
##  1 2 Minute Noodles Masala Spicy                                5     1 <NA>    
##  2 Aloe Noodle Red Onion & Sesame Sauce                         5     1 <NA>    
##  3 Aloe Noodle Vegetable Sauce                                  5     1 <NA>    
##  4 Aloe Noodle With Basil Sauce                                 5     1 <NA>    
##  5 Aloe Thin Noodles With Camelia Oil Vegetable Sauce…          5     1 <NA>    
##  6 Aloe Vera Guan Mian Cyanobacteria Noodle With Ging…          5     1 <NA>    
##  7 Aloe Vera Guan Mian Original Noodle With Sesame Sa…          5     1 <NA>    
##  8 Always Mi Goreng Perisa Kari Kapitan                         5     1 <NA>    
##  9 Arrabiata Rice Bucatini                                      5     1 <NA>    
## 10 Atta Noodles Jhatpat Banao Befikr Khao                       5     1 <NA>    
## # … with 2,570 more rows
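Two details of the summary above are worth flagging. `ifelse()` inside `summarise()` returns one value per review rather than one per variety, which is why the result has 2,580 rows for 2,413 varieties, and `in_top10` shows `NA` rather than "No" because `Top_10_Rank` is `NA` for unranked reviews. Also, `mean()` without `na.rm = TRUE` returns `NA` for any variety with an unrated review. The sketch below returns exactly one row per product on hypothetical toy data; grouping by `Brand` as well as `Variety` is an added assumption that keeps same-named products from different brands apart.

```r
library(dplyr)

# Hypothetical toy reviews: two brands sell a variety named "Chicken",
# one review is unrated (NA) and one review has a top-10 rank
reviews <- tibble(
  Brand       = c("X", "X", "Y", "Z"),
  Variety     = c("Chicken", "Chicken", "Chicken", "Beef"),
  Rating      = c(5, 4, NA, 5),
  Top_10_Rank = c(NA, 3L, NA, NA)
)

variety_summary <- reviews %>%
  group_by(Brand, Variety) %>%
  summarise(
    avg_rating = round(mean(Rating, na.rm = TRUE), 2),  # skip unrated reviews
    count      = n(),
    # any() collapses the per-review ranks into a single flag per product
    in_top10   = if_else(any(!is.na(Top_10_Rank)), "Yes", "No"),
    .groups    = "drop"
  ) %>%
  arrange(desc(avg_rating), desc(count))   # ties on rating broken by review count
variety_summary
```

A product whose reviews are all unrated (brand "Y" here) gets `NaN` as its average.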

# Varieties with a perfect average rating of 5
ramen_ratings %>%
    group_by(Variety) %>%
    summarise(avg_rating = round(mean(Rating),2),
              count=n(),
              in_top10 = ifelse(Top_10_Rank!="","Yes","No")) %>%
    filter(avg_rating==5)

## # A tibble: 368 × 4
## # Groups:   Variety [357]
##    Variety                                             avg_rating count in_top10
##    <chr>                                                    <dbl> <int> <chr>   
##  1 2 Minute Noodles Masala Spicy                                5     1 <NA>    
##  2 Aloe Noodle Red Onion & Sesame Sauce                         5     1 <NA>    
##  3 Aloe Noodle Vegetable Sauce                                  5     1 <NA>    
##  4 Aloe Noodle With Basil Sauce                                 5     1 <NA>    
##  5 Aloe Thin Noodles With Camelia Oil Vegetable Sauce…          5     1 <NA>    
##  6 Aloe Vera Guan Mian Cyanobacteria Noodle With Ging…          5     1 <NA>    
##  7 Aloe Vera Guan Mian Original Noodle With Sesame Sa…          5     1 <NA>    
##  8 Always Mi Goreng Perisa Kari Kapitan                         5     1 <NA>    
##  9 Arrabiata Rice Bucatini                                      5     1 <NA>    
## 10 Atta Noodles Jhatpat Banao Befikr Khao                       5     1 <NA>    
## # … with 358 more rows

# Varieties that appeared in a top-10 list, lowest average rating first
ramen_ratings %>%
    group_by(Variety) %>%
    summarise(avg_rating = round(mean(Rating),2),
              count=n(),
              in_top10 = ifelse(Top_10_Rank!="","Yes","No")) %>%
    filter(in_top10=="Yes") %>%
    arrange(avg_rating)

## # A tibble: 37 × 4
## # Groups:   Variety [37]
##    Variety                              avg_rating count in_top10
##    <chr>                                     <dbl> <int> <chr>   
##  1 Artificial Chicken                         3.42     6 Yes     
##  2 Ippeichan Yakisoba                         4        1 Yes     
##  3 Hyoubanya No Chukasoba Oriental            4.25     1 Yes     
##  4 Kari Spesial                               4.5      1 Yes     
##  5 Kokomen Spicy Chicken                      4.75     3 Yes     
##  6 Shin Ramyun Black                          4.88     2 Yes     
##  7 Cheese Noodle                              5        1 Yes     
##  8 Chef Curry Laksa Flavour                   5        2 Yes     
##  9 Chef Gold Recipe Mi Kari Seribu Rasa       5        1 Yes     
## 10 Chow Mein                                  5        1 Yes     
## # … with 27 more rows
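The list above already shows that a top-10 placement does not require a perfect average (Artificial Chicken sits at 3.42). One way to make that comparison directly is to summarise ranked and unranked reviews side by side. The sketch below uses hypothetical toy data; `share_perfect`, the fraction of 5-star reviews, is an added metric that is not part of the original analysis.

```r
library(dplyr)

# Hypothetical toy reviews; Top_10_Rank is NA for reviews that were never ranked
reviews <- tibble(
  Rating      = c(5, 5, 3.5, 4, 5, 2),
  Top_10_Rank = c(1L, NA, 7L, NA, NA, NA)
)

top10_comparison <- reviews %>%
  mutate(in_top10 = !is.na(Top_10_Rank)) %>%
  group_by(in_top10) %>%
  summarise(avg_rating    = mean(Rating, na.rm = TRUE),
            share_perfect = mean(Rating == 5, na.rm = TRUE),  # fraction of 5-star reviews
            count         = n())
top10_comparison
```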

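ratings_hist <- hist(ramen_ratings$Rating, main = "Ramen Ratings distribution",
                     xlab = "Rating", ylab = "Count", ylim = c(0, 800))
# Print each bar's count above it; without labels =, text() would print 1, 2, 3, ...
text(ratings_hist$mids, ratings_hist$counts, labels = ratings_hist$counts, adj = c(0.5, -0.5))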
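The conclusions describe the ratings histogram as left-skewed, and a single summary number makes that claim easy to check. The function below is a sketch of the standard moment-based skewness coefficient (the mean cubed deviation divided by the cubed standard deviation), an addition not used in the original analysis; negative values indicate a longer tail toward low ratings. `sample_skewness()` is a hypothetical helper name.

```r
# Moment-based skewness: mean((x - m)^3) / s^3, with s the population SD.
# NA values (e.g. unrated reviews) are dropped first.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))
  mean((x - m)^3) / s^3
}

sample_skewness(c(1, 4, 5, 5, 5))  # negative: long tail toward low ratings
```

On the real data this would be `sample_skewness(ramen_ratings$Rating)`.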

#Favorite Brand
ramen_ratings %>%
    group_by(Brand) %>%
    summarise(avg_rating = round(mean(Rating),2),
              count=n()) %>%
    arrange(desc(avg_rating))

## # A tibble: 355 × 3
##    Brand                avg_rating count
##    <chr>                     <dbl> <int>
##  1 ChoripDong                    5     1
##  2 Daddy                         5     1
##  3 Daifuku                       5     1
##  4 Foodmon                       5     2
##  5 Higashi                       5     1
##  6 Jackpot Teriyaki              5     1
##  7 Kiki Noodle                   5     2
##  8 Kimura                        5     1
##  9 Komforte Chockolates          5     1
## 10 MyOri                         5     5
## # … with 345 more rows

#Favorite Style
ramen_ratings %>%
    ggplot(aes(x = Style, y = Rating)) +
    geom_bar(stat = "summary", fun = "mean") +  # bar height = mean rating per style
    ggtitle("Average Rating by Ramen Style") + ylab("Avg Rating")

## Warning: Removed 3 rows containing non-finite values (stat_summary).

#Ratings by Country
ramen_ratings %>%
    group_by(Country_Name) %>%
    summarise(avg_rating = round(mean(Rating),2), 
              count=n()) %>%
    arrange(desc(avg_rating))

## # A tibble: 38 × 3
##    Country_Name  avg_rating count
##    <chr>              <dbl> <int>
##  1 Brazil              4.35     5
##  2 Sarawak             4.33     3
##  3 Cambodia            4.2      5
##  4 Singapore           4.13   109
##  5 Indonesia           4.07   126
##  6 Japan               3.98   352
##  7 Myanmar             3.95    14
##  8 Fiji                3.88     4
##  9 Hong Kong           3.8    137
## 10 United States       3.75     1
## # … with 28 more rows
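The top of this table rests on very few reviews: Brazil has 5, Sarawak 3 and Cambodia 5, against 352 for Japan. A per-country confidence interval shows how much of that ordering could be noise. The sketch below computes a t-based 95% interval for each country's mean on hypothetical toy data; the interval is an added technique, and it is undefined for a country with a single review.

```r
library(dplyr)

# Hypothetical toy reviews: "X" has 3 reviews, "Y" has 8
reviews <- tibble(
  Country_Name = c(rep("X", 3), rep("Y", 8)),
  Rating       = c(5, 4, 4.5, 4, 4, 3.5, 4.5, 4, 4, 3.5, 4.5)
)

country_ci <- reviews %>%
  group_by(Country_Name) %>%
  summarise(avg_rating = mean(Rating, na.rm = TRUE),
            count      = sum(!is.na(Rating)),
            se         = sd(Rating, na.rm = TRUE) / sqrt(count)) %>%
  # t-based 95% interval; NA when a country has fewer than 2 rated reviews
  mutate(margin = if_else(count > 1, qt(0.975, df = pmax(count - 1, 1)) * se, NA_real_),
         lower  = avg_rating - margin,
         upper  = avg_rating + margin) %>%
  arrange(desc(avg_rating))
country_ci
```

Here "X" has the higher average but a much wider interval, which is the pattern to look for among the small-sample countries in the real table.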

country_rtg <- ramen_ratings %>%
    group_by(Country_Name) %>%
    summarise(avg_rating = mean(Rating, na.rm = TRUE))  # skip unrated reviews

ramen_ratings <- left_join(ramen_ratings,country_rtg,by=c("Country_Name"="Country_Name"))

library(maps)  # map_data() needs the maps package

mapdata <- map_data("world")
head(mapdata)  # View() opens a viewer and only works interactively

mapdata <- left_join(mapdata,country_rtg,by=c("region"="Country_Name"))

map1 <- ggplot(mapdata,aes(x=long,y=lat,group=group))+
    geom_polygon(aes(fill=avg_rating))

map2 <- map1 + scale_fill_gradient(name="Average Ramen Rating", low = "yellow", high = "red", na.value = "grey50") +
    theme(axis.text.x = element_blank(),
          axis.text.y = element_blank(),
          axis.ticks = element_blank(),
          axis.title.y = element_blank(),
          axis.title.x = element_blank(),
          rect = element_blank())
map2
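The join between `mapdata` and `country_rtg` only colours a country when the names match exactly, so any spelling in the ratings that differs from the map's `region` names (for example a country recorded under two names, or a sub-national entry such as Sarawak, a state of Malaysia) ends up grey. One way to find and fix such mismatches is sketched below: list the unmatched names, recode them with a lookup vector, and re-average at the review level so merged names are weighted by review count. The toy reviews, the stand-in region vector and the `name_fixes` lookup are all hypothetical; on the real data the region names would come from `unique(map_data("world")$region)`.

```r
library(dplyr)

# Hypothetical toy reviews; "United States" and "Holland" use spellings
# that differ from the map's region names
reviews <- tibble(
  Country_Name = c("Japan", "Japan", "USA", "United States", "Holland"),
  Rating       = c(4, 5, 3, 4, 2)
)
# Hypothetical stand-in for unique(map_data("world")$region)
map_regions <- c("Japan", "USA", "Netherlands", "Malaysia")

# 1. Names in the ratings with no exact match among the map regions
unmatched <- setdiff(unique(reviews$Country_Name), map_regions)
unmatched  # "United States" "Holland"

# 2. Hypothetical lookup from dataset spelling to map region name
name_fixes <- c("United States" = "USA", "Holland" = "Netherlands")

# 3. Recode at the review level, then average, so merged names are
#    weighted by their number of reviews
region_rtg <- reviews %>%
  mutate(region = coalesce(unname(name_fixes[Country_Name]), Country_Name)) %>%
  group_by(region) %>%
  summarise(avg_rating = mean(Rating, na.rm = TRUE), count = n())
region_rtg
```

The result could then take the place of `country_rtg` in the map join, using `by = "region"`.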

4. Conclusions

Based on the results, it is difficult to determine the most popular variety, as 357 of the 2,413 distinct varieties have an average score of 5 (the highest possible score). Additionally, the histogram shows a left skew: ratings are concentrated at the high end, so ramen varieties are rated largely positively. The same applies to brands, as 24 have an average rating of 5. However, MyKuali, with 24 different varieties, falls right below the subset of brands scoring a 5, with an average rating of 4.95. Given the number of varieties in conjunction with the average score, one could argue that MyKuali is the most popular brand. Based on style, bars appear to be the favorite. Interestingly, a higher average rating does not necessarily translate into a top-10 placement: Artificial Chicken appears in a top-10 list with an average rating of 3.42 across 6 reviews. It would be interesting to look into the criteria used to place a variety in the rankings and conduct further analysis.

Which countries produce the best ramen? Given the data at hand, Brazil has the best ramen, with an average rating of 4.35, followed by Sarawak (4.33); however, both produce only a small number of reviewed varieties compared to other countries (5 and 3 reviews, respectively, versus 352 for Japan).

5. References

The following video provided guidance on the map:

https://www.google.com/search?q=world+maps+with+dataset+in+r&sxsrf=APq-WBuqmiJyuVF_DnQ1x3_8PCMbPdnexA%3A1647118366701&ei=HggtYp63KvmlptQPzfCo2Ao&ved=0ahUKEwievpmJusH2AhX5kokEHU04CqsQ4dUDCA4&uact=5&oq=world+maps+with+dataset+in+r&gs_lcp=Cgdnd3Mtd2l6EAMyCAghEBYQHRAeMggIIRAWEB0QHjIICCEQFhAdEB4yCAghEBYQHRAeOgQIIxAnOgUIABCRAjoKCAAQsQMQgwEQQzoLCAAQgAQQsQMQgwE6BAgAEEM6CwguEIAEELEDEIMBOg4ILhCABBCxAxDHARCjAjoHCC4Q1AIQQzoQCC4QsQMQgwEQsQMQsQMQCjoHCAAQsQMQQzoLCAAQgAQQsQMQyQM6CAgAEIAEELEDOg0IABCABBCHAhCxAxAUOg0IABCABBCHAhDJAxAUOgUIABCABDoKCAAQgAQQhwIQFDoECAAQCjoICAAQFhAKEB46BQghEKABOgYIABAWEB5KBAhBGABKBAhGGABQAFj4F2DYGGgAcAF4AYABgwGIAeoSkgEEMjUuM5gBAKABAcABAQ&sclient=gws-wiz#kpvalbx=_4wgtYuPwMO-gptQPn6SL4Ao20

Brian_Singh_DATA607_Project2C

Brian Singh

2022-03-10
