Exploration of Instant Ramen Data (Jan 2020)

Calvin ‘Cetus’ Christopher

Wednesday, April 7th 2021


Brief Explanation of the Data

The Ramen Rater is a product review website for the hardcore ramen enthusiast (or “ramenphile”), with over 2500 reviews to date. This dataset is an export of “The Big List” (of reviews), converted to CSV format. The data was downloaded from kaggle.com.

Content

Each row has 7 columns:

  • ID: The unique ID of each Instant Noodle
  • URL: The URL to the review for each Instant Noodle
  • Brand: The brand of the Instant Noodles manufacturer
  • Variety: The specific name of the Instant Noodles
  • Style: Packaging of the Instant Noodles
  • Country: Country of origin
  • Stars: The rating of Instant Noodles, as provided by theramenrater.

What

The aim of this data exploration is to answer:

  • What are the top 10 ramen brands by variety?
  • Which country produces the greatest variety of instant noodle products?
  • What is the distribution of instant noodle packaging styles?
  • What do customers think about instant noodles?
  • What is the correlation between instant ramen (IR) style and reviews in each country?

Who

This data exploration is intended for instant ramen manufacturers and their counterparts.

Why

It is relevant because it can help instant ramen companies make decisions about their new products.

When

The data was downloaded from Kaggle.com and was last updated in January 2020. In my opinion, it is still relevant at the time of this data exploration.

How

  1. For the top 10 brands, we will use a bar plot, since we want to see the 10 highest-ranked instant ramen brands.

  2. To show the frequency of the data by country, we will use a tree map. We will also show the distribution of instant ramen packaging styles.

  3. To show customer opinion around the world, we will take the top 5 countries plus Indonesia as an example. Again, we will use a bar plot to rank those countries, and a stacked bar plot to show the distribution of ratings in those six countries.

  4. The correlation between packaging style and ratings is crucial for deciding on the packaging of future products in each country. In this case we will use a box plot.

Where

  • Tab1 will answer questions 1-2, since they give a basic overview of the data itself.

  • Tab2 will show the data for the top 5 countries and will answer question 3.

  • Tab3 will show the data based on the country and star rating the user inputs. It will answer question 4. It will also show the top 10 brands of each country, which likewise reacts to the user input.

  • Tab4 will contain the dataset we use, the ramen data as of January 2020.

Load Necessary Libraries

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(glue)
## 
## Attaching package: 'glue'
## The following object is masked from 'package:dplyr':
## 
##     collapse
library(highcharter)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## Highcharts (www.highcharts.com) is a Highsoft software product which is
## not free for commercial and Governmental use

Read Data

ramen <- read.csv("Ramen_ratings_2020.csv")
str(ramen)
## 'data.frame':    3473 obs. of  7 variables:
##  $ ID     : chr  "3473" "3472" "3471" "3470" ...
##  $ URL    : chr  "https://www.theramenrater.com/2020/04/05/3473-mykuali-white-fish-broth-noodle-malaysia/" "https://www.theramenrater.com/2020/04/05/3472-mykuali-penang-white-curry-noodle-new-recipe-malaysia/" "https://www.theramenrater.com/2020/04/05/3471-ve-wong-instant-oriental-noodles-soup-chinese-herb-ginseng-flavor-taiwan/" "https://www.theramenrater.com/2020/04/04/3470-myojo-ippei-chan-rich-sweet-thick-yakisoba-japan/" ...
##  $ Brand  : chr  "MyKuali" "MyKuali" "Ve Wong" "Myojo" ...
##  $ Variety: chr  "White Fish Broth Noodle" "Penang White Curry Noodle (New Recipe)" "Instant Oriental Noodles Soup Chinese Herb - Ginseng" "Ippeichan Rich & Sweet Yakisoba" ...
##  $ Style  : chr  "Pack" "Pack" "Pack" "Tray" ...
##  $ Country: chr  "Malaysia" "Malaysia" "Taiwan" "Japan" ...
##  $ Stars  : chr  "5" "5" "3.75" "5" ...

Data Cleansing

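# Convert the categorical columns to factors
ramen$Brand <- as.factor(ramen$Brand)
ramen$Style <- as.factor(ramen$Style)
ramen$Country <- as.factor(ramen$Country)
# Stars is read as text; entries that are not numbers become NA
ramen$Stars <- as.numeric(ramen$Stars)
## Warning: NAs introduced by coercion
# The review URL is not needed for the analysis
ramen <- subset(ramen, select = -c(URL))
# Count missing values per column
colSums(is.na(ramen))
##      ID   Brand Variety   Style Country   Stars 
##       0       0       0       0       0      15
# Drop the 15 rows without a numeric rating
ramen <- na.omit(ramen)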
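
The coercion warning above means that some `Stars` entries are not numeric strings. Before dropping them, it can be useful to see which raw values failed to parse. The following is only a minimal sketch on a toy vector; the labels "Unrated" and "NR" are assumptions for illustration and are not taken from the dataset.

```r
# Toy stand-in for the raw Stars column (values are assumed, not from the dataset)
stars_raw <- c("5", "3.75", "Unrated", "4.5", "NR")

# Coerce quietly, then keep the raw strings that failed to parse
stars_num <- suppressWarnings(as.numeric(stars_raw))
failed <- stars_raw[is.na(stars_num)]

table(failed)          # frequency of each non-numeric label
sum(is.na(stars_num))  # number of rows that na.omit() would drop
```

Applied to the real data, the same check would have to run on the `Stars` column before it is converted with `as.numeric()`.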

Data Exploration & Manipulation

1. Top 10 Ramen Brands by Variety

# Count varieties per brand, most frequent first
top_brand <- ramen %>%
  count(Brand) %>%
  arrange(desc(n))
top_brand_plot <- ggplot(top_brand[1:10, ]) +
  aes(y = reorder(Brand, n), x = n, fill = Brand, text = glue("Frequency: {n}")) +
  geom_col() +
  labs(title = "Top 10 Ramen Brands", x = NULL, y = NULL) +
  theme_minimal() +
  theme(legend.position = "none")
ggplotly(top_brand_plot, tooltip = "text") %>%
  config(displayModeBar = FALSE)

2. Top 10 Ramen-Producing Countries by Variety

# count() already groups by Country, so a separate group_by() is not needed
top_country <- ramen %>%
  count(Country) %>%
  arrange(desc(n))
top_country_plot <- ggplot(top_country[1:10, ]) +
  aes(y = reorder(Country, n), x = n, fill = Country, text = glue("Frequency: {n}")) +
  geom_col() +
  labs(title = "Top 10 Ramen Producers by Country", x = NULL, y = NULL) +
  theme_minimal() +
  theme(legend.position = "none")
ggplotly(top_country_plot, tooltip = "text") %>%
  config(displayModeBar = FALSE)

3. What’s the country distribution of the data?

distplot1 <- ramen %>% 
  count(Country) %>% 
  arrange(desc(n))

distplot1 %>% 
  hchart(
    "treemap", hcaes(x= Country, value = n, color = n)
    ) %>% 
  hc_title(text = "Country Distribution",
    margin = 20,
    align = "left",
    style = list(useHTML = TRUE)) %>% 
  hc_colorAxis(stops = color_stops(colors = viridis::inferno(10)))

4. What’s the packaging style distribution of the data?

distplot2 <- ramen %>% 
  count(Style) %>% 
  arrange(desc(n))

distplot2 %>% 
  hchart(
    "pie", hcaes(x = Style, y = n),
    name = "Packaging Distribution"
  ) %>% 
  hc_title(text = "Packaging Style Distribution",
    margin = 20,
    align = "left",
    style = list(useHTML = TRUE))
#distplot2 %>% 
#  hchart(
#    "treemap", hcaes(x= Style, value = n, color = n)
#    ) %>% 
#  hc_title(text = "Packaging Style Distribution",
#    margin = 20,
#    align = "left",
#    style = list(useHTML = TRUE)) %>% 
#  hc_colorAxis(stops = color_stops(colors = viridis::plasma(10)))
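
A pie chart shows shares rather than raw counts, so it can help to have the percentages as numbers too. Here is a minimal base R sketch on a toy vector of styles; the values are made up for illustration and are not the counts from the dataset.

```r
# Toy packaging styles (assumed values, for illustration only)
style <- c("Pack", "Pack", "Bowl", "Cup", "Pack", "Bowl", "Tray", "Cup")

# Count each style, then convert the counts to percentage shares
style_counts <- sort(table(style), decreasing = TRUE)
style_share <- round(100 * prop.table(style_counts), 1)

style_share
```

On the real data, the same two lines could be applied to `ramen$Style`.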

5. What do customers think about Instant Noodles?

To answer this question, we will look at the average rating of the top 5 ramen-producing countries (Japan, United States, South Korea, Taiwan and China), with Indonesia as a bonus.

# Average rating per country
avg_us <- mean(ramen$Stars[ramen$Country == "United States"])
avg_japan <- mean(ramen$Stars[ramen$Country == "Japan"])
avg_korea <- mean(ramen$Stars[ramen$Country == "South Korea"])
avg_taiwan <- mean(ramen$Stars[ramen$Country == "Taiwan"])
avg_china <- mean(ramen$Stars[ramen$Country == "China"])
avg_indonesia <- mean(ramen$Stars[ramen$Country == "Indonesia"])

# Collect the averages into one data frame with columns Country and Mean
top5.mean <- data.frame(
  Country = c("United States", "Japan", "South Korea", "Taiwan", "China", "Indonesia"),
  Mean = c(avg_us, avg_japan, avg_korea, avg_taiwan, avg_china, avg_indonesia)
)

top5_plot <- ggplot(top5.mean) +
  aes(x = reorder(Country, Mean), y = Mean, fill = Country,
      text = glue("Rating {round(Mean, 2)}")) +
  geom_col() +
  scale_fill_viridis_d(option = "inferno") +
  labs(y = "Ratings", x = "Country",
       title = "Average Ratings of the Top 5 Countries (+ Indonesia)") +
  coord_flip() +
  theme_minimal() +
  theme(legend.position = "none")
ggplotly(top5_plot, tooltip = "text") %>%
  config(displayModeBar = FALSE)
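
An average says little without the number of reviews behind it, so it may be worth computing both per country. This is a minimal base R sketch on toy data; the data frame `toy` and its values are hypothetical and only illustrate the idea.

```r
# Toy data (assumed values, for illustration only)
toy <- data.frame(
  Country = c("Japan", "Japan", "Japan", "Indonesia", "Indonesia", "China"),
  Stars   = c(4.0, 3.5, 5.0, 4.5, 4.0, 3.0)
)

# Mean rating and number of reviews per country
means  <- tapply(toy$Stars, toy$Country, mean)
counts <- tapply(toy$Stars, toy$Country, length)

by_country <- data.frame(
  Country = names(means),
  Mean    = as.vector(means),
  N       = as.vector(counts)
)

# Highest average first
by_country[order(-by_country$Mean), ]
```

In this toy example a country with only two reviews can come out on top, which is why the count is shown next to the mean.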

5.1 What is the distribution of ratings in the top 5 countries (+ Indonesia)?

# Bin ratings into one-star ranges; each range includes its lower bound, and 5 is placed in "4-5"
ramen$Stars.Range <- c("0-1", "1-2", "2-3", "3-4", "4-5")[findInterval(ramen$Stars, c(0, 1, 2, 3, 4, Inf))]
top5_range <- ramen %>% 
  filter(Country %in% c("Japan", "United States", "South Korea", "Taiwan", "China", "Indonesia")) %>% 
  count(Stars.Range, Country) 

top5_range_plot <- ggplot(top5_range) +
  aes(x = Country, fill = Stars.Range, weight = n,
      text = glue("Quantity {n}\nStars Range {Stars.Range}")) +
  geom_bar() +
  scale_fill_hue() +
  labs(x = "Countries", y = NULL,
       title = "Distribution of Ratings in the Top 5 Countries (+ Indonesia)") +
  theme_minimal() +
  theme(legend.position = "none")
ggplotly(top5_range_plot, tooltip = "text") %>%
  config(displayModeBar = FALSE)
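
The binning step relies on how `findInterval()` treats values that sit exactly on a boundary. The following small sketch uses a hand-picked toy vector of ratings (not taken from the dataset) to show where the edge cases land.

```r
# Same breaks and labels as the binning step above
breaks <- c(0, 1, 2, 3, 4, Inf)
labels <- c("0-1", "1-2", "2-3", "3-4", "4-5")

# Hand-picked ratings, including values that sit exactly on a boundary
stars <- c(0, 0.99, 1, 3.75, 4, 5)
bins <- labels[findInterval(stars, breaks)]

data.frame(stars, bins)
```

Each range includes its lower bound, so a rating of exactly 1 goes to "1-2" and a rating of exactly 4 goes to "4-5", together with the maximum rating of 5.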

Although Indonesia has the highest average rating for instant ramen products, the distribution shows that it has significantly less variety than the top four countries (Japan, United States, South Korea and Taiwan). In my opinion, the markets in China and Indonesia are not as mature as those of the other four countries in this plot.

Among the top four countries, Japan has the highest rating and the widest variety of instant ramen products. This suggests that Japan is currently the strongest instant ramen market, since its products are rated most highly.

6. Correlation between Instant Ramen Style and Reviews in Each Country

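# Example for one country (Japan), keeping ratings at or above a minimum of 0
cor <- ramen %>%
  filter(Country == "Japan",
         Stars >= 0.0)

# Number of reviewed products per packaging style
cor %>%
  count(Style) %>%
  arrange(desc(n))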
##   Style   n
## 1  Pack 232
## 2  Bowl 204
## 3   Cup 122
## 4  Tray  51
## 5   Box  17
cor_plot <- ggplot(cor) +
  aes(x = Style, y = Stars, fill = Style) +
  geom_boxplot() +
  scale_fill_viridis_d(option = "plasma") +
  labs(x = "Instant Ramen Style", y = "Stars Given (Reviews)",
       title = "Ratings (Stars) correlation with Instant Ramen Style") +
  theme_minimal() +
  theme(legend.position = "none")
ggplotly(cor_plot) %>%
  config(displayModeBar = FALSE)
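
The box plot above can be complemented with the numbers it is drawn from. As a minimal sketch, the median, interquartile range and sample size per style can be tabulated in base R; the data frame `toy` and its values are hypothetical and do not come from the dataset.

```r
# Toy data (assumed values, for illustration only)
toy <- data.frame(
  Style = c("Pack", "Pack", "Pack", "Bowl", "Bowl", "Cup", "Cup", "Cup"),
  Stars = c(3.5, 4.0, 5.0, 3.0, 4.5, 2.5, 3.0, 3.5)
)

# Split the ratings by style and summarise each group
by_style <- split(toy$Stars, toy$Style)

style_summary <- data.frame(
  Style  = names(by_style),
  N      = sapply(by_style, length),
  Median = sapply(by_style, median),
  IQR    = sapply(by_style, IQR),
  row.names = NULL
)

style_summary
```

A style with very few products, such as "Box" in the Japan counts above, gives a much less reliable median than "Pack" or "Bowl", so the `N` column is worth reading alongside the plot.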