Dataset: The Ramen Rater, “THE BIG LIST,” 2021
Link: https://www.kaggle.com/residentmario/ramen-ratings
This dataset records ramen product reviews. To date it contains more than 2,500 reviews, and it continues to be updated as new ramen products reach the market.
I will analyze the data to find the following:
1. Top-ranking brand
2. Highest-rated brand
3. Top-ranking brand by country
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.6 v dplyr 1.0.7
## v tidyr 1.2.0 v stringr 1.4.0
## v readr 2.1.2 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(readr)
library(curl)
## Warning: package 'curl' was built under R version 4.1.3
## Using libcurl 7.64.1 with Schannel
##
## Attaching package: 'curl'
## The following object is masked from 'package:readr':
##
## parse_date
##install.packages("curl")
library(ggplot2)
##install.packages("ggmap")
library(dplyr)
library(stringr)
library("magrittr")
##
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
##
## set_names
## The following object is masked from 'package:tidyr':
##
## extract
df<-read.csv("https://raw.githubusercontent.com/deepasharma06/Data-607/main/ramen-ratings%20Dataset%20by%20Benson.csv")
head(df)
## Review.. Brand
## 1 2580 New Touch
## 2 2579 Just Way
## 3 2578 Nissin
## 4 2577 Wei Lih
## 5 2576 Ching's Secret
## 6 2575 Samyang Foods
## Variety Style Country
## 1 T's Restaurant Tantanmen Cup Japan
## 2 Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles Pack Taiwan
## 3 Cup Noodles Chicken Vegetable Cup USA
## 4 GGE Ramen Snack Tomato Flavor Pack Taiwan
## 5 Singapore Curry Pack India
## 6 Kimchi song Song Ramen Pack South Korea
## Stars Top.Ten
## 1 3.75
## 2 1
## 3 2.25
## 4 2.75
## 5 3.75
## 6 4.75
df <- df %>%
separate(Top.Ten,into=c("Year","Ranking"),sep=" \\#")
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 2543 rows [1, 2,
## 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
head(df)
## Review.. Brand
## 1 2580 New Touch
## 2 2579 Just Way
## 3 2578 Nissin
## 4 2577 Wei Lih
## 5 2576 Ching's Secret
## 6 2575 Samyang Foods
## Variety Style Country
## 1 T's Restaurant Tantanmen Cup Japan
## 2 Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles Pack Taiwan
## 3 Cup Noodles Chicken Vegetable Cup USA
## 4 GGE Ramen Snack Tomato Flavor Pack Taiwan
## 5 Singapore Curry Pack India
## 6 Kimchi song Song Ramen Pack South Korea
## Stars Year Ranking
## 1 3.75 <NA>
## 2 1 <NA>
## 3 2.25 <NA>
## 4 2.75 <NA>
## 5 3.75 <NA>
## 6 4.75 <NA>
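The split of a combined "year #rank" value into two columns can be sketched on a small made-up data frame. This is only an illustration: the `toy` data and its values are assumptions, not rows from the ramen dataset. Passing `fill = "right"` to `separate()` fills the missing piece with `NA` without emitting the "Expected 2 pieces" warning.

```r
library(tidyr)

# Toy data mimicking the Top.Ten column (values are invented for illustration)
toy <- data.frame(
  Brand   = c("A", "B", "C"),
  Top.Ten = c("2016 #10", "", "2015 #1"),
  stringsAsFactors = FALSE
)

# Split on " #"; rows without a ranking get NA in the Ranking column
toy_split <- separate(toy, Top.Ten, into = c("Year", "Ranking"),
                      sep = " \\#", fill = "right")

# Ranking is still character after the split, so convert it explicitly
toy_split$Ranking <- as.integer(toy_split$Ranking)
print(toy_split)
```

Rows whose original value was empty keep an empty string in `Year`, so they still need to be recoded to `NA` before filtering.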
# Find the data type of every column in df
sapply(df, class)
## Review.. Brand Variety Style Country Stars
## "integer" "character" "character" "character" "character" "character"
## Year Ranking
## "character" "character"
# Convert Ranking to integer and Stars to numeric
# (as.integer would truncate ratings such as 3.75 to 3)
df$Ranking <- as.integer(df$Ranking)
df$Stars <- as.numeric(df$Stars)
## Warning: NAs introduced by coercion
# Verify conversion
sapply(df, class)
## Review.. Brand Variety Style Country Stars
## "integer" "character" "character" "character" "character" "numeric"
## Year Ranking
## "character" "integer"
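The choice of conversion function matters for a character column of star ratings. The sketch below uses an invented vector (`stars_raw` is hypothetical, and the "Unrated" entry is an assumed example of a non-numeric value) to compare `as.integer()` with `as.numeric()`.

```r
# Hypothetical raw ratings stored as text; "Unrated" stands in for any
# non-numeric entry that triggers the coercion warning
stars_raw <- c("3.75", "1", "2.25", "Unrated")

# as.integer() drops the fractional part; as.numeric() keeps it
as_int <- suppressWarnings(as.integer(stars_raw))
as_num <- suppressWarnings(as.numeric(stars_raw))

print(data.frame(stars_raw, as_int, as_num))
```

In both cases the non-numeric entry becomes `NA`, which is the source of the "NAs introduced by coercion" warning.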
df$Year[df$Year == ''] = NA
df1 <- na.omit(df)
head(df1)
## Review.. Brand Variety
## 617 1964 MAMA Instant Noodles Coconut Milk Flavour
## 634 1947 Prima Taste Singapore Laksa Wholegrain La Mian
## 656 1925 Prima Juzz's Mee Creamy Chicken Flavour
## 674 1907 Prima Taste Singapore Curry Wholegrain La Mian
## 753 1828 Tseng Noodles Scallion With Sichuan Pepper Flavor
## 892 1689 Wugudaochang Tomato Beef Brisket Flavor Purple Potato Noodle
## Style Country Stars Year Ranking
## 617 Pack Myanmar 5 2016 10
## 634 Pack Singapore 5 2016 1
## 656 Pack Singapore 5 2016 8
## 674 Pack Singapore 5 2016 5
## 753 Pack Taiwan 5 2016 9
## 892 Pack China 5 2016 7
df[which.max(df$Stars ),]
## Review.. Brand Variety Style Country Stars Year
## 11 2570 Tao Kae Noi Creamy tom Yum Kung Flavour Pack Thailand 5 <NA>
## Ranking
## 11 NA
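Tao Kae Noi’s Creamy tom Yum Kung Flavour is the first noodle in the data with the highest star rating (5 stars). Note that which.max() returns only the first row that reaches the maximum, and other noodles also have 5 stars. Interestingly, this product does not appear in any Top Ten ranking.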
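To see every row that shares the maximum rather than just the first one, the rows can be filtered against `max()`. A minimal sketch on invented data (the `toy` data frame is an assumption for illustration only):

```r
# Toy ratings with a tie for the maximum and one missing value
toy <- data.frame(
  Brand = c("A", "B", "C", "D"),
  Stars = c(4.5, 5, NA, 5),
  stringsAsFactors = FALSE
)

# which.max() stops at the first maximum
first_max <- toy[which.max(toy$Stars), ]

# Comparing against max() keeps all tied rows; which() drops the NA comparison
all_max <- toy[which(toy$Stars == max(toy$Stars, na.rm = TRUE)), ]

print(first_max)
print(all_max)
```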
df[which.max(df$Ranking ),]
## Review.. Brand Variety Style Country Stars
## 617 1964 MAMA Instant Noodles Coconut Milk Flavour Pack Myanmar 5
## Year Ranking
## 617 2016 10
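MAMA brand’s Instant Noodles Coconut Milk Flavour has the highest Ranking value, 10, which means it was listed at #10 in the 2016 Top Ten. Because #1 is the best position in a Top Ten list, the largest Ranking value marks the last place on the list rather than the first.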
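Since a Top Ten position of #1 is conventionally the best slot, it can be useful to look at the smallest Ranking value as well as the largest. A small sketch on made-up data (the `toy` data frame and its brands are hypothetical):

```r
# Toy rankings; NA marks a product that never appeared in a Top Ten list
toy <- data.frame(
  Brand   = c("A", "B", "C", "D"),
  Ranking = c(10L, NA, 1L, 5L),
  stringsAsFactors = FALSE
)

# Largest value: the #10 slot
largest <- toy[which.max(toy$Ranking), ]

# Smallest value: the #1 slot
smallest <- toy[which.min(toy$Ranking), ]

print(largest)
print(smallest)
```

Both `which.max()` and `which.min()` ignore `NA` values, so unranked products do not need to be removed first.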
df1 %>%
  group_by(Country) %>%
  summarise(avg_rating = round(mean(Ranking)),
            count = n()) %>%
  arrange(desc(avg_rating))
## # A tibble: 11 x 3
## Country avg_rating count
## <chr> <dbl> <int>
## 1 Myanmar 10 1
## 2 Taiwan 10 2
## 3 Hong Kong 9 1
## 4 Thailand 9 3
## 5 China 7 1
## 6 South Korea 7 5
## 7 Japan 5 6
## 8 Singapore 5 7
## 9 Malaysia 4 6
## 10 USA 4 1
## 11 Indonesia 3 4
Myanmar and Taiwan are the two countries with the highest average Ranking value (10) across brands, although each is based on only one or two ranked noodles.
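Rounding the mean to a whole number can hide differences between countries, and a mean based on one or two products is fragile. The sketch below, on invented data (the `toy` countries and rankings are assumptions), keeps one decimal place and reports the best position and the count alongside the average.

```r
library(dplyr)

# Toy Top Ten rankings for three hypothetical countries
toy <- data.frame(
  Country = c("X", "X", "Y", "Z", "Z", "Z"),
  Ranking = c(9L, 10L, 10L, 1L, 4L, 8L),
  stringsAsFactors = FALSE
)

by_country <- toy %>%
  group_by(Country) %>%
  summarise(avg_ranking = round(mean(Ranking), 1),  # keep one decimal place
            best_rank   = min(Ranking),             # smallest value = best slot
            count       = n()) %>%
  arrange(avg_ranking)                              # best average position first

print(by_country)
```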
df1 %>%
  ggplot(aes(x = Ranking, y = Country)) +
  # position (not Country) is the argument that accepts "dodge"
  geom_bar(position = "dodge",
           stat = "summary",
           fun = "mean") +
  ggtitle("Average Ranking by Country") + ylab("Country")
Based on this analysis, Myanmar has the highest average Ranking value (10, from a single noodle) and Indonesia has the lowest (3). The table above shows that Singapore has the most ranked noodles (7), while the USA has only one. Nongshim’s “Jinjja Jinjja Flamin’ Hot & Nutty” is the only ranked noodle from the USA. It has a star rating of 5 and a Top Ten ranking of #4.
“How to Find the Highest Value of a Column in a Data Frame in R?” Stack Overflow, 13 June 2014, https://stackoverflow.com/questions/24212739/how-to-find-the-highest-value-of-a-column-in-a-data-frame-in-r