Data 607 Project 2A

I am using the dataset posted by Benson Yik Seong Toi.

Dataset: The Ramen rater, “THE BIG LIST,” 2021

Link: https://www.kaggle.com/residentmario/ramen-ratings

This dataset records ramen product reviews. At the time of writing it contains more than 2,500 reviews and is updated as new ramen products reach the market.

I will analyze the data to find the following:

1. Top Ranking Brand
2. Highest Rated Brand
3. Top Ranking Brand by Country

Import Libraries

library(tidyverse)

## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --

## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.7
## v tidyr   1.2.0     v stringr 1.4.0
## v readr   2.1.2     v forcats 0.5.1

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(readr)
library(curl)

## Warning: package 'curl' was built under R version 4.1.3

## Using libcurl 7.64.1 with Schannel

## 
## Attaching package: 'curl'

## The following object is masked from 'package:readr':
## 
##     parse_date

# install.packages("curl")   # uncomment to install curl if needed
library(ggplot2)
# install.packages("ggmap")  # uncomment to install ggmap if needed
library(dplyr)
library(stringr)
library(magrittr)

## 
## Attaching package: 'magrittr'

## The following object is masked from 'package:purrr':
## 
##     set_names

## The following object is masked from 'package:tidyr':
## 
##     extract

Read the data from my GitHub repository and load it into an R data frame

df<-read.csv("https://raw.githubusercontent.com/deepasharma06/Data-607/main/ramen-ratings%20Dataset%20by%20Benson.csv")
head(df)

##   Review..          Brand
## 1     2580      New Touch
## 2     2579       Just Way
## 3     2578         Nissin
## 4     2577        Wei Lih
## 5     2576 Ching's Secret
## 6     2575  Samyang Foods
##                                                       Variety Style     Country
## 1                                   T's Restaurant Tantanmen    Cup       Japan
## 2 Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles  Pack      Taiwan
## 3                               Cup Noodles Chicken Vegetable   Cup         USA
## 4                               GGE Ramen Snack Tomato Flavor  Pack      Taiwan
## 5                                             Singapore Curry  Pack       India
## 6                                      Kimchi song Song Ramen  Pack South Korea
##   Stars Top.Ten
## 1  3.75        
## 2     1        
## 3  2.25        
## 4  2.75        
## 5  3.75        
## 6  4.75

The Top.Ten column combines a year and a position (for example "2016 #10"). This code splits it into separate Year and Ranking columns.

df <- df %>%
  separate(Top.Ten,into=c("Year","Ranking"),sep=" \\#")

## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 2543 rows [1, 2,
## 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

head(df)

##   Review..          Brand
## 1     2580      New Touch
## 2     2579       Just Way
## 3     2578         Nissin
## 4     2577        Wei Lih
## 5     2576 Ching's Secret
## 6     2575  Samyang Foods
##                                                       Variety Style     Country
## 1                                   T's Restaurant Tantanmen    Cup       Japan
## 2 Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles  Pack      Taiwan
## 3                               Cup Noodles Chicken Vegetable   Cup         USA
## 4                               GGE Ramen Snack Tomato Flavor  Pack      Taiwan
## 5                                             Singapore Curry  Pack       India
## 6                                      Kimchi song Song Ramen  Pack South Korea
##   Stars Year Ranking
## 1  3.75         <NA>
## 2     1         <NA>
## 3  2.25         <NA>
## 4  2.75         <NA>
## 5  3.75         <NA>
## 6  4.75         <NA>
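The warning above appears because rows with an empty Top.Ten value produce only one piece. As a minimal sketch on a toy data frame (the values below are illustrative, not taken from the dataset), tidyr's `fill = "right"` argument pads the missing piece with NA and avoids the warning:

```r
library(tidyr)

# Toy stand-in for the Top.Ten column; values are made up for illustration
toy <- data.frame(Top.Ten = c("2016 #10", "", "2015 #1"),
                  stringsAsFactors = FALSE)

# fill = "right" pads missing pieces on the right with NA, without a warning
toy_split <- separate(toy, Top.Ten, into = c("Year", "Ranking"),
                      sep = " #", fill = "right")
print(toy_split)
```

As in the real data, the empty row keeps an empty string in Year and gets NA in Ranking.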

Data Tidy: The records where the year was not populated were not clean, and other fields, such as the rating, were sometimes unpopulated as well. I concluded that these rows should be ignored, so I created a separate data frame without the rows where the year is missing, since the ranking analysis would be meaningless without the year. Before doing that, I converted the Ranking and Stars fields from character to numeric types.

# Find out the data type of all columns in df
sapply(df, class)

##    Review..       Brand     Variety       Style     Country       Stars 
##   "integer" "character" "character" "character" "character" "character" 
##        Year     Ranking 
## "character" "character"

# Convert Ranking to integer and Stars to numeric
# (as.integer() would truncate fractional ratings such as 3.75 to 3)
df$Ranking <- as.integer(df$Ranking)
df$Stars <- as.numeric(df$Stars)

## Warning: NAs introduced by coercion

# Verify conversion
sapply(df, class)

##    Review..       Brand     Variety       Style     Country       Stars 
##   "integer" "character" "character" "character" "character"   "numeric" 
##        Year     Ranking 
## "character"   "integer"
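One thing worth checking in a conversion step like this is how fractional ratings are handled. The sketch below uses illustrative strings (the "Unrated" entry is an assumed example of a non-numeric value) to show that `as.integer()` truncates while `as.numeric()` keeps the decimals:

```r
# Illustrative values only; "Unrated" stands in for any non-numeric entry
stars_raw <- c("3.75", "5", "Unrated")

# as.integer() drops the fractional part; as.numeric() preserves it.
# Both turn non-numeric text into NA (with a coercion warning, suppressed here).
stars_int <- suppressWarnings(as.integer(stars_raw))
stars_num <- suppressWarnings(as.numeric(stars_raw))

print(stars_int)
print(stars_num)
```

For star ratings on a quarter-point scale, the numeric version is the one that keeps 3.75 distinct from 3.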

Then I remove the rows where the year is not populated.

df$Year[df$Year == ''] <- NA

df1 <- na.omit(df) 
head(df1)

##     Review..         Brand                                         Variety
## 617     1964          MAMA            Instant Noodles Coconut Milk Flavour
## 634     1947   Prima Taste              Singapore Laksa Wholegrain La Mian
## 656     1925         Prima               Juzz's Mee Creamy Chicken Flavour
## 674     1907   Prima Taste              Singapore Curry Wholegrain La Mian
## 753     1828 Tseng Noodles            Scallion With Sichuan Pepper  Flavor
## 892     1689  Wugudaochang Tomato Beef Brisket Flavor Purple Potato Noodle
##     Style   Country Stars Year Ranking
## 617  Pack   Myanmar     5 2016      10
## 634  Pack Singapore     5 2016       1
## 656  Pack Singapore     5 2016       8
## 674  Pack Singapore     5 2016       5
## 753  Pack    Taiwan     5 2016       9
## 892  Pack     China     5 2016       7
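Note that `na.omit()` drops a row if any column is NA, not only Year. As a hedged alternative, the sketch below uses `tidyr::drop_na()` on a toy data frame (hypothetical brands and values) to drop rows only when Year or Ranking is missing:

```r
library(tidyr)

# Toy data: brand B is ranked but has no star rating, brand C is unranked
toy <- data.frame(
  Brand   = c("A", "B", "C"),
  Stars   = c(5, NA, 4.5),
  Year    = c("2016", "2014", ""),
  Ranking = c(10L, 3L, NA),
  stringsAsFactors = FALSE
)
toy$Year[toy$Year == ""] <- NA

strict <- na.omit(toy)                 # drops B as well, because Stars is NA
ranked <- drop_na(toy, Year, Ranking)  # keeps B, drops only the unranked row
print(ranked)
```

Which behaviour is preferable depends on whether a ranked product without a star rating should still count in the ranking analysis.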

Analysis:

Highest Rated Brand (by Stars)

df[which.max(df$Stars ),]

##    Review..       Brand                     Variety Style  Country Stars Year
## 11     2570 Tao Kae Noi Creamy tom Yum Kung Flavour  Pack Thailand     5 <NA>
##    Ranking
## 11      NA

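Tao Kae Noi brand’s Creamy tom Yum Kung Flavour is the first product in the data with the maximum rating of 5 stars. Because which.max() returns only the first row that reaches the maximum, other products may share this rating. Interestingly, this product does not appear in any Top Ten ranking (its Year and Ranking are NA).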
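Since `which.max()` reports only the first maximum, a tie-aware query can be useful. A minimal sketch on a toy data frame (hypothetical brands and ratings) comparing the two approaches:

```r
library(dplyr)

# Toy ratings: brands B and C tie at the maximum
toy <- data.frame(
  Brand = c("A", "B", "C", "D"),
  Stars = c(3.75, 5, 5, NA),
  stringsAsFactors = FALSE
)

first_max <- toy[which.max(toy$Stars), ]                    # first maximum only
all_max   <- filter(toy, Stars == max(Stars, na.rm = TRUE)) # every tied row
print(all_max)
```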

Top Ten Ranking (by Ranking)

df[which.max(df$Ranking ),]

##     Review.. Brand                              Variety Style Country Stars
## 617     1964  MAMA Instant Noodles Coconut Milk Flavour  Pack Myanmar     5
##     Year Ranking
## 617 2016      10

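MAMA brand’s Instant Noodles Coconut Milk Flavour has the largest Ranking value, 10. Because Ranking is a position in a Top Ten list, this means it placed tenth in the 2016 list rather than first. The top-ranked products are those with a Ranking of 1.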
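Because a smaller Ranking is a better position, the best-placed product is found with `which.min()` rather than `which.max()`. A small sketch with hypothetical values:

```r
# Toy rankings: 1 is the best Top Ten position, 10 the lowest; NA is unranked
toy <- data.frame(
  Brand   = c("A", "B", "C"),
  Ranking = c(10L, 1L, NA),
  stringsAsFactors = FALSE
)

best_placed   <- toy[which.min(toy$Ranking), ]  # position #1
lowest_placed <- toy[which.max(toy$Ranking), ]  # position #10
print(best_placed)
```

Both functions skip NA values, so unranked rows do not affect the result.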

Average Ranking for any brand by country

# avg_rating holds the mean Top Ten position per country (lower is better)
df1 %>%
    group_by(Country) %>%
    summarise(avg_rating = round(mean(Ranking)),
              count = n()) %>%
    arrange(desc(avg_rating))

## # A tibble: 11 x 3
##    Country     avg_rating count
##    <chr>            <dbl> <int>
##  1 Myanmar             10     1
##  2 Taiwan              10     2
##  3 Hong Kong            9     1
##  4 Thailand             9     3
##  5 China                7     1
##  6 South Korea          7     5
##  7 Japan                5     6
##  8 Singapore            5     7
##  9 Malaysia             4     6
## 10 USA                  4     1
## 11 Indonesia            3     4

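Myanmar and Taiwan are the two countries with the largest average ranking value (10) across all brands. Since a lower Ranking is a better Top Ten position, these are the lowest average placements. Indonesia, with an average of 3, has the best.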
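To list the best-placed countries first, the summary can be sorted in ascending order. A minimal sketch on toy data (the countries X, Y, Z and the column name `avg_ranking` are hypothetical):

```r
library(dplyr)

# Toy Top Ten positions for three made-up countries
toy <- data.frame(
  Country = c("X", "X", "Y", "Z", "Z"),
  Ranking = c(1L, 3L, 10L, 4L, 5L),
  stringsAsFactors = FALSE
)

by_country <- toy %>%
  group_by(Country) %>%
  summarise(avg_ranking = mean(Ranking), count = n()) %>%
  arrange(avg_ranking)  # ascending: best average position first
print(by_country)
```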

# Bar chart of the mean Top Ten position per country (lower is better)
df1 %>%
    ggplot(aes(x = Ranking, y = Country)) +
    geom_bar(stat = "summary",
        fun = "mean") +
    ggtitle("Average Ranking by Country") +
    xlab("Average ranking (1 = best)") + ylab("Country")
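An alternative way to draw this chart is to summarise first and plot the result with `geom_col()`, ordering the bars by average ranking. The sketch below uses a toy summary table (hypothetical countries and values), not the real results:

```r
library(ggplot2)

# Toy per-country averages, standing in for the summary table
toy_avg <- data.frame(
  Country     = c("X", "Y", "Z"),
  avg_ranking = c(2, 10, 4.5)
)

# reorder() sorts the bars so the best (lowest) average ranking is at the top
p <- ggplot(toy_avg, aes(x = avg_ranking, y = reorder(Country, -avg_ranking))) +
  geom_col() +
  labs(title = "Average Ranking by Country",
       x = "Average ranking (1 = best)", y = "Country")
print(p)
```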

Conclusion:

Based on the analysis, Myanmar and Taiwan have the highest average ranking value (10) and Indonesia has the lowest (3). Because a Ranking of 1 is the best Top Ten position, Indonesia’s ranked noodles placed best on average, while those from Myanmar and Taiwan placed lowest. From the table above, Singapore has the highest number (7) of ranked noodles. The USA has only one ranked noodle: Nongshim brand’s “Jinjja Jinjja Flamin’ Hot & Nutty”. It has a star rating of 5, but its ranking is 4 out of 10.

References:

“How to Find the Highest Value of a Column in a Data Frame in R?” Stack Overflow, 13 June 2014, https://stackoverflow.com/questions/24212739/how-to-find-the-highest-value-of-a-column-in-a-data-frame-in-r