Intro:

This dataset contains user reviews of ramen products: the brand, the variety, the style, the country the ramen is from, and a star rating. The dataset comes from Kaggle, and I got it from my classmate Benson, so thanks to Benson. I answered some questions that Benson wanted to analyze, along with a question from a Kaggle user called Passerby1. I performed data transformation and analysis to the best of my ability. Not much data cleaning was required, but the dataset has a lot of observations, which I filtered and summarized using dplyr and tidyr.

Reading the data

I removed the Top.Ten and Review.. columns in the analysis below. Top.Ten is blank in the rows shown here, and neither column is relevant to the analysis I was performing.

## Project 2 Read the text file
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.7
## v tidyr   1.2.0     v stringr 1.4.0
## v readr   2.1.1     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(tidyr)
library(dplyr)
data <- read.csv("https://raw.githubusercontent.com/AldataSci/Project2-Data607/main/ramen-ratings.csv",header=TRUE,sep=",")

head(data)
##   Review..          Brand
## 1     2580      New Touch
## 2     2579       Just Way
## 3     2578         Nissin
## 4     2577        Wei Lih
## 5     2576 Ching's Secret
## 6     2575  Samyang Foods
##                                                       Variety Style     Country
## 1                                   T's Restaurant Tantanmen    Cup       Japan
## 2 Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles  Pack      Taiwan
## 3                               Cup Noodles Chicken Vegetable   Cup         USA
## 4                               GGE Ramen Snack Tomato Flavor  Pack      Taiwan
## 5                                             Singapore Curry  Pack       India
## 6                                      Kimchi song Song Ramen  Pack South Korea
##   Stars Top.Ten
## 1  3.75        
## 2     1        
## 3  2.25        
## 4  2.75        
## 5  3.75        
## 6  4.75
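
The column removal described above could also be done once, right after reading the file, so that later pipelines start from the trimmed table. This is only a minimal sketch: `toy` is a hypothetical three-row stand-in whose column names mirror the `head(data)` output, and `ramen` is an illustrative name that does not appear in the original analysis.

```r
library(dplyr)

# Hypothetical stand-in for the ramen data (values copied from the first
# three rows of the head(data) output; only some columns are included).
toy <- data.frame(
  Review.. = c(2580, 2579, 2578),
  Brand    = c("New Touch", "Just Way", "Nissin"),
  Stars    = c("3.75", "1", "2.25"),
  Top.Ten  = c("", "", ""),
  stringsAsFactors = FALSE
)

# Drop the two unused columns once and keep the result for later steps.
ramen <- toy %>%
  select(-c(Top.Ten, Review..))

names(ramen)
```

With the real data, the same `select()` call would be applied to `data` instead of `toy`.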

Most Reviews given to a particular Ramen Brand

We can use this dataset to analyze the favorite flavor, the best brand, the ramen style, and more. The brand with the most reviews is Nissin, with 381.

data %>% 
  select(-c("Top.Ten","Review..")) %>%
  group_by(Brand) %>%
  count(Brand) %>%
  arrange(desc(n))
## # A tibble: 355 x 2
## # Groups:   Brand [355]
##    Brand             n
##    <chr>         <int>
##  1 Nissin          381
##  2 Nongshim         98
##  3 Maruchan         76
##  4 Mama             71
##  5 Paldo            66
##  6 Myojo            63
##  7 Indomie          53
##  8 Samyang Foods    52
##  9 Ottogi           46
## 10 Lucky Me!        34
## # ... with 345 more rows
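
The group, count, and sort steps above can likely be collapsed into a single call, because `count()` groups by the columns it is given and accepts a `sort` argument. The sketch below uses a small made-up `toy` data frame rather than the real file, so the counts are illustrative only.

```r
library(dplyr)

# Made-up brand column, for illustration only.
toy <- data.frame(
  Brand = c("Nissin", "Nissin", "Nongshim", "Nissin", "Nongshim", "Paldo"),
  stringsAsFactors = FALSE
)

# count() groups by Brand internally; sort = TRUE orders by n, largest first.
brand_counts <- toy %>%
  count(Brand, sort = TRUE)

brand_counts
```

One difference from the pipeline above is that the result is not left grouped, so the `# Groups:` line would not appear in the printed tibble.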

Highest rating of flavors among users (Favorite Variety)

To find the favorite flavors, I selected the relevant columns: the brand, its variety, and the rating it received. I grouped the data by variety and kept only the reviews with a rating of four stars or above, since a rating that high suggests the reviewer really enjoyed that flavor and that it could be a favorite. I then counted how many of these high ratings each variety received and kept the varieties with a count above 1, since a single high rating says little about how widely a flavor is liked. Finally, I arranged the counts in descending order. Yakisoba comes out on top, with 5 ratings of four stars or above. This doesn’t mean it is the favorite flavor among users, but it tells me that the yakisoba variety has been rated pretty highly.

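## Analyze the favorite flavor: Yakisoba has the most ratings of four stars or above
data %>%
  select(Brand,Variety,Stars) %>%
  group_by(Variety) %>%
  # Stars is read in as character, so this is a string comparison, not a numeric one
  filter(Stars>="4") %>%
  count(Variety) %>%
  # keep only varieties with more than one high rating
  filter(n>1) %>%
  arrange(desc(n))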
## # A tibble: 39 x 2
## # Groups:   Variety [39]
##    Variety                           n
##    <chr>                         <int>
##  1 Yakisoba                          5
##  2 Artificial Chicken                3
##  3 Curry Flavour Instant Noodles     3
##  4 Curry Udon                        3
##  5 Kokomen Spicy Chicken             3
##  6 Artificial Hot & Sour Shrimp      2
##  7 Bul Jjamppong                     2
##  8 Chef Creamy Tom Yam Flavour       2
##  9 Chef Curry Laksa Flavour          2
## 10 Chef Lontong Flavour              2
## # ... with 29 more rows
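
One caveat about the filter above: the `<chr>` type shown for `Stars` in a later output means the ratings are stored as text, so `Stars>="4"` compares strings rather than numbers. A hedged sketch of a numeric version follows. The `toy` data frame and the `"Unrated"` label are assumptions made for illustration; the source does not show which non-numeric values the real column contains.

```r
library(dplyr)

# Made-up ratings stored as text, including one assumed non-numeric label.
toy <- data.frame(
  Variety = c("Yakisoba", "Yakisoba", "Curry Udon", "Curry Udon", "Mystery"),
  Stars   = c("5", "4.25", "4", "3.5", "Unrated"),
  stringsAsFactors = FALSE
)

favorites <- toy %>%
  # Non-numeric labels become NA; the coercion warning is silenced here.
  mutate(Stars = suppressWarnings(as.numeric(Stars))) %>%
  # filter() drops rows where the condition is NA, so the label row is removed.
  filter(Stars >= 4) %>%
  count(Variety, sort = TRUE) %>%
  # Keep varieties with more than one high rating.
  filter(n > 1)

favorites
```

Whether this changes the result on the real data depends on what the non-numeric entries are, so it is worth re-running the analysis both ways before relying on either.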

Best rated ramen brand among users

To find the best rated ramen brand among users, I selected the columns that were important, i.e. Brand, Country, and Stars. I then grouped the data by Brand and Stars, because I needed a count for each combination of brand and rating. After that I counted each group and kept the groups with a rating of 4.5 or above and a count greater than 5, then arranged them in descending order. From the result I can see that Nissin has by far the most high ratings: 68 five-star reviews, 21 reviews at 4.5 stars, and 10 at 4.75 stars.

## Finding the best rated ramen brand among users: 

data %>%
  select(Brand,Country,Stars) %>%
  group_by(Brand,Stars) %>%
  summarise(count=n()) %>%
  filter(Stars>=4.5) %>%
  filter(count>5) %>%
  arrange(desc(count))
## `summarise()` has grouped output by 'Brand'. You can override using the `.groups` argument.
## # A tibble: 18 x 3
## # Groups:   Brand [14]
##    Brand            Stars count
##    <chr>            <chr> <int>
##  1 Nissin           5        68
##  2 Nissin           4.5      21
##  3 MyKuali          5        19
##  4 Nongshim         5        19
##  5 Indomie          5        16
##  6 Paldo            5        16
##  7 Mama             5        14
##  8 KOKA             5        10
##  9 Nissin           4.75     10
## 10 Nongshim         4.5       9
## 11 Samyang Foods    5         9
## 12 Mamee            5         8
## 13 Prima Taste      5         7
## 14 Sapporo Ichiban  5         7
## 15 A-Sha Dry Noodle 5         6
## 16 Indomie          4.5       6
## 17 MAMA             5         6
## 18 Myojo            4.5       6
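
The table above lists `Mama` and `MAMA` as separate brands, which suggests that brand names differ in capitalization in the raw data. If these are the same brand (an assumption; the source does not confirm it), their counts could be merged by grouping on a case-insensitive key. The sketch below uses a made-up `toy` data frame, and `brand_key` is a hypothetical helper column.

```r
library(dplyr)

# Made-up rows that mimic the Mama / MAMA split seen in the output.
toy <- data.frame(
  Brand = c("Mama", "MAMA", "Mama", "Nissin"),
  Stars = c(5, 5, 4.5, 5),
  stringsAsFactors = FALSE
)

merged_counts <- toy %>%
  # Assumption: spellings that differ only by case refer to the same brand.
  mutate(brand_key = tolower(Brand)) %>%
  filter(Stars >= 4.5) %>%
  count(brand_key, sort = TRUE)

merged_counts
```

Here the three Mama rows are counted together under the key `mama` instead of being split across two spellings.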

Which country has the most reviewed ramen products

I selected the variety and country columns, grouped the data by country, counted the rows for each country, and arranged the counts in descending order. From the result we can see that most of the reviewed ramen products are from Japan (352), followed by the USA (323) and South Korea (309).

Countr_y <- data %>% 
  select(Variety,Country) %>%
  group_by(Country) %>%
  count(Country) %>%
  arrange(desc(n))
Countr_y
## # A tibble: 38 x 2
## # Groups:   Country [38]
##    Country         n
##    <chr>       <int>
##  1 Japan         352
##  2 USA           323
##  3 South Korea   309
##  4 Taiwan        224
##  5 Thailand      191
##  6 China         169
##  7 Malaysia      156
##  8 Hong Kong     137
##  9 Indonesia     126
## 10 Singapore     109
## # ... with 28 more rows

Here I made a simple bar graph of the number of reviewed products per country.

ggplot(data=Countr_y) + 
  aes(x=Country, y=n) + geom_bar(stat="Identity") +
  coord_flip()
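
By default the bars are placed in alphabetical order of country, which makes the ranking hard to read. A possible refinement, sketched below, orders the bars by count with `reorder()`. The `country_counts` data frame is an illustrative name, and it holds only the top five rows copied from the printed table above, not the full result.

```r
library(ggplot2)

# Top five rows copied from the printed country counts.
country_counts <- data.frame(
  Country = c("Japan", "USA", "South Korea", "Taiwan", "Thailand"),
  n       = c(352, 323, 309, 224, 191),
  stringsAsFactors = FALSE
)

# reorder() sorts the countries by n, so the largest bar ends up on top
# after coord_flip(); geom_col() plots the values in n as bar lengths.
p <- ggplot(country_counts, aes(x = reorder(Country, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(x = "Country", y = "Number of reviewed products")

p
```

With the full `Countr_y` table, limiting the plot to the first ten or so rows would probably keep the labels readable.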

Conclusion:

I had fun cleaning and tidying Benson’s ramen dataset. I think it’s crazy that there are so many ramen brands, varieties, and styles all over the world. I find it amusing that the US has the second-highest number of reviewed ramen products. The fact that so many of the reviews are of Nissin noodle products just shows how popular this brand is.