This dataset contains user reviews of ramen products: the brand, the type of ramen, the country it comes from, and the rating it received. The dataset comes from Kaggle. I got it from my classmate Benson, so thanks to Benson. I answered some of the questions Benson wanted to analyze, along with some questions from a Kaggle user called Passerby1. I performed data transformation and analysis on this dataset to the best of my ability. Unfortunately, not much data cleaning was required, but the data has a lot of observations, which I summarized using dplyr and tidyr.
I removed the Top.Ten column, which is blank in the rows shown below, and the Review.. column, which only holds the review number, since neither was relevant to the analysis I was performing.
## Project 2 Read the text file
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.6 v dplyr 1.0.7
## v tidyr 1.2.0 v stringr 1.4.0
## v readr 2.1.1 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(tidyr)
library(dplyr)
data <- read.csv("https://raw.githubusercontent.com/AldataSci/Project2-Data607/main/ramen-ratings.csv",header=TRUE,sep=",")
head(data)
## Review.. Brand
## 1 2580 New Touch
## 2 2579 Just Way
## 3 2578 Nissin
## 4 2577 Wei Lih
## 5 2576 Ching's Secret
## 6 2575 Samyang Foods
## Variety Style Country
## 1 T's Restaurant Tantanmen Cup Japan
## 2 Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles Pack Taiwan
## 3 Cup Noodles Chicken Vegetable Cup USA
## 4 GGE Ramen Snack Tomato Flavor Pack Taiwan
## 5 Singapore Curry Pack India
## 6 Kimchi song Song Ramen Pack South Korea
## Stars Top.Ten
## 1 3.75
## 2 1
## 3 2.25
## 4 2.75
## 5 3.75
## 6 4.75
We can use this dataset to analyze the favorite flavor, best brand, ramen style, and more. The brand with the most reviews is Nissin, with 381.
data %>%
select(-c("Top.Ten","Review..")) %>%
group_by(Brand) %>%
count(Brand) %>%
arrange(desc(n))
## # A tibble: 355 x 2
## # Groups: Brand [355]
## Brand n
## <chr> <int>
## 1 Nissin 381
## 2 Nongshim 98
## 3 Maruchan 76
## 4 Mama 71
## 5 Paldo 66
## 6 Myojo 63
## 7 Indomie 53
## 8 Samyang Foods 52
## 9 Ottogi 46
## 10 Lucky Me! 34
## # ... with 345 more rows
To find the favorite flavors, I selected the relevant columns: Brand, Variety and Stars. I grouped the data by variety and kept only the reviews with four stars or higher, since a rating that high suggests the reviewer really enjoyed that flavor and that it could be a favorite. I then counted how many of these high ratings each variety received and kept the varieties with more than one, because a single high rating says little about how popular a flavor is. Finally, I arranged the counts in descending order. Yakisoba comes out on top with 5 ratings of four stars or above. This doesn't prove that it is the favorite flavor among users, but it does tell me that the yakisoba variety has been rated highly more often than any other.
## Analyze the favorite flavor: Yakisoba has the most ratings of four stars or higher among reviewers
data %>%
select(Brand,Variety,Stars) %>%
group_by(Variety) %>%
filter(Stars>="4") %>%
count(Variety) %>%
filter(n>1) %>%
arrange(desc(n))
## # A tibble: 39 x 2
## # Groups: Variety [39]
## Variety n
## <chr> <int>
## 1 Yakisoba 5
## 2 Artificial Chicken 3
## 3 Curry Flavour Instant Noodles 3
## 4 Curry Udon 3
## 5 Kokomen Spicy Chicken 3
## 6 Artificial Hot & Sour Shrimp 2
## 7 Bul Jjamppong 2
## 8 Chef Creamy Tom Yam Flavour 2
## 9 Chef Curry Laksa Flavour 2
## 10 Chef Lontong Flavour 2
## # ... with 29 more rows
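One caveat about the filter above: it compares Stars against the quoted value "4", which suggests the column is stored as text, so the comparison is alphabetical rather than numeric. Under that kind of comparison any non-numeric entry can slip through the filter. The sketch below shows one possible way to convert the column first. The sample rows and the "Unrated" label are assumptions made up for illustration; they are not taken from the real file.

```r
library(dplyr)

# Hypothetical sample rows for illustration only; not taken from the real data
ratings <- tibble(
  Variety = c("Yakisoba", "Yakisoba", "Curry Udon", "Mystery Cup", "Plain Pack"),
  Stars   = c("5", "4.25", "3.5", "Unrated", "4")
)

favorites <- ratings %>%
  # Non-numeric entries such as "Unrated" become NA; the coercion warning is suppressed
  mutate(Stars = suppressWarnings(as.numeric(Stars))) %>%
  # Drop the NA rows, then keep ratings of four stars or higher (numeric comparison)
  filter(!is.na(Stars), Stars >= 4) %>%
  # Count high ratings per variety, largest count first
  count(Variety, sort = TRUE)

favorites
```

With the conversion in place, the same `filter(n > 1)` and `arrange(desc(n))` steps could follow unchanged.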
To find the best-rated ramen brand among users, I selected the relevant columns (Brand, Country and Stars) and grouped the data by Brand and Stars, because I needed a count for each brand at each rating. I then counted each group, kept the ratings of 4.5 stars or higher, and kept the groups with more than 5 reviews. Finally, I arranged the result in descending order. From the tidied data I can see that Nissin stands out: it has 68 five-star ratings, 21 ratings of 4.5 stars and 10 ratings of 4.75 stars.
## Finding the best rated ramen brand among users:
data %>%
select(Brand,Country,Stars) %>%
group_by(Brand,Stars) %>%
summarise(count=n()) %>%
filter(Stars>=4.5) %>%
filter(count>5) %>%
arrange(desc(count))
## `summarise()` has grouped output by 'Brand'. You can override using the `.groups` argument.
## # A tibble: 18 x 3
## # Groups: Brand [14]
## Brand Stars count
## <chr> <chr> <int>
## 1 Nissin 5 68
## 2 Nissin 4.5 21
## 3 MyKuali 5 19
## 4 Nongshim 5 19
## 5 Indomie 5 16
## 6 Paldo 5 16
## 7 Mama 5 14
## 8 KOKA 5 10
## 9 Nissin 4.75 10
## 10 Nongshim 4.5 9
## 11 Samyang Foods 5 9
## 12 Mamee 5 8
## 13 Prima Taste 5 7
## 14 Sapporo Ichiban 5 7
## 15 A-Sha Dry Noodle 5 6
## 16 Indomie 4.5 6
## 17 MAMA 5 6
## 18 Myojo 4.5 6
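Counting high ratings favors brands that simply have many reviews. As a complementary view, and not the method used above, the average rating per brand could be computed alongside the number of reviews. The sketch below assumes Stars has already been converted to numbers; the brand names, ratings, and the minimum of two reviews are hypothetical values chosen for illustration.

```r
library(dplyr)

# Hypothetical sample rows for illustration only; not taken from the real data
ratings <- tibble(
  Brand = c("A", "A", "A", "B", "B", "C"),
  Stars = c(5, 4.5, 4, 3, 5, 5)
)

brand_summary <- ratings %>%
  group_by(Brand) %>%
  summarise(
    reviews    = n(),          # number of reviews per brand
    mean_stars = mean(Stars),  # average rating per brand
    .groups    = "drop"        # return an ungrouped tibble and silence the grouping message
  ) %>%
  filter(reviews >= 2) %>%     # hypothetical minimum; choose a threshold that suits the data
  arrange(desc(mean_stars))

brand_summary
```

Setting `.groups = "drop"` also avoids the `summarise()` message about grouped output that appears in the output above.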
I selected the Variety and Country columns, grouped the data by country, counted the reviews for each country, and arranged the counts in descending order. From the result we can see that Japan has the most reviewed ramen products (352), followed by the USA (323) and South Korea (309).
Countr_y <- data %>%
select(Variety,Country) %>%
group_by(Country) %>%
count(Country) %>%
arrange(desc(n))
Countr_y
## # A tibble: 38 x 2
## # Groups: Country [38]
## Country n
## <chr> <int>
## 1 Japan 352
## 2 USA 323
## 3 South Korea 309
## 4 Taiwan 224
## 5 Thailand 191
## 6 China 169
## 7 Malaysia 156
## 8 Hong Kong 137
## 9 Indonesia 126
## 10 Singapore 109
## # ... with 28 more rows
Here I made a simple bar graph of the number of reviews per country.
ggplot(data=Countr_y) +
aes(x=Country, y=n) + geom_bar(stat="Identity") +
coord_flip()
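With `x=Country`, the bars are drawn in alphabetical order. If a ranked chart is preferred, one option is to reorder the countries by their counts, as in the sketch below. It uses only the top three counts from the table above rather than the full `Countr_y` object, and the axis labels are my own suggestion.

```r
library(ggplot2)

# Top three counts copied from the table above; the real chart would use Countr_y
country_counts <- data.frame(
  Country = c("Japan", "USA", "South Korea"),
  n       = c(352, 323, 309)
)

p <- ggplot(country_counts, aes(x = reorder(Country, n), y = n)) +
  geom_col() +     # draws bars whose heights are the values of n
  coord_flip() +   # horizontal bars keep long country names readable
  labs(x = "Country", y = "Number of reviews")

p
```

Because `coord_flip()` puts the first factor level at the bottom, `reorder(Country, n)` places the country with the most reviews at the top.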