Can’t decide on what chocolate bar to get next? Are you looking for a new chocolate bar to try? Or maybe you are intrigued by what is generally considered to be the best chocolate bar available?
This analysis uses R to explore chocolate bar ratings from all over the world. It highlights which factors affect the ratings and what the bars typically contain, to help you decide which chocolate bar you would love to try next.
This dataset is made up of 2,530 chocolate bar ratings from around the globe, accumulated from 2006 to 2021. The chocolate bars rated are predominantly plain dark chocolate, and the ratings do not reflect health benefits, social missions or organic status. Each bar was evaluated on a combination of objective qualities and subjective interpretation, and each rating represents an experience with one bar from a specific batch.
The rating system is detailed below:
The packages used for the analysis are loaded into the R environment:
# Loading the libraries
library(dplyr)
library(tidyr)
library(janitor)
library(ggplot2)
library(corrplot)
library(tm)
library(wordcloud)
The dataset is imported into the R environment:
# Importing the dataset
chocolate_bars <- read.csv("chocolate_bars.csv")
#Returns the structure of the dataset
str(chocolate_bars)
## 'data.frame': 2530 obs. of 11 variables:
## $ id : int 2454 2458 2454 2542 2546 2546 2542 797 797 1011 ...
## $ manufacturer : chr "5150" "5150" "5150" "5150" ...
## $ company_location: chr "U.S.A." "U.S.A." "U.S.A." "U.S.A." ...
## $ year_reviewed : int 2019 2019 2019 2021 2021 2021 2021 2012 2012 2013 ...
## $ bean_origin : chr "Tanzania" "Dominican Republic" "Madagascar" "Fiji" ...
## $ bar_name : chr "Kokoa Kamili, batch 1" "Zorzal, batch 1" "Bejofo Estate, batch 1" "Matasawalevu, batch 1" ...
## $ cocoa_percent : num 76 76 76 68 72 80 68 70 63 70 ...
## $ num_ingredients : num 3 3 3 3 3 3 3 4 4 4 ...
## $ ingredients : chr "B,S,C" "B,S,C" "B,S,C" "B,S,C" ...
## $ review : chr "rich cocoa, fatty, bready" "cocoa, vegetal, savory" "cocoa, blackberry, full body" "chewy, off, rubbery" ...
## $ rating : num 3.25 3.5 3.75 3 3 3.25 3.5 3.5 3.75 2.75 ...
# Query to return any duplicate reviews
get_dupes(chocolate_bars)
## [1] id manufacturer company_location year_reviewed
## [5] bean_origin bar_name cocoa_percent num_ingredients
## [9] ingredients review rating dupe_count
## <0 rows> (or 0-length row.names)
#Returns a detailed summary of the dataset
summary(chocolate_bars)
## id manufacturer company_location year_reviewed
## Min. : 5 Length:2530 Length:2530 Min. :2006
## 1st Qu.: 802 Class :character Class :character 1st Qu.:2012
## Median :1454 Mode :character Mode :character Median :2015
## Mean :1430 Mean :2014
## 3rd Qu.:2079 3rd Qu.:2018
## Max. :2712 Max. :2021
##
## bean_origin bar_name cocoa_percent num_ingredients
## Length:2530 Length:2530 Min. : 42.00 Min. :1.000
## Class :character Class :character 1st Qu.: 70.00 1st Qu.:2.000
## Mode :character Mode :character Median : 70.00 Median :3.000
## Mean : 71.64 Mean :3.041
## 3rd Qu.: 74.00 3rd Qu.:4.000
## Max. :100.00 Max. :6.000
## NA's :87
## ingredients review rating
## Length:2530 Length:2530 Min. :1.000
## Class :character Class :character 1st Qu.:3.000
## Mode :character Mode :character Median :3.250
## Mean :3.196
## 3rd Qu.:3.500
## Max. :4.000
##
The summary of the dataset shows 87 missing values in the number of ingredients field, so we remove the rows with these missing values:
# Filtering out rows where the number of ingredients is missing
chocolate_bars <- chocolate_bars %>% filter(!is.na(num_ingredients))
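As a quick check on the filtering step, a per-column count of missing values shows where the NAs sit before and after the rows are dropped. This is only a minimal sketch: `toy_bars` and its values are made up for illustration and are not rows from the dataset, and the names `na_before`, `toy_clean` and `na_after` are hypothetical.

```r
library(dplyr)

# Hypothetical toy data standing in for chocolate_bars (values are made up)
toy_bars <- data.frame(
  cocoa_percent   = c(70, 72, 68, 75),
  num_ingredients = c(3, NA, 4, NA),
  rating          = c(3.25, 3.00, 3.50, 2.75)
)

# Count the missing values in every column before filtering
na_before <- colSums(is.na(toy_bars))

# Keep only the rows where num_ingredients is present
toy_clean <- toy_bars %>% filter(!is.na(num_ingredients))

# Count again to confirm that no missing values remain
na_after <- colSums(is.na(toy_clean))

print(na_before)
print(na_after)
```

Running the same two `colSums(is.na(...))` calls on `chocolate_bars` before and after the filter would confirm whether any other column still holds missing values.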
The highest-rated chocolate bars have their cocoa bean origins in China and Sao Tome & Principe, with the Solomon Islands following closely behind.
Chile hosts the manufacturer with the highest-rated chocolate bar, with Argentina, Poland and Sao Tome & Principe tied for second place.
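Rankings like the two above (by bean origin and by manufacturer location) can be produced by averaging the ratings within each group. The following is a rough sketch of one way to do it, not necessarily the exact code behind these results. The toy data frame and the names `origin_ranking` and `location_ranking` are hypothetical.

```r
library(dplyr)

# Hypothetical toy data; the origins, locations and ratings are made up
toy_bars <- data.frame(
  bean_origin      = c("A", "A", "B", "B", "C"),
  company_location = c("X", "Y", "X", "Y", "Y"),
  rating           = c(3.00, 3.50, 3.75, 3.25, 2.50)
)

# Mean rating and number of reviews per bean origin, best first
origin_ranking <- toy_bars %>%
  group_by(bean_origin) %>%
  summarise(mean_rating = mean(rating), n_reviews = n(), .groups = "drop") %>%
  arrange(desc(mean_rating))

# The same summary, grouped by manufacturer location
location_ranking <- toy_bars %>%
  group_by(company_location) %>%
  summarise(mean_rating = mean(rating), n_reviews = n(), .groups = "drop") %>%
  arrange(desc(mean_rating))

print(origin_ranking)
print(location_ranking)
```

The `n_reviews` column is worth keeping next to the mean: a country with only one or two reviewed bars can top the ranking on the strength of a single good bar.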
As expected, the most memorable feature of the chocolate bars was cocoa, with the other notable features being sweet, nutty, fruit, roasty and mild.
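The most frequent review words can be found by splitting each review into words and counting them. The sketch below uses base R only, as a stand-in for the `tm` and `wordcloud` packages loaded earlier, so it is not the exact method behind the result above. The first three review strings come from the `str()` output shown earlier. The fourth is made up.

```r
# Review strings in the same comma-separated style as the review column;
# the last one is hypothetical
reviews <- c(
  "rich cocoa, fatty, bready",
  "cocoa, vegetal, savory",
  "cocoa, blackberry, full body",
  "sweet, nutty, cocoa"
)

# Lower-case the text and split on anything that is not a letter
words <- unlist(strsplit(tolower(reviews), "[^a-z]+"))

# Drop empty strings left over from the split
words <- words[words != ""]

# Count each word and sort from most to least frequent
word_freq <- sort(table(words), decreasing = TRUE)

print(head(word_freq, 5))
```

On the full dataset the same counts could be passed to a word cloud, and common filler words may need removing first.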
The yearly ratings of chocolate bars have risen steadily over the years, apart from a slight decrease between 2017 and 2020.
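A yearly trend of this kind can be checked by averaging the ratings per review year and looking at the change from one year to the next. This is a minimal sketch on made-up numbers. `toy_bars`, `yearly_ratings` and the `change` column are hypothetical names.

```r
library(dplyr)

# Hypothetical toy data; years and ratings are made up
toy_bars <- data.frame(
  year_reviewed = c(2006, 2006, 2007, 2007, 2008),
  rating        = c(2.75, 3.25, 3.00, 3.50, 3.50)
)

# Mean rating and number of reviews per year, in chronological order
yearly_ratings <- toy_bars %>%
  group_by(year_reviewed) %>%
  summarise(mean_rating = mean(rating), n_reviews = n(), .groups = "drop") %>%
  arrange(year_reviewed)

# Year-over-year change; the first year has no predecessor, so it is NA
yearly_ratings <- yearly_ratings %>%
  mutate(change = mean_rating - dplyr::lag(mean_rating))

print(yearly_ratings)
```

A negative value in `change` marks a year in which the average rating dropped, which is how a dip such as the one between 2017 and 2020 would show up.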
Sao Tome & Principe ranks high in both cocoa bean origin ratings and manufacturer location ratings, so you should consider adding it to your wishlist of where to order your next chocolate bars from.
The yearly ratings of chocolate bars have also risen steadily, which suggests that the quality of the bars might have risen as well.
Further analysis is also required to determine whether the number of ingredients and the percentage of cocoa in the chocolate bars have any effect on the ratings, as preliminary analysis indicated no correlation between the ratings and these two factors.
Finally, I want to thank you for taking the time to go through my analysis. Any questions or feedback are welcome.
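A preliminary correlation check like the one mentioned above could look something like the sketch below, which uses base R's `cor()` and `cor.test()`. The data frame is made up for illustration, so its correlations say nothing about the real dataset. `cor_matrix` and `cocoa_test` are hypothetical names.

```r
# Hypothetical toy data; values are made up
toy_bars <- data.frame(
  cocoa_percent   = c(70, 72, 68, 75, 80, 65),
  num_ingredients = c(3, 2, 4, 3, 2, 5),
  rating          = c(3.25, 3.00, 3.50, 2.75, 3.00, 3.25)
)

# Pairwise Pearson correlations; "complete.obs" drops rows with any NA
cor_matrix <- cor(toy_bars, use = "complete.obs")

# Formal test of a single pair: cocoa percentage against rating
cocoa_test <- cor.test(toy_bars$cocoa_percent, toy_bars$rating)

print(round(cor_matrix, 2))
print(cocoa_test$p.value)
```

The resulting matrix can be passed to `corrplot()` from the `corrplot` package loaded earlier for a visual overview. A coefficient near zero only rules out a linear relationship, which is one reason further analysis is still needed.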