Can’t decide on what chocolate bar to get next? Are you looking for a new chocolate bar to try? Or maybe you are intrigued by what is generally considered to be the best chocolate bar available?

This analysis uses R to explore chocolate bar ratings from all over the world. It highlights the factors that affect these ratings and what the chocolate bars might contain, to help you decide which chocolate bar you would love to try next.

About the Dataset

This dataset is made up of 2,530 chocolate bar ratings from around the globe. The chocolate bars rated are predominantly plain dark chocolate, and the ratings do not reflect health benefits, social missions or organic status. The ratings were accumulated from 2006 to 2021. Each chocolate bar was evaluated on a combination of objective qualities and subjective interpretation, and each rating represents an experience with one bar from a specific batch.

The rating system is detailed below:

Importing and Exploring the Dataset

The packages used for the analysis are loaded into the R environment.

# Loading the libraries
library(dplyr)
library(tidyr)
library(janitor)
library(ggplot2)
library(corrplot)
library(tm)
library(wordcloud)


Loading the dataset

The dataset is imported into the R environment.

#Importing the dataset
chocolate_bars <- read.csv("chocolate_bars.csv")


Checking the structure of the dataset

#Returns the structure of the dataset
str(chocolate_bars)
## 'data.frame':    2530 obs. of  11 variables:
##  $ id              : int  2454 2458 2454 2542 2546 2546 2542 797 797 1011 ...
##  $ manufacturer    : chr  "5150" "5150" "5150" "5150" ...
##  $ company_location: chr  "U.S.A." "U.S.A." "U.S.A." "U.S.A." ...
##  $ year_reviewed   : int  2019 2019 2019 2021 2021 2021 2021 2012 2012 2013 ...
##  $ bean_origin     : chr  "Tanzania" "Dominican Republic" "Madagascar" "Fiji" ...
##  $ bar_name        : chr  "Kokoa Kamili, batch 1" "Zorzal, batch 1" "Bejofo Estate, batch 1" "Matasawalevu, batch 1" ...
##  $ cocoa_percent   : num  76 76 76 68 72 80 68 70 63 70 ...
##  $ num_ingredients : num  3 3 3 3 3 3 3 4 4 4 ...
##  $ ingredients     : chr  "B,S,C" "B,S,C" "B,S,C" "B,S,C" ...
##  $ review          : chr  "rich cocoa, fatty, bready" "cocoa, vegetal, savory" "cocoa, blackberry, full body" "chewy, off, rubbery" ...
##  $ rating          : num  3.25 3.5 3.75 3 3 3.25 3.5 3.5 3.75 2.75 ...


Checking if there are duplicate entries in the dataset

# Query to return any duplicate reviews
get_dupes(chocolate_bars)
##  [1] id               manufacturer     company_location year_reviewed   
##  [5] bean_origin      bar_name         cocoa_percent    num_ingredients 
##  [9] ingredients      review           rating           dupe_count      
## <0 rows> (or 0-length row.names)


Checking the summary of the dataset

#Returns a detailed summary of the dataset
summary(chocolate_bars)
##        id       manufacturer       company_location   year_reviewed 
##  Min.   :   5   Length:2530        Length:2530        Min.   :2006  
##  1st Qu.: 802   Class :character   Class :character   1st Qu.:2012  
##  Median :1454   Mode  :character   Mode  :character   Median :2015  
##  Mean   :1430                                         Mean   :2014  
##  3rd Qu.:2079                                         3rd Qu.:2018  
##  Max.   :2712                                         Max.   :2021  
##                                                                     
##  bean_origin          bar_name         cocoa_percent    num_ingredients
##  Length:2530        Length:2530        Min.   : 42.00   Min.   :1.000  
##  Class :character   Class :character   1st Qu.: 70.00   1st Qu.:2.000  
##  Mode  :character   Mode  :character   Median : 70.00   Median :3.000  
##                                        Mean   : 71.64   Mean   :3.041  
##                                        3rd Qu.: 74.00   3rd Qu.:4.000  
##                                        Max.   :100.00   Max.   :6.000  
##                                                         NA's   :87     
##  ingredients           review              rating     
##  Length:2530        Length:2530        Min.   :1.000  
##  Class :character   Class :character   1st Qu.:3.000  
##  Mode  :character   Mode  :character   Median :3.250  
##                                        Mean   :3.196  
##                                        3rd Qu.:3.500  
##                                        Max.   :4.000  
## 


Removing missing values from the dataset

The summary of the dataset shows 87 missing values in the num_ingredients field, so we remove the rows where that value is missing:

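# Filtering out rows where num_ingredients is missing
# (is.na() tests for NA directly, rather than comparing against the string "NA")
chocolate_bars <- chocolate_bars %>% filter(!is.na(num_ingredients))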
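
Before and after filtering, it can help to count the missing values per column to confirm the filter did what was intended. The snippet below is a minimal sketch on a small made-up data frame (toy_bars and its values are illustrative assumptions, not rows from the real dataset):

```r
library(dplyr)

# Toy stand-in for chocolate_bars; the values are made up for illustration
toy_bars <- data.frame(
  rating          = c(3.25, 3.50, 2.75, 4.00),
  num_ingredients = c(3, NA, 4, NA)
)

# Count missing values in every column
na_counts <- colSums(is.na(toy_bars))
print(na_counts)

# Keep only the rows where num_ingredients is present
toy_clean <- toy_bars %>% filter(!is.na(num_ingredients))
print(nrow(toy_clean))
```

The same two calls should work unchanged on chocolate_bars, assuming the columns shown in the str() output above.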

Analyzing and Visualizing the Data

Bean origins with the highest ratings

The highest-rated chocolate bars use cocoa beans from China and Sao Tome & Principe, with the Solomon Islands following closely behind.
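
The code behind this ranking is not shown here, so the following is only a sketch of one way it could be computed, assuming "highest rated" means the highest average rating per bean origin. The data frame toy_bars, its origin labels and its ratings are made up for illustration:

```r
library(dplyr)

# Toy stand-in for chocolate_bars; origins and ratings are made up
toy_bars <- data.frame(
  bean_origin = c("Origin A", "Origin A", "Origin B", "Origin C", "Origin C"),
  rating      = c(3.25, 3.75, 3.00, 3.50, 4.00)
)

# Average rating and number of bars per bean origin, best first
origin_ratings <- toy_bars %>%
  group_by(bean_origin) %>%
  summarise(avg_rating = mean(rating), n_bars = n(), .groups = "drop") %>%
  arrange(desc(avg_rating))

print(origin_ratings)
```

Keeping n_bars next to the average makes it easier to judge whether a top-ranked origin rests on many bars or only a few.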

Which countries host the highest-rated chocolate bar manufacturers?

Chile hosts the manufacturer with the highest-rated chocolate bar, with Argentina, Poland and Sao Tome & Principe tied for second place.
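
Ties like the one for second place can be made explicit with a ranking function. This is a hedged sketch, assuming the comparison is based on the average rating per company_location; toy_bars, the country labels and the ratings are made up for illustration:

```r
library(dplyr)

# Toy stand-in for chocolate_bars; countries and ratings are made up
toy_bars <- data.frame(
  company_location = c("Country A", "Country B", "Country C", "Country C",
                       "Country D", "Country D"),
  rating           = c(3.75, 3.50, 3.25, 3.75, 3.00, 3.25)
)

# Average rating per country, with tied countries sharing the same place
location_ratings <- toy_bars %>%
  group_by(company_location) %>%
  summarise(avg_rating = mean(rating), .groups = "drop") %>%
  mutate(place = min_rank(desc(avg_rating))) %>%
  arrange(place, company_location)

print(location_ratings)
```

min_rank() gives tied countries the same place and skips the next one, so two countries tied for second are followed by fourth place. dense_rank() would be the choice if no places should be skipped.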

What was the most memorable characteristic of the chocolate bars?

As expected, the most memorable characteristic of the chocolate bars was cocoa. The other notable characteristics were sweet, nutty, fruit, roasty and mild.
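
The review field stores the memorable characteristics as comma-separated descriptors, so one simple way to see which ones come up most often is to split the field and count. This is a sketch of that idea using dplyr and tidyr rather than the tm and wordcloud packages loaded earlier. It runs on the four review strings shown in the str() output above, and sample_reviews and descriptor_counts are names introduced here for illustration:

```r
library(dplyr)
library(tidyr)

# The first four review strings, as shown in the str() output
sample_reviews <- data.frame(
  review = c("rich cocoa, fatty, bready",
             "cocoa, vegetal, savory",
             "cocoa, blackberry, full body",
             "chewy, off, rubbery")
)

# One row per descriptor, then count how often each descriptor appears
descriptor_counts <- sample_reviews %>%
  separate_rows(review, sep = ",\\s*") %>%
  mutate(review = trimws(review)) %>%
  count(review, sort = TRUE, name = "n_reviews")

print(descriptor_counts)
```

This treats each comma-separated phrase as a single descriptor, so "rich cocoa" and "cocoa" are counted separately. Splitting into individual words instead would merge them.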

Yearly ratings of chocolate bars

Yearly chocolate bar ratings have risen steadily over the years, apart from a slight decrease between 2017 and 2020.
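
A trend like this can be checked by averaging the ratings per review year and looking at the year-over-year change. The following is a minimal sketch, assuming the yearly rating is the mean rating per year_reviewed; toy_bars and its values are made up for illustration:

```r
library(dplyr)

# Toy stand-in for chocolate_bars; years and ratings are made up
toy_bars <- data.frame(
  year_reviewed = c(2019, 2019, 2020, 2020, 2021, 2021),
  rating        = c(3.25, 3.75, 3.00, 3.50, 3.50, 4.00)
)

# Mean rating per year and the change from the previous year
yearly_ratings <- toy_bars %>%
  group_by(year_reviewed) %>%
  summarise(avg_rating = mean(rating), .groups = "drop") %>%
  arrange(year_reviewed) %>%
  mutate(change = avg_rating - dplyr::lag(avg_rating))

print(yearly_ratings)
```

A negative value in change marks a year where the average rating fell. The resulting table can also be passed to ggplot2 to draw the trend as a line chart.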

Conclusion and Recommendation

Finally, thank you for taking the time to go through my analysis. Any questions or feedback are welcome.