Note: This project builds on the scenario from the Course Challenge of ‘Data Analysis with R Programming’, offered by Google on Coursera.

Introduction.
In this scenario, the task is to analyze data from ‘Chocolate and Tea’, a chain of cafes. The findings and output of the analysis are then to be shared with the stakeholders to improve overall service quality and enrich the company’s chocolate bar menu.

As described in the course, the company wants to serve chocolate bars that are highly rated by professional critics, keep its menu aligned with the latest ratings, and offer bars from a variety of countries. Specifically, it wants to determine which countries produce the highest-rated bars of super dark chocolate (chocolate with a high percentage of cocoa).

Data Source. For this project, the ‘Chocolate Bar Ratings’ dataset will be used, which consists of 9 columns.

According to the description of the dataset, the cacao flavor is rated on a scale of 1 to 5:
5: Elite
4: Premium
3: Satisfactory
2: Disappointing
1: Unpleasant

Step 1. Installing and Loading the Necessary Packages.

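# Install the packages (only needed once per machine)
install.packages("tidyverse")
install.packages("janitor")

# Load the packages for this session
library(tidyverse)
library(janitor)

# Import the dataset; the CSV is expected in the working directory
chocolate_bar_ratings <- read_csv("flavors_of_cacao.csv")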

To get a quick overview of the dataset, we run the head() function:

head(chocolate_bar_ratings)

## # A tibble: 6 × 9
##   Company \n(Make…¹ Speci…²   REF Revie…³ Cocoa…⁴ Compa…⁵ Rating Bean\…⁶ Broad…⁷
##   <chr>             <chr>   <dbl>   <dbl> <chr>   <chr>    <dbl> <chr>   <chr>  
## 1 A. Morin          Agua G…  1876    2016 63%     France    3.75         Sao To…
## 2 A. Morin          Kpime    1676    2015 70%     France    2.75         Togo   
## 3 A. Morin          Atsane   1676    2015 70%     France    3            Togo   
## 4 A. Morin          Akata    1680    2015 70%     France    3.5          Togo   
## 5 A. Morin          Quilla   1704    2015 70%     France    3.5          Peru   
## 6 A. Morin          Carene…  1315    2014 70%     France    2.75 Criollo Venezu…
## # … with abbreviated variable names ¹`Company \n(Maker-if known)`,
## #   ²`Specific Bean Origin\nor Bar Name`, ³`Review\nDate`, ⁴`Cocoa\nPercent`,
## #   ⁵`Company\nLocation`, ⁶`Bean\nType`, ⁷`Broad Bean\nOrigin`
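
Because head() abbreviates the names of a wide tibble, it can help to print one column per line instead. The following is a minimal sketch using glimpse(), which is available once the tidyverse is loaded; the small tibble `toy_raw` is a made-up stand-in that only reuses three of the raw column names shown above.

```r
library(tibble)  # tibble() and glimpse(); also attached by library(tidyverse)

# Hypothetical stand-in with three of the raw column names from the dataset
toy_raw <- tibble(
  `Company \n(Maker-if known)` = c("A. Morin", "A. Morin"),
  `Cocoa\nPercent` = c("63%", "70%"),
  Rating = c(3.75, 2.75)
)

# Prints one line per column: full name, type, and the first values
glimpse(toy_raw)
```

On the real data, the equivalent call would be glimpse(chocolate_bar_ratings).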

Step 2. Cleaning the Data.

When glimpsing the dataset, it is easy to notice that some of the column names are inconsistent (they contain spaces, line breaks, or other special characters), which makes them hard to work with.

The clean_names() function from the ‘janitor’ package is used to standardize the column names. The cleaned dataset is assigned the name ‘cleaned_df’:

cleaned_df <- chocolate_bar_ratings %>%
  clean_names()

Now, it is a good idea to check the output of the code:

colnames(cleaned_df)

## [1] "company_maker_if_known"           "specific_bean_origin_or_bar_name"
## [3] "ref"                              "review_date"                     
## [5] "cocoa_percent"                    "company_location"                
## [7] "rating"                           "bean_type"                       
## [9] "broad_bean_origin"

The column names have been successfully cleaned and are ready to be used now.
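
To see the renaming in isolation, here is a small sketch that applies clean_names() to a made-up one-row tibble. `toy_raw` and `toy_clean` are hypothetical names; the raw column names are copied from the dataset header shown in Step 1.

```r
library(tibble)
library(janitor)

# Hypothetical one-row tibble reusing three raw column names from the dataset
toy_raw <- tibble(
  `Company \n(Maker-if known)` = "A. Morin",
  `Review\nDate` = 2016,
  `Cocoa\nPercent` = "63%"
)

# clean_names() lower-cases the names and replaces spaces, line breaks,
# and punctuation with underscores
toy_clean <- clean_names(toy_raw)
names(toy_clean)
# [1] "company_maker_if_known" "review_date"            "cocoa_percent"
```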

Step 3. Filtering the Data.

According to the instructions in the course challenge, any rating greater than or equal to 3.9 points is considered a high rating, and a bar is considered super dark chocolate if its cocoa percent is greater than or equal to 75%.

A new subset of cleaned_df will be created using the filter() function and pipes:

cleaned_df_filtered <- cleaned_df %>%
  # cocoa_percent is stored as text ("75%"), so parse it to a number before comparing
  filter(rating >= 3.9, parse_number(cocoa_percent) >= 75)
head(cleaned_df_filtered)

## # A tibble: 6 × 9
##   company_maker_i…¹ speci…²   ref revie…³ cocoa…⁴ compa…⁵ rating bean_…⁶ broad…⁷
##   <chr>             <chr>   <dbl>   <dbl> <chr>   <chr>    <dbl> <chr>   <chr>  
## 1 Amedei            Nine      111    2007 75%     Italy        4 Blend          
## 2 Bonnat            Kaori    1339    2014 75%     France       4         Brazil 
## 3 Bonnat            Haiti     629    2011 75%     France       4         Haiti  
## 4 Bonnat            Madaga…   629    2011 75%     France       4 Criollo Madaga…
## 5 Bonnat            Porcel…   199    2008 75%     France       4 Crioll… Venezu…
## 6 Bonnat            Ocumar…    32    2006 75%     France       4         Venezu…
## # … with abbreviated variable names ¹company_maker_if_known,
## #   ²specific_bean_origin_or_bar_name, ³review_date, ⁴cocoa_percent,
## #   ⁵company_location, ⁶bean_type, ⁷broad_bean_origin
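
Note that cocoa_percent is a character column (`<chr>`), so comparing it directly with a number is evaluated as a text comparison, under which "100%" sorts before "75". One way to keep the threshold numeric is to parse the percentage first. The sketch below assumes readr's parse_number() and uses made-up rows; `toy`, `cocoa_num`, and the bar labels are hypothetical.

```r
library(dplyr)
library(readr)  # parse_number(); also attached by library(tidyverse)

# Hypothetical rows: only A and B meet both thresholds
toy <- tibble(
  bar = c("A", "B", "C", "D"),
  rating = c(4.0, 4.0, 3.5, 4.0),
  cocoa_percent = c("75%", "100%", "80%", "70%")
)

toy_filtered <- toy %>%
  # Strip the "%" sign and convert to a number, e.g. "75%" becomes 75
  mutate(cocoa_num = parse_number(cocoa_percent)) %>%
  filter(rating >= 3.9, cocoa_num >= 75)

toy_filtered$bar
# [1] "A" "B"
```

Keeping the parsed value in its own column leaves the original text column untouched for display.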

Step 4. Building Visuals based on the Cleaned and Filtered Dataset.

Now it is time to find out which companies produce highly rated super dark chocolate bars, based on the filters introduced in Step 3. For this purpose, a bar chart will be created using the ggplot2 package.

ggplot(data = cleaned_df_filtered) + 
  geom_bar(mapping = aes(x = company_maker_if_known))

Step 5. Improving the Readability of the Visual.

The bar chart created in Step 4 is basic, and its readability should be improved. Specifically:

The plot should be given a title, and the x and y axes should be relabeled
A caption stating the range of review years should be added to the plot
The background of the plot should be removed
The names on the x axis should be clearly visible

# Earliest and latest review years, used in the caption
min_date <- min(cleaned_df_filtered$review_date)
max_date <- max(cleaned_df_filtered$review_date)

ggplot(data = cleaned_df_filtered) + 
  geom_bar(mapping = aes(x = company_maker_if_known), fill = 'lightblue') + 
  labs(
    title = "Top Manufacturers of Highly Rated Chocolate Bars",
    caption = paste0("Data from ", min_date, " to ", max_date),
    x = "Company",
    y = "Number of Highly Rated Bars"
  ) + 
  theme_classic() + 
  # Rotate the company names so they do not overlap
  theme(axis.text.x = element_text(angle = 90))

To see in which countries these top manufacturers are located, the following visual will be built:

ggplot(data = cleaned_df_filtered) + 
  geom_bar(mapping = aes(x = company_location), fill = 'cyan') + 
  labs(
    title = "Countries where Top Manufacturers are Located",
    subtitle = "Number of highly rated super dark chocolate bars by manufacturer country",
    x = "Country",
    y = "Number of Highly Rated Bars"
  ) + 
  theme_classic()
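
Since geom_bar() counts rows, each bar's height in this chart is the number of qualifying chocolate bars per country, not the number of distinct manufacturers. If the number of manufacturers per country is also of interest, it could be computed along these lines. This is a sketch on made-up rows; `toy_filtered`, `makers_per_country`, and the maker and country labels are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for cleaned_df_filtered
toy_filtered <- tibble(
  company_maker_if_known = c("Maker A", "Maker A", "Maker B", "Maker C", "Maker C"),
  company_location = c("Country X", "Country X", "Country X", "Country Y", "Country Y")
)

makers_per_country <- toy_filtered %>%
  group_by(company_location) %>%
  summarize(
    n_bars = n(),                                   # rows, which is what geom_bar() plots
    n_makers = n_distinct(company_maker_if_known)   # distinct manufacturers
  ) %>%
  arrange(desc(n_makers))

makers_per_country
```

Here Country X has three bars from two makers, while Country Y has two bars from one maker.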

According to the plots created in Step 5, the top two manufacturers are Bonnat and Pralus, while the top two manufacturing countries are France and the USA.
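
A ranking read off a bar chart can be cross-checked with a frequency table. The following is a minimal sketch with dplyr's count(), using made-up rows; `toy_filtered`, `top_makers`, and the maker labels are hypothetical, and on the real data the same pipeline would start from cleaned_df_filtered.

```r
library(dplyr)

# Hypothetical stand-in for cleaned_df_filtered
toy_filtered <- tibble(
  company_maker_if_known = c("Maker A", "Maker B", "Maker A",
                             "Maker C", "Maker A", "Maker B")
)

top_makers <- toy_filtered %>%
  # One row per manufacturer, sorted from most to fewest bars
  count(company_maker_if_known, sort = TRUE, name = "n_bars") %>%
  # Keep the two manufacturers with the most bars
  slice_head(n = 2)

top_makers
```

Replacing company_maker_if_known with company_location gives the corresponding table by country.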

Thank you for your time and attention!

R Markdown Project: Chocolate and Tea

Komil Khakimov

2022-12-29
