The Chocolate and Tea unit aims to serve chocolate bars that are highly rated by professional critics. It also continually adjusts its menu to reflect the global diversity of chocolate production. The management team regularly updates the chocolate bar list to align with the latest ratings and to ensure that it includes bars from a variety of countries. This project collects and analyzes data on the latest chocolate ratings, in particular to find out which countries produce the highest-rated bars of super dark chocolate (bars with a high percentage of cocoa). The results will help the company create its next chocolate bar menu. The dataset comes from kaggle.com; it was organized, filtered, and sorted by Shuvam Anupam.
Loading the dataset and assigning it to a data frame
library(tidyverse) # provides read_csv(), the dplyr verbs, and ggplot2
flavours_df <- read_csv("flavours_of_cacao_v2.csv")
## Rows: 1000 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): Company, Specific Bean Origin, Percent_of_cocoa, Location, Bean
## _ty...
## dbl (3): REF, Review
## Date, Rating
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Previewing and cleaning the data
colnames(flavours_df)
## [1] "Company" "Specific Bean Origin" "REF"
## [4] "Review\nDate" "Percent_of_cocoa" "Location"
## [7] "Rating" "Bean\n_type" "Broad_bean_\norigin"
str(flavours_df)
## spc_tbl_ [1,000 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Company : chr [1:1000] "A. Morin" "A. Morin" "A. Morin" "A. Morin" ...
## $ Specific Bean Origin: chr [1:1000] "Agua Grande" "Kpime" "Atsane" "Akata" ...
## $ REF : num [1:1000] 1876 1676 1676 1680 1704 ...
## $ Review
## Date : num [1:1000] 2016 2015 2015 2015 2015 ...
## $ Percent_of_cocoa : chr [1:1000] "63%" "70%" "70%" "70%" ...
## $ Location : chr [1:1000] "France" "France" "France" "France" ...
## $ Rating : num [1:1000] 3.75 2.75 3 3.5 3.5 2.75 3.5 3.5 3.75 4 ...
## $ Bean
## _type : chr [1:1000] " " " " " " " " ...
## $ Broad_bean_
## origin : chr [1:1000] "Sao Tome" "Togo" "Togo" "Togo" ...
## - attr(*, "spec")=
## .. cols(
## .. Company = col_character(),
## .. `Specific Bean Origin` = col_character(),
## .. REF = col_double(),
## .. `Review
## .. Date` = col_double(),
## .. Percent_of_cocoa = col_character(),
## .. Location = col_character(),
## .. Rating = col_double(),
## .. `Bean
## .. _type` = col_character(),
## .. `Broad_bean_
## .. origin` = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
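Several column names in the output above contain embedded line breaks (for example "Review\nDate"), so they have to be wrapped in backticks whenever they are used. A minimal sketch of one way to tidy them, assuming a small toy tibble that reuses three of those names; `toy_df`, `tidy_name`, and `clean_df` are illustrative names, not part of the original analysis:

```r
library(dplyr)
library(stringr)
library(tibble)

# Toy tibble reusing three column names from the str() output above.
toy_df <- tibble(
  `Review\nDate` = c(2016, 2015),
  `Bean\n_type` = c(" ", " "),
  `Broad_bean_\norigin` = c("Sao Tome", "Togo")
)

# Hypothetical helper: replace each run of whitespace (including line breaks)
# with "_", then collapse repeated underscores into a single one.
tidy_name <- function(x) {
  x <- str_replace_all(x, "\\s+", "_")
  str_replace_all(x, "_+", "_")
}

clean_df <- toy_df %>%
  rename_with(tidy_name)

print(colnames(clean_df))
```

The same `rename_with()` call could be applied to `flavours_df` itself, after which the columns can be referenced without backticks.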
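Checking for rows with missing values. na.omit() returns the data frame without any rows that contain NA. All 1,000 rows remain, so no row has an NA value. The result is not assigned, so flavours_df itself is unchanged. Blank strings, such as the " " entries in the bean type column, are not NA and are kept.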
na.omit(flavours_df)
## # A tibble: 1,000 × 9
## Company `Specific Bean Origin` REF `Review\nDate` Percent_of_cocoa Location
## <chr> <chr> <dbl> <dbl> <chr> <chr>
## 1 A. Mor… Agua Grande 1876 2016 63% France
## 2 A. Mor… Kpime 1676 2015 70% France
## 3 A. Mor… Atsane 1676 2015 70% France
## 4 A. Mor… Akata 1680 2015 70% France
## 5 A. Mor… Quilla 1704 2015 70% France
## 6 A. Mor… Carenero 1315 2014 70% France
## 7 A. Mor… Cuba 1315 2014 70% France
## 8 A. Mor… Sur del Lago 1315 2014 70% France
## 9 A. Mor… Puerto Cabello 1319 2014 70% France
## 10 A. Mor… Pablino 1319 2014 70% France
## # ℹ 990 more rows
## # ℹ 3 more variables: Rating <dbl>, `Bean\n_type` <chr>,
## # `Broad_bean_\norigin` <chr>
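Because the bean type column holds blank strings rather than NA, na.omit() cannot detect them. The following is a sketch of one possible way to surface such blanks, assuming a toy tibble; the "Criollo" value and the names `toy_df`, `blank_as_na_df`, and `missing_counts` are hypothetical and used only for illustration:

```r
library(dplyr)
library(stringr)
library(tibble)

# Toy data: two blank bean types and one made-up value.
toy_df <- tibble(
  Company = c("A. Morin", "A. Morin", "A. Morin"),
  Bean_type = c(" ", "Criollo", " ")
)

# na.omit() keeps every row, because " " is a string and not NA.
rows_after_na_omit <- nrow(na.omit(toy_df))

# Trim whitespace in every character column, then turn empty strings into NA.
blank_as_na_df <- toy_df %>%
  mutate(across(where(is.character), ~ na_if(str_trim(.x), "")))

# Count the missing values in each column.
missing_counts <- colSums(is.na(blank_as_na_df))
print(missing_counts)
```

Whether blanks should then be dropped or kept depends on which columns the analysis actually uses.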
The main focus is on Rating, Percent_of_cocoa, Location, and Company. The select() function creates a new data frame with only these four variables.
trimmed_flavours_df <- flavours_df %>%
  select(Rating, Percent_of_cocoa, Company, Location)
print(trimmed_flavours_df)
## # A tibble: 1,000 × 4
## Rating Percent_of_cocoa Company Location
## <dbl> <chr> <chr> <chr>
## 1 3.75 63% A. Morin France
## 2 2.75 70% A. Morin France
## 3 3 70% A. Morin France
## 4 3.5 70% A. Morin France
## 5 3.5 70% A. Morin France
## 6 2.75 70% A. Morin France
## 7 3.5 70% A. Morin France
## 8 3.5 70% A. Morin France
## 9 3.75 70% A. Morin France
## 10 4 70% A. Morin France
## # ℹ 990 more rows
Basic summary statistics can help the team better understand the rating system in the data. The following code chunk finds the standard deviation of the variable Rating.
trimmed_flavours_df %>%
summarize(sd(Rating))
## # A tibble: 1 × 1
## `sd(Rating)`
## <dbl>
## 1 0.480
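The standard deviation is easier to interpret next to the centre and range of the ratings. A minimal sketch of computing several statistics in one summarize() call, assuming a toy tibble built from the ten ratings shown in the str() preview; `ratings_df` and `rating_summary` are illustrative names:

```r
library(dplyr)
library(tibble)

# The first ten ratings from the preview above, used as toy data.
ratings_df <- tibble(
  Rating = c(3.75, 2.75, 3, 3.5, 3.5, 2.75, 3.5, 3.5, 3.75, 4)
)

# One named column per statistic keeps the output easy to read.
rating_summary <- ratings_df %>%
  summarize(
    mean_rating = mean(Rating),
    sd_rating = sd(Rating),
    min_rating = min(Rating),
    max_rating = max(Rating)
  )

print(rating_summary)
```

Naming each statistic also avoids backtick-quoted column names such as `sd(Rating)` in the result.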
Chocolate and Tea considers a bar to be super dark chocolate if its cocoa percentage is greater than or equal to 75%. Any rating greater than or equal to 3.7 points is considered a high rating. The following code chunk creates a new data frame containing the chocolate bars that meet both conditions.
best_trimmed_flavours_df <- trimmed_flavours_df %>%
filter(Percent_of_cocoa >= 75, Rating >= 3.7)
print(best_trimmed_flavours_df)
## # A tibble: 23 × 4
## Rating Percent_of_cocoa Company Location
## <dbl> <chr> <chr> <chr>
## 1 3.75 75% Akesson's (Pralus) Switzerland
## 2 4 75% Amedei Italy
## 3 3.75 75% AMMA Brazil
## 4 3.75 77% Askinosie U.S.A.
## 5 4 75% Bonnat France
## 6 3.75 75% Bonnat France
## 7 4 75% Bonnat France
## 8 4 75% Bonnat France
## 9 4 75% Bonnat France
## 10 3.75 75% Bonnat France
## # ℹ 13 more rows
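Percent_of_cocoa is stored as character (values such as "70%"), so a comparison against 75 is made on text rather than on numbers. That is expected to behave for two-digit percentages, but a value such as "100%" sorts before "75" as text and would be missed. A hedged sketch of a numeric alternative using readr's parse_number(), assuming toy data in which the "100%" row is hypothetical; `Cocoa_percent`, `numeric_df`, and `super_dark_df` are illustrative names:

```r
library(dplyr)
library(readr)
library(tibble)

# Toy data; the "100%" row is made up to show the edge case.
toy_df <- tibble(
  Percent_of_cocoa = c("70%", "75%", "100%"),
  Rating = c(3.5, 3.75, 4)
)

# parse_number() drops the "%" sign and returns a numeric value.
numeric_df <- toy_df %>%
  mutate(Cocoa_percent = parse_number(Percent_of_cocoa))

# Apply both thresholds to numeric columns.
super_dark_df <- numeric_df %>%
  filter(Cocoa_percent >= 75, Rating >= 3.7)

print(super_dark_df)
```

Keeping the original character column next to the numeric one makes it easy to check the conversion by eye.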
The geom_bar() function creates a bar chart. The following code chunk puts the variable Location (the company location) on the x-axis and colours the bars by Rating.
ggplot(data = best_trimmed_flavours_df) +
  geom_bar(mapping = aes(x = Location, fill = factor(Rating))) +
  theme(axis.text.x = element_text(angle = 45)) +
  labs(title = "Comparison of chocolate location by rating")
The next bar chart puts the variable Percent_of_cocoa on the x-axis and colours the bars by Location.
ggplot(data = best_trimmed_flavours_df) +
  geom_bar(mapping = aes(x = Percent_of_cocoa, fill = factor(Location))) +
  theme(axis.text.x = element_text(angle = 45)) +
  labs(title = "Comparison of chocolate Percent_of_cocoa by Location")
The final bar chart again puts Percent_of_cocoa on the x-axis and colours the bars by Location, with one facet for each combination of Percent_of_cocoa and Rating.
ggplot(data = best_trimmed_flavours_df) +
  geom_bar(mapping = aes(x = Percent_of_cocoa, fill = Location)) +
  facet_wrap(~ Percent_of_cocoa + Rating) +
  labs(title = "Comparison of Percent_of_cocoa by Location and Rating in best cocoa flavours")
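The bar heights in these charts are row counts, so the same comparison can be read as a table. A minimal sketch using count(), assuming a toy tibble built from the first six rows of the filtered output printed earlier; `toy_best_df` and `location_counts` are illustrative names:

```r
library(dplyr)
library(tibble)

# First six rows of the filtered data frame, reduced to two columns.
toy_best_df <- tibble(
  Location = c("Switzerland", "Italy", "Brazil", "U.S.A.", "France", "France"),
  Rating = c(3.75, 4, 3.75, 3.75, 4, 3.75)
)

# Number of qualifying bars per location, largest first.
location_counts <- toy_best_df %>%
  count(Location, sort = TRUE)

print(location_counts)
```

Running the same count() on best_trimmed_flavours_df would give exact figures to quote next to the charts.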
According to the analysis above, the insights are as follows:
France, Italy and Sao Tome produce the best-rated cocoa flavours. France tops the chart in both Rating and Percent_of_cocoa. These are the insights requested by the unit.
***End Of The Report***