Project Scenario

The Chocolate and Tea unit aims to serve chocolate bars that are highly rated by professional critics. It also continually adjusts the menu so that it reflects the global diversity of chocolate production. The management team regularly updates the chocolate bar list to align with the latest ratings and to ensure that the list contains bars from a variety of countries. This project collects and analyzes data on the latest chocolate ratings. In particular, it aims to find out which countries produce the highest-rated bars of super dark chocolate (chocolate with a high percentage of cocoa). This data will help the company create its next chocolate bar menu. The dataset comes from kaggle.com; it was organized, filtered, and sorted by Shuvam Anupam.

Loading the dataset and assigning it to a data frame

library(tidyverse)  # provides read_csv(), the dplyr verbs and ggplot2 used below
flavours_df <- read_csv("flavours_of_cacao_v2.csv")
## Rows: 1000 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): Company, Specific Bean Origin, Percent_of_cocoa, Location, Bean
## _ty...
## dbl (3): REF, Review
## Date, Rating
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Data Previewing and cleaning

colnames(flavours_df)
## [1] "Company"              "Specific Bean Origin" "REF"                 
## [4] "Review\nDate"         "Percent_of_cocoa"     "Location"            
## [7] "Rating"               "Bean\n_type"          "Broad_bean_\norigin"
str(flavours_df)
## spc_tbl_ [1,000 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Company             : chr [1:1000] "A. Morin" "A. Morin" "A. Morin" "A. Morin" ...
##  $ Specific Bean Origin: chr [1:1000] "Agua Grande" "Kpime" "Atsane" "Akata" ...
##  $ REF                 : num [1:1000] 1876 1676 1676 1680 1704 ...
##  $ Review
## Date        : num [1:1000] 2016 2015 2015 2015 2015 ...
##  $ Percent_of_cocoa    : chr [1:1000] "63%" "70%" "70%" "70%" ...
##  $ Location            : chr [1:1000] "France" "France" "France" "France" ...
##  $ Rating              : num [1:1000] 3.75 2.75 3 3.5 3.5 2.75 3.5 3.5 3.75 4 ...
##  $ Bean
## _type         : chr [1:1000] " " " " " " " " ...
##  $ Broad_bean_
## origin : chr [1:1000] "Sao Tome" "Togo" "Togo" "Togo" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Company = col_character(),
##   ..   `Specific Bean Origin` = col_character(),
##   ..   REF = col_double(),
##   ..   `Review
##   .. Date` = col_double(),
##   ..   Percent_of_cocoa = col_character(),
##   ..   Location = col_character(),
##   ..   Rating = col_double(),
##   ..   `Bean
##   .. _type` = col_character(),
##   ..   `Broad_bean_
##   .. origin` = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>
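Several column names above contain embedded line breaks (for example "Review\nDate"), which forces backtick quoting in any later code that uses them. The following is a minimal base R sketch of one way to normalise such names. The cleaned names it produces are an assumption for illustration and are not used elsewhere in this report.

```r
# Column names as reported by colnames(flavours_df) above.
messy_names <- c("Company", "Specific Bean Origin", "REF", "Review\nDate",
                 "Percent_of_cocoa", "Location", "Rating", "Bean\n_type",
                 "Broad_bean_\norigin")

# Replace each run of whitespace (spaces or line breaks) with one underscore,
# then collapse any doubled underscores that this creates.
clean_names <- gsub("[[:space:]]+", "_", messy_names)
clean_names <- gsub("_+", "_", clean_names)

print(clean_names)
```

Assigning the result with colnames(flavours_df) <- clean_names would apply the cleaned names to the data frame.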

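Checking the dataset for missing values. na.omit() returns a copy of the data with every row containing NA dropped; all 1,000 rows come back, so there are no NA values to remove. The result is not assigned, so flavours_df itself is unchanged, and blank strings such as the " " entries in the bean type column are not counted as NA.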

na.omit(flavours_df)
## # A tibble: 1,000 × 9
##    Company `Specific Bean Origin`   REF `Review\nDate` Percent_of_cocoa Location
##    <chr>   <chr>                  <dbl>          <dbl> <chr>            <chr>   
##  1 A. Mor… Agua Grande             1876           2016 63%              France  
##  2 A. Mor… Kpime                   1676           2015 70%              France  
##  3 A. Mor… Atsane                  1676           2015 70%              France  
##  4 A. Mor… Akata                   1680           2015 70%              France  
##  5 A. Mor… Quilla                  1704           2015 70%              France  
##  6 A. Mor… Carenero                1315           2014 70%              France  
##  7 A. Mor… Cuba                    1315           2014 70%              France  
##  8 A. Mor… Sur del Lago            1315           2014 70%              France  
##  9 A. Mor… Puerto Cabello          1319           2014 70%              France  
## 10 A. Mor… Pablino                 1319           2014 70%              France  
## # ℹ 990 more rows
## # ℹ 3 more variables: Rating <dbl>, `Bean\n_type` <chr>,
## #   `Broad_bean_\norigin` <chr>
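Because na.omit() only reacts to NA, it can help to count NA values and blank strings separately for each column. The sketch below does this in base R on a small made-up data frame; sample_df, its values, and the simplified column name Bean_type are illustrative assumptions, not rows taken from the dataset.

```r
# Made-up example rows (assumption): one NA and two blank-string entries.
sample_df <- data.frame(
  Company   = c("A. Morin", "A. Morin", NA),
  Bean_type = c(" ", "Criollo", " "),
  stringsAsFactors = FALSE
)

# Number of NA values per column.
na_counts <- colSums(is.na(sample_df))

# Number of entries per column that are empty after trimming whitespace.
blank_counts <- sapply(sample_df, function(col) sum(trimws(col) == "", na.rm = TRUE))

print(na_counts)
print(blank_counts)
```

Columns with a high blank count could then be converted to NA or left out of the analysis, depending on what the team needs.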

The main focus is on Rating, Percent_of_cocoa, Location, and Company. The select() function creates a new data frame with only these four variables.

trimmed_flavours_df <- flavours_df %>% 
  select(Rating,Percent_of_cocoa,Company,Location)
print(trimmed_flavours_df)
## # A tibble: 1,000 × 4
##    Rating Percent_of_cocoa Company  Location
##     <dbl> <chr>            <chr>    <chr>   
##  1   3.75 63%              A. Morin France  
##  2   2.75 70%              A. Morin France  
##  3   3    70%              A. Morin France  
##  4   3.5  70%              A. Morin France  
##  5   3.5  70%              A. Morin France  
##  6   2.75 70%              A. Morin France  
##  7   3.5  70%              A. Morin France  
##  8   3.5  70%              A. Morin France  
##  9   3.75 70%              A. Morin France  
## 10   4    70%              A. Morin France  
## # ℹ 990 more rows

Basic statistics can help the team better understand the rating system in the data. The code chunk below finds the standard deviation of the variable Rating.

trimmed_flavours_df %>% 
  summarize(sd(Rating))
## # A tibble: 1 × 1
##   `sd(Rating)`
##          <dbl>
## 1        0.480
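The standard deviation is easier to interpret next to the mean and the range. As a rough sketch, the base R code below computes all four for the ten ratings visible in the preview above; it is not the full 1,000-row dataset, so the values will differ from the figure reported for the whole column.

```r
# Ratings from the first ten rows of the preview above (not the full dataset).
ratings <- c(3.75, 2.75, 3, 3.5, 3.5, 2.75, 3.5, 3.5, 3.75, 4)

# Collect several summary statistics in one named vector.
rating_summary <- c(
  mean = mean(ratings),
  sd   = sd(ratings),
  min  = min(ratings),
  max  = max(ratings)
)

print(round(rating_summary, 2))
```

On the full data frame the same statistics could be requested in a single summarize() call, for example summarize(mean_rating = mean(Rating), sd_rating = sd(Rating)), where the output column names are arbitrary.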

Chocolate and Tea considers a bar to be super dark chocolate if its cocoa percentage is greater than or equal to 75%. Any rating greater than or equal to 3.7 points is considered a high rating. The next step creates a new data frame containing the chocolate bars that meet both conditions.

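# Note: Percent_of_cocoa is a character column ("75%"), so `>= 75` compares strings;
# a value such as "100%" would sort below "75" and be dropped.
best_trimmed_flavours_df <- trimmed_flavours_df %>% 
  filter(Percent_of_cocoa >= 75, Rating >= 3.7)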
print(best_trimmed_flavours_df)
## # A tibble: 23 × 4
##    Rating Percent_of_cocoa Company            Location   
##     <dbl> <chr>            <chr>              <chr>      
##  1   3.75 75%              Akesson's (Pralus) Switzerland
##  2   4    75%              Amedei             Italy      
##  3   3.75 75%              AMMA               Brazil     
##  4   3.75 77%              Askinosie          U.S.A.     
##  5   4    75%              Bonnat             France     
##  6   3.75 75%              Bonnat             France     
##  7   4    75%              Bonnat             France     
##  8   4    75%              Bonnat             France     
##  9   4    75%              Bonnat             France     
## 10   3.75 75%              Bonnat             France     
## # ℹ 13 more rows
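Percent_of_cocoa is stored as text ("75%"), so comparing it with the number 75 is a string comparison rather than a numeric one. A safer approach is to convert the column to a number first. The sketch below shows the idea in base R on a few made-up rows; the bars data frame and the cocoa_pct column are assumptions for illustration only.

```r
# Made-up example rows (assumption), including a 100% bar.
bars <- data.frame(
  Percent_of_cocoa = c("63%", "75%", "100%", "80%"),
  Rating           = c(3.75, 4.00, 3.75, 3.00),
  stringsAsFactors = FALSE
)

# Strip the percent sign and convert the remainder to a number.
bars$cocoa_pct <- as.numeric(sub("%", "", bars$Percent_of_cocoa, fixed = TRUE))

# Keep bars that are super dark (>= 75% cocoa) and highly rated (>= 3.7).
super_dark_high <- bars[bars$cocoa_pct >= 75 & bars$Rating >= 3.7, ]

print(super_dark_high)
```

In a tidyverse pipeline, the equivalent conversion could be done with mutate(cocoa_pct = readr::parse_number(Percent_of_cocoa)) before the filter() step. Whether this changes the 23 rows found above depends on whether the dataset contains highly rated bars at 100% cocoa.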

The geom_bar() function creates a bar chart. This code chunk puts the variable Location (the company location) on the x-axis and colours each bar by Rating.

ggplot(data = best_trimmed_flavours_df) + geom_bar(mapping = aes(x=Location, fill = factor(Rating))) +
  theme(axis.text.x = element_text(angle = 45))+
  labs(title = "Comparison of chocolate location by Ratings")

The geom_bar() function creates a second bar chart. This code chunk puts the variable Percent_of_cocoa on the x-axis and colours each bar by Location.

ggplot(data = best_trimmed_flavours_df) + geom_bar(mapping = aes(x=Percent_of_cocoa, fill = factor(Location))) +
  theme(axis.text.x = element_text(angle = 45))+
  labs(title = "Comparison of chocolate Percent_of_cocoa by Location")

The geom_bar() function creates a bar chart of Percent_of_cocoa coloured by Location, and facet_wrap() splits it into one panel for each combination of Percent_of_cocoa and Rating.

ggplot(data = best_trimmed_flavours_df)+ geom_bar(mapping = aes(x=Percent_of_cocoa,fill = Location))+
  facet_wrap(~Percent_of_cocoa~Rating)+
  labs(title = "Comparison of percent_of_cocoa by Location and Rating in best cocoa flavours")

Conclusion

The analysis above gives the following insights:

France, Italy, and Sao Tome produce the best-rated cocoa flavours. France tops the charts for both Rating and Percent_of_cocoa. These are the insights the unit asked for.
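A count of qualifying bars per location gives a numeric check on what the charts show. The sketch below tallies only the ten rows of best_trimmed_flavours_df that are printed in this report; the other 13 rows are not shown, so these counts are partial and illustrative rather than final results.

```r
# Locations from the ten rows of best_trimmed_flavours_df printed above.
# The remaining 13 rows are not shown in the report, so these counts are partial.
shown_locations <- c("Switzerland", "Italy", "Brazil", "U.S.A.",
                     "France", "France", "France", "France", "France", "France")

# Tally the bars per location and sort from most to fewest.
location_counts <- sort(table(shown_locations), decreasing = TRUE)

print(location_counts)
```

On the full filtered data frame, the same tally could be produced with count(Location, sort = TRUE) from dplyr.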

                                       ***End Of The Report***