data <- read.csv("C:\\Users\\SHREYA\\OneDrive\\Documents\\Gitstuff\\modified_dataset.csv")
library(ggplot2)
# Absolute deviation of each bar's cocoa percentage from the dataset mean
data$deviation_cocoa_percent <- abs(data$cocoa_percent - mean(data$cocoa_percent))
head(data)
##    ref company_manufacturer company_location review_date country_of_bean_origin
## 1 2454                 5150           U.S.A.        2019               Tanzania
## 2 2458                 5150           U.S.A.        2019     Dominican Republic
## 3 2454                 5150           U.S.A.        2019             Madagascar
## 4 2542                 5150           U.S.A.        2021                   Fiji
## 5 2546                 5150           U.S.A.        2021              Venezuela
## 6 2546                 5150           U.S.A.        2021                 Uganda
##   specific_bean_origin_or_bar_name cocoa_percent ingredients
## 1            Kokoa Kamili, batch 1          0.76    3- B,S,C
## 2                  Zorzal, batch 1          0.76    3- B,S,C
## 3           Bejofo Estate, batch 1          0.76    3- B,S,C
## 4            Matasawalevu, batch 1          0.68    3- B,S,C
## 5            Sur del Lago, batch 1          0.72    3- B,S,C
## 6         Semuliki Forest, batch 1          0.80    3- B,S,C
##      most_memorable_characteristics rating deviation_cocoa_percent
## 1         rich cocoa, fatty, bready   3.25             0.043602767
## 2            cocoa, vegetal, savory   3.50             0.043602767
## 3      cocoa, blackberry, full body   3.75             0.043602767
## 4               chewy, off, rubbery   3.00             0.036397233
## 5 fatty, earthy, moss, nutty,chalky   3.00             0.003602767
## 6 mildly bitter, basic cocoa, fatty   3.25             0.083602767
# Absolute deviation of each bar's rating from the dataset mean
data$deviation_rating <- abs(data$rating - mean(data$rating))
head(data)
##    ref company_manufacturer company_location review_date country_of_bean_origin
## 1 2454                 5150           U.S.A.        2019               Tanzania
## 2 2458                 5150           U.S.A.        2019     Dominican Republic
## 3 2454                 5150           U.S.A.        2019             Madagascar
## 4 2542                 5150           U.S.A.        2021                   Fiji
## 5 2546                 5150           U.S.A.        2021              Venezuela
## 6 2546                 5150           U.S.A.        2021                 Uganda
##   specific_bean_origin_or_bar_name cocoa_percent ingredients
## 1            Kokoa Kamili, batch 1          0.76    3- B,S,C
## 2                  Zorzal, batch 1          0.76    3- B,S,C
## 3           Bejofo Estate, batch 1          0.76    3- B,S,C
## 4            Matasawalevu, batch 1          0.68    3- B,S,C
## 5            Sur del Lago, batch 1          0.72    3- B,S,C
## 6         Semuliki Forest, batch 1          0.80    3- B,S,C
##      most_memorable_characteristics rating deviation_cocoa_percent
## 1         rich cocoa, fatty, bready   3.25             0.043602767
## 2            cocoa, vegetal, savory   3.50             0.043602767
## 3      cocoa, blackberry, full body   3.75             0.043602767
## 4               chewy, off, rubbery   3.00             0.036397233
## 5 fatty, earthy, moss, nutty,chalky   3.00             0.003602767
## 6 mildly bitter, basic cocoa, fatty   3.25             0.083602767
##   deviation_rating
## 1       0.05365613
## 2       0.30365613
## 3       0.55365613
## 4       0.19634387
## 5       0.19634387
## 6       0.05365613

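Deviation of cocoa_percent: The deviation_cocoa_percent column holds each bar's absolute distance from the mean cocoa percentage. The value 0.0436 in the first rows of the output is the deviation of those individual bars (cocoa_percent of 0.76), not an average. The mean absolute deviation is approximately 0.037, the midpoint of the 95% interval reported under Confidence Intervals, so cocoa_percent values lie about 0.037 (3.7 percentage points) from their mean on average.

Deviation of rating: The deviation_rating column holds each bar's absolute distance from the mean rating. The mean absolute deviation is approximately 0.358, the midpoint of the 95% interval reported under Confidence Intervals, so ratings lie about 0.36 points from their mean on average.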
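
As a minimal sketch of the calculation behind these columns, the snippet below applies the same formula to the six cocoa_percent values printed by head(data). The names `x`, `abs_dev`, and `mean_abs_dev` are illustrative and not part of the original analysis, and a six-row subset will not reproduce the full-dataset figures.

```r
# Six cocoa_percent values copied from the head(data) output (illustrative subset)
x <- c(0.76, 0.76, 0.76, 0.68, 0.72, 0.80)

# Absolute deviation of each value from the subset mean
abs_dev <- abs(x - mean(x))

# Mean absolute deviation: the average distance from the mean
mean_abs_dev <- mean(abs_dev)
print(mean_abs_dev)
```

On the real data, the equivalent call would be mean(data$deviation_cocoa_percent), assuming the column contains no missing values.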

![](images/clipboard-1362719507.png)

#### Plot a visualization for each relationship
# Scatter plot for deviation_cocoa_percent and cocoa_percent
ggplot(data, aes(x = deviation_cocoa_percent, y = cocoa_percent)) +
  geom_point() +
  labs(title = "Relationship between Deviation of Cocoa Percentage and Cocoa Percentage",
       x = "Deviation of Cocoa Percentage from Mean",
       y = "Cocoa Percentage")

# Scatter plot for deviation_rating and rating
ggplot(data, aes(x = deviation_rating, y = rating)) +
  geom_point() +
  labs(title = "Relationship between Deviation of Rating and Rating",
       x = "Deviation of Rating from Mean",
       y = "Rating")

Correlation Coefficient

cor(data$deviation_cocoa_percent, data$cocoa_percent)
## [1] 0.3233046
cor(data$deviation_rating, data$rating)
## [1] -0.2213079

Insights

The correlation between deviation_rating and rating is weak and negative (r ≈ -0.22), while the correlation between deviation_cocoa_percent and cocoa_percent is somewhat stronger and positive (r ≈ 0.32). Because each deviation is an absolute distance from the mean, it grows on both sides of the mean, so these coefficients mainly reflect asymmetry: ratings far from the mean tend to be low ratings, whereas cocoa percentages far from the mean tend to be high ones.

Neither relationship is strong, and a linear coefficient understates a V-shaped pattern of this kind, so the scatter plots above are the more informative view.
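
To illustrate how the sign of such a correlation depends on the shape of the distribution, here is a small sketch on made-up values (the vectors `symmetric` and `left_skewed` and the helper `cor_with_abs_dev` are hypothetical, not taken from the dataset). It correlates each vector with its own absolute deviation from the mean.

```r
# Hypothetical helper: correlation between a vector and its absolute deviation from the mean
cor_with_abs_dev <- function(x) {
  cor(abs(x - mean(x)), x)
}

# Made-up values, symmetric around their mean of 3
symmetric <- c(1, 2, 3, 4, 5)

# Made-up values with one low outlier, so the far-from-mean point is a low value
left_skewed <- c(1, 4, 4, 5, 5)

cor_sym <- cor_with_abs_dev(symmetric)
cor_skew <- cor_with_abs_dev(left_skewed)

print(cor_sym)   # essentially zero: deviations on both sides cancel out
print(cor_skew)  # negative: the largest deviation belongs to the lowest value
```

A nonzero coefficient of this kind is therefore better read as a sign of skew in the underlying variable than as a trend in the usual sense.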

Significance

Understanding this relationship can help in evaluating the consistency of ratings given to different chocolate products, which is valuable for both consumers and producers seeking to understand product quality.

Further Questions

  1. What factors contribute to the variability in rating deviation?
  2. Are there specific types of chocolates or characteristics that tend to have higher or lower rating deviations, and if so, what are they?

Confidence Intervals

# Confidence interval for deviation_rating
rating_ci <- t.test(data$deviation_rating)$conf.int
rating_ci
## [1] 0.3474276 0.3680957
## attr(,"conf.level")
## [1] 0.95

Insights

The interval gives a range that we are 95% confident contains the true population mean of the absolute rating deviation: approximately 0.347 to 0.368. It describes the average deviation, not the spread of individual products, whose deviations can lie well outside this range (the first six rows alone run from about 0.05 to 0.55).
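
For readers who want to see where such an interval comes from, the sketch below rebuilds a one-sample t interval by hand and compares it with t.test(). The input `x` is the six deviation_rating values from head(data), rounded to two decimals for illustration, so the numbers will not match the full-dataset interval.

```r
# Rounded deviation_rating values from the head(data) output (illustrative subset)
x <- c(0.05, 0.30, 0.55, 0.20, 0.20, 0.05)

n <- length(x)
se <- sd(x) / sqrt(n)             # standard error of the mean
t_crit <- qt(0.975, df = n - 1)   # two-sided 95% critical value

# Interval: sample mean plus or minus critical value times standard error
manual_ci <- mean(x) + c(-1, 1) * t_crit * se

# Same interval from t.test(), with attributes dropped for comparison
builtin_ci <- as.numeric(t.test(x)$conf.int)

print(manual_ci)
print(builtin_ci)
```

Because the interval is symmetric around the sample mean, its midpoint recovers the mean deviation itself.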

Significance

Understanding the precision of the rating deviation estimates is crucial for assessing the variability in ratings given to different chocolate products in the dataset.

Further Questions

  1. How do variations in ratings impact consumer perceptions of chocolate products?
  2. Are there specific factors that contribute to higher or lower deviations in ratings, and if so, what are they?

# Confidence interval for deviation_cocoa_percent
cocoa_percent_ci <- t.test(data$deviation_cocoa_percent)$conf.int
cocoa_percent_ci
## [1] 0.03490229 0.03822611
## attr(,"conf.level")
## [1] 0.95

Insights

The interval gives a range that we are 95% confident contains the true population mean of the absolute cocoa percentage deviation: approximately 0.035 to 0.038. As with ratings, it describes the average deviation rather than individual products, whose deviations vary more widely (the first six rows run from about 0.004 to 0.084).
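
To compare the interval for the mean with the spread of individual values, one option is to set the t interval beside empirical quantiles. The sketch below does this on simulated data only: `sim_dev` is a made-up stand-in for a deviation column, and the seed, sample size, and standard deviation are arbitrary assumptions.

```r
# Simulated absolute deviations (hypothetical stand-in, not the chocolate data)
set.seed(42)
sim_dev <- abs(rnorm(500, mean = 0, sd = 0.05))

# 95% confidence interval for the mean deviation
mean_ci <- as.numeric(t.test(sim_dev)$conf.int)

# Range covering the middle 95% of individual deviations
value_range <- as.numeric(quantile(sim_dev, probs = c(0.025, 0.975)))

ci_width <- mean_ci[2] - mean_ci[1]
range_width <- value_range[2] - value_range[1]

print(mean_ci)
print(value_range)
```

The interval for the mean narrows as the sample grows, while the quantile range reflects how much individual products differ and does not shrink with sample size.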

Significance

Understanding the precision of the cocoa percentage deviation estimates is crucial for assessing the variability in cocoa content across different chocolate products in the dataset.

Further Questions

  1. How do variations in cocoa percentage impact the flavor and quality of chocolate products?
  2. Are there specific factors that contribute to higher or lower deviations in cocoa percentage?
  3. How does the deviation of cocoa percentage affect consumer preferences for chocolate products?