data <- read.csv("C:\\Users\\SHREYA\\OneDrive\\Documents\\Gitstuff\\modified_dataset.csv")
library(ggplot2)
# Absolute deviation of each bar's cocoa percentage from the sample mean
data$deviation_cocoa_percent <- abs(mean(data$cocoa_percent) - data$cocoa_percent)
head(data)
## ref company_manufacturer company_location review_date country_of_bean_origin
## 1 2454 5150 U.S.A. 2019 Tanzania
## 2 2458 5150 U.S.A. 2019 Dominican Republic
## 3 2454 5150 U.S.A. 2019 Madagascar
## 4 2542 5150 U.S.A. 2021 Fiji
## 5 2546 5150 U.S.A. 2021 Venezuela
## 6 2546 5150 U.S.A. 2021 Uganda
## specific_bean_origin_or_bar_name cocoa_percent ingredients
## 1 Kokoa Kamili, batch 1 0.76 3- B,S,C
## 2 Zorzal, batch 1 0.76 3- B,S,C
## 3 Bejofo Estate, batch 1 0.76 3- B,S,C
## 4 Matasawalevu, batch 1 0.68 3- B,S,C
## 5 Sur del Lago, batch 1 0.72 3- B,S,C
## 6 Semuliki Forest, batch 1 0.80 3- B,S,C
## most_memorable_characteristics rating deviation_cocoa_percent
## 1 rich cocoa, fatty, bready 3.25 0.043602767
## 2 cocoa, vegetal, savory 3.50 0.043602767
## 3 cocoa, blackberry, full body 3.75 0.043602767
## 4 chewy, off, rubbery 3.00 0.036397233
## 5 fatty, earthy, moss, nutty,chalky 3.00 0.003602767
## 6 mildly bitter, basic cocoa, fatty 3.25 0.083602767
# Absolute deviation of each bar's rating from the sample mean
data$deviation_rating <- abs(mean(data$rating) - data$rating)
head(data)
## ref company_manufacturer company_location review_date country_of_bean_origin
## 1 2454 5150 U.S.A. 2019 Tanzania
## 2 2458 5150 U.S.A. 2019 Dominican Republic
## 3 2454 5150 U.S.A. 2019 Madagascar
## 4 2542 5150 U.S.A. 2021 Fiji
## 5 2546 5150 U.S.A. 2021 Venezuela
## 6 2546 5150 U.S.A. 2021 Uganda
## specific_bean_origin_or_bar_name cocoa_percent ingredients
## 1 Kokoa Kamili, batch 1 0.76 3- B,S,C
## 2 Zorzal, batch 1 0.76 3- B,S,C
## 3 Bejofo Estate, batch 1 0.76 3- B,S,C
## 4 Matasawalevu, batch 1 0.68 3- B,S,C
## 5 Sur del Lago, batch 1 0.72 3- B,S,C
## 6 Semuliki Forest, batch 1 0.80 3- B,S,C
## most_memorable_characteristics rating deviation_cocoa_percent
## 1 rich cocoa, fatty, bready 3.25 0.043602767
## 2 cocoa, vegetal, savory 3.50 0.043602767
## 3 cocoa, blackberry, full body 3.75 0.043602767
## 4 chewy, off, rubbery 3.00 0.036397233
## 5 fatty, earthy, moss, nutty,chalky 3.00 0.003602767
## 6 mildly bitter, basic cocoa, fatty 3.25 0.083602767
## deviation_rating
## 1 0.05365613
## 2 0.30365613
## 3 0.55365613
## 4 0.19634387
## 5 0.19634387
## 6 0.05365613
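Deviation of cocoa_percent: Each value of deviation_cocoa_percent is the absolute distance of a bar's cocoa_percent from the sample mean. The first three rows shown above deviate by about 0.0436, while the 95% confidence interval computed later in this analysis (0.0349 to 0.0382) is centred on the mean absolute deviation, which is therefore about 0.037. Since cocoa_percent is stored as a fraction, a typical bar sits roughly 3.7 percentage points from the mean. Further questions: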
Are there specific factors or patterns that contribute to these deviations?
Does this deviation impact the overall quality or characteristics of the chocolate products?
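Deviation of rating: Each value of deviation_rating is the absolute distance of a bar's rating from the sample mean. The 95% confidence interval computed later in this analysis (0.347 to 0.368) is centred on the mean absolute deviation, which is therefore about 0.36. A typical rating sits roughly a third of a point from the mean. Further questions: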
What factors influence the deviation of ratings?
Are there certain types of chocolates or origins that tend to have higher or lower deviations in ratings?
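The deviation summary described above can be reproduced on a small example. The following is a minimal sketch using toy values (hypothetical, not taken from the dataset) in place of data$cocoa_percent; it shows how the per-row absolute deviations collapse into a single mean absolute deviation and how the most unusual values can be picked out.

```r
# Hypothetical toy values standing in for data$cocoa_percent (not from the dataset)
cocoa <- c(0.70, 0.72, 0.75, 0.68, 0.80, 0.70)

# Absolute deviation of each value from the sample mean
deviation <- abs(cocoa - mean(cocoa))

# Mean absolute deviation: one number for the typical distance from the mean
mean_abs_dev <- mean(deviation)
mean_abs_dev

# The two values farthest from the mean
farthest <- cocoa[order(deviation, decreasing = TRUE)][1:2]
farthest
```

Note that base R's mad() computes the median absolute deviation (scaled by a constant by default), which is a different statistic from the mean absolute deviation used here.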

#### Plot a visualization for each relationship
# Scatter plot for deviation_cocoa_percent and cocoa_percent
ggplot(data, aes(x = deviation_cocoa_percent, y = cocoa_percent)) +
  geom_point() +
  labs(title = "Relationship between Deviation of Cocoa Percentage and Cocoa Percentage",
       x = "Deviation of Cocoa Percentage",
       y = "Cocoa Percentage")
# Scatter plot for deviation_rating and rating
ggplot(data, aes(x = deviation_rating, y = rating)) +
  geom_point() +
  labs(title = "Relationship between Deviation of Rating and Rating",
       x = "Deviation of Rating from Mean",
       y = "Rating")
Correlation Coefficient
cor(data$deviation_cocoa_percent, data$cocoa_percent)
## [1] 0.3233046
Insights
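The Pearson correlation between the deviation of cocoa percentage and the cocoa percentage itself is about 0.32, a weak positive correlation. Because the deviation is computed as |cocoa_percent - mean|, it is a V-shaped function of cocoa_percent rather than an independent measurement: it grows on both sides of the mean. A positive coefficient therefore indicates asymmetry, with the spread above the mean outweighing the spread below it (a longer upper tail of high-cocoa bars).
The coefficient is weak because a linear measure understates a V-shaped dependence: bars far below the mean also have large deviations, which partly cancels the positive trend among bars above the mean.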
Significance
Understanding this relationship can help chocolate manufacturers assess the consistency of cocoa percentage in their products and potentially identify areas for improvement in their production processes.
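To see why a correlation between a variable and its own absolute deviation mostly reflects the shape of the distribution, the sketch below uses simulated data only (hypothetical, not the chocolate dataset). The helper name cor_with_abs_dev is an assumption introduced for illustration.

```r
# Simulated data only (hypothetical, not the chocolate dataset)
set.seed(1)

# Hypothetical helper: correlation between a variable and its
# absolute deviation from its own mean
cor_with_abs_dev <- function(x) cor(abs(x - mean(x)), x)

symmetric    <- rnorm(5000)   # symmetric around its mean
right_skewed <- rexp(5000)    # long upper tail
left_skewed  <- -rexp(5000)   # long lower tail

r_sym   <- cor_with_abs_dev(symmetric)    # close to zero
r_right <- cor_with_abs_dev(right_skewed) # clearly positive
r_left  <- cor_with_abs_dev(left_skewed)  # clearly negative

c(symmetric = r_sym, right_skewed = r_right, left_skewed = r_left)
```

Under these assumptions, a symmetric variable gives a coefficient near zero even though the deviation depends entirely on the variable, while the sign of the coefficient follows the direction of the longer tail.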
Further Questions:
cor(data$deviation_rating, data$rating)
## [1] -0.2213079
Insights
The Pearson correlation between the deviation of the rating and the rating itself is about -0.22, a weak negative correlation. Because deviation_rating is computed as |rating - mean|, it grows on both sides of the mean, so a negative coefficient indicates asymmetry: the spread below the mean outweighs the spread above it. This suggests that low-rated bars tend to sit farther from the mean than high-rated bars do.
The coefficient is weak because a linear measure understates this V-shaped dependence: highly rated bars also deviate from the mean, which partly cancels the negative trend among low-rated bars.
Significance
Understanding this relationship can help in evaluating the consistency of ratings given to different chocolate products, which is valuable for both consumers and producers seeking to understand product quality.
Further Questions
Confidence Intervals
# Confidence interval for deviation_rating
rating_ci <- t.test(data$deviation_rating)$conf.int
rating_ci
## [1] 0.3474276 0.3680957
## attr(,"conf.level")
## [1] 0.95
Insights
The confidence interval is a range that, at the 95% confidence level, contains the true population mean of the absolute rating deviation. Here it runs from approximately 0.347 to 0.368, so the sample mean deviation (the midpoint, about 0.358) is estimated quite precisely. The interval describes the mean deviation, not the range in which individual deviations fall.
Significance
Understanding the precision of the rating deviation estimates is crucial for assessing the variability in ratings given to different chocolate products in the dataset.
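The interval reported by t.test() is the sample mean plus or minus a t quantile times the standard error. The sketch below reproduces it by hand on toy values (hypothetical, standing in for data$deviation_rating) to make the calculation explicit.

```r
# Hypothetical toy values standing in for data$deviation_rating
dev <- c(0.05, 0.30, 0.55, 0.20, 0.20, 0.05, 0.45, 0.70, 0.30, 0.05)

n         <- length(dev)
std_error <- sd(dev) / sqrt(n)        # standard error of the mean
t_crit    <- qt(0.975, df = n - 1)    # two-sided 95% critical value

# Manual 95% confidence interval: mean +/- t_crit * standard error
manual_ci <- mean(dev) + c(-1, 1) * t_crit * std_error
manual_ci

# Interval from t.test() for comparison
builtin_ci <- as.numeric(t.test(dev)$conf.int)
builtin_ci
```

Because the interval is symmetric, its midpoint is always the sample mean, which is a quick way to sanity-check a reported mean against a reported interval.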
Further Questions:
# Confidence interval for deviation_cocoa_percent
cocoa_percent_ci <- t.test(data$deviation_cocoa_percent)$conf.int
cocoa_percent_ci
## [1] 0.03490229 0.03822611
## attr(,"conf.level")
## [1] 0.95
Insights
The confidence interval is a range that, at the 95% confidence level, contains the true population mean of the absolute cocoa percentage deviation. Here it runs from approximately 0.035 to 0.038, so the sample mean deviation (the midpoint, about 0.037) is estimated quite precisely. The interval describes the mean deviation, not the range in which individual deviations fall.
Significance
Understanding the precision of the cocoa percentage deviation estimates is crucial for assessing the variability in cocoa content across different chocolate products in the dataset.
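Absolute deviations are non-negative and usually right-skewed, so it can be worth checking the t-based interval against one that does not assume normality. The sketch below uses a percentile bootstrap, which is a different technique from the t.test() call used in this analysis, on simulated values (hypothetical, not the chocolate dataset). Like the t-based interval, it resamples the deviations directly and treats the mean used to compute them as fixed.

```r
# Simulated, hypothetical values standing in for data$cocoa_percent
set.seed(42)
cocoa <- round(runif(200, min = 0.55, max = 0.90), 2)
dev   <- abs(cocoa - mean(cocoa))

# Resample the deviations with replacement and record each resample's mean
boot_means <- replicate(2000, mean(sample(dev, replace = TRUE)))

# Percentile bootstrap 95% interval for the mean absolute deviation
boot_ci <- quantile(boot_means, probs = c(0.025, 0.975))
boot_ci

# t-based interval for comparison
t_ci <- as.numeric(t.test(dev)$conf.int)
t_ci
```

If the two intervals are close, as is typical for large samples, the t-based interval is a reasonable summary despite the skew.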
Further Questions