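# Load the chocolate ratings data (absolute path on the author's machine; adjust as needed)
data <- read.csv("C:\\Users\\SHREYA\\OneDrive\\Documents\\Gitstuff\\modified_dataset.csv")
# Number of characters in each review's most_memorable_characteristics text
data$most_memorable_characteristics_length <- nchar(data$most_memorable_characteristics)
# Pair 1: rating vs. cocoa_percent
pair1 <- data[, c("rating", "cocoa_percent")]
# Pair 2: review_date vs. length of most_memorable_characteristics
pair2 <- data[, c("review_date", "most_memorable_characteristics_length")]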
print("Pair 1: rating vs. cocoa_percent")
## [1] "Pair 1: rating vs. cocoa_percent"
print(head(pair1))
## rating cocoa_percent
## 1 3.25 0.76
## 2 3.50 0.76
## 3 3.75 0.76
## 4 3.00 0.68
## 5 3.00 0.72
## 6 3.25 0.80
print("Pair 2: review_date vs. most_memorable_characteristics_length")
## [1] "Pair 2: review_date vs. most_memorable_characteristics_length"
print(head(pair2))
## review_date most_memorable_characteristics_length
## 1 2019 25
## 2 2019 22
## 3 2019 28
## 4 2021 19
## 5 2021 33
## 6 2021 33
library(ggplot2)
ggplot(pair1, aes(x = cocoa_percent, y = rating)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Relationship between Cocoa Percent and Rating",
       x = "Cocoa Percent",
       y = "Rating") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
ggplot(pair2, aes(x = review_date, y = most_memorable_characteristics_length)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Relationship between Review Date and Memorable Characteristics Length",
       x = "Review Date",
       y = "Memorable Characteristics Length") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
The scatter plot suggests that chocolate bars with higher cocoa percentages tend to receive slightly lower ratings, indicating a negative linear relationship between cocoa percent and rating. This trend could reflect reviewers' taste preferences or differences in the quality of high-cocoa bars.
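One way to quantify "slightly lower" is to fit the same straight line that `geom_smooth(method = "lm")` draws and then rescale its slope. The sketch below uses a hypothetical helper, `slope_per_step()`, which is not part of any package. It assumes `cocoa_percent` is stored as a proportion (0.76 for 76%), as the `head(pair1)` output suggests. Under that assumption, `step = 0.10` corresponds to ten percentage points.

```r
# Hypothetical helper: fitted change in y for a given step in x.
# ASSUMPTION: cocoa_percent is a proportion (0.76 = 76%), so step = 0.10
# means a ten-percentage-point increase in cocoa content.
slope_per_step <- function(df, x, y, step = 0.10) {
  # Ordinary least squares fit of y ~ x (same model as geom_smooth(method = "lm"));
  # rows with NA are dropped by lm()'s default na.action
  fit <- lm(reformulate(x, response = y), data = df)
  # coef(fit)[x] is the expected change in y per one unit of x
  unname(coef(fit)[x]) * step
}
```

Calling `slope_per_step(pair1, "cocoa_percent", "rating")` gives the fitted change in rating for each ten-point increase in cocoa percent.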
In contrast, there does not seem to be a clear linear relationship between review date and the length of the most memorable characteristics text. This suggests that the length of this field has not changed much over time. Further investigation could explore other factors, such as the chocolate's origin or brand, that might influence this length.
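The claim that lengths have not changed over time can also be checked numerically by averaging the length for each review year. The following is a minimal base-R sketch. It uses a hypothetical helper, `length_by_year()`, with the column names from `pair2` as defaults.

```r
# Hypothetical helper: mean text length and review count per review year.
length_by_year <- function(df,
                           year_col = "review_date",
                           len_col = "most_memorable_characteristics_length") {
  # Group the length column by year; ignore missing lengths
  means  <- tapply(df[[len_col]], df[[year_col]], mean, na.rm = TRUE)
  counts <- tapply(df[[len_col]], df[[year_col]], function(v) sum(!is.na(v)))
  # One row per year, in ascending year order
  data.frame(year = as.integer(names(means)),
             mean_length = as.numeric(means),
             n = as.integer(counts),
             row.names = NULL)
}
```

`length_by_year(pair2)` returns one row per year. Each row holds the mean length and the number of reviews for that year. Years with few reviews deserve less weight when reading the trend.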
# Calculate correlation coefficients
cor_pair1 <- cor(pair1)
cor_pair2 <- cor(pair2)
print("Correlation coefficient for rating vs. cocoa_percent:")
## [1] "Correlation coefficient for rating vs. cocoa_percent:"
print(cor_pair1)
## rating cocoa_percent
## rating 1.0000000 -0.1466896
## cocoa_percent -0.1466896 1.0000000
print("Correlation coefficient for review_date vs. most_memorable_characteristics_length:")
## [1] "Correlation coefficient for review_date vs. most_memorable_characteristics_length:"
print(cor_pair2)
## review_date
## review_date 1.00000000
## most_memorable_characteristics_length 0.05670439
## most_memorable_characteristics_length
## review_date 0.05670439
## most_memorable_characteristics_length 1.00000000
The correlation coefficient of about -0.15 supports the direction seen in the scatter plot, but it indicates only a weak negative linear relationship between cocoa percent and rating. As cocoa percent increases, chocolate bars tend to receive slightly lower ratings. However, cocoa percent alone accounts for only about 2% of the variation in ratings (r² ≈ 0.02).
On the other hand, the correlation coefficient of about 0.06 indicates essentially no linear relationship (a very weak positive one) between review date and the length of the most memorable characteristics. This agrees with the scatter plot, which shows little to no change in these lengths over time.
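`cor()` reports only point estimates. Base R's `cor.test()` adds a p-value and, for Pearson's r, a confidence interval. Together these help judge how precisely correlations this small are estimated. The wrapper name `summarise_cor()` is hypothetical.

```r
# Hypothetical wrapper around base R's cor.test() for Pearson's r.
# cor.test() drops incomplete (x, y) pairs and only returns a confidence
# interval for Pearson's r when at least 4 complete pairs are available.
summarise_cor <- function(x, y, conf.level = 0.95) {
  ct <- cor.test(x, y, method = "pearson", conf.level = conf.level)
  c(r       = unname(ct$estimate),  # sample correlation
    ci_low  = ct$conf.int[1],       # lower confidence bound
    ci_high = ct$conf.int[2],       # upper confidence bound
    p_value = ct$p.value)           # test of H0: true correlation is 0
}
```

For example, run `summarise_cor(pair1$cocoa_percent, pair1$rating)` and `summarise_cor(pair2$review_date, pair2$most_memorable_characteristics_length)`. With a large sample, even a weak correlation can have a small p-value, so the size of r matters more than its significance when judging practical relevance.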
Confidence Intervals
rating_ci <- t.test(pair1$rating)$conf.int
print("Confidence Interval for Rating:")
## [1] "Confidence Interval for Rating:"
print(rating_ci)
## [1] 3.178983 3.213705
## attr(,"conf.level")
## [1] 0.95
Based on this analysis, the 95% confidence interval for the mean rating of chocolate bars in the population runs from about 3.18 to 3.21. The interval is narrow (about 0.035 rating points wide), so the average rating is estimated precisely, provided the reviewed bars are representative of the wider population.
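The interval reported by `t.test()` is the standard one-sample t interval: mean ± t* × s / √n. Here t* is the 0.975 quantile of the t distribution with n - 1 degrees of freedom. Reproducing it by hand shows where the numbers come from. `manual_t_ci()` is a hypothetical helper name.

```r
# Hypothetical helper: one-sample t confidence interval for the mean,
# computed the same way t.test() does. Missing values are dropped first,
# as t.test() does by default.
manual_t_ci <- function(x, conf.level = 0.95) {
  x <- x[!is.na(x)]
  n <- length(x)
  se <- sd(x) / sqrt(n)                                # standard error of the mean
  t_crit <- qt(1 - (1 - conf.level) / 2, df = n - 1)   # two-sided critical value
  mean(x) + c(-1, 1) * t_crit * se                     # lower and upper bounds
}
```

`manual_t_ci(pair1$rating)` should match `rating_ci` up to floating-point rounding.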
Significance and Further Investigation
These findings could help chocolate makers and sellers understand what consumers like and how to improve their product range. To dig deeper, we could examine other factors that might affect ratings, such as price, brand image, or packaging. A closer look at the reviews themselves, for example by analyzing the sentiment of the most memorable characteristics, could give richer insight into consumer preferences and inform product development and marketing decisions.
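A possible first step toward the sentiment analysis suggested above is to count the individual descriptors in `most_memorable_characteristics`. This sketch assumes that descriptors are separated by commas. That format is not confirmed by the output shown here, so check it against the data. `count_descriptors()` is a hypothetical helper.

```r
# Hypothetical helper: frequency of individual descriptors in review text.
# ASSUMPTION: descriptors within one review are comma-separated.
count_descriptors <- function(text) {
  text <- text[!is.na(text)]                               # skip missing reviews
  words <- unlist(strsplit(text, ",", fixed = TRUE), use.names = FALSE)
  words <- tolower(trimws(words))                          # normalise case and spacing
  words <- words[nzchar(words)]                            # drop empty pieces
  sort(table(words), decreasing = TRUE)                    # most frequent first
}
```

`count_descriptors(data$most_memorable_characteristics)` lists the most frequent descriptors first. A sentiment lexicon could then be applied to these terms.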