Coffee shops play an integral role in the lives of students, serving as spaces for studying, socializing, and relaxation. Despite the numerous local coffee shops around SJSU, many students prefer larger chains such as Starbucks. This study aims to understand the factors influencing student satisfaction across various coffee shops and provide actionable insights to help local coffee shops thrive.
Local coffee shops near SJSU struggle to attract students, even though they offer unique, student-focused environments. Understanding why students underutilize these spaces is critical to helping local businesses improve their offerings and compete effectively with larger chains.
The objectives of this study are to:
1. Identify key factors that influence student satisfaction across coffee shops.
2. Provide actionable recommendations to local coffee shops to enhance their offerings and better meet student needs.
3. Help local businesses create a stronger connection with the SJSU student community.
This study uses a dataset collected from SJSU students, where satisfaction levels for four coffee shops (7Leaves, Break Time, Gong Cha, and Starbucks) are rated across four categories:
- Ambiance
- Drink Quality
- Price
- Customer Service
The analysis includes:
1. Descriptive statistics for each category.
2. Visualizations for individual category ratings and overall satisfaction.
3. A one-way ANOVA p-value for each category, used as a quick screen for whether mean ratings differ by coffee shop.
4. Statistical assumptions testing (e.g., Levene’s test for equal variances).
5. Full ANOVA tables for each category and for the overall average score, to test for significant differences.
# Load necessary libraries
library(ggplot2)
library(dplyr)
library(tidyr)
library(car)
# Load dataset interactively
combined_survey_data <- read.csv(file.choose())
# Ensure Coffee_Shop is treated as a categorical variable
combined_survey_data$Coffee_Shop <- as.factor(combined_survey_data$Coffee_Shop)
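Because the file is chosen interactively, it can help to confirm that the loaded data has the shape the later steps rely on. The following is a minimal sketch, not part of the original analysis: `validate_survey` is a hypothetical helper, the 1-5 rating scale is an assumption inferred from the summaries in this report, and `demo` is made-up data used only to exercise the check.

```r
# Hypothetical helper (not part of the original analysis): check that a survey
# data frame has the columns and the assumed 1-5 rating range used later on.
validate_survey <- function(df,
                            rating_cols = c("Ambiance", "Drink_Quality",
                                            "Price", "Customer_Service")) {
  required <- c("Coffee_Shop", rating_cols)
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Flatten all rating columns into one vector and flag out-of-scale values.
  ratings <- unlist(df[rating_cols])
  out_of_range <- !is.na(ratings) & (ratings < 1 | ratings > 5)
  if (any(out_of_range)) {
    stop(sum(out_of_range), " rating(s) fall outside the 1-5 scale")
  }
  invisible(TRUE)
}

# Small made-up data frame used only to exercise the helper.
demo <- data.frame(
  Coffee_Shop = factor(c("7Leaves", "Break Time", "Gong Cha", "Starbucks")),
  Ambiance = c(3, 4, 2, 5),
  Drink_Quality = c(4, 3, 5, 2),
  Price = c(2, 3, 3, 4),
  Customer_Service = c(5, 4, 3, 1)
)
validate_survey(demo)
```

In practice the same call could be made on `combined_survey_data` right after loading, so that a wrong file fails early with a clear message.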
cat("Ambiance Summary:\n")
## Ambiance Summary:
print(summary(combined_survey_data$Ambiance))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 2.000 3.000 2.904 4.000 5.000
cat("\nDrink Quality Summary:\n")
##
## Drink Quality Summary:
print(summary(combined_survey_data$Drink_Quality))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 2.00 3.00 2.95 4.00 5.00
cat("\nPrice Summary:\n")
##
## Price Summary:
print(summary(combined_survey_data$Price))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 2.000 3.000 2.871 4.000 5.000
cat("\nCustomer Service Summary:\n")
##
## Customer Service Summary:
print(summary(combined_survey_data$Customer_Service))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 2.0 3.0 2.9 4.0 5.0
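The summaries above pool all four shops together. A per-shop breakdown is often more useful when the goal is to compare shops. The sketch below shows one way to get it with base R's `aggregate`. It runs on simulated stand-in data: the equal split of 60 ratings per shop and the random scores are assumptions for illustration, not the survey itself.

```r
# Simulated stand-in for the survey (assumption: 4 shops x 60 ratings, scale 1-5).
set.seed(1)
shops <- c("7Leaves", "Break Time", "Gong Cha", "Starbucks")
demo <- data.frame(
  Coffee_Shop = factor(rep(shops, each = 60)),
  Ambiance = sample(1:5, 240, replace = TRUE),
  Drink_Quality = sample(1:5, 240, replace = TRUE),
  Price = sample(1:5, 240, replace = TRUE),
  Customer_Service = sample(1:5, 240, replace = TRUE)
)

# Mean rating per shop in each category.
shop_means <- aggregate(
  cbind(Ambiance, Drink_Quality, Price, Customer_Service) ~ Coffee_Shop,
  data = demo, FUN = mean
)

# Standard deviation per shop in each category.
shop_sds <- aggregate(
  cbind(Ambiance, Drink_Quality, Price, Customer_Service) ~ Coffee_Shop,
  data = demo, FUN = sd
)

print(shop_means)
print(shop_sds)
```

Replacing `demo` with `combined_survey_data` would give the per-shop means and spreads for the actual responses.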
# Reshape to long format: one row per rating, labelled by category
long_data <- combined_survey_data %>%
  pivot_longer(cols = c("Ambiance", "Drink_Quality", "Price", "Customer_Service"),
               names_to = "Category", values_to = "Score")
ggplot(long_data, aes(x = Coffee_Shop, y = Score, fill = Coffee_Shop)) +
  geom_boxplot() +
  facet_wrap(~Category, scales = "free") +
  labs(
    title = "Satisfaction by Category and Coffee Shop",
    x = "Coffee Shop",
    y = "Satisfaction Score"
  ) +
  theme_minimal() +
  theme(legend.position = "none")
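# Calculate each response's average satisfaction across the four categories
combined_survey_data <- combined_survey_data %>%
  rowwise() %>%
  mutate(Total_Average = mean(c(Ambiance, Drink_Quality, Price, Customer_Service), na.rm = TRUE)) %>%
  ungroup()  # drop the row-wise grouping so later steps see a regular data frame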
# Plot total average satisfaction
ggplot(combined_survey_data, aes(x = Coffee_Shop, y = Total_Average, fill = Coffee_Shop)) +
  geom_boxplot() +
  labs(
    title = "Overall Average Satisfaction by Coffee Shop",
    x = "Coffee Shop",
    y = "Average Satisfaction Score"
  ) +
  theme_minimal() +
  theme(legend.position = "none")
Interpretation
The boxplot shows that overall satisfaction scores are similar across the four coffee shops. The medians (the middle lines in the boxes) are close, so the typical level of satisfaction is about the same at each shop. Some shops have taller boxes, meaning a wider interquartile range and more varied customer opinions. There are few outliers, so most responses fall within the usual range.
This suggests that no single shop is much better or worse in overall satisfaction. Other factors, such as location or convenience, might influence student preferences.
# Fit a one-way ANOVA of each rating category on coffee shop
anova_ambiance <- aov(Ambiance ~ Coffee_Shop, data = combined_survey_data)
anova_drink_quality <- aov(Drink_Quality ~ Coffee_Shop, data = combined_survey_data)
anova_price <- aov(Price ~ Coffee_Shop, data = combined_survey_data)
anova_customer_service <- aov(Customer_Service ~ Coffee_Shop, data = combined_survey_data)
# Extract the p-value of the Coffee_Shop term from each model
anova_results <- list(
  Ambiance = summary(anova_ambiance)[[1]]["Pr(>F)"][1, 1],
  Drink_Quality = summary(anova_drink_quality)[[1]]["Pr(>F)"][1, 1],
  Price = summary(anova_price)[[1]]["Pr(>F)"][1, 1],
  Customer_Service = summary(anova_customer_service)[[1]]["Pr(>F)"][1, 1]
)
for (category in names(anova_results)) {
  p_value <- anova_results[[category]]
  cat(paste0(category, ": p-value = ", round(p_value, 4), " -> "))
  # ANOVA tests whether mean ratings differ between shops (H0: equal means)
  if (p_value < 0.05) {
    cat("Means differ across shops (reject H0 of equal means)\n")
  } else {
    cat("No significant difference across shops (fail to reject H0 of equal means)\n")
  }
}
## Ambiance: p-value = 0.6114 -> No significant difference across shops (fail to reject H0 of equal means)
## Drink_Quality: p-value = 0.2783 -> No significant difference across shops (fail to reject H0 of equal means)
## Price: p-value = 0.4258 -> No significant difference across shops (fail to reject H0 of equal means)
## Customer_Service: p-value = 0.0827 -> No significant difference across shops (fail to reject H0 of equal means)
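Running four separate tests at the 0.05 level raises the chance of at least one false positive. One common remedy, which the original analysis does not use, is to adjust the p-values for multiple comparisons. The sketch below applies Holm's method through base R's `p.adjust` to the four p-values printed above. The choice of Holm over other corrections is an assumption made for illustration.

```r
# p-values copied from the per-category ANOVA output above.
p_raw <- c(Ambiance = 0.6114, Drink_Quality = 0.2783,
           Price = 0.4258, Customer_Service = 0.0827)

# Holm's step-down adjustment controls the family-wise error rate across
# the four tests; p.adjust() is part of base R's stats package.
p_holm <- p.adjust(p_raw, method = "holm")

print(round(p_holm, 4))
```

Adjusted p-values are never smaller than the raw ones, so with every raw p-value already above 0.05 the adjustment does not change how the results read.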
cat("\nLevene's Test Results:\n")
##
## Levene's Test Results:
levene_ambiance <- leveneTest(Ambiance ~ Coffee_Shop, data = combined_survey_data)
cat("Ambiance:\n")
## Ambiance:
print(levene_ambiance)
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   3  0.3778 0.7691
##       236
levene_drink_quality <- leveneTest(Drink_Quality ~ Coffee_Shop, data = combined_survey_data)
cat("\nDrink Quality:\n")
##
## Drink Quality:
print(levene_drink_quality)
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   3  0.9598 0.4124
##       236
levene_price <- leveneTest(Price ~ Coffee_Shop, data = combined_survey_data)
cat("\nPrice:\n")
##
## Price:
print(levene_price)
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   3  0.4024 0.7514
##       236
levene_customer_service <- leveneTest(Customer_Service ~ Coffee_Shop, data = combined_survey_data)
cat("\nCustomer Service:\n")
##
## Customer Service:
print(levene_customer_service)
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   3  0.1734 0.9143
##       236
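To make clear what the test above measures, the sketch below reproduces the median-centered Levene procedure by hand in base R: take each rating's absolute deviation from its shop's median, then run a one-way ANOVA on those deviations. To the best of my understanding this mirrors what `leveneTest` computes with `center = median`, but treat it as an illustration rather than the package's exact code. The data are simulated, with 60 ratings per shop assumed.

```r
# Simulated stand-in data (assumption: 4 shops, 60 ratings each, scale 1-5).
set.seed(42)
demo <- data.frame(
  Coffee_Shop = factor(rep(c("7Leaves", "Break Time", "Gong Cha", "Starbucks"),
                           each = 60)),
  Ambiance = sample(1:5, 240, replace = TRUE)
)

# Step 1: absolute deviation of each rating from its own shop's median.
shop_median <- ave(demo$Ambiance, demo$Coffee_Shop, FUN = median)
demo$abs_dev <- abs(demo$Ambiance - shop_median)

# Step 2: one-way ANOVA on those deviations. A large F value would mean the
# spread of ratings differs between shops.
levene_by_hand <- anova(lm(abs_dev ~ Coffee_Shop, data = demo))
print(levene_by_hand)
```

The degrees of freedom (3 and 236) match the tables above because the simulated data has the same number of groups and rows. The F and p values will differ, since the ratings here are random.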
anova_results_list <- list(
  Ambiance = aov(Ambiance ~ Coffee_Shop, data = combined_survey_data),
  Drink_Quality = aov(Drink_Quality ~ Coffee_Shop, data = combined_survey_data),
  Price = aov(Price ~ Coffee_Shop, data = combined_survey_data),
  Customer_Service = aov(Customer_Service ~ Coffee_Shop, data = combined_survey_data)
)
cat("\nANOVA Results for Individual Categories:\n")
##
## ANOVA Results for Individual Categories:
for (category in names(anova_results_list)) {
  cat(category, ":\n")
  print(summary(anova_results_list[[category]]))
}
## Ambiance :
##              Df Sum Sq Mean Sq F value Pr(>F)
## Coffee_Shop   3    3.5   1.160   0.606  0.611
## Residuals   236  451.3   1.912
## Drink_Quality :
##              Df Sum Sq Mean Sq F value Pr(>F)
## Coffee_Shop   3    7.9   2.622   1.291  0.278
## Residuals   236  479.5   2.032
## Price :
##              Df Sum Sq Mean Sq F value Pr(>F)
## Coffee_Shop   3    5.4   1.815   0.932  0.426
## Residuals   236  459.5   1.947
## Customer_Service :
##              Df Sum Sq Mean Sq F value Pr(>F)
## Coffee_Shop   3   13.7   4.567   2.255 0.0827 .
## Residuals   236  477.9   2.025
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
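The columns of these tables are linked by simple arithmetic: the F value is the between-shop mean square divided by the residual mean square, and the p-value is the upper-tail probability of that F under the null hypothesis. The sketch below works this through for the Customer_Service row, using the rounded figures printed above. The variable names are illustrative.

```r
# Mean squares and degrees of freedom copied from the Customer_Service table.
ms_between <- 4.567   # Mean Sq for Coffee_Shop
ms_within  <- 2.025   # Mean Sq for Residuals
df_between <- 3
df_within  <- 236

# F is the ratio of between-shop to within-shop variability.
f_stat <- ms_between / ms_within

# The p-value is the upper-tail area of the F distribution beyond f_stat.
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

cat("F =", round(f_stat, 3), " p =", round(p_value, 4), "\n")
```

The result should agree with the reported F value of 2.255 and p-value of 0.0827, up to the rounding of the mean squares.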
anova_total_average <- aov(Total_Average ~ Coffee_Shop, data = combined_survey_data)
cat("\nANOVA Results for Total Average Satisfaction Scores:\n")
##
## ANOVA Results for Total Average Satisfaction Scores:
print(summary(anova_total_average))
##              Df Sum Sq Mean Sq F value Pr(>F)
## Coffee_Shop   3   1.17  0.3892    0.83  0.479
## Residuals   236 110.72  0.4692
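A p-value alone does not say how large the differences between shops are. One way to gauge that, not part of the original analysis, is eta squared: the share of total variation in ratings that is associated with coffee shop. The sketch below computes it from the sums of squares printed in the ANOVA tables above. Because those figures are rounded, the results are approximate.

```r
# Sums of squares copied from the ANOVA tables above
# (first value: Coffee_Shop, second value: Residuals).
ss <- list(
  Ambiance         = c(3.5, 451.3),
  Drink_Quality    = c(7.9, 479.5),
  Price            = c(5.4, 459.5),
  Customer_Service = c(13.7, 477.9),
  Total_Average    = c(1.17, 110.72)
)

# Eta squared = SS_between / (SS_between + SS_within): the share of rating
# variance associated with which shop was rated.
eta_sq <- sapply(ss, function(x) x[1] / sum(x))

print(round(eta_sq, 3))
```

Every value comes out below 0.03, so coffee shop accounts for under 3% of the variation in any of the ratings, which fits the non-significant F tests.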
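Conclusion
Based on the analysis, satisfaction levels across the four coffee shops are largely similar in the studied categories. None of the one-way ANOVAs found a significant difference at the 0.05 level (p-values ranged from 0.083 for customer service to 0.611 for ambiance, with 0.479 for the overall average), and Levene’s tests gave no evidence of unequal variances. This suggests that factors outside of ambiance, drink quality, price, and customer service, such as location or convenience, may influence student preferences.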
Recommendations
- Introduce loyalty programs: Encourage repeat visits by offering rewards for frequent purchases.
- Use social media: Leverage platforms like Instagram and TikTok to attract more students and promote unique offerings.
- Enhance unique features: Focus on WiFi reliability, exclusive study-friendly spaces, or creative menu options to stand out.