Introduction

Coffee shops play an integral role in the lives of students, serving as spaces for studying, socializing, and relaxation. Despite the numerous local coffee shops around SJSU, many students prefer larger chains such as Starbucks. This study aims to understand the factors influencing student satisfaction across various coffee shops and provide actionable insights to help local coffee shops thrive.

Problem Statement

Local coffee shops near SJSU struggle to attract students, even though they offer unique, student-focused environments. Understanding why students underutilize these spaces is critical to helping local businesses improve their offerings and compete effectively with larger chains.

Objectives

The objectives of this study are to:

1. Identify key factors that influence student satisfaction across coffee shops.
2. Provide actionable recommendations to local coffee shops to enhance their offerings and better meet student needs.
3. Help local businesses create a stronger connection with the SJSU student community.

Method(s)

This study uses a dataset collected from SJSU students, in which satisfaction with four coffee shops (7Leaves, Break Time, Gong Cha, and Starbucks) is rated across four categories:

- Ambiance
- Drink Quality
- Price
- Customer Service

The analysis includes:

1. Descriptive statistics for each category.
2. Visualizations for individual category ratings and overall satisfaction.
3. ANOVA to test for differences across coffee shops.
4. Statistical assumptions testing (e.g., Levene’s test for equal variances).

Variables

Coffee_Shop: The name of the coffee shop (categorical).
Ambiance: Satisfaction rating for the shop’s atmosphere (numerical).
Drink_Quality: Satisfaction rating for beverage quality (numerical).
Price: Satisfaction rating for price fairness (numerical).
Customer_Service: Satisfaction rating for the friendliness and efficiency of staff (numerical).
Total_Average: The mean of a respondent’s four category ratings (numerical), computed during the analysis.
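
Before any analysis, it can help to confirm that a loaded data frame actually matches the variable list above. The following is a minimal sketch, not part of the original analysis: `validate_survey` is a hypothetical helper, the 1 to 5 rating range is an assumption based on the minimum and maximum values reported in the summaries, and the toy data frame is made up for illustration.

```r
# Hypothetical helper (not part of the original analysis): checks that a survey
# data frame has the expected columns and that ratings fall in an assumed 1-5 range.
validate_survey <- function(df, rating_min = 1, rating_max = 5) {
  rating_cols <- c("Ambiance", "Drink_Quality", "Price", "Customer_Service")
  required <- c("Coffee_Shop", rating_cols)

  # Stop early if any expected column is absent
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }

  # Count missing and out-of-range values per rating column
  data.frame(
    column = rating_cols,
    n_missing = vapply(rating_cols, function(col) sum(is.na(df[[col]])), integer(1)),
    n_out_of_range = vapply(rating_cols, function(col) {
      x <- df[[col]]
      sum(!is.na(x) & (x < rating_min | x > rating_max))
    }, integer(1)),
    row.names = NULL
  )
}

# Toy data for illustration only (not the survey responses)
toy <- data.frame(
  Coffee_Shop = c("Starbucks", "Gong Cha", "7Leaves"),
  Ambiance = c(4, NA, 3),
  Drink_Quality = c(5, 2, 7),   # 7 is deliberately out of range
  Price = c(3, 3, 3),
  Customer_Service = c(1, 5, 2)
)
report <- validate_survey(toy)
print(report)
```

Rows flagged by such a check would need a decision (drop, correct, or keep) before the summaries and tests are run.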

Clean the Data

# Load necessary libraries
library(ggplot2)
library(dplyr)
library(tidyr)
library(car)
# Load dataset interactively
combined_survey_data <- read.csv(file.choose())

# Ensure Coffee_Shop is treated as a categorical variable
combined_survey_data$Coffee_Shop <- as.factor(combined_survey_data$Coffee_Shop)
cat("Ambiance Summary:\n")

## Ambiance Summary:

print(summary(combined_survey_data$Ambiance))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   2.904   4.000   5.000

cat("\nDrink Quality Summary:\n")

## 
## Drink Quality Summary:

print(summary(combined_survey_data$Drink_Quality))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    2.00    3.00    2.95    4.00    5.00

cat("\nPrice Summary:\n")

## 
## Price Summary:

print(summary(combined_survey_data$Price))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   2.871   4.000   5.000

cat("\nCustomer Service Summary:\n")

## 
## Customer Service Summary:

print(summary(combined_survey_data$Customer_Service))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0     2.0     3.0     2.9     4.0     5.0
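
The summaries above pool all four shops together. Since the study compares shops, a per-shop breakdown may also be useful. The sketch below shows one way to get it with base R's `aggregate`; the toy data frame is made up for illustration and uses only two of the four rating columns to keep it short.

```r
# Toy data for illustration only (not the survey responses)
toy <- data.frame(
  Coffee_Shop = c("Starbucks", "Starbucks", "Gong Cha", "Gong Cha"),
  Ambiance = c(4, 2, 3, 5),
  Price = c(1, 3, 2, 4)
)

# Mean rating per shop for each category.
# The formula interface drops rows with an NA in any listed column.
shop_means <- aggregate(cbind(Ambiance, Price) ~ Coffee_Shop, data = toy, FUN = mean)
print(shop_means)
```

With the survey data, the same call would list all four rating columns inside `cbind()`.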

long_data <- combined_survey_data %>%
  pivot_longer(cols = c("Ambiance", "Drink_Quality", "Price", "Customer_Service"),
               names_to = "Category", values_to = "Score")

ggplot(long_data, aes(x = Coffee_Shop, y = Score, fill = Coffee_Shop)) +
  geom_boxplot() +
  facet_wrap(~Category, scales = "free") +
  labs(
    title = "Satisfaction by Category and Coffee Shop",
    x = "Coffee Shop",
    y = "Satisfaction Score"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

# Calculate total average satisfaction
combined_survey_data <- combined_survey_data %>%
  rowwise() %>%
  mutate(Total_Average = mean(c(Ambiance, Drink_Quality, Price, Customer_Service), na.rm = TRUE)) %>%
  ungroup()  # drop the row-wise grouping so later steps see a regular data frame
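
As an aside, the same per-respondent average can be computed without row-wise grouping. This is a sketch of an alternative using base R's `rowMeans`, shown on a made-up two-row data frame rather than the survey data.

```r
# Toy data for illustration only (not the survey responses)
toy <- data.frame(
  Ambiance = c(4, 2),
  Drink_Quality = c(5, NA),
  Price = c(3, 4),
  Customer_Service = c(4, 3)
)

rating_cols <- c("Ambiance", "Drink_Quality", "Price", "Customer_Service")

# Vectorized row means; na.rm = TRUE averages over the ratings that are present
toy$Total_Average <- rowMeans(toy[, rating_cols], na.rm = TRUE)
print(toy)
```

Note that with `na.rm = TRUE`, a respondent who skipped a category is averaged over fewer ratings, which is also how the `mean(..., na.rm = TRUE)` call behaves.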

# Plot total average satisfaction
ggplot(combined_survey_data, aes(x = Coffee_Shop, y = Total_Average, fill = Coffee_Shop)) +
  geom_boxplot() +
  labs(
    title = "Overall Average Satisfaction by Coffee Shop",
    x = "Coffee Shop",
    y = "Average Satisfaction Score"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

cat("\nANOVA p-values by Category:\n")

## 
## ANOVA p-values by Category:

# One-way ANOVA per category. H0: mean rating is the same across coffee shops.
# Note: this tests for differences in means, not for independence of observations.
anova_ambiance <- aov(Ambiance ~ Coffee_Shop, data = combined_survey_data)
anova_drink_quality <- aov(Drink_Quality ~ Coffee_Shop, data = combined_survey_data)
anova_price <- aov(Price ~ Coffee_Shop, data = combined_survey_data)
anova_customer_service <- aov(Customer_Service ~ Coffee_Shop, data = combined_survey_data)

# Extract the p-value for the Coffee_Shop term from each model
anova_results <- list(
  Ambiance = summary(anova_ambiance)[[1]]["Pr(>F)"][1, 1],
  Drink_Quality = summary(anova_drink_quality)[[1]]["Pr(>F)"][1, 1],
  Price = summary(anova_price)[[1]]["Pr(>F)"][1, 1],
  Customer_Service = summary(anova_customer_service)[[1]]["Pr(>F)"][1, 1]
)

for (category in names(anova_results)) {
  p_value <- anova_results[[category]]
  cat(paste0(category, ": p-value = ", round(p_value, 4), " -> "))
  if (p_value < 0.05) {
    cat("Mean ratings differ across shops (reject H0)\n")
  } else {
    cat("No significant difference across shops (fail to reject H0)\n")
  }
}

## Ambiance: p-value = 0.6114 -> No significant difference across shops (fail to reject H0)
## Drink_Quality: p-value = 0.2783 -> No significant difference across shops (fail to reject H0)
## Price: p-value = 0.4258 -> No significant difference across shops (fail to reject H0)
## Customer_Service: p-value = 0.0827 -> No significant difference across shops (fail to reject H0)

cat("\nLevene's Test Results:\n")

## 
## Levene's Test Results:

levene_ambiance <- leveneTest(Ambiance ~ Coffee_Shop, data = combined_survey_data)
cat("Ambiance:\n")

## Ambiance:

print(levene_ambiance)

## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   3  0.3778 0.7691
##       236

levene_drink_quality <- leveneTest(Drink_Quality ~ Coffee_Shop, data = combined_survey_data)
cat("\nDrink Quality:\n")

## 
## Drink Quality:

print(levene_drink_quality)

## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   3  0.9598 0.4124
##       236

levene_price <- leveneTest(Price ~ Coffee_Shop, data = combined_survey_data)
cat("\nPrice:\n")

## 
## Price:

print(levene_price)

## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   3  0.4024 0.7514
##       236

levene_customer_service <- leveneTest(Customer_Service ~ Coffee_Shop, data = combined_survey_data)
cat("\nCustomer Service:\n")

## 
## Customer Service:

print(levene_customer_service)

## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   3  0.1734 0.9143
##       236
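
To make the Levene's test output above easier to interpret: with `center = median`, the test amounts to a one-way ANOVA on each score's absolute deviation from its group median. The sketch below reproduces that computation in base R on simulated data. The data are random and are not the survey responses; the 240 rows in four groups were chosen to match the degrees of freedom shown above, and the equal group sizes are an assumption.

```r
set.seed(1)  # simulated data for illustration only, not the survey responses
sim <- data.frame(
  shop = factor(rep(c("A", "B", "C", "D"), each = 60)),
  score = sample(1:5, 240, replace = TRUE)
)

# Absolute deviation of each score from its own group's median
group_median <- ave(sim$score, sim$shop, FUN = median)
sim$abs_dev <- abs(sim$score - group_median)

# One-way ANOVA on the absolute deviations gives the median-centered Levene statistic
levene_manual <- anova(lm(abs_dev ~ shop, data = sim))
print(levene_manual)
```

A large p-value in this table, as in the four results above, means the spread of ratings does not differ detectably between shops, which supports the equal-variance assumption behind the ANOVA.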

anova_total_average <- aov(Total_Average ~ Coffee_Shop, data = combined_survey_data)
cat("\nANOVA Results for Total Average Satisfaction Scores:\n")

## 
## ANOVA Results for Total Average Satisfaction Scores:

print(summary(anova_total_average))

##              Df Sum Sq Mean Sq F value Pr(>F)
## Coffee_Shop   3   1.17  0.3892    0.83  0.479
## Residuals   236 110.72  0.4692
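
As a worked check of the table above, the F value is the ratio of the two mean squares, and the p-value is the upper tail of the F distribution at that ratio. The sketch below recomputes both from the printed (rounded) numbers, and adds eta squared, an effect-size measure that the original analysis does not report, as the share of total variation attributable to coffee shop.

```r
# Values copied from the ANOVA table printed above for Total_Average
df_between <- 3
df_within  <- 236
ms_between <- 0.3892
ms_within  <- 0.4692
ss_between <- 1.17
ss_within  <- 110.72

# F statistic: between-group mean square over within-group mean square
f_stat <- ms_between / ms_within

# p-value: probability of an F at least this large if all shop means are equal
p_val <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

# Eta squared: proportion of total sum of squares explained by Coffee_Shop
eta_sq <- ss_between / (ss_between + ss_within)

cat("F =", round(f_stat, 2), " p =", round(p_val, 3), " eta^2 =", round(eta_sq, 3), "\n")
```

Because the inputs are rounded, the recomputed values can differ slightly from R's own output in the last digit. Under these numbers, coffee shop accounts for roughly 1% of the variation in overall satisfaction.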

# Reuse the models fitted earlier instead of refitting them
anova_results_list <- list(
  Ambiance = anova_ambiance,
  Drink_Quality = anova_drink_quality,
  Price = anova_price,
  Customer_Service = anova_customer_service
)

cat("\nANOVA Results for Individual Categories:\n")

## 
## ANOVA Results for Individual Categories:

for (category in names(anova_results_list)) {
  cat(category, ":\n")
  print(summary(anova_results_list[[category]]))
}

## Ambiance :
##              Df Sum Sq Mean Sq F value Pr(>F)
## Coffee_Shop   3    3.5   1.160   0.606  0.611
## Residuals   236  451.3   1.912               
## Drink_Quality :
##              Df Sum Sq Mean Sq F value Pr(>F)
## Coffee_Shop   3    7.9   2.622   1.291  0.278
## Residuals   236  479.5   2.032               
## Price :
##              Df Sum Sq Mean Sq F value Pr(>F)
## Coffee_Shop   3    5.4   1.815   0.932  0.426
## Residuals   236  459.5   1.947               
## Customer_Service :
##              Df Sum Sq Mean Sq F value Pr(>F)  
## Coffee_Shop   3   13.7   4.567   2.255 0.0827 .
## Residuals   236  477.9   2.025                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
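
Because four separate ANOVAs are run, one per category, the chance of at least one false positive is higher than the nominal 5%. The original analysis does not adjust for this. The sketch below shows one common option, a Holm correction via base R's `p.adjust`, applied to the four p-values reported earlier in the output.

```r
# p-values copied from the per-category ANOVA output
p_raw <- c(
  Ambiance = 0.6114,
  Drink_Quality = 0.2783,
  Price = 0.4258,
  Customer_Service = 0.0827
)

# Holm step-down adjustment for multiple comparisons
p_holm <- p.adjust(p_raw, method = "holm")
print(round(p_holm, 4))
```

None of the raw p-values is below 0.05 to begin with, so the adjustment does not change any conclusion here; it only moves Customer_Service further from significance.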

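Conclusion & Recommendations

Conclusion

None of the ANOVA tests found a statistically significant difference in mean satisfaction across the four coffee shops at the 0.05 level, either for the individual categories or for the overall average (p = 0.479). Customer service came closest (p = 0.083). Levene’s tests did not indicate unequal variances for any category (all p > 0.4). Satisfaction levels are therefore largely similar across shops in the studied categories, which suggests that factors outside of ambiance, drink quality, price, and customer service, such as location or convenience, may influence student preferences.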

Recommendations

- Introduce loyalty programs: Encourage repeat visits by offering rewards for frequent purchases.
- Improve visibility: Partner with student organizations or host campus events to build awareness.
- Enhance unique offerings: Focus on features like WiFi reliability, exclusive study-friendly spaces, or creative menu options to stand out.

Brewed to Satisfaction: Analyzing Student Satisfaction in Local Coffee Shops

Group 3: Neo La, Mary Dorothy Alinabon, Xiao Ran Zhu, Adrian Cardenas, Ariq Manzur

2024-11-26