Research Scenario 1

A café owner thinks that if she can get customers to stay in her café longer, they will make more purchases. She plans to make the café more comfortable (adding couches, more electrical outlets for laptops, etc.) so that customers stay longer. Before making this investment, the owner wants to check whether her belief is true. She buys AI software that collects information from her cash register and cameras to determine how long each customer stayed in the café and how many drinks they bought. Analyze the data to determine whether there is a relationship between time spent in the shop (minutes) and number of drinks purchased. Use the appropriate test to see whether longer visits are associated with higher spending (more drinks purchased).

Hypotheses

Null Hypothesis (H0): There is no relationship between time spent in the café and number of drinks purchased.

Alternative Hypothesis (H1): There is a positive relationship between time spent in the café and number of drinks purchased.

Loading Required Packages

# Install packages if not already installed
# install.packages(c("readxl", "psych", "ggplot2", "ggpubr", "rmarkdown"))

# Load required packages
library(readxl)
library(psych)
library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
library(ggpubr)

Importing and Preparing Data

# Import the Excel file (adjust the path to where A5RQ1.xlsx is saved on your machine)
A5RQ1 <- read_excel("C:/Users/saisa/Downloads/A5RQ1.xlsx")

Descriptive Statistics

# Calculate descriptive statistics
describe(A5RQ1[, c("Minutes", "Drinks")])
##         vars   n  mean    sd median trimmed   mad min   max range skew kurtosis
## Minutes    1 461 29.89 18.63   24.4   26.99 15.12  10 154.2 144.2 1.79     5.20
## Drinks     2 461  3.00  1.95    3.0    2.75  1.48   0  17.0  17.0 1.78     6.46
##           se
## Minutes 0.87
## Drinks  0.09
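The se, skew, and kurtosis columns above can be reproduced in base R, which makes clear what they measure. This is a minimal sketch, assuming psych's documented default `type = 3` formulas (skew = g1 · ((n − 1)/n)^(3/2); kurtosis = (g2 + 3)(1 − 1/n)² − 3, i.e. excess kurtosis, so 0 corresponds to a normal curve). The helper name `describe_basic` is hypothetical and not part of psych.

```r
# Hypothetical helper reproducing selected psych::describe() columns in base R.
# Assumes psych's default type = 3 skew/kurtosis formulas (excess kurtosis).
describe_basic <- function(x) {
  x  <- x[!is.na(x)]                  # drop missing values
  n  <- length(x)
  d  <- x - mean(x)                   # deviations from the mean
  m2 <- sum(d^2) / n                  # second central moment
  g1 <- (sum(d^3) / n) / m2^(3 / 2)   # sample skewness (type 1)
  g2 <- (sum(d^4) / n) / m2^2 - 3     # sample excess kurtosis (type 1)
  c(n        = n,
    mean     = mean(x),
    sd       = sd(x),
    median   = median(x),
    skew     = g1 * ((n - 1) / n)^(3 / 2),   # type 3 adjustment
    kurtosis = (g2 + 3) * (1 - 1 / n)^2 - 3, # type 3 adjustment
    se       = sd(x) / sqrt(n))              # standard error of the mean
}

# Usage with the café data (requires A5RQ1 from the import step):
# round(describe_basic(A5RQ1$Minutes), 2)
```

As a check on the se column: 18.63 / √461 ≈ 0.87, matching the value reported for Minutes. Positive skew and kurtosis values, like those for both variables here, indicate a long right tail and a distribution more peaked than normal.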

Checking Data Normality

Histograms

# Create histogram for Minutes
hist(A5RQ1$Minutes,
     main = "Histogram of Minutes Spent in Café",
     xlab = "Minutes",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)

# Create histogram for Drinks
hist(A5RQ1$Drinks,
     main = "Histogram of Drinks Purchased",
     xlab = "Number of Drinks",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)

Shapiro-Wilk Normality Tests

# Conduct Shapiro-Wilk tests
shapiro_minutes <- shapiro.test(A5RQ1$Minutes)
shapiro_drinks <- shapiro.test(A5RQ1$Drinks)

# Display results
shapiro_minutes
## 
##  Shapiro-Wilk normality test
## 
## data:  A5RQ1$Minutes
## W = 0.84706, p-value < 2.2e-16
shapiro_drinks
## 
##  Shapiro-Wilk normality test
## 
## data:  A5RQ1$Drinks
## W = 0.85487, p-value < 2.2e-16

Normality Test Results:

  • Minutes: W = 0.847, p < .001 → NOT normally distributed

  • Drinks: W = 0.855, p < .001 → NOT normally distributed
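The Shapiro-Wilk decision can be wrapped in a small helper that treats p < alpha as evidence against normality. This is a minimal sketch; the name `is_normal` and the default alpha = 0.05 are assumptions, not part of the original analysis.

```r
# Hypothetical helper: Shapiro-Wilk decision rule.
# Returns TRUE when normality is NOT rejected at the chosen alpha.
is_normal <- function(x, alpha = 0.05) {
  x <- x[!is.na(x)]
  # shapiro.test() accepts only between 3 and 5000 non-missing values
  stopifnot(length(x) >= 3, length(x) <= 5000)
  shapiro.test(x)$p.value >= alpha
}

# Usage with the café data (requires A5RQ1 from the import step):
# sapply(A5RQ1[, c("Minutes", "Drinks")], is_normal)
```

With a sample as large as n = 461, Shapiro-Wilk flags even mild departures from normality, so it is worth reading the result alongside the histograms rather than on its own.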

Decision: Because neither variable is normally distributed, we will use Spearman's rank correlation, which does not assume normality, instead of Pearson's correlation.
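Spearman's rho avoids the normality assumption because it works on ranks: it is Pearson's correlation applied to the ranked values, with tied values given their average rank. A minimal sketch using made-up numbers (not the café data); the helper name `spearman_by_hand` is hypothetical.

```r
# Spearman's rho computed as Pearson's correlation on ranks.
# rank() assigns tied values their average rank by default.
spearman_by_hand <- function(x, y) {
  cor(rank(x), rank(y))
}

# Small made-up example (not the café data); drinks contains ties
minutes <- c(12, 15, 20, 22, 30, 31, 45, 50, 60, 75)
drinks  <- c(1, 2, 2, 1, 3, 4, 3, 5, 6, 5)

spearman_by_hand(minutes, drinks)
cor(minutes, drinks, method = "spearman")  # same value
```

Because only the order of the values matters, the long right tails seen in the histograms (for example, the 154.2-minute visit) cannot dominate the coefficient the way they could with Pearson's r.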

Visualizing the Relationship

# Create scatterplot with Spearman correlation
ggscatter(A5RQ1, x = "Minutes", y = "Drinks",
          add = "reg.line",
          conf.int = TRUE,
          cor.coef = TRUE,
          cor.method = "spearman",
          xlab = "Minutes in Café", 
          ylab = "Number of Drinks Purchased",
          title = "Relationship between Time Spent and Drinks Purchased")

Scatterplot Observation: The relationship is positive. The fitted regression line slopes upward, so customers who stayed longer tended to purchase more drinks.

Spearman Correlation Test

# Conduct Spearman correlation test
spearman_result <- cor.test(A5RQ1$Minutes, A5RQ1$Drinks, method = "spearman")
## Warning in cor.test.default(A5RQ1$Minutes, A5RQ1$Drinks, method = "spearman"):
## Cannot compute exact p-value with ties
spearman_result
## 
##  Spearman's rank correlation rho
## 
## data:  A5RQ1$Minutes and A5RQ1$Drinks
## S = 1305608, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9200417

Results Summary

Statistical Results:

  • Spearman’s rho (ρ): 0.920

  • p-value < .001

  • Sample size: n = 461

Effect Size Interpretation:

  • Direction: Positive correlation

  • Strength: Very strong (ρ = 0.92, well above the 0.50 threshold for a “strong” relationship)
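The strength label follows from comparing |ρ| against fixed cut-offs. A minimal sketch, assuming the 0.10 / 0.30 / 0.50 benchmarks (Cohen's conventional small / medium / large values for correlations), relabelled here as weak / moderate / strong; the helper name `label_effect` is hypothetical. These benchmarks stop at "strong", so describing 0.92 as "very strong" is an additional qualitative judgment on top of them.

```r
# Hypothetical helper: label a correlation's direction and strength using
# the 0.10 / 0.30 / 0.50 cut-offs (Cohen's small / medium / large for r).
label_effect <- function(rho) {
  size <- abs(rho)
  strength <- if (size >= 0.50) {
    "strong"
  } else if (size >= 0.30) {
    "moderate"
  } else if (size >= 0.10) {
    "weak"
  } else {
    "negligible"
  }
  direction <- if (rho > 0) "positive" else if (rho < 0) "negative" else "none"
  c(direction = direction, strength = strength)
}

label_effect(0.9200417)
```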

Answers to Required Questions

Histogram Assessment

Q1) Minutes skewness: Positively skewed (skew = 1.79)
Q2) Minutes kurtosis: Too tall (leptokurtic; kurtosis = 5.20)
Q3) Drinks skewness: Positively skewed (skew = 1.78)
Q4) Drinks kurtosis: Too tall (leptokurtic; kurtosis = 6.46)

Normality Questions

Was the data normally distributed for Minutes? No (W = 0.847, p < .001)
Was the data normally distributed for Drinks? No (W = 0.855, p < .001)

Scatterplot Question

Is the relationship positive, negative, or no relationship? Positive (the fitted line slopes upward)

Effect Size Questions

Q1) Direction of effect: Positive; as minutes increase, drinks purchased increase
Q2) Size of effect: Very strong (ρ = 0.92)

Final Written Report

A Spearman correlation was conducted to assess the relationship between time spent in the café and number of drinks purchased (n = 461). Neither variable was normally distributed (Minutes: W = 0.847, p < .001; Drinks: W = 0.855, p < .001). There was a statistically significant correlation between time spent (M = 29.89, SD = 18.63) and drinks purchased (M = 3.00, SD = 1.95). The correlation was positive and very strong, ρ = 0.92, p < .001. Customers who spent more time in the café tended to purchase more drinks.
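Keeping p-values consistent across a write-up (for example, always reporting values below .001 as "p < .001") is easier with small formatting helpers. A minimal sketch, assuming APA conventions (no leading zero for statistics bounded by 1, degrees of freedom n − 2 for a correlation); the helper names `drop_zero`, `format_p`, and `format_rho` are hypothetical.

```r
# Hypothetical APA-style formatting helpers.
# Drop the leading zero for statistics that cannot exceed 1 (e.g. "0.92" -> ".92").
drop_zero <- function(s) sub("^(-?)0\\.", "\\1.", s)

# Report very small p-values as "p < .001", otherwise to three decimals
format_p <- function(p) {
  if (p < 0.001) "p < .001" else paste0("p = ", drop_zero(sprintf("%.3f", p)))
}

# Spearman's rho with degrees of freedom (n - 2) in parentheses
format_rho <- function(rho, n) {
  sprintf("rs(%d) = %s", as.integer(n - 2), drop_zero(sprintf("%.2f", rho)))
}

paste(format_rho(0.9200417, 461), format_p(2.2e-16), sep = ", ")
# "rs(459) = .92, p < .001"
```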