Research Scenario 1

A café owner believes that if customers stay in her café longer, they will make more purchases. She plans to make the café more comfortable (adding couches, more electrical outlets for laptops, etc.) so that customers stay longer. Before making this investment, the owner wants to check whether her belief is true. She buys AI software that collects information from her cash register and cameras to determine how long each customer stayed in the café and how many drinks each customer bought. Analyze the data to determine whether there is a relationship between time spent in the shop (minutes) and number of drinks purchased. Use the appropriate test to see whether longer visits are associated with higher spending.


Hypotheses

Null Hypothesis (H₀)

There is no relationship between time spent in the café and the number of drinks purchased.

Alternative Hypothesis (H₁)

There is a relationship between time spent in the café and the number of drinks purchased.


Statistical Test

Spearman's rank-order correlation was used because the Shapiro–Wilk tests showed that both Minutes and Drinks deviated significantly from a normal distribution.


Results Paragraph

Descriptive statistics (n = 461) showed that Minutes (M = 29.89, SD = 18.63) and Drinks (M = 3.00, SD = 1.95) were both highly variable and positively skewed (skew = 1.79 and 1.78, respectively). Histograms indicated that neither variable was symmetrically distributed. The Shapiro–Wilk tests confirmed significant deviations from normality for both Minutes (W = 0.847, p < 2.2e-16) and Drinks (W = 0.855, p < 2.2e-16).

A scatterplot with a fitted regression line showed a strong positive association between Minutes and Drinks. Because both variables violated the normality assumption, the non-parametric Spearman rank-order correlation served as the primary test. It showed a strong and statistically significant positive relationship, rho = 0.920, p < 2.2e-16 (p-value approximated because of tied ranks). For comparison, the Pearson correlation coefficient was also very high, r = 0.954, 95% CI [0.945, 0.962], and statistically significant (p < 2.2e-16), indicating a strong linear relationship.

Overall, the results indicate a very strong positive association between Minutes spent in the café and the number of Drinks purchased, supported by both parametric and non-parametric correlation analyses. The null hypothesis of no relationship was therefore rejected.


R Code

# ============================================================
# IMPORT DATASET
# ============================================================

# Load the readxl package to import Excel files
library(readxl)

# Import the dataset (adjust the path to wherever A5RQ1.xlsx is saved)
dataset <- read_excel("C:/Users/Nithin Kumar Adki/Downloads/A5RQ1.xlsx")
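
# ============================================================
# OPTIONAL: CHECK THE IMPORTED COLUMNS (ILLUSTRATIVE SKETCH)
# ============================================================

# Not part of the original analysis. check_columns() is a hypothetical
# helper that confirms the columns used below are present, reports
# whether each is numeric, and counts missing values in each.
check_columns <- function(df, cols) {
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  data.frame(
    column     = cols,
    is_numeric = vapply(cols, function(col) is.numeric(df[[col]]), logical(1)),
    n_missing  = vapply(cols, function(col) sum(is.na(df[[col]])), integer(1)),
    row.names  = NULL
  )
}

# Run on the imported data (skipped if `dataset` has not been loaded)
if (exists("dataset")) {
  print(check_columns(dataset, c("Minutes", "Drinks")))
}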


# ============================================================
# DESCRIPTIVE STATISTICS
# ============================================================

# Install the psych package 
# install.packages("psych")

# Load the psych package to compute descriptive statistics
library(psych)

# Calculate descriptive statistics for both variables
describe(dataset[, c("Minutes", "Drinks")])
##         vars   n  mean    sd median trimmed   mad min   max range skew kurtosis
## Minutes    1 461 29.89 18.63   24.4   26.99 15.12  10 154.2 144.2 1.79     5.20
## Drinks     2 461  3.00  1.95    3.0    2.75  1.48   0  17.0  17.0 1.78     6.46
##           se
## Minutes 0.87
## Drinks  0.09

# ============================================================
# HISTOGRAMS TO CHECK NORMALITY VISUALLY
# ============================================================

# Create histogram for Minutes (Variable 1)
hist(dataset$Minutes,
     main = "Histogram of Minutes",
     xlab = "Minutes Spent in Café",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)

# Create histogram for Drinks (Variable 2)
hist(dataset$Drinks,
     main = "Histogram of Drinks Purchased",
     xlab = "Number of Drinks",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)
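
# ============================================================
# OPTIONAL: NORMAL Q-Q PLOTS (ILLUSTRATIVE SKETCH)
# ============================================================

# Not part of the original analysis. A normal Q-Q plot is a common visual
# companion to a histogram: points that bend away from the reference line
# suggest a departure from normality. qq_check() is a hypothetical helper.
qq_check <- function(x, label) {
  x <- x[!is.na(x)]                      # drop missing values before plotting
  pts <- qqnorm(x, main = paste("Normal Q-Q Plot of", label))
  qqline(x, col = "red")                 # reference line through the quartiles
  invisible(pts)                         # return the plotted coordinates
}

# Run on the imported data (skipped if `dataset` has not been loaded)
if (exists("dataset")) {
  qq_check(dataset$Minutes, "Minutes")
  qq_check(dataset$Drinks, "Drinks")
}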

# ============================================================
# SHAPIRO-WILK NORMALITY TESTS
# ============================================================

# Test normality for Minutes variable
shapiro.test(dataset$Minutes)
## 
##  Shapiro-Wilk normality test
## 
## data:  dataset$Minutes
## W = 0.84706, p-value < 2.2e-16

# Test normality for Drinks variable
shapiro.test(dataset$Drinks)
## 
##  Shapiro-Wilk normality test
## 
## data:  dataset$Drinks
## W = 0.85487, p-value < 2.2e-16

# ============================================================
# SCATTERPLOT
# ============================================================

# Install required packages for scatterplot 
# install.packages("ggplot2")
# install.packages("ggpubr")

# Load the plotting packages
library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
library(ggpubr)

# Create a scatterplot with a regression line
ggscatter(dataset, x = "Minutes", y = "Drinks",
          add = "reg.line",
          conf.int = TRUE,
          cor.coef = TRUE,
          cor.method = "spearman",   
          xlab = "Minutes Spent in Café",
          ylab = "Number of Drinks Purchased")

# ============================================================
# CORRELATION TEST (SPEARMAN)
# ============================================================

# Since Shapiro-Wilk showed NON-NORMAL results, we use Spearman
cor.test(dataset$Minutes, dataset$Drinks, method = "spearman")
## Warning in cor.test.default(dataset$Minutes, dataset$Drinks, method =
## "spearman"): Cannot compute exact p-value with ties
## 
##  Spearman's rank correlation rho
## 
## data:  dataset$Minutes and dataset$Drinks
## S = 1305608, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9200417
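
# ============================================================
# PEARSON CORRELATION FOR COMPARISON (ILLUSTRATIVE SKETCH)
# ============================================================

# The Results paragraph also reports a Pearson r with a 95% CI, but the
# call that produced it is not shown in this script. The sketch below is
# one way it could be reproduced; compare_correlations() is a
# hypothetical helper name, not part of the original analysis.
compare_correlations <- function(x, y) {
  pearson <- cor.test(x, y, method = "pearson")
  # exact = FALSE requests the approximate p-value directly, which avoids
  # the "Cannot compute exact p-value with ties" warning
  spearman <- cor.test(x, y, method = "spearman", exact = FALSE)
  data.frame(
    method   = c("pearson", "spearman"),
    estimate = unname(c(pearson$estimate, spearman$estimate)),
    ci_lower = c(pearson$conf.int[1], NA),   # cor.test gives a CI for Pearson only
    ci_upper = c(pearson$conf.int[2], NA),
    p_value  = c(pearson$p.value, spearman$p.value)
  )
}

# Run on the imported data (skipped if `dataset` has not been loaded)
if (exists("dataset")) {
  print(compare_correlations(dataset$Minutes, dataset$Drinks))
}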