H1: There is a relationship between the amount of time customers
stay in the store and the number of drinks they purchase.
# ======================
# IMPORT EXCEL FILE CODE
# ======================
library(readxl)
A5RQ1 <- read_excel("D:/20251021 AA 5221 Applied Analytics & Methods 1/Week 5/A5RQ1.xlsx")
# ======================
# DESCRIPTIVE STATISTICS
# ======================
# Calculate the mean, median, SD, and sample size for each variable.
library(psych)
describe(A5RQ1[, c("Minutes", "Drinks")])
##         vars   n  mean    sd median trimmed   mad min   max range skew kurtosis   se
## Minutes    1 461 29.89 18.63   24.4   26.99 15.12  10 154.2 144.2 1.79     5.20 0.87
## Drinks     2 461  3.00  1.95    3.0    2.75  1.48   0  17.0  17.0 1.78     6.46 0.09
# ===============================================
# CHECK THE NORMALITY OF THE CONTINUOUS VARIABLES
# ===============================================
# CREATE A HISTOGRAM FOR EACH CONTINUOUS VARIABLE
hist(A5RQ1$Minutes,
     main = "Histogram of Time Spent in Store (Minutes)",
     xlab = "Minutes",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)

hist(A5RQ1$Drinks,
     main = "Histogram of Number of Drinks",
     xlab = "Drinks",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)

### Comments:
### Q1) The Minutes histogram is not symmetrical; it is positively skewed (skew = 1.79).
### Q2) The Minutes histogram looks tall and peaked (kurtosis = 5.20).
### Q3) The Drinks histogram is not symmetrical; it is positively skewed (skew = 1.78).
### Q4) The Drinks histogram looks tall and peaked (kurtosis = 6.46).
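# ==========================================
# OPTIONAL: QUANTIFY SKEWNESS AND KURTOSIS
# ==========================================
# A minimal sketch of how the visual impressions above could be backed by numbers.
# ASSUMPTIONS: the helper names skewness_moment() and excess_kurtosis_moment() are
# hypothetical (they are not from any package) and use plain moment formulas, so the
# results may differ slightly from the skew and kurtosis columns printed by describe().
# The demonstration uses small made-up vectors, not the store data.
skewness_moment <- function(x) {
  x <- x[!is.na(x)]                 # drop missing values
  d <- x - mean(x)                  # deviations from the mean
  mean(d^3) / mean(d^2)^1.5         # third standardized moment
}

excess_kurtosis_moment <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3       # fourth standardized moment minus 3 (normal = 0)
}

symmetric_example <- c(1, 2, 3)           # made-up symmetric data: skewness is 0
right_tail_example <- c(0, 0, 0, 10)      # made-up data with a long right tail

skewness_moment(symmetric_example)
skewness_moment(right_tail_example)       # positive value = positively skewed

# Possible use on the store data (requires A5RQ1 to be loaded):
# skewness_moment(A5RQ1$Minutes)
# excess_kurtosis_moment(A5RQ1$Drinks)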
# =============================
# CONDUCT THE SHAPIRO-WILK TEST
# =============================
# PURPOSE
# Use a statistical test to check the normality of the continuous variables.
# The Shapiro-Wilk test checks for overall departure from a normal distribution, so it is sensitive to both skewness and kurtosis.
# The test asks: "Is this variable consistent with normal data (null hypothesis) or DIFFERENT from normal data (alternative hypothesis)?"
# If p is GREATER than .05 (p > .05), there is no evidence against normality, so treat the data as NORMAL.
# If p is LESS than .05 (p < .05), the data is NOT normal.
shapiro.test(A5RQ1$Minutes)
##
## Shapiro-Wilk normality test
##
## data: A5RQ1$Minutes
## W = 0.84706, p-value < 2.2e-16
shapiro.test(A5RQ1$Drinks)
##
## Shapiro-Wilk normality test
##
## data: A5RQ1$Drinks
## W = 0.85487, p-value < 2.2e-16
# COMMENTS
# Both the amount of time customers spend in the store (Minutes) and the number of drinks they purchase (Drinks)
# are NOT normally distributed (p < .001 for both). Use the Spearman correlation test.
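# ==============================================
# OPTIONAL: SHAPIRO-WILK DECISION RULE AS CODE
# ==============================================
# A minimal sketch of the p > .05 / p < .05 decision rule described in this section.
# ASSUMPTIONS: the helper name looks_normal() and the alpha argument are hypothetical,
# and the two demonstration samples are idealized values built from theoretical
# quantiles (qnorm/qexp), not the store data.
looks_normal <- function(x, alpha = 0.05) {
  # TRUE when the test gives no evidence against normality at the chosen alpha
  shapiro.test(x)$p.value > alpha
}

ideal_normal <- qnorm(ppoints(200))   # evenly spaced quantiles of a normal distribution
ideal_skewed <- qexp(ppoints(200))    # evenly spaced quantiles of a right-skewed distribution

looks_normal(ideal_normal)            # normal-shaped sample
looks_normal(ideal_skewed)            # right-skewed sample

# Possible use on the store data (requires A5RQ1 to be loaded):
# looks_normal(A5RQ1$Minutes)
# looks_normal(A5RQ1$Drinks)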
# =========================
# VISUALLY DISPLAY THE DATA
# =========================
# CREATE A SCATTERPLOT (correlation coefficient shown with cor.method = "spearman")
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
library(ggpubr)
ggscatter(A5RQ1, x = "Minutes", y = "Drinks",
          add = "reg.line",
          conf.int = TRUE,
          cor.coef = TRUE,
          cor.method = "spearman",
          xlab = "Minutes", ylab = "Drinks")

# COMMENT:
# There is a POSITIVE relationship (the line slopes upward) between the two variables.
# ================================================
# PEARSON CORRELATION OR SPEARMAN CORRELATION TEST
# ================================================
# PURPOSE
# Check whether there is a relationship between the two variables, and measure its direction and strength.
# Both variables failed the normality check, so CONDUCT THE SPEARMAN CORRELATION TEST FOR THIS SCENARIO
cor.test(A5RQ1$Minutes, A5RQ1$Drinks, method = "spearman")
## Warning in cor.test.default(A5RQ1$Minutes, A5RQ1$Drinks, method = "spearman"):
## Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: A5RQ1$Minutes and A5RQ1$Drinks
## S = 1305608, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.9200417
# DETERMINE STATISTICAL SIGNIFICANCE & EFFECT SIZE FOR THE TEST (rho)
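# ====================================================
# OPTIONAL: WHAT SPEARMAN'S RHO ACTUALLY CALCULATES
# ====================================================
# A minimal sketch showing that Spearman's rho is the Pearson correlation of the
# ranked values, and that exact = FALSE requests the approximate p-value directly,
# which avoids the "Cannot compute exact p-value with ties" warning.
# ASSUMPTIONS: sim_minutes and sim_drinks are simulated illustration data with
# hypothetical names; they are not the store data.
set.seed(123)
sim_minutes <- rexp(100, rate = 1 / 30)                       # right-skewed "time" values
sim_drinks <- round(sim_minutes / 10 + rnorm(100, sd = 0.5))  # rounding creates ties, like counts

rho_builtin <- cor(sim_minutes, sim_drinks, method = "spearman")
rho_by_hand <- cor(rank(sim_minutes), rank(sim_drinks), method = "pearson")
all.equal(rho_builtin, rho_by_hand)   # the two calculations agree

# Same test as above, but asking for the approximate p-value up front
sim_result <- cor.test(sim_minutes, sim_drinks, method = "spearman", exact = FALSE)
sim_result$estimate                   # rho
sim_result$p.value                    # p-value

# Possible use on the store data (requires A5RQ1 to be loaded):
# cor.test(A5RQ1$Minutes, A5RQ1$Drinks, method = "spearman", exact = FALSE)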
# ========================================================
# >> WRITTEN REPORT FOR SPEARMAN CORRELATION <<
# ========================================================
# OUTPUT
# 1) Test: Spearman correlation test
# 2) Variables: amount of time customers stay in the store (Minutes) and number of drinks customers purchase (Drinks)
# 3) n = 461
# 4) The inferential test results were statistically significant (p < .001)
# 5) Variable 1: Amount of time in store (Minutes); M = 29.89; SD = 18.63
#    Variable 2: Number of drinks (Drinks); M = 3.00; SD = 1.95
# 6) The correlation is POSITIVE with a STRONG effect (rho = 0.920)
# 7) r-value (sample estimate: rho) = 0.920
# 8) p < .001
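# ................................................
# OPTIONAL: LABEL THE DIRECTION AND EFFECT SIZE
# ................................................
# A minimal sketch of how the direction and strength labels in item 6 could be derived from rho.
# ASSUMPTIONS: the helper name label_correlation() is hypothetical, and the cut-offs
# (.10 weak, .30 moderate, .50 strong) follow Cohen's commonly cited conventions.
# If the course materials use different cut-offs, change the breaks accordingly.
label_correlation <- function(rho) {
  direction <- if (rho > 0) "positive" else if (rho < 0) "negative" else "none"
  strength <- cut(abs(rho),
                  breaks = c(-Inf, 0.10, 0.30, 0.50, Inf),
                  labels = c("negligible", "weak", "moderate", "strong"),
                  right = FALSE)       # intervals are [lower, upper)
  paste(direction, as.character(strength))
}

label_correlation(0.9200417)           # rho reported by cor.test() above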
# ........................................................
# 2) WRITE YOUR FINAL REPORT
# OUTPUT REPORT:
# A Spearman correlation test was conducted to assess the relationship between
# the amount of time customers spend in the store (Minutes) and the number of drinks they purchase (Drinks), n = 461.
# There was a statistically significant correlation between
# time spent in the store (Minutes, M = 29.89; SD = 18.63) and number of drinks purchased (Drinks, M = 3.00; SD = 1.95).
# The correlation was positive and strong, rho = .92, p < .001.
# As the amount of time customers stay in the store increases, the number of drinks they purchase also increases.