RESEARCH SCENARIO 1 - ASSIGNMENT 5

Assess the correlation between the Amount of time customers stay in the store and the Number of drinks they purchase

HYPOTHESES:

H0: There is no relationship between the Amount of time customers stay in the store and the Number of drinks they purchase

H1: There is a relationship between the Amount of time customers stay in the store and the Number of drinks they purchase

# ======================
# IMPORT EXCEL FILE CODE
# ======================

library(readxl)
A5RQ1 <- read_excel("D:/20251021 AA 5221 Applied Analytics & Methods 1/Week 5/A5RQ1.xlsx")

# ======================
# DESCRIPTIVE STATISTICS
# ======================

# Calculate the mean, median, SD, and sample size for each variable.

library(psych)
describe(A5RQ1[, c("Minutes", "Drinks")])
##         vars   n  mean    sd median trimmed   mad min   max range skew kurtosis
## Minutes    1 461 29.89 18.63   24.4   26.99 15.12  10 154.2 144.2 1.79     5.20
## Drinks     2 461  3.00  1.95    3.0    2.75  1.48   0  17.0  17.0 1.78     6.46
##           se
## Minutes 0.87
## Drinks  0.09
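
# As a quick sanity check on the table above, the se column should equal sd / sqrt(n).
# The sketch below recomputes it from the reported values only. The object names
# (sd_minutes, se_minutes, and so on) are illustrative and not part of the assignment template.

```r
# Values copied from the describe() output above
n          <- 461
sd_minutes <- 18.63
sd_drinks  <- 1.95

# Standard error of the mean = SD / sqrt(n)
se_minutes <- sd_minutes / sqrt(n)
se_drinks  <- sd_drinks / sqrt(n)

# Round to two decimals to compare with the se column of the table
round(c(Minutes = se_minutes, Drinks = se_drinks), 2)
```

# Both values should round to the se figures shown in the table (0.87 and 0.09).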
# ===============================================
# CHECK THE NORMALITY OF THE CONTINUOUS VARIABLES
# ===============================================

# CREATE A HISTOGRAM FOR EACH CONTINUOUS VARIABLE
hist(A5RQ1$Minutes,
     main = "Histogram of Time Spent in Store (Minutes)",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)

hist(A5RQ1$Drinks,
     main = "Histogram of Number of Drinks",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)

### Comments:
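### Q1) The Variable 1 (Minutes) histogram is not symmetrical; it is positively skewed.
### Q2) The Variable 1 (Minutes) histogram looks too tall (peaked) compared with a normal bell curve.
### Q3) The Variable 2 (Drinks) histogram is not symmetrical; it is positively skewed.
### Q4) The Variable 2 (Drinks) histogram looks too tall (peaked) compared with a normal bell curve.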

# =============================
# CONDUCT THE SHAPIRO-WILK TEST
# =============================

# PURPOSE
# Use a statistical test to check the normality of the continuous variables.
# The Shapiro-Wilk test checks whether a variable's distribution departs from normality (because of skewness, kurtosis, or both).
# The test asks: "Is this variable consistent with normal data (null hypothesis) or DIFFERENT from normal data (alternate hypothesis)?"
# For this test, if p is GREATER than .05 (p > .05), there is no evidence against normality, so the data is treated as NORMAL.
# If p is LESS than .05 (p < .05), the data is NOT normal.

shapiro.test(A5RQ1$Minutes)
## 
##  Shapiro-Wilk normality test
## 
## data:  A5RQ1$Minutes
## W = 0.84706, p-value < 2.2e-16
shapiro.test(A5RQ1$Drinks)
## 
##  Shapiro-Wilk normality test
## 
## data:  A5RQ1$Drinks
## W = 0.85487, p-value < 2.2e-16
# COMMENTS
# Both p-values are below .05 (p < .001), so neither the amount of time customers spend in the store (Minutes) nor the number of drinks they purchase (Drinks) is normally distributed. Use the Spearman Correlation Test.
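
# The decision rule above (Pearson if both variables look normal, otherwise Spearman)
# can be sketched as a small helper. choose_cor_method() is a hypothetical function
# written for illustration, and the data below is simulated, not the A5RQ1 data.

```r
# Hypothetical helper: pick the correlation method from Shapiro-Wilk p-values
choose_cor_method <- function(x, y, alpha = 0.05) {
  p_x <- shapiro.test(x)$p.value
  p_y <- shapiro.test(y)$p.value
  # Pearson only when neither variable departs significantly from normality
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}

# Simulated, made-up data for illustration only
set.seed(123)
skewed_x <- rexp(200)              # strongly right-skewed
skewed_y <- skewed_x + rexp(200)   # also right-skewed

method <- choose_cor_method(skewed_x, skewed_y)
method
```

# With the assignment data, the call would be choose_cor_method(A5RQ1$Minutes, A5RQ1$Drinks).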

# =========================
# VISUALLY DISPLAY THE DATA
# =========================

# CREATE A SCATTERPLOT, Method = "spearman"

library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
library(ggpubr)

ggscatter(A5RQ1, x = "Minutes", y = "Drinks",
          add = "reg.line",
          conf.int = TRUE,
          cor.coef = TRUE,
          cor.method = "spearman",
          xlab = "Minutes", ylab = "Drinks")

# COMMENT:
# There is a POSITIVE relationship (line pointing up) between the two variables

# ================================================
# PEARSON CORRELATION OR SPEARMAN CORRELATION TEST
# ================================================

# PURPOSE
# Check whether the two variables are related, and determine the direction and strength of that relationship.

# CONDUCT THE SPEARMAN CORRELATION TEST FOR THIS SCENARIO

cor.test(A5RQ1$Minutes, A5RQ1$Drinks, method = "spearman")
## Warning in cor.test.default(A5RQ1$Minutes, A5RQ1$Drinks, method = "spearman"):
## Cannot compute exact p-value with ties
## 
##  Spearman's rank correlation rho
## 
## data:  A5RQ1$Minutes and A5RQ1$Drinks
## S = 1305608, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9200417
# DETERMINE STATISTICAL SIGNIFICANCE & EFFECT SIZE FOR THE TEST (rho)
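
# The S statistic and rho in the output above are two views of the same result.
# Assuming R derives S from rho as S = (n^3 - n) * (1 - rho) / 6, rho can be
# recovered from the printed S and n. The sketch uses only the reported values;
# rho_from_S is an illustrative name.

```r
# Values copied from the cor.test() and describe() output above
n <- 461
S <- 1305608   # printed S is rounded to a whole number

# Rearranged from S = (n^3 - n) * (1 - rho) / 6
rho_from_S <- 1 - 6 * S / (n^3 - n)

round(rho_from_S, 4)

# The "Cannot compute exact p-value with ties" warning only means an approximate
# p-value is used. Passing exact = FALSE requests the approximation explicitly:
# cor.test(A5RQ1$Minutes, A5RQ1$Drinks, method = "spearman", exact = FALSE)
```

# The recovered value should agree with the reported rho (0.9200417) to several decimal places.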

# ========================================================
#     >> WRITTEN REPORT FOR SPEARMAN CORRELATION <<
# ========================================================

# OUTPUT
# 1) Spearman Correlation Test
# 2) Variables: The amount of time customers stay in the store (Minutes) and the number of drinks customers purchase (Drinks)
# 3) n = 461
# 4) The inferential test results were statistically significant (p < .001)
# 5) Variable 1: Amount of time in store (Minutes); M = 29.89; SD = 18.63
#    Variable 2: Number of Drinks; M = 3.00; SD = 1.95
# 6) The correlation is POSITIVE with a STRONG effect size, rho = 0.920
# 7) r-value (sample estimate: rho) = 0.920
# 8) p < 0.001
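
# The direction and effect-size labels in items 6 and 7 can be sketched as a helper.
# describe_correlation() is hypothetical, and the cut-offs (.10, .30, .50) are an
# assumed Cohen-style convention; substitute the thresholds given in the course materials.

```r
# Hypothetical helper: label the direction and strength of a correlation coefficient
describe_correlation <- function(rho) {
  size <- abs(rho)
  # Assumed cut-offs, not taken from the assignment
  if (size >= 0.50) {
    strength <- "strong"
  } else if (size >= 0.30) {
    strength <- "moderate"
  } else if (size >= 0.10) {
    strength <- "weak"
  } else {
    strength <- "negligible"
  }
  direction <- if (rho > 0) "positive" else if (rho < 0) "negative" else "none"
  paste(direction, strength)
}

# rho copied from the cor.test() output above
label <- describe_correlation(0.9200417)
label
```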

# ........................................................

# 2) WRITE YOUR FINAL REPORT

# OUTPUT REPORT:
# A Spearman Correlation test was conducted to assess the relationship between
# the amount of time customers spend in the store (Minutes) and the number of drinks they purchase (Drinks).
# There was a statistically significant correlation between
# the amount of time in the store (Minutes, M = 29.89; SD = 18.63) and the number of purchased drinks (Drinks, M = 3.00; SD = 1.95).
# The correlation was Positive and Strong, rho = .92, p < .001.
# As the amount of time customers stay in the store increases, the number of drinks they purchase increases.

OUTPUT REPORT:

A Spearman Correlation test was conducted to assess the relationship between the Amount of time customers stay in the store (Minutes) and the Number of drinks they purchase (Drinks).

There was a statistically significant correlation between the Amount of time in the store (Minutes, M = 29.89; SD = 18.63) and the Number of Drinks (Drinks, M = 3.00; SD = 1.95), n = 461.

The correlation was Positive and Strong, rho = .92, p < .001.

As the amount of time customers stay in the store increases, the number of drinks they purchase increases.