RESEARCH SCENARIO 1

A café owner believes that if customers stay in her café longer, they will make more purchases. She plans to make the café more comfortable (couches, more electrical outlets for laptops, etc.) so that customers stay longer. Before making this investment, she wants to check whether her belief is true. She buys AI software that uses her cash register and cameras to record how long each customer stayed in the café and how many drinks they bought. Analyze the data to determine whether there is a relationship between time spent in the shop (minutes) and number of drinks purchased. Use the appropriate test to see whether longer visits are associated with more purchases.

HYPOTHESES

This is a correlation hypothesis, tested using Pearson correlation.

Null Hypothesis: There is no relationship between time spent in the café and number of drinks purchased. Statistically: ρ = 0

Alternative Hypothesis: There is a positive relationship between time spent and number of drinks purchased. Statistically: ρ > 0
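
The directional alternative (ρ > 0) corresponds to a one-sided test. The sketch below shows how such a test could be requested from `cor.test`. It runs on simulated data: the `minutes` and `drinks` vectors and the parameters that generate them are assumptions made for illustration, not the café data.

```r
# Simulated, hypothetical data: NOT the café data set
set.seed(123)
minutes <- 10 + rexp(200, rate = 1 / 20)      # right-skewed visit lengths
drinks  <- rpois(200, lambda = minutes / 10)  # counts that grow with time

# alternative = "greater" tests H1: rho > 0 (one-sided)
one_sided <- cor.test(minutes, drinks, method = "pearson",
                      alternative = "greater")

one_sided$estimate  # sample correlation r
one_sided$p.value   # one-sided p-value
```

By default `cor.test` uses `alternative = "two.sided"`, so a directional hypothesis has to be requested explicitly.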

options(repos = c(CRAN = "https://cloud.r-project.org"))
# Install readxl only if it is not already available
if (!requireNamespace("readxl", quietly = TRUE)) install.packages("readxl")
library(readxl)
A5RQ1 <- read_excel("C:\\Users\\sweth\\Downloads\\A5RQ1.xlsx")
# Install psych only if it is not already available
if (!requireNamespace("psych", quietly = TRUE)) install.packages("psych")
library(psych)
describe(A5RQ1[, c("Minutes","Drinks")])
##         vars   n  mean    sd median trimmed   mad min   max range skew kurtosis
## Minutes    1 461 29.89 18.63   24.4   26.99 15.12  10 154.2 144.2 1.79     5.20
## Drinks     2 461  3.00  1.95    3.0    2.75  1.48   0  17.0  17.0 1.78     6.46
##           se
## Minutes 0.87
## Drinks  0.09
hist(A5RQ1$Minutes,
     main = "Histogram of Minutes",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)

hist(A5RQ1$Drinks,
     main = "Histogram of Drinks",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)

Variable 1 (Minutes): This variable is positively skewed (skew = 1.79); most customers stay for a short time, while a few stay for a very long time. The distribution is leptokurtic (kurtosis = 5.20): it has a sharp peak and a long right tail.

Variable 2 (Drinks): This variable is also positively skewed (skew = 1.78); most customers buy few drinks, while a few buy many. Like Minutes, it is leptokurtic (kurtosis = 6.46), with a high central peak and a stretched right tail.
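
To make the skewness and kurtosis figures less of a black box, here is a minimal sketch that computes moment-based skewness and excess kurtosis in base R. The sample `x` is simulated and purely hypothetical. The formulas are the plain moment definitions, so the results may differ slightly from what `psych::describe` reports, depending on the estimator type it uses.

```r
# Simulated, hypothetical right-skewed sample: NOT the café data set
set.seed(42)
x <- rexp(500, rate = 1 / 20)

m  <- mean(x)
m2 <- mean((x - m)^2)  # second central moment
m3 <- mean((x - m)^3)  # third central moment
m4 <- mean((x - m)^4)  # fourth central moment

skewness        <- m3 / m2^1.5    # > 0 means a longer right tail
excess_kurtosis <- m4 / m2^2 - 3  # > 0 means leptokurtic (heavier tails than normal)

skewness
excess_kurtosis
```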

shapiro.test(A5RQ1$Minutes)
## 
##  Shapiro-Wilk normality test
## 
## data:  A5RQ1$Minutes
## W = 0.84706, p-value < 2.2e-16
shapiro.test(A5RQ1$Drinks)
## 
##  Shapiro-Wilk normality test
## 
## data:  A5RQ1$Drinks
## W = 0.85487, p-value < 2.2e-16

Normality check for the data:
Shapiro-Wilk test for Variable 1 (Minutes): W = 0.84706, p-value < 2.2e-16
Shapiro-Wilk test for Variable 2 (Drinks): W = 0.85487, p-value < 2.2e-16
Both p-values are far below 0.05, so we reject the null hypothesis of normality for both variables. Neither Minutes nor Drinks is normally distributed. Since Pearson's test assumes approximately normal data, a Spearman correlation is also reported below.
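
The decision rule used above can be written out explicitly. The sketch below applies it to a simulated right-skewed sample; `skewed` and `alpha` are illustrative names, and the data are not the café data.

```r
# Simulated, hypothetical right-skewed sample: NOT the café data set
set.seed(7)
skewed <- rexp(461, rate = 1 / 20)

sw    <- shapiro.test(skewed)  # returns a list with $statistic (W) and $p.value
alpha <- 0.05

# Reject the null hypothesis of normality when p < alpha
reject_normality <- sw$p.value < alpha
reject_normality
```

With samples of several hundred observations, Shapiro-Wilk can flag even mild departures from normality, so it is worth reading the result alongside the histograms.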

# Install the plotting packages only if they are not already available
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")
if (!requireNamespace("ggpubr", quietly = TRUE)) install.packages("ggpubr")
library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
library(ggpubr)
ggscatter(A5RQ1, x = "Minutes", y = "Drinks",
          add = "reg.line",
          conf.int = TRUE,
          cor.coef = TRUE,
          cor.method = "pearson",
          xlab = "Variable Minutes", ylab = "Variable Drinks")

The scatter plot shows a regression line that slopes upward from left to right. This indicates a positive relationship between the variables: as Minutes increase, Drinks also tend to increase.

cor.test(A5RQ1$Minutes, A5RQ1$Drinks, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  A5RQ1$Minutes and A5RQ1$Drinks
## t = 68.326, df = 459, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9452363 0.9617123
## sample estimates:
##       cor 
## 0.9541922
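
The test statistic and confidence interval in this output can be reproduced from the reported `r` and `n` alone. The sketch below uses the standard t transformation and the Fisher z interval; the variable names are illustrative.

```r
# Values taken from the output above
r  <- 0.9541922
n  <- 461
df <- n - 2

# t statistic for H0: rho = 0
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# 95% confidence interval via the Fisher z transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

# Two-sided p-value from the t distribution
p_two_sided <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)

t_stat  # compare with t = 68.326 in the output
ci      # compare with the reported 95 percent confidence interval
```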

EFFECT SIZE FOR PEARSON & SPEARMAN CORRELATION

The relationship between Minutes and Drinks was assessed using Pearson’s correlation, which gave a correlation coefficient of r = 0.954. This is a very strong positive association: customers who stay more minutes tend to buy more drinks. Because the coefficient is close to 1.00, changes in one variable are closely matched by changes in the other. The 95% confidence interval for the correlation, 0.945 to 0.962, is narrow, which indicates that the estimate is precise. Overall, the effect is strong in magnitude and positive in direction.
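
One way to put a label on the effect size is to square the coefficient and compare it with Cohen's (1988) conventional benchmarks for correlations (0.10 small, 0.30 medium, 0.50 large). This is a small sketch under that convention; the cut-offs are rules of thumb rather than fixed thresholds, and the "negligible" label for values below 0.10 is an added assumption.

```r
r <- 0.954  # Pearson coefficient reported above

# Proportion of variance shared by the two variables
r_squared <- r^2

# Cohen (1988) benchmarks for |r|: 0.10 small, 0.30 medium, 0.50 large
effect_label <- cut(abs(r),
                    breaks = c(0, 0.10, 0.30, 0.50, 1),
                    labels = c("negligible", "small", "medium", "large"),
                    right = FALSE, include.lowest = TRUE)

r_squared
as.character(effect_label)
```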

WRITTEN REPORT FOR PEARSON CORRELATION

A Pearson correlation was conducted to investigate the relationship between Minutes spent in the café and Drinks purchased (n = 461). Minutes (M = 29.89, SD = 18.63) and Drinks (M = 3.00, SD = 1.95) were strongly and positively correlated, and the correlation was statistically significant, r(459) = 0.954, p < .001. Customers who stayed longer tended to purchase more drinks. The correlation describes the direction and strength of the association; on its own it does not show that longer stays cause more purchases.

WRITTEN REPORT FOR SPEARMAN CORRELATION

A Spearman correlation was conducted to examine the relationship between Minutes spent in the café and Drinks purchased (n = 461). The analysis showed a strong, statistically significant positive correlation between Minutes (M = 29.89, SD = 18.63) and Drinks (M = 3.00, SD = 1.95), ρ(459) = 0.954, p < .001. Customers who spent more time in the café tended to purchase more drinks, which indicates a strong and consistent monotonic relationship.
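
A Spearman coefficient comes from a separate call to `cor.test` with `method = "spearman"`. The sketch below illustrates this on simulated data; `minutes` and `drinks` are hypothetical vectors, not the café data. On the café data the analogous call would be `cor.test(A5RQ1$Minutes, A5RQ1$Drinks, method = "spearman", exact = FALSE)`.

```r
# Simulated, hypothetical data: NOT the café data set
set.seed(2024)
minutes <- 10 + rexp(300, rate = 1 / 20)
drinks  <- rpois(300, lambda = minutes / 10)

# Spearman's rho is Pearson's r computed on the ranks of the data
rho_by_ranks <- cor(rank(minutes), rank(drinks))

# exact = FALSE avoids the warning about exact p-values when ties are present
spearman <- cor.test(minutes, drinks, method = "spearman", exact = FALSE)

spearman$estimate  # rho, equal to rho_by_ranks
spearman$p.value
```

Because it works on ranks, Spearman's rho is less sensitive to the skew and extreme values seen in the histograms than Pearson's r.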