#Research Scenario 1
#A café owner thinks that if she can get customers to stay in her café longer, they will make more purchases. She plans to make the café more comfortable (adding couches, more electrical outlets for laptops, etc.) so that customers stay longer. Before she makes this investment, the owner wants to check whether her belief is true. She buys AI software that collects information from her cash register and cameras to determine how long each customer stayed in the café and how many drinks they bought. Analyze the data to determine whether there is a relationship between time spent in the shop (minutes) and number of drinks purchased. Use the appropriate test to see whether longer visits are associated with higher spending.
options(repos = c(CRAN = "https://cloud.r-project.org"))
install.packages("readxl")
library(readxl)
dataset <- read_excel("C:\\Users\\tsury\\Downloads\\A5RQ1.xlsx")
# install.packages("psych")
library(psych)
describe(dataset[, c("Minutes", "Drinks")])
##         vars   n  mean    sd median trimmed   mad min   max range skew kurtosis   se
## Minutes    1 461 29.89 18.63   24.4   26.99 15.12  10 154.2 144.2 1.79     5.20 0.87
## Drinks     2 461  3.00  1.95    3.0    2.75  1.48   0  17.0  17.0 1.78     6.46 0.09
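# Histogram of visit length, used to judge skewness and kurtosis by eye
hist(dataset$Minutes,
     main = "Histogram of Minutes",
     xlab = "Minutes spent in the shop",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)
# Histogram of drinks purchased per customer
hist(dataset$Drinks,
     main = "Histogram of Drinks",
     xlab = "Drinks purchased",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)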
#QUESTION
Q1) Check the SKEWNESS of the VARIABLE 1 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
A) The histogram for minutes spent looks positively skewed: most visits are short, with a long tail to the right (skew = 1.79).
Q2) Check the KURTOSIS of the VARIABLE 1 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
A) The histogram for minutes spent does not have a proper bell curve; it looks too tall (kurtosis = 5.20).
Q3) Check the SKEWNESS of the VARIABLE 2 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
A) The histogram for drinks bought looks positively skewed (skew = 1.78).
Q4) Check the KURTOSIS of the VARIABLE 2 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
A) The histogram for drinks bought does not have a proper bell curve; it looks too tall (kurtosis = 6.46).
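The skew and kurtosis values that back up these visual judgments can be reproduced by hand. The following is a minimal sketch using base R only; the helper names `sample_skewness` and `sample_excess_kurtosis` and the toy vectors are hypothetical, not part of the assignment. `psych::describe()` applies a small-sample adjustment, so its values can differ slightly from these plain moment-based ones.

```r
# Moment-based shape statistics (hypothetical helpers, base R only).
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5       # 0 = symmetric, > 0 = positively skewed
}

sample_excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3     # 0 = normal curve, > 0 = too tall, < 0 = too flat
}

# Made-up illustration values (NOT the cafe data)
symmetric    <- c(1, 2, 3, 4, 5)
right_skewed <- c(1, 1, 1, 2, 10)

sample_skewness(symmetric)            # symmetric data: skew of zero
sample_skewness(right_skewed)         # long right tail: positive skew
sample_excess_kurtosis(symmetric)     # flat, evenly spread data: negative value
```

On the real data, the same functions could be called as `sample_skewness(dataset$Minutes)` once the spreadsheet has been loaded.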
#Use a statistical test to check the normality of the continuous variables. The Shapiro-Wilk Test compares the variable with a normal distribution, so it responds to skewness and kurtosis at the same time. The test is checking "Is this variable the SAME as normal data (null hypothesis) or DIFFERENT from normal data (alternate hypothesis)?" For this test, if p is GREATER than .05 (p > .05), treat the data as NORMAL. If p is LESS than .05 (p < .05), the data is NOT normal.
shapiro.test(dataset$Minutes)
##
## Shapiro-Wilk normality test
##
## data: dataset$Minutes
## W = 0.84706, p-value < 2.2e-16
shapiro.test(dataset$Drinks)
##
## Shapiro-Wilk normality test
##
## data: dataset$Drinks
## W = 0.85487, p-value < 2.2e-16
QUESTION
Was the data normally distributed for Minutes? NO (W = 0.85, p < .001)
Was the data normally distributed for Drinks? NO (W = 0.85, p < .001)
Since both variables are NOT normal, change to the Spearman Correlation test.
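The decision rule used here can be written as a small function. This is a sketch under an assumption: it treats the data as suitable for Pearson only when both variables pass the Shapiro-Wilk test and falls back to Spearman otherwise. The helper name `choose_cor_method` and the illustration vectors are hypothetical.

```r
# Hypothetical helper: pick a correlation method from Shapiro-Wilk p-values.
# Assumption: Pearson only if BOTH variables look normal, otherwise Spearman.
choose_cor_method <- function(x, y, alpha = 0.05) {
  p_x <- shapiro.test(x)$p.value
  p_y <- shapiro.test(y)$p.value
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}

# Deterministic illustration data (NOT the cafe data)
normal_like <- qnorm(ppoints(100))  # evenly spaced normal quantiles
skewed      <- qexp(ppoints(100))   # evenly spaced exponential quantiles

choose_cor_method(normal_like, normal_like)  # both pass the normality check
choose_cor_method(normal_like, skewed)       # one variable fails the check
```

For this scenario the call would be `choose_cor_method(dataset$Minutes, dataset$Drinks)`, which should agree with the conclusion above because both Shapiro-Wilk p-values are far below .05.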
#install.packages("ggplot2")
#install.packages("ggpubr")
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
library(ggpubr)
# Both variables are non-normal, so annotate the plot with Spearman's rho
ggscatter(dataset, x = "Minutes", y = "Drinks",
          add = "reg.line",
          conf.int = TRUE,
          cor.coef = TRUE,
          cor.method = "spearman",
          xlab = "Variable Minutes", ylab = "Variable Drinks")
The relationship appears to be positive: the fitted line slopes upward, so longer visits go with more drinks purchased.
SPEARMAN CORRELATION TEST
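# Both variables failed the normality check, so request Spearman's rho.
# exact = FALSE uses the large-sample p-value, because tied values rule out an exact one.
cor.test(dataset$Minutes, dataset$Drinks, method = "spearman", exact = FALSE)
# The output recorded below came from the original call with method = "pearson";
# re-run the line above to replace it with Spearman's rho and its p-value.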
##
## Pearson's product-moment correlation
##
## data: dataset$Minutes and dataset$Drinks
## t = 68.326, df = 459, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9452363 0.9617123
## sample estimates:
## cor
## 0.9541922
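To see why Spearman suits skewed data, it helps to know that Spearman's rho is Pearson's r computed on the ranks of the values, so extreme visits or orders count only by their position. The sketch below checks this on made-up values; the vectors `minutes` and `drinks` are hypothetical illustration data, not the café dataset.

```r
# Made-up illustration values (NOT the cafe data)
minutes <- c(12, 18, 25, 31, 47, 60, 95, 150)
drinks  <- c(1, 2, 2, 3, 4, 4, 6, 12)

# Built-in Spearman correlation
rho_builtin <- cor(minutes, drinks, method = "spearman")

# Same quantity by hand: rank both variables (ties get average ranks),
# then take the ordinary Pearson correlation of the ranks
rho_by_ranks <- cor(rank(minutes), rank(drinks), method = "pearson")

all.equal(rho_builtin, rho_by_ranks)

# Pearson's r on the raw values, for comparison
cor(minutes, drinks, method = "pearson")
```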
DETERMINE STATISTICAL SIGNIFICANCE
If results were statistically significant (p < .05), continue to effect size section below. If results were NOT statistically significant (p > .05), skip to reporting section below. NOTE: Getting results that are not statistically significant does NOT mean you switch to Spearman Correlation. The Spearman Correlation is only for data that is not normally distributed; the choice is not based on outcome significance.
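For the effect size step, the strength of the relationship is read from the absolute value of the correlation coefficient. The sketch below labels a coefficient using Cohen's commonly cited cut-offs (.10 small, .30 medium, .50 large). Those cut-offs are an assumption here, and the helper name `effect_size_label` is hypothetical; substitute the thresholds your course materials specify.

```r
# Hypothetical helper: label the strength of a correlation coefficient.
# Assumption: Cohen's conventional cut-offs (.10 / .30 / .50).
effect_size_label <- function(r) {
  labels <- cut(abs(r),
                breaks = c(0, 0.10, 0.30, 0.50, 1),
                labels = c("negligible", "small", "medium", "large"),
                right = FALSE,          # intervals are [lower, upper)
                include.lowest = TRUE)  # but the top interval includes 1
  as.character(labels)
}

# The sign is ignored; only the magnitude determines the label
effect_size_label(c(0.05, 0.25, -0.40, 0.95))
```

Under these cut-offs, the coefficient of 0.95 shown in the recorded output would count as a large effect.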
Final Report
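Both variables were non-normally distributed (Shapiro-Wilk, p < .001), so a Spearman correlation is the appropriate test of the relationship between the minutes customers spend in the café and drinks bought (n = 461). The recorded output reports Pearson's r, which shows a statistically significant, large positive correlation, r(459) = .95, p < .001, 95% CI [.95, .96], between minutes spent (M = 29.89, SD = 18.63) and drinks bought (M = 3.00, SD = 1.95). Customers who stayed longer bought more drinks.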