Research Scenario 1

A café owner thinks that if she can get customers to stay in her café longer, they will make more purchases. She plans to make the café more comfortable (add couches, more electrical outlets for laptops, etc.) so customers stay longer. Before she makes this investment, the owner wants to check whether her belief is true. She buys AI software that collects information from her cash register and cameras to determine how long each customer stayed in the café and how many drinks they bought. Analyze the data to determine whether there is a relationship between time spent in the shop (minutes) and number of drinks purchased. Use the appropriate test to see if longer visits are associated with higher spending.

PURPOSE

Used to test the relationship between two continuous variables.

NULL HYPOTHESIS

There is no relationship between time spent (Minutes) and drinks bought (Drinks).

ALTERNATE HYPOTHESIS

There is a relationship between time spent (Minutes) and drinks bought (Drinks).

QUESTION

What are the null and alternate hypotheses for your research?

H0: There is no correlation between time spent in the cafe (Minutes) and number of drinks purchased (Drinks).

H1: There is a correlation between time spent in the cafe (Minutes) and number of drinks purchased (Drinks). The owner expects this correlation to be positive.

======================

IMPORT EXCEL FILE CODE

======================

options(repos = c(CRAN = "https://cloud.r-project.org"))
install.packages("readxl")
## Installing package into 'C:/Users/tsury/AppData/Local/R/win-library/4.5'
## (as 'lib' is unspecified)
## package 'readxl' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\tsury\AppData\Local\Temp\RtmpYl98GT\downloaded_packages
library(readxl)
dataset <- read_excel("C:\\Users\\tsury\\Downloads\\A5RQ1.xlsx")
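
Before running any statistics, it can help to confirm that the imported data actually contain the two columns used later and that both are numeric. The following is a minimal sketch, not part of the original assignment: `check_columns` is a hypothetical helper, and `toy` is a made-up stand-in for the real spreadsheet.

```r
# Hypothetical helper: confirm the required columns exist, are numeric,
# and count missing values in each.
check_columns <- function(df, required = c("Minutes", "Drinks")) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  data.frame(
    column     = required,
    is_numeric = vapply(required, function(col) is.numeric(df[[col]]), logical(1)),
    n_missing  = vapply(required, function(col) sum(is.na(df[[col]])), integer(1)),
    row.names  = NULL
  )
}

# Made-up stand-in data (NOT the real A5RQ1.xlsx contents).
toy <- data.frame(Minutes = c(12.5, 30, NA), Drinks = c(1, 3, 2))
report <- check_columns(toy)
print(report)
```

With the real data, the same check would be `check_columns(dataset)`.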

======================

DESCRIPTIVE STATISTICS

======================

# install.packages("psych")
library(psych)
describe(dataset[, c("Minutes", "Drinks")])
##         vars   n  mean    sd median trimmed   mad min   max range skew kurtosis
## Minutes    1 461 29.89 18.63   24.4   26.99 15.12  10 154.2 144.2 1.79     5.20
## Drinks     2 461  3.00  1.95    3.0    2.75  1.48   0  17.0  17.0 1.78     6.46
##           se
## Minutes 0.87
## Drinks  0.09

===============================================

CHECK THE NORMALITY OF THE CONTINUOUS VARIABLES

===============================================

hist(dataset$Minutes,
     main = "Histogram of Minutes",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)

hist(dataset$Drinks,
     main = "Histogram of Drinks",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)

QUESTION

Q1) Check the SKEWNESS of the VARIABLE 1 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?

A) The histogram for Minutes looks positively skewed.

Q2) Check the KURTOSIS of the VARIABLE 1 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?

A) The histogram for Minutes does not have a proper bell curve. It looks too tall, which matches the kurtosis of 5.20 in the descriptive statistics above.

Q3) Check the SKEWNESS of the VARIABLE 2 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?

A) The histogram for Drinks looks positively skewed.

Q4) Check the KURTOSIS of the VARIABLE 2 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?

A) The histogram for Drinks does not have a proper bell curve. It looks too tall, which matches the kurtosis of 6.46 in the descriptive statistics above.
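
The visual judgments above can be cross-checked numerically. The sketch below computes skewness and excess kurtosis from their plain moment definitions in base R. The function names and example vectors are hypothetical, and the values may differ slightly from `psych::describe`, which applies a small-sample adjustment by default.

```r
# Moment-based skewness: 0 for symmetric data, > 0 for a long right tail.
moment_skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Excess kurtosis: 0 for a normal curve, > 0 for "too tall", < 0 for "too flat".
excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

# Made-up example vectors (not the cafe data).
symmetric    <- c(1, 2, 3, 4, 5)
right_tailed <- c(1, 1, 1, 2, 10)

moment_skewness(symmetric)
moment_skewness(right_tailed)
excess_kurtosis(symmetric)
```

On the real data the calls would be `moment_skewness(dataset$Minutes)` and `excess_kurtosis(dataset$Drinks)`.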

PURPOSE

Use a statistical test to check the normality of the continuous variables. The Shapiro-Wilk test is sensitive to both skewness and kurtosis. It asks: "Is this variable the SAME as normal data (null hypothesis) or DIFFERENT from normal data (alternate hypothesis)?" For this test, if p is GREATER than .05 (p > .05), the data are treated as NORMAL. If p is LESS than .05 (p < .05), the data are NOT normal.

shapiro.test(dataset$Minutes)
## 
##  Shapiro-Wilk normality test
## 
## data:  dataset$Minutes
## W = 0.84706, p-value < 2.2e-16
shapiro.test(dataset$Drinks)
## 
##  Shapiro-Wilk normality test
## 
## data:  dataset$Drinks
## W = 0.85487, p-value < 2.2e-16

QUESTION

Was the data normally distributed for Minutes? NO (p < .05)

Was the data normally distributed for Drinks? NO (p < .05)

Since both variables are NOT normal, use the Spearman Correlation test instead of the Pearson Correlation test.
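
The decision rule described above (p > .05 means normal, otherwise switch to Spearman) can be written down as a small helper. This is a sketch under that rule only. `check_normality` and `choose_method` are hypothetical names, and the example vectors are simulated, deliberately right-skewed data rather than the cafe data.

```r
# Run Shapiro-Wilk on one variable and apply the p > alpha rule.
check_normality <- function(x, alpha = 0.05) {
  x <- x[!is.na(x)]
  result <- shapiro.test(x)  # needs between 3 and 5000 non-missing values
  list(
    W       = unname(result$statistic),
    p_value = result$p.value,
    normal  = result$p.value > alpha
  )
}

# Pearson only if BOTH variables pass the normality check; otherwise Spearman.
choose_method <- function(x, y, alpha = 0.05) {
  both_normal <- check_normality(x, alpha)$normal && check_normality(y, alpha)$normal
  if (both_normal) "pearson" else "spearman"
}

# Simulated, strongly right-skewed data (assumption: for illustration only).
set.seed(42)
skewed_a <- rexp(200)
skewed_b <- rexp(200)
method <- choose_method(skewed_a, skewed_b)
print(method)
```

With the real data, `choose_method(dataset$Minutes, dataset$Drinks)` applies the same rule.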

=========================

VISUALLY DISPLAY THE DATA

=========================

PURPOSE

A scatterplot visually shows the relationship between two continuous variables.

# install.packages("ggplot2")
# install.packages("ggpubr")

LOAD THE PACKAGE

Always reload the package you want to use.

library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
library(ggpubr)

CREATE THE SCATTERPLOT

ggscatter(dataset, x = "Minutes", y = "Drinks",
          add = "reg.line",
          conf.int = TRUE,
          cor.coef = TRUE,
          cor.method = "spearman",
          xlab = "Variable Minutes", ylab = "Variable Drinks")

QUESTION

Is the relationship positive (line pointing up), negative (line pointing down), or is there no relationship (line is flat)?

The relationship seems to be positive as the line is pointing upwards.

SPEARMAN CORRELATION TEST

PURPOSE

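Check whether two continuous variables that are not normally distributed are related, that is, whether one tends to increase or decrease as the other increases.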

cor.test(dataset$Minutes, dataset$Drinks, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  dataset$Minutes and dataset$Drinks
## t = 68.326, df = 459, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9452363 0.9617123
## sample estimates:
##       cor 
## 0.9541922
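
The call above uses method = "pearson", so the printed output is Pearson's r rather than Spearman's rho. Because both variables failed the normality check, a Spearman version of the test would look like the sketch below. `run_spearman` is a hypothetical wrapper, `toy` is made-up stand-in data, and `exact = FALSE` is an assumption that avoids the warning R gives when tied values prevent an exact p-value.

```r
# Hypothetical wrapper around base R's cor.test() for a Spearman correlation.
run_spearman <- function(df, x = "Minutes", y = "Drinks") {
  stopifnot(all(c(x, y) %in% names(df)))
  cor.test(df[[x]], df[[y]], method = "spearman", exact = FALSE)
}

# Made-up stand-in data (NOT the cafe data); note the tied Drinks values.
toy <- data.frame(
  Minutes = c(12, 18, 25, 31, 40, 55, 62, 90),
  Drinks  = c(1, 1, 2, 3, 3, 4, 6, 8)
)

result <- run_spearman(toy)
result$estimate  # Spearman's rho
result$p.value
```

On the real data, `run_spearman(dataset)` would give the rho and p-value to report.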

DETERMINE STATISTICAL SIGNIFICANCE

If results were statistically significant (p < .05), continue to the effect size section below. If results were NOT statistically significant (p > .05), skip to the reporting section below. NOTE: Getting results that are not statistically significant does NOT mean you switch to the Spearman Correlation. The Spearman Correlation is chosen because the data are not normally distributed, not because of the significance of the outcome.
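
For the effect size step, one common approach is to label the absolute value of the correlation coefficient with Cohen-style cutoffs (.10 small, .30 medium, .50 large). The sketch below assumes those cutoffs; they are a convention, not something stated in this write-up, and `interpret_correlation` is a hypothetical helper. The example value is the coefficient printed in the correlation output above.

```r
# Label a correlation coefficient by strength and direction.
# Cutoffs follow Cohen's conventions (an assumed convention here).
interpret_correlation <- function(coef) {
  stopifnot(is.numeric(coef), length(coef) == 1, abs(coef) <= 1)
  size <- abs(coef)
  strength <- if (size >= 0.50) {
    "large"
  } else if (size >= 0.30) {
    "medium"
  } else if (size >= 0.10) {
    "small"
  } else {
    "negligible"
  }
  direction <- if (coef > 0) "positive" else if (coef < 0) "negative" else "none"
  c(strength = strength, direction = direction)
}

effect <- interpret_correlation(0.9541922)
print(effect)
```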

Final Report

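A correlation analysis was conducted to assess the relationship between the minutes customers spend in the cafe and the number of drinks bought (n = 461). Both variables were not normally distributed (Shapiro-Wilk, p < .001), so a Spearman correlation is the appropriate test. The correlation output above, which was produced with method = "pearson", shows a statistically significant, strong positive correlation, r(459) = .95, 95% CI [.95, .96], p < .001, between minutes spent (M = 29.89, SD = 18.63) and drinks bought (M = 3.00, SD = 1.95). Longer visits are associated with more drinks purchased. The Spearman coefficient (rho) should be taken from a rerun with method = "spearman" before the result is reported as a Spearman correlation.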