Research Scenario 1

A café owner believes that if customers stay in her café longer, they will make more purchases. She plans to make the café more comfortable (adding couches, more electrical outlets for laptops, etc.) so that customers stay longer. Before making this investment, she wants to check whether her belief is true. She buys AI software that collects information from her cash register and cameras to determine how long each customer stayed in the café and how many drinks they bought. Analyze the data to determine whether there is a relationship between time spent in the shop (minutes) and number of drinks purchased. Use the appropriate test to see whether longer visits are associated with more drinks purchased.

PURPOSE

Used to test the relationship between two continuous variables.

NULL HYPOTHESIS

There is no relationship between time spent and number of drinks purchased.

ALTERNATE HYPOTHESIS

There is a relationship between time spent and number of drinks purchased.

INSTALL REQUIRED PACKAGE

#install.packages("readxl")

LOAD THE PACKAGE

library(readxl)

IMPORT THE EXCEL FILE INTO R STUDIO

A5RQ1 <- read_excel("C:\\Users\\manit\\OneDrive\\Desktop\\A5RQ1.xlsx")
head(A5RQ1)

## # A tibble: 6 × 3
##   Customer Minutes Drinks
##      <dbl>   <dbl>  <dbl>
## 1        1    26.9      3
## 2        2    21.5      2
## 3        3    36.6      3
## 4        4    10.6      1
## 5        5    11.1      1
## 6        6    16.3      1

DESCRIPTIVE STATISTICS

Calculate the mean, median, SD, and sample size for each variable.

INSTALL THE REQUIRED PACKAGE

#install.packages("psych")

LOAD THE PACKAGE

library(psych)

CALCULATE THE DESCRIPTIVE DATA

describe(A5RQ1[, c("Minutes", "Drinks")])

##         vars   n  mean    sd median trimmed   mad min   max range skew kurtosis
## Minutes    1 461 29.89 18.63   24.4   26.99 15.12  10 154.2 144.2 1.79     5.20
## Drinks     2 461  3.00  1.95    3.0    2.75  1.48   0  17.0  17.0 1.78     6.46
##           se
## Minutes 0.87
## Drinks  0.09
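The four statistics requested above can also be computed with base R alone. The sketch below is only an illustration: it uses the six rows printed by head(A5RQ1) rather than the full dataset, and summarise_var is a hypothetical helper name, not part of the assignment.

```r
# Illustrative data: the six rows printed by head(A5RQ1) above,
# not the full 461-row dataset.
sample_rows <- data.frame(
  Minutes = c(26.9, 21.5, 36.6, 10.6, 11.1, 16.3),
  Drinks  = c(3, 2, 3, 1, 1, 1)
)

# Hypothetical helper: sample size, mean, median, and SD for one numeric vector.
summarise_var <- function(x) {
  x <- x[!is.na(x)]  # drop missing values before summarising
  c(n = length(x), mean = mean(x), median = median(x), sd = sd(x))
}

# Apply the helper to every column; each column of the result is one variable.
desc <- sapply(sample_rows, summarise_var)
print(round(desc, 2))
```

Because only six rows are used, these numbers will not match the describe() output above, which is based on all 461 customers.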

CHECK THE NORMALITY OF THE CONTINUOUS VARIABLES

Two methods will be used to check the normality of the continuous variables. First, you will create histograms to visually inspect the normality of the variables. Next, you will conduct the Shapiro-Wilk test to formally test the normality of the variables. It is important to know whether the data are normally distributed, because this determines which inferential test should be used.

CREATE A HISTOGRAM FOR EACH CONTINUOUS VARIABLE

A histogram is used to visually check if the data is normally distributed.

hist(A5RQ1$Minutes,
     main = "Histogram of Minutes",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)

hist(A5RQ1$Drinks,
     main = "Histogram of Drinks",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)

QUESTION

Answer the questions below as comments within the R script:

Q1) Check the SKEWNESS of the VARIABLE 1 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
Ans) The histogram for Minutes is positively skewed (right-skewed).

Q2) Check the KURTOSIS of the VARIABLE 1 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
Ans) The distribution looks too tall and peaked, not like a normal bell curve. It has a leptokurtic shape.

Q3) Check the SKEWNESS of the VARIABLE 2 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
Ans) The histogram for Drinks is also positively skewed (right-skewed).

Q4) Check the KURTOSIS of the VARIABLE 2 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
Ans) The distribution is tall and peaked, not a normal bell curve.

PURPOSE

Use a statistical test to check the normality of the continuous variables. The Shapiro-Wilk test checks for overall departure from normality, so it is sensitive to both skewness and kurtosis at the same time. The test asks: “Is this variable the SAME as normal data (null hypothesis) or DIFFERENT from normal data (alternate hypothesis)?” For this test, if p is GREATER than .05 (p > .05), there is no evidence against normality and the data are treated as NORMAL. If p is LESS than .05 (p < .05), the data are NOT normal.

shapiro.test(A5RQ1$Minutes)

## 
##  Shapiro-Wilk normality test
## 
## data:  A5RQ1$Minutes
## W = 0.84706, p-value < 2.2e-16

shapiro.test(A5RQ1$Drinks)

## 
##  Shapiro-Wilk normality test
## 
## data:  A5RQ1$Drinks
## W = 0.85487, p-value < 2.2e-16
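The decision rule described above (p > .05 is treated as normal, otherwise not) can be written as a small helper. This is a minimal sketch, not part of the assignment: looks_normal and choose_method are hypothetical names, the skewed vector is made-up illustration data, and the choice of Pearson as the alternative for normal data is an assumption.

```r
# Hypothetical helper: TRUE when Shapiro-Wilk does not reject normality at alpha.
looks_normal <- function(x, alpha = 0.05) {
  result <- shapiro.test(x)
  result$p.value > alpha
}

# Assumption: Pearson is used when both variables look normal, Spearman otherwise.
choose_method <- function(x, y, alpha = 0.05) {
  if (looks_normal(x, alpha) && looks_normal(y, alpha)) "pearson" else "spearman"
}

# Deterministic, strongly right-skewed illustration data (not from A5RQ1).
skewed <- exp(seq(0, 5, length.out = 100))

print(looks_normal(skewed))
print(choose_method(skewed, skewed))
```

With the real data, the call would be choose_method(A5RQ1$Minutes, A5RQ1$Drinks). Both Shapiro-Wilk p-values above are far below .05, so the rule points to Spearman.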

QUESTION

Was the data normally distributed for Variable 1?
Ans) No

Was the data normally distributed for Variable 2?
Ans) No

VISUALLY DISPLAY THE DATA

PURPOSE

A scatterplot visually shows the relationship between two continuous variables.

#install.packages("ggplot2")
#install.packages("ggpubr")

library(ggplot2)

## 
## Attaching package: 'ggplot2'

## The following objects are masked from 'package:psych':
## 
##     %+%, alpha

library(ggpubr)

CREATE THE SCATTERPLOT

ggscatter(A5RQ1, x = "Minutes", y = "Drinks",
          add = "reg.line",
          conf.int = TRUE,
          cor.coef = TRUE,
          cor.method = "spearman",
          xlab = "Variable Minutes", ylab = "Variable Drinks")

QUESTION

Answer the questions below as a comment within the R script: Is the relationship positive (line pointing up), negative (line pointing down), or is there no relationship (line is flat)?
Ans) The relationship is positive. The line is clearly pointing upward.

SPEARMAN CORRELATION TEST

PURPOSE

Check whether there is a monotonic relationship between two continuous variables when the data are not normally distributed.

cor.test(A5RQ1$Minutes, A5RQ1$Drinks, method = "spearman")

## Warning in cor.test.default(A5RQ1$Minutes, A5RQ1$Drinks, method = "spearman"):
## Cannot compute exact p-value with ties

## 
##  Spearman's rank correlation rho
## 
## data:  A5RQ1$Minutes and A5RQ1$Drinks
## S = 1305608, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9200417
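To make the statistic less of a black box: Spearman's rho is the Pearson correlation computed on the ranks of the two variables, with tied values given their average rank. The sketch below checks this on the six rows shown by head(A5RQ1), so its numbers are illustrative and will not match the full-sample result above. It also passes exact = FALSE to cor.test, which requests the approximate p-value directly and should avoid the ties warning.

```r
# Illustrative data: the six rows printed by head(A5RQ1), not the full dataset.
minutes <- c(26.9, 21.5, 36.6, 10.6, 11.1, 16.3)
drinks  <- c(3, 2, 3, 1, 1, 1)

# Spearman's rho is the Pearson correlation of the ranks
# (rank() assigns tied values their average rank by default).
rho_from_ranks <- cor(rank(minutes), rank(drinks))
rho_builtin    <- cor(minutes, drinks, method = "spearman")

# exact = FALSE asks for the approximate p-value instead of the exact one,
# which cannot be computed when there are ties.
res <- cor.test(minutes, drinks, method = "spearman", exact = FALSE)

print(c(rho_from_ranks = rho_from_ranks, rho_builtin = rho_builtin))
print(res$p.value)
```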

DETERMINE STATISTICAL SIGNIFICANCE

If results were statistically significant (p < .05), continue to the effect size section below. If results were NOT statistically significant (p > .05), skip to the reporting section below. NOTE: Getting results that are not statistically significant does NOT mean you switch to Spearman Correlation. The Spearman Correlation is chosen because the data are not normally distributed, not because of whether the outcome is significant.

EFFECT SIZE FOR SPEARMAN CORRELATION

Q1) What is the direction of the effect?
Ans) The effect is positive: as Minutes increases, the number of Drinks also increases.

Q2) What is the size of the effect?
Ans) The effect size is strong (ρ = 0.92).
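The direction and size of the effect can be read off rho with a small helper. This is a sketch under stated assumptions: the cut-offs of .10, .30, and .50 follow a common Cohen-style convention and are not given in this report, so substitute whichever thresholds your course uses. The names effect_direction and effect_label are hypothetical.

```r
# Hypothetical helper: the sign of rho gives the direction of the relationship.
effect_direction <- function(rho) {
  if (rho > 0) "positive" else if (rho < 0) "negative" else "none"
}

# Hypothetical helper with ASSUMED cut-offs (.10 / .30 / .50);
# replace them with the thresholds your course or field uses.
effect_label <- function(rho) {
  size <- abs(rho)
  if (size >= 0.5) {
    "strong"
  } else if (size >= 0.3) {
    "moderate"
  } else if (size >= 0.1) {
    "weak"
  } else {
    "negligible"
  }
}

rho <- 0.9200417  # value reported by cor.test above
print(effect_direction(rho))
print(effect_label(rho))
```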

Final Report

A Spearman correlation was conducted to assess the relationship between the minutes customers spend in a café and the number of drinks they buy (n = 461). The results showed a statistically significant positive correlation (p < .001) between minutes spent (M = 29.89, SD = 18.63) and drinks bought (M = 3.00, SD = 1.95), with a strong association (ρ = 0.92).

Team 6 week 5 Scene 1

2025-11-15
