===================================================

PEARSON CORRELATION & SPEARMAN CORRELATION OVERVIEW

===================================================

PURPOSE

Used to test the relationship between time spent (minutes) in the shop and number of drinks purchased.

==========

HYPOTHESES

==========

H0: There is no relationship between time spent (minutes) in the shop and number of drinks purchased.

H1: There is a relationship between time spent (minutes) in the shop and number of drinks purchased.

………………………………………………………..

======================

IMPORTING EXCEL FILE CODE

======================

PURPOSE OF THIS CODE

Imports our Excel dataset automatically into RStudio.

options(repos = c(CRAN = "https://cloud.r-project.org"))
install.packages("readxl")
## Installing package into 'C:/Users/mnava/AppData/Local/R/win-library/4.5'
## (as 'lib' is unspecified)
## package 'readxl' successfully unpacked and MD5 sums checked
## Warning: cannot remove prior installation of package 'readxl'
## Warning in file.copy(savedcopy, lib, recursive = TRUE): problem copying
## C:\Users\mnava\AppData\Local\R\win-library\4.5\00LOCK\readxl\libs\x64\readxl.dll
## to C:\Users\mnava\AppData\Local\R\win-library\4.5\readxl\libs\x64\readxl.dll:
## Permission denied
## Warning: restored 'readxl'
## 
## The downloaded binary packages are in
##  C:\Users\mnava\AppData\Local\Temp\RtmpUNoeNn\downloaded_packages
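The log above shows warnings that the prior installation of readxl could not be removed. One common cause (an assumption here, not confirmed by the log) is that the package is already installed and in use when install.packages() runs again. A minimal sketch of a guard that installs a package only when it is missing follows; the helper name install_if_missing is hypothetical and not part of the original script.

```r
# Hypothetical helper (not in the original script): install a package only
# when it is not already available, so repeated runs skip the reinstall.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report (invisibly) whether the package is available afterwards.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call never triggers a download.
available <- install_if_missing("stats")
print(available)
```

In this report the same call could be made with "readxl", "psych", "ggplot2", and "ggpubr" before the matching library() lines.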

LOAD THE PACKAGE

We must always reload the package we want to use.

library(readxl)

#Our Excel file imported into RStudio

A5RQ1 <- read_excel("D:/Ms Analytics 2025/Fall 1/Applied Analytics &Methods 1/Week 5/A5RQ1.xlsx")

======================

DESCRIPTIVE STATISTICS

======================

Calculate the mean, median, SD, and sample size for each variable.

INSTALL THE REQUIRED PACKAGE

install.packages("psych")
## Installing package into 'C:/Users/mnava/AppData/Local/R/win-library/4.5'
## (as 'lib' is unspecified)
## package 'psych' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\mnava\AppData\Local\Temp\RtmpUNoeNn\downloaded_packages

LOAD THE PACKAGE

library(psych)

CALCULATE THE DESCRIPTIVE DATA

describe(A5RQ1[, c("Minutes", "Drinks")])
##         vars   n  mean    sd median trimmed   mad min   max range skew kurtosis
## Minutes    1 461 29.89 18.63   24.4   26.99 15.12  10 154.2 144.2 1.79     5.20
## Drinks     2 461  3.00  1.95    3.0    2.75  1.48   0  17.0  17.0 1.78     6.46
##           se
## Minutes 0.87
## Drinks  0.09
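For readers without the psych package, the four statistics named at the start of this section (mean, median, SD, and sample size) can also be computed in base R. The sketch below uses simulated stand-in data, not the A5RQ1 dataset, and the helper name summarise_variable is hypothetical.

```r
# Simulated stand-in data (assumption): NOT the A5RQ1 dataset.
set.seed(123)
example_data <- data.frame(
  Minutes = round(rexp(100, rate = 1 / 20) + 10, 1),
  Drinks  = rpois(100, lambda = 3)
)

# Hypothetical helper: n, mean, SD, and median for one numeric vector,
# ignoring missing values.
summarise_variable <- function(x) {
  x <- x[!is.na(x)]
  c(n = length(x), mean = mean(x), sd = sd(x), median = median(x))
}

# Apply to each column; the result is a matrix with one column per variable.
descriptives <- sapply(example_data, summarise_variable)
print(round(descriptives, 2))
```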

=========================

VISUALLY DISPLAY THE DATA

=========================

CREATE A SCATTERPLOT

PURPOSE

A scatterplot visually shows the relationship between time spent (minutes) in the shop and number of drinks purchased.

INSTALL THE REQUIRED PACKAGES

Run the install lines below once to install the packages.

After the packages are installed, place a hashtag (#) at the start of each install line so the packages are not reinstalled on every run.

install.packages("ggplot2")#
## Installing package into 'C:/Users/mnava/AppData/Local/R/win-library/4.5'
## (as 'lib' is unspecified)
## package 'ggplot2' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\mnava\AppData\Local\Temp\RtmpUNoeNn\downloaded_packages
install.packages("ggpubr")#
## Installing package into 'C:/Users/mnava/AppData/Local/R/win-library/4.5'
## (as 'lib' is unspecified)
## package 'ggpubr' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\mnava\AppData\Local\Temp\RtmpUNoeNn\downloaded_packages

LOAD THE PACKAGE

Always reload the package you want to use.

library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
library(ggpubr)

CREATE THE SCATTERPLOT

ggscatter(A5RQ1, x = "Minutes", y = "Drinks",
           add = "reg.line",
           conf.int = TRUE,
           cor.coef = TRUE,
           cor.method = "spearman",
           xlab = "Minutes", ylab = "Drinks")

……………………………………………….

The relationship is positive: as minutes increase, the number of drinks purchased also increases.

===============================================

CHECK THE NORMALITY OF THE CONTINUOUS VARIABLES

===============================================

OVERVIEW

Two methods will be used to check the normality of the continuous variables.

First, we will create histograms to visually inspect the normality of the variables.

Next, we will conduct a test called the Shapiro-Wilk test to inspect the normality of the variables.

It is important to know whether or not the data is normal to determine which inferential test should be used.

CREATING A HISTOGRAM FOR EACH CONTINUOUS VARIABLE

hist(A5RQ1$Minutes,
      main = "Histogram of Minutes",
      xlab = "Value",
      ylab = "Frequency",
      col = "lightblue",
      border = "black",
      breaks = 20)

 hist(A5RQ1$Drinks,
      main = "Histogram of Drinks",
      xlab = "Value",
      ylab = "Frequency",
      col = "lightgreen",
      border = "black",
      breaks = 20)

………………………………………………..

#Observations:

SKEWNESS: Neither Minutes nor Drinks is symmetrical; both are positively skewed.

KURTOSIS: In our opinion, the distributions of both Minutes and Drinks are too tall (too peaked).

………………………………………………..

PURPOSE

# We use a statistical test to check the normality of the continuous variables.

# The Shapiro-Wilk test checks for overall departure from normality, so it reflects skewness and kurtosis at the same time.

# The test asks: “Is this variable the SAME as normal data (null hypothesis) or DIFFERENT from normal data (alternate hypothesis)?”

# For this test, if p is GREATER than .05 (p > .05), the data is treated as NORMAL.

# If p is LESS than .05 (p < .05), the data is NOT normal.

CONDUCTING SHAPIRO-WILK TEST

shapiro.test(A5RQ1$Minutes)
## 
##  Shapiro-Wilk normality test
## 
## data:  A5RQ1$Minutes
## W = 0.84706, p-value < 2.2e-16
shapiro.test(A5RQ1$Drinks)
## 
##  Shapiro-Wilk normality test
## 
## data:  A5RQ1$Drinks
## W = 0.85487, p-value < 2.2e-16

…………………………………………………

#Observations:

The data were not normally distributed for either Minutes or Drinks (p < .05 for both variables).

…………………………………………………

If one or both variables are NOT normal, we use the Spearman correlation instead of the Pearson correlation.
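The decision rule described in this section can be sketched as a small function. This is an illustration under stated assumptions, not the script used for this report: the function name choose_cor_method and the alpha argument are hypothetical, and the example inputs are simulated rather than taken from A5RQ1.

```r
# Hypothetical helper: pick the correlation method from two Shapiro-Wilk tests.
# Returns "pearson" only when BOTH variables pass the normality check.
choose_cor_method <- function(x, y, alpha = 0.05) {
  p_x <- shapiro.test(x)$p.value
  p_y <- shapiro.test(y)$p.value
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}

# Simulated stand-in data (assumption): strongly right-skewed values.
set.seed(2025)
skewed_a <- rexp(200, rate = 1 / 20)
skewed_b <- rexp(200, rate = 1 / 3)
print(choose_cor_method(skewed_a, skewed_b))

# Idealised normal values (evenly spaced normal quantiles) for comparison.
normal_like <- qnorm(ppoints(100))
print(choose_cor_method(normal_like, normal_like))
```

The returned string can be passed straight to the method argument of cor.test().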

================================================

PEARSON CORRELATION OR SPEARMAN CORRELATION TEST

================================================

PURPOSE

Check whether there is a relationship between the two variables (Minutes and Drinks).

CONDUCT THE PEARSON CORRELATION OR SPEARMAN CORRELATION

 cor.test(A5RQ1$Minutes, A5RQ1$Drinks, method = "spearman")
## Warning in cor.test.default(A5RQ1$Minutes, A5RQ1$Drinks, method = "spearman"):
## Cannot compute exact p-value with ties
## 
##  Spearman's rank correlation rho
## 
## data:  A5RQ1$Minutes and A5RQ1$Drinks
## S = 1305608, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9200417
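Two details of the output above can be illustrated with a short sketch. First, Spearman's rho is the Pearson correlation computed on the ranks of the data. Second, the warning about ties appears because an exact p-value cannot be computed when values repeat; passing exact = FALSE requests the approximate p-value from the start, which avoids the warning. The data below are simulated stand-ins, not the A5RQ1 dataset.

```r
# Simulated stand-in data (assumption): NOT the A5RQ1 dataset.
set.seed(42)
minutes <- round(runif(60, min = 10, max = 150), 1)
drinks  <- rpois(60, lambda = minutes / 10)  # counts, so ties are likely

# Spearman's rho computed directly.
rho_direct <- cor(minutes, drinks, method = "spearman")

# The same value as a Pearson correlation on the (tie-averaged) ranks.
rho_ranks <- cor(rank(minutes), rank(drinks))

# exact = FALSE asks for the approximate p-value up front, so the
# "Cannot compute exact p-value with ties" warning is not raised.
result <- cor.test(minutes, drinks, method = "spearman", exact = FALSE)

print(all.equal(rho_direct, rho_ranks))
print(unname(result$estimate))
```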

DETERMINE STATISTICAL SIGNIFICANCE

#Our results indicate that p < 2.2e-16, which is less than .05, so the correlation is statistically significant.

===============================================

EFFECT SIZE FOR PEARSON & SPEARMAN CORRELATION

===============================================

………………………………………………..

REPORT

#There is a correlation of 0.9200417 (Spearman's rho). As minutes increase, the number of drinks bought also increases.

#The relationship between minutes spent and drinks bought is strong.
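The direction and strength labels used in this section can be derived from the coefficient with a small helper. The sketch below assumes one common convention (Cohen's benchmarks of .10, .30, and .50 for small, medium, and large); the cut-offs required for this assignment may differ, and the function name interpret_cor is hypothetical.

```r
# Hypothetical helper: label the strength and direction of a correlation.
# The cut-offs follow Cohen's benchmarks (an assumption, not taken from
# this report): .10 weak, .30 moderate, .50 strong.
interpret_cor <- function(r) {
  size <- abs(r)
  strength <- if (size >= 0.5) {
    "strong"
  } else if (size >= 0.3) {
    "moderate"
  } else if (size >= 0.1) {
    "weak"
  } else {
    "negligible"
  }
  direction <- if (r > 0) "positive" else if (r < 0) "negative" else "no direction"
  paste(strength, direction)
}

# Coefficient reported by cor.test() in this report.
print(interpret_cor(0.9200417))
```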

========================================================

>> WRITTEN REPORT FOR SPEARMAN CORRELATION <<

========================================================

Write a paragraph summarizing your findings.

………………………………………………..

1) REVIEW OF OUTPUT

1) The name of the inferential test used: Spearman Correlation

2) The two variables analyzed: time spent (minutes) in the shop and number of drinks purchased

3) The total sample size was 461

4) The inferential test results were statistically significant (p-value < 2.2e-16)

5) The mean for Minutes was 29.89 and for Drinks was 3.00

The SD for Minutes was 18.63 and for Drinks was 1.95

6) The direction of the correlation was positive and its size was 0.9200417

7) Degrees of freedom: 459 (n - 2)

8) rho (sample estimate): 0.92

9) p-value < 0.00000000000000022 (2.2e-16)

………………………………………………..

FINAL REPORT

A Spearman correlation was conducted to examine the relationship between time spent (minutes) in the shop and number of drinks purchased (n = 461). There was a statistically significant correlation between minutes (M = 29.89, SD = 18.63) and drinks purchased (M = 3.00, SD = 1.95). The correlation was positive and strong, rho(459) = 0.92, p < .001. As time spent (minutes) in the shop increases, drinks purchased also increases.