Hypothesis:
H0: There is no relationship between time spent in the shop (Minutes)
and the number of drinks purchased (Drinks).
H1: There is a positive relationship between time spent in the shop
(Minutes) and the number of drinks purchased (Drinks).
===================================================
PEARSON CORRELATION & SPEARMAN CORRELATION OVERVIEW
===================================================
PURPOSE
Used to test the relationship between two continuous variables.
==========
HYPOTHESES
==========
NULL HYPOTHESIS
There is no relationship between Variables A and B.
ALTERNATE HYPOTHESIS
There is a relationship between Variables A and B.
DIRECTIONAL ALTERNATE HYPOTHESES
As Variable A increases, Variable B increases.
As Variable A increases, Variable B decreases.
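Each of these alternate hypotheses corresponds to a value of the `alternative` argument of cor.test(). The sketch below illustrates this with two made-up vectors, `a` and `b`, which are hypothetical and not taken from the course dataset.

```r
# Hypothetical example data (not from the course dataset)
a <- c(1, 2, 3, 4, 5, 6, 7, 8)
b <- c(2, 1, 4, 3, 6, 5, 8, 7)

# Non-directional H1: there is a relationship (this is the default)
two_sided <- cor.test(a, b, method = "pearson", alternative = "two.sided")

# Directional H1: as Variable A increases, Variable B increases
positive <- cor.test(a, b, method = "pearson", alternative = "greater")

# Directional H1: as Variable A increases, Variable B decreases
negative <- cor.test(a, b, method = "pearson", alternative = "less")

# Compare the three p-values side by side
c(two_sided = two_sided$p.value,
  positive  = positive$p.value,
  negative  = negative$p.value)
```

The correlation estimate is the same in all three calls. Only the p-value changes, because it depends on which alternate hypothesis is being tested.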
======================
IMPORT EXCEL FILE CODE
======================
PURPOSE OF THIS CODE
Imports your Excel dataset automatically into R Studio.
You need to import your dataset every time you want to analyze your
data in R Studio.
INSTALL REQUIRED PACKAGE
The package only needs to be installed once.
The code for this task is provided below. Remove the hashtag below
to convert the note into code.
# install.packages("readxl")
LOAD THE PACKAGE
You must always reload the package you want to use.
The code for this task is provided below. Remove the hashtag below
to convert the note into code.
library(readxl)
IMPORT THE EXCEL FILE INTO R STUDIO
Download the Excel file from OneDrive and save it to your
desktop.
Right-click the Excel file and click “Copy as path” from the
menu.
In R Studio, replace the example path below with your actual
path.
Replace backslashes with forward slashes (/) or double them (\\):
✘ WRONG "C:\Users\Joseph\Desktop\mydata.xlsx"
✔ CORRECT "C:/Users/Joseph/Desktop/mydata.xlsx"
✔ CORRECT "C:\\Users\\Joseph\\Desktop\\mydata.xlsx"
Replace “dataset” with the name of your excel data (without the
.xlsx)
An example of the code for this task is provided below.
You can edit the code below and remove the hashtag to use the code
below.
dataset <- read_excel("/Users/sharmilaakula/Downloads/OneDrive_2_11-14-2025/A5RQ1.xlsx")
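The path rules above can be tried out in base R before importing. This is a minimal sketch that reuses the example Windows path from this section. It assumes R 4.0.0 or later, because it relies on raw strings to hold the pasted path unchanged.

```r
# Raw strings (R 4.0.0 or later) keep Windows backslashes exactly as pasted
pasted <- r"(C:\Users\Joseph\Desktop\mydata.xlsx)"

# Option 1: convert every backslash to a forward slash
forward <- gsub("\\", "/", pasted, fixed = TRUE)
print(forward)

# Option 2: doubled backslashes in an ordinary string describe the same path
doubled <- "C:\\Users\\Joseph\\Desktop\\mydata.xlsx"
identical(pasted, doubled)

# Check that the file really exists before calling read_excel()
file.exists(forward)
```

If file.exists() returns FALSE, the path is wrong or the file has moved, and read_excel() would fail for the same reason.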
======================
DESCRIPTIVE STATISTICS
======================
Calculate the mean, median, SD, and sample size for each
variable.
INSTALL THE REQUIRED PACKAGE
Remove the hashtag in front of the code below to install the package
once.
After installing the package, put the hashtag in front of the code
again.
# install.packages("psych")
LOAD THE PACKAGE
Always reload the package you want to use.
library(psych)
CALCULATE THE DESCRIPTIVE DATA
Replace “dataset” with the name of your excel data (without the
.xlsx)
Replace “V1” with the R code name for your first variable.
Replace “V2” with the R code name for your second variable.
describe(dataset[, c("Minutes", "Drinks")])
##         vars   n  mean    sd median trimmed   mad min   max range skew kurtosis   se
## Minutes    1 461 29.89 18.63   24.4   26.99 15.12  10 154.2 144.2 1.79     5.20 0.87
## Drinks     2 461  3.00  1.95    3.0    2.75  1.48   0  17.0  17.0 1.78     6.46 0.09
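If the psych package is not available, the same four statistics can be calculated with base R. The sketch below uses a small made-up data frame called `demo` (hypothetical values, not the course data) with the same column names.

```r
# Hypothetical stand-in data; replace with your imported dataset
demo <- data.frame(
  Minutes = c(12, 18, 25, 31, 47),
  Drinks  = c(1, 2, 3, 3, 5)
)

# One row per variable: sample size, mean, SD, and median
descriptives <- data.frame(
  n      = sapply(demo, function(v) sum(!is.na(v))),
  mean   = sapply(demo, mean, na.rm = TRUE),
  sd     = sapply(demo, sd, na.rm = TRUE),
  median = sapply(demo, median, na.rm = TRUE)
)

# Round to two decimals for reporting
round(descriptives, 2)
```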
===============================================
CHECK THE NORMALITY OF THE CONTINUOUS VARIABLES
===============================================
OVERVIEW
Two methods will be used to check the normality of the continuous
variables.
First, you will create histograms to visually inspect the normality
of the variables.
Next, you will conduct a test called the Shapiro-Wilk test to
inspect the normality of the variables.
It is important to know whether or not the data is normal to
determine which inferential test should be used.
CREATE A HISTOGRAM FOR EACH CONTINUOUS VARIABLE
A histogram is used to visually check if the data is normally
distributed.
Replace “dataset” with the name of your excel data (without the
.xlsx)
Replace “V1” with the R code name for your first variable.
Replace “V2” with the R code name for your second variable.
hist(dataset$Minutes,
     main = "Histogram of Minutes",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)

hist(dataset$Drinks,
     main = "Histogram of Drinks",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)
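A visual judgment of the histogram can be cross-checked with numbers. The sketch below defines two hypothetical helper functions, `skewness()` and `excess_kurtosis()`, using the simple moment formulas. Their values may differ slightly from the skew and kurtosis columns of describe(), which may apply its own sample-size adjustment.

```r
# Moment-based skewness:
# 0 = symmetrical, above 0 = positively skewed, below 0 = negatively skewed
skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Excess kurtosis:
# 0 = proper bell curve, above 0 = too tall, below 0 = too flat
excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

# Hypothetical values with a long right tail
right_skewed <- c(1, 2, 2, 3, 3, 3, 4, 10)
skewness(right_skewed)
excess_kurtosis(right_skewed)
```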

………………………………………………..
QUESTION
Q1) Check the SKEWNESS of the VARIABLE 1 histogram. In your opinion,
does the histogram look symmetrical, positively skewed, or negatively
skewed?
A) In our opinion, the VARIABLE 1 (Minutes) histogram looks positively
skewed.
Q2) Check the KURTOSIS of the VARIABLE 1 histogram. In your opinion,
does the histogram look too flat, too tall, or does it have a proper
bell curve?
A) After checking the kurtosis of Minutes, the histogram does not have
a proper bell-shaped curve.
Q3) Check the SKEWNESS of the VARIABLE 2 histogram. In your opinion,
does the histogram look symmetrical, positively skewed, or negatively
skewed?
A) In our opinion, the VARIABLE 2 (Drinks) histogram looks positively
skewed.
Q4) Check the KURTOSIS of the VARIABLE 2 histogram. In your opinion,
does the histogram look too flat, too tall, or does it have a proper
bell curve?
A) After checking the kurtosis of Drinks, the histogram does not have
a proper bell-shaped curve.
………………………………………………..
PURPOSE
Use a statistical test to check the normality of the continuous
variables.
The Shapiro-Wilk test checks the overall shape of the distribution, so
it picks up problems with skewness and kurtosis at the same time.
The test is checking “Is this variable the SAME as normal data (null
hypothesis) or DIFFERENT from normal data (alternate hypothesis)?”
For this test, if p is GREATER than .05 (p > .05), the data is
NORMAL.
If p is LESS than .05 (p < .05), the data is NOT normal.
CONDUCT THE SHAPIRO-WILK TEST
Replace “dataset” with the name of your excel data (without the
.xlsx)
Replace “V1” with the R code name for your first variable.
Replace “V2” with the R code name for your second variable.
shapiro.test(dataset$Minutes)
##
## Shapiro-Wilk normality test
##
## data: dataset$Minutes
## W = 0.84706, p-value < 2.2e-16
shapiro.test(dataset$Drinks)
##
## Shapiro-Wilk normality test
##
## data: dataset$Drinks
## W = 0.85487, p-value < 2.2e-16
…………………………………………………
QUESTION
Was the data normally distributed for Variable 1? NO
Was the data normally distributed for Variable 2? NO
If the data is normal for both variables, continue with the Pearson
Correlation test.
If one or both variables are NOT normal, change to the Spearman
Correlation test.
Since both variables are NOT normal, we change to the Spearman
Correlation test.
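The decision rule above can be written as a small function. This is a sketch only: `choose_method()` is a hypothetical helper, and the two example vectors are idealised made-up data rather than the course dataset.

```r
# Returns "pearson" only when BOTH variables pass Shapiro-Wilk (p > alpha);
# otherwise returns "spearman"
choose_method <- function(v1, v2, alpha = 0.05) {
  p1 <- shapiro.test(v1)$p.value
  p2 <- shapiro.test(v2)$p.value
  if (p1 > alpha && p2 > alpha) "pearson" else "spearman"
}

# Hypothetical data: idealised normal scores and a right-skewed variable
normal_like <- qnorm(ppoints(100))
skewed      <- qexp(ppoints(100))

choose_method(normal_like, normal_like)
choose_method(normal_like, skewed)
```

The returned string can be passed straight to the `method` argument of cor.test().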
=========================
VISUALLY DISPLAY THE DATA
=========================
CREATE A SCATTERPLOT
PURPOSE
A scatterplot visually shows the relationship between two continuous
variables.
INSTALL THE REQUIRED PACKAGES
Remove the hashtags in front of the code below to install the
packages once.
After installing the packages, put the hashtags in front of the code
again.
# install.packages("ggplot2")
# install.packages("ggpubr")
LOAD THE PACKAGE
Always reload the package you want to use.
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
library(ggpubr)
CREATE THE SCATTERPLOT
Replace “dataset” with the name of your excel data (without the
.xlsx)
Replace “V1” with the R code name for your first variable.
Replace “V2” with the R code name for your second variable.
Replace “pearson” with “spearman” if you are using the spearman
correlation.
ggscatter(dataset, x = "Minutes", y = "Drinks",
          add = "reg.line",
          conf.int = TRUE,
          cor.coef = TRUE,
          cor.method = "spearman",
          xlab = "Variable Minutes", ylab = "Variable Drinks")
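A similar plot can be drawn with base R if ggpubr is not installed. The sketch below uses a made-up data frame called `demo` (hypothetical values) and adds a straight trend line from lm(). It does not print the correlation coefficient on the plot.

```r
# Hypothetical stand-in data; replace with your imported dataset
demo <- data.frame(
  Minutes = c(10, 15, 22, 30, 41, 55),
  Drinks  = c(1, 1, 2, 3, 4, 6)
)

# Straight-line fit, used only to draw the trend line
fit <- lm(Drinks ~ Minutes, data = demo)

# Scatterplot with the trend line on top
plot(demo$Minutes, demo$Drinks,
     xlab = "Minutes", ylab = "Drinks",
     main = "Minutes vs. Drinks")
abline(fit, col = "blue")

# A slope above 0 means the line points up (positive relationship)
coef(fit)["Minutes"]
```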

………………………………………………..
QUESTION
Is the relationship positive (line pointing up), negative (line
pointing down), or is there no relationship (line is flat)?
- Since the line is pointing upwards, the relationship appears to be
positive.
================================================
PEARSON CORRELATION OR SPEARMAN CORRELATION TEST
================================================
PURPOSE
Check whether two continuous variables are related, and how strong and
in which direction the relationship is.
CONDUCT THE PEARSON CORRELATION OR SPEARMAN CORRELATION
Replace “dataset” with the name of your excel data (without the
.xlsx)
Replace “V1” with the R code name for your first variable.
Replace “V2” with the R code name for your second variable.
Replace “pearson” with “spearman” if you are using the spearman
correlation.
cor.test(dataset$Minutes, dataset$Drinks, method = "spearman")
## Warning in cor.test.default(dataset$Minutes, dataset$Drinks, method =
## "spearman"): Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: dataset$Minutes and dataset$Drinks
## S = 1305608, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.9200417
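The warning in the output appears because the data contain ties (repeated values), so R falls back to an approximate p-value. The sketch below, using made-up vectors `minutes` and `drinks` (hypothetical values), shows how to request the approximation directly with `exact = FALSE`, how to pull rho and p out of the result, and that rho equals the Pearson correlation of the ranks.

```r
# Hypothetical data containing ties (repeated values)
minutes <- c(10, 12, 12, 20, 25, 25, 40, 60)
drinks  <- c(1, 1, 2, 2, 3, 3, 5, 6)

# exact = FALSE asks for the approximate p-value, so no ties warning is raised
result <- cor.test(minutes, drinks, method = "spearman", exact = FALSE)

rho <- unname(result$estimate)  # shown as "sample estimates: rho" in the output
p   <- result$p.value

# Spearman's rho is the Pearson correlation of the ranked values
rho_from_ranks <- cor(rank(minutes), rank(drinks), method = "pearson")

c(rho = rho, rho_from_ranks = rho_from_ranks, p = p)
```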
DETERMINE STATISTICAL SIGNIFICANCE
If results were statistically significant (p < .05), continue to
effect size section below.
If results were NOT statistically significant (p > .05), skip to
reporting section below.
NOTE: Getting results that are not statistically significant does
NOT mean you switch to Spearman Correlation.
The Spearman Correlation is for data that are not normally distributed;
the choice does not depend on whether the results were significant.
===============================================
EFFECT SIZE FOR PEARSON & SPEARMAN CORRELATION
===============================================
If results were statistically significant, then determine how the
variables are related and how strong the relationship is.
1) REVIEW THE CORRECT CORRELATION TEST
• For Pearson correlation, find “sample estimates: cor” in your
output (when you calculated the Pearson Correlation earlier).
• For Spearman correlation, find “sample estimates: rho” in your
output (when you calculated the Spearman Correlation earlier).
………………………………………………..
2) WRITE THE REPORT
Q1) What is the direction of the effect?
“Direction” explains the relationship between the variables.
A positive (+) correlation means as Variable X increases, Variable Y
increases.
A negative (-) correlation means as Variable X increases, Variable Y
decreases.
Examples:
A correlation of 0.90 is positive. As X increases, Y increases.
A correlation of -0.90 is negative. As X increases, Y
decreases.
Q2) What is the size of the effect?
Ranges from 0 (no relationship) to 1.00 (perfect relationship).
“Size” explains how much the variables are connected to each
other.
± 0.00 to 0.09 = no relationship
± 0.10 to 0.29 = weak
± 0.30 to 0.49 = moderate
± 0.50 to 1.00 = strong
Examples:
A correlation of 0.90 is a strong relationship.
A correlation of 0.15 is a weak relationship.
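The direction and size rules above can be applied with a short function. `describe_correlation()` is a hypothetical helper written for this sketch. It rounds the coefficient to two decimals before applying the cut-offs listed above.

```r
# Hypothetical helper: labels a correlation using the cut-offs listed above
describe_correlation <- function(r) {
  r <- round(r, 2)
  direction <- if (r > 0) "positive" else if (r < 0) "negative" else "none"
  labels <- c("no relationship", "weak", "moderate", "strong")
  # findInterval() counts how many cut-offs |r| has reached (0 to 3)
  size <- labels[findInterval(abs(r), c(0.10, 0.30, 0.50)) + 1]
  c(direction = direction, size = size)
}

describe_correlation(0.90)
describe_correlation(0.15)
describe_correlation(-0.45)
```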
========================================================
>> WRITTEN REPORT FOR PEARSON CORRELATION <<
========================================================
Write a paragraph summarizing your findings.
………………………………………………..
1) REVIEW YOUR OUTPUT
Collect the information below from your output:
1) The name of the inferential test used (Pearson Correlation)
2) The names of the two variables you analyzed (their proper names,
not their R code names).
3) The total sample size (labeled as “n”).
4) Whether the inferential test results were statistically
significant (p < .05) or not (p > .05)
5) The mean and SD for each variable (rounded to two places after
the decimal)
6) The direction and size of the correlation.
7) Degrees of freedom (labeled as “df”)
8) r-value (labeled as “sample estimate: cor” in output)
9) EXACT p-value to three decimals. NOTE: If p > .05, just report
p > .05. If p < .001, just report p < .001.
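Two of the items above can be worked out in code. For a Pearson correlation the degrees of freedom are n - 2, and the p-value rules can be wrapped in a small function. `format_p()` is a hypothetical helper written for this sketch, and the sample size of 300 is the one used in the Pearson example report.

```r
# Degrees of freedom for a Pearson correlation: n - 2
n  <- 300      # sample size used in the Pearson example report
df <- n - 2    # reported inside the brackets, as in r(298)

# Hypothetical helper that applies the p-value reporting rules above
format_p <- function(p) {
  if (p < 0.001) return("p < .001")
  if (p > 0.05)  return("p > .05")
  # Otherwise report three decimals without the leading zero
  paste("p =", sub("^0", "", sprintf("%.3f", p)))
}

format_p(0.0000004)
format_p(0.02)
format_p(0.31)
```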
………………………………………………..
2) WRITE YOUR FINAL REPORT
An example report is provided below. You should copy the paragraph
and just edit or replace words with your information.
This is not considered plagiarizing because science has a specific
format for reporting information.
EXAMPLE
A Pearson correlation was conducted to examine the relationship
between
job satisfaction and employee performance (n = 300).
There was a statistically significant correlation between
job satisfaction (M = 8.21, SD = 0.2) and employee performance (M =
4.2, SD = 0.02).
The correlation was positive and strong, r(298) = 0.65, p <
.05.
As job satisfaction increases, employee performance also
increases.
========================================================
>> WRITTEN REPORT FOR SPEARMAN CORRELATION <<
========================================================
Write a paragraph summarizing your findings.
………………………………………………..
1) REVIEW YOUR OUTPUT
Collect the information below from your output:
1) The name of the inferential test used (Spearman Correlation)
2) The names of the two variables you analyzed (their proper names,
not their R code names).
3) The total sample size (labeled as “n”).
4) Whether the inferential test results were statistically
significant (p < .05) or not (p > .05)
5) The mean and SD for each variable (rounded to two places after
the decimal)
6) The direction and size of the correlation.
7) rho-value (labeled as “sample estimate: rho” in output, and
labeled as ρ in paragraph)
8) EXACT p-value to three decimals. NOTE: If p > .05, just report
p > .05. If p < .001, just report p < .001.
………………………………………………..
2) WRITE YOUR FINAL REPORT
An example report is provided below. You should copy the paragraph
and just edit or replace words with your information.
This is not considered plagiarizing because science has a specific
format for reporting information.
EXAMPLE
A Spearman correlation was conducted to assess the relationship
between
stress levels and sleep quality (n = 75).
There was a statistically significant correlation between
stress (M = 6.31, SD = 1.21) and sleep quality (M = 4.12, SD =
0.91).
The correlation was negative and moderate, rho = -0.45, p =
.02.
As stress level increases, sleep quality decreases.
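The example paragraph can also be filled in with sprintf(). The sketch below uses the numbers from the describe() and cor.test() output earlier in this document. The wording of the variable names and the final sentence is an assumption and should be edited to suit your own report.

```r
# Values copied from the describe() and cor.test() output in this document
n          <- 461L
m_minutes  <- 29.89
sd_minutes <- 18.63
m_drinks   <- 3.00
sd_drinks  <- 1.95
rho        <- 0.9200417
# The output shows "p-value < 2.2e-16", which is reported as p < .001

report <- sprintf(
  paste(
    "A Spearman correlation was conducted to assess the relationship between",
    "minutes spent in the shop and drinks purchased (n = %d).",
    "There was a statistically significant correlation between",
    "minutes (M = %.2f, SD = %.2f) and drinks (M = %.2f, SD = %.2f).",
    "The correlation was positive and strong, rho = %.2f, p < .001.",
    "As time spent in the shop increases, the number of drinks purchased",
    "also increases."
  ),
  n, m_minutes, sd_minutes, m_drinks, sd_drinks, rho
)

# Print the paragraph wrapped to the page width
cat(strwrap(report, width = 72), sep = "\n")
```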