===================================================

PEARSON CORRELATION & SPEARMAN CORRELATION OVERVIEW

===================================================

PURPOSE

This analysis tests whether the number of laptops purchased is related to the number of anti-virus licenses purchased.

==========

HYPOTHESES

==========

H0: There is no relationship between the number of laptops purchased and the number of anti-virus licenses purchased.

H1: There is a relationship between the number of laptops purchased and the number of anti-virus licenses purchased.

………………………………………………………..

======================

IMPORT EXCEL FILE CODE

======================

INSTALL REQUIRED PACKAGE

options(repos = c(CRAN = "https://cloud.r-project.org"))
install.packages("readxl")
## Installing package into 'C:/Users/mnava/AppData/Local/R/win-library/4.5'
## (as 'lib' is unspecified)
## package 'readxl' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\mnava\AppData\Local\Temp\Rtmp4wg1Nd\downloaded_packages
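Calling install.packages() on every run downloads the package again each time. One possible alternative, shown here as a minimal sketch, is to install only what is missing. The helper names missing_pkgs and install_if_missing are hypothetical and not part of the original script; requireNamespace() and install.packages() are standard R functions.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install only the packages that are missing.
install_if_missing <- function(pkgs) {
  to_install <- missing_pkgs(pkgs)
  if (length(to_install) > 0) install.packages(to_install)
  invisible(to_install)
}

# Usage (commented out so the sketch has no side effects when sourced):
# install_if_missing(c("readxl", "psych", "ggplot2", "ggpubr"))
```

With this approach the install step is skipped on later runs once the packages are present.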

LOAD THE PACKAGE

library(readxl)

IMPORT THE EXCEL FILE INTO R STUDIO

A5RQ2 <- read_excel("D:\\Ms Analytics 2025\\Fall 1\\Applied Analytics &Methods 1\\Week 5\\A5RQ2.xlsx")

======================

DESCRIPTIVE STATISTICS

======================

Calculate the mean, median, SD, and sample size for each variable.

INSTALL THE REQUIRED PACKAGE

install.packages("psych")
## Installing package into 'C:/Users/mnava/AppData/Local/R/win-library/4.5'
## (as 'lib' is unspecified)
## package 'psych' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\mnava\AppData\Local\Temp\Rtmp4wg1Nd\downloaded_packages

LOAD THE PACKAGE

library(psych)

CALCULATE THE DESCRIPTIVE DATA

describe(A5RQ2[, c("Antivirus", "Laptop")])
##           vars   n  mean    sd median trimmed   mad min max range  skew
## Antivirus    1 122 50.18 13.36     49   49.92 12.60  15  83    68  0.15
## Laptop       2 122 40.02 12.30     39   39.93 11.86   8  68    60 -0.01
##           kurtosis   se
## Antivirus    -0.14 1.21
## Laptop       -0.32 1.11
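The four statistics this section asks for (sample size, mean, SD, median) can also be computed in base R without psych. The sketch below is illustrative only: demo is simulated stand-in data, not the A5RQ2 spreadsheet, and summarise_var is a hypothetical helper name.

```r
# Stand-in data: simulated values, NOT the values in A5RQ2.xlsx.
set.seed(1)
demo <- data.frame(
  Antivirus = rnorm(122, mean = 50, sd = 13),
  Laptop    = rnorm(122, mean = 40, sd = 12)
)

# Hypothetical helper: n, mean, SD, and median of one numeric vector,
# ignoring missing values.
summarise_var <- function(x) {
  x <- x[!is.na(x)]
  c(n = length(x), mean = mean(x), sd = sd(x), median = median(x))
}

# One row per variable, one column per statistic.
desc <- t(sapply(demo, summarise_var))
round(desc, 2)
```

Replacing demo with A5RQ2[, c("Antivirus", "Laptop")] should give the same n, mean, sd, and median columns that describe() reports above.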

=========================

VISUALLY DISPLAY THE DATA

=========================

CREATE A SCATTERPLOT

PURPOSE

A scatterplot visually shows the relationship between the number of laptops purchased and the number of anti-virus licenses purchased.

INSTALL THE REQUIRED PACKAGES

install.packages("ggplot2")
## Installing package into 'C:/Users/mnava/AppData/Local/R/win-library/4.5'
## (as 'lib' is unspecified)
## package 'ggplot2' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\mnava\AppData\Local\Temp\Rtmp4wg1Nd\downloaded_packages
install.packages("ggpubr")
## Installing package into 'C:/Users/mnava/AppData/Local/R/win-library/4.5'
## (as 'lib' is unspecified)
## package 'ggpubr' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\mnava\AppData\Local\Temp\Rtmp4wg1Nd\downloaded_packages

LOAD THE PACKAGES

library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
library(ggpubr)
ggscatter(A5RQ2, x = "Antivirus", y = "Laptop",
          add = "reg.line",
          conf.int = TRUE,
          cor.coef = TRUE,
          cor.method = "pearson",
          xlab = "Antivirus", ylab = "Laptop")

The relationship is positive: the regression line slopes upward, so higher Antivirus values go with higher Laptop values.

………………………………………………..

===============================================

CHECKING THE NORMALITY OF THE CONTINUOUS VARIABLES

===============================================

OVERVIEW

Two methods will be used to check the normality of the continuous variables.

First, we will create histograms to visually inspect the normality of the variables.

Next, we will conduct a test called the Shapiro-Wilk test to inspect the normality of the variables.

HISTOGRAM FOR EACH CONTINUOUS VARIABLE

A histogram is used to visually check if the data is normally distributed.

hist(A5RQ2$Antivirus,
     main = "Histogram of Antivirus",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)

hist(A5RQ2$Laptop,
     main = "Histogram of Laptop",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)

………………………………………………..

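The histogram for Antivirus is not perfectly symmetrical; it is slightly positively skewed (skew = 0.15 in the descriptive statistics).

The histogram for Laptop is not perfectly symmetrical; it is very slightly negatively skewed (skew = -0.01).

Both skewness values are close to 0, so both distributions are approximately symmetric.

In our opinion, the Antivirus histogram looks somewhat tall, and the Laptop histogram resembles a bell curve.

………………………………………………..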

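SHAPIRO-WILK TEST

This test checks the normality of the continuous variables.

The Shapiro-Wilk test checks whether a sample is consistent with a normal distribution. Its null hypothesis is that the data are normally distributed, so a p-value above .05 means normality is not rejected. The test is sensitive to departures in both skewness and kurtosis.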

shapiro.test(A5RQ2$Antivirus)
## 
##  Shapiro-Wilk normality test
## 
## data:  A5RQ2$Antivirus
## W = 0.99419, p-value = 0.8981
shapiro.test(A5RQ2$Laptop)
## 
##  Shapiro-Wilk normality test
## 
## data:  A5RQ2$Laptop
## W = 0.99362, p-value = 0.8559

…………………………………………………

Both p-values are above .05 (Antivirus: p = .898; Laptop: p = .856), so normality is not rejected for either variable.
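The decision between Pearson and Spearman can be written as an explicit rule. The sketch below assumes a common classroom convention: use Pearson only when every Shapiro-Wilk p-value exceeds alpha, otherwise fall back to Spearman. The function name choose_cor_method is hypothetical; the p-values are the ones reported above.

```r
# Hypothetical helper: pick the correlation method from Shapiro-Wilk p-values.
# Assumed rule: Pearson if normality is not rejected for any variable,
# Spearman otherwise.
choose_cor_method <- function(p_values, alpha = 0.05) {
  if (all(p_values > alpha)) "pearson" else "spearman"
}

# p-values taken from the shapiro.test() output above.
shapiro_p <- c(Antivirus = 0.8981, Laptop = 0.8559)
method <- choose_cor_method(shapiro_p)
method  # "pearson"
```

The result can then be passed on, for example as cor.test(A5RQ2$Antivirus, A5RQ2$Laptop, method = method).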

The data is normal for both variables, so we continue with the Pearson Correlation test.

================================================

PEARSON CORRELATION TEST

================================================

Check whether there is a linear relationship between the two variables.

cor.test(A5RQ2$Antivirus, A5RQ2$Laptop, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  A5RQ2$Antivirus and A5RQ2$Laptop
## t = 25.16, df = 120, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8830253 0.9412249
## sample estimates:
##       cor 
## 0.9168679
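As a check on how the numbers in this output fit together, the t statistic and the confidence interval can be recomputed from r and n alone. This is a sketch using the standard formulas (t = r * sqrt(n - 2) / sqrt(1 - r^2), and a Fisher z-transformation for the interval); the variable names are illustrative.

```r
# Values taken from the cor.test() and describe() output above.
r  <- 0.9168679
n  <- 122
df <- n - 2

# t statistic for testing H0: true correlation = 0.
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# 95% confidence interval via the Fisher z-transformation.
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

round(t_stat, 2)  # 25.16
round(ci, 3)      # 0.883 0.941
```

Both results agree with the t value and the 95 percent confidence interval reported by cor.test().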

===============================================

EFFECT SIZE FOR PEARSON

===============================================
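For a Pearson correlation, the coefficient r is itself the effect size, and r squared gives the proportion of variance the two variables share. The sketch below labels the strength using Cohen's conventional benchmarks (.10 small, .30 medium, .50 large); these cutoffs are an assumption, since the course may use a different scale.

```r
# r taken from the cor.test() output above.
r <- 0.9168679

# Proportion of shared variance.
r_squared <- r^2

# Strength label using Cohen's benchmarks (assumed cutoffs).
strength <- cut(abs(r),
                breaks = c(0, 0.1, 0.3, 0.5, 1),
                labels = c("negligible", "small", "medium", "large"),
                right = FALSE, include.lowest = TRUE)

round(r_squared, 3)     # 0.841
as.character(strength)  # "large"
```

Under these benchmarks the effect is large: about 84% of the variance in one variable is shared with the other.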

………………………………………………..

========================================================

>> WRITTEN REPORT FOR PEARSON CORRELATION <<

========================================================

Write a paragraph summarizing your findings.

………………………………………………..

The inferential test used was the Pearson correlation.

The two variables analyzed were Laptop (number of laptops purchased) and Antivirus (number of anti-virus licenses purchased).

The total sample size is 122.

The inferential test results were statistically significant (p < .05).

Antivirus: mean = 50.18, SD = 13.36

Laptop: mean = 40.02, SD = 12.30

The relationship is positive since the regression line slopes upward.

Degrees of freedom = 120

r = 0.917

p < 2.2e-16, so p < .001

………………………………………………..

FINAL REPORT

A Pearson correlation was conducted to examine the relationship between the number of laptops purchased and the number of anti-virus licenses purchased (n = 122).

There was a statistically significant correlation between the number of anti-virus licenses purchased (M = 50.18, SD = 13.36) and the number of laptops purchased (M = 40.02, SD = 12.30).

The correlation was positive and strong, r(120) = .92, p < .001.

We therefore reject H0: there is a relationship between the number of laptops purchased and the number of anti-virus licenses purchased.
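To keep the reported statistic consistent with the test output, the results string can be built from the numbers rather than typed by hand. This is a sketch assuming APA-style conventions (two decimals for r, no leading zero, p reported as "< .001" when very small); format_pearson is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: format a Pearson result in APA style.
format_pearson <- function(r, df, p) {
  # Two decimals, leading zero dropped (0.92 becomes .92).
  r_txt <- sub("^(-?)0\\.", "\\1.", sprintf("%.2f", r))
  p_txt <- if (p < 0.001) {
    "p < .001"
  } else {
    paste0("p = ", sub("^0\\.", ".", sprintf("%.3f", p)))
  }
  sprintf("r(%d) = %s, %s", as.integer(df), r_txt, p_txt)
}

# Values taken from the cor.test() output above; the p-value is reported
# only as "< 2.2e-16", so that bound is used here.
report <- format_pearson(r = 0.9168679, df = 120, p = 2.2e-16)
report  # "r(120) = .92, p < .001"
```

In a live session the inputs could come straight from the test object, for example res <- cor.test(...) followed by format_pearson(res$estimate, res$parameter, res$p.value).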