RESEARCH SCENARIO 2

In order to be more competitive in the market, a technology store wants to start selling their laptops and anti-virus software as a bundle to small businesses. A bundle is when two products are sold together at a lower price than if they were purchased separately. Before offering the bundle, the store wants to make sure the two products are commonly purchased together. The store has data from the past year showing how many laptops and how many anti-virus software licenses each small business bought from them. Analyze the data to determine if there is a positive correlation between the number of laptops purchased and the number of anti-virus licenses purchased.

Hypotheses

Null Hypothesis: There is no correlation between the number of laptops purchased and the number of anti-virus licenses purchased (ρ = 0).

Alternative Hypothesis: There is a positive correlation between the number of laptops purchased and the number of anti-virus licenses purchased (ρ > 0).

options(repos=c(CRAN="https://cloud.r-project.org"))
install.packages("readxl")
library(readxl)
A5RQ2 <- read_excel("C:\\Users\\sweth\\Downloads\\A5RQ2.xlsx")
install.packages("psych")
library(psych)
describe(A5RQ2[, c("Antivirus", "Laptop")])
##           vars   n  mean    sd median trimmed   mad min max range  skew
## Antivirus    1 122 50.18 13.36     49   49.92 12.60  15  83    68  0.15
## Laptop       2 122 40.02 12.30     39   39.93 11.86   8  68    60 -0.01
##           kurtosis   se
## Antivirus    -0.14 1.21
## Laptop       -0.32 1.11
hist(A5RQ2$Antivirus,
     main = "Histogram of Antivirus",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)

hist(A5RQ2$Laptop,
     main = "Histogram of Laptop",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)

Variable 1 (Antivirus) is roughly symmetrical with a slight right tail, so it has a mild positive skew (0.15). Its kurtosis (-0.14) is close to zero, meaning the peak is neither too flat nor too tall, so the distribution resembles a proper bell-shaped curve. Variable 2 (Laptop) is essentially symmetrical: its skew (-0.01) is almost exactly zero, so there is no meaningful stretch toward either side. Its kurtosis (-0.32) is slightly negative, which indicates a peak that is marginally flatter than a normal curve but still moderate, so this distribution also has the appearance of a proper bell curve.

shapiro.test(A5RQ2$Antivirus)
## 
##  Shapiro-Wilk normality test
## 
## data:  A5RQ2$Antivirus
## W = 0.99419, p-value = 0.8981
shapiro.test(A5RQ2$Laptop)
## 
##  Shapiro-Wilk normality test
## 
## data:  A5RQ2$Laptop
## W = 0.99362, p-value = 0.8559

According to the Shapiro-Wilk test, both Variable 1 (Antivirus) and Variable 2 (Laptop) have a p-value greater than 0.05 (0.8981 and 0.8559, respectively); hence, we fail to reject the null hypothesis of normality. Both variables can therefore be treated as approximately normally distributed, which supports using the Pearson correlation for this analysis.

install.packages("ggplot2")
install.packages("ggpubr")
library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
library(ggpubr)
ggscatter(A5RQ2, x = "Antivirus", y = "Laptop",
          add = "reg.line",
          conf.int = TRUE,
          cor.coef = TRUE,
          cor.method = "pearson",
          xlab = "Variable Antivirus", ylab = "Variable Laptop")

The scatterplot above shows a regression line that slopes upward, which indicates a positive relationship between Antivirus and Laptop. That is, as the Antivirus values go up, the Laptop values also tend to go up. Since there is a definite upward trend, it indicates that the variables move together in the same direction rather than showing a downward (negative) trend or a flat line showing no relationship.

cor.test(A5RQ2$Antivirus, A5RQ2$Laptop, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  A5RQ2$Antivirus and A5RQ2$Laptop
## t = 25.16, df = 120, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8830253 0.9412249
## sample estimates:
##       cor 
## 0.9168679
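
The test above is two-sided (its alternative is "true correlation is not equal to 0"), while the alternative hypothesis for this scenario is directional (ρ > 0). A minimal sketch of the one-sided version follows. It runs on simulated stand-in data, because the A5RQ2 file is not available here; the vectors `laptops` and `licenses` and their parameters are assumptions for illustration only, not the store's data.

```r
# Simulated stand-in data (hypothetical, NOT the A5RQ2 values)
set.seed(42)
n <- 122
laptops  <- round(rnorm(n, mean = 40, sd = 12))
licenses <- round(laptops + 10 + rnorm(n, mean = 0, sd = 5))

# One-sided Pearson test: H0: rho = 0 versus H1: rho > 0
one_sided <- cor.test(licenses, laptops,
                      method = "pearson",
                      alternative = "greater")
print(one_sided)
```

With the real data, the same call would be `cor.test(A5RQ2$Antivirus, A5RQ2$Laptop, method = "pearson", alternative = "greater")`. Because the observed correlation is positive, the one-sided p-value is half the two-sided one, so the conclusion drawn from the two-sided output does not change.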

EFFECT SIZE FOR PEARSON & SPEARMAN CORRELATION

The correlation between Antivirus and Laptop shows a distinct positive trend: as the number of anti-virus licenses purchased increases, the number of laptops purchased also tends to rise. The upward-sloping regression line and the positive correlation coefficient (r = 0.917) confirm this positive relationship. The effect size is large. The coefficient is well above the ±0.30 to ±0.49 range that marks a moderate effect and exceeds the ±0.50 threshold for a strong one, so the two variables move together very consistently. Squaring the coefficient gives r² ≈ 0.84, meaning that about 84% of the variance in one variable is shared with the other.
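
As a check on the effect size, the figures in the cor.test output can be reproduced by hand from r and n alone. The sketch below uses only the reported values (r = 0.9168679, n = 122); the cut-offs of 0.10, 0.30 and 0.50 are Cohen's conventional benchmarks and are assumed here to match the scale used for this assignment.

```r
# Values copied from the Pearson cor.test output
r  <- 0.9168679
n  <- 122
df <- n - 2

# Shared variance (coefficient of determination), about 0.841
r_squared <- r^2

# t statistic for H0: rho = 0, about 25.16
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# 95% confidence interval via the Fisher z transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

# Label the effect using Cohen's benchmarks (assumed scale)
effect_label <- cut(abs(r),
                    breaks = c(0, 0.10, 0.30, 0.50, 1),
                    labels = c("negligible", "small", "moderate", "large"),
                    right = FALSE, include.lowest = TRUE)

print(round(c(r_squared = r_squared, t = t_stat,
              ci_lower = ci[1], ci_upper = ci[2]), 4))
print(as.character(effect_label))
```

The recomputed t statistic and confidence interval agree with the reported t = 25.16 and interval of 0.883 to 0.941, and the coefficient falls in the "large" band.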

WRITTEN REPORT FOR PEARSON CORRELATION

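A Pearson correlation was run to examine the relationship between the number of anti-virus licenses purchased and the number of laptops purchased by small businesses (n = 122). There was a statistically significant correlation between anti-virus licenses (M = 50.18, SD = 13.36) and laptops (M = 40.02, SD = 12.30), r(120) = .917, p < .001, 95% CI [.883, .941]. The correlation is positive and very strong: businesses that purchased more laptops also tended to purchase more anti-virus licenses. The null hypothesis of no correlation is rejected, which supports offering the two products as a bundle.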

WRITTEN REPORT FOR SPEARMAN CORRELATION

A Spearman correlation was not required for this scenario. Both variables met the normality assumption (Shapiro-Wilk p = .898 for Antivirus and p = .856 for Laptop), so the Pearson correlation reported above is the appropriate test. The value 0.917 is Pearson's r; the analysis output does not include a Spearman coefficient, so no ρ value is reported here.
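
If a rank-based check were wanted, a Spearman correlation could be run with the same function by changing `method`. The sketch below uses simulated stand-in data, so the vectors `laptops` and `licenses` are hypothetical and its output says nothing about the A5RQ2 data. Setting `exact = FALSE` is a choice made here to avoid the warning R raises when tied values prevent an exact p-value.

```r
# Simulated stand-in data (hypothetical, NOT the A5RQ2 values)
set.seed(42)
n <- 122
laptops  <- round(rnorm(n, mean = 40, sd = 12))
licenses <- round(laptops + 10 + rnorm(n, mean = 0, sd = 5))

# Spearman rank correlation, one-sided to match H1: rho > 0
spearman_result <- cor.test(licenses, laptops,
                            method = "spearman",
                            alternative = "greater",
                            exact = FALSE)
print(spearman_result)
```

With the real data the call would be `cor.test(A5RQ2$Antivirus, A5RQ2$Laptop, method = "spearman", exact = FALSE)`, and the resulting ρ would be reported in place of r.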