In order to be more competitive in the market, a technology store wants to start selling their laptops and anti-virus software as a bundle to small businesses. A bundle is when two products are sold together at a lower price than if they were purchased separately. Before offering the bundle, the store wants to make sure the two products are commonly purchased together. The store has data from the past year showing how many laptops and how many anti-virus software licenses each small business bought from them. Analyze the data to determine if there is a positive correlation between the number of laptops purchased and the number of anti-virus licenses purchased.
A Pearson correlation is used to test the relationship between two continuous variables.
Null hypothesis (H0): There is no relationship between laptops and anti-virus licenses purchased.
Alternate hypothesis (H1): There is a relationship between laptops and anti-virus licenses purchased.
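To make the idea of a Pearson correlation concrete, here is a minimal sketch that computes r by hand as the covariance of two variables divided by the product of their standard deviations, then compares the result with base R's `cor()`. The data are simulated: `laptops` and `antivirus` are made-up vectors, not the store's records.

```r
# Simulated data only: these vectors are made up, not the store's records.
set.seed(42)
laptops   <- pmax(0, round(rnorm(122, mean = 40, sd = 12)))
antivirus <- pmax(0, round(laptops + rnorm(122, mean = 10, sd = 5)))

# Pearson r = cov(x, y) / (sd(x) * sd(y))
r_manual  <- cov(laptops, antivirus) / (sd(laptops) * sd(antivirus))
r_builtin <- cor(laptops, antivirus, method = "pearson")

all.equal(r_manual, r_builtin)  # TRUE: the two calculations agree
```

Because r is standardised by both SDs, it always lies between -1 and 1 regardless of the units of the two variables.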
# install.packages("readxl")
library(readxl)
# Update this path to wherever A5RQ2.xlsx is saved on your machine
dataset <- read_excel("C:/Users/Murari_Lakshman/Downloads/A5RQ2.xlsx")
Calculate the mean, median, SD, and sample size for each variable.
# install.packages("psych")
library(psych)
describe(dataset[, c("Antivirus", "Laptop")])
## vars n mean sd median trimmed mad min max range skew
## Antivirus 1 122 50.18 13.36 49 49.92 12.60 15 83 68 0.15
## Laptop 2 122 40.02 12.30 39 39.93 11.86 8 68 60 -0.01
## kurtosis se
## Antivirus -0.14 1.21
## Laptop -0.32 1.11
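If psych is unavailable, the four statistics requested (mean, median, SD, and sample size) can also be computed in base R. This is a sketch: `summarise_vars()` is a hypothetical helper, and `demo` is a made-up data frame standing in for `dataset`.

```r
# Hypothetical helper: n, mean, median, and SD for selected columns, base R only.
summarise_vars <- function(df, vars) {
  t(sapply(df[vars], function(x) c(
    n      = sum(!is.na(x)),
    mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE)
  )))
}

# Made-up stand-in for the real data
demo <- data.frame(Antivirus = c(10, 20, 30, 40, 50),
                   Laptop    = c(8, 16, 24, 32, 40))
summarise_vars(demo, c("Antivirus", "Laptop"))
```

With the real data the call would be `summarise_vars(dataset, c("Antivirus", "Laptop"))`, which should agree with the n, mean, median, and sd columns from `describe()` when there are no missing values.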
Two methods will be used to check the normality of the continuous variables. First, you will create histograms to visually inspect the normality of the variables. Next, you will conduct the Shapiro-Wilk test to check normality statistically. Knowing whether the data is normal determines which inferential test should be used: the Pearson correlation for normally distributed data, or the Spearman correlation when the data is not normal.
A histogram is used to visually check if the data is normally distributed.
hist(dataset$Antivirus,
main = "Histogram of Antivirus",
xlab = "Value",
ylab = "Frequency",
col = "lightblue",
border = "black",
breaks = 20)
hist(dataset$Laptop,
main = "Histogram of Laptop",
xlab = "Value",
ylab = "Frequency",
col = "lightgreen",
border = "black",
breaks = 20)
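A density-scale histogram with a matching normal curve drawn on top can make the skewness and kurtosis judgements below easier to see. The helper `hist_with_normal()` is hypothetical; it uses only base R graphics (`hist()`, `plot()`, `curve()`, `dnorm()`).

```r
# Hypothetical helper: density-scale histogram plus a normal curve that has
# the same mean and SD as the data.
hist_with_normal <- function(values, main = "", col = "lightblue") {
  m <- mean(values)
  s <- sd(values)
  h <- hist(values, breaks = 20, plot = FALSE)
  # make room for the peak of the normal curve so it is not clipped
  top <- max(h$density, dnorm(m, mean = m, sd = s))
  plot(h, freq = FALSE, ylim = c(0, top), main = main,
       xlab = "Value", ylab = "Density", col = col, border = "black")
  # curve() evaluates its expression over a grid named x
  curve(dnorm(x, mean = m, sd = s), add = TRUE, lwd = 2)
  invisible(h)
}

# Usage with the assignment data (assumes `dataset` is loaded as above):
# hist_with_normal(dataset$Antivirus, main = "Histogram of Antivirus")
# hist_with_normal(dataset$Laptop, main = "Histogram of Laptop", col = "lightgreen")
```

Bars rising well above the curve in one tail suggest skew; a centre that towers over the curve, or sits well below it, suggests the data is too tall or too flat.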
Q1) Check the SKEWNESS of the VARIABLE 1 histogram. In your opinion,
does the histogram look symmetrical, positively skewed, or negatively
skewed?
A) The histogram for anti-virus purchases looks
symmetrical, which matches the near-zero skew (0.15) reported by describe().
Q2) Check the KURTOSIS of the VARIABLE 1 histogram. In your opinion,
does the histogram look too flat, too tall, or does it have a proper
bell curve?
A) After closely checking the kurtosis of anti-virus
purchases, the histogram has a proper bell-shaped curve; describe() reports a kurtosis of -0.14, close to 0.
Q3) Check the SKEWNESS of the VARIABLE 2 histogram. In your opinion,
does the histogram look symmetrical, positively skewed, or negatively
skewed?
A) The histogram for laptop purchases looks
symmetrical, which matches the near-zero skew (-0.01) reported by describe().
Q4) Check the KURTOSIS of the VARIABLE 2 histogram. In your opinion,
does the histogram look too flat, too tall, or does it have a proper
bell curve?
A) After closely checking the kurtosis of laptop purchases,
the histogram has a proper bell-shaped curve; describe() reports a kurtosis of -0.32, close to 0.
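The visual answers above can be cross-checked against the skew and kurtosis columns of `describe()` (0.15 and -0.14 for Antivirus, -0.01 and -0.32 for Laptop). The sketch below assumes psych's default type 3 estimators. Values near 0 suggest symmetry and a normal-looking peak; positive skew means a longer right tail, negative skew a longer left tail; positive kurtosis means taller than normal and negative kurtosis flatter. `skew_type3()` and `kurtosis_type3()` are hypothetical helper names.

```r
# Sketch assuming psych's default type 3 estimators:
#   skew     = m3 / s^3
#   kurtosis = m4 / s^4 - 3   (excess kurtosis: 0 for a normal curve)
# m3, m4 are central moments; s is the sample SD (n - 1 denominator).
skew_type3 <- function(x) {
  mean((x - mean(x))^3) / sd(x)^3
}
kurtosis_type3 <- function(x) {
  mean((x - mean(x))^4) / sd(x)^4 - 3
}

flat <- c(1, 2, 3, 4, 5)          # symmetric, flat-topped toy data
skew_type3(flat)                  # 0: perfectly symmetrical
kurtosis_type3(flat)              # about -1.91: flatter than a normal curve
skew_type3(c(1, 1, 1, 2, 10))     # positive: long right tail

# With the real data, e.g. skew_type3(dataset$Antivirus), the results should
# match describe() if the type 3 assumption holds.
```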
Use a statistical test to check the normality of the continuous variables. The Shapiro-Wilk test checks skewness and kurtosis at the same time. The test asks, "Is this variable the SAME as normal data (null hypothesis) or DIFFERENT from normal data (alternate hypothesis)?" For this test, if p is GREATER than .05 (p > .05), the data can be treated as NORMAL. If p is LESS than .05 (p < .05), the data is NOT normal.
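The p > .05 rule can be wrapped in a small helper. `check_normality()` is hypothetical; it calls base R's `shapiro.test()` (which accepts between 3 and 5000 non-missing values) and returns W, p, and a logical verdict. The skewed vector at the end is made up purely to show a rejection.

```r
# Hypothetical helper: applies the p > .05 rule to a Shapiro-Wilk test.
check_normality <- function(x, alpha = 0.05) {
  res <- shapiro.test(x)
  list(W = unname(res$statistic),
       p = res$p.value,
       normal = res$p.value > alpha)
}

# The rule applied to the p-values shapiro.test() gives for this dataset
reported_p <- c(Antivirus = 0.8981, Laptop = 0.8559)
reported_p > 0.05                # both TRUE: treat both variables as normal

# Made-up, strongly right-skewed values to show a rejection
skewed <- exp(seq(0, 5, length.out = 122))
check_normality(skewed)$normal   # FALSE
```

A non-significant result means normality was not rejected, which is why the data "can be treated as" normal rather than proven normal.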
shapiro.test(dataset$Antivirus)
##
## Shapiro-Wilk normality test
##
## data: dataset$Antivirus
## W = 0.99419, p-value = 0.8981
shapiro.test(dataset$Laptop)
##
## Shapiro-Wilk normality test
##
## data: dataset$Laptop
## W = 0.99362, p-value = 0.8559
Was the data normally distributed for Variable 1 (Antivirus)?
YES (W = 0.99, p = .898)
Was the data normally distributed for Variable 2 (Laptop)?
YES (W = 0.99, p = .856)
A scatterplot visually shows the relationship between two continuous variables.
# install.packages("ggplot2")
# install.packages("ggpubr")
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
library(ggpubr)
ggscatter(dataset, x = "Antivirus", y = "Laptop",
add = "reg.line",
conf.int = TRUE,
cor.coef = TRUE,
cor.method = "pearson",
xlab = "Variable Antivirus", ylab = "Variable Laptop")
Is the relationship positive (line pointing up), negative (line
pointing down), or is there no relationship (line is flat)?
A) The relationship seems to be positive as the line is pointing
upwards.
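The direction of the fitted line follows directly from the sign of r: the least-squares slope of y on x equals r multiplied by SD(y)/SD(x), and both SDs are positive. The arithmetic below uses the rounded values reported earlier (Antivirus on the x axis, Laptop on the y axis, as in the scatterplot), so the slope is approximate.

```r
# Worked arithmetic with the (rounded) statistics reported in this analysis:
# slope of the fitted line = r * SD(y) / SD(x)
r         <- 0.9168679   # Pearson r from cor.test()
sd_laptop <- 12.30       # SD of Laptop (y axis), from describe()
sd_antiv  <- 13.36       # SD of Antivirus (x axis), from describe()

slope <- r * sd_laptop / sd_antiv
slope         # about 0.84 more laptops per additional anti-virus license
sign(slope)   # 1: the line points up because r is positive
```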
cor.test(dataset$Antivirus, dataset$Laptop, method = "pearson")
##
## Pearson's product-moment correlation
##
## data: dataset$Antivirus and dataset$Laptop
## t = 25.16, df = 120, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8830253 0.9412249
## sample estimates:
## cor
## 0.9168679
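As a check on the output above, the t statistic and the 95% confidence interval can be reproduced from r and n alone: t = r * sqrt(n - 2) / sqrt(1 - r^2), and the interval comes from Fisher's z transformation, which is what `cor.test()` uses for Pearson's r. This sketch uses the reported r, rounded to seven decimals, so the last digits may differ slightly.

```r
# Reproducing the cor.test() numbers by hand (a check, not a replacement)
r  <- 0.9168679
n  <- 122
df <- n - 2                                   # 120, as reported

t_stat <- r * sqrt(df) / sqrt(1 - r^2)        # about 25.16
p_val  <- 2 * pt(-abs(t_stat), df)            # two-sided; far below .001

# 95% CI via the Fisher z transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)  # about 0.883 to 0.941
```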
If the results were statistically significant (p < .05), continue to the effect size section below. If the results were NOT statistically significant (p > .05), skip to the reporting section below. NOTE: Getting results that are not statistically significant does NOT mean you switch to the Spearman correlation. The Spearman correlation is only for data that is not normally distributed; the choice is never based on whether the outcome is significant.
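The rule in the note can be sketched as a helper that picks the method from normality alone, before any correlation is computed. `choose_cor_method()` is hypothetical; it reuses the p > .05 Shapiro-Wilk rule and the `method` argument that `cor.test()` already accepts.

```r
# Hypothetical helper: the method depends only on normality, never on
# whether the correlation later turns out to be significant.
choose_cor_method <- function(x, y, alpha = 0.05) {
  both_normal <- shapiro.test(x)$p.value > alpha &&
                 shapiro.test(y)$p.value > alpha
  if (both_normal) "pearson" else "spearman"
}

# Usage with the assignment data (assumes `dataset` is loaded as above):
# m <- choose_cor_method(dataset$Antivirus, dataset$Laptop)
# cor.test(dataset$Antivirus, dataset$Laptop, method = m)
```

With integer purchase counts, ties are likely; if Spearman is chosen, `cor.test()` warns that it cannot compute an exact p-value with ties, and passing `exact = FALSE` uses the normal approximation instead.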
Q1) What is the direction of the effect?
A) A correlation of 0.9168679 is positive (+), which means that as laptop
purchases increase, anti-virus license purchases also
increase.
Q2) What is the size of the effect?
A) A correlation of 0.9168679 indicates a strong
relationship.
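Direction and size can both be read off r. The helpers below are hypothetical, and the size cutoffs assume Cohen's conventional benchmarks for |r| (.10 small, .30 medium, .50 large); a course may use different thresholds, so adjust them if needed.

```r
# Hypothetical helpers. Size cutoffs assume Cohen's conventional benchmarks.
effect_direction <- function(r) {
  if (r > 0) "positive" else if (r < 0) "negative" else "none"
}

effect_size_label <- function(r) {
  a <- abs(r)
  if (a >= 0.50) {
    "large (strong)"
  } else if (a >= 0.30) {
    "medium (moderate)"
  } else if (a >= 0.10) {
    "small (weak)"
  } else {
    "negligible"
  }
}

effect_direction(0.9168679)    # "positive"
effect_size_label(0.9168679)   # "large (strong)"
```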
A Pearson correlation was conducted to examine the relationship between laptops purchased and anti-virus licenses purchased (n = 122). There was a statistically significant correlation between laptops (M = 40.02, SD = 12.30) and anti-virus licenses purchased (M = 50.18, SD = 13.36). The correlation was positive and strong, r(120) = .92, p < .001. As laptop purchases increase, anti-virus license purchases also increase.
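The statistic string in an APA-style report can be generated from the `cor.test()` result so the numbers are not retyped by hand. Note that the value in parentheses is the degrees of freedom (n - 2 = 120), not the sample size. `format_apa_r()` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper (not from any package): formats r, df, and p in APA style.
format_apa_r <- function(r, df, p) {
  # APA drops the leading zero for statistics that cannot exceed 1
  r_txt <- sub("^(-?)0\\.", "\\1.", sprintf("%.2f", r))
  p_txt <- if (p < 0.001) "p < .001" else
    paste0("p = ", sub("^0\\.", ".", sprintf("%.3f", p)))
  sprintf("r(%d) = %s, %s", as.integer(df), r_txt, p_txt)
}

format_apa_r(0.9168679, 120, 2.2e-16)   # "r(120) = .92, p < .001"

# With the real result:
# res <- cor.test(dataset$Antivirus, dataset$Laptop, method = "pearson")
# format_apa_r(res$estimate, res$parameter, res$p.value)
```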