Research Scenario 2

To be more competitive in the market, a technology store wants to start selling its laptops and antivirus software as a bundle to small businesses. A bundle is when two products are sold together at a lower price than if they were purchased separately. Before offering the bundle, the store wants to make sure the two products are commonly purchased together. The store has data from the past year showing how many laptops and how many antivirus software licenses each small business bought from it. Analyze the data to determine whether there is a positive correlation between the number of laptops purchased and the number of antivirus licenses purchased.

PURPOSE

A correlation test is used to assess the strength and direction of the relationship between two continuous variables.

NULL HYPOTHESIS

There is no relationship between the number of laptops purchased and the number of antivirus licenses purchased.

ALTERNATE HYPOTHESIS

There is a relationship between the number of laptops purchased and the number of antivirus licenses purchased.
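The scenario asks specifically about a positive correlation, while the hypotheses above are two-sided (as is `cor.test()` by default). If a directional hypothesis were preferred, `cor.test()` accepts `alternative = "greater"`. The sketch below assumes only the six rows printed by `head(A5RQ2)` in the import step, so its numbers are illustrative and will not match the full-data results.

```r
# Illustrative data: the six rows shown by head(A5RQ2) in this script
antivirus <- c(42, 47, 73, 51, 52, 76)
laptop    <- c(31, 36, 68, 38, 43, 61)

# Two-sided test (matches the hypotheses as written)
two_sided <- cor.test(antivirus, laptop, method = "pearson")

# One-sided test: H1 is that the true correlation is greater than 0
one_sided <- cor.test(antivirus, laptop, method = "pearson",
                      alternative = "greater")

# When r is positive, the one-sided p-value is half the two-sided p-value
c(two_sided = two_sided$p.value, one_sided = one_sided$p.value)
```

Whether to use the one-tailed version is a design decision that should be made before looking at the data, not after.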

INSTALL REQUIRED PACKAGE

#install.packages("readxl")

LOAD THE PACKAGE

library(readxl)
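Instead of commenting `install.packages()` lines in and out, a small helper can install a package only when it is missing and then load it. This is a convenience sketch, not part of the original workflow; `ensure_package()` is a hypothetical name, and it relies only on base R's `requireNamespace()`, `install.packages()`, and `library()`.

```r
# Hypothetical helper (not part of readxl): install a package only if it
# is not already available, then attach it
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# Usage in this script would be, for example: ensure_package("readxl")
```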

IMPORT THE EXCEL FILE INTO RSTUDIO

A5RQ2 <- read_excel("C:\\Users\\manit\\OneDrive\\Desktop\\A5RQ2.xlsx")
head(A5RQ2)
## # A tibble: 6 × 3
##   Business Antivirus Laptop
##      <dbl>     <dbl>  <dbl>
## 1        1        42     31
## 2        2        47     36
## 3        3        73     68
## 4        4        51     38
## 5        5        52     43
## 6        6        76     61
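Before running any tests, it can help to confirm that the imported columns exist, are numeric, and have no missing values, because `cor.test()` silently drops incomplete pairs. The sketch below uses a hypothetical stand-in data frame (`A5RQ2_demo`) built from the six rows printed above; with the real file, the same checks would be run on `A5RQ2`.

```r
# Illustrative stand-in for A5RQ2: only the six rows printed by head() above
A5RQ2_demo <- data.frame(
  Business  = 1:6,
  Antivirus = c(42, 47, 73, 51, 52, 76),
  Laptop    = c(31, 36, 68, 38, 43, 61)
)

# Expected columns must exist and be numeric
needed <- c("Antivirus", "Laptop")
stopifnot(all(needed %in% names(A5RQ2_demo)))
stopifnot(all(sapply(A5RQ2_demo[needed], is.numeric)))

# Number of businesses with both values present (should equal nrow)
n_complete <- sum(complete.cases(A5RQ2_demo[needed]))
n_complete
```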

DESCRIPTIVE STATISTICS

Calculate the mean, median, SD, and sample size for each variable.

INSTALL THE REQUIRED PACKAGE

#install.packages("psych")

LOAD THE PACKAGE

library(psych)

CALCULATE THE DESCRIPTIVE DATA

describe(A5RQ2[, c("Antivirus", "Laptop")])
##           vars   n  mean    sd median trimmed   mad min max range  skew
## Antivirus    1 122 50.18 13.36     49   49.92 12.60  15  83    68  0.15
## Laptop       2 122 40.02 12.30     39   39.93 11.86   8  68    60 -0.01
##           kurtosis   se
## Antivirus    -0.14 1.21
## Laptop       -0.32 1.11
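The same core statistics (n, mean, median, SD) can be reproduced in base R if psych is unavailable. This sketch assumes only the six rows shown by `head()` above, so its values will not match the `describe()` output for all 122 businesses; replacing the two vectors with `A5RQ2$Antivirus` and `A5RQ2$Laptop` would use the full data.

```r
antivirus <- c(42, 47, 73, 51, 52, 76)
laptop    <- c(31, 36, 68, 38, 43, 61)

# One column per variable, one row per statistic
base_desc <- sapply(list(Antivirus = antivirus, Laptop = laptop),
                    function(x) c(n = length(x), mean = mean(x),
                                  median = median(x), sd = sd(x)))
round(base_desc, 2)
```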

CHECK THE NORMALITY OF THE CONTINUOUS VARIABLES

Two methods will be used to check the normality of the continuous variables. First, you will create histograms to visually inspect the normality of the variables. Next, you will conduct a Shapiro-Wilk test to check the normality of the variables statistically. It is important to know whether the data is normally distributed because this determines which inferential test should be used (Pearson correlation for normally distributed data, Spearman correlation otherwise).

CREATE A HISTOGRAM FOR EACH CONTINUOUS VARIABLE

A histogram is used to visually check if the data is normally distributed.

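# Histogram of antivirus licenses purchased per business
hist(A5RQ2$Antivirus,
     main = "Histogram of Antivirus",
     xlab = "Antivirus licenses purchased",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)

# Histogram of laptops purchased per business
hist(A5RQ2$Laptop,
     main = "Histogram of Laptop",
     xlab = "Laptops purchased",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)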
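Judging "a proper bell curve" by eye is easier when a normal curve with the same mean and SD is drawn over the histogram. This is an optional variation, assuming the histogram is plotted on the density scale (`freq = FALSE`) so the curve and bars share a y-axis; the six-row vector is illustrative, and `A5RQ2$Antivirus` or `A5RQ2$Laptop` would be used in practice.

```r
antivirus <- c(42, 47, 73, 51, 52, 76)

# Density scale so the bars integrate to 1, like a probability density
h <- hist(antivirus, freq = FALSE,
          main = "Antivirus with normal curve",
          xlab = "Antivirus licenses purchased",
          col = "lightblue", border = "black")

# Normal density with the sample mean and SD, drawn over the bars
curve(dnorm(x, mean = mean(antivirus), sd = sd(antivirus)),
      add = TRUE, lwd = 2)
```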

QUESTION

Answer the questions below as comments within the R script:
Q1) Check the SKEWNESS of the VARIABLE 1 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
Ans) The histogram for Antivirus (Variable 1) looks roughly symmetrical, with at most a slight positive skew (skew = 0.15).
Q2) Check the KURTOSIS of the VARIABLE 1 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
Ans) The Antivirus distribution has a proper bell-curve shape; it is neither too flat nor too tall (kurtosis = -0.14).
Q3) Check the SKEWNESS of the VARIABLE 2 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
Ans) The histogram for Laptop (Variable 2) looks symmetrical (skew = -0.01).
Q4) Check the KURTOSIS of the VARIABLE 2 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
Ans) The Laptop distribution is fairly bell-shaped; it is slightly flatter than a normal curve but not extremely flat or tall (kurtosis = -0.32).
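The skew and kurtosis values quoted in these answers come from `describe()`. As far as we know, psych uses "type 3" estimators by default, which reduce to skew = m3 / s^3 and kurtosis = m4 / s^4 - 3 (m3, m4 are central moments, s is the sample SD). The sketch below computes them by hand with hypothetical helper names; values near 0 indicate a symmetrical, bell-shaped distribution.

```r
# Type-3 sample skewness: third central moment divided by SD cubed
skew3 <- function(x) {
  x <- x[!is.na(x)]
  mean((x - mean(x))^3) / sd(x)^3
}

# Type-3 excess kurtosis: fourth central moment over SD^4, minus 3
kurt3 <- function(x) {
  x <- x[!is.na(x)]
  mean((x - mean(x))^4) / sd(x)^4 - 3
}

antivirus <- c(42, 47, 73, 51, 52, 76)   # illustrative six rows
c(skew = skew3(antivirus), kurtosis = kurt3(antivirus))
```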

PURPOSE

Use a statistical test to check the normality of the continuous variables. The Shapiro-Wilk test checks whether a variable departs from a normal distribution, so it is sensitive to problems with skewness and kurtosis at the same time. The test asks, "Is this variable the SAME as normal data (null hypothesis) or DIFFERENT from normal data (alternate hypothesis)?" For this test, if p is GREATER than .05 (p > .05), the data is treated as NORMAL (we fail to reject the null hypothesis). If p is LESS than .05 (p < .05), the data is NOT normal.

shapiro.test(A5RQ2$Antivirus)
## 
##  Shapiro-Wilk normality test
## 
## data:  A5RQ2$Antivirus
## W = 0.99419, p-value = 0.8981
shapiro.test(A5RQ2$Laptop)
## 
##  Shapiro-Wilk normality test
## 
## data:  A5RQ2$Laptop
## W = 0.99362, p-value = 0.8559
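The p > .05 decision rule can be wrapped in a small function so both variables are checked the same way. `check_normality()` is a hypothetical helper built on `shapiro.test()`; the six-row vectors are illustrative stand-ins for `A5RQ2$Antivirus` and `A5RQ2$Laptop`.

```r
# Hypothetical helper: Shapiro-Wilk test plus the p > alpha decision rule
check_normality <- function(x, alpha = 0.05) {
  sw <- shapiro.test(x)
  list(W = unname(sw$statistic),
       p = sw$p.value,
       normal = sw$p.value > alpha)   # TRUE means "treat as normal"
}

antivirus <- c(42, 47, 73, 51, 52, 76)
laptop    <- c(31, 36, 68, 38, 43, 61)
results <- lapply(list(Antivirus = antivirus, Laptop = laptop),
                  check_normality)
str(results)
```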

QUESTION

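Was the data normally distributed for Variable 1?
Ans) Yes. The Shapiro-Wilk test for Antivirus was not significant (W = 0.994, p = .898).
Was the data normally distributed for Variable 2?
Ans) Yes. The Shapiro-Wilk test for Laptop was not significant (W = 0.994, p = .856).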

VISUALLY DISPLAY THE DATA

PURPOSE

A scatterplot visually shows the relationship between two continuous variables.

#install.packages("ggplot2")
#install.packages("ggpubr")
library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
library(ggpubr)

CREATE THE SCATTERPLOT

ggscatter(A5RQ2, x = "Antivirus", y = "Laptop",
          add = "reg.line",
          conf.int = TRUE,
          cor.coef = TRUE,
          cor.method = "pearson",
          xlab = "Antivirus licenses purchased", ylab = "Laptops purchased")
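If ggpubr is not installed, a comparable scatterplot with a least-squares line can be drawn in base R. This sketch uses `plot()`, `lm()`, and `abline()` on the six illustrative rows; unlike `ggscatter()`, it does not draw a confidence band or print the correlation coefficient.

```r
antivirus <- c(42, 47, 73, 51, 52, 76)
laptop    <- c(31, 36, 68, 38, 43, 61)

plot(antivirus, laptop,
     xlab = "Antivirus licenses purchased",
     ylab = "Laptops purchased",
     main = "Laptops vs. antivirus licenses", pch = 19)

# Least-squares regression line, analogous to add = "reg.line"
fit <- lm(laptop ~ antivirus)
abline(fit, col = "blue", lwd = 2)
coef(fit)   # a positive slope means the line points up
```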

QUESTION

Answer the questions below as a comment within the R script:
Is the relationship positive (line pointing up), negative (line pointing down), or is there no relationship (line is flat)?
Ans) The relationship is positive: the regression line slopes upward, and the points cluster closely around it, which suggests a strong relationship.

PEARSON CORRELATION TEST

PURPOSE

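Test whether there is a statistically significant linear relationship between the two continuous variables (that is, whether the population correlation differs from 0).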

cor.test(A5RQ2$Antivirus, A5RQ2$Laptop, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  A5RQ2$Antivirus and A5RQ2$Laptop
## t = 25.16, df = 120, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8830253 0.9412249
## sample estimates:
##       cor 
## 0.9168679
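To see what `cor.test()` computes, Pearson's r can be calculated from its definition, r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² Σ(y - ȳ)²), and tested with t = r√(n - 2) / √(1 - r²) on n - 2 degrees of freedom. The sketch applies this to the six illustrative rows and then plugs in the full-data estimate reported above (r = 0.9168679, n = 122).

```r
antivirus <- c(42, 47, 73, 51, 52, 76)
laptop    <- c(31, 36, 68, 38, 43, 61)

# Pearson's r from deviations around each mean
dx <- antivirus - mean(antivirus)
dy <- laptop - mean(laptop)
r  <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# t statistic and two-sided p-value with df = n - 2
n      <- length(antivirus)
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_val  <- 2 * pt(-abs(t_stat), df = n - 2)

# Same formula with the full-data estimate reported by cor.test()
t_full <- 0.9168679 * sqrt(122 - 2) / sqrt(1 - 0.9168679^2)
round(t_full, 2)
```

The last line reproduces the reported t = 25.16 with df = 120.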

DETERMINE STATISTICAL SIGNIFICANCE

If the results are statistically significant (p < .05), continue to the effect size section below. If the results are NOT statistically significant (p > .05), skip to the reporting section below. NOTE: Getting results that are not statistically significant does NOT mean you switch to the Spearman correlation. The Spearman correlation is only for data that is not normally distributed; the choice of test is based on the normality check, not on whether the outcome is significant.
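The workflow above can be summarized in code: normality of both variables picks the method, and only then is significance checked. `choose_and_test()` is a hypothetical helper written for illustration, using `shapiro.test()` and `cor.test()` with an assumed alpha of .05.

```r
# Hypothetical helper: method chosen from normality, never from the p-value
choose_and_test <- function(x, y, alpha = 0.05) {
  both_normal <- shapiro.test(x)$p.value > alpha &&
                 shapiro.test(y)$p.value > alpha
  method <- if (both_normal) "pearson" else "spearman"
  ct <- cor.test(x, y, method = method)
  list(method = method,
       estimate = unname(ct$estimate),
       p = ct$p.value,
       significant = ct$p.value < alpha)   # TRUE: go on to effect size
}

antivirus <- c(42, 47, 73, 51, 52, 76)
laptop    <- c(31, 36, 68, 38, 43, 61)
str(choose_and_test(antivirus, laptop))
```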

EFFECT SIZE FOR PEARSON CORRELATION

Q1) What is the direction of the effect?
Ans) The effect is positive: businesses that purchased more antivirus licenses also tended to purchase more laptops.
Q2) What is the size of the effect?
Ans) The effect size is strong (r = 0.92), well above the conventional .50 threshold for a large correlation.
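Direction comes from the sign of r and size from its absolute value. The sketch below applies the commonly cited Cohen benchmarks for r (about .10 small, .30 medium, .50 large); the "negligible" label below .10 is an assumption added for completeness, and `effect_label()` is a hypothetical helper. It also reports r², the share of variance the two variables have in common.

```r
# Hypothetical helper using conventional benchmarks for |r|
effect_label <- function(r) {
  a <- abs(r)
  if (a >= 0.5) {
    "large"
  } else if (a >= 0.3) {
    "medium"
  } else if (a >= 0.1) {
    "small"
  } else {
    "negligible"   # below .10: label chosen here for illustration
  }
}

r <- 0.9168679   # Pearson estimate reported by cor.test() above
summary_out <- c(direction = if (r > 0) "positive" else "negative",
                 size      = effect_label(r),
                 r_squared = sprintf("%.2f", r^2))
summary_out
```

Here r² ≈ 0.84, meaning roughly 84% of the variance in one count is shared with the other.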

FINAL REPORT

A Pearson correlation was conducted to examine the relationship between the number of laptops purchased and the number of antivirus software licenses purchased by small businesses (n = 122). The results showed a statistically significant, strong positive correlation between laptops purchased (M = 40.02, SD = 12.30) and antivirus licenses purchased (M = 50.18, SD = 13.36), r(120) = .92, 95% CI [.88, .94], p < .001. Businesses that bought more laptops also tended to buy more antivirus licenses, suggesting the two products are commonly purchased together.
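The statistics in the report can be formatted directly from the `cor.test()` result to avoid transcription errors. `format_pearson()` is a hypothetical helper that follows the APA habit of dropping the leading zero and writing "p < .001" for very small p-values; the call below uses the six illustrative rows.

```r
# Hypothetical helper: APA-style "r(df) = .xx, p ..." from a cor.test() result
format_pearson <- function(ct) {
  df <- as.integer(unname(ct$parameter))
  r  <- sub("^(-?)0\\.", "\\1.", sprintf("%.2f", ct$estimate))
  p  <- if (ct$p.value < 0.001) {
    "p < .001"
  } else {
    paste("p =", sub("^0\\.", ".", sprintf("%.3f", ct$p.value)))
  }
  sprintf("r(%d) = %s, %s", df, r, p)
}

ct <- cor.test(c(42, 47, 73, 51, 52, 76), c(31, 36, 68, 38, 43, 61))
format_pearson(ct)
```

Applied to the full-data `cor.test()` result above (df = 120, r = 0.9168679, p < 2.2e-16), this should produce "r(120) = .92, p < .001".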