This analysis addresses RESEARCH SCENARIO 2 from Assignment 5. It tests whether there is a positive correlation between the number of laptops purchased and the number of anti-virus licenses purchased.
# IMPORT EXCEL FILE CODE
# PURPOSE OF THIS CODE
# Imports your Excel dataset into RStudio.
# You need to import your dataset every time you start a new RStudio session to analyze your data.
# INSTALL REQUIRED PACKAGE
# install.packages("readxl")
# LOAD THE PACKAGE
library(readxl)
# IMPORT THE EXCEL FILE INTO RSTUDIO
dataset <- read_excel("//apporto.com/dfs/SLU/Users/minhoku_slu/Desktop/A5RQ2.xlsx")
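# OPTIONAL CHECK AFTER IMPORT
# The later steps assume the imported data has numeric columns named "Antivirus" and "Laptop".
# A minimal sketch of a check for that, using only base R. The helper check_columns()
# and the toy data frame are hypothetical and are not part of the assignment code.

```r
# Hypothetical helper: stop early if a required column is missing or not numeric.
check_columns <- function(df, required) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  non_numeric <- required[!vapply(df[required], is.numeric, logical(1))]
  if (length(non_numeric) > 0) {
    stop("Non-numeric columns: ", paste(non_numeric, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy data frame standing in for the imported Excel file (made-up values).
toy <- data.frame(Antivirus = c(50, 62, 41), Laptop = c(40, 51, 33))
check_columns(toy, c("Antivirus", "Laptop"))
```

# With the real data, the same call would be check_columns(dataset, c("Antivirus", "Laptop")).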
# ======================
# DESCRIPTIVE STATISTICS
# ======================
# Calculate the mean, median, SD, and sample size for each variable.
# INSTALL THE REQUIRED PACKAGE
# install.packages("psych")
# LOAD THE PACKAGE
library(psych)
# CALCULATE THE DESCRIPTIVE DATA
describe(dataset[, c("Antivirus", "Laptop")])
##           vars   n  mean    sd median trimmed   mad min max range  skew
## Antivirus    1 122 50.18 13.36     49   49.92 12.60  15  83    68  0.15
## Laptop       2 122 40.02 12.30     39   39.93 11.86   8  68    60 -0.01
##           kurtosis   se
## Antivirus    -0.14 1.21
## Laptop       -0.32 1.11
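# OPTIONAL CHECK OF THE STANDARD ERROR
# The se column is the standard error of the mean, sd / sqrt(n).
# A short sketch that recomputes it from the rounded sd and n values reported above.
# The variable names are chosen here for illustration only.

```r
# Values copied from the describe() output.
n <- 122
sd_antivirus <- 13.36
sd_laptop <- 12.30

# Standard error of the mean = standard deviation / square root of sample size.
se_antivirus <- sd_antivirus / sqrt(n)
se_laptop <- sd_laptop / sqrt(n)

round(c(Antivirus = se_antivirus, Laptop = se_laptop), 2)
```

# Rounded to two decimals, the results are 1.21 and 1.11, the same as the se column.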
# ===============================================
# CHECK THE NORMALITY OF THE CONTINUOUS VARIABLES
# ===============================================
# OVERVIEW
# Two methods will be used to check the normality of the continuous variables.
# First, you will create histograms to visually inspect the normality of the variables.
# Next, you will conduct a test called the Shapiro-Wilk test to inspect the normality of the variables.
# It is important to know whether or not the data is normal to determine which inferential test should be used.
# CREATE A HISTOGRAM FOR EACH CONTINUOUS VARIABLE
# A histogram is used to visually check if the data is normally distributed.
hist(dataset$Antivirus,
main = "Histogram of Antivirus",
xlab = "Value",
ylab = "Frequency",
col = "lightblue",
border = "black",
breaks = 20)
hist(dataset$Laptop,
main = "Histogram of Laptop",
xlab = "Value",
ylab = "Frequency",
col = "lightgreen",
border = "black",
breaks = 20)
# ........................................................
# Q1) Check the SKEWNESS of the VARIABLE 1 (Antivirus) histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
# Answer 1: The histogram looks SYMMETRICAL. The data is evenly distributed on both sides of the center.
# Q2) Check the KURTOSIS of the VARIABLE 1 (Antivirus) histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
# Answer 2: The histogram looks like it has a PROPER BELL CURVE. It is neither too flat nor too tall.
# Q3) Check the SKEWNESS of the VARIABLE 2 (Laptop) histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
# Answer 3: The histogram looks SYMMETRICAL. The left and right sides are nearly mirror images.
# Q4) Check the KURTOSIS of the VARIABLE 2 (Laptop) histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
# Answer 4: The histogram looks like it has a PROPER BELL CURVE. It has a well-defined, rounded peak.
# ........................................................
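# OPTIONAL NUMERIC CHECK OF SKEWNESS AND KURTOSIS
# The visual answers can be compared with numbers: values near 0 suggest a symmetrical
# distribution with a proper bell curve. The describe() output reported skew of 0.15 and -0.01
# and kurtosis of -0.14 and -0.32, which agrees with the answers above.
# A minimal base R sketch of the moment-based formulas follows. The function names and the toy
# vector are hypothetical, and psych::describe() applies a slightly different small-sample
# adjustment by default, so its values can differ a little from these.

```r
# Moment-based skewness: third central moment over the 1.5 power of the second.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Moment-based excess kurtosis: fourth central moment over the squared second, minus 3.
sample_excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

# Toy vector (made-up values), perfectly symmetrical around 3.
toy <- c(1, 2, 3, 4, 5)
sample_skewness(toy)         # 0, because the values are symmetrical
sample_excess_kurtosis(toy)  # -1.3, flatter than a normal curve
```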
# PURPOSE
# Use a statistical test to check the normality of the continuous variables.
# CONDUCT THE SHAPIRO-WILK TEST
shapiro.test(dataset$Antivirus)
##
## Shapiro-Wilk normality test
##
## data: dataset$Antivirus
## W = 0.99419, p-value = 0.8981
shapiro.test(dataset$Laptop)
##
## Shapiro-Wilk normality test
##
## data: dataset$Laptop
## W = 0.99362, p-value = 0.8559
# .........................................................
# Was the data normally distributed for Variable 1 (Antivirus)?
# Yes, the data was normally distributed for Variable 1 (Antivirus).
# The Shapiro-Wilk test (shapiro.test) gave a p-value of .8981 (> .05), so normality is not rejected.
# Was the data normally distributed for Variable 2 (Laptop)?
# Yes, the data was normally distributed for Variable 2 (Laptop).
# The Shapiro-Wilk test (shapiro.test) gave a p-value of .8559 (> .05), so normality is not rejected.
# .........................................................
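# OPTIONAL: TURN THE NORMALITY RESULTS INTO A TEST CHOICE
# The normality results decide which correlation test to run: Pearson if both variables
# look normal, Spearman otherwise. A minimal sketch of that rule follows.
# The helper choose_cor_method() is hypothetical, and the .05 cutoff is the one used above.

```r
# Hypothetical helper: pick the correlation method from two Shapiro-Wilk p-values.
choose_cor_method <- function(p_x, p_y, alpha = 0.05) {
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}

# p-values copied from the shapiro.test() output above.
method <- choose_cor_method(0.8981, 0.8559)
method  # "pearson"
```

# The result could then be passed on, for example cor.test(x, y, method = method).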
# =========================
# VISUALLY DISPLAY THE DATA
# =========================
# CREATE A SCATTERPLOT
# PURPOSE
# A scatterplot visually shows the relationship between two continuous variables.
# INSTALL THE REQUIRED PACKAGES
# install.packages("ggplot2")
# install.packages("ggpubr")
# LOAD THE PACKAGES
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
library(ggpubr)
# CREATE THE SCATTERPLOT
ggscatter(dataset, x = "Antivirus", y = "Laptop",
add = "reg.line",
conf.int = TRUE,
cor.coef = TRUE,
cor.method = "pearson",
xlab = "Variable Antivirus", ylab = "Variable Laptop")
# ........................................................
# Is the relationship positive (line pointing up), negative (line pointing down), or is there no relationship (line is flat)?
# The relationship is positive.
# ........................................................
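# OPTIONAL NUMERIC CHECK OF THE DIRECTION
# The direction of the regression line can also be read from the sign of its slope.
# A minimal base R sketch follows, using a toy data frame with made-up values
# in place of the real dataset.

```r
# Toy data (made-up values) in which Laptop rises as Antivirus rises.
toy <- data.frame(Antivirus = c(20, 35, 50, 65, 80),
                  Laptop    = c(15, 27, 41, 52, 66))

# Fit a simple linear regression and extract the slope.
fit <- lm(Laptop ~ Antivirus, data = toy)
slope <- unname(coef(fit)["Antivirus"])

# Positive slope: line points up. Negative slope: line points down.
direction <- if (slope > 0) "positive" else if (slope < 0) "negative" else "none"
direction
```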
# ================================================
# PEARSON CORRELATION TEST (normally distributed data)
# ================================================
# PURPOSE
# Test whether two continuous variables are linearly related.
# CONDUCT THE PEARSON CORRELATION (both variables passed the normality check; otherwise use the Spearman correlation)
cor.test(dataset$Antivirus, dataset$Laptop, method = "pearson")
##
## Pearson's product-moment correlation
##
## data: dataset$Antivirus and dataset$Laptop
## t = 25.16, df = 120, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8830253 0.9412249
## sample estimates:
## cor
## 0.9168679
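# OPTIONAL CHECK OF THE TEST OUTPUT
# The t statistic and the confidence interval can be reproduced from cor and the sample size:
# t = r * sqrt(df) / sqrt(1 - r^2), and the interval comes from the Fisher z transformation.
# A short sketch follows, with variable names chosen here for illustration only.
# It also computes a one-sided p-value, because the research scenario asks about a POSITIVE
# correlation while the output above is two-sided. In cor.test() the one-sided version
# is requested with alternative = "greater".

```r
# Values copied from the cor.test() output and the sample size (n = 122).
r <- 0.9168679
n <- 122
df <- n - 2

# t statistic for testing that the true correlation is 0.
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# One-sided p-value for the alternative "true correlation is greater than 0".
p_one_sided <- pt(t_stat, df, lower.tail = FALSE)

# 95 percent confidence interval via the Fisher z transformation.
z <- atanh(r)
half_width <- qnorm(0.975) / sqrt(n - 3)
ci <- tanh(c(z - half_width, z + half_width))

round(t_stat, 2)  # 25.16
round(ci, 3)      # 0.883 0.941
```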
# DETERMINE STATISTICAL SIGNIFICANCE
# The p-value (< 2.2e-16) is below .05, so the correlation is statistically significant.
# ===============================================
# EFFECT SIZE FOR PEARSON CORRELATION
# ===============================================
# If results were statistically significant, then determine how the variables are related and how strong the relationship is.
# 1) REVIEW THE CORRECT CORRELATION TEST
# cor = 0.9168679
# ........................................................
# 2) WRITE THE REPORT
# Q1) What is the direction of the effect?
# A correlation of 0.92 is positive. As the number of antivirus licenses increases, the number of laptops increases.
#
# Q2) What is the size of the effect?
# A correlation of 0.92 is a strong relationship.
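# OPTIONAL: LABEL THE EFFECT SIZE IN CODE
# A minimal sketch that labels the strength of a correlation and computes r squared,
# the share of variance the two variables have in common.
# The cutoffs (.10, .30, .50) follow a common convention attributed to Cohen and are an
# assumption here; your course materials may use different cutoffs.
# The helper label_strength() is hypothetical.

```r
# Correlation copied from the cor.test() output.
r <- 0.9168679

# Coefficient of determination: proportion of shared variance.
r_squared <- r^2

# Hypothetical helper using assumed cutoffs of .10, .30, and .50.
label_strength <- function(r) {
  a <- abs(r)
  if (a >= 0.5) "strong" else if (a >= 0.3) "moderate" else if (a >= 0.1) "weak" else "negligible"
}

label_strength(r)    # "strong"
round(r_squared, 2)  # 0.84
```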