PEARSON CORRELATION

This analysis is for RESEARCH SCENARIO 2 from Assignment 5. It tests whether there is a positive correlation between the number of laptops purchased and the number of anti-virus licenses purchased.

Hypotheses

  • H0 (Null Hypothesis): There is no relationship between the number of laptops purchased and the number of anti-virus licenses purchased.
  • H1 (Alternate Hypothesis): There is a positive relationship between the number of laptops purchased and the number of anti-virus licenses purchased.

Result paragraph

  • A Pearson correlation was conducted to assess the relationship between the number of anti-virus licenses purchased and the number of laptops purchased (n = 122). Both variables were normally distributed according to the Shapiro-Wilk test (p = .898 and p = .856, respectively), so the Pearson test was used rather than the Spearman test. There was a statistically significant correlation between the number of anti-virus licenses purchased (M = 50.18, SD = 13.36) and the number of laptops purchased (M = 40.02, SD = 12.30). The correlation was positive and strong, r(120) = .917, p < .001. As the number of anti-virus licenses purchased increases, the number of laptops purchased also increases.
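
The reported statistics can be cross-checked by hand. The following is a minimal sketch that uses only two values printed in the cor.test output later in this report (r = 0.9168679 and n = 122); the variable names are chosen here for illustration. It recomputes the degrees of freedom (n - 2) and the t statistic, which should agree with the df = 120 and t = 25.16 shown in that output.

```r
# Values copied from the cor.test() output in this report
r <- 0.9168679   # Pearson correlation coefficient
n <- 122         # sample size

# Degrees of freedom for a correlation test are n - 2
df <- n - 2

# t statistic for testing H0: the true correlation is 0
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# One-sided p-value for H1: the correlation is positive
p_one_sided <- pt(t_stat, df, lower.tail = FALSE)

print(df)
print(round(t_stat, 2))
print(p_one_sided < 0.001)
```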

R code and Analysis

# IMPORT EXCEL FILE CODE

# PURPOSE OF THIS CODE
# Imports your Excel dataset automatically into RStudio.
# You need to import your dataset every time you want to analyze your data in RStudio.

# INSTALL REQUIRED PACKAGE

# install.packages("readxl")

# LOAD THE PACKAGE

library(readxl)

# IMPORT THE EXCEL FILE INTO RSTUDIO

dataset <- read_excel("//apporto.com/dfs/SLU/Users/minhoku_slu/Desktop/A5RQ2.xlsx")

# ======================
# DESCRIPTIVE STATISTICS
# ======================

# Calculate the mean, median, SD, and sample size for each variable.

# INSTALL THE REQUIRED PACKAGE

# install.packages("psych")

# LOAD THE PACKAGE

library(psych)

# CALCULATE THE DESCRIPTIVE DATA

describe(dataset[, c("Antivirus", "Laptop")])
##           vars   n  mean    sd median trimmed   mad min max range  skew
## Antivirus    1 122 50.18 13.36     49   49.92 12.60  15  83    68  0.15
## Laptop       2 122 40.02 12.30     39   39.93 11.86   8  68    60 -0.01
##           kurtosis   se
## Antivirus    -0.14 1.21
## Laptop       -0.32 1.11
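
# OPTIONAL SKETCH: THE SAME DESCRIPTIVE STATISTICS WITH BASE R
# A minimal sketch, assuming the psych package is not available.
# The data frame "example_data" is simulated stand-in data (an assumption for illustration),
# NOT the values from A5RQ2.xlsx, so the numbers will differ from the describe() output above.

```r
# Hypothetical stand-in data, simulated so the example runs on its own
set.seed(42)
example_data <- data.frame(Antivirus = round(rnorm(122, mean = 50, sd = 13)))
example_data$Laptop <- round(0.8 * example_data$Antivirus + rnorm(122, mean = 0, sd = 5))

# Apply the same four summary functions to every column
summary_table <- sapply(example_data, function(x) {
  c(n = length(x), mean = mean(x), sd = sd(x), median = median(x))
})

# Rows are the statistics, columns are the variables
print(round(summary_table, 2))
```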
# ===============================================
# CHECK THE NORMALITY OF THE CONTINUOUS VARIABLES
# ===============================================

# OVERVIEW
# Two methods will be used to check the normality of the continuous variables.
# First, you will create histograms to visually inspect the normality of the variables.
# Next, you will conduct a test called the Shapiro-Wilk test to inspect the normality of the variables.
# It is important to know whether or not the data is normal to determine which inferential test should be used.


# CREATE A HISTOGRAM FOR EACH CONTINUOUS VARIABLE
# A histogram is used to visually check if the data is normally distributed.

hist(dataset$Antivirus,
     main = "Histogram of Antivirus",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)

hist(dataset$Laptop,
     main = "Histogram of Laptop",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)

# ........................................................
# Q1) Check the SKEWNESS of the VARIABLE 1 (Antivirus) histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
# Answer 1: The histogram looks SYMMETRICAL. The data is evenly distributed on both sides of the center.


# Q2) Check the KURTOSIS of the VARIABLE 1 (Antivirus) histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
# Answer 2: The histogram looks like it has a PROPER BELL CURVE. It is not too flat or too pointy.


# Q3) Check the SKEWNESS of the VARIABLE 2 (Laptop) histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
# Answer 3: The histogram looks SYMMETRICAL. The left and right sides are nearly mirror images.


# Q4) Check the KURTOSIS of the VARIABLE 2 (Laptop) histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
# Answer 4: The histogram looks like it has a PROPER BELL CURVE. It has a well-defined, rounded peak.
# ........................................................
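
# OPTIONAL SKETCH: Q-Q PLOT AS A SECOND VISUAL CHECK
# A normal Q-Q plot is another common way to inspect normality visually.
# This is a minimal sketch on simulated stand-in values ("example_values" is an assumption
# for illustration); to use it on the real data, pass dataset$Antivirus or dataset$Laptop instead.

```r
# Hypothetical stand-in values, simulated so the example runs on its own
set.seed(42)
example_values <- rnorm(122, mean = 50, sd = 13)

# Points that stay close to the reference line suggest an approximately normal distribution
qqnorm(example_values, main = "Normal Q-Q Plot (example data)")
qqline(example_values, col = "red")
```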

# PURPOSE
# Use a statistical test to check the normality of the continuous variables.

# CONDUCT THE SHAPIRO-WILK TEST

shapiro.test(dataset$Antivirus)
## 
##  Shapiro-Wilk normality test
## 
## data:  dataset$Antivirus
## W = 0.99419, p-value = 0.8981
shapiro.test(dataset$Laptop)
## 
##  Shapiro-Wilk normality test
## 
## data:  dataset$Laptop
## W = 0.99362, p-value = 0.8559
# .........................................................
# Was the data normally distributed for Variable 1 (Antivirus)?
# Yes, the data was normally distributed for Variable 1 (Antivirus).
# The Shapiro-Wilk test (shapiro.test) returned p = .8981, which is greater than .05, so normality is not rejected.
# Was the data normally distributed for Variable 2 (Laptop)?
# Yes, the data was normally distributed for Variable 2 (Laptop).
# The Shapiro-Wilk test (shapiro.test) returned p = .8559, which is greater than .05, so normality is not rejected.
# .........................................................
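
# OPTIONAL SKETCH: TURNING THE NORMALITY RESULTS INTO A TEST CHOICE
# A minimal sketch of the decision rule used in this report: if both variables pass the
# Shapiro-Wilk test (p > .05), use Pearson; otherwise use Spearman.
# The p-values are copied from the output above; the variable names are chosen here for illustration.

```r
# p-values copied from the Shapiro-Wilk output in this report
shapiro_p <- c(Antivirus = 0.8981, Laptop = 0.8559)
alpha <- 0.05

# p > alpha means the test does not reject normality
is_normal <- shapiro_p > alpha

# Pearson requires both variables to be normal; otherwise fall back to Spearman
chosen_method <- if (all(is_normal)) "pearson" else "spearman"

print(is_normal)
print(chosen_method)
```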


# =========================
# VISUALLY DISPLAY THE DATA
# =========================

# CREATE A SCATTERPLOT

# PURPOSE
# A scatterplot visually shows the relationship between two continuous variables.

# INSTALL THE REQUIRED PACKAGES

# install.packages("ggplot2")
# install.packages("ggpubr")

# LOAD THE PACKAGES

library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
library(ggpubr)

# CREATE THE SCATTERPLOT

ggscatter(dataset, x = "Antivirus", y = "Laptop",
          add = "reg.line",
          conf.int = TRUE,
          cor.coef = TRUE,
          cor.method = "pearson",
          xlab = "Variable Antivirus", ylab = "Variable Laptop")

# ........................................................

# Is the relationship positive (line pointing up), negative (line pointing down), or is there no relationship (line is flat)?
# The relationship is positive.
# ........................................................
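
# OPTIONAL SKETCH: THE SAME KIND OF SCATTERPLOT WITH BASE R
# A minimal sketch, assuming ggplot2 and ggpubr are not available.
# The vectors "antivirus" and "laptop" are simulated stand-in data (an assumption for
# illustration), NOT the values from A5RQ2.xlsx.

```r
# Hypothetical stand-in data, simulated so the example runs on its own
set.seed(42)
antivirus <- round(rnorm(122, mean = 50, sd = 13))
laptop <- round(0.8 * antivirus + rnorm(122, mean = 0, sd = 5))

# Fit a simple linear regression to get the line drawn through the points
fit <- lm(laptop ~ antivirus)

# Draw the points, then add the fitted regression line
plot(antivirus, laptop,
     main = "Scatterplot of Antivirus and Laptop (example data)",
     xlab = "Variable Antivirus", ylab = "Variable Laptop",
     pch = 19, col = "steelblue")
abline(fit, col = "red", lwd = 2)

# A positive slope means the line points up
slope <- coef(fit)[["antivirus"]]
print(slope > 0)
```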


# ================================================
# PEARSON CORRELATION TEST (NORMALLY DISTRIBUTED DATA)
# ================================================

# PURPOSE
# Check whether there is a linear relationship between two continuous variables.

# CONDUCT THE PEARSON CORRELATION (BOTH VARIABLES ARE NORMALLY DISTRIBUTED, SO SPEARMAN IS NOT NEEDED)

cor.test(dataset$Antivirus, dataset$Laptop, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  dataset$Antivirus and dataset$Laptop
## t = 25.16, df = 120, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8830253 0.9412249
## sample estimates:
##       cor 
## 0.9168679
# DETERMINE STATISTICAL SIGNIFICANCE.
# The p-value (< 2.2e-16) is less than .05, so the correlation is statistically significant.
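
# OPTIONAL SKETCH: ONE-SIDED TEST FOR A DIRECTIONAL HYPOTHESIS
# H1 in this report predicts a POSITIVE relationship, while cor.test() runs a two-sided test
# by default. A one-sided test can be requested with alternative = "greater".
# This is a minimal sketch on simulated stand-in data ("antivirus" and "laptop" are assumptions
# for illustration), NOT the values from A5RQ2.xlsx. When the sample correlation is positive,
# the one-sided p-value is half of the two-sided p-value.

```r
# Hypothetical stand-in data, simulated so the example runs on its own
set.seed(42)
antivirus <- round(rnorm(122, mean = 50, sd = 13))
laptop <- round(0.8 * antivirus + rnorm(122, mean = 0, sd = 5))

# Default: two-sided test (H1: correlation is not equal to 0)
two_sided <- cor.test(antivirus, laptop, method = "pearson")

# Directional: one-sided test (H1: correlation is greater than 0)
one_sided <- cor.test(antivirus, laptop, method = "pearson",
                      alternative = "greater")

print(two_sided$p.value)
print(one_sided$p.value)
```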

# ===============================================
# EFFECT SIZE FOR PEARSON CORRELATION
# ===============================================

# If results were statistically significant, then determine how the variables are related and how strong the relationship is.

# 1) REVIEW THE CORRECT CORRELATION TEST
#    cor = 0.9168679

# ........................................................

# 2) WRITE THE REPORT
#    Q1) What is the direction of the effect?
#        A correlation of 0.92 is positive. As the number of anti-virus licenses purchased increases, the number of laptops purchased increases.
#
#    Q2) What is the size of the effect?
#        A correlation of 0.92 is a strong relationship.
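
# OPTIONAL SKETCH: LABELING DIRECTION AND STRENGTH FROM THE COEFFICIENT
# A minimal sketch that turns the coefficient from the cor.test output into the two answers above.
# The cut points (.10, .30, .50) are Cohen's conventional benchmarks and are an assumption here;
# the course materials may use different thresholds. The variable names are for illustration.

```r
# Coefficient copied from the cor.test() output in this report
r <- 0.9168679

# Direction comes from the sign of the coefficient
direction <- if (r > 0) "positive" else if (r < 0) "negative" else "none"

# Strength comes from the absolute value (assumed benchmarks: .10, .30, .50)
strength <- cut(abs(r),
                breaks = c(0, 0.10, 0.30, 0.50, 1),
                labels = c("negligible", "weak", "moderate", "strong"),
                include.lowest = TRUE, right = FALSE)

# Proportion of variance the two variables share
r_squared <- r^2

print(direction)
print(as.character(strength))
print(round(r_squared, 2))
```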