===================================================

PEARSON CORRELATION & SPEARMAN CORRELATION OVERVIEW

===================================================

PURPOSE

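Pearson and Spearman correlations test the strength and direction of the relationship between two continuous variables. Pearson measures a linear relationship and assumes both variables are normally distributed; Spearman is rank-based and is used when that assumption does not hold.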

==========

HYPOTHESES

==========

NULL HYPOTHESIS

There is no relationship between Antivirus and Laptop.

ALTERNATE HYPOTHESIS

There is a relationship between Antivirus and Laptop.

DIRECTIONAL ALTERNATE HYPOTHESES

As Variable A increases, Variable B increases.

As Variable A increases, Variable B decreases.
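
The directional hypotheses above correspond to one-sided tests. A minimal sketch of how they could be requested through the `alternative` argument of `cor.test()`, using simulated data rather than the assignment dataset (the vectors `a` and `b` are hypothetical):

```r
# Simulated data for illustration only (NOT the A5RQ2 dataset).
set.seed(1)
a <- rnorm(100, mean = 50, sd = 10)
b <- 0.8 * a + rnorm(100, sd = 5)

# Non-directional H1: there is a relationship (either direction).
two_sided <- cor.test(a, b, method = "pearson", alternative = "two.sided")

# Directional H1: as A increases, B increases.
positive <- cor.test(a, b, method = "pearson", alternative = "greater")

# Directional H1: as A increases, B decreases.
negative <- cor.test(a, b, method = "pearson", alternative = "less")

# Compare the three p-values side by side.
c(two_sided = two_sided$p.value,
  positive  = positive$p.value,
  negative  = negative$p.value)
```

When the sample correlation has the hypothesized sign, the one-sided p-value is half the two-sided one. The rest of this analysis uses the non-directional (two-sided) test.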

………………………………………………………..

QUESTION

What are the null and alternate hypotheses for your research?

H0: There is no relationship between the variables Antivirus and Laptop.

H1: There is a relationship between the variables Antivirus and Laptop.

………………………………………………………..

======================

IMPORT EXCEL FILE CODE

======================

Descriptive Statistics

# Load required packages
library(readxl)
library(psych)

# Load dataset
dataset <- read_excel("C:\\Users\\navya\\Downloads\\A5RQ2.xlsx")

# Display descriptive statistics for both variables
describe(dataset[, c("Antivirus", "Laptop")])
##           vars   n  mean    sd median trimmed   mad min max range  skew
## Antivirus    1 122 50.18 13.36     49   49.92 12.60  15  83    68  0.15
## Laptop       2 122 40.02 12.30     39   39.93 11.86   8  68    60 -0.01
##           kurtosis   se
## Antivirus    -0.14 1.21
## Laptop       -0.32 1.11
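
As a quick sanity check on the table above, the `se` and `range` columns can be recomputed by hand from the other columns. This is a sketch that assumes `se` is the standard error of the mean (sd / sqrt(n)); the numbers are copied from the output, and the variable names are illustrative.

```r
# Values copied from the describe() output above.
n <- 122
sd_antivirus <- 13.36
sd_laptop    <- 12.30

# Standard error of the mean: sd / sqrt(n).
se_antivirus <- sd_antivirus / sqrt(n)
se_laptop    <- sd_laptop / sqrt(n)

# Range: max - min.
range_antivirus <- 83 - 15
range_laptop    <- 68 - 8

round(c(se_antivirus = se_antivirus, se_laptop = se_laptop), 2)
c(range_antivirus = range_antivirus, range_laptop = range_laptop)
```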

Histogram

# Histograms for checking normality
hist(dataset$Antivirus,
     main = "Histogram of Antivirus Licenses",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)

hist(dataset$Laptop,
     main = "Histogram of Laptops Purchased",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)

## Review of Histograms
# ........................................................
# QUESTION
# Answer the questions below as comments within the R script:
# Q1) Check the SKEWNESS of the VARIABLE 1 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
# It appears symmetrical: the values are spread evenly on both sides of the center, and it does not lean left or right.
# Q2) Check the KURTOSIS of the VARIABLE 1 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
# The histogram for Variable 1 looks too flat (platykurtic). It does not show a bell-curve shape.
# Q3) Check the SKEWNESS of the VARIABLE 2 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
# It appears mostly symmetrical, with a slight positive skew because a small tail extends toward the high values.
# Q4) Check the KURTOSIS of the VARIABLE 2 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
# The histogram for Variable 2 has a proper bell-shaped curve, similar to a normal distribution.
# ........................................................

Shapiro-Wilk Tests

## Shapiro-Wilk tests for normality
shapiro.test(dataset$Antivirus)
## 
##  Shapiro-Wilk normality test
## 
## data:  dataset$Antivirus
## W = 0.99419, p-value = 0.8981
shapiro.test(dataset$Laptop)
## 
##  Shapiro-Wilk normality test
## 
## data:  dataset$Laptop
## W = 0.99362, p-value = 0.8559
# .........................................................
# QUESTION
# Answer the questions below as a comment within the R script:
# Was the data normally distributed for Variable 1?
# Answer: Based on the Shapiro-Wilk test (W = 0.994, p = .898), Variable 1 was normally distributed because p > .05.
# Was the data normally distributed for Variable 2?
# Answer: Based on the Shapiro-Wilk test (W = 0.994, p = .856), Variable 2 was normally distributed because p > .05.

# If the data is normal for both variables, continue with the Pearson Correlation test.
# If one or both variables are NOT normal, change to the Spearman Correlation test instead.
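
The decision rule above can be sketched as a small helper. This is an illustration only: `choose_cor_method()` is a hypothetical function (not part of any package), the .05 cutoff is the one used in this assignment, and the data are simulated so that one variable is clearly skewed.

```r
# Hypothetical helper: pick Pearson only if BOTH variables pass Shapiro-Wilk.
choose_cor_method <- function(x, y, alpha = 0.05) {
  normal_x <- shapiro.test(x)$p.value > alpha
  normal_y <- shapiro.test(y)$p.value > alpha
  if (normal_x && normal_y) "pearson" else "spearman"
}

# Simulated data for illustration only (NOT the A5RQ2 dataset).
set.seed(42)
x <- rexp(50)                    # skewed on purpose
y <- x + rnorm(50, sd = 0.5)

chosen <- choose_cor_method(x, y)
result <- cor.test(x, y, method = chosen)

chosen
result$estimate
```

For the assignment data both Shapiro-Wilk p-values were above .05, so this rule would select "pearson".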
library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
library(ggpubr)

Scatterplot

# Scatterplot to visually show the relationship between two continuous variables.
ggscatter(dataset, x = "Antivirus", y = "Laptop",
          add = "reg.line",
          conf.int = TRUE,
          cor.coef = TRUE,
          cor.method = "pearson",
          xlab = "Antivirus", ylab = "Laptop")

# ........................................................
# QUESTION
# Answer the questions below as a comment within the R script:
# Is the relationship positive (line pointing up), negative (line pointing down), or is there no relationship (line is flat)?
# The relationship between Antivirus and Laptop is positive (line pointing up), because the regression line slopes upward (R = 0.92).

Pearson Correlation Test

# Decision: Both Shapiro-Wilk p-values were > .05, so we use Pearson.
cor.test(dataset$Antivirus, dataset$Laptop, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  dataset$Antivirus and dataset$Laptop
## t = 25.16, df = 120, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8830253 0.9412249
## sample estimates:
##       cor 
## 0.9168679
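
As a check on the output above, the t statistic and the confidence interval can be recomputed from the reported r and df alone. This sketch uses the standard formula t = r * sqrt(df) / sqrt(1 - r^2) and the Fisher z-transformation for the interval; the variable names are illustrative.

```r
# Values copied from the cor.test() output above.
r  <- 0.9168679
df <- 120
n  <- df + 2          # Pearson's test uses df = n - 2

# t statistic for H0: true correlation equals 0.
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# 95% confidence interval via the Fisher z-transformation.
z     <- atanh(r)
se_z  <- 1 / sqrt(n - 3)
ci_95 <- tanh(z + c(-1, 1) * qnorm(0.975) * se_z)

round(t_stat, 2)
round(ci_95, 4)
```

Both values should agree with the reported t = 25.16 and the interval 0.8830 to 0.9412, up to rounding.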

Report

========================================================

>> WRITTEN REPORT FOR PEARSON CORRELATION <<

========================================================

Write a paragraph summarizing your findings.

A Pearson correlation test was conducted to examine the relationship between Antivirus licenses and Laptop purchases. Shapiro-Wilk tests indicated that both variables were normally distributed (p > .05), so the Pearson correlation was the appropriate analysis. The scatterplot showed a strong positive upward trend, which suggests that higher Antivirus values are associated with higher Laptop values. The Pearson correlation test confirmed this visual pattern, yielding a strong positive correlation (r ≈ 0.92). This result was statistically significant (p < .001), which shows that there is a meaningful relationship between the two variables. Therefore, we reject the null hypothesis and conclude that Antivirus and Laptop are significantly and positively related.

………………………………………………..

2) WRITE YOUR FINAL REPORT

A Pearson correlation analysis was conducted to examine the relationship between Antivirus Licenses and Laptop Purchases. Descriptive statistics showed that Antivirus had a mean of M = 50.18 (SD = 13.36), and Laptop had a mean of M = 40.02 (SD = 12.30). The correlation results indicated a significant relationship between the two variables, r(120) = .92, p < .001. The direction of the relationship was positive, indicating that as Antivirus values increased, Laptop values also increased. Based on the effect size, the correlation was large. These results support the alternate hypothesis, showing that there is a meaningful relationship between Antivirus Licenses and Laptop Purchases.
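
The "large" effect-size label can be made explicit with a short sketch. It assumes Cohen's conventional benchmarks for r (.10 small, .30 medium, .50 large); the report does not state which benchmark was used, and `effect_label()` is a hypothetical helper.

```r
# Correlation copied from the cor.test() output.
r <- 0.9168679

# Proportion of variance shared by the two variables.
r_squared <- r^2

# Hypothetical helper applying Cohen's conventional cutoffs (an assumption).
effect_label <- function(r) {
  a <- abs(r)
  if (a >= 0.50) {
    "large"
  } else if (a >= 0.30) {
    "medium"
  } else if (a >= 0.10) {
    "small"
  } else {
    "negligible"
  }
}

round(r_squared, 2)
effect_label(r)
```

Under these assumptions the two variables share roughly 84% of their variance, which is consistent with describing the correlation as large.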