===================================================

PEARSON CORRELATION & SPEARMAN CORRELATION OVERVIEW

===================================================

PURPOSE

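Used to test the strength and direction of the relationship between two continuous variables. Pearson correlation assumes both variables are approximately normally distributed; Spearman correlation is the rank-based alternative used when that assumption is not met.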

==========

HYPOTHESES

==========

NULL HYPOTHESIS

There is no relationship between Minutes and Drinks.

ALTERNATE HYPOTHESIS

There is a relationship between Minutes and Drinks.

DIRECTIONAL ALTERNATE HYPOTHESES

As Variable A increases, Variable B increases.

As Variable A increases, Variable B decreases.
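
In R, a directional alternate hypothesis maps onto the `alternative` argument of `cor.test()`. The following is a minimal sketch on simulated stand-in data; the vectors `a` and `b` are hypothetical and are not the assignment dataset.

```r
# Hypothetical stand-in data: b tends to increase with a
set.seed(1)
a <- rnorm(50)
b <- a + rnorm(50)

# Non-directional (default): is there any relationship?
two_sided <- cor.test(a, b, alternative = "two.sided")

# Directional: as a increases, b increases (true correlation > 0)
increasing <- cor.test(a, b, alternative = "greater")

# Directional: as a increases, b decreases (true correlation < 0)
decreasing <- cor.test(a, b, alternative = "less")

# Compare the three p-values side by side
c(two_sided  = two_sided$p.value,
  increasing = increasing$p.value,
  decreasing = decreasing$p.value)
```

When the observed correlation is positive, the "greater" test gives a small p-value and the "less" test gives a large one, so the direction has to be chosen before looking at the data.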

………………………………………………………..

QUESTION

What are the null and alternate hypotheses for your research?

H0: There is no relationship between the variables Minutes and Drinks.

H1: There is a relationship between the variables Minutes and Drinks.

………………………………………………………..

======================

IMPORT EXCEL FILE CODE

======================

Descriptive Statistics

# Load required packages
library(readxl)
library(psych)

# Load dataset
dataset <- read_excel("C:\\Users\\navya\\Downloads\\A5RQ1.xlsx")

# Display descriptive statistics for both variables
describe(dataset[, c("Minutes", "Drinks")])
##         vars   n  mean    sd median trimmed   mad min   max range skew kurtosis
## Minutes    1 461 29.89 18.63   24.4   26.99 15.12  10 154.2 144.2 1.79     5.20
## Drinks     2 461  3.00  1.95    3.0    2.75  1.48   0  17.0  17.0 1.78     6.46
##           se
## Minutes 0.87
## Drinks  0.09
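
The skew and kurtosis columns summarize the shape of each distribution: a positive skew points to a long right tail, and a large positive kurtosis points to a sharper peak and heavier tails than a normal curve. As a rough illustration, the sketch below computes simple moment-based versions of both statistics on simulated data. The helper names `skewness_simple` and `excess_kurtosis_simple` are hypothetical, and `psych::describe()` may apply a slightly different small-sample adjustment, so its values need not match these exactly.

```r
# Moment-based skewness: third central moment over variance^(3/2)
skewness_simple <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Excess kurtosis: fourth central moment over variance^2, minus 3
# (a normal distribution scores about 0)
excess_kurtosis_simple <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

set.seed(42)
right_skewed <- rexp(1000)   # hypothetical data with a long right tail
symmetric    <- rnorm(1000)  # hypothetical roughly normal data

skewness_simple(right_skewed)         # clearly positive
skewness_simple(symmetric)            # close to 0
excess_kurtosis_simple(right_skewed)  # clearly positive
excess_kurtosis_simple(symmetric)     # close to 0
```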

Histogram

# Histograms for checking normality
hist(dataset$Minutes,
     main = "Histogram of Minutes",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightblue",
     border = "black",
     breaks = 20)

hist(dataset$Drinks,
     main = "Histogram of Drinks",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black",
     breaks = 20)

## Review of Histograms
# ........................................................
# QUESTION
# Answer the questions below as comments within the R script:
# Q1) Check the SKEWNESS of the VARIABLE 1 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
# It appears positively skewed: most values cluster at the low end and a long tail extends to the right, consistent with the descriptive statistics (skew = 1.79).
# Q2) Check the KURTOSIS of the VARIABLE 1 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
# The histogram for Variable 1 looks too tall (leptokurtic): it has a sharp peak and a long tail rather than a proper bell curve, consistent with kurtosis = 5.20.
# Q3) Check the SKEWNESS of the VARIABLE 2 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
# It appears positively skewed: most values are low and a thin tail extends toward high values (skew = 1.78).
# Q4) Check the KURTOSIS of the VARIABLE 2 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
# The histogram for Variable 2 looks too tall (leptokurtic) rather than bell-shaped, consistent with kurtosis = 6.46.
# ........................................................

Shapiro-Wilk Tests

## Shapiro-Wilk tests for normality
shapiro.test(dataset$Minutes)
## 
##  Shapiro-Wilk normality test
## 
## data:  dataset$Minutes
## W = 0.84706, p-value < 2.2e-16
shapiro.test(dataset$Drinks)
## 
##  Shapiro-Wilk normality test
## 
## data:  dataset$Drinks
## W = 0.85487, p-value < 2.2e-16
# .........................................................
# QUESTION
# Answer the questions below as a comment within the R script:
# Was the data normally distributed for Variable 1?
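# Answer: No. The Shapiro-Wilk test was significant (W = 0.847, p < .001), so Variable 1 (Minutes) was not normally distributed.
# Was the data normally distributed for Variable 2?
# Answer: No. The Shapiro-Wilk test was significant (W = 0.855, p < .001), so Variable 2 (Drinks) was not normally distributed.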

# If the data is normal for both variables, continue with the Pearson Correlation test.
# Change to the Spearman Correlation test if one or both variables are NOT normal.
library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
library(ggpubr)
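
The normality-based choice between Pearson and Spearman can be written as a small rule. This is only a sketch: the helper `choose_cor_method` and the simulated vectors are hypothetical, and the 0.05 cutoff is the conventional threshold assumed throughout this worksheet.

```r
# Hypothetical helper: pick a correlation method from Shapiro-Wilk results
choose_cor_method <- function(x, y, alpha = 0.05) {
  p_x <- shapiro.test(x)$p.value
  p_y <- shapiro.test(y)$p.value
  # Use Pearson only if neither test rejects normality
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}

set.seed(7)
normal_x <- rnorm(200)   # hypothetical roughly normal variable
skewed_y <- rexp(200)    # hypothetical strongly right-skewed variable

# One skewed variable is enough to switch to Spearman
method <- choose_cor_method(normal_x, skewed_y)
method
cor.test(normal_x, skewed_y, method = method, exact = FALSE)
```

With very large samples the Shapiro-Wilk test can flag even small departures from normality, so it is reasonable to read the result alongside the histograms rather than in isolation.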

Scatterplot

# Scatterplot to visually show the relationship between two continuous variables.
ggscatter(dataset, x = "Minutes", y = "Drinks",
          add = "reg.line",
          conf.int = TRUE,
          cor.coef = TRUE,
          cor.method = "pearson",
          xlab = "Minutes", ylab = "Drinks")

# ........................................................
# QUESTION
# Answer the questions below as a comment within the R script:
# Is the relationship positive (line pointing up), negative (line pointing down), or is there no relationship (line is flat)?
# The relationship between Minutes and Drinks is positive (line pointing up): the regression line slopes upward (R = 0.95).

Pearson Correlation Test

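# Note: Both Shapiro-Wilk p-values were < .05, so the normality assumption is not met and Spearman is the appropriate test for this data. The Pearson test is run here for reference.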
cor.test(dataset$Minutes, dataset$Drinks, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  dataset$Minutes and dataset$Drinks
## t = 68.326, df = 459, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9452363 0.9617123
## sample estimates:
##       cor 
## 0.9541922
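
The t statistic and degrees of freedom in this output follow from the correlation and the sample size: df = n - 2 and t = r * sqrt(df) / sqrt(1 - r^2). The short check below plugs in the values printed above; the variable names are illustrative only.

```r
r <- 0.9541922   # sample correlation from the cor.test() output
n <- 461         # sample size from describe()

df <- n - 2                              # degrees of freedom: 459
t_stat <- r * sqrt(df) / sqrt(1 - r^2)   # about 68.3, in line with the output
p_value <- 2 * pt(-abs(t_stat), df)      # two-sided p-value

c(df = df, t = t_stat, p = p_value)
```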

Report

========================================================

>> WRITTEN REPORT FOR PEARSON CORRELATION <<

========================================================

Write a paragraph summarizing your findings.

A Pearson correlation test was conducted to examine the relationship between time spent in the shop (Minutes) and number of drinks purchased (Drinks). Shapiro-Wilk tests indicated that neither variable was normally distributed (Minutes: W = 0.85, p < .001; Drinks: W = 0.85, p < .001), so the Pearson result should be interpreted with caution and the Spearman correlation is the more appropriate analysis. The scatterplot showed a strong positive upward trend, which suggests that longer visits are associated with more drinks purchased. The Pearson correlation test confirmed this visual pattern, yielding a strong positive correlation, r(459) = .95, 95% CI [.95, .96], p < .001. Therefore, we reject the null hypothesis and conclude that Minutes and Drinks are significantly and positively related.

………………………………………………..

2) WRITE YOUR FINAL REPORT

A Spearman correlation was conducted to examine the relationship between time spent in the shop and number of drinks purchased (n = 461). There was a statistically significant correlation between time spent (mean = 29.89, sd = 18.63) and number of drinks (mean = 3.00, sd = 1.95). The correlation was positive and strong, ρ(459) = 0.92, p < .05. As time spent in the shop increases, the number of drinks purchased also increases.
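
For reference, a Spearman test of this kind is run with `cor.test(..., method = "spearman")`. The sketch below uses simulated stand-in data; the vectors `minutes` and `drinks` are hypothetical and do not reproduce the A5RQ1 values or the ρ reported above.

```r
set.seed(123)
# Hypothetical right-skewed visit lengths (minutes)
minutes <- rexp(461, rate = 1 / 30) + 10
# Hypothetical drink counts that tend to rise with time spent
drinks <- round(minutes / 10 + rnorm(461))
drinks[drinks < 0] <- 0

# exact = FALSE requests the asymptotic p-value, which avoids the
# "cannot compute exact p-value with ties" warning on tied ranks
spearman_result <- cor.test(minutes, drinks,
                            method = "spearman", exact = FALSE)

spearman_result$estimate   # Spearman's rho
spearman_result$p.value    # two-sided p-value
```

On the real dataset the same call would take `dataset$Minutes` and `dataset$Drinks` as its first two arguments.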