title: "assignment 4_correlations"
author: "Vedant Mahajan"
date: "2026-02-08"
output: html_document
library(readxl)
library(ggpubr)
## Loading required package: ggplot2
tom <- read_excel("D:/Vedant Work/SLU/Spring Sem (Jan to May 2026)/Applied Analytics/Assignment 4/tom.xlsx")
mean(tom$StudyHours)
## [1] 6.135609
sd(tom$StudyHours)
## [1] 1.369224
mean(tom$ExamScore)
## [1] 90.06906
sd(tom$ExamScore)
## [1] 6.795224
hist(tom$StudyHours,
     main = "Study Hours",
     breaks = 20,
     col = "lightblue",
     border = "white")
hist(tom$ExamScore,
     main = "Exam Score",
     breaks = 20,
     col = "lightcoral",
     border = "white")
StudyHours appears normally distributed. ExamScore appears slightly skewed.
shapiro.test(tom$StudyHours)
##
## Shapiro-Wilk normality test
##
## data: tom$StudyHours
## W = 0.99388, p-value = 0.9349
shapiro.test(tom$ExamScore)
##
## Shapiro-Wilk normality test
##
## data: tom$ExamScore
## W = 0.96286, p-value = 0.006465
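Shapiro-Wilk gives no evidence against normality for StudyHours (W = 0.99, p = .935) but rejects normality for ExamScore (W = 0.96, p = .006). Because one of the two variables is not normally distributed, Spearman's rank correlation is used instead of Pearson's.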
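The decision rule used here can be written down as a small helper. This is only a sketch: `choose_cor_method()` and its `alpha` argument are hypothetical names introduced for illustration, and the rule (Pearson only when neither Shapiro-Wilk test rejects normality) is assumed from the reasoning above rather than taken from any library.

```r
# Hypothetical helper: choose a correlation method from two Shapiro-Wilk p-values.
# Assumed rule: use Pearson only if neither test rejects normality at `alpha`.
choose_cor_method <- function(p_x, p_y, alpha = 0.05) {
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}

# p-values copied from the Shapiro-Wilk output above
method_tom <- choose_cor_method(p_x = 0.9349, p_y = 0.006465)
method_tom

# With the data loaded, the p-values could be taken directly, e.g.
# choose_cor_method(shapiro.test(tom$StudyHours)$p.value,
#                   shapiro.test(tom$ExamScore)$p.value)
```

The returned string can be passed straight to the `method` argument of `cor.test()`.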
cor.test(tom$StudyHours, tom$ExamScore, method = "spearman")
## Warning in cor.test.default(tom$StudyHours, tom$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: tom$StudyHours and tom$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.9008825
Spearman's correlation was selected because ExamScore is not normally distributed. The correlation is statistically significant (p < .001), positive, and strong (ρ = .90).
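The warning about ties in the output above means that R could not compute an exact p-value and used an approximation instead; the estimate of rho is unaffected. A minimal sketch of how the approximation can be requested explicitly with `exact = FALSE`, which avoids the warning, is shown below. The data are simulated purely for illustration and are not the tom data set.

```r
# Simulated data with deliberate ties (NOT the tom data set)
set.seed(42)
x <- round(rnorm(100, mean = 6, sd = 1.4), 1)       # rounding creates tied values
y <- round(90 + 4 * (x - 6) + rnorm(100, sd = 2))   # positively related to x

# exact = FALSE asks for the approximate p-value up front,
# so cor.test() does not warn about ties
res <- cor.test(x, y, method = "spearman", exact = FALSE)
res$estimate
res$p.value
```

For data with ties, the estimate and p-value should be the same as in the default call; only the warning goes away.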
ggscatter(
  tom,
  x = "StudyHours",
  y = "ExamScore",
  add = "reg.line",
  xlab = "Study Hours",
  ylab = "Exam Score"
)
Study hours (M = 6.14, SD = 1.37) and exam score (M = 90.07, SD = 6.80) were strongly and positively correlated, ρ(98) = .90, p < .001. As study hours increased, exam scores tended to increase.
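The statistic string in the write-up can be assembled from the test output to avoid transcription slips. The sketch below uses a hypothetical helper, `format_apa_rho()`; the sample size of 100 is an assumption inferred from the reported degrees of freedom (df = n - 2 = 98), and the p-value passed in is the upper bound printed by R.

```r
# Hypothetical helper: build an APA-style string for a Spearman correlation.
# Drops the leading zero and reports very small p-values as "p < .001".
format_apa_rho <- function(rho, n, p) {
  df <- n - 2
  rho_txt <- sub("^(-?)0\\.", "\\1.", sprintf("%.2f", rho))
  p_txt <- if (p < 0.001) {
    "p < .001"
  } else {
    paste0("p = ", sub("^0\\.", ".", sprintf("%.3f", p)))
  }
  paste0("\u03c1(", df, ") = ", rho_txt, ", ", p_txt)
}

# Values copied from the cor.test() output above; n = 100 is assumed from df = 98
apa_tom <- format_apa_rho(rho = 0.9008825, n = 100, p = 2.2e-16)
apa_tom
```

With a fitted test object `res`, the same call could take `unname(res$estimate)` and `res$p.value` instead of hand-copied numbers.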
jerry <- read_excel("D:/Vedant Work/SLU/Spring Sem (Jan to May 2026)/Applied Analytics/Assignment 4/jerry.xlsx")
mean(jerry$ScreenTime)
## [1] 5.063296
sd(jerry$ScreenTime)
## [1] 2.056833
mean(jerry$SleepingHours)
## [1] 6.938459
sd(jerry$SleepingHours)
## [1] 1.351332
hist(jerry$ScreenTime,
     main = "Phone Usage (Hours)",
     breaks = 20,
     col = "lightblue",
     border = "white")
hist(jerry$SleepingHours,
     main = "Sleeping Hours",
     breaks = 20,
     col = "lightcoral",
     border = "white")
shapiro.test(jerry$ScreenTime)
##
## Shapiro-Wilk normality test
##
## data: jerry$ScreenTime
## W = 0.90278, p-value = 1.914e-06
shapiro.test(jerry$SleepingHours)
##
## Shapiro-Wilk normality test
##
## data: jerry$SleepingHours
## W = 0.98467, p-value = 0.3004
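Shapiro-Wilk rejects normality for ScreenTime (W = 0.90, p < .001) but not for SleepingHours (W = 0.98, p = .300). Because one of the two variables is not normally distributed, Spearman's rank correlation is used instead of Pearson's.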
cor.test(jerry$ScreenTime, jerry$SleepingHours, method = "spearman")
##
## Spearman's rank correlation rho
##
## data: jerry$ScreenTime and jerry$SleepingHours
## S = 259052, p-value = 3.521e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.5544674
The correlation is statistically significant (p < .001), negative, and strong (ρ = -.55, just above the conventional .50 threshold for a large effect).
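The verbal labels for effect size can be made explicit with a small lookup. This is a sketch under an assumption: the cutoffs below follow Cohen's commonly cited benchmarks (.10, .30, .50), which the original analysis does not name, and other conventions would label ρ = -.55 as moderate. `label_strength()` is a hypothetical name.

```r
# Hypothetical helper: label the strength of a correlation.
# Cutoffs are assumed (Cohen's benchmarks), not taken from the assignment.
label_strength <- function(rho) {
  a <- abs(rho)
  if (a >= 0.50) {
    "strong"
  } else if (a >= 0.30) {
    "moderate"
  } else if (a >= 0.10) {
    "weak"
  } else {
    "negligible"
  }
}

label_strength(-0.5544674)  # jerry: ScreenTime vs SleepingHours
label_strength(0.9008825)   # tom: StudyHours vs ExamScore
```

Under these cutoffs both correlations are labeled "strong", although the two values sit at very different distances from the threshold.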
ggscatter(
  jerry,
  x = "ScreenTime",
  y = "SleepingHours",
  add = "reg.line",
  xlab = "Phone Usage (Hours)",
  ylab = "Sleep (Hours)"
)
Phone usage (M = 5.06, SD = 2.06) and sleeping hours (M = 6.94, SD = 1.35) were strongly and negatively correlated, ρ(98) = -.55, p < .001. As phone usage increased, sleeping hours tended to decrease.
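Both write-ups report 98 degrees of freedom, which implies n = 100, although the sample size is never printed. As a rough consistency check, the S statistic that `cor.test()` reports for Spearman's test relates to rho and n as S = (n³ - n)(1 - ρ) / 6, so the assumed n can be checked against the printed output. `spearman_S()` is a hypothetical helper name.

```r
# Hypothetical helper: S statistic implied by a given rho and sample size
spearman_S <- function(rho, n) (n^3 - n) * (1 - rho) / 6

# n = 100 is the assumption being checked (from df = n - 2 = 98)
S_tom   <- spearman_S(0.9008825, 100)    # reported above: S = 16518
S_jerry <- spearman_S(-0.5544674, 100)   # reported above: S = 259052
c(tom = S_tom, jerry = S_jerry)
```

Both values should land within rounding distance of the printed S statistics, which supports reporting df = 98 for each data set.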