
---
title: "assignment 4_correlations"
author: "Vedant Mahajan"
date: "2026-02-08"
output: html_document
---

library(readxl)

library(ggpubr)

## Loading required package: ggplot2

tom <- read_excel("D:/Vedant Work/SLU/Spring Sem (Jan to May 2026)/Applied Analytics/Assignment 4/tom.xlsx")

mean(tom$StudyHours)

## [1] 6.135609

sd(tom$StudyHours)

## [1] 1.369224

mean(tom$ExamScore)

## [1] 90.06906

sd(tom$ExamScore)

## [1] 6.795224
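The four `mean()`/`sd()` calls above can be collapsed into one table. This is a minimal base-R sketch. `describe_vars()` is a hypothetical helper name, and the small `toy` data frame stands in for `tom` so the snippet runs without the Excel file. Its values are made up and are not the assignment data.

```r
# Hypothetical helper: mean (M) and standard deviation (SD) for selected columns
describe_vars <- function(df, vars) {
  # sapply() returns a 2 x length(vars) matrix with rows "M" and "SD"
  stats <- sapply(df[vars], function(v) c(M  = mean(v, na.rm = TRUE),
                                          SD = sd(v, na.rm = TRUE)))
  round(t(stats), 2)  # one row per variable, rounded for APA-style reporting
}

# Toy data standing in for tom.xlsx (illustrative values only)
toy <- data.frame(StudyHours = c(4, 5, 6, 7, 8),
                  ExamScore  = c(80, 85, 90, 95, 100))
describe_vars(toy, c("StudyHours", "ExamScore"))
```

On the real data, `describe_vars(tom, c("StudyHours", "ExamScore"))` should give the same rounded M and SD values that appear in the write-up.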

hist(tom$StudyHours,
     main = "Study Hours",
     breaks = 20,
     col = "lightblue",
     border = "white")

hist(tom$ExamScore,
     main = "Exam Score",
     breaks = 20,
     col = "lightcoral",
     border = "white")

In the histograms, StudyHours looks approximately normal (roughly symmetric and bell-shaped), while ExamScore appears slightly skewed.
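A skewness coefficient can put a number on "slightly skewed". This is a minimal sketch of the moment-based sample skewness g1 = m3 / m2^(3/2), written in base R. `skewness_g1()` is a hypothetical helper, not a function from the assignment. Positive values indicate a longer right tail, negative values a longer left tail, and values near 0 indicate rough symmetry.

```r
# Hypothetical helper: moment-based sample skewness g1 = m3 / m2^(3/2)
skewness_g1 <- function(x) {
  x  <- x[!is.na(x)]
  d  <- x - mean(x)
  m2 <- mean(d^2)  # second central moment (divides by n, not n - 1)
  m3 <- mean(d^3)  # third central moment
  m3 / m2^(3/2)
}

skewness_g1(c(1, 2, 3, 4, 5))   # symmetric data: 0
skewness_g1(c(1, 1, 1, 10))     # long right tail: positive (about 1.15)
```

For the assignment, the histogram impression can be checked by calling `skewness_g1(tom$ExamScore)`.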

shapiro.test(tom$StudyHours)

## 
##  Shapiro-Wilk normality test
## 
## data:  tom$StudyHours
## W = 0.99388, p-value = 0.9349

shapiro.test(tom$ExamScore)

## 
##  Shapiro-Wilk normality test
## 
## data:  tom$ExamScore
## W = 0.96286, p-value = 0.006465

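The Shapiro-Wilk test does not reject normality for StudyHours (W = 0.99, p = .93), but it does reject normality for ExamScore (W = 0.96, p = .006). Pearson's correlation assumes approximately normally distributed variables, and one of the two variables departs from normality. Spearman's rank correlation is therefore used instead.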
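The decision rule above can be written as a small function. This is a sketch under two assumptions: `choose_cor_method()` is a hypothetical helper, and the .05 cut-off is the one used in the text. The function returns `"pearson"` only when Shapiro-Wilk fails to reject normality for both variables. The toy inputs are deterministic normal and exponential quantiles, not the assignment data.

```r
# Hypothetical helper: use Pearson only if neither variable fails Shapiro-Wilk
# at level alpha; otherwise fall back to Spearman's rank correlation.
choose_cor_method <- function(x, y, alpha = 0.05) {
  p_x <- shapiro.test(x)$p.value
  p_y <- shapiro.test(y)$p.value
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}

# Deterministic toy inputs (n = 50): normal quantiles vs. right-skewed exponential quantiles
normal_like <- qnorm(ppoints(50))
skewed      <- qexp(ppoints(50))
choose_cor_method(normal_like, normal_like)
choose_cor_method(normal_like, skewed)
```

With the real data, the result can be passed straight into the test: `cor.test(tom$StudyHours, tom$ExamScore, method = choose_cor_method(tom$StudyHours, tom$ExamScore))`. Keep in mind that with large samples Shapiro-Wilk can flag departures from normality that are too small to matter, so the histograms remain a useful second check.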

cor.test(tom$StudyHours, tom$ExamScore, method = "spearman")

## Warning in cor.test.default(tom$StudyHours, tom$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties

## 
##  Spearman's rank correlation rho
## 
## data:  tom$StudyHours and tom$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9008825

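Spearman's rank correlation was chosen because ExamScore is not normally distributed. The p-value is below .001, so the correlation is statistically significant, and ρ = .90 indicates a strong positive relationship. The warning means the data contain tied values, so R reports an approximate p-value rather than an exact one. With a p-value this small, the conclusion is unaffected.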
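Two facts help when reading the `cor.test()` output. First, Spearman's ρ is Pearson's r computed on ranks. Second, the S statistic R prints equals (n³ − n)(1 − ρ)/6. The sketch below checks both on made-up toy data. `s_from_rho()` is a hypothetical helper. As a cross-check, plugging in n = 100 and ρ = .9008825 gives about 16518, which matches the S above. This is consistent with n = 100 and df = n − 2 = 98.

```r
# Toy data (not the assignment data): n = 5, no ties
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)

rho_builtin <- cor(x, y, method = "spearman")
rho_ranks   <- cor(rank(x), rank(y))   # Pearson r on the ranks
c(rho_builtin, rho_ranks)              # both 0.8

# Hypothetical helper: S as printed by cor.test(), (n^3 - n) * (1 - rho) / 6
s_from_rho <- function(n, rho) (n^3 - n) * (1 - rho) / 6
ct <- cor.test(x, y, method = "spearman")
c(unname(ct$statistic), s_from_rho(length(x), rho_builtin))  # both 4

# Cross-check against the tom output (n = 100 inferred from df = 98)
round(s_from_rho(100, 0.9008825))      # 16518
```

`rank()` assigns average ranks to ties by default, which is why the rank-based view still works for data like `tom` that contain tied values.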

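# Scatter plot of exam score against study hours (ggpubr)
ggscatter(
  tom,
  x = "StudyHours",
  y = "ExamScore",
  add = "reg.line",   # overlay a linear regression line as a visual guide
  xlab = "Study Hours",
  ylab = "Exam Score"
)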

Study hours (M = 6.14, SD = 1.37) were strongly and positively correlated with exam score (M = 90.07, SD = 6.80), ρ(98) = .90, p < .001. Students who studied more hours tended to earn higher exam scores.
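Building the statistic string by hand invites rounding slips, so it can also be generated. This is a minimal sketch. `apa_spearman()` is a hypothetical helper, not part of any package. It follows two conventions used in the write-up: df = n − 2, and no leading zero for statistics that cannot exceed 1. The toy vectors are made up.

```r
# Hypothetical helper: APA-style string such as "ρ(98) = .90, p < .001"
apa_spearman <- function(x, y) {
  ok <- complete.cases(x, y)   # keep only pairs with no missing value
  n  <- sum(ok)
  # suppressWarnings() hides the "Cannot compute exact p-value with ties" warning
  ct <- suppressWarnings(cor.test(x[ok], y[ok], method = "spearman"))
  # Drop the leading zero, since rho and p are bounded by 1
  drop_zero <- function(v, digits) {
    sub("^(-?)0\\.", "\\1.", formatC(v, format = "f", digits = digits))
  }
  p_txt <- if (ct$p.value < 0.001) "p < .001" else paste0("p = ", drop_zero(ct$p.value, 3))
  sprintf("\u03c1(%d) = %s, %s", n - 2L, drop_zero(unname(ct$estimate), 2), p_txt)
}

# Toy example (not the assignment data): n = 5, so df = 3
apa_spearman(c(1, 2, 3, 4, 5), c(2, 1, 4, 3, 5))
```

If no values are missing, `apa_spearman(tom$StudyHours, tom$ExamScore)` should reproduce the "ρ(98) = .90, p < .001" part of the sentence above.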

jerry <- read_excel("D:/Vedant Work/SLU/Spring Sem (Jan to May 2026)/Applied Analytics/Assignment 4/jerry.xlsx")

mean(jerry$ScreenTime)

## [1] 5.063296

sd(jerry$ScreenTime)

## [1] 2.056833

mean(jerry$SleepingHours)

## [1] 6.938459

sd(jerry$SleepingHours)

## [1] 1.351332

hist(jerry$ScreenTime,
     main = "Phone Usage (Hours)",
     breaks = 20,
     col = "lightblue",
     border = "white")

hist(jerry$SleepingHours,
     main = "Sleeping Hours",
     breaks = 20,
     col = "lightcoral",
     border = "white")

shapiro.test(jerry$ScreenTime)

## 
##  Shapiro-Wilk normality test
## 
## data:  jerry$ScreenTime
## W = 0.90278, p-value = 1.914e-06

shapiro.test(jerry$SleepingHours)

## 
##  Shapiro-Wilk normality test
## 
## data:  jerry$SleepingHours
## W = 0.98467, p-value = 0.3004

The Shapiro-Wilk test rejects normality for ScreenTime (W = 0.90, p < .001) but not for SleepingHours (W = 0.98, p = .30). One of the two variables departs from normality, so Spearman's rank correlation is used.

cor.test(jerry$ScreenTime, jerry$SleepingHours, method = "spearman")

## 
##  Spearman's rank correlation rho
## 
## data:  jerry$ScreenTime and jerry$SleepingHours
## S = 259052, p-value = 3.521e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.5544674

The p-value is below .001, so the correlation is statistically significant. With ρ = -.55, the relationship is negative and, by the common |ρ| ≥ .50 benchmark, strong.
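Verbal labels such as "strong" depend on the cut-offs chosen. This is a sketch assuming Cohen's widely used .10 / .30 / .50 benchmarks (small / medium / large), relabeled as weak / moderate / strong to match the wording here. `cor_strength()` is a hypothetical helper. |ρ| = .55 sits just above the .50 threshold. Other conventions would call it moderate, so reporting the cut-off alongside the label keeps the claim transparent.

```r
# Hypothetical helper: label |r| using Cohen-style cut-offs (.10, .30, .50).
# Intervals are [0, .10), [.10, .30), [.30, .50), [.50, 1].
cor_strength <- function(r) {
  as.character(cut(abs(r),
                   breaks = c(0, 0.10, 0.30, 0.50, 1),
                   labels = c("negligible", "weak", "moderate", "strong"),
                   right = FALSE, include.lowest = TRUE))
}

# tom's rho, jerry's rho, and an illustrative value
cor_strength(c(0.9008825, -0.5544674, 0.2))   # "strong" "strong" "weak"
```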

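# Scatter plot of sleeping hours against phone usage (ggpubr)
ggscatter(
  jerry,
  x = "ScreenTime",
  y = "SleepingHours",
  add = "reg.line",   # overlay a linear regression line as a visual guide
  xlab = "Phone Usage (Hours)",
  ylab = "Sleep (Hours)"
)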

Phone usage (M = 5.06, SD = 2.06) was strongly and negatively correlated with sleeping hours (M = 6.94, SD = 1.35), ρ(98) = -.55, p < .001. Higher phone usage was associated with fewer hours of sleep.