Loading Libraries

library(ggplot2)
library(ggpubr)
library(readxl)

Importing Datasets

DatasetA <- read_excel("C:/Users/Admin/Downloads/DatasetA.xlsx")
DatasetB <- read_excel("C:/Users/Admin/Downloads/DatasetB.xlsx")

Question 1: What is the relationship between study hours and exam score?

Variables: Study Hours (Independent Variable), Exam Score (Dependent Variable)

Descriptive Statistics

mean(DatasetA$StudyHours, na.rm = TRUE)
## [1] 6.135609
sd(DatasetA$StudyHours, na.rm = TRUE)
## [1] 1.369224
mean(DatasetA$ExamScore, na.rm = TRUE)
## [1] 90.06906
sd(DatasetA$ExamScore, na.rm = TRUE)
## [1] 6.795224
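
The four calls above can also be collected into a single table. The following is a minimal sketch on simulated data, not the report's dataset; `demo` and the helper `describe()` are hypothetical names introduced only for illustration.

```r
# Hypothetical stand-in for DatasetA; values are simulated, not the report's data.
set.seed(1)
demo <- data.frame(
  StudyHours = rnorm(100, mean = 6, sd = 1.4),
  ExamScore  = rnorm(100, mean = 90, sd = 7)
)
demo$ExamScore[3] <- NA  # one missing value to show the na.rm handling

# describe() is a hypothetical helper: n, missing count, mean, and SD of one vector.
describe <- function(x) {
  c(n       = sum(!is.na(x)),
    missing = sum(is.na(x)),
    mean    = mean(x, na.rm = TRUE),
    sd      = sd(x, na.rm = TRUE))
}

# One row per variable, one column per statistic.
desc_table <- t(sapply(demo[c("StudyHours", "ExamScore")], describe))
round(desc_table, 2)
```

Reporting `n` and the missing count next to the mean and SD makes it visible how many observations each statistic is based on.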

Normality Check

Independent Variable Graph:

Skewness: Approximately symmetrical

Kurtosis: Approximately mesokurtic (bell-shaped)

hist(DatasetA$StudyHours,
     main = "Study Hours",
     breaks = 20,
     xlab = "Independent Variable Graph: Study Hours",
     col = "lightblue",
     border = "lightyellow",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1) 

Dependent Variable Graph:

Skewness: Negatively skewed

Kurtosis: Leptokurtic (more peaked than a normal curve)

hist(DatasetA$ExamScore,
     main = "Exam Score",
     breaks = 20,
     col = "lightblue",
     xlab = "Dependent Variable Graph: Exam Score",
     border = "lightyellow",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1) 
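
The skewness and kurtosis judgments above are read visually from the histograms. As a possible numeric cross-check, the sketch below computes moment-based sample skewness and excess kurtosis in base R. The functions `skewness()` and `excess_kurtosis()` are defined here for illustration (they are not base R functions), and the data are simulated.

```r
# Moment-based sample skewness: 0 for a symmetric sample, negative for a left tail.
skewness <- function(x) {
  x <- x[!is.na(x)]
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^1.5
}

# Excess kurtosis: about 0 for a normal sample, positive when more peaked/heavy-tailed.
excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  z <- x - mean(x)
  mean(z^4) / mean(z^2)^2 - 3
}

# Simulated examples (assumptions, not the report's data).
set.seed(42)
symmetric_demo <- rnorm(1000)
left_skew_demo <- 100 - rexp(1000, rate = 0.2)  # long tail toward low values

skewness(symmetric_demo)
skewness(left_skew_demo)
excess_kurtosis(left_skew_demo)
```

Applied to the real columns, the calls would be `skewness(DatasetA$ExamScore)` and `excess_kurtosis(DatasetA$ExamScore)`.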

Conducting Shapiro–Wilk tests

shapiro.test(DatasetA$StudyHours)
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349
shapiro.test(DatasetA$ExamScore)
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465

Because the Shapiro–Wilk test indicates that exam scores are not normally distributed (p = .006), a Spearman correlation will be used.
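
The decision rule applied here (use Spearman when either variable fails the Shapiro–Wilk test, otherwise Pearson) can be sketched as a small function. `choose_cor_method()` is a hypothetical helper, the 0.05 cutoff is the conventional threshold assumed in this report, and the two demo vectors are constructed quantiles rather than real data.

```r
# Hypothetical helper: pick the correlation method from two Shapiro-Wilk tests.
choose_cor_method <- function(x, y, alpha = 0.05) {
  p_x <- shapiro.test(x)$p.value
  p_y <- shapiro.test(y)$p.value
  # If either variable departs from normality, fall back to the rank-based test.
  if (p_x < alpha || p_y < alpha) "spearman" else "pearson"
}

# Deterministic demo vectors (assumptions): normal-shaped and strongly right-skewed.
normal_demo <- qnorm(ppoints(100))
skewed_demo <- qexp(ppoints(100))

method_demo <- choose_cor_method(normal_demo, skewed_demo)
method_demo
```

The returned string can be passed straight to `cor.test(x, y, method = method_demo)`.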

cor_test_A <- cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman")
## Warning in cor.test.default(DatasetA$StudyHours, DatasetA$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties
cor_test_A
## 
##  Spearman's rank correlation rho
## 
## data:  DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9008825

Correlation Analysis

The Spearman correlation analysis showed a statistically significant relationship between study hours and exam scores (p < .001). The association was positive, indicating that higher study hours were linked to higher exam scores. The relationship was strong, with a Spearman correlation coefficient of ρ = 0.90.
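
For readers unfamiliar with the coefficient, Spearman's ρ is the Pearson correlation computed on ranks, with tied values given their average rank. The sketch below illustrates this on simulated data with ties (not the report's data) and shows that passing `exact = FALSE` to `cor.test()` requests the approximate p-value directly, so the ties warning seen above is not raised.

```r
# Simulated data (assumption); rounding deliberately creates tied values.
set.seed(3)
hours <- round(rnorm(100, mean = 6, sd = 1.4), 1)
score <- round(80 + 2 * hours + rnorm(100, mean = 0, sd = 2))

rho_builtin <- cor(hours, score, method = "spearman")
rho_by_rank <- cor(rank(hours), rank(score))  # Pearson correlation on midranks

all.equal(rho_builtin, rho_by_rank)

# exact = FALSE asks for the approximate p-value, which is what R falls back to
# with ties anyway; the estimate itself is unchanged.
res <- cor.test(hours, score, method = "spearman", exact = FALSE)
res$estimate
res$p.value
```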

Scatterplots

ggscatter(DatasetA,
          x = "StudyHours",
          y = "ExamScore",
          add = "reg.line",
          xlab = "Study Hours",
          ylab = "Exam Score (%)",
          title = "Relationship Between Study Hours and Exam Score")
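
Since the analysis relies on a rank-based correlation, one optional variation is to annotate the plot with the Spearman coefficient and draw a loess smoother instead of a straight regression line. This is a sketch on simulated data using `ggscatter()` arguments `add = "loess"`, `cor.coef`, and `cor.method`; the data frame `demo` is hypothetical.

```r
library(ggpubr)

# Simulated monotonic but non-linear relationship (assumption, not the report's data).
set.seed(11)
demo <- data.frame(StudyHours = runif(100, min = 2, max = 10))
demo$ExamScore <- 100 - 40 * exp(-0.4 * demo$StudyHours) + rnorm(100, sd = 1.5)

p <- ggscatter(demo,
               x = "StudyHours",
               y = "ExamScore",
               add = "loess",            # smoother rather than a straight line
               cor.coef = TRUE,          # print the coefficient on the plot
               cor.method = "spearman",  # match the test used in the analysis
               xlab = "Study Hours",
               ylab = "Exam Score (%)")
p
```

A smoother makes a curved but still monotonic trend easier to see than a fitted straight line does.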

Interpretation:

Direction: The relationship between study hours and exam score is positive, meaning higher study hours are associated with higher exam scores.

Strength: The data indicate a strong association between the variables.

Linearity: The pattern follows a monotonic increasing trend, supporting the use of a Spearman correlation.

Outliers: No extreme outliers are present that would significantly affect the results.
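
The outlier judgment above is visual. One common way to back it up numerically is the 1.5 × IQR rule, sketched below; `iqr_outliers()` is a hypothetical helper, the multiplier 1.5 is a convention rather than something specified in this report, and the scores are made up for the example.

```r
# Hypothetical helper: return values lying more than k * IQR outside the quartiles.
iqr_outliers <- function(x, k = 1.5) {
  x <- x[!is.na(x)]
  q <- unname(quantile(x, c(0.25, 0.75)))
  fence <- k * (q[2] - q[1])
  x[x < q[1] - fence | x > q[2] + fence]
}

# Made-up scores with one clearly low value.
demo_scores <- c(88, 90, 91, 89, 92, 87, 90, 93, 45)
flagged <- iqr_outliers(demo_scores)
flagged   # only the value 45 falls outside the fences (83.5 to 95.5)
```

On the real data this would be called once per variable, for example `iqr_outliers(DatasetA$ExamScore)`.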

Question 2: What is the relationship between how much a person uses their phone (hours) and how much they sleep (hours)?

Variables: Screen Time (Independent Variable), Sleeping Hours (Dependent Variable)

Descriptive Statistics

mean(DatasetB$ScreenTime, na.rm = TRUE)
## [1] 5.063296
sd(DatasetB$ScreenTime, na.rm = TRUE)
## [1] 2.056833
mean(DatasetB$SleepingHours, na.rm = TRUE)
## [1] 6.938459
sd(DatasetB$SleepingHours, na.rm = TRUE)
## [1] 1.351332

Normality Check

Independent Variable Graph:

Skewness: Positively skewed

Kurtosis: Platykurtic (flatter than a normal curve)

hist(DatasetB$ScreenTime,
     main = "Screen Time",
     breaks = 20,
     col = "lightblue",
     xlab = "Independent Variable Graph: Screen Time",
     border = "lightyellow",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1) 

Dependent Variable Graph:

Skewness: Approximately symmetrical

Kurtosis: Approximately mesokurtic (bell-shaped)

hist(DatasetB$SleepingHours,
     main = "Sleeping Hours",
     breaks = 20,
     col = "lightblue",
     xlab = "Dependent Variable Graph: Sleeping Hours",
     border = "lightyellow",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1) 
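
A normal Q-Q plot is a common companion to the histogram when judging normality by eye: points that bend away from the reference line suggest skew or heavy tails. The sketch below uses base R `qqnorm()` and `qqline()` on simulated right-skewed values; `demo_hours` is a hypothetical vector, not the ScreenTime column.

```r
# Simulated right-skewed values (assumption, not the report's data).
set.seed(5)
demo_hours <- rexp(100, rate = 0.2)

# qqnorm() draws the plot and invisibly returns the plotted coordinates.
qq <- qqnorm(demo_hours, main = "Normal Q-Q Plot (simulated data)")
qqline(demo_hours, col = "red")  # reference line through the quartiles
```

For the real variables, `qqnorm(DatasetB$ScreenTime)` followed by `qqline(DatasetB$ScreenTime)` would produce the corresponding plot.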

Conducting Shapiro–Wilk tests

ScreenTime: p < .05, so the distribution departs significantly from normality.

SleepingHours: p ≥ .05, so there is no evidence of a departure from normality.

shapiro.test(DatasetB$ScreenTime)
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$ScreenTime
## W = 0.90278, p-value = 1.914e-06
shapiro.test(DatasetB$SleepingHours)
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$SleepingHours
## W = 0.98467, p-value = 0.3004

Decision: Because screen time was not normally distributed based on the Shapiro–Wilk test, a Spearman correlation will be used.

cor_test_B <- cor.test(DatasetB$ScreenTime, DatasetB$SleepingHours, method = "spearman")
cor_test_B
## 
##  Spearman's rank correlation rho
## 
## data:  DatasetB$ScreenTime and DatasetB$SleepingHours
## S = 259052, p-value = 3.521e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.5544674

Correlation Analysis

The Spearman correlation indicated a statistically significant association between screen time and sleeping hours (p < .001). The relationship was negative, meaning that individuals who spent more time on their phones generally reported fewer hours of sleep. The strength of the association was moderate, with a correlation coefficient of ρ = −0.55.
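
The report does not state the sample sizes, but they can be estimated from the printed output. As far as I know, `cor.test()` reports the Spearman statistic as S = (n³ − n)(1 − ρ) / 6, so n can be recovered from S and ρ. The sketch below assumes that formula; `implied_n()` is a hypothetical helper, and because S is printed rounded the result is an approximation.

```r
# Hypothetical helper: solve n^3 - n = 6 * S / (1 - rho) for n.
implied_n <- function(S, rho) {
  target <- 6 * S / (1 - rho)
  root <- uniroot(function(n) n^3 - n - target, interval = c(2, 1e4))$root
  round(root)
}

# Values copied from the two cor.test() outputs in this report.
n_A <- implied_n(S = 16518,  rho = 0.9008825)
n_B <- implied_n(S = 259052, rho = -0.5544674)
c(DatasetA = n_A, DatasetB = n_B)   # both come out at 100
```

Under that assumption, each analysis appears to rest on about 100 complete pairs; `sum(complete.cases(DatasetB[c("ScreenTime", "SleepingHours")]))` would confirm this directly.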

Scatterplots

ggscatter(DatasetB,
          x = "ScreenTime",
          y = "SleepingHours",
          add = "reg.line",
          xlab = "Screen Time (Hours)",
          ylab = "Sleeping Hours",
          title = "Relationship Between Screen Time and Sleeping Hours")

Interpretation:

Direction: The relationship between screen time and sleeping hours is negative, indicating that increased phone usage is associated with fewer hours of sleep.

Strength: The relationship is moderate in strength.

Linearity: The pattern appears monotonic, which is appropriate for a Spearman correlation.

Outliers: No extreme outliers are observed that would meaningfully influence the relationship.
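
The direction and strength labels used in both interpretations can be made explicit with a small helper. The cutoffs below (|ρ| ≥ .70 strong, ≥ .40 moderate, otherwise weak) are an assumption: they are one of several rule-of-thumb conventions, chosen here because they reproduce the labels used in this report. `strength_label()` and `direction_label()` are hypothetical names.

```r
# Hypothetical helpers; the cutoffs are a convention assumed for illustration.
strength_label <- function(rho) {
  a <- abs(rho)
  if (a >= 0.70) "strong" else if (a >= 0.40) "moderate" else "weak"
}

direction_label <- function(rho) {
  if (rho > 0) "positive" else if (rho < 0) "negative" else "none"
}

# Coefficients copied from the two analyses in this report.
label_A <- paste(strength_label(0.9008825),  direction_label(0.9008825))
label_B <- paste(strength_label(-0.5544674), direction_label(-0.5544674))
label_A   # "strong positive"
label_B   # "moderate negative"
```

Other conventions (for example Cohen's .10, .30, .50) would label ρ = −0.55 as large, so the thresholds used should be stated alongside the label.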