Loading Libraries

library(ggplot2)
library(ggpubr)
library(readxl)

Importing Datasets

DatasetA <- read_excel("DatasetA.xlsx")
DatasetB <- read_excel("DatasetB.xlsx")

Research Question 1: What is the relationship between study hours and exam score?

Variables: Study Hours (IV), Exam Score (DV)

Descriptive Statistics

mean(DatasetA$StudyHours, na.rm = TRUE)
## [1] 6.135609
sd(DatasetA$StudyHours, na.rm = TRUE)
## [1] 1.369224
mean(DatasetA$ExamScore, na.rm = TRUE)
## [1] 90.06906
sd(DatasetA$ExamScore, na.rm = TRUE)
## [1] 6.795224
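
The four calls above can also be condensed into a single summary table. The following is a minimal sketch on simulated data: the `demo` data frame, its values, and the helper `describe_var` are hypothetical stand-ins used for illustration, not the contents of DatasetA.

```r
# Hypothetical stand-in for DatasetA (simulated values, not the real data)
set.seed(123)
demo <- data.frame(
  StudyHours = rnorm(50, mean = 6, sd = 1.4),
  ExamScore  = rnorm(50, mean = 90, sd = 7)
)
demo$StudyHours[3] <- NA  # one missing value to show the effect of na.rm

# Mean, standard deviation, and number of non-missing values for one variable
describe_var <- function(v) {
  c(mean = mean(v, na.rm = TRUE),
    sd   = sd(v, na.rm = TRUE),
    n    = sum(!is.na(v)))
}

# One column per variable, one row per statistic
summary_tbl <- sapply(demo[c("StudyHours", "ExamScore")], describe_var)
print(round(summary_tbl, 2))
```

Reporting n next to the mean and SD makes it visible when `na.rm = TRUE` has dropped observations.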

Normality Check

Independent Variable Graph

Skewness: Symmetrical

Kurtosis: Mesokurtic (proper bell curve)

hist(DatasetA$StudyHours,
     main = "Study Hours",
     breaks = 20,
     xlab = "Independent Variable Graph: Study Hours",
     col = "lightblue",
     border = "blue",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

Dependent Variable Graph

Skewness: Negatively skewed

Kurtosis: Leptokurtic (too tall)

hist(DatasetA$ExamScore,
     main = "Exam Score",
     breaks = 20,
     col = "lightblue",
     xlab = "Dependent Variable Graph: Exam Score",
     border = "red",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
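
The skewness and kurtosis judgments above are read off the histograms. As a rough numeric cross-check, the moment-based formulas can be computed in base R. This is only a sketch: the helper names `skewness` and `excess_kurtosis` are defined here for illustration (they do not come from any of the loaded packages), and the example vectors are made up.

```r
# Moment-based sample skewness: 0 for symmetric data,
# negative for a longer left tail, positive for a longer right tail
skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Excess kurtosis: about 0 for a normal curve,
# positive when too peaked (heavy tails), negative when too flat
excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

# Made-up example vectors
symmetric   <- c(1, 2, 3, 4, 5)
left_skewed <- c(1, 8, 9, 9, 10, 10, 10)

skewness(symmetric)         # zero: perfectly symmetric
skewness(left_skewed)       # negative: tail points to the left
excess_kurtosis(symmetric)  # negative: flatter than a normal curve
```

The same functions could be applied to `DatasetA$ExamScore` to see whether the sign of the skewness agrees with the visual reading.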

Conducting Shapiro–Wilk tests

For StudyHours, p ≥ .05 (p = .935) → normality is not rejected, so the variable is treated as normal

For ExamScore, p < .05 (p = .006) → variable is not normally distributed

shapiro.test(DatasetA$StudyHours)
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349
shapiro.test(DatasetA$ExamScore)
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465

Decision: Because exam scores were not normally distributed based on the Shapiro–Wilk test, a Spearman correlation will be used.
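
The decision rule applied here can be written as a small helper. This is a sketch under the assumption that the rule is "use Pearson only when both variables pass the Shapiro–Wilk test at α = .05, otherwise use Spearman". The function name `choose_cor_method` and the example vectors are hypothetical.

```r
# Returns "pearson" only if neither variable departs from normality
# at the chosen alpha; otherwise returns "spearman"
choose_cor_method <- function(x, y, alpha = 0.05) {
  p_x <- shapiro.test(x)$p.value
  p_y <- shapiro.test(y)$p.value
  if (p_x >= alpha && p_y >= alpha) "pearson" else "spearman"
}

# Made-up vectors: evenly spaced normal quantiles and exponential quantiles
normal_like <- qnorm(ppoints(60))
skewed      <- qexp(ppoints(60))

choose_cor_method(normal_like, rev(normal_like))  # both look normal
choose_cor_method(normal_like, skewed)            # one variable is skewed
```

The returned string can be passed straight to the `method` argument of `cor.test()`.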

cor_test_A <- cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman")
## Warning in cor.test.default(DatasetA$StudyHours, DatasetA$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties
cor_test_A
## 
##  Spearman's rank correlation rho
## 
## data:  DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9008825
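
The warning in this output appears because tied values rule out the exact p-value calculation, so R falls back to an approximation. A minimal sketch with made-up vectors shows that passing `exact = FALSE` requests the approximation directly and avoids the warning, while the estimate itself is simply the Pearson correlation of the ranks.

```r
# Made-up vectors containing tied values (not the real data)
x <- c(1, 2, 2, 3, 4, 5, 5, 6)
y <- c(2, 3, 3, 5, 4, 7, 8, 8)

# Default call: ties prevent the exact p-value, so R raises a warning.
# tryCatch captures the warning text instead of the test result.
with_warning <- tryCatch(
  cor.test(x, y, method = "spearman"),
  warning = function(w) conditionMessage(w)
)
print(with_warning)

# exact = FALSE asks for the approximate p-value up front, so no warning
res <- cor.test(x, y, method = "spearman", exact = FALSE)

# rho equals the Pearson correlation computed on the ranks
res$estimate
cor(rank(x), rank(y))
```

The warning therefore does not invalidate the result above; it only signals that the reported p-value is approximate.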

Correlation Analysis

Statistical Significance: The results were statistically significant, as the p-value was less than .05 (p < .001).

Direction of the Relationship: The relationship between study hours and exam score was positive. This means that as the number of hours students study increases, exam scores also tend to increase.

Strength of the Relationship: The relationship was very strong. The Spearman correlation coefficient was ρ = 0.90, which indicates a very strong positive association between study time and exam performance.

Scatterplots

ggscatter(DatasetA,
          x = "StudyHours",
          y = "ExamScore",
          add = "reg.line",
          conf.int = TRUE,
          xlab = "Study Hours",
          ylab = "Exam Score (%)",
          title = "Relationship Between Study Hours and Exam Score")

Research Question 2: What is the relationship between phone screen time (hours) and sleep duration (hours)?

Variables: Screen Time (IV), Sleeping Hours (DV)

Descriptive Statistics

mean(DatasetB$ScreenTime, na.rm = TRUE)
## [1] 5.063296
sd(DatasetB$ScreenTime, na.rm = TRUE)
## [1] 2.056833
mean(DatasetB$SleepingHours, na.rm = TRUE)
## [1] 6.938459
sd(DatasetB$SleepingHours, na.rm = TRUE)
## [1] 1.351332

Normality Check

Independent Variable Graph

Skewness: Positively skewed

Kurtosis: Platykurtic (too flat)

hist(DatasetB$ScreenTime,
     main = "Screen Time",
     breaks = 20,
     col = "lightblue",
     xlab = "Independent Variable Graph: Screen Time",
     border = "red",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

Dependent Variable Graph

Skewness: Symmetrical

Kurtosis: Mesokurtic (proper bell curve)

hist(DatasetB$SleepingHours,
     main = "Sleeping Hours",
     breaks = 20,
     col = "lightblue",
     xlab = "Dependent Variable Graph: Sleeping Hours",
     border = "blue",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

Conducting Shapiro–Wilk tests

For ScreenTime, p < .05 (p < .001) → variable is not normally distributed

For SleepingHours, p ≥ .05 (p = .300) → normality is not rejected, so the variable is treated as normal

shapiro.test(DatasetB$ScreenTime)
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$ScreenTime
## W = 0.90278, p-value = 1.914e-06
shapiro.test(DatasetB$SleepingHours)
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$SleepingHours
## W = 0.98467, p-value = 0.3004

Decision: Because screen time was not normally distributed based on the Shapiro–Wilk test, a Spearman correlation will be used.

cor_test_B <- cor.test(DatasetB$ScreenTime, DatasetB$SleepingHours, method = "spearman")
cor_test_B
## 
##  Spearman's rank correlation rho
## 
## data:  DatasetB$ScreenTime and DatasetB$SleepingHours
## S = 259052, p-value = 3.521e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.5544674
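
The S statistic and ρ in this output are two views of the same quantity: R derives S from ρ as S = (n³ − n)(1 − ρ)/6. The sketch below rearranges that relation to recover ρ from the reported S. The sample size n = 100 is an inference from the reported numbers, not a value stated anywhere in this report.

```r
# Reported S statistic from the output above
S <- 259052
n <- 100  # assumption: inferred from S and rho, not stated in the report

# Rearranged from S = (n^3 - n) * (1 - rho) / 6
rho_from_S <- 1 - 6 * S / (n^3 - n)
round(rho_from_S, 4)  # -0.5545, matching the reported rho
```

A large S relative to (n³ − n)/6 therefore signals a negative correlation, and a small S a positive one.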

Correlation Analysis

Statistical Significance: The results were statistically significant, as the p-value was less than .05 (p < .001).

Direction of the Relationship: The relationship between screen time and sleeping hours was negative. This indicates that as screen time increases, the amount of sleep tends to decrease.

Strength of the Relationship: The relationship was moderate in strength. The Spearman correlation coefficient was ρ = −0.55, which reflects a moderate negative association between phone use and sleep duration.
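
The verbal labels used for direction and strength can be made explicit with a small helper. Cutoffs for "weak", "moderate", and "strong" differ between textbooks, so the values 0.30 and 0.70 below are assumptions, and the function name `describe_correlation` is hypothetical. Treat this as a sketch of the interpretation step, not a standard.

```r
# Hypothetical cutoffs; adjust them to the convention your course uses
describe_correlation <- function(rho, weak_below = 0.30, strong_from = 0.70) {
  size <- abs(rho)
  strength <- if (size < weak_below) {
    "weak"
  } else if (size < strong_from) {
    "moderate"
  } else {
    "strong"
  }
  direction <- if (rho > 0) {
    "positive"
  } else if (rho < 0) {
    "negative"
  } else {
    "no direction"
  }
  paste(strength, direction)
}

describe_correlation(0.9008825)   # coefficient from Research Question 1
describe_correlation(-0.5544674)  # coefficient from Research Question 2
```

Stating the cutoffs alongside the label keeps the interpretation reproducible when a different convention is required.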

Scatterplots

ggscatter(DatasetB,
          x = "ScreenTime",
          y = "SleepingHours",
          add = "reg.line",
          conf.int = TRUE,
          xlab = "Screen Time (Hours)",
          ylab = "Sleeping Hours",
          title = "Relationship Between Screen Time and Sleeping Hours")