RPubs Link:
https://rpubs.com/MianAfzaalZahoor/1392723
library(ggplot2)
library(ggpubr)
library(readxl)
DatasetA <- read_excel("DatasetA.xlsx")
DatasetB <- read_excel("DatasetB.xlsx")
Variables: Study Hours (IV), Exam Score (DV)
mean(DatasetA$StudyHours, na.rm = TRUE)
## [1] 6.135609
sd(DatasetA$StudyHours, na.rm = TRUE)
## [1] 1.369224
mean(DatasetA$ExamScore, na.rm = TRUE)
## [1] 90.06906
sd(DatasetA$ExamScore, na.rm = TRUE)
## [1] 6.795224
hist(DatasetA$StudyHours,
     main = "Study Hours",
     breaks = 20,
     xlab = "Independent Variable Graph: Study Hours",
     col = "lightblue",
     border = "blue",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
Independent Variable Graph:
Skewness: Approximately symmetrical
Kurtosis: Mesokurtic (approximately normal bell shape)
hist(DatasetA$ExamScore,
     main = "Exam Score",
     breaks = 20,
     col = "lightblue",
     xlab = "Dependent Variable Graph: Exam Score",
     border = "red",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
Dependent Variable Graph:
Skewness: Negatively skewed
Kurtosis: Leptokurtic (peak taller than a normal bell curve)
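The skewness and kurtosis labels above come from reading the histograms by eye. A numeric check can back up that reading. The sketch below uses base R only and simple moment-based formulas; the function names `sample_skewness` and `sample_excess_kurtosis` are illustrative, and the `scores` vector is simulated stand-in data, not the values in DatasetA.

```r
# Moment-based sample skewness: 0 for a symmetric sample,
# negative when the longer tail is on the left.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  m2 <- mean((x - m)^2)
  m3 <- mean((x - m)^3)
  m3 / m2^1.5
}

# Moment-based excess kurtosis: about 0 for a normal shape,
# positive for a taller peak / heavier tails, negative for a flatter shape.
sample_excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  m2 <- mean((x - m)^2)
  m4 <- mean((x - m)^4)
  m4 / m2^2 - 3
}

# Simulated, left-skewed stand-in for a score variable (NOT the real data).
set.seed(1)
scores <- 100 - rexp(200, rate = 1 / 5)

sample_skewness(scores)         # expected to be negative for this sample
sample_excess_kurtosis(scores)  # expected to be positive for this sample
```

With the real data, the same functions could be called on `DatasetA$StudyHours` and `DatasetA$ExamScore` to see whether the numbers agree with the visual labels.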
shapiro.test(DatasetA$StudyHours)
##
## Shapiro-Wilk normality test
##
## data: DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349
shapiro.test(DatasetA$ExamScore)
##
## Shapiro-Wilk normality test
##
## data: DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465
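For StudyHours, p = .935 (p ≥ .05) → no significant departure from normality, so the variable is treated as normal
For ExamScore, p = .006 (p < .05) → the variable is not normally distributed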
Decision: Because exam scores were not normally distributed, a Spearman correlation will be used.
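The decision rule used here (Pearson only if both variables pass the Shapiro-Wilk test, otherwise Spearman) can be written down explicitly. This is a minimal sketch of that rule: `method_from_p` is a hypothetical helper name, and `alpha = 0.05` is the cutoff the report applies. The p-values passed in are the ones printed in the test output.

```r
# Pick a correlation method from Shapiro-Wilk p-values:
# "pearson" only when every variable has p >= alpha, otherwise "spearman".
method_from_p <- function(p_values, alpha = 0.05) {
  if (all(p_values >= alpha)) "pearson" else "spearman"
}

# Dataset A: StudyHours passes, ExamScore does not.
method_from_p(c(StudyHours = 0.9349, ExamScore = 0.006465))      # "spearman"

# Dataset B: ScreenTime does not pass, SleepingHours does.
method_from_p(c(ScreenTime = 1.914e-06, SleepingHours = 0.3004)) # "spearman"
```

The returned string can be passed straight to the `method` argument of `cor.test()`.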
cor_test_A <- cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman")
## Warning in cor.test.default(DatasetA$StudyHours, DatasetA$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties
cor_test_A
##
## Spearman's rank correlation rho
##
## data: DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.9008825
Statistical Significance: The results were statistically significant, as the p-value was less than .05 (p < .001).
Direction of the Relationship: The relationship between study hours and exam score was positive. This means that as the number of hours students study increases, exam scores also tend to increase.
Strength of the Relationship: The relationship was strong. The Spearman correlation coefficient was ρ = 0.90, which indicates a strong positive association between study hours and exam score.
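The "Cannot compute exact p-value with ties" warning printed with the Dataset A test appears because some values are tied, so R falls back to an asymptotic p-value. One way to make that choice explicit is to pass `exact = FALSE`, which is a documented argument of `cor.test()`. The sketch below uses small made-up vectors with ties, not the study data.

```r
# Small made-up vectors containing tied values (NOT the study data).
x <- c(1, 2, 2, 3, 4, 5, 5, 6)
y <- c(2, 1, 3, 3, 5, 4, 6, 7)

# exact = FALSE requests the asymptotic p-value up front, so the
# ties warning is not raised.
res_asymptotic <- cor.test(x, y, method = "spearman", exact = FALSE)

res_asymptotic$estimate  # rho; the estimate itself does not depend on exact
res_asymptotic$p.value
```

For the report, the equivalent call would be `cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman", exact = FALSE)`.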
ggscatter(DatasetA,
          x = "StudyHours",
          y = "ExamScore",
          add = "reg.line",
          conf.int = TRUE,
          xlab = "Study Hours",
          ylab = "Exam Score (%)",
          title = "Relationship Between Study Hours and Exam Score")
Direction: The line of best fit slopes upward, indicating a positive relationship. As study hours increase, exam scores tend to increase.
Strength: The points closely hug the line of best fit, indicating a strong relationship.
Linearity: The points form a clear straight-line pattern, showing the relationship is linear.
Outliers: There are no extreme points far away from the main cluster, so no serious outliers are evident.
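The outlier judgment above is visual. A simple numeric supplement is to flag values outside the usual boxplot fences for each variable separately. This is only a sketch: `flag_outliers` is a hypothetical helper, the multiplier `k = 1.5` is the conventional boxplot rule rather than something the report used, and the example vector is made up.

```r
# Flag values more than k * IQR below the first quartile
# or above the third quartile.
flag_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  x < q[[1]] - k * iqr | x > q[[2]] + k * iqr
}

# Made-up example: ten ordinary values and one extreme value.
example_values <- c(1:10, 100)
which(flag_outliers(example_values))  # 11
```

Applied to the report's data, `sum(flag_outliers(DatasetA$ExamScore), na.rm = TRUE)` would count the flagged exam scores. A per-variable check like this does not catch points that are unusual only in combination, so it complements the scatterplot rather than replacing it.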
Variables: Screen Time (IV), Sleeping Hours (DV)
mean(DatasetB$ScreenTime, na.rm = TRUE)
## [1] 5.063296
sd(DatasetB$ScreenTime, na.rm = TRUE)
## [1] 2.056833
mean(DatasetB$SleepingHours, na.rm = TRUE)
## [1] 6.938459
sd(DatasetB$SleepingHours, na.rm = TRUE)
## [1] 1.351332
hist(DatasetB$ScreenTime,
     main = "Screen Time",
     breaks = 20,
     col = "lightblue",
     xlab = "Independent Variable Graph: Screen Time",
     border = "red",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
Independent Variable Graph:
Skewness: Positively skewed
Kurtosis: Platykurtic (flatter than a normal bell curve)
hist(DatasetB$SleepingHours,
     main = "Sleeping Hours",
     breaks = 20,
     col = "lightblue",
     xlab = "Dependent Variable Graph: Sleeping Hours",
     border = "blue",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
Dependent Variable Graph:
Skewness: Approximately symmetrical
Kurtosis: Mesokurtic (approximately normal bell shape)
shapiro.test(DatasetB$ScreenTime)
##
## Shapiro-Wilk normality test
##
## data: DatasetB$ScreenTime
## W = 0.90278, p-value = 1.914e-06
shapiro.test(DatasetB$SleepingHours)
##
## Shapiro-Wilk normality test
##
## data: DatasetB$SleepingHours
## W = 0.98467, p-value = 0.3004
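For ScreenTime, p < .001 (p < .05) → the variable is not normally distributed
For SleepingHours, p = .300 (p ≥ .05) → no significant departure from normality, so the variable is treated as normal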
Decision: Because screen time was not normally distributed based on the Shapiro–Wilk test, a Spearman correlation will be used.
cor_test_B <- cor.test(DatasetB$ScreenTime, DatasetB$SleepingHours, method = "spearman")
cor_test_B
##
## Spearman's rank correlation rho
##
## data: DatasetB$ScreenTime and DatasetB$SleepingHours
## S = 259052, p-value = 3.521e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.5544674
Statistical Significance: The results were statistically significant, as the p-value was less than .05 (p < .001).
Direction of the Relationship: The relationship between screen time and sleeping hours was negative. This indicates that as screen time increases, the amount of sleep tends to decrease.
Strength of the Relationship: The relationship was moderate in strength. The Spearman correlation coefficient was ρ = −0.55, which reflects a moderate negative association between screen time and sleeping hours.
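The verbal strength labels in this report follow from the size of |ρ|. The sketch below makes one possible labelling rule explicit. The cutoffs 0.3 and 0.7 are an assumed rule of thumb, not something stated in the report, and other conventions draw the lines differently; `describe_strength` is a hypothetical helper name.

```r
# Map a correlation coefficient to a verbal label using its absolute value.
# The default cutoffs are an assumed rule of thumb.
describe_strength <- function(rho, weak_below = 0.3, strong_from = 0.7) {
  size <- abs(rho)
  if (size < weak_below) {
    "weak"
  } else if (size < strong_from) {
    "moderate"
  } else {
    "strong"
  }
}

describe_strength(0.9008825)   # "strong"   (Dataset A rho)
describe_strength(-0.5544674)  # "moderate" (Dataset B rho)
```

Under these assumed cutoffs the labels match the interpretations given for both datasets.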
ggscatter(DatasetB,
          x = "ScreenTime",
          y = "SleepingHours",
          add = "reg.line",
          conf.int = TRUE,
          xlab = "Screen Time (Hours)",
          ylab = "Sleeping Hours",
          title = "Relationship Between Screen Time and Sleeping Hours")
Direction: The line of best fit slopes downward, indicating a negative relationship. As screen time increases, sleeping hours tend to decrease.
Strength: The points cluster moderately around the line, indicating a moderate relationship, consistent with ρ = −0.55.
Linearity: The points follow a generally straight-line pattern, indicating the relationship is monotonic and approximately linear, which supports using a Spearman correlation.
Outliers: A few points may appear slightly distant, but none are extreme enough to significantly distort the relationship.
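The linearity note distinguishes a monotonic pattern from a strictly linear one. Spearman's coefficient only needs the former, while Pearson's measures the latter. The sketch below illustrates the difference with made-up data that rise steadily but along a curve. These values are illustrative only and are unrelated to DatasetA or DatasetB.

```r
# Made-up data: y always increases with x, but along a curve.
x <- 1:50
y <- exp(x / 10)

rho_spearman <- cor(x, y, method = "spearman")  # rank-based
r_pearson    <- cor(x, y, method = "pearson")   # linear

rho_spearman  # 1, because the ranks of x and y agree perfectly
r_pearson     # below 1, because the trend is curved rather than straight
```

Running the same two calls on the report's variables would show how close the Pearson and Spearman values are. A large gap would suggest a monotonic but non-linear pattern.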