library(readxl)
library(ggpubr)
## Loading required package: ggplot2
DatasetA <- read_excel("/Users/asfia/Desktop/DatasetA.xlsx")
DatasetB <- read_excel("/Users/asfia/Desktop/DatasetB.xlsx")
# Running the stats
mean(DatasetA$StudyHours); sd(DatasetA$StudyHours)
## [1] 6.135609
## [1] 1.369224
mean(DatasetA$ExamScore); sd(DatasetA$ExamScore)
## [1] 90.06906
## [1] 6.795224
mean(DatasetB$ScreenTime); sd(DatasetB$ScreenTime)
## [1] 5.063296
## [1] 2.056833
mean(DatasetB$SleepingHours); sd(DatasetB$SleepingHours)
## [1] 6.938459
## [1] 1.351332
Interpretation: The descriptive statistics give the average (mean, \(M\)) and spread (standard deviation, \(SD\)) of each variable. For Dataset A, the average study time was approximately 6 hours (\(M = 6.14, SD = 1.37\)), and the average exam score was high, at about 90 (\(M = 90.07, SD = 6.80\)). For Dataset B, the average daily screen time was about 5 hours (\(M = 5.06, SD = 2.06\)), with an average of about 7 hours of sleep (\(M = 6.94, SD = 1.35\)).
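The four `mean(); sd()` calls above could also be collapsed into one helper that reports both statistics for every numeric column. The following is only a sketch: `describe_columns` and the simulated `demo` data frame are hypothetical names introduced for illustration, not part of the original analysis.

```r
# Hypothetical stand-in for DatasetA; the real data come from the Excel file.
set.seed(1)
demo <- data.frame(
  StudyHours = rnorm(100, mean = 6, sd = 1.4),
  ExamScore  = rnorm(100, mean = 90, sd = 7)
)

# Return the mean and SD of every numeric column, rounded for reporting.
describe_columns <- function(df, digits = 2) {
  numeric_cols <- df[sapply(df, is.numeric)]
  data.frame(
    variable = names(numeric_cols),
    M  = round(sapply(numeric_cols, mean, na.rm = TRUE), digits),
    SD = round(sapply(numeric_cols, sd, na.rm = TRUE), digits),
    row.names = NULL
  )
}

describe_columns(demo)
```

Copying the reported values from a single table like this makes it harder for the written interpretation to drift from the console output.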
shapiro.test(DatasetA$StudyHours)
##
## Shapiro-Wilk normality test
##
## data: DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349
shapiro.test(DatasetA$ExamScore)
##
## Shapiro-Wilk normality test
##
## data: DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465
shapiro.test(DatasetB$ScreenTime)
##
## Shapiro-Wilk normality test
##
## data: DatasetB$ScreenTime
## W = 0.90278, p-value = 1.914e-06
shapiro.test(DatasetB$SleepingHours)
##
## Shapiro-Wilk normality test
##
## data: DatasetB$SleepingHours
## W = 0.98467, p-value = 0.3004
Interpretation of Normality Tests: To check whether each variable follows a normal distribution, we ran Shapiro-Wilk tests. In Dataset A, Study Hours did not deviate from normality (\(p = 0.935\)), but Exam Score did (\(p = 0.006\)). In Dataset B, Sleeping Hours did not deviate from normality (\(p = 0.300\)), but Screen Time did (\(p < 0.001\)). According to the assignment guidelines, because at least one variable in each dataset is not normally distributed (\(p < .05\)), we use the Spearman rank correlation instead of a Pearson correlation.
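The decision rule described above (use Spearman if either variable fails the Shapiro-Wilk test) can be written down as a small function. This is a minimal sketch under that rule only: `choose_cor_method` and the simulated vectors are hypothetical and not part of the original analysis.

```r
# Pick a correlation method from Shapiro-Wilk results on both variables.
# Assumed rule: if either variable has p < alpha, use Spearman.
choose_cor_method <- function(x, y, alpha = 0.05) {
  p_x <- shapiro.test(x)$p.value
  p_y <- shapiro.test(y)$p.value
  if (p_x < alpha || p_y < alpha) "spearman" else "pearson"
}

# Simulated illustration: one roughly normal variable, one skewed variable.
set.seed(42)
normal_x <- rnorm(80)
skewed_y <- rexp(80)

method <- choose_cor_method(normal_x, skewed_y)
cor.test(normal_x, skewed_y, method = method)
```

With the real data, the call would look like `choose_cor_method(DatasetA$StudyHours, DatasetA$ExamScore)`.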
# Histograms for Dataset A
hist(DatasetA$StudyHours, main="Histogram of Study Hours", col="lightblue", breaks=20)
hist(DatasetA$ExamScore, main="Histogram of Exam Scores", col="lightgreen", breaks=20)
# Histograms for Dataset B
hist(DatasetB$ScreenTime, main="Histogram of Screen Time", col="pink", breaks=20)
hist(DatasetB$SleepingHours, main="Histogram of Sleeping Hours", col="lightyellow", breaks=20)
Interpretation of Histograms: The histograms provide a visual check of the data distribution.
Dataset A: The histogram for Study Hours shows a relatively balanced spread, but the Exam Score histogram is “left-skewed,” meaning most students scored very high, creating a tail toward the lower scores.
Dataset B: The Screen Time histogram is visibly “right-skewed” (lopsided), confirming the statistical finding that this data is not normally distributed. The Sleeping Hours histogram appears more symmetrical, resembling a bell curve.
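The visual reading of skew can be backed by a number. The sketch below computes the moment-based sample skewness in base R: negative values indicate a left skew, positive values a right skew, and values near zero a roughly symmetric distribution. `sample_skewness` and the simulated vector are hypothetical, introduced here for illustration.

```r
# Moment-based sample skewness: third central moment divided by the
# second central moment raised to the power 1.5.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Simulated right-skewed data as a stand-in for a variable like ScreenTime.
set.seed(3)
right_skewed <- rexp(100, rate = 0.2)
sample_skewness(right_skewed)
```

On the real data this would be called as, for example, `sample_skewness(DatasetB$ScreenTime)`.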
# Spearman used because at least one variable per dataset had p < .05 in the Shapiro-Wilk tests
cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman")
## Warning in cor.test.default(DatasetA$StudyHours, DatasetA$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.9008825
cor.test(DatasetB$ScreenTime, DatasetB$SleepingHours, method = "spearman")
##
## Spearman's rank correlation rho
##
## data: DatasetB$ScreenTime and DatasetB$SleepingHours
## S = 259052, p-value = 3.521e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.5544674
# Visualizing the relationships
ggscatter(DatasetA, x = "StudyHours", y = "ExamScore", add = "reg.line",
xlab = "Study Hours", ylab = "Exam Score")
ggscatter(DatasetB, x = "ScreenTime", y = "SleepingHours", add = "reg.line",
xlab = "Screen Time", ylab = "Sleeping Hours")
Interpretation:
Dataset A: There is a strong positive correlation (\(\rho = 0.90, p < .001\)) between study hours and exam scores. The scatterplot shows a clear upward trend: students who study more tend to score higher.
Dataset B: There is a moderate negative correlation (\(\rho = -0.55, p < .001\)) between screen time and sleeping hours. The scatterplot shows a downward trend, indicating that higher screen time is generally associated with fewer hours of sleep.
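To make the meaning of \(\rho\) concrete: Spearman's coefficient is the Pearson correlation computed on the ranks of the data, which is why it depends only on ordering and not on the shape of each distribution. The sketch below checks this on simulated data. The vectors are hypothetical stand-ins, and rounding is used deliberately to create ties like those behind the warning reported for Dataset A.

```r
# Simulated stand-ins for study hours and exam scores.
set.seed(7)
x <- round(rnorm(60, mean = 6, sd = 1.4), 1)  # rounding creates tied values
y <- 60 + 5 * x + rnorm(60, sd = 3)

# Spearman's rho equals Pearson's r on the (average) ranks.
rho_spearman <- cor(x, y, method = "spearman")
rho_by_ranks <- cor(rank(x), rank(y), method = "pearson")
all.equal(rho_spearman, rho_by_ranks)

# With ties, an exact p-value is unavailable; exact = FALSE requests the
# asymptotic approximation directly, so no ties warning is raised.
cor.test(x, y, method = "spearman", exact = FALSE)
```

The ties warning in the Dataset A output does not invalidate the result; R falls back to the same approximation automatically.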