# Loading required packages
library(readxl)
library(ggpubr)
## Loading required package: ggplot2
# Importing datasets from the computer
DatasetA <- read_excel("/Users/asfia/Desktop/DatasetA.xlsx")
DatasetB <- read_excel("/Users/asfia/Desktop/DatasetB.xlsx")
# Independent Variable: Study Hours | Dependent Variable: Exam Score
mean(DatasetA$StudyHours); sd(DatasetA$StudyHours)
## [1] 6.135609
## [1] 1.369224
mean(DatasetA$ExamScore); sd(DatasetA$ExamScore)
## [1] 90.06906
## [1] 6.795224
Interpretation: The average study time was \(M = 6.14, SD = 1.37\). The average exam score was \(M = 90.07, SD = 6.80\).
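The reported values come from rounding the raw console output to two decimals. A minimal sketch of that formatting step, assuming a hypothetical helper `report_m_sd()` and a small made-up vector (neither is part of the original analysis):

```r
# Hypothetical helper: format a mean and SD to two decimals for reporting.
report_m_sd <- function(x) {
  sprintf("M = %.2f, SD = %.2f", mean(x, na.rm = TRUE), sd(x, na.rm = TRUE))
}

# Made-up illustration data, not the values stored in DatasetA.
hours <- c(5, 6, 7, 6, 8)
report_m_sd(hours)

# Applying the same rounding to the StudyHours output printed above.
sprintf("M = %.2f, SD = %.2f", 6.135609, 1.369224)
```

Formatting from the computed values, rather than retyping them, helps keep the write-up in step with the output.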
For StudyHours, \(p \geq .05\) → variable is normal.
For ExamScore, \(p < .05\) → variable is not normal.
shapiro.test(DatasetA$StudyHours)
##
## Shapiro-Wilk normality test
##
## data: DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349
shapiro.test(DatasetA$ExamScore)
##
## Shapiro-Wilk normality test
##
## data: DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465
hist(DatasetA$StudyHours, main="Histogram of Study Hours", col="lightblue", breaks=20)
hist(DatasetA$ExamScore, main="Histogram of Exam Scores", col="lightgreen", breaks=20)
Decision: Because exam scores were not normally
distributed based on the Shapiro–Wilk test (\(p = .006\)), a Spearman
correlation will be used. The variable “StudyHours” appears
normally distributed: the histogram looks symmetrical, with most of the
data clustered in the centre at around 5-7 hours, and it has a proper
bell shape that is neither excessively flat nor excessively peaked.
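The decision rule above can be sketched as a small function. This is only an illustration: the helper name `method_from_p()` is hypothetical, and it assumes that a Pearson correlation would be chosen when both variables pass the Shapiro-Wilk test, which the report implies but does not state.

```r
# Hypothetical helper: choose a correlation method from two Shapiro-Wilk p-values.
# Assumption: Pearson is used only when neither variable departs from normality.
method_from_p <- function(p_x, p_y, alpha = 0.05) {
  if (p_x >= alpha && p_y >= alpha) "pearson" else "spearman"
}

# p-values copied from the Shapiro-Wilk output for StudyHours and ExamScore.
method_from_p(p_x = 0.9349, p_y = 0.006465)
```

In practice the p-values could be taken directly from `shapiro.test(x)$p.value` instead of being typed in.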
cor_test_A <- cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman")
## Warning in cor.test.default(DatasetA$StudyHours, DatasetA$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties
cor_test_A
##
## Spearman's rank correlation rho
##
## data: DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.9008825
ggscatter(DatasetA, x = "StudyHours", y = "ExamScore", add = "reg.line",
xlab = "Study Hours", ylab = "Exam Score", title = "Study vs Exam")
Results: The independent variable, study hours (M = 6.14, SD = 1.37), was correlated with the dependent variable, exam score (M = 90.07, SD = 6.80), ρ(98) = .90, p < .001. The relationship was positive and strong. As study hours increased, exam scores increased.
Interpretation: The Spearman correlation was selected because the variable “ExamScore” was not normally distributed according to the histogram and the Shapiro-Wilk test (p < .05). The p-value is < 2.2e-16, which is well below .05, so the result is statistically significant and the alternative hypothesis is supported. The rho-value is 0.90. The correlation is positive, which means that as study hours increase, exam scores also increase. The correlation value is greater than 0.50, which means the relationship is strong.
Note: The data contained “ties” (duplicate values), which prevents R from computing an exact p-value, so R issued a warning and reported an approximate p-value instead. Adding the argument “exact = FALSE” to the call requests the approximation explicitly and avoids the warning.
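To illustrate how the reported quantities fit together, the sketch below uses simulated data (not DatasetA; all variable names and parameters are made up) to show the `exact = FALSE` argument, the fact that Spearman's rho is Pearson's r computed on ranks, and the relation between the `S` statistic and rho.

```r
set.seed(1)  # simulated data for illustration only, not DatasetA
n <- 100
study_hours <- round(rnorm(n, mean = 6, sd = 1.4), 1)          # rounding creates ties
exam_score  <- round(60 + 5 * study_hours + rnorm(n, sd = 3))

# exact = FALSE requests the approximate p-value up front.
res <- cor.test(study_hours, exam_score, method = "spearman", exact = FALSE)

# Spearman's rho equals Pearson's r computed on the ranks.
rho_from_ranks <- cor(rank(study_hours), rank(exam_score))
all.equal(unname(res$estimate), rho_from_ranks)

# The S statistic printed by cor.test is (n^3 - n) * (1 - rho) / 6.
all.equal(unname(res$statistic), (n^3 - n) * (1 - rho_from_ranks) / 6)

# Degrees of freedom reported alongside rho are n - 2.
n - 2
```

Applied to the output above, S = 16518 and rho = 0.9008825 are consistent with n = 100, which matches the 98 degrees of freedom in the write-up.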
DatasetB <- read_excel("/Users/asfia/Desktop/DatasetB.xlsx")
# Independent Variable: Screen Time | Dependent Variable: Sleeping Hours
mean(DatasetB$ScreenTime); sd(DatasetB$ScreenTime)
## [1] 5.063296
## [1] 2.056833
mean(DatasetB$SleepingHours); sd(DatasetB$SleepingHours)
## [1] 6.938459
## [1] 1.351332
Interpretation: The average screen time was \(M = 5.06, SD = 2.06\). The average sleep time was \(M = 6.94, SD = 1.35\).
For ScreenTime, \(p < .05\) → variable is not normal.
For SleepingHours, \(p \geq .05\) → variable is normal.
shapiro.test(DatasetB$ScreenTime)
##
## Shapiro-Wilk normality test
##
## data: DatasetB$ScreenTime
## W = 0.90278, p-value = 1.914e-06
shapiro.test(DatasetB$SleepingHours)
##
## Shapiro-Wilk normality test
##
## data: DatasetB$SleepingHours
## W = 0.98467, p-value = 0.3004
hist(DatasetB$ScreenTime, main="Histogram of Screen Time", col="pink", breaks=20)
hist(DatasetB$SleepingHours, main="Histogram of Sleeping Hours", col="lightyellow", breaks=20)
Decision: Because screen time was not normally
distributed based on the Shapiro–Wilk test (\(p < .001\)), a Spearman
correlation will be used. The histogram shows that sleeping
hours are approximately normally distributed, with a symmetrical shape
and most values centered around the middle.
cor_test_B <- cor.test(DatasetB$ScreenTime, DatasetB$SleepingHours, method = "spearman")
cor_test_B
##
## Spearman's rank correlation rho
##
## data: DatasetB$ScreenTime and DatasetB$SleepingHours
## S = 259052, p-value = 3.521e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.5544674
ggscatter(DatasetB, x = "ScreenTime", y = "SleepingHours", add = "reg.line",
xlab = "Screen Time", ylab = "Sleeping Hours", title = "Phone vs Sleep")
Interpretation:
* Direction: The line of best fit slopes downward, indicating a negative relationship. As screen time increases, sleeping hours decrease.
* Strength: The points cluster moderately around the line, indicating a moderate to strong relationship.
* Linearity: The points follow a generally straight-line pattern, indicating the relationship is monotonic and approximately linear, which supports using a Spearman correlation.
* Outliers: A few points appear slightly distant from the line, but none are extreme enough to distort the relationship substantially.
The Spearman correlation was selected because the variable “ScreenTime” was not normally distributed according to the histogram and the Shapiro-Wilk test (p < .05). The p-value is < .001, which is well below .05, so the result is statistically significant and the alternative hypothesis is supported. The rho-value is -0.55. The correlation is negative, which means that as screen time increases, sleeping hours decrease. The absolute value of the correlation is greater than 0.50, which means the relationship is strong.
Note: Unlike the Dataset A analysis, the output for this call shows no warning about “ties” (duplicate values), so the argument “exact = FALSE” is not needed in the call shown.
Results: The independent variable, screen time (M = 5.06, SD = 2.06), was correlated with the dependent variable, sleeping hours (M = 6.94, SD = 1.35), ρ(98) = −.55, p < .001. The relationship was negative and strong. As screen time increased, sleeping hours decreased.
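The direction and strength wording used in both write-ups can be derived from rho with a small helper. This is a sketch only: `describe_rho()` is a hypothetical function, and it encodes just the one cutoff this report states (an absolute value above 0.50 counts as strong).

```r
# Hypothetical helper applying the report's rule: |rho| > 0.50 is "strong".
describe_rho <- function(rho, strong_cutoff = 0.50) {
  direction <- if (rho > 0) "positive" else if (rho < 0) "negative" else "none"
  list(direction = direction, strong = abs(rho) > strong_cutoff)
}

describe_rho(0.9008825)    # rho from the Dataset A output
describe_rho(-0.5544674)   # rho from the Dataset B output
```

Deriving the labels from the estimate helps keep the direction and strength statements consistent across sections.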