Research Question 1: Study Hours and Exam Scores

Part 1: Descriptive Statistics (Dataset A)

library(readxl)   # provides read_excel()
library(ggpubr)   # provides ggscatter()
DatasetA <- read_excel("/Users/asfia/Desktop/DatasetA.xlsx")
# Independent Variable: Study Hours | Dependent Variable: Exam Score
mean(DatasetA$StudyHours); sd(DatasetA$StudyHours)
## [1] 6.135609
## [1] 1.369224
mean(DatasetA$ExamScore); sd(DatasetA$ExamScore)
## [1] 90.06906
## [1] 6.795224

Interpretation: The average study time was \(M = 6.14, SD = 1.37\). The average exam score was \(M = 90.07, SD = 6.80\).
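The descriptive step above can be wrapped in a small helper so that every variable is summarised and rounded the same way. This also guards against slips when copying values from the console into the interpretation. This is a minimal sketch: `describe_var()` and `describe_apa()` are hypothetical helper names, not functions from readxl or ggpubr.

```r
# Hypothetical helper (not part of the original analysis): mean and SD of a
# numeric vector, rounded for reporting. Missing values are dropped first.
describe_var <- function(x, digits = 2) {
  x <- x[!is.na(x)]
  c(M = round(mean(x), digits), SD = round(sd(x), digits))
}

# Hypothetical helper: formats the summary as an APA-style "M = ..., SD = ..."
# string, keeping trailing zeros (e.g. 6.80 rather than 6.8).
describe_apa <- function(x, digits = 2) {
  s <- describe_var(x, digits)
  fmt <- function(v) formatC(v, format = "f", digits = digits)
  paste0("M = ", fmt(s[["M"]]), ", SD = ", fmt(s[["SD"]]))
}

describe_apa(c(2, 4, 4, 4, 5, 5, 7, 9))
```

With the document's data loaded, `describe_apa(DatasetA$StudyHours)` would produce the string to paste into the interpretation.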

Part 2 & 3: Check Normality (Dataset A)

For StudyHours \(p = .93 \geq .05\) → no evidence against normality (treated as normal)
For ExamScore \(p = .006 < .05\) → not normally distributed

shapiro.test(DatasetA$StudyHours)
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349
shapiro.test(DatasetA$ExamScore)
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465
hist(DatasetA$StudyHours, main="Histogram of Study Hours", col="lightblue", breaks=20)

hist(DatasetA$ExamScore, main="Histogram of Exam Scores", col="lightgreen", breaks=20)

Decision: Because exam scores were not normally distributed according to the Shapiro–Wilk test (\(p = .006\)), a Spearman rank-order correlation will be used.
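The decision rule used here can be written as a short function. This sketch assumes the same \(\alpha = .05\) threshold as the text; `choose_cor_method()` is a hypothetical helper, and in practice the histograms should inform the choice alongside the Shapiro–Wilk p-values.

```r
# Hypothetical helper: applies the Shapiro-Wilk rule described above.
# If either variable departs from normality (p < alpha), use Spearman;
# otherwise Pearson is considered acceptable.
choose_cor_method <- function(x, y, alpha = 0.05) {
  p_x <- shapiro.test(x)$p.value
  p_y <- shapiro.test(y)$p.value
  if (p_x < alpha || p_y < alpha) "spearman" else "pearson"
}
```

Given the p-values printed above (.9349 for StudyHours and .006465 for ExamScore), `choose_cor_method(DatasetA$StudyHours, DatasetA$ExamScore)` would return `"spearman"`, matching the decision.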

Part 4 & 5: Correlation Analysis & Scatterplot (Dataset A)

cor_test_A <- cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman")
## Warning in cor.test.default(DatasetA$StudyHours, DatasetA$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties
cor_test_A
## 
##  Spearman's rank correlation rho
## 
## data:  DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9008825
ggscatter(DatasetA, x = "StudyHours", y = "ExamScore", add = "reg.line", 
          xlab = "Study Hours", ylab = "Exam Score", title = "Study vs Exam")

Interpretation
* Direction: The line of best fit slopes upward, indicating a positive relationship: as study hours increase, exam scores tend to increase.
* Strength: The points cluster tightly around the line, consistent with a very strong relationship (\(r_s = .90, p < .001\)).
* Linearity: The points follow a roughly straight-line pattern. Spearman's correlation only requires a monotonic relationship, and this pattern clearly meets that requirement.
* Outliers: No extreme outliers are visible that would noticeably distort the relationship.
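Two details of the `cor.test()` output above can be checked directly. First, Spearman's \(\rho\) is Pearson's \(r\) computed on ranks, with tied values receiving their average rank. Second, the printed S statistic is \((n^3 - n)(1 - \rho)/6\). The sketch below uses simulated data, not Dataset A. Passing `exact = FALSE` requests the asymptotic (t-approximation) p-value directly, which is what `cor.test()` falls back to when ties are present, so the warning is not raised.

```r
# Illustrative simulated data (NOT Dataset A): rounding x to whole numbers
# guarantees ties, like the ties that triggered the warning above.
set.seed(1)
x <- round(runif(30, min = 1, max = 10))
y <- x + round(rnorm(30), 1)

# Spearman's rho is Pearson's r on ranks; rank() gives ties their average rank.
rho_manual  <- cor(rank(x), rank(y))
rho_builtin <- cor(x, y, method = "spearman")

# exact = FALSE asks for the asymptotic p-value, so no ties warning is issued.
ct <- cor.test(x, y, method = "spearman", exact = FALSE)

# cor.test() reports S = (n^3 - n) * (1 - rho) / 6.
n <- length(x)
S_manual <- (n^3 - n) * (1 - rho_manual) / 6

c(manual = rho_manual, builtin = rho_builtin, cor_test = unname(ct$estimate))
c(S_manual = S_manual, S_reported = unname(ct$statistic))
```

Applied to the printed results, the same formula reproduces S for both datasets when \(n = 100\) (e.g. \(166650 \times (1 - 0.9008825) \approx 16518\)), so each dataset appears to contain 100 observations. The sample size itself is not printed in this section.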


Research Question 2: Phone Use and Sleep

Part 1: Descriptive Statistics (Dataset B)

DatasetB <- read_excel("/Users/asfia/Desktop/DatasetB.xlsx")
# Independent Variable: Screen Time | Dependent Variable: Sleeping Hours
mean(DatasetB$ScreenTime); sd(DatasetB$ScreenTime)
## [1] 5.063296
## [1] 2.056833
mean(DatasetB$SleepingHours); sd(DatasetB$SleepingHours)
## [1] 6.938459
## [1] 1.351332

Interpretation: The average screen time was \(M = 5.06, SD = 2.06\). The average sleep time was \(M = 6.94, SD = 1.35\).

Part 2 & 3: Check Normality (Dataset B)

For ScreenTime \(p < .001\) → not normally distributed
For SleepingHours \(p = .30 \geq .05\) → no evidence against normality (treated as normal)

shapiro.test(DatasetB$ScreenTime)
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$ScreenTime
## W = 0.90278, p-value = 1.914e-06
shapiro.test(DatasetB$SleepingHours)
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$SleepingHours
## W = 0.98467, p-value = 0.3004
hist(DatasetB$ScreenTime, main="Histogram of Screen Time", col="pink", breaks=20)

hist(DatasetB$SleepingHours, main="Histogram of Sleeping Hours", col="lightyellow", breaks=20)

Decision: Because screen time was not normally distributed based on the Shapiro–Wilk test (\(p < .001\)), a Spearman correlation will be used.

Part 4 & 5: Correlation Analysis & Scatterplot (Dataset B)

cor_test_B <- cor.test(DatasetB$ScreenTime, DatasetB$SleepingHours, method = "spearman")
cor_test_B
## 
##  Spearman's rank correlation rho
## 
## data:  DatasetB$ScreenTime and DatasetB$SleepingHours
## S = 259052, p-value = 3.521e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.5544674
ggscatter(DatasetB, x = "ScreenTime", y = "SleepingHours", add = "reg.line", 
          xlab = "Screen Time", ylab = "Sleeping Hours", title = "Phone vs Sleep")

Interpretation
* Direction: The line of best fit slopes downward, indicating a negative relationship: as screen time increases, sleeping hours tend to decrease.
* Strength: The points cluster moderately around the line, indicating a moderate to strong relationship (\(r_s = -.55, p < .001\)).
* Linearity: The points follow a generally straight-line pattern, so the relationship is monotonic, which is the assumption Spearman's correlation requires.
* Outliers: A few points lie somewhat away from the line, but none are extreme enough to noticeably distort the relationship.
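The verbal strength labels in these interpretations can be tied to explicit cutoffs. This sketch assumes Cohen's conventional benchmarks for correlation coefficients (.10, .30, .50). Other conventions exist, and both the cutoffs and the helper names `label_strength()` and `describe_rho()` are assumptions, not part of the original analysis.

```r
# Hypothetical helper: labels |rho| with ASSUMED Cohen-style cutoffs.
# Intervals are [0, .10), [.10, .30), [.30, .50), [.50, 1].
label_strength <- function(rho) {
  cut(abs(rho), breaks = c(0, 0.10, 0.30, 0.50, 1),
      labels = c("negligible", "weak", "moderate", "strong"),
      right = FALSE, include.lowest = TRUE)
}

# Hypothetical helper: combines the strength label with the sign of rho.
describe_rho <- function(rho) {
  if (rho == 0) return("no monotonic relationship")
  direction <- if (rho > 0) "positive" else "negative"
  paste(as.character(label_strength(rho)), direction)
}

describe_rho(-0.5544674)
```

Under these cutoffs, \(\rho = -.55\) sits just past the .50 boundary and \(\rho = .90\) sits well past it, so both would be labelled "strong". The "moderate to strong" wording for Dataset B reflects how close -.55 is to that boundary.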


Part 6: Report the Results

A Spearman rank-order correlation was used to assess the relationship between screen time and sleeping hours, because screen time was not normally distributed (Shapiro–Wilk \(p < .001\)). Screen time and sleeping hours were negatively correlated, \(r_s = -.55, p < .001\), a moderate to strong relationship: participants with more screen time tended to sleep fewer hours.

The scatterplot showed a downward, generally straight-line (and therefore monotonic) trend, which supports the use of a Spearman correlation. A few points lay somewhat away from the line, but none were extreme enough to distort the relationship.
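The reported statistic can be formatted straight from the `cor.test()` result to avoid copying errors. This is a minimal sketch: `format_spearman_apa()` is a hypothetical helper. Its formatting rules (two decimals for \(r_s\), three for \(p\), no leading zero, and "p < .001" below that threshold) follow common APA practice.

```r
# Hypothetical helper: formats a Spearman result in APA style.
format_spearman_apa <- function(rho, p) {
  # Drop the leading zero for values that cannot exceed 1 (e.g. 0.55 -> .55).
  drop_zero <- function(s) sub("^(-?)0\\.", "\\1.", s)
  rho_txt <- drop_zero(sprintf("%.2f", rho))
  p_txt <- if (p < 0.001) "p < .001" else paste("p =", drop_zero(sprintf("%.3f", p)))
  paste0("rs = ", rho_txt, ", ", p_txt)
}

format_spearman_apa(-0.5544674, 3.521e-09)
```

With the results object from Part 4, `format_spearman_apa(unname(cor_test_B$estimate), cor_test_B$p.value)` returns `"rs = -.55, p < .001"`.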