Part 1: Set Up and Import Data

# Loading required packages
library(readxl)
library(ggpubr)

## Loading required package: ggplot2

# Importing datasets from the computer
DatasetA <- read_excel("/Users/asfia/Desktop/DatasetA.xlsx")
DatasetB <- read_excel("/Users/asfia/Desktop/DatasetB.xlsx")

Research Question 1: Study Hours and Exam Scores

Part 1: Descriptive Statistics (Dataset A)

DatasetA <- read_excel("/Users/asfia/Desktop/DatasetA.xlsx")
# Independent Variable: Study Hours | Dependent Variable: Exam Score
mean(DatasetA$StudyHours); sd(DatasetA$StudyHours)

## [1] 6.135609

## [1] 1.369224

mean(DatasetA$ExamScore); sd(DatasetA$ExamScore)

## [1] 90.06906

## [1] 6.795224

Interpretation: The average study time was \(M = 6.14, SD = 1.37\). The average exam score was \(M = 90.07, SD = 6.80\).
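The descriptive statistics above can be turned into the APA-style strings used in the interpretations with a small helper. This is a sketch only: `describe_apa()` is a hypothetical helper name, not part of readxl or ggpubr, and dropping missing values with `na.rm = TRUE` is an assumption about how NAs should be handled.

```r
# Hypothetical helper: format the mean and standard deviation of a numeric
# vector as an APA-style string, rounded to two decimals.
describe_apa <- function(x) {
  m <- mean(x, na.rm = TRUE)  # na.rm = TRUE is an assumption about missing data
  s <- sd(x, na.rm = TRUE)
  sprintf("M = %.2f, SD = %.2f", m, s)
}

describe_apa(c(2, 4, 6))  # "M = 4.00, SD = 2.00"

# Usage with the imported data (assumes DatasetA is already loaded):
# describe_apa(DatasetA$StudyHours)
```

Applied to `DatasetA$StudyHours`, it should reproduce the rounded mean and SD from the printed output (6.135609 and 1.369224).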

Part 2 & 3: Check Normality (Dataset A)

For StudyHours, \(p \geq .05\) → normality is not rejected, so the variable is treated as normal
For ExamScore, \(p < .05\) → normality is rejected, so the variable is treated as not normal

shapiro.test(DatasetA$StudyHours)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349

shapiro.test(DatasetA$ExamScore)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465

hist(DatasetA$StudyHours, main="Histogram of Study Hours", col="lightblue", breaks=20)

hist(DatasetA$ExamScore, main="Histogram of Exam Scores", col="lightgreen", breaks=20)

Decision: Because exam scores were not normally distributed according to the Shapiro–Wilk test (\(p = .006\)), a Spearman correlation will be used. StudyHours, by contrast, appears normally distributed (Shapiro–Wilk \(p = .935\)). Its histogram is roughly symmetrical, with most values clustered in the centre at around 5 to 7 hours, and it has a bell shape that is neither excessively flat nor excessively peaked.
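The decision rule used in this assignment (Spearman if either variable fails the Shapiro–Wilk test, Pearson otherwise) can be written as a function. A minimal sketch: `choose_cor_method()` is a hypothetical helper, and the .05 cutoff is the one used throughout this report.

```r
# Hypothetical helper: pick "spearman" if either variable fails the
# Shapiro-Wilk normality test at the given alpha, otherwise "pearson".
choose_cor_method <- function(x, y, alpha = 0.05) {
  p_x <- shapiro.test(x)$p.value
  p_y <- shapiro.test(y)$p.value
  if (p_x < alpha || p_y < alpha) "spearman" else "pearson"
}

# Usage (assumes DatasetA is loaded):
# choose_cor_method(DatasetA$StudyHours, DatasetA$ExamScore)
```

With the Shapiro–Wilk p-values printed above (0.9349 for StudyHours, 0.006465 for ExamScore), this rule selects "spearman".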

Part 4 & 5: Correlation Analysis & Scatterplot (Dataset A)

# exact = FALSE: the data contain ties, so R cannot compute an exact p-value
cor_test_A <- cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman", exact = FALSE)

cor_test_A

## 
##  Spearman's rank correlation rho
## 
## data:  DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9008825

ggscatter(DatasetA, x = "StudyHours", y = "ExamScore", add = "reg.line", 
          xlab = "Study Hours", ylab = "Exam Score", title = "Study vs Exam")

Interpretation

The independent variable, study hours (M = 6.14, SD = 1.37), was correlated with the dependent variable, exam score (M = 90.07, SD = 6.80), ρ(98) = .90, p < .001. The relationship was positive and strong: as study hours increased, exam scores increased. The Spearman correlation was selected because “ExamScore” was not normally distributed according to its histogram and the Shapiro–Wilk test (p < .05). The p-value (< 2.2e-16) is far below .05, so the result is statistically significant and the alternative hypothesis is supported. The rho value of 0.90 is positive, meaning that exam scores rise with study hours, and it exceeds 0.50, meaning the relationship is strong. Note: The argument “exact = FALSE” was added to the code because the data contained ties (duplicate values), which prevent R from computing an exact p-value.
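Spearman's rho is the Pearson correlation computed on ranks (tied values receive their average rank), and the S statistic that `cor.test()` prints is \(S = (n^3 - n)(1 - \rho)/6\). The sketch below checks both facts on a small made-up pair of vectors containing ties; the data are illustrative only, not from DatasetA. With n = 100 and rho = 0.9008825, the same formula gives S ≈ 16518, matching the output above.

```r
# Illustrative data with ties (x has two 2s, y has two 4s); not from the assignment.
x <- c(1, 2, 2, 3, 4, 5, 6, 7)
y <- c(2, 1, 4, 3, 4, 6, 8, 7)

# exact = FALSE: with ties an exact p-value is unavailable, so the
# approximation is requested explicitly, which avoids the warning.
ct <- cor.test(x, y, method = "spearman", exact = FALSE)

# Spearman's rho equals Pearson's r on the (average) ranks.
rho_by_ranks <- cor(rank(x), rank(y))

# The S statistic reported by cor.test() is (n^3 - n) * (1 - rho) / 6.
n <- length(x)
s_by_formula <- (n^3 - n) * (1 - rho_by_ranks) / 6

print(c(rho = unname(ct$estimate), rho_by_ranks = rho_by_ranks))
print(c(S = unname(ct$statistic), S_by_formula = s_by_formula))
```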

Research Question 2: Phone Use and Sleep

Part 1: Descriptive Statistics (Dataset B)

DatasetB <- read_excel("/Users/asfia/Desktop/DatasetB.xlsx")
# Independent Variable: Screen Time | Dependent Variable: Sleeping Hours
mean(DatasetB$ScreenTime); sd(DatasetB$ScreenTime)

## [1] 5.063296

## [1] 2.056833

mean(DatasetB$SleepingHours); sd(DatasetB$SleepingHours)

## [1] 6.938459

## [1] 1.351332

Interpretation: The average screen time was \(M = 5.06, SD = 2.06\). The average sleep time was \(M = 6.94, SD = 1.35\).

Part 2 & 3: Check Normality (Dataset B)

For ScreenTime, \(p < .05\) → normality is rejected, so the variable is treated as not normal
For SleepingHours, \(p \geq .05\) → normality is not rejected, so the variable is treated as normal

shapiro.test(DatasetB$ScreenTime)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$ScreenTime
## W = 0.90278, p-value = 1.914e-06

shapiro.test(DatasetB$SleepingHours)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$SleepingHours
## W = 0.98467, p-value = 0.3004

hist(DatasetB$ScreenTime, main="Histogram of Screen Time", col="pink", breaks=20)

hist(DatasetB$SleepingHours, main="Histogram of Sleeping Hours", col="lightyellow", breaks=20)

Decision: Because screen time was not normally distributed according to the Shapiro–Wilk test (\(p < .001\)), a Spearman correlation will be used. Sleeping hours appear approximately normally distributed (Shapiro–Wilk \(p = .300\)); the histogram is roughly symmetrical, with most values centred around the middle.

Part 4 & 5: Correlation Analysis & Scatterplot (Dataset B)

cor_test_B <- cor.test(DatasetB$ScreenTime, DatasetB$SleepingHours, method = "spearman")
cor_test_B

## 
##  Spearman's rank correlation rho
## 
## data:  DatasetB$ScreenTime and DatasetB$SleepingHours
## S = 259052, p-value = 3.521e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.5544674

ggscatter(DatasetB, x = "ScreenTime", y = "SleepingHours", add = "reg.line", 
          xlab = "Screen Time", ylab = "Sleeping Hours", title = "Phone vs Sleep")

Interpretation

* Direction: The line of best fit slopes downward, indicating a negative relationship. As screen time increases, sleeping hours decrease.
* Strength: The points cluster moderately around the line, indicating a moderate to strong relationship.
* Linearity: The points follow a generally straight-line pattern, so the relationship is monotonic (and approximately linear), which satisfies the monotonicity assumption of the Spearman correlation.
* Outliers: A few points sit slightly apart from the rest, but none are extreme enough to distort the relationship noticeably.

The Spearman correlation was selected because “ScreenTime” was not normally distributed according to its histogram and the Shapiro–Wilk test (p < .05). The p-value (3.521e-09) is far below .05, so the result is statistically significant and the alternative hypothesis is supported. The rho value of -0.55 is negative, meaning that sleeping hours fall as screen time rises, and its absolute value (0.55) exceeds 0.50, meaning the relationship is strong. Note: Unlike Dataset A, R reported no ties for Dataset B, so cor.test() was run with its default settings and “exact = FALSE” was not needed.
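The strength labels in this report depend on the absolute value of rho, not its sign, which is why -0.55 counts as strong. The sketch below encodes that rule. Only the 0.50 cutoff comes from this report; the 0.30 cutoff for "moderate" is an assumption based on Cohen's commonly cited conventions, and `describe_correlation()` is a hypothetical helper.

```r
# Hypothetical helper: label the direction and strength of a correlation.
# Cutoffs are assumptions: > .50 strong (as in this report), > .30 moderate.
describe_correlation <- function(r) {
  direction <- if (r > 0) "positive" else if (r < 0) "negative" else "none"
  size <- abs(r)  # strength depends on magnitude, not sign
  strength <- if (size > 0.50) "strong" else if (size > 0.30) "moderate" else "weak"
  paste(direction, strength)
}

describe_correlation(-0.5544674)  # Dataset B: "negative strong"
describe_correlation(0.9008825)   # Dataset A: "positive strong"
```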

Part 6: Report the Results

Interpretation

The independent variable, screen time (M = 5.06, SD = 2.06), was correlated with the dependent variable, sleeping hours (M = 6.94, SD = 1.35), ρ(98) = −.55, p < .001. The relationship was negative and strong. As screen time increased, sleeping hours decreased.
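The APA-style result string, ρ(df) = r, p, can be generated from the test output, with df = n − 2 and the leading zero dropped because a correlation cannot exceed 1 in magnitude. A sketch: `format_spearman_apa()` is a hypothetical helper, n = 100 is inferred from the reported df of 98, and the "p < .001" floor follows the convention used in this report. It prints an ASCII hyphen for negative values rather than a typographic minus.

```r
# Hypothetical helper: build an APA-style Spearman result string.
# r = rho, n = sample size, p = p-value; df = n - 2.
format_spearman_apa <- function(r, n, p) {
  drop_zero <- function(s) sub("^(-?)0\\.", "\\1.", s)  # "0.90" -> ".90"
  r_txt <- drop_zero(sprintf("%.2f", r))
  p_txt <- if (p < 0.001) "p < .001" else paste("p =", drop_zero(sprintf("%.3f", p)))
  sprintf("\u03c1(%d) = %s, %s", as.integer(n - 2), r_txt, p_txt)
}

# Values taken from the Dataset B output:
format_spearman_apa(-0.5544674, 100, 3.521e-09)

# With the test object (assumes no rows were dropped for missing values):
# format_spearman_apa(unname(cor_test_B$estimate), nrow(DatasetB), cor_test_B$p.value)
```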

Assignment 4: Pearson and Spearman Correlations

Asfiyakhanam

2026-02-04