test2

library(readxl)
library(ggpubr)

## Loading required package: ggplot2

DatasetA <- read_excel("C:/Users/tanie/Downloads/DatasetA.xlsx")
DatasetB <- read_excel("C:/Users/tanie/Downloads/DatasetB.xlsx")
mean(DatasetA$StudyHours)

## [1] 6.135609

sd(DatasetA$StudyHours)

## [1] 1.369224

mean(DatasetA$ExamScore)

## [1] 90.06906

sd(DatasetA$ExamScore)

## [1] 6.795224

hist(DatasetA$StudyHours,
     main = "StudyHours",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

hist(DatasetA$ExamScore,
     main = "ExamScore",
     breaks = 20,
     col = "lightcoral",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

The variable “StudyHours” appears normally distributed. The data look symmetrical (most values are in the middle) and follow a bell-shaped curve. The variable “ExamScore” does not appear normally distributed. The data look negatively skewed (most values are on the right, with a tail toward the left).
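The visual reading of skew can be cross-checked with a number. Below is a minimal sketch of a sample skewness calculation; `sample_skewness` is a hypothetical helper (not from readxl or ggpubr), and the two vectors are toy values, not the assignment data.

```r
# Hypothetical helper: sample skewness as the mean of cubed z-scores.
# Values near 0 suggest symmetry, negative values a longer left tail,
# positive values a longer right tail.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  centered <- x - mean(x)
  z <- centered / sqrt(mean(centered^2))
  mean(z^3)
}

# Toy vectors for illustration only
symmetric   <- c(1, 2, 3, 4, 5)
left_tailed <- c(1, 8, 9, 9, 10)

skew_symmetric   <- sample_skewness(symmetric)    # approximately 0
skew_left_tailed <- sample_skewness(left_tailed)  # negative
```

Assuming the data are loaded as above, `sample_skewness(DatasetA$ExamScore)` would be expected to come out negative if the left tail seen in the histogram is real.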

shapiro.test(DatasetA$StudyHours)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349

shapiro.test(DatasetA$ExamScore)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465

cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman")

## Warning in cor.test.default(DatasetA$StudyHours, DatasetA$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties

## 
##  Spearman's rank correlation rho
## 
## data:  DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9008825

The Spearman correlation test was selected because at least one variable was not normally distributed. According to the histograms and the Shapiro-Wilk tests, ExamScore departed from normality (p = 0.006465), while StudyHours did not (p = 0.9349). The p-value (probability value) of the correlation is less than 2.2e-16, which is below .05. This means the results are statistically significant. The alternative hypothesis is supported. The rho value is 0.9008825. The correlation is positive, which means as study hours increase, exam scores increase. The absolute value of rho is greater than 0.50, which means the relationship is strong.
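The choice between Pearson and Spearman can be written as a small rule. This is a sketch only: `choose_cor_method` is a hypothetical helper, and it assumes the convention that Pearson is used only when neither Shapiro-Wilk test rejects normality at the .05 level.

```r
# Hypothetical helper: choose a correlation method from two
# Shapiro-Wilk p-values. Pearson only if both variables look normal.
choose_cor_method <- function(p_x, p_y, alpha = 0.05) {
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}

# Shapiro-Wilk p-values reported above for StudyHours and ExamScore
method_a <- choose_cor_method(0.9349, 0.006465)

# Made-up p-values for a case where both variables look normal
method_both_normal <- choose_cor_method(0.40, 0.62)
```

With real data, the p-values could be taken directly from `shapiro.test(x)$p.value`. Passing `exact = FALSE` to `cor.test()` requests the approximate p-value up front, which should avoid the ties warning shown above.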

ggscatter(
  DatasetA,
  x = "StudyHours",
  y = "ExamScore",
  add = "reg.line",
  xlab = "StudyHours",
  ylab = "ExamScore"
)

The line of best fit slopes upward toward the top right. This means the direction of the relationship is positive. As StudyHours increases, ExamScore increases. The dots closely hug the line. This means there is a strong relationship between the variables. The dots form a straight-line pattern. This means the relationship is linear.
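The direction and tightness read from the scatterplot can also be checked numerically by fitting the same kind of regression line with `lm()`. The sketch below uses simulated stand-in data (the variable names and the numbers used to generate them are assumptions, not the assignment data).

```r
# Simulated stand-in data with a positive linear trend
set.seed(42)
study_hours <- runif(50, min = 3, max = 9)
exam_score  <- 60 + 5 * study_hours + rnorm(50, sd = 2)

# Fit the line of best fit
fit <- lm(exam_score ~ study_hours)

# A positive slope means the line points to the top right
slope <- unname(coef(fit)["study_hours"])

# R-squared close to 1 means the dots hug the line closely
r_squared <- summary(fit)$r.squared
```

For the report's data, the equivalent call would be `lm(ExamScore ~ StudyHours, data = DatasetA)`.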

mean(DatasetB$ScreenTime)

## [1] 5.063296

sd(DatasetB$ScreenTime)

## [1] 2.056833

mean(DatasetB$SleepingHours)

## [1] 6.938459

sd(DatasetB$SleepingHours)

## [1] 1.351332

hist(DatasetB$ScreenTime,
     main = "ScreenTime",
     breaks = 20,
     col = "red",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

hist(DatasetB$SleepingHours,
     main = "SleepingHours",
     breaks = 20,
     col = "blue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

The variable “ScreenTime” does not appear normally distributed. The data look positively skewed (most values are on the left, with a tail toward the right). The variable “SleepingHours” appears normally distributed. The data look symmetrical (most values are in the middle) and follow a bell-shaped curve.

shapiro.test(DatasetB$ScreenTime)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$ScreenTime
## W = 0.90278, p-value = 1.914e-06

shapiro.test(DatasetB$SleepingHours)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$SleepingHours
## W = 0.98467, p-value = 0.3004

cor.test(DatasetB$ScreenTime, DatasetB$SleepingHours, method = "spearman")

## 
##  Spearman's rank correlation rho
## 
## data:  DatasetB$ScreenTime and DatasetB$SleepingHours
## S = 259052, p-value = 3.521e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.5544674

The Spearman correlation test was selected because at least one variable was not normally distributed. According to the histograms and the Shapiro-Wilk tests, ScreenTime departed from normality (p = 1.914e-06), while SleepingHours did not (p = 0.3004). The p-value (probability value) of the correlation is 3.521e-09, which is below .05. This means the results are statistically significant. The alternative hypothesis is supported. The rho value is -0.5544674. The correlation is negative, which means as screen time increases, sleeping hours decrease. The absolute value of rho is greater than 0.50, which means the relationship is strong.
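The interpretation rule used in this report (sign gives the direction, an absolute value above 0.50 counts as strong) can be sketched as a short function. `describe_correlation` is a hypothetical helper, and the 0.50 cutoff is the one used in this report rather than a universal standard.

```r
# Hypothetical helper: describe a correlation coefficient by its
# sign (direction) and absolute value (strength).
describe_correlation <- function(rho, strong_cutoff = 0.50) {
  direction <- if (rho > 0) "positive" else if (rho < 0) "negative" else "none"
  strength  <- if (abs(rho) > strong_cutoff) "strong" else "not strong"
  paste(direction, strength, sep = ", ")
}

# rho values reported above for Dataset A and Dataset B
label_a <- describe_correlation(0.9008825)   # "positive, strong"
label_b <- describe_correlation(-0.5544674)  # "negative, strong"
```

Comparing `abs(rho)` rather than `rho` itself to the cutoff is what lets the same rule work for negative correlations.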

ggscatter(
  DatasetB,
  x = "ScreenTime",
  y = "SleepingHours",
  add = "reg.line",
  xlab = "ScreenTime",
  ylab = "SleepingHours"
)

The line of best fit slopes downward from the top left to the bottom right. This means the direction of the relationship is negative. As screen time increases, sleeping hours decrease. The dots do not hug the line as closely as in Dataset A, which is consistent with the smaller rho value (-0.55); by the 0.50 cutoff the relationship still counts as strong, but only just. The dots form a roughly straight-line pattern. This means the relationship is linear. There is possibly one outlier (the individual who has about 3 hours of screen time and sleeps more than 9 hours).
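A possible outlier spotted by eye can be checked against the fitted line. The sketch below flags points whose standardized residual exceeds 3 in absolute value; the data are simulated stand-ins with one deliberately unusual point added, and the cutoff of 3 is a common rule of thumb, not something taken from the assignment.

```r
# Simulated stand-in data with a negative linear trend
set.seed(7)
screen_time <- runif(40, min = 1, max = 9)
sleep_hours <- 9 - 0.4 * screen_time + rnorm(40, sd = 0.5)

# Append one deliberately unusual point (index 41):
# low screen time but far more sleep than the trend predicts
screen_time <- c(screen_time, 3)
sleep_hours <- c(sleep_hours, 12)

# Fit the line and standardize the residuals
fit <- lm(sleep_hours ~ screen_time)
std_resid <- rstandard(fit)

# Indices of points far from the line
flagged <- which(abs(std_resid) > 3)
```

For the report's data, the same steps would start from `lm(SleepingHours ~ ScreenTime, data = DatasetB)`.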

Tanaka

2026-02-04