Assignment 1: Q1

library(readxl) 
library(ggpubr)

## Loading required package: ggplot2

DatasetA <- read_excel("C:/Users/Joyce/Downloads/DatasetA.xlsx")

mean(DatasetA$StudyHours)

## [1] 6.135609

sd(DatasetA$StudyHours)

## [1] 1.369224

mean(DatasetA$ExamScore)

## [1] 90.06906

sd(DatasetA$ExamScore)

## [1] 6.795224

# Histogram of Study Hours for a visual check of normality
hist(DatasetA$StudyHours,
     main = "Study Hours",
     xlab = "Study Hours",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

# Histogram of Exam Score for a visual check of normality
hist(DatasetA$ExamScore,
     main = "Exam Score",
     xlab = "Exam Score",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

The variable “Study Hours” appears normally distributed. The histogram looks symmetrical, with most of the data in the middle, and it has a clear bell-shaped curve. The variable “Exam Score” does not appear normally distributed. The histogram does not look symmetrical, most of the data is not in the middle, and it does not have a clear bell-shaped curve.

shapiro.test(DatasetA$StudyHours)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349

shapiro.test(DatasetA$ExamScore)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465

The Shapiro-Wilk p-value for the Study Hours normality test is greater than .05 (p = .93), so there is no evidence that the data depart from normality and they are treated as normal. The Shapiro-Wilk p-value for the Exam Score normality test is less than .05 (p = .006), so the data are not normally distributed.
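The decision rule applied to the two Shapiro-Wilk results can be sketched as a small helper. This is only an illustration: `normality_decision` is a hypothetical function name, not part of the original analysis, and the .05 cutoff is the conventional threshold used in this report.

```r
# Hypothetical helper (not part of the original analysis):
# turn a Shapiro-Wilk p-value into the decision described in this report.
normality_decision <- function(p_value, alpha = 0.05) {
  if (p_value > alpha) {
    "no evidence against normality"
  } else {
    "evidence of non-normality"
  }
}

# p-values copied from the shapiro.test() output in this report
study_decision <- normality_decision(0.9349)    # Study Hours
exam_decision  <- normality_decision(0.006465)  # Exam Score

study_decision
exam_decision
```

A p-value above the cutoff does not prove the data are normal. It only means the test found no evidence against normality, which is why the histograms are checked as well.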

cor.test(DatasetA$StudyHours, 
         DatasetA$ExamScore,
         method = "spearman")

## Warning in cor.test.default(DatasetA$StudyHours, DatasetA$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties

## 
##  Spearman's rank correlation rho
## 
## data:  DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9008825

The Spearman correlation test was selected because Exam Score was not normally distributed according to its histogram and Shapiro-Wilk test, even though Study Hours appeared normal. The p-value (probability value) is less than 2.2e-16, which is below .05. This means the result is statistically significant and the alternative hypothesis is supported. The rho value is 0.90. The correlation is positive, which means that as Study Hours increases, Exam Score increases. The correlation value is greater than 0.50, which means the relationship is strong.
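The choice between Pearson and Spearman can be written out as a short sketch. The rule below is an assumption made for illustration (use Pearson only when neither variable shows evidence of non-normality), and the variable names `p_study`, `p_exam`, and `method` are hypothetical.

```r
# p-values copied from the Shapiro-Wilk output in this report
p_study <- 0.9349     # Study Hours
p_exam  <- 0.006465   # Exam Score
alpha   <- 0.05

# Assumed rule: Pearson only if both variables pass the normality check;
# otherwise fall back to the rank-based Spearman correlation.
method <- if (p_study > alpha && p_exam > alpha) "pearson" else "spearman"
method
```

The resulting string could be passed straight to `cor.test()` through its `method` argument, so the test selection follows directly from the normality results.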

ggscatter(
  DatasetA,
  x = "StudyHours",
  y = "ExamScore",
  add = "reg.line",
  xlab = "Study Hours",
  ylab = "Exam Score"
)

The line of best fit points toward the top right. This means the direction of the relationship is positive: as Study Hours increases, Exam Score increases. The dots closely hug the line. This means there is a strong relationship between the variables. The dots form a straight-line pattern. This means the relationship is linear. There are no major outliers that appear to affect the relationship between Study Hours and Exam Score.

Study hours (M = 6.14, SD = 1.37) were significantly correlated with exam scores (M = 90.07, SD = 6.80), ρ(98) = .90, p < .001. The relationship was positive and strong. As study hours increased, exam scores increased.
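As a consistency check on the reported values, rho can be recovered from the S statistic. This sketch assumes n = 100, inferred from the reported degrees of freedom (df = n - 2 = 98), and `rho_check` is a hypothetical variable name. The printed S is rounded, so the match is approximate.

```r
# S statistic copied from the cor.test() output in this report
S <- 16518

# Assumption: n = 100, inferred from the reported df = n - 2 = 98
n <- 100

# With rho computed on ranks, S = (n^3 - n) * (1 - rho) / 6,
# so rho can be recovered from S and n.
rho_check <- 1 - 6 * S / (n^3 - n)
round(rho_check, 2)
```

The recovered value agrees with the reported rho of 0.90 to two decimal places, which supports the degrees of freedom used in the write-up.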

Joyce Ben

2026-02-04