First Assignment

This assignment has four sections.

DatasetA <- read_excel("/Users/ha113ab/Desktop/datasets/DatasetA.xlsx") reads the dataset from the folder where it was saved, using its full file path.

In the first section we compute descriptive statistics: means and standard deviations.

mean(DatasetA$StudyHours) finds the average study hours, and sd(DatasetA$StudyHours) finds how spread out the study hours are. The same applies to ExamScore.
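As a possible shortcut, the four separate calls could be collapsed into one pass over both columns. The sketch below is only an illustration: the data frame `demo` and its values are simulated stand-ins (not the real DatasetA), and `summary_stats` is a hypothetical name.

```r
# Simulated stand-in for DatasetA (hypothetical values, not the real data)
set.seed(1)
demo <- data.frame(
  StudyHours = rnorm(100, mean = 6, sd = 1.4),
  ExamScore  = rnorm(100, mean = 90, sd = 7)
)

# Apply mean() and sd() to each column in one pass;
# the result is a matrix with rows "mean" and "sd"
summary_stats <- sapply(
  demo[, c("StudyHours", "ExamScore")],
  function(x) c(mean = mean(x), sd = sd(x))
)

print(round(summary_stats, 2))
```

With the real data, `demo` would simply be replaced by `DatasetA`.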

We then plotted each variable with

hist(DatasetA$StudyHours, main = "StudyHours", breaks = 20, col = "orange", border = "black", cex.main = 1, cex.axis = 1, cex.lab = 1)

and

hist(DatasetA$ExamScore, main = "ExamScore", breaks = 20, col = "grey", border = "white", cex.main = 1, cex.axis = 1, cex.lab = 1)

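In the second section we run normality tests to check whether the data are normally distributed (bell-shaped). shapiro_study <- shapiro.test(DatasetA$StudyHours) tests whether study hours follow a normal distribution. The same test is used for exam scores. In the output, StudyHours has p = 0.9349, so there is no evidence against normality, while ExamScore has p = 0.006465, which indicates a departure from normality.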
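To make the decision rule explicit, here is a minimal sketch of reading the p-value from a Shapiro-Wilk result. It assumes the conventional 0.05 cutoff, which the assignment does not state, and it uses simulated samples (`normal_like` and `skewed` are hypothetical names) rather than the real DatasetA columns.

```r
# Simulated samples (hypothetical, not the real data)
set.seed(42)
samples <- list(
  normal_like = rnorm(100, mean = 6, sd = 1.4),  # drawn from a normal distribution
  skewed      = rexp(100, rate = 1)              # strongly right-skewed
)

alpha <- 0.05  # assumed significance level

for (name in names(samples)) {
  result <- shapiro.test(samples[[name]])
  # A small p-value means the sample departs from a normal distribution
  verdict <- if (result$p.value < alpha) {
    "departs from normality"
  } else {
    "no evidence against normality"
  }
  cat(sprintf("%s: W = %.3f, p = %.4g -> %s\n",
              name, result$statistic, result$p.value, verdict))
}
```

Storing the result in a variable, as shapiro_study does, is what makes fields such as `$p.value` and `$statistic` available for this kind of check.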

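In the third section we run a correlation analysis to check the relationship between the two variables. cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "pearson") measures the strength of the linear relationship between study hours and exam scores. cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman") checks the same relationship using ranks, so it does not assume a straight-line relationship.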
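The two methods can be compared side by side. The sketch below uses simulated data (`hours` and `score` are hypothetical stand-ins for the real columns) and passes `exact = FALSE` to the Spearman call, which is one way to avoid the "Cannot compute exact p-value with ties" warning when the data contain tied values. Whether that is appropriate for this assignment is a judgment call.

```r
# Simulated data (hypothetical, not the real DatasetA)
set.seed(7)
hours <- round(rnorm(100, mean = 6, sd = 1.4), 1)  # rounding creates tied values
score <- 60 + 5 * hours + rnorm(100, sd = 3)       # roughly linear in hours

# Pearson: strength of the linear relationship
pearson <- cor.test(hours, score, method = "pearson")

# Spearman: rank-based; exact = FALSE requests the approximate p-value,
# so no warning is raised when ties are present
spearman <- cor.test(hours, score, method = "spearman", exact = FALSE)

cat("Pearson r:   ", round(pearson$estimate, 3), "\n")
cat("Spearman rho:", round(spearman$estimate, 3), "\n")
```

When the relationship is close to linear, as in the assignment's output, the two coefficients tend to be similar.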

In the fourth and final section we visualize the data. hist() creates histograms, which use bars to show distributions. The first histogram displays how many students studied different amounts of hours. The second histogram shows how many students received different exam scores. ggscatter() creates a scatterplot with a dot for each student and a trend line.

library(readxl)
library(ggpubr)

## Loading required package: ggplot2

DatasetA <- read_excel("DatasetA.xlsx")

mean(DatasetA$StudyHours)

## [1] 6.135609

sd(DatasetA$StudyHours)

## [1] 1.369224

mean(DatasetA$ExamScore)

## [1] 90.06906

sd(DatasetA$ExamScore)

## [1] 6.795224

hist(DatasetA$StudyHours,
     main = "StudyHours",
     breaks = 20,
     col = "orange",
     border = "black")

hist(DatasetA$ExamScore,
     main = "ExamScore",
     breaks = 20,
     col = "grey",
     border = "white")

shapiro.test(DatasetA$StudyHours)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349

shapiro.test(DatasetA$ExamScore)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465

cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "pearson")

## 
##  Pearson's product-moment correlation
## 
## data:  DatasetA$StudyHours and DatasetA$ExamScore
## t = 20.959, df = 98, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8606509 0.9346369
## sample estimates:
##      cor 
## 0.904214

cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman")

## Warning in cor.test.default(DatasetA$StudyHours, DatasetA$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties

## 
##  Spearman's rank correlation rho
## 
## data:  DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9008825

ggscatter(
  DatasetA,
  x = "StudyHours",
  y = "ExamScore",
  add = "reg.line",
  xlab = "StudyHours",
  ylab = "ExamScore"
)

Haileab Bekele

2026-02-03