Second Assignment

PUBLISHED LINK:

library(readxl)

#library(readxl): Loads the readxl package so R can read Excel files (.xlsx).

library(ggpubr)

## Loading required package: ggplot2

# library(ggpubr):Loads the ggpubr package, which makes it easier to create publication-style plots (like scatter plots with regression lines).

DatasetB <- read_excel("DatasetB.xlsx")

# DatasetB <- read_excel("DatasetB.xlsx"): Reads the Excel file DatasetB.xlsx and stores its contents in the object DatasetB.

mean(DatasetB$ScreenTime)

## [1] 5.063296

sd(DatasetB$ScreenTime)

## [1] 2.056833

# mean(DatasetB$ScreenTime) ---- finds the average ScreenTime. ScreenTime had a mean of 5.063 hours.
# sd(DatasetB$ScreenTime) ---- finds how spread out the ScreenTime values are. ScreenTime had a standard deviation of 2.057 hours (SD = 2.057).

mean(DatasetB$SleepingHours)

## [1] 6.938459

sd(DatasetB$SleepingHours)

## [1] 1.351332

# mean(DatasetB$SleepingHours) ---- finds the average SleepingHours. SleepingHours had a mean of 6.938 hours.
# sd(DatasetB$SleepingHours) ---- finds how spread out the SleepingHours values are. SleepingHours had a standard deviation of 1.351 hours (SD = 1.351).
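
# The four calls above can be folded into one pass over both columns. The sketch below is only an illustration: `demo` is a small made-up data frame standing in for DatasetB, and `describe` is a hypothetical helper name.

```r
# Stand-in data (hypothetical values, NOT the assignment data).
demo <- data.frame(
  ScreenTime    = c(2, 4, 4, 6, 9),
  SleepingHours = c(9, 8, 7, 6, 5)
)

# Hypothetical helper: mean and SD of one numeric vector, rounded to 3 decimals.
describe <- function(x) {
  round(c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE)), 3)
}

# sapply() returns a matrix: one row per statistic, one column per variable.
summary_table <- sapply(demo[c("ScreenTime", "SleepingHours")], describe)
print(summary_table)
```

# With the real data, the same call should work after replacing `demo` with DatasetB.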




hist(DatasetB$ScreenTime,
     main = "ScreenTime",
     breaks = 20,
     col = "navyblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

# Here we are building the histogram for ScreenTime.
# Visually, the variable "ScreenTime" may follow a roughly normal distribution: the frequency appears to peak in the middle range of values (around 4–6 hours) and decreases toward both the lower (2–4 hours) and higher (8–10 hours) ends.


hist(DatasetB$SleepingHours,
     main = "SleepingHours",
     breaks = 20,
     col = "green",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

# Here we are building the histogram for SleepingHours.
# Visually, the variable "SleepingHours" does not appear normally distributed. The data do not look symmetrical and do not form a clear bell curve, because the frequency values (0–11) are spread fairly evenly across the bins rather than concentrated in the middle. This suggests a more uniform or skewed distribution rather than a normal one.
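
# A visual judgment about shape can be cross-checked against the bin counts that hist() computes. A minimal sketch, assuming simulated stand-in values rather than the assignment data:

```r
set.seed(1)  # reproducible stand-in data (hypothetical, NOT DatasetB)
x <- rnorm(100, mean = 7, sd = 1.35)

# plot = FALSE returns the bins without drawing, so the counts can be inspected.
h <- hist(x, breaks = 20, plot = FALSE)

# Midpoint and count of each bin.
bins <- data.frame(mid = h$mids, count = h$counts)
print(bins)

# Midpoint of the bin holding the most observations (first one if tied).
peak_mid <- h$mids[which.max(h$counts)]
print(peak_mid)
```

# Counts that rise toward one central bin and fall away on both sides are what a bell shape looks like in numbers.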


shapiro.test(DatasetB$ScreenTime)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$ScreenTime
## W = 0.90278, p-value = 1.914e-06

# Here we are conducting the Shapiro-Wilk test to determine whether the data follow a normal distribution. The null hypothesis is that the data are drawn from a normally distributed population.

# The Shapiro-Wilk p-value for the ScreenTime normality test is less than .05 (p < .001), so the null hypothesis is rejected: the data are not normally distributed.


shapiro.test(DatasetB$SleepingHours)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$SleepingHours
## W = 0.98467, p-value = 0.3004

# Here we are conducting the Shapiro-Wilk test again, this time for SleepingHours. The null hypothesis is again that the data are drawn from a normally distributed population.

# The Shapiro-Wilk p-value for the SleepingHours normality test is greater than .05 (p = .300), so the null hypothesis is not rejected: the data are consistent with a normal distribution.
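
# The decision rule applied to both tests can be written out explicitly. The sketch below uses the two p-values reported in the shapiro.test() outputs above; `interpret_shapiro` is a hypothetical helper, and alpha = 0.05 is the conventional cutoff.

```r
# Hypothetical helper: turn a Shapiro-Wilk p-value into a plain-language decision.
# Null hypothesis: the sample was drawn from a normal distribution.
interpret_shapiro <- function(p, alpha = 0.05) {
  ifelse(p < alpha,
         "reject normality",
         "no evidence against normality")
}

# p-values copied from the shapiro.test() outputs above.
p_values <- c(ScreenTime = 1.914e-06, SleepingHours = 0.3004)
decisions <- interpret_shapiro(p_values)
print(decisions)
```

# A p-value below alpha counts as evidence against normality; a p-value above alpha does not prove normality, it only fails to contradict it.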



cor.test(
  DatasetB$ScreenTime,
  DatasetB$SleepingHours,
  method = "spearman"
)

## 
##  Spearman's rank correlation rho
## 
## data:  DatasetB$ScreenTime and DatasetB$SleepingHours
## S = 259052, p-value = 3.521e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.5544674

# The Spearman correlation test was selected because ScreenTime was not normally distributed according to the Shapiro-Wilk test (p < .001), even though SleepingHours did not depart significantly from normality (p = .300). The Pearson correlation assumes normality for both variables.
# The p-value (probability value) is 3.521e-09 (p < .001), which is below .05. This means the result is statistically significant. The alternative hypothesis is supported.
# The rho value is -0.554.
# The correlation is negative, which means that as ScreenTime increases, SleepingHours decreases.
# The absolute value of rho is greater than 0.50, which means the relationship is moderately strong to strong.
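
# The reasoning above (pick the test from the normality results, then report direction and strength) can be sketched as follows. The helper names and the strength cutoffs (0.30 and 0.50) are assumptions for illustration; cutoffs vary between textbooks.

```r
# Hypothetical helper: use Spearman if either variable fails Shapiro-Wilk.
choose_method <- function(p_x, p_y, alpha = 0.05) {
  if (p_x < alpha || p_y < alpha) "spearman" else "pearson"
}

# Hypothetical helper: describe direction and strength of one coefficient.
# The 0.30 / 0.50 cutoffs are an assumed convention, not a fixed rule.
describe_correlation <- function(r) {
  direction <- if (r < 0) "negative" else if (r > 0) "positive" else "none"
  size <- abs(r)
  strength <- if (size >= 0.50) "strong" else if (size >= 0.30) "moderate" else "weak"
  paste(direction, strength, sep = ", ")
}

# Values copied from the outputs above.
method <- choose_method(p_x = 1.914e-06, p_y = 0.3004)
label  <- describe_correlation(-0.5544674)

# format.pval() reports p-values below eps as a "<" threshold instead of 0.
p_text <- format.pval(3.521e-09, eps = 0.001)

cat(method, "|", label, "| p", p_text, "\n")
```

# Reporting the p-value as a threshold avoids writing it as .000, which would wrongly suggest a probability of exactly zero.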

ggscatter(
  DatasetB,
  x = "ScreenTime",
  y = "SleepingHours",
  add = "reg.line",
  xlab = "ScreenTime",
  ylab = "SleepHours"
)

# The line of best fit slopes down toward the bottom right. This means the direction of the relationship is negative: as ScreenTime increases, SleepingHours decreases.
# The dots closely hug the line. This means there is a strong relationship between the variables.
# The dots form a straight-line pattern. This means the relationship is linear.
# There is possibly one outlier (the individual with ScreenTime 2.5 hours and SleepingHours 8.5 hours). However, the dot lies toward the upper left of the line of best fit, in line with the overall negative trend. Therefore, it does not appear to strongly affect the relationship between the independent and dependent variables.
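
# One way to check whether a suspected outlier drives the result is to recompute the correlation without it. A minimal sketch on made-up stand-in values (not DatasetB); the last row plays the role of the unusual point.

```r
# Stand-in data (hypothetical values, NOT the assignment data).
demo <- data.frame(
  ScreenTime    = c(1, 2, 3, 4, 5, 6, 7, 8, 2.5),
  SleepingHours = c(9, 8.2, 7.9, 7, 6.5, 6.1, 5, 4.8, 8.5)
)
suspect <- nrow(demo)  # row index of the suspected outlier

# Spearman's rho with and without the suspected point.
rho_all  <- cor(demo$ScreenTime, demo$SleepingHours, method = "spearman")
rho_drop <- cor(demo$ScreenTime[-suspect], demo$SleepingHours[-suspect],
                method = "spearman")

# A small change suggests the point has little influence on the relationship.
change <- abs(rho_all - rho_drop)
print(round(c(all = rho_all, dropped = rho_drop, change = change), 3))
```

# Because Spearman's rho works on ranks, a single point that follows the overall trend usually shifts it very little.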


Haileab Bekele

2026-02-04