Assignment4

library(readxl) 
library(ggpubr)

## Loading required package: ggplot2

DATASET A

MEAN AND STANDARD DEVIATION

DatasetZ <- read_excel("C:/Users/Leyav/Downloads/DatasetA.xlsx")

mean(DatasetZ$StudyHours)

## [1] 6.135609

sd(DatasetZ$StudyHours)

## [1] 1.369224

mean(DatasetZ$ExamScore)

## [1] 90.06906

sd(DatasetZ$ExamScore)

## [1] 6.795224

Independent variable: study hours. Dependent variable: exam score.

CHECK NORMALITY

histograms

hist(DatasetZ$StudyHours, 
     main = "StudyHours",  
     breaks = 20,  
     col = "lightblue",  
     border = "white",  
     cex.main = 1, 
     cex.axis = 1, 
     cex.lab = 1)

hist(DatasetZ$ExamScore, 
     main = "ExamScore", 
     breaks = 20, 
     col = "lightcoral", 
     border = "white",  
     cex.main = 1,  
     cex.axis = 1,  
     cex.lab = 1)

Shapiro-Wilk tests to check the normality of each variable.

shapiro.test(DatasetZ$StudyHours)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetZ$StudyHours
## W = 0.99388, p-value = 0.9349

shapiro.test(DatasetZ$ExamScore)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetZ$ExamScore
## W = 0.96286, p-value = 0.006465

The Shapiro-Wilk p-value for the StudyHours normality test is greater than .05 (0.9349), so normality is not rejected and the data is treated as normal. The Shapiro-Wilk p-value for the ExamScore normality test is less than .05 (0.006465), so the data is not normal.

Since one of the variables is not normal, we need to use the Spearman correlation.
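The decision rule used here (Pearson only if both variables pass the Shapiro-Wilk test, otherwise Spearman) can be sketched as a small function. This is only an illustration: `choose_cor_method` and the simulated vectors are hypothetical and are not part of the assignment's code or data.

```r
# Hypothetical helper (not part of the original analysis): picks the
# correlation method from the Shapiro-Wilk p-values of both variables.
choose_cor_method <- function(x, y, alpha = 0.05) {
  p_x <- shapiro.test(x)$p.value
  p_y <- shapiro.test(y)$p.value
  # Pearson only when neither test rejects normality; otherwise Spearman.
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}

# Simulated data for illustration only; these are not the assignment's values.
set.seed(1)
normal_x <- rnorm(100)   # drawn from a normal distribution
skewed_y <- rexp(100)    # right-skewed, so normality should be rejected

choose_cor_method(normal_x, skewed_y)
```

With these simulated inputs the skewed variable should fail the normality test, so the helper is expected to choose Spearman, matching the choice made for Dataset A.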

Correlation Analysis

cor.test(DatasetZ$StudyHours,DatasetZ$ExamScore, method = "spearman")

## Warning in cor.test.default(DatasetZ$StudyHours, DatasetZ$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties

## 
##  Spearman's rank correlation rho
## 
## data:  DatasetZ$StudyHours and DatasetZ$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9008825

The Spearman correlation test was selected because one variable was not normally distributed according to the histograms and the Shapiro-Wilk tests. The p-value (probability value) is < 2.2e-16, which is below .05. This means the results are statistically significant. The alternative hypothesis (rho is not equal to 0) is supported. The rho value is 0.9008825. The correlation is positive, which means that as students' study hours increase, their exam scores increase. The correlation value is greater than 0.50, which means the relationship is strong.
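The "Cannot compute exact p-value with ties" warning in the output above appears because some values are repeated, so R falls back to an approximate p-value. One way to ask for that approximation directly is the `exact = FALSE` argument of `cor.test`. The sketch below uses simulated data with ties; the `hours` and `scores` vectors are made up for illustration and are not the assignment's data.

```r
# Simulated data with repeated values (ties); not the assignment's data.
set.seed(42)
hours  <- round(runif(50, 2, 10))             # rounding creates ties
scores <- 60 + 4 * hours + rnorm(50, sd = 3)  # scores rise with hours

# exact = FALSE requests the asymptotic p-value up front,
# so the exact-p-value warning is not triggered by the ties.
result <- cor.test(hours, scores, method = "spearman", exact = FALSE)

result$estimate   # Spearman's rho
result$p.value    # approximate p-value
```

The rho estimate is the same either way; only the warning goes away, because the approximate p-value is what R reports in both cases when ties are present.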

Scatter plots

ggscatter(DatasetZ,
          x = "StudyHours",
          y = "ExamScore",
          add = "reg.line",
          xlab = "Studyhours",
          ylab = "Examscore")

The line of best fit points to the top right. This means the direction of the relationship is positive. As study hours increase, exam scores increase. The dots closely hug the line. This means there is a strong relationship between the variables. The dots form a straight-line pattern. This means the relationship is linear. There are no apparent outliers.

Report the Results

Study hours, the independent variable (M = 6.14, SD = 1.37), was correlated with exam score, the dependent variable (M = 90.07, SD = 6.80), ρ = 0.90, p < .001. The relationship was positive and strong. As the independent variable increased, the dependent variable increased.
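When writing up results, a very small p-value is conventionally reported as p < .001 rather than as zero, and means and standard deviations are usually rounded to two decimals. A minimal sketch of that formatting follows. `format_p` is a hypothetical helper, not a function from any package used in this assignment.

```r
# Hypothetical helper for APA-style p-values; not part of the original code.
format_p <- function(p) {
  if (p < 0.001) {
    "p < .001"
  } else {
    # Three decimals with the leading zero dropped.
    paste0("p = ", sub("^0", "", formatC(p, format = "f", digits = 3)))
  }
}

format_p(2.2e-16)   # very small p-value
format_p(0.0312)    # ordinary p-value

# Rounding the Dataset A study-hours statistics to two decimals.
round(c(M = 6.135609, SD = 1.369224), 2)
```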

DATASET B

MEAN AND STANDARD DEVIATION

DatasetB <- read_excel("C:/Users/Leyav/Downloads/DatasetB.xlsx")

mean(DatasetB$ScreenTime)

## [1] 5.063296

sd(DatasetB$ScreenTime)

## [1] 2.056833

mean(DatasetB$SleepingHours)

## [1] 6.938459

sd(DatasetB$SleepingHours)

## [1] 1.351332

Independent variable: screen time. Dependent variable: sleeping hours.

Check Normality

histograms

hist(DatasetB$ScreenTime, 
     main = "ScreenTime", 
     breaks = 20,  
     col = "lightblue", 
     border = "white", 
     cex.main = 1,  
     cex.axis = 1,  
     cex.lab = 1)

hist(DatasetB$SleepingHours,  
     main = "Sleepinghours",   
     breaks = 20,  
     col = "lightcoral",  
     border = "white",  
     cex.main = 1, 
     cex.axis = 1,  
     cex.lab = 1)

Shapiro-Wilk tests to check the normality of each variable.

shapiro.test(DatasetB$ScreenTime)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$ScreenTime
## W = 0.90278, p-value = 1.914e-06

shapiro.test(DatasetB$SleepingHours)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$SleepingHours
## W = 0.98467, p-value = 0.3004

The Shapiro-Wilk p-value for the ScreenTime normality test is less than .05 (1.914e-06), so the data is not normal. The Shapiro-Wilk p-value for the SleepingHours normality test is greater than .05 (0.3004), so normality is not rejected and the data is treated as normal.

Since one of the variables is not normal, we need to use the Spearman correlation.

Correlation Analysis

cor.test(DatasetB$ScreenTime, DatasetB$SleepingHours, method = "spearman")

## 
##  Spearman's rank correlation rho
## 
## data:  DatasetB$ScreenTime and DatasetB$SleepingHours
## S = 259052, p-value = 3.521e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.5544674

The Spearman correlation test was selected because one variable was not normally distributed according to the histograms and the Shapiro-Wilk tests. The p-value (probability value) is 3.521e-09, which is below .05. This means the results are statistically significant. The alternative hypothesis (rho is not equal to 0) is supported. The rho value is -0.5544674. The correlation is negative, which means that as screen time increases, sleeping hours decrease. The absolute value of rho (0.55) is greater than 0.50, which means the relationship is strong.
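The interpretation above reads direction from the sign of rho and strength from its absolute value. That reasoning can be sketched as a small function. `describe_rho` is a hypothetical helper; the 0.50 cutoff for "strong" is the one used in this report, while the 0.30 and 0.10 cutoffs are assumed conventions added only to make the example complete.

```r
# Hypothetical helper; the 0.50 cutoff follows this report, while the
# 0.30 and 0.10 cutoffs are assumed conventions, not from the assignment.
describe_rho <- function(rho) {
  direction <- if (rho > 0) "positive" else if (rho < 0) "negative" else "none"
  size <- abs(rho)   # strength ignores the sign
  strength <- if (size > 0.50) {
    "strong"
  } else if (size >= 0.30) {
    "moderate"
  } else if (size >= 0.10) {
    "weak"
  } else {
    "negligible"
  }
  paste(direction, strength)
}

describe_rho(-0.5544674)   # Dataset B rho
describe_rho(0.9008825)    # Dataset A rho
```

Taking the absolute value first is what allows a negative rho such as -0.55 to be compared against the same 0.50 cutoff as a positive one.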

Scatter plots

ggscatter(DatasetB,
          x = "ScreenTime",
          y = "SleepingHours",
          add = "reg.line",
          xlab = "ScreenTime",
          ylab = "SleepingHours")

The line of best fit points to the bottom right. This means the direction of the relationship is negative. As screen time increases, sleeping hours decrease. The dots closely hug the line. This means there is a strong relationship between the variables. The dots form a straight-line pattern. This means the relationship is linear. There are no apparent outliers.

Report the Results

Screen time, the independent variable (M = 5.06, SD = 2.06), was correlated with sleeping hours, the dependent variable (M = 6.94, SD = 1.35), ρ = -0.55, p < .001. The relationship was negative and strong. As the independent variable increased, the dependent variable decreased.

Leya

2026-02-04