library(readxl)
library(ggpubr)
## Loading required package: ggplot2
DatasetB <- read_excel("C:/Users/varun/Downloads/DatasetB.xlsx")
mean(DatasetB$ScreenTime)
## [1] 5.063296
sd(DatasetB$ScreenTime)
## [1] 2.056833
mean(DatasetB$SleepingHours)
## [1] 6.938459
sd(DatasetB$SleepingHours) 
## [1] 1.351332
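The four separate `mean()`/`sd()` calls above can be collapsed into a single summary table. This is a minimal sketch using base R's `sapply()`. `describe_cols()` is a hypothetical helper and `toy` is hypothetical stand-in data, since the Excel file is not available here. With the real data you would call `describe_cols(DatasetB, c("ScreenTime", "SleepingHours"))`.

```r
# Hypothetical helper: mean and standard deviation for several numeric columns at once.
describe_cols <- function(df, cols) {
  sapply(df[cols], function(x) c(mean = mean(x), sd = sd(x)))
}

# Hypothetical stand-in data; replace with DatasetB once the Excel file is loaded.
toy <- data.frame(ScreenTime = c(2, 4, 6, 8), SleepingHours = c(8, 7, 7, 6))
stats <- describe_cols(toy, c("ScreenTime", "SleepingHours"))
print(round(stats, 3))
```

The result is a matrix with one column per variable and rows `mean` and `sd`, which is easier to paste into a report than four separate outputs.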
hist(DatasetB$ScreenTime,
     main = "ScreenTime",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1) 

hist(DatasetB$SleepingHours,
     main = "SleepingHours",
     breaks = 20,
     col = "pink",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1) 

The variable “ScreenTime” does not appear to be normally distributed. It is positively (right) skewed: most observations are bunched on the left, with a longer tail stretching to the right.

The variable “SleepingHours” appears to be normally distributed. The histogram is roughly symmetrical, with most observations in the middle, and approximates a bell curve.
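The visual impression of skew from the histograms can be backed up with a number. Base R has no built-in skewness function, so this sketch computes the moment-based sample skewness g1 = m3 / m2^(3/2) by hand; packages such as e1071 and moments provide variants of the same statistic. `skewness_g1()` is a hypothetical helper. Positive values indicate a right (positive) skew, and values near 0 indicate symmetry.

```r
# Hypothetical helper: moment-based sample skewness g1 = m3 / m2^(3/2),
# where mk = mean((x - mean(x))^k).
skewness_g1 <- function(x) {
  x <- x[!is.na(x)]   # ignore missing values
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

skewness_g1(c(1, 2, 3, 4, 5))    # symmetric data: 0
skewness_g1(c(1, 1, 1, 2, 10))   # long right tail: positive
# With the real data: skewness_g1(DatasetB$ScreenTime)
```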

shapiro.test(DatasetB$ScreenTime) 
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$ScreenTime
## W = 0.90278, p-value = 1.914e-06
shapiro.test(DatasetB$SleepingHours)
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$SleepingHours
## W = 0.98467, p-value = 0.3004

The Shapiro-Wilk p-value for ScreenTime is less than .05 (p = 1.914e-06, i.e. p < .001), so we reject the null hypothesis of normality: ScreenTime is not normally distributed. We therefore use Spearman correlation.

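The Shapiro-Wilk p-value for SleepingHours is greater than .05 (p = .30), so there is no evidence against normality and SleepingHours can be treated as normally distributed. On its own it would suit Pearson correlation, but Pearson correlation assumes both variables are approximately normal, so the non-normal ScreenTime still rules it out.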
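The decision rule described above can be written as a small function. This is a sketch assuming the common rule that Pearson is used only when both variables pass the Shapiro-Wilk test at alpha = .05; `choose_cor_method()` is a hypothetical helper, not part of any package. The example inputs are deterministic: exact normal quantiles from `qnorm(ppoints())` versus a strongly skewed exponential transform of them.

```r
# Hypothetical helper: choose Pearson only if BOTH variables look normal.
choose_cor_method <- function(x, y, alpha = 0.05) {
  p_x <- shapiro.test(x)$p.value
  p_y <- shapiro.test(y)$p.value
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}

# Deterministic example inputs.
normal_like <- qnorm(ppoints(50))          # idealised normal sample
skewed      <- exp(2 * qnorm(ppoints(50))) # heavily right-skewed (log-normal shape)

choose_cor_method(normal_like, rev(normal_like))
choose_cor_method(skewed, normal_like)
# With the real data: choose_cor_method(DatasetB$ScreenTime, DatasetB$SleepingHours)
```

Note that `shapiro.test()` requires between 3 and 5000 non-missing values.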

cor.test(DatasetB$ScreenTime, DatasetB$SleepingHours, method = "spearman")
## 
##  Spearman's rank correlation rho
## 
## data:  DatasetB$ScreenTime and DatasetB$SleepingHours
## S = 259052, p-value = 3.521e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.5544674

The Spearman correlation test was selected because ScreenTime failed the Shapiro-Wilk normality test.
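Spearman's rho is simply Pearson's correlation computed on the ranks of the data, which is why it does not need the normality assumption. R's `cor.test()` also prints a statistic S = (n^3 - n)(1 - rho) / 6, which equals the sum of squared rank differences when there are no ties. The sketch below recomputes both by hand and compares them with `cor.test()`; `spearman_by_hand()` is a hypothetical helper and `x`, `y` are a small hypothetical tie-free sample.

```r
# Hypothetical helper: Spearman's rho as the Pearson correlation of ranks,
# plus the S statistic in the form cor.test() reports it.
spearman_by_hand <- function(x, y) {
  n   <- length(x)
  rho <- cor(rank(x), rank(y))
  S   <- (n^3 - n) * (1 - rho) / 6
  c(rho = rho, S = S)
}

# Hypothetical tie-free sample.
x <- c(1.2, 3.4, 2.2, 5.1, 4.0, 0.7)
y <- c(8, 5, 7, 3, 6, 9)

by_hand <- spearman_by_hand(x, y)
ct      <- cor.test(x, y, method = "spearman")
print(by_hand)
print(c(ct$estimate, ct$statistic))
```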

The p-value (probability value) is 3.521e-09, which is below .05, so the result is statistically significant. We reject the null hypothesis, and the alternative hypothesis (the true rho is not equal to 0) is supported.
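Rather than reading the p-value off the printed output, it can be taken directly from the `htest` object that `cor.test()` returns, via its `$p.value` and `$estimate` fields. A minimal sketch using a small hypothetical sample and the conventional alpha = .05; with the real data you would pass `DatasetB$ScreenTime` and `DatasetB$SleepingHours`.

```r
# Hypothetical sample standing in for ScreenTime and SleepingHours.
x <- c(1.2, 3.4, 2.2, 5.1, 4.0, 0.7)
y <- c(8, 5, 7, 3, 6, 9)

res   <- cor.test(x, y, method = "spearman")
alpha <- 0.05
significant <- res$p.value < alpha   # TRUE means reject H0: rho = 0

cat(sprintf("rho = %.3f, p = %.3g, significant at %.2f: %s\n",
            res$estimate, res$p.value, alpha, significant))
```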

The Spearman correlation coefficient (rho) is -0.5544674.

The correlation is negative, which means that as ScreenTime increases, SleepingHours tends to decrease.

The absolute value of rho (0.55) falls within the 0.50 to 1.00 range, which indicates a strong relationship.
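The reading of sign and size can be encoded as a small helper. Only the 0.50 cut-off for "strong" comes from the text above; the 0.10 and 0.30 cut-offs for weak and moderate are an assumption borrowed from Cohen's widely used conventions. `describe_correlation()` is hypothetical.

```r
# Hypothetical helper: label a correlation coefficient by strength and direction.
# |r| >= 0.50 -> strong (from the text); 0.30 and 0.10 cut-offs are assumed (Cohen's conventions).
describe_correlation <- function(r) {
  size <- abs(r)
  if (size < 0.1) return("negligible relationship")
  strength  <- if (size >= 0.5) "strong" else if (size >= 0.3) "moderate" else "weak"
  direction <- if (r > 0) "positive" else "negative"
  paste(strength, direction, "relationship")
}

describe_correlation(-0.5544674)  # "strong negative relationship"
```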

ggscatter(
  DatasetB,
  x = "ScreenTime",
  y = "SleepingHours",
  add = "reg.line",
  xlab = "ScreenTime",
  ylab = "SleepingHours"
)

The line of best fit slopes downward from left to right, which indicates a negative relationship: as ScreenTime increases, SleepingHours tends to decrease.

The points lie fairly close to the line, which is consistent with a strong relationship between the variables.
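With `add = "reg.line"`, `ggscatter()` draws an ordinary least-squares line, so the direction of that line and how tightly the points cluster around it can be checked numerically with base R's `lm()`. A sketch on hypothetical stand-in data (`toy`); with the real data, replace `toy` with `DatasetB`. Keep in mind that R-squared measures fit to a straight line, which is related to, but not the same as, the rank-based Spearman rho.

```r
# Hypothetical stand-in data with a downward trend.
toy <- data.frame(
  ScreenTime    = c(1, 2, 3, 4, 5, 6, 7, 8),
  SleepingHours = c(8.5, 8.0, 7.6, 7.1, 6.9, 6.2, 5.8, 5.5)
)

fit   <- lm(SleepingHours ~ ScreenTime, data = toy)
slope <- unname(coef(fit)["ScreenTime"])  # negative slope: line falls to the right
r2    <- summary(fit)$r.squared           # share of variance explained by the line

cat(sprintf("slope = %.3f, R^2 = %.3f\n", slope, r2))
```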

The points follow a roughly straight-line pattern rather than a curve, which suggests the relationship is linear.

There appear to be no clear outliers, and no individual point seems to distort the relationship between the independent variable (ScreenTime) and the dependent variable (SleepingHours).
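The visual outlier check can be supplemented with an explicit rule. One common choice, the rule `boxplot()` uses, flags values more than 1.5 × IQR beyond the hinges; `boxplot.stats()` returns those values in `$out`. This threshold is a convention, not something the original analysis specifies, and `toy` is hypothetical data with one deliberately extreme ScreenTime value.

```r
# Hypothetical stand-in data; the last ScreenTime value is deliberately extreme.
toy <- data.frame(
  ScreenTime    = c(1, 2, 3, 4, 5, 6, 7, 20),
  SleepingHours = c(8.5, 8.0, 7.6, 7.1, 6.9, 6.2, 5.8, 5.5)
)

# Boxplot rule applied to each column: values beyond 1.5 * IQR from the hinges.
outliers <- lapply(toy, function(col) boxplot.stats(col)$out)
print(outliers)
# With the real data: lapply(DatasetB[c("ScreenTime", "SleepingHours")], function(col) boxplot.stats(col)$out)
```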

library(rmarkdown)