library(readxl)
library(ggpubr)
## Loading required package: ggplot2
DatasetB <- read_excel("C:/Users/tejas/Downloads/DatasetB.xlsx")
Independent variable - ScreenTime
Dependent variable - SleepingHours
mean(DatasetB$ScreenTime)
## [1] 5.063296
sd(DatasetB$ScreenTime)
## [1] 2.056833
mean - 5.063296, sd - 2.056833
mean(DatasetB$SleepingHours)
## [1] 6.938459
sd(DatasetB$SleepingHours)
## [1] 1.351332
mean - 6.938459, sd - 1.351332
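The separate mean() and sd() calls above can also be run in one pass over every column. The sketch below is only an illustration: `demo` is a made-up data frame with invented values standing in for DatasetB, so the numbers it prints are not the ones reported in this analysis.

```r
# Hypothetical stand-in for DatasetB; the values are invented for illustration.
demo <- data.frame(
  ScreenTime    = c(2, 4, 5, 6, 8),
  SleepingHours = c(8, 7, 7, 6, 5)
)

# Mean and standard deviation of every column in a single call.
# The result is a matrix with rows "mean" and "sd" and one column per variable.
summary_stats <- sapply(demo, function(v) c(mean = mean(v), sd = sd(v)))
print(summary_stats)
```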
hist(DatasetB$ScreenTime,
     main = "ScreenTime",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
The variable “ScreenTime” does not appear to be normally distributed. The data looks positively skewed (most of the data is on the left, with a tail extending to the right). The data does not appear to form a proper bell curve.
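One way to put a number on the skew read off the histogram is a sample skewness statistic. The sketch below uses a hand-written helper, `skewness` (a hypothetical name, not a base R function), based on one common moment-based definition, and applies it to invented values rather than to DatasetB.

```r
# Moment-based sample skewness: the third central moment divided by the
# second central moment raised to the power 3/2.
# A positive value indicates a longer right tail (positive skew).
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2))^(3 / 2)
}

# Hypothetical right-skewed values, invented for illustration.
demo_screen <- c(1, 1, 2, 2, 3, 10)
skewness(demo_screen)
```

A value near zero would point to a roughly symmetrical distribution, while a clearly positive value agrees with a histogram whose bulk sits on the left.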
hist(DatasetB$SleepingHours,
     main = "SleepingHours",
     breaks = 20,
     col = "lightcoral",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
The variable “SleepingHours” appears normally distributed. The data looks symmetrical (most data is in the middle). The data also appears to have a proper bell curve.
shapiro.test(DatasetB$ScreenTime)
##
## Shapiro-Wilk normality test
##
## data: DatasetB$ScreenTime
## W = 0.90278, p-value = 1.914e-06
W = 0.90278, p-value = 1.914e-06 (0.000001914)
The Shapiro-Wilk p-value for the ScreenTime normality test is less than .05, so the data is not normally distributed.
shapiro.test(DatasetB$SleepingHours)
##
## Shapiro-Wilk normality test
##
## data: DatasetB$SleepingHours
## W = 0.98467, p-value = 0.3004
W = 0.98467, p-value = 0.3004
The Shapiro-Wilk p-value for the SleepingHours normality test is greater than .05, so the data can be treated as normally distributed.
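The decision rule applied to the two Shapiro-Wilk tests can be sketched as a small helper. `choose_cor_method` is a hypothetical function written for this illustration, the .05 cut-off mirrors the one used above, and the data is simulated rather than taken from DatasetB.

```r
# Hypothetical helper: pick a correlation method from two Shapiro-Wilk tests.
choose_cor_method <- function(x, y, alpha = 0.05) {
  p_x <- shapiro.test(x)$p.value
  p_y <- shapiro.test(y)$p.value
  # Use Pearson only if neither test rejects normality; otherwise use Spearman.
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}

# Simulated data, invented for illustration: one skewed variable and one
# drawn from a normal distribution.
set.seed(1)
skewed    <- rexp(100)
normalish <- rnorm(100)
choose_cor_method(skewed, normalish)
```

The helper falls back to Spearman as soon as either variable fails the normality check, which is the situation in this analysis.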
cor.test(DatasetB$ScreenTime, DatasetB$SleepingHours, method = "spearman")
##
## Spearman's rank correlation rho
##
## data: DatasetB$ScreenTime and DatasetB$SleepingHours
## S = 259052, p-value = 3.521e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.5544674
The Spearman correlation test was selected because ScreenTime was not normally distributed according to its histogram and Shapiro-Wilk test, even though SleepingHours was. The p-value (probability value) is 0.000000003521, which is below .05. This means the result is statistically significant, and the alternative hypothesis (true rho is not equal to 0) is supported. The rho value is -0.5544674. The correlation is negative, which means that as ScreenTime increases, SleepingHours decreases. The absolute value of rho is greater than 0.50, which means the relationship is strong.
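Spearman's rho does not rely on normality because it is Pearson's correlation computed on the ranks of the data instead of the raw values. The sketch below checks that equivalence on a handful of invented values (`screen` and `sleep` are hypothetical vectors, not columns of DatasetB).

```r
# Hypothetical values, invented for illustration (no tied values).
screen <- c(2.0, 3.5, 4.0, 5.5, 6.0, 7.5, 9.0)
sleep  <- c(8.5, 7.0, 8.0, 6.5, 7.5, 5.5, 5.0)

# Spearman's rho from the built-in method...
rho_builtin <- cor(screen, sleep, method = "spearman")

# ...and the same quantity as Pearson's r on the ranks.
rho_by_rank <- cor(rank(screen), rank(sleep))

all.equal(rho_builtin, rho_by_rank)
```

Because only the ordering of the values matters, a skewed variable such as ScreenTime does not distort the coefficient the way it could with Pearson's r.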
ggscatter(
  DatasetB,
  x = "ScreenTime",
  y = "SleepingHours",
  add = "reg.line",
  xlab = "ScreenTime",
  ylab = "SleepingHours"
)
The line of best fit points to the bottom right. This means the direction of the relationship is negative: as ScreenTime increases, SleepingHours decreases. The dots hug the line only loosely, so the relationship is far from perfect, which is consistent with a rho of about -0.55 rather than a value close to -1. The dots form a roughly straight-line pattern, which means the relationship is linear. There do not appear to be any outliers, so outliers do not appear to affect the relationship between the independent and dependent variables.
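The direction read off the plot can also be checked numerically from the slope of a fitted line. The sketch below assumes the line drawn by add = "reg.line" is an ordinary least-squares fit and uses invented values (`screen` and `sleep` are hypothetical vectors, not columns of DatasetB).

```r
# Hypothetical values, invented for illustration.
screen <- c(2.0, 3.5, 4.0, 5.5, 6.0, 7.5, 9.0)
sleep  <- c(8.5, 7.0, 8.0, 6.5, 7.5, 5.5, 5.0)

# Least-squares line of sleep on screen time.
fit <- lm(sleep ~ screen)

# The slope is the coefficient on the predictor; a negative slope means the
# line points to the bottom right.
slope <- unname(coef(fit)["screen"])
slope
```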