library(readxl)
library(ggpubr)
## Loading required package: ggplot2
DatasetA <- read_excel("/Users/mbongenimoyo/Downloads/DatasetA.xlsx")
DatasetB <- read_excel("/Users/mbongenimoyo/Downloads/DatasetB.xlsx")
mean(DatasetA$StudyHours)
## [1] 6.135609
sd(DatasetA$StudyHours)
## [1] 1.369224
mean(DatasetA$ExamScore)
## [1] 90.06906
sd(DatasetA$ExamScore)
## [1] 6.795224
hist(DatasetA$StudyHours,
     main = "StudyHours",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
hist(DatasetA$ExamScore,
     main = "ExamScore",
     breaks = 20,
     col = "lightcoral",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
The variable “StudyHours” appears normally distributed. The data look
symmetrical (most of the data are in the middle) and follow a
bell-shaped curve. The variable “ExamScore” does not appear normally
distributed. The data look negatively skewed (most of the data are on
the right, with a tail to the left).
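The visual impression of skew can be backed with a number. The sketch below is only an illustration and not part of the original analysis: `sample_skewness` is a hypothetical helper, and the two short vectors are made-up values rather than the DatasetA columns. A value near 0 suggests symmetry, and a negative value suggests a tail to the left.

```r
# Hypothetical helper: moment-based sample skewness (not from any package).
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))  # population-style standard deviation
  mean((x - m)^3) / s^3
}

# Made-up illustration values, not the DatasetA columns.
symmetric_values <- c(1, 2, 3, 4, 5)
left_tailed_values <- c(1, 8, 9, 9, 10, 10, 10)

sample_skewness(symmetric_values)    # zero for perfectly symmetric data
sample_skewness(left_tailed_values)  # negative: tail on the left
```

With the real data, the same helper could be applied to DatasetA$ExamScore to check the negative skew seen in the histogram.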
shapiro.test(DatasetA$StudyHours)
##
## Shapiro-Wilk normality test
##
## data: DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349
shapiro.test(DatasetA$ExamScore)
##
## Shapiro-Wilk normality test
##
## data: DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465
cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman")
## Warning in cor.test.default(DatasetA$StudyHours, DatasetA$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.9008825
The Spearman correlation test was selected because ExamScore was not normally distributed according to the histogram and the Shapiro-Wilk test (p = 0.006465), even though StudyHours was normally distributed (p = 0.9349). The p-value (probability value) is less than 2.2e-16, which is below .05. This means the results are statistically significant. The alternative hypothesis is supported. The rho-value is 0.9008825. The correlation is positive, which means as StudyHours increases, ExamScore increases. The correlation value is greater than 0.50, which means the relationship is strong. The Spearman output reports the test statistic S = 16518 rather than degrees of freedom.
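The choice between Pearson and Spearman can be written down as a small rule. The sketch below is one possible way to do it, not the method used in this report: `choose_cor_method` is a hypothetical helper, the 0.05 cutoff is assumed to match the one used for the p-values, and the two vectors are made-up values rather than the DatasetA columns.

```r
# Hypothetical helper: pick the correlation method from two Shapiro-Wilk tests.
# Assumption: alpha = 0.05, matching the cutoff used for the p-values.
choose_cor_method <- function(x, y, alpha = 0.05) {
  p_x <- shapiro.test(x)$p.value
  p_y <- shapiro.test(y)$p.value
  # Use Pearson only when neither test rejects normality.
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}

# Made-up illustration data, not the DatasetA columns.
probs <- ppoints(100)
bell_shaped <- qnorm(probs)   # evenly spaced normal quantiles
right_skewed <- qexp(probs)   # evenly spaced exponential quantiles

method <- choose_cor_method(bell_shaped, right_skewed)
cor.test(bell_shaped, right_skewed, method = method)
```

Because one of the two vectors is clearly skewed, the helper falls back to Spearman, which is the same reasoning applied to StudyHours and ExamScore.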
ggscatter(
  DatasetA,
  x = "StudyHours",
  y = "ExamScore",
  add = "reg.line",
  xlab = "StudyHours",
  ylab = "ExamScore"
)
The line of best fit points to the top right. This means the direction of the data is positive: as StudyHours increases, ExamScore increases. The dots closely hug the line. This means there is a strong relationship between the variables. The dots form a straight-line pattern. This means the relationship is linear.
mean(DatasetB$ScreenTime)
## [1] 5.063296
sd(DatasetB$ScreenTime)
## [1] 2.056833
hist(DatasetB$ScreenTime,
     main = "ScreenTime",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
hist(DatasetB$SleepingHours,
     main = "SleepingHours",
     breaks = 20,
     col = "lightcoral",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
The variable “ScreenTime” does not appear normally distributed. The data look positively skewed (most of the data are on the left, with a tail to the right). The variable “SleepingHours” appears normally distributed. The data look symmetrical (most of the data are in the middle) and follow a bell-shaped curve.
shapiro.test(DatasetB$ScreenTime)
##
## Shapiro-Wilk normality test
##
## data: DatasetB$ScreenTime
## W = 0.90278, p-value = 1.914e-06
shapiro.test(DatasetB$SleepingHours)
##
## Shapiro-Wilk normality test
##
## data: DatasetB$SleepingHours
## W = 0.98467, p-value = 0.3004
cor.test(DatasetB$ScreenTime, DatasetB$SleepingHours, method = "spearman")
##
## Spearman's rank correlation rho
##
## data: DatasetB$ScreenTime and DatasetB$SleepingHours
## S = 259052, p-value = 3.521e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.5544674
The Spearman correlation test was selected because ScreenTime was not normally distributed according to the histogram and the Shapiro-Wilk test (p = 1.914e-06), even though SleepingHours was normally distributed (p = 0.3004). The p-value (probability value) is 3.521e-09, which is below .05. This means the results are statistically significant. The alternative hypothesis is supported. The rho-value is -0.5544674. The correlation is negative, which means as ScreenTime increases, SleepingHours decreases. The absolute value of rho (0.55) is greater than 0.50, which means the relationship is strong.
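The direction and strength wording can be derived from rho in a consistent way by looking at its sign and its absolute value. The sketch below is a minimal illustration: `describe_rho` is a hypothetical helper, and the 0.50 cutoff is simply the one used in this report, not a universal standard.

```r
# Hypothetical helper: turn rho into direction/strength wording.
# Assumption: |rho| > 0.50 counts as "strong", the cutoff used in this report;
# everything else is lumped into "not strong" for simplicity.
describe_rho <- function(rho, cutoff = 0.50) {
  direction <- if (rho > 0) "positive" else if (rho < 0) "negative" else "none"
  strength <- if (abs(rho) > cutoff) "strong" else "not strong"
  paste(direction, strength, sep = ", ")
}

describe_rho(-0.5544674)  # "negative, strong"
describe_rho(0.9008825)   # "positive, strong"
```

Comparing the absolute value to the cutoff avoids the trap of treating a negative rho as "smaller than 0.50" and therefore weak.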
ggscatter(
  DatasetB,
  x = "ScreenTime",
  y = "SleepingHours",
  add = "reg.line",
  xlab = "ScreenTime",
  ylab = "SleepingHours"
)
The line of best fit slopes downward from the top left to the bottom right. This means the direction of the data is negative. As ScreenTime increases, SleepingHours decreases. The dots are not close to the line, so the relationship looks looser on the plot than a tightly clustered one would, even though rho passes the 0.50 cutoff for a strong relationship. The dots form a roughly straight-line pattern. This means the relationship is linear. There is possibly one outlier (the individual who has a screen time of 3 hours and sleeps more than 9 hours). However, that dot sits toward the top left, which is consistent with the negative trend.