library(readxl)
library(ggpubr)
## Loading required package: ggplot2
DATASET A
DatasetA <- read_excel("C:/Users/elapr/Downloads/DatasetA.xlsx")
mean(DatasetA$StudyHours)
## [1] 6.135609
sd(DatasetA$StudyHours)
## [1] 1.369224
mean(DatasetA$ExamScore)
## [1] 90.06906
sd(DatasetA$ExamScore)
## [1] 6.795224
Independent variable is StudyHours.
Dependent variable is ExamScore.
hist(DatasetA$StudyHours,
     main = "StudyHours",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
hist(DatasetA$ExamScore,
     main = "ExamScore",
     breaks = 20,
     col = "lightcoral",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
shapiro.test(DatasetA$StudyHours)
##
## Shapiro-Wilk normality test
##
## data: DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349
shapiro.test(DatasetA$ExamScore)
##
## Shapiro-Wilk normality test
##
## data: DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465
The Shapiro-Wilk p-value for the StudyHours normality test is greater than .05 (p = .9349), so StudyHours does not depart significantly from a normal distribution.
The Shapiro-Wilk p-value for the ExamScore normality test is less than .05 (p = .0065), so ExamScore is not normally distributed.
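The decision rule applied above can be sketched as a small helper. This is an illustration only, not part of the original analysis: the function name `choose_cor_method` and its `alpha` argument are assumptions introduced here. It returns "pearson" only when neither variable fails the Shapiro-Wilk test at the chosen level, and "spearman" otherwise.

```r
# Hypothetical helper (not part of the original analysis): pick a correlation
# method from the Shapiro-Wilk results for two numeric vectors.
choose_cor_method <- function(x, y, alpha = 0.05) {
  p_x <- shapiro.test(x)$p.value
  p_y <- shapiro.test(y)$p.value
  # Pearson assumes both variables are approximately normal;
  # fall back to Spearman if either test rejects normality.
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}

# Deterministic illustration data (not the DatasetA values)
normal_like <- qnorm(ppoints(50))  # evenly spaced normal quantiles
skewed      <- qexp(ppoints(50))   # evenly spaced exponential quantiles

choose_cor_method(normal_like, normal_like)
choose_cor_method(normal_like, skewed)
```

For this report the call would be `choose_cor_method(DatasetA$StudyHours, DatasetA$ExamScore)`. Because the ExamScore p-value is below .05, the helper would return "spearman".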
cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman")
## Warning in cor.test.default(DatasetA$StudyHours, DatasetA$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.9008825
The Spearman correlation test was selected because the dependent variable (ExamScore) was not normally distributed according to the histogram and the Shapiro-Wilk test.
The p-value (probability value) is less than 2.2e-16, which is below .05. This means the results are statistically significant. The alternative hypothesis is supported. Because the data contain ties, R reports an approximate rather than an exact p-value, as the warning notes.
The rho value is 0.9009.
The correlation is positive, which means that as StudyHours increases, ExamScore increases.
The correlation value (0.9009) is greater than 0.50, which means the relationship is strong.
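The interpretation steps above (sign gives direction, size gives strength) can be written out as a short function. This is a minimal sketch: the name `describe_correlation` is hypothetical, the 0.50 cut-off for "strong" is the one used in this report, and the 0.30 and 0.10 cut-offs are an assumption based on Cohen's conventional benchmarks.

```r
# Hypothetical helper: describe direction and strength of a correlation.
# 0.50 = "strong" follows this report; 0.30 and 0.10 are assumed cut-offs.
describe_correlation <- function(rho) {
  direction <- if (rho > 0) "positive" else if (rho < 0) "negative" else "none"
  size <- abs(rho)  # strength depends on magnitude, not sign
  strength <- if (size >= 0.50) {
    "strong"
  } else if (size >= 0.30) {
    "moderate"
  } else if (size >= 0.10) {
    "weak"
  } else {
    "negligible"
  }
  paste(direction, strength, sep = ", ")
}

describe_correlation(0.9008825)  # rho reported for Dataset A
```

Using the absolute value means the same function also handles negative correlations.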
ggscatter(
  DatasetA,
  x = "StudyHours",
  y = "ExamScore",
  add = "reg.line",
  xlab = "StudyHours",
  ylab = "ExamScore"
)
The line of best fit slopes upward to the right. This means the direction of the relationship is positive. As StudyHours increases, ExamScore increases.
The dots closely hug the line. This means there is a strong relationship between the variables.
The dots form a straight-line pattern. This means the relationship is linear.
There are no apparent outliers in this scatter plot.
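The visual outlier check can be backed up with a numeric one. The sketch below is an assumption-laden illustration, not the method used in this report: the helper name `flag_outliers` and the cut-off of 3 standardized residuals are choices made here, and the data are made up.

```r
# Hypothetical helper: return the row indices whose standardized residual
# from a simple linear fit of y on x exceeds a cut-off (3 is an assumption).
flag_outliers <- function(x, y, cutoff = 3) {
  fit <- lm(y ~ x)
  which(abs(rstandard(fit)) > cutoff)
}

# Illustration data (not the DatasetA values): a clean line plus one bad point
x <- 1:20
y <- 2 * x + rep(c(-0.5, 0.5), 10)
y_bad <- y
y_bad[10] <- 100

flag_outliers(x, y)      # no rows flagged
flag_outliers(x, y_bad)  # row 10 flagged
```

For this report the call would be `flag_outliers(DatasetA$StudyHours, DatasetA$ExamScore)`. An empty result would be consistent with the reading of the scatter plot.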
StudyHours (M = 6.14, SD = 1.37) was correlated with ExamScore (M = 90.07, SD = 6.80), rho = 0.90, p < 2.2e-16.
The relationship was positive and strong. As StudyHours increased, ExamScore increased.
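A summary sentence like the one above can be generated from the data so that the rounding stays consistent. This is a minimal sketch under stated assumptions: `report_spearman` is a hypothetical name, two-decimal rounding and the "p < .001" convention are choices made here, and `exact = FALSE` requests the approximate p-value directly so that ties do not trigger a warning.

```r
# Hypothetical helper: build a one-line summary of a Spearman correlation.
# Rounds to two decimals; p-values below .001 are written as "p < .001".
report_spearman <- function(x_name, x, y_name, y) {
  test <- cor.test(x, y, method = "spearman", exact = FALSE)
  p_text <- if (test$p.value < 0.001) {
    "p < .001"
  } else {
    sprintf("p = %.3f", test$p.value)
  }
  sprintf(
    "%s (M = %.2f, SD = %.2f) was correlated with %s (M = %.2f, SD = %.2f), rho = %.2f, %s.",
    x_name, mean(x), sd(x), y_name, mean(y), sd(y),
    unname(test$estimate), p_text
  )
}

# Illustration data (not the DatasetA values)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2, 1, 4, 3, 6, 5, 8, 7)
report_spearman("X", x, "Y", y)
```

For this report the call would be `report_spearman("StudyHours", DatasetA$StudyHours, "ExamScore", DatasetA$ExamScore)`.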
DATASET B
DatasetB <- read_excel("C:/Users/elapr/Downloads/DatasetB.xlsx")
mean(DatasetB$ScreenTime)
## [1] 5.063296
sd(DatasetB$ScreenTime)
## [1] 2.056833
mean(DatasetB$SleepingHours)
## [1] 6.938459
sd(DatasetB$SleepingHours)
## [1] 1.351332
Independent variable is ScreenTime.
Dependent variable is SleepingHours.
hist(DatasetB$ScreenTime,
     main = "ScreenTime",
     breaks = 20,
     col = "Red",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
hist(DatasetB$SleepingHours,
     main = "SleepingHours",
     breaks = 20,
     col = "Blue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
shapiro.test(DatasetB$ScreenTime)
##
## Shapiro-Wilk normality test
##
## data: DatasetB$ScreenTime
## W = 0.90278, p-value = 1.914e-06
shapiro.test(DatasetB$SleepingHours)
##
## Shapiro-Wilk normality test
##
## data: DatasetB$SleepingHours
## W = 0.98467, p-value = 0.3004
The Shapiro-Wilk p-value for the ScreenTime normality test is less than .05 (p = .000001914), so ScreenTime is not normally distributed.
The Shapiro-Wilk p-value for the SleepingHours normality test is greater than .05 (p = .3004), so SleepingHours does not depart significantly from a normal distribution.
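A normal Q-Q plot is a common visual companion to the histogram and the Shapiro-Wilk test. The sketch below uses made-up skewed data rather than the DatasetB values, and is offered as an optional extra check, not as part of the original analysis. Points that bend away from the reference line suggest a departure from normality.

```r
# Illustration data (not the DatasetB values): right-skewed values
skewed <- qexp(ppoints(100))

# Sample quantiles against theoretical normal quantiles
qqnorm(skewed, main = "Normal Q-Q plot (illustration data)")
# Reference line through the first and third quartiles
qqline(skewed, col = "red")
```

For this report the equivalent calls would be `qqnorm(DatasetB$ScreenTime)` followed by `qqline(DatasetB$ScreenTime)`.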
cor.test(DatasetB$ScreenTime, DatasetB$SleepingHours, method = "spearman")
##
## Spearman's rank correlation rho
##
## data: DatasetB$ScreenTime and DatasetB$SleepingHours
## S = 259052, p-value = 3.521e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.5544674
The Spearman correlation test was selected because the independent variable (ScreenTime) was not normally distributed according to the histogram and the Shapiro-Wilk test.
The p-value (probability value) is 3.521e-09, which is below .05. This means the results are statistically significant. The alternative hypothesis is supported.
The rho value is -0.5545.
The correlation is negative, which means that as ScreenTime increases, SleepingHours decreases.
The absolute value of the correlation (0.5545) is greater than 0.50, which means the relationship is strong.
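Spearman's rho does not require normality because it works on ranks rather than raw values: it is the Pearson correlation of the ranked data. The sketch below checks this on made-up data (not the DatasetB values); the variable names are illustrative.

```r
# Illustration data (not the DatasetB values); x contains one tie
x <- c(2, 4, 4, 7, 9, 10)
y <- c(8, 7, 6, 6.5, 3, 1)

rho_builtin <- cor(x, y, method = "spearman")
# rank() assigns tied values their average rank by default
rho_by_hand <- cor(rank(x), rank(y), method = "pearson")

all.equal(rho_builtin, rho_by_hand)
```

Because only the ordering of the values matters, a skewed variable such as ScreenTime does not distort the result the way it could with Pearson's r.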
ggscatter(
  DatasetB,
  x = "ScreenTime",
  y = "SleepingHours",
  add = "reg.line",
  xlab = "ScreenTime",
  ylab = "SleepingHours"
)
The line of best fit slopes downward to the right. This means the direction of the relationship is negative. As ScreenTime increases, SleepingHours decreases.
The dots closely hug the line. This means there is a strong relationship between the variables.
The dots form a straight-line pattern. This means the relationship is linear.
There are no apparent outliers in this scatter plot.
ScreenTime (M = 5.06, SD = 2.06) was correlated with SleepingHours (M = 6.94, SD = 1.35), rho = -0.55, p = 3.521e-09.
The relationship was negative and strong. As ScreenTime increased, SleepingHours decreased.