library(readxl)
library(ggpubr)
## Loading required package: ggplot2
DATASET A
MEAN AND STANDARD DEVIATION
DatasetZ <- read_excel("C:/Users/Leyav/Downloads/DatasetA.xlsx")
mean(DatasetZ$StudyHours)
## [1] 6.135609
sd(DatasetZ$StudyHours)
## [1] 1.369224
mean(DatasetZ$ExamScore)
## [1] 90.06906
sd(DatasetZ$ExamScore)
## [1] 6.795224
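The four `mean()`/`sd()` calls above can also be collected into one small table. This is a minimal sketch that assumes the columns are numeric; `na.rm = TRUE` is only a precaution. `describe_cols` is a hypothetical helper name, not part of readxl or ggpubr:

```r
# Hypothetical helper: mean (M) and standard deviation (SD) for each named column
describe_cols <- function(df, cols) {
  t(sapply(cols, function(col) {
    v <- df[[col]]
    c(M = mean(v, na.rm = TRUE), SD = sd(v, na.rm = TRUE))
  }))
}

# Usage with the data loaded above (assumes these column names exist):
# describe_cols(DatasetZ, c("StudyHours", "ExamScore"))
```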
Independent variable: study hours. Dependent variable: exam score.
CHECK NORMALITY
Histograms
hist(DatasetZ$StudyHours,
main = "StudyHours",
breaks = 20,
col = "lightblue",
border = "white",
cex.main = 1,
cex.axis = 1,
cex.lab = 1)
hist(DatasetZ$ExamScore,
main = "ExamScore",
breaks = 20,
col = "lightcoral",
border = "white",
cex.main = 1,
cex.axis = 1,
cex.lab = 1)
Shapiro–Wilk tests to check the normality of each variable.
shapiro.test(DatasetZ$StudyHours)
##
## Shapiro-Wilk normality test
##
## data: DatasetZ$StudyHours
## W = 0.99388, p-value = 0.9349
shapiro.test(DatasetZ$ExamScore)
##
## Shapiro-Wilk normality test
##
## data: DatasetZ$ExamScore
## W = 0.96286, p-value = 0.006465
The Shapiro–Wilk p-value for the study hours normality test is greater than .05 (0.9349), so there is no evidence that study hours depart from normality. The Shapiro–Wilk p-value for the exam score normality test is less than .05 (0.006465), so exam scores are not normally distributed.
Since one of the variables is not normal, we use the Spearman correlation.
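The decision rule used in this report (Spearman if either Shapiro–Wilk p-value is below .05, otherwise Pearson) can be written as a small function. This is a sketch of the rule as stated here, not a general recommendation; `choose_method` and its `alpha = 0.05` default are hypothetical:

```r
# Hypothetical helper encoding this report's rule:
# use Spearman if either variable fails the Shapiro-Wilk test at level alpha.
choose_method <- function(x, y, alpha = 0.05) {
  p_x <- shapiro.test(x)$p.value
  p_y <- shapiro.test(y)$p.value
  if (p_x < alpha || p_y < alpha) "spearman" else "pearson"
}

# Usage with the data loaded above:
# m <- choose_method(DatasetZ$StudyHours, DatasetZ$ExamScore)
# cor.test(DatasetZ$StudyHours, DatasetZ$ExamScore, method = m)
```

With the p-values reported above (0.9349 and 0.006465), this rule returns `"spearman"`, matching the choice made in the text.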
Correlation Analysis
cor.test(DatasetZ$StudyHours,DatasetZ$ExamScore, method = "spearman")
## Warning in cor.test.default(DatasetZ$StudyHours, DatasetZ$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: DatasetZ$StudyHours and DatasetZ$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.9008825
The Spearman correlation test was selected because one variable was not normally distributed according to the histograms and the Shapiro–Wilk tests. The p-value is < 2.2e-16, which is below .05, so the result is statistically significant and the alternative hypothesis (true ρ is not equal to 0) is supported. The estimate is ρ = 0.9008825. The correlation is positive, which means that as students' study hours increase, their exam scores increase. Because ρ is greater than 0.50, the relationship is strong. The warning about ties only means that R reported an approximate p-value instead of an exact one; it does not affect ρ.
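Spearman's ρ is Pearson's correlation computed on the ranks of the data, with tied values given their average rank. Passing `exact = FALSE` to `cor.test()` asks for the approximate p-value directly, which silences the tie warning. A sketch on made-up toy vectors (hypothetical values, not data from this report):

```r
# Hypothetical toy data with a tie in x (two values of 4)
x <- c(2, 4, 4, 5, 7, 8)
y <- c(60, 65, 70, 72, 80, 79)

# Spearman's rho from cor() ...
rho_builtin <- cor(x, y, method = "spearman")
# ... equals Pearson's r on the (average) ranks
rho_ranks <- cor(rank(x), rank(y))

# exact = FALSE uses the asymptotic p-value, so no tie warning is issued
ct <- cor.test(x, y, method = "spearman", exact = FALSE)
print(c(builtin = rho_builtin, ranks = rho_ranks, test = unname(ct$estimate)))
```

Applied to this dataset, `cor.test(DatasetZ$StudyHours, DatasetZ$ExamScore, method = "spearman", exact = FALSE)` should reproduce the output above without the warning, because R already falls back to the approximate p-value when ties are present.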
Scatter plots
ggscatter(DatasetZ,
x = "StudyHours",
y = "ExamScore",
add = "reg.line",
xlab = "Study Hours",
ylab = "Exam Score")
The line of best fit slopes upward to the right, so the direction of the relationship is positive: as study hours increase, exam scores increase. The points lie close to the line, which indicates a strong relationship, and they follow a straight-line pattern, so the relationship appears linear. No outliers are apparent.
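The visual "no outliers" judgment can be cross-checked numerically. One common rule, swapped in here rather than taken from the original analysis, flags values more than 1.5 × IQR beyond the hinges (the rule boxplots use); base R exposes it through `boxplot.stats()`. `iqr_outliers` is a hypothetical helper name:

```r
# Hypothetical helper: values flagged by the 1.5 * IQR boxplot rule.
# boxplot.stats() uses Tukey's hinges, which can differ slightly from quantile().
iqr_outliers <- function(v, coef = 1.5) {
  boxplot.stats(v, coef = coef)$out
}

# Usage with the data loaded above; a zero-length result means nothing was flagged:
# iqr_outliers(DatasetZ$StudyHours)
# iqr_outliers(DatasetZ$ExamScore)
```

This rule checks each variable on its own, so a point that is unusual only in the combination of x and y can still go unflagged; the scatter plot remains the check for that.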
Report the Results
The independent variable, study hours (M = 6.14, SD = 1.37), was correlated with the dependent variable, exam score (M = 90.07, SD = 6.80), ρ = 0.90, p < .001. The relationship was positive and strong: as the independent variable increased, the dependent variable increased.
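The reporting sentence can be assembled directly from the test output so the rounding and p-value format stay consistent. A sketch, assuming the conventions used in this report (two decimals for M, SD and ρ, and "p < .001" instead of "p = 0.000", since a p-value is never exactly zero); `format_p` and `report_spearman` are hypothetical helpers:

```r
# Hypothetical helpers for the reporting sentence used in this report.
format_p <- function(p) {
  if (p < 0.001) return("p < .001")
  # drop the leading zero, e.g. 0.023 -> .023
  paste0("p = ", sub("^0\\.", ".", sprintf("%.3f", p)))
}

report_spearman <- function(iv, dv) {
  # exact = FALSE: approximate p-value, avoids the tie warning
  ct <- cor.test(iv, dv, method = "spearman", exact = FALSE)
  sprintf(paste0("The independent variable (M = %.2f, SD = %.2f) was correlated ",
                 "with the dependent variable (M = %.2f, SD = %.2f), \u03c1 = %.2f, %s."),
          mean(iv), sd(iv), mean(dv), sd(dv), unname(ct$estimate),
          format_p(ct$p.value))
}

# Usage with the data loaded above:
# report_spearman(DatasetZ$StudyHours, DatasetZ$ExamScore)
```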
DATASET B
MEAN AND STANDARD DEVIATION
DatasetB <- read_excel("C:/Users/Leyav/Downloads/DatasetB.xlsx")
mean(DatasetB$ScreenTime)
## [1] 5.063296
sd(DatasetB$ScreenTime)
## [1] 2.056833
mean(DatasetB$SleepingHours)
## [1] 6.938459
sd(DatasetB$SleepingHours)
## [1] 1.351332
Independent variable: screen time. Dependent variable: sleeping hours.
CHECK NORMALITY
Histograms
hist(DatasetB$ScreenTime,
main = "ScreenTime",
breaks = 20,
col = "lightblue",
border = "white",
cex.main = 1,
cex.axis = 1,
cex.lab = 1)
hist(DatasetB$SleepingHours,
main = "SleepingHours",
breaks = 20,
col = "lightcoral",
border = "white",
cex.main = 1,
cex.axis = 1,
cex.lab = 1)
Shapiro–Wilk tests to check the normality of each variable.
shapiro.test(DatasetB$ScreenTime)
##
## Shapiro-Wilk normality test
##
## data: DatasetB$ScreenTime
## W = 0.90278, p-value = 1.914e-06
shapiro.test(DatasetB$SleepingHours)
##
## Shapiro-Wilk normality test
##
## data: DatasetB$SleepingHours
## W = 0.98467, p-value = 0.3004
The Shapiro–Wilk p-value for the screen time normality test is less than .05 (1.914e-06), so screen time is not normally distributed. The Shapiro–Wilk p-value for the sleeping hours normality test is greater than .05 (0.3004), so there is no evidence that sleeping hours depart from normality.
Since one of the variables is not normal, we use the Spearman correlation.
Correlation Analysis
cor.test(DatasetB$ScreenTime, DatasetB$SleepingHours, method = "spearman")
##
## Spearman's rank correlation rho
##
## data: DatasetB$ScreenTime and DatasetB$SleepingHours
## S = 259052, p-value = 3.521e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.5544674
The Spearman correlation test was selected because one variable was not normally distributed according to the histograms and the Shapiro–Wilk tests. The p-value is 3.521e-09, which is below .05, so the result is statistically significant and the alternative hypothesis is supported. The estimate is ρ = -0.5544674. The correlation is negative, which means that as screen time increases, sleeping hours decrease. Because the absolute value of ρ (0.55) is greater than 0.50, the relationship is strong.
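Strength depends on the size of the correlation, not its sign, so the 0.50 cutoff used in this report is compared with |ρ|: ρ = -0.55 counts as strong because |-0.55| = 0.55 > 0.50. A minimal sketch; `describe_rho` is a hypothetical helper, and the cutoff is the one stated in this report, not a universal standard:

```r
# Hypothetical helper: direction from the sign, strength from the absolute value
describe_rho <- function(rho, cutoff = 0.50) {
  direction <- if (rho > 0) "positive" else if (rho < 0) "negative" else "none"
  list(direction = direction, strong = abs(rho) > cutoff)
}

str(describe_rho(-0.5544674))  # Dataset B estimate
str(describe_rho(0.9008825))   # Dataset A estimate
```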
Scatter plots
ggscatter(DatasetB,
x = "ScreenTime",
y = "SleepingHours",
add = "reg.line",
xlab = "Screen Time",
ylab = "Sleeping Hours")
The line of best fit slopes downward to the right, so the direction of the relationship is negative: as screen time increases, sleeping hours decrease. The points lie close to the line, which indicates a strong relationship, and they follow a straight-line pattern, so the relationship appears linear. No outliers are apparent.
Report the Results
The independent variable, screen time (M = 5.06, SD = 2.06), was correlated with the dependent variable, sleeping hours (M = 6.94, SD = 1.35), ρ = -0.55, p < .001. The relationship was negative and strong: as the independent variable increased, the dependent variable decreased.