library(readxl)
library(ggpubr)
DATASET A
DatasetA <- read_excel("C:/Users/lavan/Downloads/DatasetA.xlsx")
Independent variable: StudyHours. Dependent variable: ExamScore.
Descriptive Statistics
mean(DatasetA$StudyHours)
## [1] 6.135609
sd(DatasetA$StudyHours)
## [1] 1.369224
mean(DatasetA$ExamScore)
## [1] 90.06906
sd(DatasetA$ExamScore)
## [1] 6.795224
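The separate mean() and sd() calls above can also be collapsed into one step. The following is only a minimal sketch: `toy` is a small made-up data frame standing in for DatasetA, and `describe_columns` is a hypothetical helper name.

```r
# Hypothetical toy data standing in for DatasetA (values are illustrative only).
toy <- data.frame(
  StudyHours = c(4, 5, 6, 7, 8),
  ExamScore  = c(80, 85, 90, 95, 100)
)

# Return the mean and standard deviation of every column, rounded to 2 decimals.
describe_columns <- function(df) {
  round(sapply(df, function(x) c(mean = mean(x), sd = sd(x))), 2)
}

descriptives <- describe_columns(toy)
print(descriptives)
```

The result is a matrix with one column per variable and rows named mean and sd, so the values for the report can be read off in one place.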
Part 3: Check Normality
hist(DatasetA$StudyHours,
     main = "StudyHours",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
The variable “StudyHours” appears normally distributed. The data looks symmetrical (most data is in the middle). The data also appears to have a proper bell curve.
hist(DatasetA$ExamScore,
     main = "ExamScore",
     breaks = 20,
     col = "lightcoral",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
The variable “ExamScore” does not appear normally distributed. The data looks negatively skewed (most data is on the right). The data does not appear to have a proper bell curve.
shapiro.test(DatasetA$StudyHours)
##
## Shapiro-Wilk normality test
##
## data: DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349
shapiro.test(DatasetA$ExamScore)
##
## Shapiro-Wilk normality test
##
## data: DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465
The Shapiro-Wilk p-value for the StudyHours normality test is greater than .05 (p = .9349), so the test gives no evidence against normality and the data is treated as normal. The Shapiro-Wilk p-value for the ExamScore normality test is less than .05 (p = .0065), so the data is not normal.
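The decision rule applied above (compare the Shapiro-Wilk p-value with .05) can be sketched as a small function. This is an illustration under stated assumptions: `looks_normal` is a hypothetical helper name, and the two samples are deterministic stand-ins, not the real DatasetA columns.

```r
# Decision rule used in the text: p > alpha means no evidence against normality.
# `looks_normal` is a hypothetical helper name, not part of base R.
looks_normal <- function(x, alpha = 0.05) {
  shapiro.test(x)$p.value > alpha
}

# Deterministic illustrative samples (not the real data):
bell_shaped <- qnorm(ppoints(100))  # evenly spaced normal quantiles
right_skew  <- qexp(ppoints(100))   # evenly spaced exponential quantiles

looks_normal(bell_shaped)
looks_normal(right_skew)
```

A p-value above .05 does not prove normality; it only means the test found no evidence against it, which is why the histogram is checked as well.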
Part 4: Correlation Analysis
cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman")
## Warning in cor.test.default(DatasetA$StudyHours, DatasetA$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.9008825
The Spearman correlation test was selected because one variable (ExamScore) is not normally distributed according to the histograms and the Shapiro-Wilk tests. The p-value for the correlation (p < 2.2e-16) is below .05. This means the results are statistically significant. The alternative hypothesis is supported. The correlation value (rho) is 0.9009. The correlation is positive, which means as StudyHours increases, ExamScore increases. The correlation value is greater than 0.50 but less than 1, which means the relationship is strong.
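The choice between Pearson and Spearman described above can be written as a rule. The sketch below assumes the rule "Pearson only if both variables pass Shapiro-Wilk"; `choose_cor_method` is a hypothetical helper and the data are illustrative, not the real dataset.

```r
# Pick Pearson only when both variables pass Shapiro-Wilk; otherwise Spearman.
# `choose_cor_method` is a hypothetical helper, not an existing R function.
choose_cor_method <- function(x, y, alpha = 0.05) {
  both_normal <- shapiro.test(x)$p.value > alpha &&
    shapiro.test(y)$p.value > alpha
  if (both_normal) "pearson" else "spearman"
}

# Illustrative data only: x is bell-shaped, y is right-skewed.
x <- qnorm(ppoints(100))
y <- exp(x) + 0.5 * sin(seq_len(100))

method <- choose_cor_method(x, y)

# exact = FALSE requests the asymptotic p-value, which avoids the
# "Cannot compute exact p-value with ties" warning when ranks are tied.
result <- cor.test(x, y, method = method, exact = FALSE)
print(method)
print(unname(result$estimate))
```

The warning in the output above is not an error: with tied values R falls back to an approximate p-value, and passing exact = FALSE simply requests that approximation up front.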
Part 5: Scatterplots
ggscatter(
  DatasetA,
  x = "StudyHours",
  y = "ExamScore",
  add = "reg.line",
  xlab = "StudyHours",
  ylab = "ExamScore"
)
The line of best fit is pointing to the top right. This means the
direction of the data is positive. As StudyHours increases, ExamScore
increases. The dots closely hug the line. This means there is a strong
relationship between the variables. The dots form a straight-line
pattern. This means the data is linear. There may be one or two possible
outliers, but the dots are close to the line of best fit. Therefore,
they do not appear to impact the relationship between the independent
and dependent variables.
Part 6: Report the Results
Descriptive Statistics
StudyHours: Mean = 6.14, SD = 1.37
ExamScore: Mean = 90.07, SD = 6.80
Normality (Shapiro-Wilk)
StudyHours: p = 0.9349 (normal)
ExamScore: p = 0.0065 (not normal)
Correlation Results (Spearman)
S = 16518, p-value < 2.2e-16, rho = 0.9008825
Alternative hypothesis: true rho is not equal to 0
StudyHours (M = 6.14, SD = 1.37) was positively correlated with ExamScore (M = 90.07, SD = 6.80), r_s(98) = .90, p < .001. The relationship was positive and strong. As StudyHours increased, ExamScore increased.
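The 98 degrees of freedom in the report correspond to n - 2, so they imply n = 100, although the script never prints the sample size. One way to check this from the printed output is the identity S = (n^3 - n)(1 - rho) / 6 that relates the Spearman statistic to rho. A rough sketch, where `n_from_spearman` is a hypothetical helper name:

```r
# Solve n^3 - n = 6 * S / (1 - rho) for n and round to the nearest integer.
n_from_spearman <- function(S, rho) {
  target <- 6 * S / (1 - rho)
  root <- uniroot(function(n) n^3 - n - target, interval = c(2, 1e4))$root
  round(root)
}

# S and rho are the values printed by cor.test() for DatasetA.
n_a <- n_from_spearman(S = 16518, rho = 0.9008825)
df_a <- n_a - 2
print(c(n = n_a, df = df_a))
```

Calling nrow(DatasetA) on the loaded data would be the more direct check; the calculation above is only useful when just the printed output is available.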
DATASET B
DatasetB <- read_excel("C:/Users/lavan/Downloads/DatasetB.xlsx")
Independent variable: ScreenTime. Dependent variable: SleepingHours.
Descriptive Statistics
mean(DatasetB$ScreenTime)
## [1] 5.063296
sd(DatasetB$ScreenTime)
## [1] 2.056833
mean(DatasetB$SleepingHours)
## [1] 6.938459
sd(DatasetB$SleepingHours)
## [1] 1.351332
Part 3: Check Normality
hist(DatasetB$ScreenTime,
     main = "ScreenTime",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
The variable “ScreenTime” does not appear normally distributed. The data looks positively skewed (most data is on the left). The data does not appear to have a proper bell curve.
hist(DatasetB$SleepingHours,
     main = "SleepingHours",
     breaks = 20,
     col = "lightcoral",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
The variable “SleepingHours” appears normally distributed. The data looks symmetrical (most data is in the middle). The data also appears to have a proper bell curve.
shapiro.test(DatasetB$ScreenTime)
##
## Shapiro-Wilk normality test
##
## data: DatasetB$ScreenTime
## W = 0.90278, p-value = 1.914e-06
shapiro.test(DatasetB$SleepingHours)
##
## Shapiro-Wilk normality test
##
## data: DatasetB$SleepingHours
## W = 0.98467, p-value = 0.3004
The Shapiro-Wilk p-value for the ScreenTime normality test is less than .05 (p = 1.914e-06), so the data is not normal. The Shapiro-Wilk p-value for the SleepingHours normality test is greater than .05 (p = .3004), so the test gives no evidence against normality and the data is treated as normal.
Part 4: Correlation Analysis
cor.test(DatasetB$ScreenTime, DatasetB$SleepingHours, method = "spearman")
##
## Spearman's rank correlation rho
##
## data: DatasetB$ScreenTime and DatasetB$SleepingHours
## S = 259052, p-value = 3.521e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.5544674
The Spearman correlation test was selected because one variable (ScreenTime) is not normally distributed according to the histograms and the Shapiro-Wilk tests. The p-value for the correlation (3.521e-09) is below .05. This means the results are statistically significant. The alternative hypothesis is supported. The correlation value (rho) is -0.5545. The correlation is negative, which means as ScreenTime increases, SleepingHours decreases. The absolute value of the correlation (0.55) is greater than 0.50 but less than 1, which means the relationship is strong.
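The direction and strength reading above can be written as a small rule that works on the magnitude of rho, which avoids sign confusion for negative correlations. The sketch is hedged: the text only states that a magnitude above 0.50 counts as strong, so the 0.30 and 0.10 cutoffs are an assumption borrowed from Cohen's commonly cited conventions, and `describe_correlation` is a hypothetical helper.

```r
describe_correlation <- function(rho) {
  size <- abs(rho)  # strength depends on magnitude, not sign
  strength <- if (size >= 0.50) {
    "strong"
  } else if (size >= 0.30) {
    "moderate"    # assumed cutoff, not stated in the text
  } else if (size >= 0.10) {
    "weak"        # assumed cutoff, not stated in the text
  } else {
    "negligible"
  }
  direction <- if (rho > 0) "positive" else if (rho < 0) "negative" else "none"
  paste(direction, strength)
}

describe_correlation(-0.5544674)
describe_correlation(0.9008825)
```

Both reported correlations fall in the strong band under this rule, one negative and one positive.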
Part 5: Scatterplots
ggscatter(
  DatasetB,
  x = "ScreenTime",
  y = "SleepingHours",
  add = "reg.line",
  xlab = "ScreenTime",
  ylab = "SleepingHours"
)
The line of best fit is pointing downward from left to right. This means
the direction of the data is negative. As ScreenTime increases,
SleepingHours decreases. The dots closely hug the line. This means
there is a strong relationship between the variables. The dots form a
straight-line pattern. This means the data is linear. There may be one
or two possible outliers, but the dots are close to the line of best
fit. Therefore, they do not appear to impact the relationship between
the independent and dependent variables.
Part 6: Report the Results
Descriptive Statistics
ScreenTime: Mean = 5.06, SD = 2.06
SleepingHours: Mean = 6.94, SD = 1.35
Normality (Shapiro-Wilk)
ScreenTime: p = 1.914e-06 (not normal)
SleepingHours: p = 0.3004 (normal)
Correlation Results (Spearman)
S = 259052, p-value = 3.521e-09, rho = -0.5544674
Alternative hypothesis: true rho is not equal to 0
ScreenTime (M = 5.06, SD = 2.06) was negatively correlated with SleepingHours (M = 6.94, SD = 1.35), r_s(98) = −.55, p < .001. The relationship was negative and strong. As ScreenTime increased, SleepingHours decreased.
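The report sentences for both datasets follow the same pattern, so the statistic can be formatted by a helper rather than typed by hand, which guards against rounding slips. This is a sketch only: `strip_zero` and `format_spearman` are hypothetical helpers, and n = 100 is an assumption inferred from the printed S and rho values rather than read from the data.

```r
# Drop the leading zero, as APA style does for statistics bounded by 1.
strip_zero <- function(x, digits) {
  sub("^(-?)0\\.", "\\1.", sprintf(paste0("%.", digits, "f"), x))
}

# Build a report string for a Spearman correlation (hypothetical helper).
# n is the sample size; degrees of freedom are n - 2.
format_spearman <- function(rho, n, p) {
  p_text <- if (p < 0.001) "p < .001" else paste0("p = ", strip_zero(p, 3))
  paste0("r_s(", n - 2, ") = ", strip_zero(rho, 2), ", ", p_text)
}

# n = 100 is assumed here (inferred, not printed by the script).
format_spearman(rho = -0.5544674, n = 100, p = 3.521e-09)
format_spearman(rho = 0.9008825, n = 100, p = 2.2e-16)
```

Reporting p < .001 instead of p < .05 keeps the sentence consistent with the very small p-values in the output.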