library(readxl)
DatasetA <- read_excel("D:/SLU/AdvAppliedAnalytics/DatasetA.xlsx")
mean(DatasetA$StudyHours)
## [1] 6.135609
sd(DatasetA$StudyHours)
## [1] 1.369224
mean(DatasetA$ExamScore)
## [1] 90.06906
sd(DatasetA$ExamScore)
## [1] 6.795224
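The four calls above can also be collapsed into one summary table. The following is a minimal sketch, assuming a simulated stand-in data frame (the hypothetical `demo`, with made-up values and an assumed 100 rows) rather than the real DatasetA file, so that it runs without the Excel workbook.

```r
# Hypothetical stand-in for DatasetA; the values are simulated, not the real data.
set.seed(1)
demo <- data.frame(
  StudyHours = rnorm(100, mean = 6, sd = 1.4),
  ExamScore  = rnorm(100, mean = 90, sd = 7)
)

# Mean (M) and standard deviation (SD) for each variable in one pass.
descriptives <- sapply(demo[c("StudyHours", "ExamScore")], function(x) {
  c(M = mean(x, na.rm = TRUE), SD = sd(x, na.rm = TRUE))
})

# Round to two decimals, the precision used when reporting results.
round(descriptives, 2)
```

Replacing `demo` with `DatasetA` should produce the same means and standard deviations as the individual calls, assuming the column names match.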
hist(DatasetA$StudyHours,
     main = "StudyHours",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
The variable “StudyHours” appears approximately normally distributed. The histogram is roughly symmetrical, with most values near the center, and it follows a bell-shaped curve.
hist(DatasetA$ExamScore,
     main = "ExamScore",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
The variable “ExamScore” does not appear normally distributed in this histogram, which is consistent with the Shapiro-Wilk result reported for this variable.
shapiro.test(DatasetA$StudyHours)
##
## Shapiro-Wilk normality test
##
## data: DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349
The Shapiro-Wilk p-value for the StudyHours normality test is greater than 0.05 (p = 0.93), so there is no evidence of a departure from normality and the data are treated as normally distributed.
shapiro.test(DatasetA$ExamScore)
##
## Shapiro-Wilk normality test
##
## data: DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465
The Shapiro-Wilk p-value for the ExamScore normality test is less than 0.05 (p = 0.0065), so the data are not normally distributed.
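The decision rule applied to the two Shapiro-Wilk tests (use Pearson only when both variables look normal, otherwise Spearman) can be written out explicitly. This is a sketch under stated assumptions: the data frame `demo` is simulated, the names `alpha` and `method` are illustrative, and the 0.05 cutoff is the one used in this report.

```r
# Hypothetical stand-in data: one roughly normal variable, one skewed variable.
set.seed(2)
demo <- data.frame(
  StudyHours = rnorm(100, mean = 6, sd = 1.4),   # normal by construction
  ExamScore  = 100 - rexp(100, rate = 1 / 10)    # left-skewed by construction
)

alpha <- 0.05

# Shapiro-Wilk p-value for every column.
p_values <- sapply(demo, function(x) shapiro.test(x)$p.value)

# Pearson only if neither variable shows evidence against normality.
method <- if (all(p_values > alpha)) "pearson" else "spearman"

p_values
method
```

The resulting `method` string can be passed straight to the `method` argument of `cor.test()`.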
cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman")
## Warning in cor.test.default(DatasetA$StudyHours, DatasetA$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.9008825
The Spearman correlation test was selected because one of the
variables (ExamScore) was not normally distributed according to the
histograms and the Shapiro-Wilk tests. The p-value (probability value) is
less than 2.2e-16, which is below .05. This means the results are
statistically significant. The alternative hypothesis (true rho is not
equal to 0) is supported. The rho-value is 0.9008825.
The correlation is positive, which means as StudyHours increases,
ExamScore tends to increase. The correlation value is greater than 0.50,
which means the relationship is strong.
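Two details of the Spearman test may be worth illustrating: rho is the Pearson correlation of the ranked values, and the ties warning comes from R attempting an exact p-value. The sketch below uses simulated vectors (`hours` and `score` are hypothetical, not DatasetA) and passes `exact = FALSE` so that the asymptotic p-value is requested up front.

```r
# Hypothetical data; rounding produces tied values, as in real measurements.
set.seed(3)
hours <- round(rnorm(100, mean = 6, sd = 1.4), 1)
score <- round(60 + 5 * hours + rnorm(100, sd = 3))

# exact = FALSE asks for the asymptotic p-value, so no ties warning is raised.
res <- cor.test(hours, score, method = "spearman", exact = FALSE)

# Spearman's rho equals Pearson's r computed on the ranks
# (rank() assigns average ranks to ties by default).
rho_from_ranks <- cor(rank(hours), rank(score), method = "pearson")

all.equal(unname(res$estimate), rho_from_ranks)
```

Because the test works on ranks, it measures how consistently one variable rises with the other rather than how closely the points follow a straight line.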
library(ggpubr)
## Loading required package: ggplot2
ggscatter(
  DatasetA,
  x = "StudyHours",
  y = "ExamScore",
  add = "reg.line",
  xlab = "Study Hours",
  ylab = "Exam Score"
)
A Spearman's correlation analysis was conducted to examine the relationship between Study Hours and Exam Score. The independent variable, Study Hours, had a mean of 6.14 and a standard deviation of 1.37. The dependent variable, Exam Score, had a mean of 90.07 and a standard deviation of 6.80. The correlation coefficient was rho = 0.90 with p < 2.2e-16, which is less than 0.05. The relationship was positive and strong: as Study Hours increased, Exam Scores increased.
Study Hours (M = 6.14, SD = 1.37) was correlated with Exam Scores (M = 90.07, SD = 6.80), ρ = 0.90, p < 0.001. The relationship was positive and strong. As study time increased, exam scores increased.
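A summary sentence in this format can be assembled from the test object instead of being typed by hand, which avoids transcription slips. The helper below, `report_spearman()`, is a hypothetical function written for illustration, and the data are simulated; it is a sketch of one possible approach, not part of the original analysis.

```r
# Hypothetical helper: builds a one-sentence summary from two numeric vectors.
report_spearman <- function(x, y, x_label, y_label) {
  res <- cor.test(x, y, method = "spearman", exact = FALSE)

  # Report very small p-values as an inequality rather than in full.
  p_text <- if (res$p.value < 0.001) {
    "p < 0.001"
  } else {
    sprintf("p = %.3f", res$p.value)
  }

  sprintf(
    "%s (M = %.2f, SD = %.2f) was correlated with %s (M = %.2f, SD = %.2f), rho = %.2f, %s.",
    x_label, mean(x), sd(x),
    y_label, mean(y), sd(y),
    unname(res$estimate), p_text
  )
}

# Simulated example data, not DatasetA.
set.seed(4)
hours <- rnorm(100, mean = 6, sd = 1.4)
score <- 60 + 5 * hours + rnorm(100, sd = 3)

summary_sentence <- report_spearman(hours, score, "Study Hours", "Exam Scores")
summary_sentence
```

Calling the helper with `DatasetA$StudyHours` and `DatasetA$ExamScore` should give a sentence with the same statistics as the one above, assuming the columns are numeric and contain no missing values.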