library(readxl)
DatasetA <- read_excel("D:/SLU/AdvAppliedAnalytics/DatasetA.xlsx")
mean(DatasetA$StudyHours)
## [1] 6.135609
sd(DatasetA$StudyHours)
## [1] 1.369224
The independent variable, Study Hours, had a mean of 6.14 and a standard deviation of 1.37.
mean(DatasetA$ExamScore)
## [1] 90.06906
sd(DatasetA$ExamScore)
## [1] 6.795224
The dependent variable, Exam Score, had a mean of 90.07 and a standard deviation of 6.80.
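The two summaries above can also be produced in one pass. The following is a minimal sketch, assuming a small invented data frame (`toy`) in place of DatasetA; the helper name `describe_var` is hypothetical and not part of the original analysis.

```r
# Hypothetical stand-in for DatasetA; the values are invented for illustration.
toy <- data.frame(
  StudyHours = c(4, 5, 6, 7, 8),
  ExamScore  = c(80, 85, 90, 92, 95)
)

# Return the mean (M) and standard deviation (SD) of a numeric vector,
# rounded to two decimal places as in the write-up.
describe_var <- function(x) {
  c(M = round(mean(x), 2), SD = round(sd(x), 2))
}

# Apply the helper to every column; the result is a matrix with rows M and SD.
summary_table <- sapply(toy, describe_var)
print(summary_table)
```

With the real data, `sapply(DatasetA[c("StudyHours", "ExamScore")], describe_var)` should reproduce the rounded values reported above.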
hist(DatasetA$StudyHours,
     main = "StudyHours",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
The variable “StudyHours” appears normally distributed. The data look symmetrical (most values fall in the middle), and the histogram has a proper bell-curve shape.
hist(DatasetA$ExamScore,
     main = "ExamScore",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
The variable “ExamScore” appears unevenly distributed. The data look negatively skewed (the values bunch toward the right side). The histogram does not have a proper bell-curve shape; it is relatively flat.
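A visual impression of skew can be backed up with a number. The sketch below uses one common moment-based estimator of skewness (the mean cubed deviation divided by the cubed standard deviation, both using n as the divisor). The function name `sample_skewness` and the two example vectors are invented for illustration; this was not part of the original analysis.

```r
# Moment-based sample skewness.
# Negative values indicate a longer left tail; values near 0 suggest symmetry.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]            # drop missing values
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))   # standard deviation with divisor n
  mean((x - m)^3) / s^3
}

# Invented vectors for illustration only (not DatasetA).
left_tailed <- c(60, 85, 88, 90, 92, 93, 95, 96)
symmetric   <- c(1, 2, 3, 4, 5)

sample_skewness(left_tailed)  # negative: long tail toward low values
sample_skewness(symmetric)    # 0: perfectly symmetric
```

Running `sample_skewness(DatasetA$ExamScore)` would be expected to return a negative value if the histogram is negatively skewed as described.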
shapiro.test(DatasetA$StudyHours)
##
## Shapiro-Wilk normality test
##
## data: DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349
The Shapiro-Wilk p-value for the StudyHours normality test is greater than 0.05 (p = 0.93), so the test gives no evidence against normality and the data are treated as normally distributed.
shapiro.test(DatasetA$ExamScore)
##
## Shapiro-Wilk normality test
##
## data: DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465
The Shapiro-Wilk p-value for the ExamScore normality test is less than 0.05 (p = 0.0065), so the data are not normally distributed.
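The decision rule implied by the two normality tests can be written down explicitly. This is a minimal sketch, assuming the 0.05 threshold used in this report; the function name `choose_cor_method` is hypothetical.

```r
# Pick a correlation method from two Shapiro-Wilk p-values.
# Assumption: alpha = 0.05, matching the threshold used in this report.
# Pearson is chosen only if neither test rejects normality.
choose_cor_method <- function(p_x, p_y, alpha = 0.05) {
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}

# p-values reported above for StudyHours and ExamScore
method <- choose_cor_method(0.9349, 0.006465)
print(method)  # "spearman"
```

In practice the p-values could be pulled directly from the test objects, for example `shapiro.test(DatasetA$StudyHours)$p.value`, and the result passed to the `method` argument of `cor.test()`.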
cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman")
## Warning in cor.test.default(DatasetA$StudyHours, DatasetA$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.9008825
The Spearman correlation test was selected because one of the
variables (ExamScore) was not normally distributed according to the
histograms and the Shapiro-Wilk tests. The p-value (probability value)
is less than 2.2e-16, which is below .05. This means the results are
statistically significant, and the alternative hypothesis is supported.
The rho value is 0.90 (0.9008825). The correlation is positive, which
means that as StudyHours increases, ExamScore increases. The correlation
value is greater than 0.50, which means the relationship is strong.
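It may help to see why Spearman's test is suitable for non-normal data: rho is simply the Pearson correlation computed on the ranks of the values, so it depends only on their ordering. The sketch below demonstrates this on invented data (the vectors `hours` and `scores` are not from DatasetA).

```r
# Invented data containing a tie, for illustration only (not DatasetA).
hours  <- c(2, 3, 3, 5, 6, 8)
scores <- c(70, 74, 78, 85, 83, 95)

# Spearman's rho computed directly
rho_direct <- cor(hours, scores, method = "spearman")

# Same quantity: Pearson correlation of the ranks
# (rank() assigns tied values their average rank by default)
rho_ranks <- cor(rank(hours), rank(scores), method = "pearson")

# The two approaches agree up to floating-point tolerance
print(all.equal(rho_direct, rho_ranks))
```

The tie in `hours` is also the kind of situation that triggers the "Cannot compute exact p-value with ties" warning shown in the output above; the rho estimate itself is unaffected.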
library(ggpubr)
## Loading required package: ggplot2
ggscatter(
  DatasetA,
  x = "StudyHours",
  y = "ExamScore",
  add = "reg.line",
  xlab = "Study Hours",
  ylab = "Exam Score"
)
The line of best fit points toward the top right. This means the direction of the relationship is positive: as study hours increase, exam scores increase. The dots closely hug the line, which indicates a strong relationship between the variables. The dots form a straight-line pattern, which means the relationship is linear. There may be one possible outlier (a student who studied fewer hours but earned a relatively high score); however, the point is still close to the line of best fit and does not appear to significantly affect the overall relationship between the independent and dependent variables.
A Spearman's correlation analysis was conducted to examine the relationship between Study Hours and Exam Score. The independent variable, Study Hours, had a mean of 6.14 and a standard deviation of 1.37. The dependent variable, Exam Score, had a mean of 90.07 and a standard deviation of 6.80. The correlation coefficient was rho = 0.90, with a p-value < 2.2e-16, which is less than 0.05. The relationship was positive and strong. As Study Hours increased, Exam Scores increased.
Study Hours (M = 6.14, SD = 1.37) was correlated with Exam Scores (M = 90.07, SD = 6.80), ρ = 0.90, p < 0.001. The relationship was positive and strong. As study time increased, exam scores increased.
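The reporting convention used in the sentence above (rho to two decimals, very small p-values shown as p < 0.001) can be applied consistently with a small formatter. This is a sketch only; the function name `format_cor_result` is hypothetical, and "rho" is spelled out to keep the code ASCII-only.

```r
# Format a correlation estimate and p-value for a write-up.
# p-values below 0.001 are reported as "p < 0.001"; others to three decimals.
format_cor_result <- function(rho, p) {
  p_text <- if (p < 0.001) "p < 0.001" else sprintf("p = %.3f", p)
  sprintf("rho = %.2f, %s", rho, p_text)
}

# Values taken from the Spearman output in this report
report_line <- format_cor_result(0.9008825, 2.2e-16)
print(report_line)  # "rho = 0.90, p < 0.001"
```

With a fitted test object, the inputs would typically come from `unname(res$estimate)` and `res$p.value`, where `res` is the result of `cor.test()`.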