Import Datasets

library(readxl)
DatasetA <- read_excel("D:/SLU/AdvAppliedAnalytics/DatasetA.xlsx")

Calculate mean and standard deviation

  1. Study Hours - Independent Variable
mean(DatasetA$StudyHours)
## [1] 6.135609
sd(DatasetA$StudyHours)
## [1] 1.369224
  2. Exam Score - Dependent Variable
mean(DatasetA$ExamScore)
## [1] 90.06906
sd(DatasetA$ExamScore)
## [1] 6.795224
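
The four calls above can also be collapsed into a single summary table. The following is a minimal sketch, not the method used in this report: the `toy` data frame and the `describe` helper are hypothetical names introduced for illustration, and the values in `toy` are made up so that the block runs without the Excel file.

```r
# Hypothetical toy data standing in for DatasetA (illustrative values only).
toy <- data.frame(
  StudyHours = c(4, 5, 6, 7, 8),
  ExamScore  = c(80, 85, 90, 95, 100)
)

# Apply mean() and sd() to each selected column.
# The result is a matrix with statistics in rows and variables in columns.
describe <- function(df, vars) {
  sapply(df[vars], function(x) c(mean = mean(x), sd = sd(x)))
}

summary_table <- describe(toy, c("StudyHours", "ExamScore"))
print(round(summary_table, 2))
```

With the real data loaded, the same helper could presumably be called as `describe(DatasetA, c("StudyHours", "ExamScore"))`.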

Create Histograms for IV - Study Hours

hist(DatasetA$StudyHours,
     main = "StudyHours",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

The variable “StudyHours” appears normally distributed. The histogram is roughly symmetrical, with most of the data concentrated in the middle, and it follows a bell-shaped curve.

Create Histograms for DV - Exam Scores

hist(DatasetA$ExamScore,
     main = "ExamScore",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

The variable “ExamScore” appears normally distributed on visual inspection. The histogram is roughly symmetrical, with most of the data concentrated in the middle, and it follows a bell-shaped curve.

Conduct Shapiro–Wilk tests to check the normality of each variable

  1. StudyHours
shapiro.test(DatasetA$StudyHours)
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349

The Shapiro-Wilk p-value for the StudyHours normality test is greater than 0.05 (p = 0.93), so the null hypothesis of normality is not rejected and the data are treated as normally distributed.

  2. ExamScore
shapiro.test(DatasetA$ExamScore) 
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465

The Shapiro-Wilk p-value for the ExamScore normality test is less than 0.05 (p = 0.0065), so the null hypothesis of normality is rejected and the data are treated as not normally distributed.
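
The decision rule applied to both tests (compare the p-value with 0.05) can be written as a small helper. This is a sketch under stated assumptions: `check_normality` is a hypothetical function name, the 0.05 threshold is the one used in this report, and the two input vectors are illustrative stand-ins rather than the columns of DatasetA.

```r
# Hypothetical helper: run Shapiro-Wilk and apply the p < alpha decision rule.
check_normality <- function(x, alpha = 0.05) {
  result <- shapiro.test(x)
  list(
    W = unname(result$statistic),
    p_value = result$p.value,
    # TRUE means the null hypothesis of normality is rejected.
    reject_normality = result$p.value < alpha
  )
}

# Illustrative inputs (not DatasetA):
bell_shaped <- qnorm(ppoints(30))  # evenly spaced normal quantiles
right_skewed <- 2^(0:14)           # strongly right-skewed values

str(check_normality(bell_shaped))
str(check_normality(right_skewed))
```

A `reject_normality` value of `TRUE` for either variable is what would point toward a rank-based method such as Spearman's correlation instead of Pearson's.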

Correlation Analysis

cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman")
## Warning in cor.test.default(DatasetA$StudyHours, DatasetA$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties
## 
##  Spearman's rank correlation rho
## 
## data:  DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9008825

The Spearman correlation test was selected because one of the variables, ExamScore, was not normally distributed according to the Shapiro-Wilk test. The p-value (probability value) is less than 2.2e-16, which is below 0.05. This means the result is statistically significant, and the alternative hypothesis (true rho is not equal to 0) is supported. The rho value is 0.9008825.
The correlation is positive, which means that as StudyHours increases, ExamScore tends to increase. The correlation value is greater than 0.50, which means the relationship is strong.
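
The warning in the output above appears because the data contain tied values, so R cannot compute an exact p-value and falls back to an approximation. Passing `exact = FALSE` to `cor.test` requests the approximate p-value directly, which avoids the warning. The sketch below uses hypothetical toy vectors (`hours` and `scores`, with deliberate ties) rather than DatasetA.

```r
# Hypothetical toy data with tied values (illustrative only, not DatasetA).
hours  <- c(2, 3, 3, 4, 5, 6, 6, 7, 8, 9)
scores <- c(60, 65, 64, 70, 72, 80, 79, 85, 88, 95)

# exact = FALSE requests the asymptotic p-value, so no exact
# computation is attempted and no ties warning is raised.
result <- cor.test(hours, scores, method = "spearman", exact = FALSE)

rho <- unname(result$estimate)  # Spearman's rank correlation coefficient
p   <- result$p.value           # two-sided p-value

cat("rho =", round(rho, 2), "\n")
cat("significant at 0.05:", p < 0.05, "\n")
```

For the report's data the equivalent call would presumably be `cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman", exact = FALSE)`, which should give the same estimate without the warning.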

Scatterplots

library(ggpubr)
## Loading required package: ggplot2
ggscatter(
  DatasetA,
  x = "StudyHours",
  y = "ExamScore",
  add = "reg.line",
  xlab = "Study Hours",
  ylab = "Exam Score"
)

A Spearman's correlation analysis was conducted to examine the relationship between Study Hours and Exam Score. The independent variable, Study Hours, had a mean of 6.14 and a standard deviation of 1.37. The dependent variable, Exam Score, had a mean of 90.07 and a standard deviation of 6.80. The correlation coefficient was rho = 0.90 with p < 2.2e-16, which is less than 0.05. The relationship was positive and strong: as Study Hours increased, Exam Scores increased.

Results Report

Study Hours (M = 6.14, SD = 1.37) was correlated with Exam Scores (M = 90.07, SD = 6.80), ρ = 0.90, p < 0.001. The relationship was positive and strong. As study time increased, exam scores increased.