Import Datasets

library(readxl)
DatasetA <- read_excel("D:/SLU/AdvAppliedAnalytics/DatasetA.xlsx")

Calculate mean and standard deviation

  1. Study Hours - Independent Variable
mean(DatasetA$StudyHours)
## [1] 6.135609
sd(DatasetA$StudyHours)
## [1] 1.369224
  2. Exam Score - Dependent Variable
mean(DatasetA$ExamScore)
## [1] 90.06906
sd(DatasetA$ExamScore)
## [1] 6.795224
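
The four calls above can also be collapsed into a single summary table. The following is only a sketch: `demo` is a simulated stand-in with hypothetical values (not the real DatasetA), and the name `descriptives` is an assumption introduced for illustration.

```r
# Simulated stand-in for DatasetA (hypothetical values, not the real data)
set.seed(1)
demo <- data.frame(
  StudyHours = rnorm(100, mean = 6, sd = 1.4),
  ExamScore  = runif(100, min = 75, max = 100)
)

# Mean (M) and standard deviation (SD) for every column in one pass
descriptives <- sapply(demo, function(x) c(M = mean(x), SD = sd(x)))

# Rows are M and SD, columns are the variables
round(descriptives, 2)
```

Replacing `demo` with `DatasetA[, c("StudyHours", "ExamScore")]` should give the same values as the individual `mean()` and `sd()` calls.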

Create Histograms for IV - Study Hours

hist(DatasetA$StudyHours,
     main = "StudyHours",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

The variable “StudyHours” appears normally distributed. The data looks symmetrical (most data is in the middle) and follows a bell-shaped curve.

Create Histograms for DV - Exam Scores

hist(DatasetA$ExamScore,
     main = "ExamScore",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

The variable “ExamScore” does not appear normally distributed. The data looks negatively skewed (most data is at the right), and the distribution appears flat rather than bell-shaped.
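
A visual judgment of skew can be backed up with a number. The sketch below uses a moment-based skewness formula, which is one common definition among several; the helper name `skewness` is an assumption and is not part of base R. Negative values suggest a longer left tail, and values near zero suggest symmetry.

```r
# Moment-based sample skewness (one common definition; other variants exist)
skewness <- function(x) {
  x <- x[!is.na(x)]              # drop missing values
  d <- x - mean(x)               # deviations from the mean
  mean(d^3) / mean(d^2)^(3 / 2)  # third moment scaled by variance^(3/2)
}

skewness(c(1, 2, 3, 4, 5))    # symmetric values give 0
skewness(c(1, 8, 9, 9, 10))   # long left tail gives a negative value
```

Calling `skewness(DatasetA$ExamScore)` would then show whether the histogram's apparent negative skew is reflected numerically.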

Conduct Shapiro–Wilk tests to check the normality of each variable

  1. StudyHours
shapiro.test(DatasetA$StudyHours)
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349

The Shapiro-Wilk p-value for the StudyHours normality test is greater than 0.05 (0.93), so there is no evidence against normality and the data is treated as normal.

  2. ExamScore
shapiro.test(DatasetA$ExamScore) 
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465

The Shapiro-Wilk p-value for the ExamScore normality test is less than 0.05 (0.0065), so the data is not normally distributed.
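
The decision rule implied by the two tests (use Pearson only when both variables pass the normality check, otherwise use Spearman) can be written down explicitly. This is a minimal sketch; `choose_cor_method` is a hypothetical helper, and the 0.05 threshold is the one used in this report.

```r
# Hypothetical helper: choose a correlation method from two Shapiro-Wilk p-values
choose_cor_method <- function(p_x, p_y, alpha = 0.05) {
  # Pearson only if neither test rejects normality; otherwise fall back to Spearman
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}

# p-values reported above for StudyHours and ExamScore
choose_cor_method(0.9349, 0.006465)
# [1] "spearman"
```

With the real data, the p-values could be passed in directly as `shapiro.test(DatasetA$StudyHours)$p.value` and `shapiro.test(DatasetA$ExamScore)$p.value`.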

Correlation Analysis

cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman")
## Warning in cor.test.default(DatasetA$StudyHours, DatasetA$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties
## 
##  Spearman's rank correlation rho
## 
## data:  DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9008825

The Spearman correlation test was selected because one of the variables (ExamScore) was not normally distributed according to the histograms and the Shapiro-Wilk tests. The p-value (probability value) is less than 2.2e-16, which is below .05. This means the results are statistically significant: the null hypothesis of no correlation is rejected, which supports the alternative hypothesis. The rho-value is 0.9008825.
The correlation is positive, which means that as StudyHours increases, ExamScore increases. The correlation value is greater than 0.50, which means the relationship is strong.
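
Two details of the Spearman test may be worth illustrating: rho is the Pearson correlation computed on the ranks of the data, and the "Cannot compute exact p-value with ties" warning can be avoided by requesting the approximate p-value with `exact = FALSE`. The sketch below uses simulated data with hypothetical values (not DatasetA); the variable names are assumptions.

```r
# Simulated data with ties (hypothetical values, not the real data)
set.seed(42)
x <- round(rnorm(50, mean = 6, sd = 1.4), 1)  # rounding creates tied values
y <- 60 + 5 * x + rnorm(50, mean = 0, sd = 2)

# exact = FALSE asks for the approximate p-value, so no tie warning is raised
res <- cor.test(x, y, method = "spearman", exact = FALSE)

# Spearman's rho equals Pearson's r computed on the ranks
rho_from_ranks <- cor(rank(x), rank(y))
all.equal(unname(res$estimate), rho_from_ranks)
```

Applied to the analysis above, the equivalent call would be `cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman", exact = FALSE)`.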

Scatterplots

library(ggpubr)
## Loading required package: ggplot2
ggscatter(
  DatasetA,
  x = "StudyHours",
  y = "ExamScore",
  add = "reg.line",
  xlab = "Study Hours",
  ylab = "Exam Score"
)

A Spearman's correlation analysis was conducted to examine the relationship between Study Hours and Exam Score. The independent variable, Study Hours, had a mean of 6.14 and a standard deviation of 1.37. The dependent variable, Exam Score, had a mean of 90.07 and a standard deviation of 6.80. The correlation coefficient was rho = 0.90 with a p-value < 2.2e-16, which is less than 0.05. The relationship was positive and strong: as Study Hours increased, Exam Scores increased.

Results Report

Study Hours (M = 6.14, SD = 1.37) was correlated with Exam Scores (M = 90.07, SD = 6.80), ρ = 0.90, p < 0.001. The relationship was positive and strong. As study time increased, exam scores increased.
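
To keep the reported numbers in sync with the computed ones, the results sentence can be assembled from the statistics instead of typed by hand. This is a sketch only: `format_result` is a hypothetical helper, and the p-value passed in is the upper bound printed by `cor.test()` above, not an exact value.

```r
# Hypothetical helper: build the results sentence from the computed statistics
format_result <- function(m_x, sd_x, m_y, sd_y, rho, p) {
  # Report very small p-values as an inequality, otherwise to three decimals
  p_text <- if (p < 0.001) "p < 0.001" else sprintf("p = %.3f", p)
  sprintf(
    "Study Hours (M = %.2f, SD = %.2f) was correlated with Exam Scores (M = %.2f, SD = %.2f), rho = %.2f, %s.",
    m_x, sd_x, m_y, sd_y, rho, p_text
  )
}

# Values taken from the output earlier in this report
report <- format_result(6.135609, 1.369224, 90.06906, 6.795224, 0.9008825, 2.2e-16)
cat(report, "\n")
```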