library(readxl)
library(ggpubr)
## Loading required package: ggplot2
DatasetA <- read_excel("C:/Users/srina/OneDrive/Documents/Madhu Master's/Applied Analytics/DatasetA.xlsx")
DatasetB <- read_excel("C:/Users/srina/OneDrive/Documents/Madhu Master's/Applied Analytics/DatasetB.xlsx")
mean(DatasetA$StudyHours)
## [1] 6.135609
sd(DatasetA$ExamScore)
## [1] 6.795224
mean(DatasetA$ExamScore)
## [1] 90.06906
sd(DatasetA$StudyHours)
## [1] 1.369224
mean(DatasetB$ScreenTime)
## [1] 5.063296
sd(DatasetB$ScreenTime)
## [1] 2.056833
mean(DatasetB$SleepingHours)
## [1] 6.938459
sd(DatasetB$SleepingHours)
## [1] 1.351332
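The eight separate calls above could also be collected into one table. The following is a minimal sketch, not the code used for this report: `describe_columns` and the simulated `demo` data frame are hypothetical names introduced for illustration, and the simulated values only stand in for the contents of DatasetA.xlsx.

```r
# Hypothetical stand-in for DatasetA; the real values come from DatasetA.xlsx.
set.seed(1)
demo <- data.frame(
  StudyHours = rnorm(100, mean = 6, sd = 1.4),
  ExamScore  = rnorm(100, mean = 90, sd = 7)
)

# Return the mean and SD of every numeric column, one row per variable.
describe_columns <- function(df) {
  numeric_cols <- df[vapply(df, is.numeric, logical(1))]
  data.frame(
    variable  = names(numeric_cols),
    mean      = vapply(numeric_cols, mean, numeric(1), na.rm = TRUE),
    sd        = vapply(numeric_cols, sd, numeric(1), na.rm = TRUE),
    row.names = NULL
  )
}

demo_summary <- describe_columns(demo)
print(demo_summary)
```

Calling `describe_columns(DatasetA)` and `describe_columns(DatasetB)` on the real data should reproduce the means and standard deviations printed above, assuming both files contain only the columns used in this report.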
hist(DatasetA$StudyHours,
     main = "StudyHours",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
hist(DatasetA$ExamScore,
     main = "ExamScore",
     breaks = 20,
     col = "lightgreen",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
The variable “ExamScore” appears slightly skewed. Most values cluster
toward the higher scores and the histogram departs from a symmetric bell
curve, suggesting some skewness and kurtosis that differs from a normal
distribution.
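The visual impression of skewness and kurtosis can be backed by numbers. The sketch below computes moment-based sample skewness and excess kurtosis in base R; both are close to 0 for normally distributed data. The function names and the simulated `demo_scores` vector are hypothetical and were not part of the original analysis.

```r
# Moment-based sample skewness: negative for a longer left tail,
# positive for a longer right tail, near 0 for symmetric data.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^1.5
}

# Excess kurtosis: near 0 for normal data, positive for heavier tails.
sample_excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  z <- x - mean(x)
  mean(z^4) / mean(z^2)^2 - 3
}

# Hypothetical scores bunched near 100 with a left tail (not the real ExamScore).
set.seed(42)
demo_scores <- 100 - rexp(100, rate = 1 / 10)
sample_skewness(demo_scores)
sample_excess_kurtosis(demo_scores)
```

Applied to the real data, the calls would be `sample_skewness(DatasetA$ExamScore)` and `sample_excess_kurtosis(DatasetA$ExamScore)`.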
hist(DatasetB$ScreenTime,
     main = "ScreenTime",
     breaks = 20,
     col = "orange",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
hist(DatasetB$SleepingHours,
     main = "SleepingHours",
     breaks = 20,
     col = "lightcoral",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
The variable “StudyHours” (DatasetA, the first histogram above) appears
approximately normally distributed. Its histogram is fairly symmetrical,
with most values clustered around the center, indicating low skewness and
kurtosis close to that of a normal distribution.
shapiro.test(DatasetA$ExamScore)
##
## Shapiro-Wilk normality test
##
## data: DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465
shapiro.test(DatasetA$StudyHours)
##
## Shapiro-Wilk normality test
##
## data: DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349
For StudyHours, the Shapiro-Wilk test was not significant (W = 0.99, p = .935), so there is no evidence that study hours depart from a normal distribution. For ExamScore, the test was significant (W = 0.96, p = .006), indicating that exam scores are not normally distributed.
shapiro.test(DatasetB$ScreenTime)
##
## Shapiro-Wilk normality test
##
## data: DatasetB$ScreenTime
## W = 0.90278, p-value = 1.914e-06
shapiro.test(DatasetB$SleepingHours)
##
## Shapiro-Wilk normality test
##
## data: DatasetB$SleepingHours
## W = 0.98467, p-value = 0.3004
For ScreenTime, the Shapiro-Wilk test was significant (W = 0.90, p < .001), indicating that screen time is not normally distributed. For SleepingHours, the test was not significant (W = 0.98, p = .300), so there is no evidence that sleeping hours depart from a normal distribution.
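When several variables need the same check, the W statistics and p-values can be gathered into one table instead of being read from separate printouts. This is a sketch under stated assumptions: `normality_table` is a hypothetical helper, and `demo_b` is simulated data standing in for DatasetB.

```r
# Hypothetical stand-in for DatasetB; the real values come from DatasetB.xlsx.
set.seed(7)
demo_b <- data.frame(
  ScreenTime    = rexp(100, rate = 1 / 5),          # deliberately right-skewed
  SleepingHours = rnorm(100, mean = 7, sd = 1.3)
)

# Run shapiro.test() on every column and collect W, p, and the decision.
normality_table <- function(df, alpha = 0.05) {
  results <- lapply(df, shapiro.test)
  out <- data.frame(
    variable  = names(results),
    W         = vapply(results, function(r) unname(r$statistic), numeric(1)),
    p_value   = vapply(results, function(r) r$p.value, numeric(1)),
    row.names = NULL
  )
  # TRUE means the test rejects normality at the chosen alpha.
  out$rejects_normality <- out$p_value < alpha
  out
}

demo_table <- normality_table(demo_b)
print(demo_table)
```

Note that `shapiro.test()` accepts between 3 and 5000 non-missing values, so the helper as written assumes every column is numeric and within that range.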
cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman")
## Warning in cor.test.default(DatasetA$StudyHours, DatasetA$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.9008825
A Spearman correlation was selected because at least one variable (ExamScore) violated the assumption of normality. Because the data contain tied values, R could not compute an exact p-value and reported an approximate one instead. The correlation was statistically significant (p < .001). The rho value was .90, indicating a strong positive monotonic relationship between study hours and exam scores.
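The selection rule described here can be written down explicitly. The sketch below encodes the rule as stated in this report (Pearson only when both Shapiro-Wilk p-values exceed .05, otherwise Spearman); `pick_method` is a hypothetical helper name, and the rule is one common classroom convention rather than the only defensible choice.

```r
# Choose a correlation method from two Shapiro-Wilk p-values.
# Rule assumed from the text: Spearman if either variable fails normality.
pick_method <- function(p_x, p_y, alpha = 0.05) {
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}

# p-values copied from the Shapiro-Wilk output above.
method_a <- pick_method(0.9349, 0.006465)    # StudyHours, ExamScore
method_b <- pick_method(1.914e-06, 0.3004)   # ScreenTime, SleepingHours
method_a
method_b
```

Both calls return `"spearman"`, matching the method used for each dataset. The result can be passed straight to `cor.test(..., method = method_a)`.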
cor.test(DatasetB$ScreenTime, DatasetB$SleepingHours, method = "spearman")
##
## Spearman's rank correlation rho
##
## data: DatasetB$ScreenTime and DatasetB$SleepingHours
## S = 259052, p-value = 3.521e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.5544674
A Spearman correlation was selected because at least one variable (ScreenTime) violated the assumption of normality. The correlation was statistically significant (p < .001). The rho value was -.55, indicating a moderate negative monotonic relationship between screen time and sleeping hours.
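Spearman's rho does not need normality because it is simply Pearson's correlation computed on the ranks of the data. A short sketch with simulated values (the `screen` and `sleep` vectors are hypothetical, not DatasetB) illustrates the equivalence:

```r
# Hypothetical skewed data with a negative monotonic relationship.
set.seed(3)
screen <- rexp(50, rate = 1 / 5)
sleep  <- 8 - 0.02 * screen^2 + rnorm(50, sd = 0.3)

# Built-in Spearman correlation.
rho_builtin <- cor(screen, sleep, method = "spearman")

# Same quantity computed as Pearson's r on the ranks
# (rank() averages ties by default, as Spearman's method does).
rho_by_rank <- cor(rank(screen), rank(sleep))

all.equal(rho_builtin, rho_by_rank)
```

Because only the ordering of the values matters, outliers and skewed tails influence rho far less than they influence Pearson's r.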
ggscatter(
  DatasetA,
  x = "StudyHours",
  y = "ExamScore",
  add = "reg.line",
  xlab = "StudyHours",
  ylab = "ExamScore"
)
ggscatter(
  DatasetB,
  x = "ScreenTime",
  y = "SleepingHours",
  add = "reg.line",
  xlab = "ScreenTime",
  ylab = "SleepingHours"
)
Study Hours and Exam Score: The independent variable, study hours (M =
6.14, SD = 1.37), was correlated with the dependent variable, exam score
(M = 90.07, SD = 6.80), ρ(98) = .90, p < .001. The relationship was
positive and strong. As study hours increased, exam scores tended to
increase.
Screen Time and Sleeping Hours: The independent variable, screen time (M = 5.06, SD = 2.06), was correlated with the dependent variable, sleeping hours (M = 6.94, SD = 1.35), ρ(98) = −.55, p < .001. The relationship was negative and moderate. As screen time increased, sleeping hours tended to decrease.
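The degrees of freedom reported as ρ(98) can be checked against the printed test output. R's Spearman statistic satisfies S = (n³ − n)(1 − ρ) / 6, so the sample size n can be recovered from S and rho, and the degrees of freedom are n − 2. The sketch below does this with the values printed earlier; `n_from_spearman` is a hypothetical helper introduced for the check.

```r
# Recover the sample size n from Spearman's S and rho,
# using S = (n^3 - n) * (1 - rho) / 6.
n_from_spearman <- function(S, rho) {
  target <- 6 * S / (1 - rho)   # this equals n^3 - n
  root <- uniroot(function(n) n^3 - n - target,
                  interval = c(3, 1e4), tol = 1e-8)$root
  round(root)
}

# S and rho copied from the cor.test() output above.
n_a <- n_from_spearman(16518, 0.9008825)     # DatasetA
n_b <- n_from_spearman(259052, -0.5544674)   # DatasetB

# Degrees of freedom for reporting: n - 2.
c(DatasetA = n_a - 2, DatasetB = n_b - 2)
```

Both datasets come out at n = 100, which gives 98 degrees of freedom and agrees with the values reported above.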