library(readxl)
library(ggpubr)
## Loading required package: ggplot2
DatasetA <- read_excel("C:/Users/srina/OneDrive/Documents/Madhu Master's/Applied Analytics/DatasetA.xlsx")
DatasetB <- read_excel("C:/Users/srina/OneDrive/Documents/Madhu Master's/Applied Analytics/DatasetB.xlsx")

mean(DatasetA$StudyHours)
## [1] 6.135609
sd(DatasetA$StudyHours)
## [1] 1.369224
mean(DatasetA$ExamScore)
## [1] 90.06906
sd(DatasetA$ExamScore)
## [1] 6.795224
mean(DatasetB$ScreenTime)
## [1] 5.063296
sd(DatasetB$ScreenTime)
## [1] 2.056833
mean(DatasetB$SleepingHours)
## [1] 6.938459
sd(DatasetB$SleepingHours)
## [1] 1.351332
hist(DatasetA$StudyHours,
     main = "StudyHours",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

hist(DatasetA$ExamScore,
     main = "ExamScore",
     breaks = 20,
     col = "lightgreen",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

The variable “ExamScore” appears slightly skewed. Most values cluster toward the higher scores and the histogram does not form a symmetric bell curve, which suggests some skewness and a departure from normal kurtosis.
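
The visual impression of skewness and kurtosis can be backed by numbers. The sketch below is a minimal base-R illustration, not part of the original analysis: the helper names `skewness` and `excess_kurtosis` are hypothetical, the formulas are the simple moment-based versions (packages may apply small-sample corrections), and the data are simulated stand-ins rather than the real exam scores.

```r
# Moment-based sample skewness: 0 for a symmetric distribution,
# negative when the long tail points toward low values.
skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Excess kurtosis: 0 for a normal distribution.
excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

# Simulated stand-in (assumption): scores bunched just below a ceiling of 100.
set.seed(1)
scores <- 100 - rexp(100, rate = 1 / 8)

skew_scores <- skewness(scores)
kurt_scores <- excess_kurtosis(scores)
```

With the real data loaded, the same helpers could be called as `skewness(DatasetA$ExamScore)` and `excess_kurtosis(DatasetA$ExamScore)`.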

hist(DatasetB$ScreenTime,
     main = "ScreenTime",
     breaks = 20,
     col = "orange",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

hist(DatasetB$SleepingHours,
     main = "SleepingHours",
     breaks = 20,
     col = "lightcoral",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

Returning to Dataset A: of the histograms above, “StudyHours” appears approximately normally distributed. Its histogram is fairly symmetrical, with most values clustered around the center, indicating low skewness and kurtosis close to that of a normal distribution.

shapiro.test(DatasetA$ExamScore)
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465
shapiro.test(DatasetA$StudyHours)
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349

The Shapiro-Wilk p-value for StudyHours was greater than .05 (p = .935), so there is no evidence that study hours depart from a normal distribution. The Shapiro-Wilk p-value for ExamScore was less than .05 (p = .006), indicating that exam scores are not normally distributed.
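
A normal Q-Q plot is a common visual companion to the Shapiro-Wilk test. The following is a minimal sketch on simulated stand-in data (an assumption, not the real study hours): points that fall close to the reference line are consistent with normality.

```r
# Simulated stand-in for a roughly normal variable (assumption).
set.seed(7)
x <- rnorm(100, mean = 6, sd = 1.4)

# qqnorm() draws the plot and invisibly returns the plotted coordinates.
qq <- qqnorm(x, main = "Normal Q-Q plot (stand-in data)")
qqline(x, col = "red")   # reference line through the first and third quartiles

# Correlation between theoretical and sample quantiles; values near 1
# mean the points lie close to a straight line.
qq_cor <- cor(qq$x, qq$y)
```

For the real variables, `qqnorm(DatasetA$ExamScore)` followed by `qqline(DatasetA$ExamScore)` would be the equivalent calls.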

shapiro.test(DatasetB$ScreenTime) 
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$ScreenTime
## W = 0.90278, p-value = 1.914e-06
shapiro.test(DatasetB$SleepingHours)
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$SleepingHours
## W = 0.98467, p-value = 0.3004

The Shapiro-Wilk p-value for ScreenTime was less than .05 (p < .001), indicating that screen time is not normally distributed. The Shapiro-Wilk p-value for SleepingHours was greater than .05 (p = .300), so there is no evidence that sleeping hours depart from a normal distribution.
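
When several variables need the same check, the four separate calls can be collected into one table. This is a sketch only: `shapiro_table` is a hypothetical helper, the .05 cutoff mirrors the one used in this report, and the demo data frame is simulated.

```r
# Run Shapiro-Wilk on every numeric column and collect W and p in one table.
shapiro_table <- function(df, alpha = 0.05) {
  num <- df[vapply(df, is.numeric, logical(1))]
  res <- lapply(num, shapiro.test)
  out <- data.frame(
    variable = names(num),
    W = vapply(res, function(r) unname(r$statistic), numeric(1)),
    p_value = vapply(res, function(r) r$p.value, numeric(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
  # TRUE means the normality assumption is rejected at the chosen alpha.
  out$reject_normality <- out$p_value < alpha
  out
}

# Simulated stand-in data (assumption): one symmetric and one skewed column.
set.seed(123)
demo <- data.frame(
  symmetric = rnorm(100, mean = 7, sd = 1.35),
  skewed = rexp(100, rate = 1 / 5)
)
tab <- shapiro_table(demo)
print(tab)
```

Applied to the real data, `shapiro_table(DatasetB)` would cover ScreenTime and SleepingHours in a single call.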

cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman")
## Warning in cor.test.default(DatasetA$StudyHours, DatasetA$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties
## 
##  Spearman's rank correlation rho
## 
## data:  DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9008825

A Spearman correlation was selected because at least one variable (ExamScore) violated the assumption of normality. The p-value for the correlation was less than .05 (p < .001), indicating that the result is statistically significant. The rho value was .90, indicating a strong positive monotonic relationship between study hours and exam scores. The warning about ties only means that R reported an approximate rather than an exact p-value.
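
Spearman's rho is Pearson's correlation computed on the ranks of the data, which is why it does not require normality. The sketch below illustrates this on simulated stand-in data (an assumption; rounding is used to create ties) and also shows that passing `exact = FALSE` to `cor.test()` requests the approximate p-value up front, so the ties warning is not raised.

```r
# Simulated stand-in data (assumption); rounding hours creates tied values.
set.seed(42)
hours <- round(rnorm(100, mean = 6, sd = 1.4), 1)
score <- 60 + 5 * hours + rnorm(100, sd = 3)

# Spearman's rho directly, and as Pearson's r on the ranks.
rho_spearman <- cor(hours, score, method = "spearman")
rho_via_ranks <- cor(rank(hours), rank(score), method = "pearson")

# exact = FALSE asks for the asymptotic p-value, avoiding the
# "Cannot compute exact p-value with ties" warning.
res <- cor.test(hours, score, method = "spearman", exact = FALSE)
```

The two rho values agree, and the estimate from `cor.test()` matches them.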

cor.test(DatasetB$ScreenTime, DatasetB$SleepingHours, method = "spearman")
## 
##  Spearman's rank correlation rho
## 
## data:  DatasetB$ScreenTime and DatasetB$SleepingHours
## S = 259052, p-value = 3.521e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.5544674

A Spearman correlation was selected because at least one variable (ScreenTime) violated the assumption of normality. The p-value for the correlation was less than .05 (p < .001), indicating that the result is statistically significant. The rho value was -.55, indicating a moderate negative monotonic relationship between screen time and sleeping hours.
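
The S statistic printed by `cor.test()` for the Spearman method is tied to rho and the sample size through S = (n^3 - n)(1 - rho) / 6. As a quick consistency check, the sketch below recovers n from the printed S and rho values of both tests; `n_from_S` is a hypothetical helper and the inputs are the rounded numbers from the output above.

```r
# Solve n^3 - n = 6 * S / (1 - rho) for n.
n_from_S <- function(S, rho) {
  target <- 6 * S / (1 - rho)
  uniroot(function(n) n^3 - n - target,
          interval = c(2, 1e4), tol = 1e-9)$root
}

n_a <- n_from_S(16518, 0.9008825)     # StudyHours vs ExamScore
n_b <- n_from_S(259052, -0.5544674)   # ScreenTime vs SleepingHours

round(c(DatasetA = n_a, DatasetB = n_b))
```

Both values round to 100 observations, which is consistent with reporting n - 2 = 98 degrees of freedom.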

ggscatter(
  DatasetA,
  x = "StudyHours",
  y = "ExamScore",
  add = "reg.line",
  xlab = "StudyHours",
  ylab = "ExamScore"
)

ggscatter(
  DatasetB,
  x = "ScreenTime",
  y = "SleepingHours",
  add = "reg.line",
  xlab = "ScreenTime",
  ylab = "SleepingHours"
)

Study Hours and Exam Score: The independent variable, study hours (M = 6.14, SD = 1.37), was correlated with the dependent variable, exam score (M = 90.07, SD = 6.80), ρ(98) = .90, p < .001. The relationship was positive and strong. As study hours increased, exam scores tended to increase.

Screen Time and Sleeping Hours: The independent variable, screen time (M = 5.06, SD = 2.06), was correlated with the dependent variable, sleeping hours (M = 6.94, SD = 1.35), ρ(98) = −.55, p < .001. The relationship was negative and moderate. As screen time increased, sleeping hours tended to decrease.
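
The reporting strings above follow a fixed pattern: degrees of freedom are n - 2, coefficients drop the leading zero, and very small p-values are written as p < .001 rather than p = .000. A minimal sketch of helpers that build such strings is given here; the function names are hypothetical, and n = 100 is assumed from the 98 degrees of freedom reported.

```r
# Format a p-value: "p < .001" for tiny values, otherwise three decimals
# without a leading zero.
format_p_apa <- function(p) {
  if (p < 0.001) return("p < .001")
  paste0("p = ", sub("^0", "", formatC(p, format = "f", digits = 3)))
}

# Format a correlation to two decimals without a leading zero.
format_r_apa <- function(r) {
  s <- sub("^0", "", formatC(abs(r), format = "f", digits = 2))
  paste0(if (r < 0) "-" else "", s)
}

# Assemble the full statistic string; df for a correlation is n - 2.
report_spearman <- function(rho, p, n) {
  paste0("rho(", n - 2, ") = ", format_r_apa(rho), ", ", format_p_apa(p))
}

line_a <- report_spearman(0.9008825, 2.2e-16, 100)
line_b <- report_spearman(-0.5544674, 3.521e-09, 100)
```

Here `line_a` is `"rho(98) = .90, p < .001"` and `line_b` is `"rho(98) = -.55, p < .001"`.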