install.packages("readxl")
install.packages("ggpubr")

library(readxl)
library(ggpubr)
## Loading required package: ggplot2

Question 1

DatasetA <- read_excel("C:/Users/pooja/Downloads/DatasetA.xlsx")
mean(DatasetA$ExamScore)
## [1] 90.06906
sd(DatasetA$ExamScore)
## [1] 6.795224
mean(DatasetA$StudyHours)
## [1] 6.135609
sd(DatasetA$StudyHours)
## [1] 1.369224
hist(DatasetA$StudyHours,
     main = "StudyHours",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

The variable StudyHours appears normally distributed. The histogram looks symmetrical, with most values clustered around the center, forming a bell-shaped curve.

hist(DatasetA$ExamScore,
     main = "ExamScore",
     breaks = 20,
     col = "lightcoral",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

The variable ExamScore appears slightly skewed. Although most values are concentrated toward higher scores, the distribution does not perfectly follow a bell-shaped curve.
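
The visual impression of skew can be backed by a number. The sketch below computes a moment-based sample skewness in base R. The helper name `sample_skewness` and the two example vectors are illustrative assumptions and are not part of DatasetA. Values near 0 suggest symmetry, negative values a longer left tail, and positive values a longer right tail.

```r
# Moment-based sample skewness: third central moment divided by the
# second central moment raised to the power 3/2.
# NOTE: sample_skewness is a hypothetical helper, not a base R function.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  m2 <- mean((x - m)^2)  # second central moment
  m3 <- mean((x - m)^3)  # third central moment
  m3 / m2^(3 / 2)
}

# Illustrative vectors (assumptions, not the assignment data)
symmetric_scores <- c(1, 2, 3, 4, 5)
right_tailed     <- c(1, 1, 1, 2, 10)

sample_skewness(symmetric_scores)
sample_skewness(right_tailed)
```

On the real data this would be called as `sample_skewness(DatasetA$ExamScore)`.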

shapiro.test(DatasetA$StudyHours) 
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349

The Shapiro–Wilk p-value for StudyHours is 0.9349, which is greater than 0.05. The test therefore gives no evidence against normality, and StudyHours is treated as normally distributed.

shapiro.test(DatasetA$ExamScore)
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465

The Shapiro–Wilk p-value for ExamScore is 0.0065, which is less than 0.05. Therefore, ExamScore is not normally distributed.
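
The two Shapiro–Wilk results feed directly into the choice of correlation test. Below is a minimal sketch of that decision rule, assuming the usual alpha of .05. The helper `choose_cor_method` is hypothetical, and the input vectors are idealized quantiles rather than the assignment data.

```r
# Pick a correlation method from two Shapiro-Wilk tests.
# NOTE: choose_cor_method is a hypothetical helper written for this sketch.
choose_cor_method <- function(x, y, alpha = 0.05) {
  p_x <- shapiro.test(x)$p.value
  p_y <- shapiro.test(y)$p.value
  # Pearson only if neither test rejects normality at the chosen alpha
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}

# Idealized inputs (assumptions, not the assignment data):
# evenly spaced quantiles of a normal and of an exponential distribution
bell_shaped <- qnorm(ppoints(50))
long_tailed <- qexp(ppoints(50))

choose_cor_method(bell_shaped, bell_shaped)
choose_cor_method(bell_shaped, long_tailed)
```

The returned string can be passed straight to the `method` argument of `cor.test()`.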

cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman")
## Warning in cor.test.default(DatasetA$StudyHours, DatasetA$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties
## 
##  Spearman's rank correlation rho
## 
## data:  DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9008825

The Spearman correlation test was selected because at least one variable (ExamScore) was not normally distributed according to the histograms and the Shapiro–Wilk tests. The p-value (probability value) is less than .001, which is below .05, so the result is statistically significant and the alternative hypothesis is supported. The rho value is .90. The correlation is positive, which means that as study hours increase, exam scores increase. The correlation is greater than .50, which means the relationship is strong.
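
The warning printed with the test output arises because R cannot compute an exact Spearman p-value when the data contain tied values, so it falls back to an approximation. The sketch below uses made-up vectors with ties (not DatasetA) and requests the approximation explicitly with `exact = FALSE`, so the warning is not raised.

```r
# Made-up data with tied values (assumption: not the assignment data)
hours  <- c(2, 3, 3, 4, 5, 5, 6, 7, 8, 8)
scores <- c(60, 65, 65, 70, 72, 75, 80, 85, 90, 90)

# exact = FALSE asks for the approximate p-value up front,
# so cor.test() does not warn about ties
result <- cor.test(hours, scores, method = "spearman", exact = FALSE)

result$estimate  # Spearman's rho
result$p.value   # approximate p-value
```

The rho estimate is the same with or without `exact = FALSE`. Only the way the p-value is computed changes.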

ggscatter(
  DatasetA,
  x = "StudyHours",
  y = "ExamScore",
  add = "reg.line",
  xlab = "StudyHours",
  ylab = "ExamScore"
)

The line of best fit slopes upward from left to right, indicating a positive relationship between StudyHours and ExamScore. The data points closely follow the line of best fit, showing a strong relationship. The points form a clear straight-line pattern, indicating the relationship is linear. There are no extreme outliers that appear to significantly affect the relationship.

StudyHours (M = 6.14, SD = 1.37) was correlated with ExamScore (M = 90.07, SD = 6.80), ρ = .90, p < .001. The relationship was positive and strong. As study hours increased, exam scores increased.
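
The reporting style used here (two decimals, no leading zero, very small p-values shown as "p < .001") can be produced programmatically. The sketch below is one possible way to do that. `apa_number` and `apa_p` are hypothetical helpers, not functions from any package used in this report.

```r
# Format a correlation or p-value without a leading zero.
# NOTE: apa_number and apa_p are hypothetical helpers for this sketch.
apa_number <- function(x, digits = 2) {
  out <- sprintf(paste0("%.", digits, "f"), x)
  sub("^(-?)0\\.", "\\1.", out)  # drops the zero before the decimal point
}

apa_p <- function(p) {
  if (p < 0.001) "p < .001" else paste0("p = ", apa_number(p, digits = 3))
}

# rho is taken from the test output; the p-value is printed only as
# "< 2.2e-16", so that bound is used here as a stand-in
paste0("rho = ", apa_number(0.9008825), ", ", apa_p(2.2e-16))
```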

Question 2

DatasetB <- read_excel("C:/Users/pooja/Downloads/DatasetB.xlsx")

mean(DatasetB$ScreenTime)
## [1] 5.063296
sd(DatasetB$ScreenTime)
## [1] 2.056833
mean(DatasetB$SleepingHours)
## [1] 6.938459
sd(DatasetB$SleepingHours)
## [1] 1.351332
hist(DatasetB$ScreenTime,
     main = "ScreenTime",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

The variable ScreenTime does not appear normally distributed. The histogram shows skewness, with values spread unevenly and a long tail.

hist(DatasetB$SleepingHours,
     main = "SleepingHours",
     breaks = 20,
     col = "lightcoral",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

The variable SleepingHours appears normally distributed. The histogram is fairly symmetrical and resembles a bell-shaped curve.

shapiro.test(DatasetB$ScreenTime) 
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$ScreenTime
## W = 0.90278, p-value = 1.914e-06

The Shapiro–Wilk p-value for ScreenTime is 1.91e-06, which is less than 0.05. Therefore, ScreenTime is not normally distributed.

shapiro.test(DatasetB$SleepingHours)
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$SleepingHours
## W = 0.98467, p-value = 0.3004

The Shapiro–Wilk p-value for SleepingHours is 0.3004, which is greater than 0.05. The test therefore gives no evidence against normality, and SleepingHours is treated as normally distributed.

cor.test(DatasetB$ScreenTime, DatasetB$SleepingHours, method = "spearman")
## 
##  Spearman's rank correlation rho
## 
## data:  DatasetB$ScreenTime and DatasetB$SleepingHours
## S = 259052, p-value = 3.521e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.5544674

The Spearman correlation test was selected because at least one variable (ScreenTime) was not normally distributed according to the histograms and the Shapiro–Wilk tests. The p-value (probability value) is less than .001, which is below .05, so the result is statistically significant and the alternative hypothesis is supported. The rho value is -.55. The correlation is negative, which means that as screen time increases, sleeping hours decrease. The absolute value of the correlation (.55) is greater than .50, which means the relationship is strong.
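
Strength is judged from the absolute value of rho, while the sign gives only the direction. The sketch below separates the two. `describe_correlation` is a hypothetical helper. Only the .50 cutoff for "strong" comes from this report, and the .10 and .30 cutoffs are assumed from Cohen's commonly cited conventions.

```r
# Describe direction and strength of a correlation coefficient.
# NOTE: describe_correlation is a hypothetical helper; the .10 and .30
# cutoffs are assumptions, and all cutoffs are treated as inclusive.
describe_correlation <- function(rho) {
  direction <- if (rho > 0) "positive" else if (rho < 0) "negative" else "none"
  size <- abs(rho)  # strength ignores the sign
  strength <- if (size >= 0.50) {
    "strong"
  } else if (size >= 0.30) {
    "moderate"
  } else if (size >= 0.10) {
    "weak"
  } else {
    "negligible"
  }
  paste(direction, strength, sep = ", ")
}

describe_correlation(-0.5544674)
describe_correlation(0.9008825)
```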

ggscatter(
  DatasetB,
  x = "ScreenTime",
  y = "SleepingHours",
  add = "reg.line",
  xlab = "ScreenTime",
  ylab = "SleepingHours"
)

The line of best fit slopes downward from left to right, indicating a negative relationship. The points cluster moderately closely around the line, consistent with a strong but imperfect relationship. The data show a mostly linear pattern. No extreme outliers appear to significantly affect the relationship.

ScreenTime (M = 5.06, SD = 2.06) was correlated with SleepingHours (M = 6.94, SD = 1.35), ρ = -.55, p < .001. The relationship was negative and strong. As screen time increased, sleeping hours decreased.