title: "ASSIGN-4"
author: "pavan teja reddy"
date: "2026-02-04"
output: html_document

library(readxl)

library(ggpubr)
## Loading required package: ggplot2
DatasetA <- read_excel("C:/Users/pooja/Downloads/DatasetA.xlsx")

mean(DatasetA$ExamScore)
## [1] 90.06906
sd(DatasetA$ExamScore)
## [1] 6.795224
mean(DatasetA$StudyHours)
## [1] 6.135609
sd(DatasetA$StudyHours)
## [1] 1.369224
DatasetB <- read_excel("C:/Users/pooja/Downloads/DatasetB.xlsx")

mean(DatasetB$ScreenTime)
## [1] 5.063296
sd(DatasetB$ScreenTime)
## [1] 2.056833
mean(DatasetB$SleepingHours)
## [1] 6.938459
sd(DatasetB$SleepingHours)
## [1] 1.351332
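
The separate mean() and sd() calls above could also be collected into one table. The following is only a sketch: `demo` is a simulated stand-in (the Excel files are not read here), and `describe_vars` is a hypothetical helper name, not part of any package.

```r
# Simulated stand-in for DatasetA; these are NOT the assignment values.
set.seed(1)
demo <- data.frame(
  StudyHours = rnorm(100, mean = 6, sd = 1.4),
  ExamScore  = rnorm(100, mean = 90, sd = 7)
)

# Hypothetical helper: mean (M) and standard deviation (SD) of each
# listed column, rounded to two decimals, one row per variable.
describe_vars <- function(data, vars) {
  stats <- sapply(data[vars], function(x) c(M = mean(x), SD = sd(x)))
  round(t(stats), 2)
}

describe_vars(demo, c("StudyHours", "ExamScore"))
```

With the real data, the same call would take DatasetA or DatasetB and the column names used in this report.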
hist(DatasetA$StudyHours,
     main = "StudyHours",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

hist(DatasetA$ExamScore,
     main = "ExamScore",
     breaks = 20,
     col = "lightcoral",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

The variable StudyHours appears normally distributed. The histogram looks symmetrical, with most values clustered around the center, forming a bell-shaped curve.

The variable ExamScore appears slightly skewed. Although most values are concentrated toward higher scores, the distribution does not perfectly follow a bell-shaped curve.
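
Histogram shape can depend on the number of breaks, so a normal Q-Q plot is a common visual cross-check. The sketch below uses simulated vectors (`normal_like` and `skewed` are illustrative names and values, not the assignment data): points that stay close to the reference line suggest approximate normality, while a curved pattern suggests skew.

```r
# Simulated examples only; not the values from DatasetA.
set.seed(2)
normal_like <- rnorm(100, mean = 6, sd = 1.4)   # roughly bell-shaped
skewed      <- 100 - rexp(100, rate = 0.15)     # piled up near the top, long left tail

# Draw the two Q-Q plots side by side, then restore the plotting settings.
op <- par(mfrow = c(1, 2))
qqnorm(normal_like, main = "Roughly normal")
qqline(normal_like)
qqnorm(skewed, main = "Left-skewed")
qqline(skewed)
par(op)
```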

shapiro.test(DatasetA$StudyHours) 
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349
shapiro.test(DatasetA$ExamScore)
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465

The Shapiro–Wilk p-value for StudyHours is 0.9349, which is greater than 0.05. The test therefore gives no evidence against normality, and StudyHours is treated as normally distributed.

The Shapiro–Wilk p-value for ExamScore is 0.0065, which is less than 0.05. Therefore, ExamScore is not normally distributed.
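
The decision rule used here (reject normality when p < .05) can be written once and applied to several variables. This is a minimal sketch on simulated data; `normality_table` is a hypothetical helper, and the 0.05 cutoff is the one used in this report. Note that p ≥ .05 only means normality is not rejected; it does not prove it.

```r
# Simulated stand-in data; NOT the assignment values.
set.seed(3)
demo <- data.frame(
  StudyHours = rnorm(100, mean = 6, sd = 1.4),
  ExamScore  = 100 - rexp(100, rate = 0.15)
)

# Hypothetical helper: run shapiro.test() on each column and flag
# the variables for which normality is rejected at level alpha.
normality_table <- function(data, vars, alpha = 0.05) {
  tests <- lapply(data[vars], shapiro.test)
  data.frame(
    variable         = vars,
    W                = sapply(tests, function(t) unname(t$statistic)),
    p_value          = sapply(tests, function(t) t$p.value),
    reject_normality = sapply(tests, function(t) t$p.value < alpha),
    row.names        = NULL
  )
}

normality_table(demo, c("StudyHours", "ExamScore"))
```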

hist(DatasetB$ScreenTime,
     main = "ScreenTime",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

hist(DatasetB$SleepingHours,
     main = "SleepingHours",
     breaks = 20,
     col = "lightcoral",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

The variable ScreenTime does not appear normally distributed. The histogram shows skewness, with values spread unevenly and a longer tail.

The variable SleepingHours appears normally distributed. The histogram is fairly symmetrical and resembles a bell-shaped curve.

shapiro.test(DatasetB$ScreenTime) 
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$ScreenTime
## W = 0.90278, p-value = 1.914e-06
shapiro.test(DatasetB$SleepingHours)
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$SleepingHours
## W = 0.98467, p-value = 0.3004

The Shapiro–Wilk p-value for ScreenTime is 1.91e-06, which is less than 0.05. Therefore, ScreenTime is not normally distributed.

The Shapiro–Wilk p-value for SleepingHours is 0.3004, which is greater than 0.05. The test therefore gives no evidence against normality, and SleepingHours is treated as normally distributed.

cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman")
## Warning in cor.test.default(DatasetA$StudyHours, DatasetA$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties
## 
##  Spearman's rank correlation rho
## 
## data:  DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9008825

The Spearman correlation test was selected because at least one variable (ExamScore) was not normally distributed according to the histograms and the Shapiro–Wilk tests.
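
The selection rule described above can be sketched as a small function. `choose_cor_method` is a hypothetical name, the data are simulated, and the rule (Pearson only when both Shapiro-Wilk p-values are at least .05) is an assumption based on the reasoning in this report. Passing `exact = FALSE` asks cor.test() for the approximate p-value, which avoids the "Cannot compute exact p-value with ties" warning when the data contain ties.

```r
# Hypothetical helper: Pearson if neither variable rejects normality,
# otherwise Spearman.
choose_cor_method <- function(x, y, alpha = 0.05) {
  both_normal <- shapiro.test(x)$p.value >= alpha &&
    shapiro.test(y)$p.value >= alpha
  if (both_normal) "pearson" else "spearman"
}

# Simulated stand-in data; NOT the assignment values.
set.seed(4)
hours  <- rnorm(100, mean = 6, sd = 1.4)
scores <- 100 - rexp(100, rate = 0.15)   # deliberately skewed

method <- choose_cor_method(hours, scores)
cor.test(hours, scores, method = method, exact = FALSE)
```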

The p-value (probability value) is less than .001, which is below .05. This means the results are statistically significant. The alternative hypothesis is supported.

The rho-value is .90.

The correlation is positive, which means as study hours increase, exam scores increase.

The correlation value is greater than .50, which means the relationship is strong.
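
As background on what ρ measures: Spearman's rho is Pearson's correlation computed on the ranks of the data, which is why it suits variables that are not normally distributed. A short check on simulated values (not the assignment data):

```r
# Simulated monotone but non-linear relationship with some noise.
set.seed(5)
x <- rnorm(50)
y <- exp(x) + rnorm(50, sd = 0.2)

rho_builtin <- cor(x, y, method = "spearman")
rho_by_rank <- cor(rank(x), rank(y))   # Pearson correlation of the ranks

# The two computations agree up to floating-point tolerance.
all.equal(rho_builtin, rho_by_rank)
```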

cor.test(DatasetB$ScreenTime, DatasetB$SleepingHours, method = "spearman")
## 
##  Spearman's rank correlation rho
## 
## data:  DatasetB$ScreenTime and DatasetB$SleepingHours
## S = 259052, p-value = 3.521e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.5544674

The Spearman correlation test was selected because at least one variable (ScreenTime) was not normally distributed according to the histograms and the Shapiro–Wilk tests.

The p-value (probability value) is less than .001, which is below .05. This means the results are statistically significant. The alternative hypothesis is supported.

The rho-value is -.55.

The correlation is negative, which means as screen time increases, sleeping hours decrease.

The absolute value of the correlation (.55) is greater than .50, which means the relationship is strong.
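
Strength is judged on the absolute value of ρ, while the sign gives the direction. A minimal sketch of that rule follows; `describe_rho` is a hypothetical helper, and the only cutoff taken from this report is .50 for "strong" (anything at or below it is simply labelled "not strong" here).

```r
# Hypothetical helper: direction from the sign, strength from |rho|.
describe_rho <- function(rho, strong_cutoff = 0.50) {
  direction <- if (rho > 0) "positive" else if (rho < 0) "negative" else "none"
  strength  <- if (abs(rho) > strong_cutoff) "strong" else "not strong"
  paste(direction, strength, sep = ", ")
}

describe_rho(0.9008825)    # "positive, strong"
describe_rho(-0.5544674)   # "negative, strong"
```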

ggscatter(
  DatasetA,
  x = "StudyHours",
  y = "ExamScore",
  add = "reg.line",
  xlab = "StudyHours",
  ylab = "ExamScore"
)

The line of best fit slopes upward from left to right, indicating a positive relationship between StudyHours and ExamScore.

The data points closely follow the line of best fit, showing a strong relationship.

The points form a clear straight-line pattern, indicating the relationship is linear.

There are no extreme outliers that appear to significantly affect the relationship.

ggscatter(
  DatasetB,
  x = "ScreenTime",
  y = "SleepingHours",
  add = "reg.line",
  xlab = "ScreenTime",
  ylab = "SleepingHours"
)

The line of best fit slopes downward from left to right, indicating a negative relationship.

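The points cluster moderately around the line, with visible scatter. This is consistent with ρ = -.55, which counts as a strong relationship by the .50 threshold but is far from a perfect one.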

The data shows a mostly linear pattern.

No extreme outliers appear to significantly impact the relationship.

StudyHours (M = 6.14, SD = 1.37) was correlated with ExamScore (M = 90.07, SD = 6.80), ρ = .90, p < .001.

The relationship was positive and strong. As study hours increased, exam scores increased.

ScreenTime (M = 5.06, SD = 2.06) was correlated with SleepingHours (M = 6.94, SD = 1.35), ρ = -.55, p < .001.

The relationship was negative and strong. As screen time increased, sleeping hours decreased.
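
The reporting format used in these summaries (two decimals, no leading zero on ρ, and p < .001 for very small p-values) can be produced programmatically. The helpers below are hypothetical names for illustration. The first p-value passed in is the upper bound printed by cor.test() (2.2e-16), not an exact value.

```r
# Hypothetical helper: fixed decimals with the leading zero dropped
# (0.90 -> ".90", -0.55 -> "-.55"); other values are left as rounded.
fmt_apa <- function(x, digits = 2) {
  sub("^(-?)0\\.", "\\1.", formatC(x, format = "f", digits = digits))
}

# Hypothetical helper: "p < .001" for very small p-values, else three decimals.
fmt_p <- function(p) {
  if (p < .001) "p < .001" else paste0("p = ", fmt_apa(p, digits = 3))
}

# Hypothetical helper: combine rho and p into one reporting string.
report_spearman <- function(rho, p) {
  paste0("\u03c1 = ", fmt_apa(rho), ", ", fmt_p(p))
}

report_spearman(0.9008825, 2.2e-16)     # StudyHours vs ExamScore
report_spearman(-0.5544674, 3.521e-09)  # ScreenTime vs SleepingHours
fmt_apa(6.795224)                       # SD of ExamScore rounds to "6.80"
```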