title: "ASSIGN-4"
author: "pavan teja reddy"
date: "2026-02-04"
output: html_document
library(readxl)
library(ggpubr)
## Loading required package: ggplot2
DatasetA <- read_excel("C:/Users/pooja/Downloads/DatasetA.xlsx")
mean(DatasetA$ExamScore)
## [1] 90.06906
sd(DatasetA$ExamScore)
## [1] 6.795224
mean(DatasetA$StudyHours)
## [1] 6.135609
sd(DatasetA$StudyHours)
## [1] 1.369224
DatasetB <- read_excel("C:/Users/pooja/Downloads/DatasetB.xlsx")
mean(DatasetB$ScreenTime)
## [1] 5.063296
sd(DatasetB$ScreenTime)
## [1] 2.056833
mean(DatasetB$SleepingHours)
## [1] 6.938459
sd(DatasetB$SleepingHours)
## [1] 1.351332
hist(DatasetA$StudyHours,
     main = "StudyHours",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
hist(DatasetA$ExamScore,
     main = "ExamScore",
     breaks = 20,
     col = "lightcoral",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
The variable StudyHours appears normally distributed. The histogram looks symmetrical, with most values clustered around the center, forming a bell-shaped curve.
The variable ExamScore appears slightly skewed. Most values are concentrated toward the higher scores, so the distribution does not follow a symmetric bell-shaped curve.
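The visual impression of skew can be backed by a number. The following is a minimal sketch, assuming a hypothetical helper `sample_skewness()` that is not part of the original analysis. The vectors are made-up illustrations, not values from DatasetA. Values near 0 suggest symmetry, negative values suggest a longer left tail, and positive values suggest a longer right tail.

```r
# Hypothetical helper: moment-based sample skewness, base R only.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  centered <- x - mean(x)
  # Third central moment divided by the cubed (population) standard deviation
  mean(centered^3) / (mean(centered^2)^1.5)
}

# Made-up illustration data, not taken from the assignment's datasets
symmetric_scores <- c(1, 2, 3, 4, 5)
left_tailed_scores <- c(60, 88, 90, 92, 94, 95, 96)

skew_symmetric <- sample_skewness(symmetric_scores)  # 0 for symmetric data
skew_left <- sample_skewness(left_tailed_scores)     # negative: long left tail
```

On the loaded data, the same helper could be called as `sample_skewness(DatasetA$ExamScore)`.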
shapiro.test(DatasetA$StudyHours)
##
## Shapiro-Wilk normality test
##
## data: DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349
shapiro.test(DatasetA$ExamScore)
##
## Shapiro-Wilk normality test
##
## data: DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465
The Shapiro–Wilk p-value for StudyHours is 0.9349, which is greater than 0.05. The test therefore gives no evidence against normality, and StudyHours is treated as normally distributed.
The Shapiro–Wilk p-value for ExamScore is 0.0065, which is less than 0.05. Therefore, ExamScore is not normally distributed.
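The same decision rule can be applied to several variables at once. This is a minimal sketch, assuming a hypothetical helper `normality_table()` and simulated stand-in data (neither appears in the original analysis). It runs `shapiro.test()` on each column and records whether normality was rejected at the .05 level.

```r
# Hypothetical helper: run shapiro.test() on each column and tabulate
# W, the p-value, and the decision at a chosen alpha.
normality_table <- function(data, alpha = 0.05) {
  results <- lapply(names(data), function(col) {
    test <- shapiro.test(data[[col]])
    data.frame(
      variable = col,
      W = unname(test$statistic),
      p_value = test$p.value,
      # TRUE means the test did not reject normality at this alpha
      normality_not_rejected = test$p.value > alpha
    )
  })
  do.call(rbind, results)
}

# Simulated stand-in data (an assumption; not the assignment's DatasetA)
set.seed(1)
demo <- data.frame(
  bell_shaped = rnorm(100, mean = 6, sd = 1.4),
  right_skewed = rexp(100, rate = 0.2)
)

normality <- normality_table(demo)
print(normality)
```

With the real data, `normality_table(DatasetA[c("StudyHours", "ExamScore")])` would be one way to call it, assuming both columns are numeric.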
hist(DatasetB$ScreenTime,
     main = "ScreenTime",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
hist(DatasetB$SleepingHours,
     main = "SleepingHours",
     breaks = 20,
     col = "lightcoral",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
The variable ScreenTime does not appear normally distributed. The histogram shows skewness, with values spread unevenly and a longer tail on one side.
The variable SleepingHours appears normally distributed. The histogram is fairly symmetrical and resembles a bell-shaped curve.
shapiro.test(DatasetB$ScreenTime)
##
## Shapiro-Wilk normality test
##
## data: DatasetB$ScreenTime
## W = 0.90278, p-value = 1.914e-06
shapiro.test(DatasetB$SleepingHours)
##
## Shapiro-Wilk normality test
##
## data: DatasetB$SleepingHours
## W = 0.98467, p-value = 0.3004
The Shapiro–Wilk p-value for ScreenTime is 1.91e-06, which is less than 0.05. Therefore, ScreenTime is not normally distributed.
The Shapiro–Wilk p-value for SleepingHours is 0.3004, which is greater than 0.05. The test therefore gives no evidence against normality, and SleepingHours is treated as normally distributed.
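The Shapiro–Wilk results for both datasets determine which correlation method the report goes on to use. This is a minimal sketch of that rule, assuming a hypothetical helper `choose_method()` that is not part of the original analysis. It encodes the common classroom convention of using Pearson only when both variables pass the normality test, and Spearman otherwise. The p-values are the ones printed in the output above.

```r
# Hypothetical helper: choose the correlation method from two
# Shapiro-Wilk p-values. Pearson only if neither test rejects normality.
choose_method <- function(p_x, p_y, alpha = 0.05) {
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}

# p-values copied from the Shapiro-Wilk output in this report
method_a <- choose_method(0.9349, 0.006465)   # StudyHours, ExamScore
method_b <- choose_method(1.914e-06, 0.3004)  # ScreenTime, SleepingHours

print(c(DatasetA = method_a, DatasetB = method_b))
```

The result can be passed straight to `cor.test()` through its `method` argument.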
cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman")
## Warning in cor.test.default(DatasetA$StudyHours, DatasetA$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.9008825
The Spearman correlation test was selected because at least one variable (ExamScore) was not normally distributed according to the histograms and the Shapiro–Wilk tests.
The p-value (probability value) is less than .001, which is below .05. This means the results are statistically significant. The alternative hypothesis is supported.
The rho-value is .90.
The correlation is positive, which means as study hours increase, exam scores increase.
The correlation value is greater than .50, which means the relationship is strong.
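To make the rho and S values in the output less opaque, the sketch below shows how they relate to each other. It uses simulated stand-in data (an assumption, not the assignment's DatasetA). Spearman's rho is the Pearson correlation of the ranks, and R reports S = (n^3 - n)(1 - rho) / 6. Applying that formula to the printed rho reproduces the printed S when n = 100, which is an inference from the output rather than something the report states.

```r
# Simulated stand-in data (an assumption; not the assignment's DatasetA)
set.seed(42)
study_hours <- rnorm(100, mean = 6, sd = 1.4)
exam_score <- 60 + 5 * study_hours + rnorm(100, sd = 3)

# Spearman's rho is the Pearson correlation of the ranks
rho_manual <- cor(rank(study_hours), rank(exam_score))

# exact = FALSE requests the asymptotic p-value, which is also what R
# falls back to when ties prevent an exact p-value
result <- cor.test(study_hours, exam_score, method = "spearman", exact = FALSE)

# R's test statistic: S = (n^3 - n) * (1 - rho) / 6
n <- length(study_hours)
s_manual <- (n^3 - n) * (1 - rho_manual) / 6

# Same formula applied to the rho printed for DatasetA, with n = 100
# inferred from the output (not stated in the source)
s_dataset_a <- (100^3 - 100) * (1 - 0.9008825) / 6
print(round(s_dataset_a))  # 16518, matching the printed S
```

Passing `exact = FALSE` in the original call would also avoid the "Cannot compute exact p-value with ties" warning without changing the reported rho.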
cor.test(DatasetB$ScreenTime, DatasetB$SleepingHours, method = "spearman")
##
## Spearman's rank correlation rho
##
## data: DatasetB$ScreenTime and DatasetB$SleepingHours
## S = 259052, p-value = 3.521e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.5544674
The Spearman correlation test was selected because at least one variable (ScreenTime) was not normally distributed according to the histograms and the Shapiro–Wilk tests.
The p-value (probability value) is less than .001, which is below .05. This means the results are statistically significant. The alternative hypothesis is supported.
The rho-value is -.55.
The correlation is negative, which means as screen time increases, sleeping hours decrease.
The absolute value of the correlation (.55) is greater than .50, which means the relationship is strong.
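Direction comes from the sign of rho and strength from its absolute value, which matters for negative correlations. The sketch below assumes a hypothetical helper `describe_correlation()` that is not part of the original analysis. The .50 cutoff for "strong" is the one used in this report. The .30 and .10 cutoffs are assumptions added for illustration, and the helper treats a value of exactly .50 as strong.

```r
# Hypothetical helper: describe direction from the sign of rho and
# strength from its absolute value.
describe_correlation <- function(rho) {
  direction <- if (rho > 0) "positive" else if (rho < 0) "negative" else "none"
  size <- abs(rho)
  # .50 is the cutoff used in this report; .30 and .10 are assumed here
  strength <- if (size >= 0.50) {
    "strong"
  } else if (size >= 0.30) {
    "moderate"
  } else if (size >= 0.10) {
    "weak"
  } else {
    "negligible"
  }
  paste(direction, strength, sep = ", ")
}

# rho values copied from the cor.test() output in this report
label_a <- describe_correlation(0.9008825)   # "positive, strong"
label_b <- describe_correlation(-0.5544674)  # "negative, strong"
```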
ggscatter(
  DatasetA,
  x = "StudyHours",
  y = "ExamScore",
  add = "reg.line",
  xlab = "StudyHours",
  ylab = "ExamScore"
)
The line of best fit slopes upward from left to right, indicating a positive relationship between StudyHours and ExamScore.
The data points closely follow the line of best fit, showing a strong relationship.
The points form a clear straight-line pattern, indicating the relationship is linear.
There are no extreme outliers that appear to significantly affect the relationship.
ggscatter(
  DatasetB,
  x = "ScreenTime",
  y = "SleepingHours",
  add = "reg.line",
  xlab = "ScreenTime",
  ylab = "SleepingHours"
)
The line of best fit slopes downward from left to right, indicating a negative relationship between ScreenTime and SleepingHours.
The points cluster moderately closely around the line, showing a relationship that is strong but with noticeable scatter.
The data shows a mostly linear pattern.
No extreme outliers appear to significantly impact the relationship.
StudyHours (M = 6.14, SD = 1.37) was correlated with ExamScore (M = 90.07, SD = 6.80), ρ = .90, p < .001.
The relationship was positive and strong. As study hours increased, exam scores increased.
ScreenTime (M = 5.06, SD = 2.06) was correlated with SleepingHours (M = 6.94, SD = 1.35), ρ = -.55, p < .001.
The relationship was negative and strong. As screen time increased, sleeping hours decreased.
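Building the summary sentences from the unrounded output avoids hand-rounding slips (for example, an SD of 6.795224 formats as 6.80). This is a minimal sketch, assuming hypothetical helpers `fmt_stat()`, `fmt_rho()`, `fmt_p()`, and `report_line()` that are not part of the original analysis. The inputs are the values printed earlier in this report, and the code writes "rho" in plain ASCII in place of the Greek letter.

```r
# Hypothetical formatting helpers for a report-style summary line
fmt_stat <- function(x) sprintf("%.2f", x)

# Correlations are reported without the leading zero (.90, -.55)
fmt_rho <- function(rho) sub("^(-?)0\\.", "\\1.", sprintf("%.2f", rho))

fmt_p <- function(p) {
  if (p < 0.001) {
    "p < .001"
  } else {
    paste0("p = ", sub("^0\\.", ".", sprintf("%.3f", p)))
  }
}

report_line <- function(x_name, x_mean, x_sd, y_name, y_mean, y_sd, rho, p) {
  sprintf(
    "%s (M = %s, SD = %s) was correlated with %s (M = %s, SD = %s), rho = %s, %s.",
    x_name, fmt_stat(x_mean), fmt_stat(x_sd),
    y_name, fmt_stat(y_mean), fmt_stat(y_sd),
    fmt_rho(rho), fmt_p(p)
  )
}

# Values copied from the output above; 2.2e-16 is the upper bound R printed
line_a <- report_line("StudyHours", 6.135609, 1.369224,
                      "ExamScore", 90.06906, 6.795224,
                      0.9008825, 2.2e-16)
line_b <- report_line("ScreenTime", 5.063296, 2.056833,
                      "SleepingHours", 6.938459, 1.351332,
                      -0.5544674, 3.521e-09)
print(line_a)
print(line_b)
```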