Assignment-4

library(readxl)
library(ggpubr)

## Loading required package: ggplot2

Loading DATASET A

DatasetA <- read_excel("C:/Users/datta/Downloads/DatasetA.xlsx")

In Dataset A, the independent variable is StudyHours and the dependent variable is ExamScore.

Descriptive Statistics:

mean(DatasetA$StudyHours)

## [1] 6.135609

sd(DatasetA$StudyHours)

## [1] 1.369224

mean(DatasetA$ExamScore)

## [1] 90.06906

sd(DatasetA$ExamScore)

## [1] 6.795224

StudyHours: Mean = 6.14, SD = 1.37. ExamScore: Mean = 90.07, SD = 6.80.
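The four separate calls above could also be collapsed into one step. The following is only a sketch: `demo` is simulated stand-in data (not the real DatasetA), and `describe` is a hypothetical helper name introduced here for illustration.

```r
# Hypothetical stand-in for DatasetA; the real values come from DatasetA.xlsx.
set.seed(1)
demo <- data.frame(
  StudyHours = rnorm(100, mean = 6, sd = 1.4),
  ExamScore  = rnorm(100, mean = 90, sd = 7)
)

# Mean and SD for every column, rounded to two decimals for reporting.
describe <- function(df) {
  round(sapply(df, function(x) c(Mean = mean(x), SD = sd(x))), 2)
}

summary_table <- describe(demo)
print(summary_table)
```

The result is a small matrix with one column per variable and the rows Mean and SD, which keeps the rounding consistent across all reported values.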

Part 3: Checking Normality

hist(DatasetA$StudyHours,
     main = "StudyHours",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

StudyHours appears normally distributed. The histogram is approximately symmetrical with most values in the center and a bell-shaped curve.

hist(DatasetA$ExamScore,
     main = "ExamScores",
     breaks = 20,
     col = "lightcoral",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

ExamScore appears slightly non-normal. The histogram shows some skewness and does not form a perfectly symmetrical bell curve.

Checking Normality Statistically:

shapiro.test(DatasetA$StudyHours)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349

shapiro.test(DatasetA$ExamScore)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465

The Shapiro-Wilk p-value for StudyHours was greater than .05 (p = .935), so there is no evidence that the data depart from a normal distribution. The Shapiro-Wilk p-value for ExamScore was less than .05 (p = .006), indicating that the data are not normally distributed.
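The decision rule used in this assignment (Pearson only if both variables pass the Shapiro-Wilk test, otherwise Spearman) can be written down as a small function. This is a sketch under assumptions: `x` and `y` are simulated data, and `choose_method` and its `alpha` argument are hypothetical names, not part of any package.

```r
# Hypothetical data: x is drawn from a normal distribution, y is strongly skewed.
set.seed(42)
x <- rnorm(100, mean = 6, sd = 1.4)
y <- rexp(100, rate = 0.2)

# Pick a correlation method from the two Shapiro-Wilk p-values:
# Pearson only when neither test rejects normality, otherwise Spearman.
choose_method <- function(a, b, alpha = 0.05) {
  p_values <- c(shapiro.test(a)$p.value, shapiro.test(b)$p.value)
  if (all(p_values > alpha)) "pearson" else "spearman"
}

method <- choose_method(x, y)
print(method)
cor.test(x, y, method = method)
```

Because the simulated `y` is heavily skewed, the function falls back to Spearman here, which mirrors what happened with ExamScore.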

Part 4: Correlation Analysis

A Spearman correlation was selected because at least one variable (ExamScore) violated the normality assumption according to the histograms and Shapiro-Wilk tests.

cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman")

## Warning in cor.test.default(DatasetA$StudyHours, DatasetA$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties

## 
##  Spearman's rank correlation rho
## 
## data:  DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9008825

The Spearman correlation was statistically significant because the p-value is less than .05, so the alternative hypothesis is supported. The rho value was .90, indicating a strong, positive relationship. As study hours increased, exam scores increased.
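The warning about ties in the output above appears because an exact p-value is not available when values repeat. The sketch below, on simulated data (`hours` and `scores` are made-up variables), shows two things: Spearman's rho is Pearson's r computed on the ranks, and passing `exact = FALSE` to `cor.test` asks for the approximate p-value directly, so the warning should not be raised.

```r
# Hypothetical data with ties (rounded values), similar in spirit to exam scores.
set.seed(7)
hours  <- round(rnorm(100, mean = 6, sd = 1.4), 1)
scores <- round(70 + 3 * hours + rnorm(100, sd = 2))

# Spearman's rho is Pearson's r computed on the ranks (ties get average ranks).
rho_direct <- cor(hours, scores, method = "spearman")
rho_ranks  <- cor(rank(hours), rank(scores))

# exact = FALSE requests the approximate p-value up front,
# so the "Cannot compute exact p-value with ties" warning is not triggered.
result <- cor.test(hours, scores, method = "spearman", exact = FALSE)

print(c(direct = rho_direct, ranks = rho_ranks, test = unname(result$estimate)))
```

All three printed values should agree, since they are the same statistic computed three ways.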

Part 5: Scatterplots

ggscatter(
  DatasetA,
  x = "StudyHours",
  y = "ExamScore",
  add = "reg.line",
  xlab = "StudyHours",
  ylab = "ExamScore"
)

The line of best fit points upward, indicating a positive relationship. The points closely follow the line, suggesting a strong relationship between the variables. As study hours increase, exam scores also increase. The pattern is linear, and no extreme outliers are observed, so no outlier is affecting the relationship between the variables.
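If it is useful to show the correlation on the plot itself, `ggscatter` can print the coefficient and draw a confidence band around the line. This is a sketch on simulated data (`demo` is a stand-in, not the real DatasetA); the `conf.int`, `cor.coef`, and `cor.method` arguments were not used in the original plot and are added here as an optional variation.

```r
library(ggpubr)

# Hypothetical data standing in for DatasetA.
set.seed(3)
demo <- data.frame(StudyHours = rnorm(100, mean = 6, sd = 1.4))
demo$ExamScore <- 70 + 3 * demo$StudyHours + rnorm(100, sd = 2)

# Same kind of plot as above, plus a confidence band and the Spearman coefficient.
p <- ggscatter(
  demo,
  x = "StudyHours",
  y = "ExamScore",
  add = "reg.line",
  conf.int = TRUE,
  cor.coef = TRUE,
  cor.method = "spearman",
  xlab = "StudyHours",
  ylab = "ExamScore"
)
print(p)
```

Setting `cor.method = "spearman"` keeps the annotation consistent with the test chosen in Part 4, since the default annotation would be Pearson.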

Part 6: Report the Results

The independent variable, StudyHours (M = 6.14, SD = 1.37), was correlated with the dependent variable, ExamScore (M = 90.07, SD = 6.80), ρ(98) = .90, p < .001. The relationship was positive and strong. As study hours increased, exam scores increased.

Loading DATASET B

DatasetB <- read_excel("C:/Users/datta/Downloads/DatasetB.xlsx")

In Dataset B, the independent variable is ScreenTime and the dependent variable is SleepingHours.

Descriptive Statistics:

mean(DatasetB$ScreenTime)

## [1] 5.063296

sd(DatasetB$ScreenTime)

## [1] 2.056833

mean(DatasetB$SleepingHours)

## [1] 6.938459

sd(DatasetB$SleepingHours)

## [1] 1.351332

ScreenTime: Mean = 5.06, SD = 2.06. SleepingHours: Mean = 6.94, SD = 1.35.

Part 3: Checking Normality

hist(DatasetB$ScreenTime,
     main = "ScreenTime",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

ScreenTime does not appear normally distributed. The histogram shows noticeable skewness with values clustering toward one side.

hist(DatasetB$SleepingHours,
     main = "SleepingHours",
     breaks = 20,
     col = "lightcoral",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

SleepingHours appears normally distributed. The histogram is symmetrical with a clear bell-shaped curve.

Checking Normality Statistically:

shapiro.test(DatasetB$ScreenTime)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$ScreenTime
## W = 0.90278, p-value = 1.914e-06

shapiro.test(DatasetB$SleepingHours)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$SleepingHours
## W = 0.98467, p-value = 0.3004

The Shapiro-Wilk p-value for ScreenTime was less than .05 (p < .001), indicating that the data are not normally distributed. The Shapiro-Wilk p-value for SleepingHours was greater than .05 (p = .300), so there is no evidence that the data depart from a normal distribution.

Part 4: Correlation Analysis

A Spearman correlation was selected because at least one variable (ScreenTime) violated the normality assumption according to the histograms and Shapiro-Wilk tests.

cor.test(DatasetB$ScreenTime, DatasetB$SleepingHours, method = "spearman")

## 
##  Spearman's rank correlation rho
## 
## data:  DatasetB$ScreenTime and DatasetB$SleepingHours
## S = 259052, p-value = 3.521e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.5544674

The p-value is less than .05, so the Spearman correlation was statistically significant and the alternative hypothesis is supported. The rho value was -.55, indicating a moderate, negative relationship. As screen time increased, sleeping hours decreased.

Part 5: Scatterplots

ggscatter(
  DatasetB,
  x = "ScreenTime",
  y = "SleepingHours",
  add = "reg.line",
  xlab = "ScreenTime",
  ylab = "SleepingHours"
)

The line of best fit points downward from left to right, indicating a negative relationship. As screen time increases, sleeping hours decrease. The dots closely hug the best-fit line, indicating a strong relationship between the variables. The dots form a straight-line pattern, which means the relationship is linear. There may be an outlier, but because the remaining dots stay close to the line, it does not appear to affect the relationship between the variables.

Part 6: Report the Results

The independent variable, ScreenTime (M = 5.06, SD = 2.06), was correlated with the dependent variable, SleepingHours (M = 6.94, SD = 1.35), ρ(98) = −.55, p < .001. The relationship was negative and moderate. As screen time increased, sleeping hours decreased.
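The degrees of freedom reported for both datasets (98) imply a sample size of 100, which is not printed in the output. As a rough check, and assuming `cor.test` reports S as (n^3 - n)(1 - rho)/6, the S values shown earlier can be reproduced from the rho values. `spearman_S` is a hypothetical helper name used only for this sketch.

```r
# Assumed relationship between the printed S statistic, rho, and n:
# S = (n^3 - n) * (1 - rho) / 6
spearman_S <- function(rho, n) (n^3 - n) * (1 - rho) / 6

n <- 100  # assumed sample size, so df = n - 2 = 98

S_A <- spearman_S(0.9008825, n)   # Dataset A output shows S = 16518
S_B <- spearman_S(-0.5544674, n)  # Dataset B output shows S = 259052

print(round(c(DatasetA = S_A, DatasetB = S_B)))
```

Both values round to the S statistics in the output, which is consistent with n = 100 in each dataset. Running `nrow(DatasetA)` and `nrow(DatasetB)` on the real data would confirm this directly.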

DATTA GANESH KOLLI

2026-02-08