Assignment4

library(readxl)
library(ggpubr)

## Loading required package: ggplot2

Loaded the required packages.

PART 1: IMPORT DATASET A

DatasetA <- read_excel("C:/Users/spvar/Downloads/DatasetA.xlsx")

PART 2: DESCRIPTIVE STATISTICS

mean(DatasetA$StudyHours)

## [1] 6.135609

sd(DatasetA$StudyHours)

## [1] 1.369224

sd(DatasetA$ExamScore)

## [1] 6.795224

mean(DatasetA$ExamScore)

## [1] 90.06906

The mean of the independent variable StudyHours is 6.135609 and the SD is 1.369224.

The mean of the dependent variable ExamScore is 90.06906 and the SD is 6.795224.
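The four separate calls above can be collected into one small summary table. This is only a sketch: `describe_var` is a hypothetical helper name, and the data frame `demo` is synthetic stand-in data, not the assignment's DatasetA.

```r
# Hypothetical helper: mean and SD of a numeric vector, rounded for reporting.
describe_var <- function(x, digits = 2) {
  c(M  = round(mean(x, na.rm = TRUE), digits),
    SD = round(sd(x, na.rm = TRUE), digits))
}

# Synthetic stand-in data (NOT the assignment's DatasetA).
set.seed(1)
demo <- data.frame(StudyHours = rnorm(100, mean = 6, sd = 1.4),
                   ExamScore  = rnorm(100, mean = 90, sd = 7))

# One row per variable, with columns M and SD.
summary_table <- t(sapply(demo, describe_var))
print(summary_table)
```

With the real data, the same call would be `t(sapply(DatasetA[c("StudyHours", "ExamScore")], describe_var))`.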

PART 3: CHECK NORMALITY

hist(DatasetA$StudyHours,
     main = "StudyHours",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

The variable “StudyHours” appears approximately normally distributed. The histogram is fairly symmetrical, with most values clustered around the center, suggesting low skewness and kurtosis close to that of a normal distribution.

hist(DatasetA$ExamScore,
     main = "ExamScore",
     breaks = 20,
     col = "lightgreen",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

The variable “ExamScore” appears slightly skewed. Although most values cluster toward higher scores, the distribution does not form a clear bell curve, suggesting some skewness and kurtosis that departs from that of a normal distribution.
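The visual impressions of skewness and kurtosis can be backed up with numbers. Base R has no built-in skewness or kurtosis function, so the sketch below computes simple moment-based versions by hand. `shape_stats` is a hypothetical helper name, and it uses the population (divide by n) standard deviation, which is one of several common conventions.

```r
# Hypothetical helper: moment-based skewness and excess kurtosis.
# Values near 0 for both are what a normal distribution would give.
shape_stats <- function(x) {
  x <- x[!is.na(x)]
  centered <- x - mean(x)
  z <- centered / sqrt(mean(centered^2))  # standardize with population SD
  c(skewness = mean(z^3), excess_kurtosis = mean(z^4) - 3)
}

# A perfectly symmetric vector has zero skewness.
print(shape_stats(c(-2, -1, 0, 1, 2)))

# One large value on the right produces positive skewness.
print(shape_stats(c(1, 1, 1, 1, 10)))
```

With the real data, the calls would be `shape_stats(DatasetA$StudyHours)` and `shape_stats(DatasetA$ExamScore)`.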

PART 4: CORRELATION ANALYSIS

shapiro.test(DatasetA$StudyHours)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349

shapiro.test(DatasetA$ExamScore)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465

The Shapiro-Wilk p-value for StudyHours was greater than .05 (p = .935), so normality is not rejected and the study hours data are treated as normally distributed. The Shapiro-Wilk p-value for ExamScore was less than .05 (p = .006), indicating that the exam score data are not normally distributed.
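The decision rule used here (Pearson only when both variables pass the Shapiro-Wilk test, otherwise Spearman) can be written as a small function. This is a sketch of that rule, not a general recommendation. The names `method_from_p` and `choose_cor_method` are hypothetical, and the .05 cutoff is the one used in this report.

```r
# Hypothetical helper: pick a correlation method from two Shapiro-Wilk p-values.
method_from_p <- function(p_x, p_y, alpha = 0.05) {
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}

# Hypothetical wrapper: run shapiro.test() on both variables, then decide.
choose_cor_method <- function(x, y, alpha = 0.05) {
  method_from_p(shapiro.test(x)$p.value, shapiro.test(y)$p.value, alpha)
}

# Using the p-values printed above for StudyHours and ExamScore:
print(method_from_p(0.9349, 0.006465))
```

With the real data, `cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = choose_cor_method(DatasetA$StudyHours, DatasetA$ExamScore))` would apply the rule in one step.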

PART 5: SCATTERPLOTS

cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman")

## Warning in cor.test.default(DatasetA$StudyHours, DatasetA$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties

## 
##  Spearman's rank correlation rho
## 
## data:  DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9008825

A Spearman correlation was selected because at least one variable (ExamScore) violated the assumption of normality. The p-value for the correlation was less than .05, indicating that the result is statistically significant. The alternative hypothesis is supported. The rho value was .90. The correlation is positive, which means that as Study Hours increases, Exam Score increases, indicating a strong positive relationship between study hours and exam scores.
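Spearman's rho is the Pearson correlation computed on the ranks of the data, which is why it does not depend on normality. The sketch below checks this on synthetic data (not DatasetA) that is rounded on purpose to create ties. It also passes `exact = FALSE` to `cor.test`, which requests the approximate p-value directly, so the "Cannot compute exact p-value with ties" warning is not raised.

```r
# Synthetic stand-in data with ties (NOT the assignment's DatasetA).
set.seed(42)
x <- round(rnorm(50, mean = 6, sd = 1.4), 1)
y <- round(90 + 4 * (x - 6) + rnorm(50, mean = 0, sd = 3))

# Spearman's rho computed directly ...
rho_direct <- cor(x, y, method = "spearman")

# ... and as a Pearson correlation of the (tie-averaged) ranks.
rho_ranks <- cor(rank(x), rank(y))

# exact = FALSE asks for the approximate p-value, which is what R
# falls back to anyway when ties are present.
res <- cor.test(x, y, method = "spearman", exact = FALSE)

print(c(direct = rho_direct, ranks = rho_ranks, test = unname(res$estimate)))
```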

ggscatter(
  DatasetA,
  x = "StudyHours",
  y = "ExamScore",
  add = "reg.line",
  xlab = "StudyHours",
  ylab = "ExamScore"
)

The line of best fit points to the top right. This means the direction of the data is positive: as Study Hours increases, Exam Score increases. The dots closely hug the line, indicating a strong relationship between the variables. The dots form a straight-line pattern, which means the data is linear. There may be an outlier; however, the dot is toward the center of the line of best fit. Therefore, it does not appear to impact the relationship between the independent and dependent variables.
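The claim that a possible outlier does not affect the relationship can be checked numerically by recomputing rho with each point left out in turn. This is a sketch on synthetic data (not DatasetA), and `loo_rho` is a hypothetical helper name. A change close to 0 for a point means that removing it barely moves the correlation.

```r
# Hypothetical helper: leave-one-out change in Spearman's rho.
loo_rho <- function(x, y) {
  full <- cor(x, y, method = "spearman")
  dropped <- vapply(
    seq_along(x),
    function(i) cor(x[-i], y[-i], method = "spearman"),
    numeric(1)
  )
  data.frame(index = seq_along(x),
             rho_without = dropped,
             change = dropped - full)
}

# Synthetic stand-in data (NOT the assignment's DatasetA).
set.seed(7)
x <- rnorm(60, mean = 6, sd = 1.4)
y <- 90 + 4 * (x - 6) + rnorm(60, mean = 0, sd = 3)

influence <- loo_rho(x, y)

# Show the points whose removal changes rho the most.
print(head(influence[order(-abs(influence$change)), ], 3))
```

With the real data, the call would be `loo_rho(DatasetA$StudyHours, DatasetA$ExamScore)`.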

PART 6: REPORT THE RESULTS

Study Hours and Exam Score: The independent variable, study hours (M = 6.14, SD = 1.37), was correlated with the dependent variable, exam score (M = 90.07, SD = 6.80), ρ(98) = .90, p < .001. The relationship was positive and strong. As study hours increased, exam scores increased.
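The statistic in the sentence above can be built from the `cor.test` output with a small formatter. This is a sketch: `format_apa_rho` is a hypothetical helper name, the sample size of 100 is the one implied by the reported degrees of freedom (n - 2 = 98), and very small p-values are written as "p < .001" because a p-value is never exactly zero.

```r
# Hypothetical helper: format a Spearman result in the style used in this report.
format_apa_rho <- function(rho, n, p) {
  # Format to fixed decimals and drop the leading zero (0.90 -> .90).
  strip0 <- function(v, digits) {
    sub("^(-?)0\\.", "\\1.", formatC(v, format = "f", digits = digits))
  }
  p_txt <- if (p < 0.001) "p < .001" else paste0("p = ", strip0(p, 3))
  paste0("rho(", n - 2, ") = ", strip0(rho, 2), ", ", p_txt)
}

# Values taken from the cor.test output above; n = 100 is implied by df = 98.
print(format_apa_rho(0.9008825, 100, 2.2e-16))
```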

PART 1: IMPORT DATASET B

DatasetB <- read_excel("C:/Users/spvar/Downloads/DatasetB.xlsx")

PART 2: DESCRIPTIVE STATISTICS

mean(DatasetB$ScreenTime)

## [1] 5.063296

sd(DatasetB$ScreenTime)

## [1] 2.056833

mean(DatasetB$SleepingHours)

## [1] 6.938459

sd(DatasetB$SleepingHours)

## [1] 1.351332

The mean of the independent variable Screen Time is 5.063296 and the SD is 2.056833.

The mean of the dependent variable Sleeping Hours is 6.938459 and the SD is 1.351332.

PART 3: CHECK NORMALITY

hist(DatasetB$ScreenTime,
     main = "ScreenTime",
     breaks = 20,
     col = "orange",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

hist(DatasetB$SleepingHours,
     main = "SleepingHours",
     breaks = 20,
     col = "lightcoral",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

PART 4: CORRELATION ANALYSIS

shapiro.test(DatasetB$ScreenTime)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$ScreenTime
## W = 0.90278, p-value = 1.914e-06

shapiro.test(DatasetB$SleepingHours)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$SleepingHours
## W = 0.98467, p-value = 0.3004

The Shapiro-Wilk p-value for ScreenTime was less than .05 (p < .001), indicating that the screen time data are not normally distributed. The Shapiro-Wilk p-value for SleepingHours was greater than .05 (p = .300), so normality is not rejected and the sleeping hours data are treated as normally distributed.

PART 5: SCATTERPLOTS

cor.test(DatasetB$ScreenTime, DatasetB$SleepingHours, method = "spearman")

## 
##  Spearman's rank correlation rho
## 
## data:  DatasetB$ScreenTime and DatasetB$SleepingHours
## S = 259052, p-value = 3.521e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.5544674

A Spearman Correlation was selected because at least one variable (ScreenTime) violated the assumption of normality. The p-value for the correlation was less than .05, indicating that the results are statistically significant. The rho value was -.55, indicating a moderate negative relationship between screen time and sleeping hours.

ggscatter(
  DatasetB,
  x = "ScreenTime",
  y = "SleepingHours",
  add = "reg.line",
  xlab = "ScreenTime",
  ylab = "SleepingHours"
)

The line of best fit points downward from left to right. This indicates that the direction of the data is negative: as Screen Time increases, Sleeping Hours decreases. The dots closely hug the line of best fit, indicating a strong relationship between the variables. The dots form a straight-line pattern, which means the data is linear. There may be an outlier; however, the dots closely hug the line, so it does not appear to impact the relationship between the variables.

PART 6: REPORT THE RESULTS

Screen Time and Sleeping Hours: The independent variable, screen time (M = 5.06, SD = 2.06), was correlated with the dependent variable, sleeping hours (M = 6.94, SD = 1.35), ρ(98) = −.55, p < .001. The relationship was negative and moderate. As screen time increased, sleeping hours decreased.
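The degrees of freedom of 98 used in both results sentences can be cross-checked against the printed test output. For Spearman's test, R reports S = (n^3 - n)(1 - rho) / 6, so the sample size can be recovered from the printed S and rho. The sketch below searches a range of candidate sample sizes; `spearman_S` is a hypothetical helper name and the search range of 3 to 500 is an arbitrary assumption.

```r
# S statistic reported by cor.test(..., method = "spearman")
# for a given sample size n and rank correlation rho.
spearman_S <- function(n, rho) (n^3 - n) * (1 - rho) / 6

# Candidate sample sizes to search (range is an assumption).
candidates <- 3:500

# Dataset A: S = 16518, rho = 0.9008825 (from the output above).
n_A <- candidates[round(spearman_S(candidates, 0.9008825)) == 16518]

# Dataset B: S = 259052, rho = -0.5544674 (from the output above).
n_B <- candidates[round(spearman_S(candidates, -0.5544674)) == 259052]

# Degrees of freedom for the correlation are n - 2.
print(c(n_A = n_A, df_A = n_A - 2, n_B = n_B, df_B = n_B - 2))
```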

Vardhan Sreepurushothama

2026-02-04