Assignment4

install.packages("readxl")

install.packages("ggpubr")

library(readxl)
library(ggpubr)

## Loading required package: ggplot2

Loading Dataset A:

DatasetA <- read_excel("C:/Users/nanda/Downloads/DatasetA.xlsx")

In Dataset A, the independent variable is Study Hours and the dependent variable is Exam Score.

Descriptive Analysis:

mean(DatasetA$StudyHours)

## [1] 6.135609

sd(DatasetA$StudyHours)

## [1] 1.369224

The independent variable, Study Hours, has a mean of 6.135609 and a standard deviation of 1.369224 (M = 6.14, SD = 1.37).

mean(DatasetA$ExamScore)

## [1] 90.06906

sd(DatasetA$ExamScore)

## [1] 6.795224

The dependent variable, Exam Score, has a mean of 90.06906 and a standard deviation of 6.795224 (M = 90.07, SD = 6.80).
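The separate mean() and sd() calls above can also be collapsed into one pass over the data frame. The sketch below is only an illustration: because the Excel file is not available here, it uses a small made-up data frame named `demo` (an assumption, not the real Dataset A) with the same column names.

```r
# Hypothetical stand-in for DatasetA; the values are made up for illustration.
demo <- data.frame(
  StudyHours = c(4, 5, 6, 7, 8),
  ExamScore  = c(80, 85, 90, 95, 100)
)

# Compute the mean and standard deviation of every column in one call.
# sapply() returns a matrix: one row per statistic, one column per variable.
desc <- sapply(demo, function(col) c(mean = mean(col), sd = sd(col)))

# Round to two decimals for reporting.
desc <- round(desc, 2)
print(desc)
```

With the real data, replacing `demo` with `DatasetA` should give the same figures as the individual calls, already rounded for the results summary.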

Normality Testing:

hist(DatasetA$StudyHours,
     main = "StudyHours",
     breaks = 20,
     col = "red",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

Study Hours appears normally distributed. The histogram is approximately symmetrical with most values in the center and a bell-shaped curve.

hist(DatasetA$ExamScore,
     main = "ExamScore",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

Exam Score appears slightly non-normal. The histogram shows some skewness and does not form a symmetrical bell curve.

Testing Normality Statistically:

shapiro.test(DatasetA$StudyHours)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349

shapiro.test(DatasetA$ExamScore)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465

The Shapiro-Wilk p-value for StudyHours was greater than .05 (p = 0.935), so there is no evidence that the data depart from a normal distribution. The Shapiro-Wilk p-value for ExamScore was less than .05 (p = 0.006), indicating the data are not normally distributed.

Correlation Testing: A Spearman correlation was selected because at least one variable in Dataset A (Exam Score) violated the normality assumption according to the histograms and Shapiro-Wilk tests.
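The selection rule described above (Pearson only when both variables look normal, otherwise Spearman) can be written as a small helper. This is a sketch under assumptions: the function name `choose_cor_method` and the two sample vectors are made up for illustration and are not part of the assignment data.

```r
# Pick a correlation method from Shapiro-Wilk results.
# `alpha` is the significance threshold used in this report (.05).
choose_cor_method <- function(x, y, alpha = 0.05) {
  p_x <- shapiro.test(x)$p.value
  p_y <- shapiro.test(y)$p.value
  # Pearson only if neither variable shows evidence of non-normality.
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}

# Made-up illustrative samples (not the assignment data):
normal_like <- qnorm(ppoints(100), mean = 6, sd = 1.4)  # evenly spaced normal quantiles
skewed      <- qexp(ppoints(100), rate = 0.2)           # evenly spaced exponential quantiles

method <- choose_cor_method(normal_like, skewed)
print(method)
```

The returned string can be passed straight to the `method` argument of cor.test().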

cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman")

## Warning in cor.test.default(DatasetA$StudyHours, DatasetA$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties

## 
##  Spearman's rank correlation rho
## 
## data:  DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9008825

The p-value for the correlation (p < 2.2e-16) is below .05, so the result is statistically significant and the alternative hypothesis (rho is not equal to 0) is supported. The correlation is rho = 0.90. It is positive, which means that as Study Hours increase, Exam Score increases. Its absolute value is greater than 0.50, which means the relationship is strong.
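Two details of the output above can be checked directly: Spearman's rho is the Pearson correlation of the ranked values, and the "Cannot compute exact p-value with ties" warning appears because some values repeat. The sketch below uses a small made-up sample (not Dataset A). Passing `exact = FALSE` asks cor.test() for the asymptotic p-value up front, which should avoid the warning.

```r
# Made-up sample with tied hours values (illustration only).
hours  <- c(2, 4, 4, 5, 7, 8, 9, 9)
scores <- c(55, 60, 62, 70, 71, 80, 85, 90)

# exact = FALSE requests the asymptotic p-value, so ties raise no warning.
res <- cor.test(hours, scores, method = "spearman", exact = FALSE)

# Spearman's rho is Pearson's r computed on the ranks (ties get average ranks).
rho_from_ranks <- cor(rank(hours), rank(scores))

print(unname(res$estimate))
print(rho_from_ranks)
```

Both printed values should agree, which is why Spearman's method is robust to skewed data: only the ordering of the observations matters.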

Scatter Plot:

ggscatter(
  DatasetA,
  x = "StudyHours",
  y = "ExamScore",
  add = "reg.line",
  xlab = "Study Hours",
  ylab = "Exam Score"
)

Dataset A: The line of best fit slopes upward, indicating a positive relationship. The points closely follow the line, suggesting a strong relationship. The pattern is linear, and no extreme outliers are observed. As Study Hours increase, Exam Score also increases.

Reporting the Results:

Descriptive Statistics: Study Hours: Mean = 6.14, SD = 1.37. Exam Score: Mean = 90.07, SD = 6.80.

Normality (Shapiro-Wilk): Study Hours: p = 0.9349, normal. Exam Score: p = 0.0065, not normal.

Correlation results (Spearman): rho = 0.90, p < 2.2e-16 (< .05), so the alternative hypothesis (rho is not equal to 0) is supported. The relationship was positive and strong: as Study Hours increased, Exam Score increased.

Loading Dataset B:

DatasetB <- read_excel("C:/Users/nanda/Downloads/DatasetB.xlsx")

In Dataset B, the independent variable is Screen Time and the dependent variable is Sleeping Hours.

Descriptive Analysis:

mean(DatasetB$ScreenTime)

## [1] 5.063296

sd(DatasetB$ScreenTime)

## [1] 2.056833

The independent variable, Screen Time, has a mean of 5.063296 and a standard deviation of 2.056833 (M = 5.06, SD = 2.06).

mean(DatasetB$SleepingHours)

## [1] 6.938459

sd(DatasetB$SleepingHours)

## [1] 1.351332

The dependent variable, Sleeping Hours, has a mean of 6.938459 and a standard deviation of 1.351332 (M = 6.94, SD = 1.35).

Normality Testing:

hist(DatasetB$ScreenTime,
     main = "ScreenTime",
     breaks = 20,
     col = "red",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

ScreenTime appears non-normally distributed. The histogram shows noticeable skewness, with values clustering toward one side.

hist(DatasetB$SleepingHours,
     main = "SleepingHours",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

SleepingHours appears normally distributed. The histogram is symmetrical with a clear bell-shaped curve.
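Histogram shape depends on the number of breaks, so a normal Q-Q plot is a common second visual check: points from a normal sample fall close to the reference line, while skewed data bend away from it. A minimal sketch, assuming made-up right-skewed values that stand in for a variable such as ScreenTime:

```r
# Made-up right-skewed sample (not the assignment data).
x <- qexp(ppoints(60), rate = 0.2)

# qqnorm() plots sample quantiles against theoretical normal quantiles
# and invisibly returns the plotted coordinates.
qq <- qqnorm(x, main = "Normal Q-Q plot (illustrative data)")

# qqline() adds a reference line through the first and third quartiles.
qqline(x, col = "red")
```

With the real data, the same two calls on DatasetB$ScreenTime and DatasetB$SleepingHours would give a plot per variable to compare against the histograms.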

Testing Normality Statistically:

shapiro.test(DatasetB$ScreenTime)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$ScreenTime
## W = 0.90278, p-value = 1.914e-06

shapiro.test(DatasetB$SleepingHours)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$SleepingHours
## W = 0.98467, p-value = 0.3004

The Shapiro-Wilk p-value for ScreenTime was less than .05 (p = 1.914e-06, i.e. p < .001), indicating the data are not normally distributed. The Shapiro-Wilk p-value for SleepingHours was greater than .05 (p = 0.300), so there is no evidence that the data depart from a normal distribution.

Correlation Testing: A Spearman correlation was selected because at least one variable in Dataset B (Screen Time) violated the normality assumption according to the histograms and Shapiro-Wilk tests.

cor.test(DatasetB$ScreenTime, DatasetB$SleepingHours, method = "spearman")

## 
##  Spearman's rank correlation rho
## 
## data:  DatasetB$ScreenTime and DatasetB$SleepingHours
## S = 259052, p-value = 3.521e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.5544674

The p-value (3.521e-09) is less than .05, so the result is statistically significant and the alternative hypothesis (rho is not equal to 0) is supported. The rho value is -0.55, so the correlation is negative and, with an absolute value above 0.50, strong. As screen time increases, sleeping hours decrease.
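The interpretations in this report label a correlation by its sign and by whether its absolute value exceeds 0.50. A small helper makes that rule explicit. This is a sketch: the function name `describe_rho` is hypothetical, and the 0.30 and 0.10 cut-offs for weaker relationships are assumed (Cohen's conventions), not taken from the assignment.

```r
# Label a correlation coefficient by strength and direction.
# The 0.50 cut-off for "strong" follows this report; the 0.30 and 0.10
# cut-offs are assumptions (Cohen's conventions).
describe_rho <- function(rho) {
  if (rho == 0) {
    return("no correlation")
  }
  direction <- if (rho > 0) "positive" else "negative"
  size <- abs(rho)
  strength <- if (size > 0.50) {
    "strong"
  } else if (size > 0.30) {
    "moderate"
  } else if (size > 0.10) {
    "weak"
  } else {
    "negligible"
  }
  paste(strength, direction)
}

print(describe_rho(-0.5544674))  # Dataset B estimate
print(describe_rho(0.9008825))   # Dataset A estimate
```

Both estimates land in the "strong" band, although -0.55 sits just above the cut-off while 0.90 is far above it.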

Scatter Plot:

ggscatter(
  DatasetB,
  x = "ScreenTime",
  y = "SleepingHours",
  add = "reg.line",
  xlab = "Screen Time",
  ylab = "Sleeping Hours"
)

Dataset B: The line of best fit slopes downward, indicating a negative relationship. The points follow the line moderately closely, consistent with a fairly strong relationship. The pattern appears linear, with no extreme outliers. As screen time increases, sleeping hours decrease.

Reporting the Results:

Descriptive Statistics: Screen Time: Mean = 5.06, SD = 2.06. Sleeping Hours: Mean = 6.94, SD = 1.35.

Normality (Shapiro-Wilk): Screen Time: p = 1.914e-06, not normal. Sleeping Hours: p = 0.3004, normal.

Correlation results (Spearman): rho = -0.55, p = 3.521e-09 (< .05), so the alternative hypothesis (rho is not equal to 0) is supported. The relationship was negative and strong: as Screen Time increased, Sleeping Hours decreased.
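Very small p-values are easy to mis-transcribe when copied by hand from scientific notation. A small formatter keeps the summaries consistent. This is a sketch: the function name `format_p` is hypothetical, and the three-decimal, no-leading-zero style is an assumed reporting convention.

```r
# Format a p-value for a results summary: three decimals, no leading zero,
# and "p < .001" for anything smaller than .001.
format_p <- function(p) {
  if (p < 0.001) {
    return("p < .001")
  }
  paste0("p = ", sub("^0", "", sprintf("%.3f", p)))
}

# p-values copied from the Shapiro-Wilk and Spearman output in this report.
print(format_p(0.9349))     # StudyHours normality
print(format_p(0.006465))   # ExamScore normality
print(format_p(1.914e-06))  # ScreenTime normality
print(format_p(3.521e-09))  # Dataset B correlation
```

In practice the argument would come from the test object itself, for example `shapiro.test(DatasetB$ScreenTime)$p.value`, so no value has to be retyped.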

Nandan

2026-02-04