Assignment4

library(readxl)
library(ggpubr)

## Loading required package: ggplot2

library(rmarkdown)

Research Question

What is the relationship between how much students study (hours) and their exam score (percentage)?

Part1: Import Dataset

DatasetA <- read_excel("/Users/anupshrestha/Downloads/DatasetA.xlsx")

Part2: Calculate Mean and Standard Deviation

Mean and SD of Independent Variable StudyHours for DatasetA

mean(DatasetA$StudyHours)

## [1] 6.135609

sd(DatasetA$StudyHours)

## [1] 1.369224

Mean and SD of Dependent Variable ExamScore for DatasetA

mean(DatasetA$ExamScore)

## [1] 90.06906

sd(DatasetA$ExamScore)

## [1] 6.795224
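The four calls above can also be collapsed into one summary table. This is a minimal sketch, assuming a small made-up data frame named `demo` that stands in for DatasetA (the values are illustrative, not the assignment data):

```r
# Hypothetical stand-in for DatasetA; values are invented for illustration.
demo <- data.frame(
  StudyHours = c(4, 5, 6, 7, 8),
  ExamScore  = c(80, 85, 90, 95, 100)
)

# Apply mean() and sd() to every column and collect the results in a matrix
# with one column per variable and one row per statistic.
summary_table <- sapply(demo, function(col) c(mean = mean(col), sd = sd(col)))
print(round(summary_table, 2))
```

With the real data, `demo` would be replaced by `DatasetA[, c("StudyHours", "ExamScore")]`.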

Part3: Check Normality

Histogram for Independent Variable StudyHours for DatasetA

hist(DatasetA$StudyHours,
     main = "StudyHours",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

Histogram for Dependent Variable ExamScore for DatasetA

hist(DatasetA$ExamScore,
     main = "ExamScore",
     breaks = 20,
     col = "lightcoral",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
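Because both histograms share the same settings, the repeated arguments could be wrapped in a small helper. The sketch below is one possible way to do that; `plot_hist` is a hypothetical name, and the data are simulated for illustration only:

```r
# Hypothetical helper that bundles the histogram settings used in this report.
plot_hist <- function(x, title, fill) {
  hist(x,
       main = title,
       breaks = 20,
       col = fill,
       border = "white",
       cex.main = 1,
       cex.axis = 1,
       cex.lab = 1)
}

# Simulated data only; use DatasetA$StudyHours in the real analysis.
set.seed(1)
demo_hours <- rnorm(100, mean = 6, sd = 1.4)

# hist() returns the histogram object invisibly, so it can be stored if needed.
h <- plot_hist(demo_hours, "StudyHours", "lightblue")
```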

Part3: Shapiro-Wilk Test

Shapiro-Wilk test for DatasetA

shapiro.test(DatasetA$StudyHours)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349

shapiro.test(DatasetA$ExamScore)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465

Since the p-value for the dependent variable (ExamScore) is 0.006465, which is less than .05, ExamScore is not normally distributed, so Spearman correlation should be used.
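The decision rule applied here can be written down explicitly. This is a minimal sketch, assuming the usual .05 threshold; `pick_method` is a hypothetical helper, not a function from any package:

```r
# Hypothetical helper: choose the correlation method from two Shapiro-Wilk p-values.
pick_method <- function(p_x, p_y, alpha = 0.05) {
  # Pearson assumes both variables are approximately normal;
  # fall back to Spearman if either test rejects normality.
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}

# p-values reported above for StudyHours and ExamScore
method_a <- pick_method(0.9349, 0.006465)
print(method_a)
```

With the data loaded, the p-values could be passed in directly, for example `pick_method(shapiro.test(DatasetA$StudyHours)$p.value, shapiro.test(DatasetA$ExamScore)$p.value)`.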

Part4: Correlation Analysis

Correlation analysis for DatasetA

cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman")

## Warning in cor.test.default(DatasetA$StudyHours, DatasetA$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties

## 
##  Spearman's rank correlation rho
## 
## data:  DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9008825
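The warning in this output appears because tied values rule out an exact p-value, so R falls back to an approximation. The sketch below uses made-up vectors with ties to show the `exact = FALSE` argument of `cor.test()`, which requests the approximation up front and so avoids the warning:

```r
# Invented vectors containing tied values (illustration only).
x <- c(1, 2, 2, 3, 4, 5, 6, 7)
y <- c(2, 3, 3, 5, 4, 6, 8, 7)

# exact = FALSE asks for the asymptotic p-value, so no ties warning is raised.
res <- cor.test(x, y, method = "spearman", exact = FALSE)
print(res$estimate)
```

The estimate of rho is the same either way; only the way the p-value is computed is made explicit.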

The Spearman correlation test was selected because the dependent variable, ExamScore, was not normally distributed according to its histogram and the Shapiro-Wilk test (p = 0.006465). The p-value (probability value) is less than 2.2e-16, which is below .05. This means the results are statistically significant, and the alternative hypothesis is supported. The rho value is 0.9008825. The correlation is positive, which means that as StudyHours increases, ExamScore increases. The absolute value of rho is greater than 0.50, which means the relationship is strong.
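The interpretation rule used in this paragraph (sign gives direction, size gives strength) can be sketched as a small function. `describe_rho` is a hypothetical helper, and the 0.50 cutoff is the one used in this report; other cutoffs are a matter of convention:

```r
# Hypothetical helper: label the direction and strength of a correlation coefficient.
describe_rho <- function(rho, cutoff = 0.50) {
  direction <- if (rho > 0) "positive" else if (rho < 0) "negative" else "none"
  # Strength depends on the absolute value, so the sign does not matter here.
  strength <- if (abs(rho) > cutoff) "strong" else "not strong"
  paste(direction, strength, sep = ", ")
}

label_a <- describe_rho(0.9008825)  # rho reported for DatasetA
label_x <- describe_rho(-0.30)      # made-up value for comparison
print(c(label_a, label_x))
```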

Part5: Scatterplots

Creating a Scatterplot for DatasetA

ggscatter(
  DatasetA,
  x = "StudyHours",
  y = "ExamScore",
  add = "reg.line",
  xlab = "StudyHours",
  ylab = "ExamScore"
)

The line of best fit points to the top right. This means the direction of the data is positive: as StudyHours increases, ExamScore increases. The dots closely hug the line, which means there is a strong relationship between the variables. The dots form a straight-line pattern, which means the data is linear. There do not appear to be any outliers.

Study Hours (M = 6.14, SD = 1.37) was correlated with Exam Score (M = 90.07, SD = 6.80), ρ(98) = 0.90, p < .001. The relationship was positive and strong. As study hours increased, exam scores increased.
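The degrees of freedom reported with a correlation are n - 2. When the sample size is not printed, it can be recovered from the Spearman output, because `cor.test()` computes S = (n^3 - n)(1 - rho)/6. The following sketch solves that equation for n using the S and rho values printed for DatasetA; checking against `nrow(DatasetA)` would be the more direct route when the data are at hand:

```r
# Values copied from the Spearman output for DatasetA.
S   <- 16518
rho <- 0.9008825

# S = (n^3 - n) * (1 - rho) / 6, so find the n where this difference is zero.
f <- function(n) (n^3 - n) * (1 - rho) / 6 - S
n <- round(uniroot(f, interval = c(3, 1000))$root)

df <- n - 2  # degrees of freedom for the correlation
print(c(n = n, df = df))
```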

Research Question

What is the relationship between how much a person uses their phone (hours) and how much they sleep (hours)?

Part1: Import Dataset

DatasetB <- read_excel("/Users/anupshrestha/Downloads/DatasetB.xlsx")

Part2: Calculate Mean and Standard Deviation

Mean and SD of Independent Variable ScreenTime for DatasetB

mean(DatasetB$ScreenTime)

## [1] 5.063296

sd(DatasetB$ScreenTime)

## [1] 2.056833

Mean and SD of Dependent Variable SleepingHours for DatasetB

mean(DatasetB$SleepingHours)

## [1] 6.938459

sd(DatasetB$SleepingHours)

## [1] 1.351332

Part3: Check Normality

Histogram for Independent Variable ScreenTime for DatasetB

hist(DatasetB$ScreenTime,
     main = "ScreenTime",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

Histogram for Dependent Variable SleepingHours for DatasetB

hist(DatasetB$SleepingHours,
     main = "SleepingHours",
     breaks = 20,
     col = "lightcoral",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

Part3: Shapiro-Wilk Test

Shapiro-Wilk test for DatasetB

shapiro.test(DatasetB$ScreenTime)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$ScreenTime
## W = 0.90278, p-value = 1.914e-06

shapiro.test(DatasetB$SleepingHours)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$SleepingHours
## W = 0.98467, p-value = 0.3004

Since the p-value for the independent variable (ScreenTime) is 1.914e-06, which is less than .05, ScreenTime is not normally distributed, so Spearman correlation should be used.

Part4: Correlation Analysis

Correlation analysis for DatasetB

cor.test(DatasetB$ScreenTime, DatasetB$SleepingHours, method = "spearman")

## 
##  Spearman's rank correlation rho
## 
## data:  DatasetB$ScreenTime and DatasetB$SleepingHours
## S = 259052, p-value = 3.521e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.5544674

The Spearman correlation test was selected because the independent variable, ScreenTime, was not normally distributed according to its histogram and the Shapiro-Wilk test (p = 1.914e-06).

The p-value (probability value) is 3.521e-09, which is below .05. This means the results are statistically significant, and the alternative hypothesis is supported. The rho value is -0.5544674. The correlation is negative, which means that as ScreenTime increases, SleepingHours decreases. The absolute value of rho is greater than 0.50, which means the relationship is strong.

Part5: Scatterplots

Creating a Scatterplot for DatasetB

ggscatter(
  DatasetB,
  x = "ScreenTime",
  y = "SleepingHours",
  add = "reg.line",
  xlab = "ScreenTime",
  ylab = "SleepingHours"
)

The line of best fit points to the bottom right. This means the direction of the data is negative: as ScreenTime increases, SleepingHours decreases. The dots closely hug the line, which means there is a strong relationship between the variables. The dots form a straight-line pattern, which means the data is linear. There do not appear to be any outliers.

Screen Time (M = 5.06, SD = 2.06) was correlated with Sleeping Hours (M = 6.94, SD = 1.35), ρ(98) = -0.55, p < .001. The relationship was negative and strong. As screen time increased, sleeping hours decreased.
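A write-up line like the one above can be assembled from the test output so the rounding stays consistent. This is a sketch only: `report_spearman` is a hypothetical helper, and n = 100 is an assumption taken from the sample size implied by S = 259052 and rho = -0.5544674 in the Spearman output:

```r
# Hypothetical helper: format a Spearman result as "rho(df) = value, p ...".
report_spearman <- function(rho, p, n) {
  # Very small p-values are reported as "p < .001" rather than in scientific notation.
  p_text <- if (p < 0.001) "p < .001" else sprintf("p = %.3f", p)
  sprintf("rho(%d) = %.2f, %s", as.integer(n - 2), rho, p_text)
}

# rho and p from the DatasetB output; n = 100 is assumed (see note above).
line_b <- report_spearman(-0.5544674, 3.521e-09, 100)
print(line_b)
```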

Anup Shrestha

2026-02-05
