library(rmarkdown)

library(readxl)

library(ggpubr)
## Loading required package: ggplot2
DatasetA <- read_excel("/Users/poojithareddybavanam/Downloads/DatasetA.xlsx")


mean(DatasetA$StudyHours)
## [1] 6.135609
sd(DatasetA$StudyHours)
## [1] 1.369224
mean(DatasetA$ExamScore)
## [1] 90.06906
sd(DatasetA$ExamScore)
## [1] 6.795224
DatasetB <- read_excel("/Users/poojithareddybavanam/Downloads/DatasetB.xlsx")


mean(DatasetB$ScreenTime)
## [1] 5.063296
sd(DatasetB$ScreenTime)
## [1] 2.056833
mean(DatasetB$SleepingHours)
## [1] 6.938459
sd(DatasetB$SleepingHours)
## [1] 1.351332
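
The eight separate `mean()` and `sd()` calls above could also be collected into one table. The following is a minimal sketch only: `describe_columns` is a hypothetical helper, and `demo` is simulated stand-in data, not the values from DatasetA.xlsx or DatasetB.xlsx.

```r
# Hypothetical stand-in data; the real values come from the Excel files.
set.seed(1)
demo <- data.frame(
  StudyHours = rnorm(50, mean = 6, sd = 1.4),
  ExamScore  = rnorm(50, mean = 90, sd = 7)
)

# Mean and SD for every column, rounded to two decimals for reporting.
describe_columns <- function(df) {
  data.frame(
    variable = names(df),
    M  = round(sapply(df, mean), 2),
    SD = round(sapply(df, sd), 2),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

descriptives <- describe_columns(demo)
print(descriptives)
```

Rounding in one place keeps the reported M and SD values consistent with the raw output.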
hist(DatasetA$StudyHours,
     main = "StudyHours",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

The variable “StudyHours” appears approximately normally distributed. The data looks fairly symmetrical, with most values clustered around the middle. The distribution shows a reasonable bell-shaped curve, with no extreme skewness or kurtosis.

hist(DatasetA$ExamScore,
     main = "ExamScore",
     breaks = 20,
     col = "lightcoral",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

The variable “ExamScore” does not appear normally distributed. The data is negatively skewed, with most values clustered toward the higher end of the scale and a tail extending to the left. The distribution does not display a clear bell-shaped curve.

hist(DatasetB$ScreenTime,
     main = "ScreenTime",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

The variable “ScreenTime” does not appear normally distributed. The data is positively skewed, with most values concentrated toward the lower end of the scale and a tail extending to the right. The distribution does not display a clear bell-shaped curve.

hist(DatasetB$SleepingHours,
     main = "SleepingHours",
     breaks = 20,
     col = "lightcoral",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

The variable “SleepingHours” appears normally distributed. The data looks fairly symmetrical (most data is in the middle). The data also appears to have a proper bell curve.
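
The visual judgments of skewness above can be backed up with a number. The following is a rough sketch, assuming a simple moment-based skewness formula; `sample_skewness` is a hypothetical helper and the two vectors are made-up examples, not the study data.

```r
# Moment-based sample skewness: about 0 for a symmetric sample,
# positive for a right tail, negative for a left tail.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

symmetric  <- c(1, 2, 3, 4, 5)       # made-up symmetric values
right_tail <- c(1, 1, 2, 2, 3, 10)   # made-up values with a long right tail

sample_skewness(symmetric)
sample_skewness(right_tail)
```

A value near zero agrees with a symmetrical histogram, while a clearly positive or negative value agrees with a tail to the right or left.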

shapiro.test(DatasetA$StudyHours)
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349

The Shapiro–Wilk p-value for the StudyHours normality test is greater than .05 (p = .935), so the null hypothesis of normality is not rejected. The data is consistent with a normal distribution.

shapiro.test(DatasetA$ExamScore)
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465

The Shapiro–Wilk p-value for the ExamScore normality test is less than .05 (p = .006), so the null hypothesis of normality is rejected. The data is not normally distributed.

shapiro.test(DatasetB$ScreenTime)
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$ScreenTime
## W = 0.90278, p-value = 1.914e-06

The Shapiro–Wilk p-value for the ScreenTime normality test is less than .05 (p < .001), so the null hypothesis of normality is rejected. The data is not normally distributed.

shapiro.test(DatasetB$SleepingHours)
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$SleepingHours
## W = 0.98467, p-value = 0.3004

The Shapiro–Wilk p-value for the SleepingHours normality test is greater than .05 (p = .300), so the null hypothesis of normality is not rejected. The data is consistent with a normal distribution.
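
The four normality checks follow the same decision rule, so they could be run in one pass. This is only a sketch: `check_normality` is a hypothetical helper, the .05 cutoff is the one used in this report, and `demo` is simulated data rather than the study variables.

```r
# Simulated stand-in data: one bell-shaped and one right-skewed variable.
set.seed(42)
demo <- data.frame(
  bell_shaped  = rnorm(80),
  right_skewed = rexp(80)
)

# Run shapiro.test() on each column and apply the p < alpha decision rule.
check_normality <- function(df, alpha = 0.05) {
  results <- lapply(df, shapiro.test)
  data.frame(
    variable = names(df),
    W = sapply(results, function(r) unname(r$statistic)),
    p_value = sapply(results, function(r) r$p.value),
    reject_normality = sapply(results, function(r) r$p.value < alpha),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

normality <- check_normality(demo)
print(normality)
```

A `TRUE` in `reject_normality` corresponds to the "not normally distributed" conclusion in the text.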

cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman")
## Warning in cor.test.default(DatasetA$StudyHours, DatasetA$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties
## 
##  Spearman's rank correlation rho
## 
## data:  DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9008825

The Spearman correlation test was selected because at least one variable (ExamScore) was not normally distributed according to the histograms and the Shapiro–Wilk tests.

The p-value (probability value) is less than .001, which is below .05. This means the results are statistically significant. The null hypothesis is rejected, and the alternative hypothesis is supported.

The rho-value is .90.

The correlation is positive, which means as study hours increase, exam scores increase.

The correlation value is greater than .50, which means the relationship is strong.
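
The warning "Cannot compute exact p-value with ties" in the output above appears because some values are repeated, and R then falls back to an approximate p-value. A small sketch, using made-up vectors `hours` and `scores`, shows that passing `exact = FALSE` requests the approximation directly, and that rho is the Pearson correlation of the ranks.

```r
# Made-up data with tied values in both variables (not the study data).
hours  <- c(4, 5, 5, 6, 6, 7, 8, 8)
scores <- c(78, 82, 85, 85, 90, 91, 95, 95)

# exact = FALSE asks for the approximate p-value, so no ties warning is raised.
result <- cor.test(hours, scores, method = "spearman", exact = FALSE)

# Spearman's rho is the Pearson correlation of the (average) ranks.
rho_from_ranks <- cor(rank(hours), rank(scores))

result$estimate
rho_from_ranks
```

The warning does not change the rho estimate; it only concerns how the p-value is calculated.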

cor.test(DatasetB$ScreenTime, DatasetB$SleepingHours, method = "spearman")
## 
##  Spearman's rank correlation rho
## 
## data:  DatasetB$ScreenTime and DatasetB$SleepingHours
## S = 259052, p-value = 3.521e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.5544674

The Spearman correlation test was selected because at least one variable (ScreenTime) was not normally distributed according to the histograms and the Shapiro–Wilk tests.

The p-value (probability value) is less than .001, which is below .05. This means the results are statistically significant. The null hypothesis is rejected, and the alternative hypothesis is supported.

The rho-value is -.55.

The correlation is negative, which means as screen time increases, sleeping hours decrease.

The absolute value of the correlation (.55) is greater than .50, which means the relationship is strong.
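
Strength is judged on the size of rho regardless of its sign, and direction is judged on the sign. A minimal sketch of that rule follows; `describe_correlation` is a hypothetical helper, the .50 cutoff for "strong" is the one used in this report, and the .30 cutoff for "moderate" is an assumption added for illustration.

```r
# Direction comes from the sign of rho; strength comes from abs(rho).
describe_correlation <- function(rho, strong = 0.50, moderate = 0.30) {
  stopifnot(is.numeric(rho), length(rho) == 1, abs(rho) <= 1)
  direction <- if (rho > 0) "positive" else if (rho < 0) "negative" else "none"
  size <- abs(rho)
  strength <- if (size >= strong) {
    "strong"
  } else if (size >= moderate) {
    "moderate"
  } else {
    "weak"
  }
  c(direction = direction, strength = strength)
}

describe_correlation(0.9008825)    # rho reported for Dataset A
describe_correlation(-0.5544674)   # rho reported for Dataset B
```

Using `abs()` avoids comparing a negative rho directly against a positive cutoff.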

ggscatter(
  DatasetA,
  x = "StudyHours",
  y = "ExamScore",
  add = "reg.line",
  xlab = "StudyHours",
  ylab = "ExamScore"
)

The line of best fit is pointing to the top right. This means the direction of the data is positive. As study hours increase, exam scores increase.

The dots closely hug the line. This means there is a strong relationship between the variables.

The dots form a straight-line pattern. This means the relationship between the variables is linear.

There are no obvious extreme outliers in the data. The dots are close to the line of best fit and do not appear to impact the relationship between the independent and dependent variables.
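
The visual check for outliers can be supplemented with standardized residuals from the line of best fit. This is a sketch under assumptions: the data is simulated with one planted outlier, and the cutoff of 3 is a common rule of thumb, not a value taken from this analysis.

```r
# Simulated linear data with one planted outlier (not the study data).
set.seed(7)
x <- seq(1, 10, length.out = 40)
y <- 60 + 4 * x + rnorm(40, sd = 2)
y[40] <- y[40] - 30   # pull the last point far below the line

# Fit the regression line and flag points with large standardized residuals.
fit <- lm(y ~ x)
flagged <- which(abs(rstandard(fit)) > 3)
flagged
```

An empty result would agree with the conclusion that no extreme outliers are present.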

ggscatter(
  DatasetB,
  x = "ScreenTime",
  y = "SleepingHours",
  add = "reg.line",
  xlab = "ScreenTime",
  ylab = "SleepingHours"
)

The line of best fit slopes downward from left to right, indicating a negative relationship.

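The points cluster around the line, though less tightly than in Dataset A. This is consistent with a strong relationship (rho = -.55) that is weaker than the one between study hours and exam scores.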

The data shows a mostly linear pattern.

No extreme outliers appear to significantly impact the relationship.

StudyHours (M = 6.14, SD = 1.37) was correlated with ExamScore (M = 90.07, SD = 6.80), ρ = .90, p < .001.

The relationship was positive and strong. As study hours increased, exam scores increased.

ScreenTime (M = 5.06, SD = 2.06) was correlated with SleepingHours (M = 6.94, SD = 1.35), ρ = -.55, p < .001.

The relationship was negative and strong. As screen time increased, sleeping hours decreased.
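
The reporting sentences above round to two decimals and drop the leading zero for rho and p. A small sketch of that formatting follows; `strip_zero`, `format_p`, and `report_spearman` are hypothetical helpers, and because the first p-value is printed only as "< 2.2e-16", a stand-in value of 1e-16 is used for it.

```r
# Drop the leading zero for statistics bounded by 1 (rho, p).
strip_zero <- function(x, digits = 2) {
  sub("^(-?)0\\.", "\\1.", sprintf(paste0("%.", digits, "f"), x))
}

# Report very small p-values as "p < .001" instead of scientific notation.
format_p <- function(p) {
  if (p < 0.001) "p < .001" else paste0("p = ", strip_zero(p, 3))
}

report_spearman <- function(rho, p) {
  paste0("rho = ", strip_zero(rho), ", ", format_p(p))
}

report_spearman(0.9008825, 1e-16)        # Dataset A (stand-in p-value)
report_spearman(-0.5544674, 3.521e-09)   # Dataset B
sprintf("%.2f", 6.795224)                # SD of ExamScore at two decimals
```

Formatting from the raw values, rather than retyping them, helps avoid rounding slips in the write-up.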