Correlation Analysis

Web URL: http://rpubs.com/pmohammed/1392646

Overview

The goal of this assignment was to explore relationships between variables using correlation analysis in RStudio. The choice of coefficient depended on normality: Pearson's r was to be used when both variables were approximately normally distributed, and Spearman's ρ when at least one was not. Normality was evaluated with histograms and the Shapiro–Wilk test (α = .05).
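The selection rule described above can be sketched as a small R function. This is a minimal illustration, assuming α = .05 and a separate Shapiro–Wilk test on each variable. `choose_correlation()` is a hypothetical helper, not code from the original analysis, and the toy vectors are invented for the demonstration.

```r
# Hypothetical helper: run Shapiro-Wilk on both variables and return
# "pearson" only if neither test rejects normality at level alpha.
choose_correlation <- function(x, y, alpha = 0.05) {
  p_x <- shapiro.test(x)$p.value
  p_y <- shapiro.test(y)$p.value
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}

# Invented toy vectors (not the assignment data)
normal_a <- qnorm(ppoints(50))               # evenly spaced normal quantiles
normal_b <- rev(qnorm(ppoints(50)))          # same values, reversed order
skewed   <- exp(seq(0, 5, length.out = 50))  # strongly right-skewed

choose_correlation(normal_a, normal_b)  # "pearson"
choose_correlation(skewed, normal_a)    # "spearman"
```

The function checks normality only; other considerations such as outliers or a curved pattern in the scatterplot would still need to be judged separately.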

The analysis focused on two research questions:

Is there a relationship between the number of hours students study and their exam scores?

Is there a relationship between daily phone use and the number of hours a person sleeps?

Research Question 1: Study Hours and Exam Scores

Descriptive Statistics

On average, students reported studying 6.14 hours (SD = 1.37). The mean exam score was 90.07% (SD = 6.80).

Normality Assessment

Shapiro–Wilk tests showed no significant departure from normality for study hours (W = .99, p = .935), but exam scores were not normally distributed (W = .96, p = .006). Because exam scores violated the normality assumption, a Spearman correlation was used for further analysis.

Correlation Results

The Spearman correlation analysis revealed a strong positive relationship between study hours and exam scores, ρ(98) = .90, p < .001.
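Spearman's ρ is Pearson's correlation computed on ranks, with tied values given average ranks. The `S` statistic that `cor.test()` prints is defined as S = (n³ − n)(1 − ρ)/6 (it equals the sum of squared rank differences when there are no ties), so ρ can be recovered from the printed output. A short sketch: `spearman_from_S()` and the toy vectors are hypothetical, while the S values and n = 100 come from the `cor.test()` output reported in this document.

```r
# Spearman's rho equals Pearson's r on the ranks (ties get average ranks).
x <- c(4.5, 6.0, 6.0, 7.2, 8.1)   # hypothetical study hours
y <- c(81, 85, 90, 88, 97)        # hypothetical exam scores
rho_ranks   <- cor(rank(x), rank(y))
rho_builtin <- cor(x, y, method = "spearman")

# Invert S = (n^3 - n) * (1 - rho) / 6 to recover rho from printed output.
spearman_from_S <- function(S, n) 1 - 6 * S / (n^3 - n)
spearman_from_S(16518, 100)    # about 0.9009 (DatasetA)
spearman_from_S(259052, 100)   # about -0.5545 (DatasetB)
```

Because the printed S is rounded, the recovered ρ agrees with the reported estimate to about four decimal places.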

DatasetA

The histogram of “StudyHours” is fairly symmetrical, with most values clustered around the center, suggesting low skewness and kurtosis close to that of a normal distribution. The histogram of “ExamScore” is slightly skewed: most values cluster toward the higher scores, and the distribution does not form a clear bell curve, suggesting some skewness and non-normal kurtosis.

The Shapiro–Wilk p-value for StudyHours was greater than .05, so there was no evidence of a departure from normality. The Shapiro–Wilk p-value for ExamScore was less than .05, indicating that exam scores were not normally distributed. A Spearman correlation was therefore selected, because at least one variable (ExamScore) violated the normality assumption.

The correlation was statistically significant (p < .001), and the rho value of .90 indicates a strong positive relationship between study hours and exam scores. The scatterplot shows the same pattern: as study hours increase, exam scores increase, and the data points lie close to a straight line.
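The histogram descriptions above treat skewness and kurtosis qualitatively. One way to attach numbers to them is the moment-based sample skewness and excess kurtosis, both of which are 0 for a normal distribution. A sketch with hypothetical helper names and toy data, assuming the simple moment estimators without small-sample correction; statistical packages may use slightly different formulas.

```r
# Moment-based sample skewness (no small-sample correction).
sample_skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Moment-based excess kurtosis: kurtosis minus 3, so a normal curve gives 0.
excess_kurtosis <- function(x) {
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2 - 3
}

sample_skewness(1:5)                              # 0 for a symmetric vector
excess_kurtosis(1:5)                              # -1.3, flatter than a normal curve
sample_skewness(exp(seq(0, 5, length.out = 50)))  # positive: long right tail
```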

Interpretation

This finding suggests that students who spent more time studying generally achieved higher exam scores. The strength of the relationship indicates that study time is closely associated with academic performance, although correlational data alone cannot show that studying causes higher scores.

Research Question 2: Screen Time and Sleeping Hours

Descriptive Statistics

Participants reported an average daily screen time of 5.06 hours (SD = 2.06). The mean amount of sleep was 6.94 hours per night (SD = 1.35).

Normality Assessment

Shapiro–Wilk tests indicated that screen time was not normally distributed (W = .90, p < .001), whereas sleeping hours showed no significant departure from normality (W = .98, p = .300). Since at least one variable violated the normality assumption, a Spearman correlation was again selected.

Correlation Results

The results showed a moderate negative relationship between screen time and sleeping hours, ρ(98) = −.55, p < .001.
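The 98 in ρ(98) is the degrees of freedom, n − 2 with n = 100. The sketch below shows the large-sample t approximation that `cor.test()` falls back on when it cannot compute an exact p-value, for example when the data contain ties, as the DatasetA warning indicates. When there are no ties and the sample is not very large, R instead uses an exact algorithm (AS 89), so this approximation will be close to, but need not reproduce, DatasetB's printed p-value. `spearman_t_p()` is a hypothetical helper.

```r
# Hypothetical helper: t approximation for testing rho = 0.
spearman_t_p <- function(rho, n) {
  df <- n - 2                              # 98 when n = 100
  t_stat <- rho * sqrt(df / (1 - rho^2))   # t statistic for a correlation
  p <- 2 * pt(-abs(t_stat), df)            # two-sided p-value
  c(t = t_stat, df = df, p = p)
}

# rho and n taken from the DatasetB cor.test() output
spearman_t_p(-0.5544674, 100)
```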

Interpretation

This result indicates that individuals who spent more time on their phones tended to sleep fewer hours. Increased screen use appears to be associated with reduced sleep duration.

DatasetB

The histogram of “ScreenTime” is positively skewed: most observations cluster at lower values, with a longer tail toward higher values, indicating non-normal skewness and kurtosis. The histogram of “SleepingHours” is fairly symmetrical with a clear central peak, suggesting an approximately normal distribution.

The Shapiro–Wilk p-value for ScreenTime was less than .05, indicating that screen time was not normally distributed. The Shapiro–Wilk p-value for SleepingHours was greater than .05, so there was no evidence of a departure from normality. A Spearman correlation was selected because at least one variable (ScreenTime) violated the normality assumption.

The correlation was statistically significant (p < .001), and the rho value of −.55 indicates a moderate negative relationship between screen time and sleeping hours. The scatterplot shows a moderate negative trend: as screen time increases, sleeping hours tend to decrease, and the data points generally follow a downward linear pattern.
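Spearman's ρ measures monotonic association: whether one variable tends to rise (or fall) as the other rises, not whether the points follow a straight line. The toy example below, with invented data, shows a perfectly monotonic but curved pattern for which ρ is 1 while Pearson's r is noticeably lower. The regression line added by `ggscatter()` is a linear fit and serves as a visual aid rather than part of the Spearman test.

```r
x <- 1:10
y <- exp(x)                      # always increasing, but strongly curved

cor(x, y, method = "spearman")   # 1: the ranks agree perfectly
cor(x, y, method = "pearson")    # below 1: the points do not lie on a line
```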

Spearman Correlation Results

Study Hours and Exam Score: The independent variable, study hours (M = 6.14, SD = 1.37), was correlated with the dependent variable, exam score (M = 90.07, SD = 6.80), ρ(98) = .90, p < .001. The relationship was positive and strong. As study hours increased, exam scores increased.

Screen Time and Sleeping Hours: The independent variable, screen time (M = 5.06, SD = 2.06), was correlated with the dependent variable, sleeping hours (M = 6.94, SD = 1.35), ρ(98) = −.55, p < .001. The relationship was negative and moderate. As screen time increased, sleeping hours decreased.
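In APA style, a p-value smaller than .001 is reported as p < .001, since a p-value is never exactly zero, and the leading zero is dropped for statistics that cannot exceed 1, such as correlations and p-values. `format_apa_rho()` is a hypothetical helper that sketches this; the three-decimal rounding for larger p-values is an assumption, and the usage inputs are the rho and p values from the `cor.test()` output.

```r
# Hypothetical helper: build an APA-style string such as "rho(98) = .90, p < .001".
format_apa_rho <- function(rho, n, p) {
  df <- n - 2
  # Drop the leading zero ("0.90" -> ".90", "-0.55" -> "-.55")
  r_txt <- sub("^(-?)0\\.", "\\1.", sprintf("%.2f", rho))
  r_txt <- sub("^-", "\u2212", r_txt)    # typographic minus sign
  p_txt <- if (p < 0.001) {
    "p < .001"
  } else {
    paste0("p = ", sub("^0\\.", ".", sprintf("%.3f", p)))
  }
  sprintf("\u03c1(%d) = %s, %s", df, r_txt, p_txt)
}

format_apa_rho(0.9008825, 100, 2.2e-16)    # DatasetA
format_apa_rho(-0.5544674, 100, 3.521e-09) # DatasetB
```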

Conclusion

Overall, the findings from this analysis highlight two clear patterns. Greater study time was strongly associated with better exam performance, underscoring the value of time invested in studying. In contrast, increased screen time was linked to reduced sleep, suggesting that excessive phone use may negatively affect sleep habits.

Together, these results underscore the value of maintaining healthy study routines and being mindful of screen use to support both academic outcomes and overall well-being.

Libraries

library(readxl)  # read_excel() for .xlsx files
library(ggpubr)  # ggscatter(); attaches ggplot2

## Loading required package: ggplot2

# Folder that contains DatasetA.xlsx and DatasetB.xlsx (adjust for your machine)
setwd("C:/Users/Mrlaz/Applied Analytics")

# Import the two datasets
DatasetA <- read_excel("DatasetA.xlsx")
DatasetB <- read_excel("DatasetB.xlsx")

# Descriptive statistics: mean and SD for each variable
mean(DatasetA$StudyHours)

## [1] 6.135609

sd(DatasetA$StudyHours)

## [1] 1.369224

mean(DatasetA$ExamScore)

## [1] 90.06906

sd(DatasetA$ExamScore)

## [1] 6.795224

mean(DatasetB$ScreenTime)

## [1] 5.063296

sd(DatasetB$ScreenTime)

## [1] 2.056833

mean(DatasetB$SleepingHours)

## [1] 6.938459

sd(DatasetB$SleepingHours)

## [1] 1.351332
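The six `mean()` and `sd()` calls above can be collapsed into one call per dataset. This is a base-R sketch; `describe_numeric()` is a hypothetical helper and `toy` is invented data. Applied to DatasetA or DatasetB, it should return the same M and SD values, assuming every numeric column in the sheet is one you want summarized.

```r
# Hypothetical helper: one row per numeric column with its mean (M) and SD.
describe_numeric <- function(data) {
  num <- data[vapply(data, is.numeric, logical(1))]   # keep numeric columns only
  t(vapply(num, function(v) c(M = mean(v), SD = sd(v)), numeric(2)))
}

toy <- data.frame(hours = c(5, 6, 7),
                  score = c(80, 90, 100),
                  group = c("a", "b", "c"))           # non-numeric, skipped

describe_numeric(toy)   # hours: M = 6, SD = 1; score: M = 90, SD = 10
```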

# Histograms for visual inspection of each variable's distribution
hist(DatasetA$StudyHours,
     main = "StudyHours",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

hist(DatasetA$ExamScore,
     main = "ExamScore",
     breaks = 20,
     col = "lightcoral",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

hist(DatasetB$ScreenTime,
     main = "ScreenTime",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

hist(DatasetB$SleepingHours,
     main = "SleepingHours",
     breaks = 20,
     col = "lightcoral",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

# Shapiro-Wilk normality tests (p > .05: no significant departure from normality)
shapiro.test(DatasetA$StudyHours)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349

shapiro.test(DatasetA$ExamScore)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465

shapiro.test(DatasetB$ScreenTime)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$ScreenTime
## W = 0.90278, p-value = 1.914e-06

shapiro.test(DatasetB$SleepingHours)

## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetB$SleepingHours
## W = 0.98467, p-value = 0.3004

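# Pearson correlation, shown for comparison only; Spearman was used because
# ExamScore was not normally distributed
cor.test(DatasetA$StudyHours,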
         DatasetA$ExamScore,
         method = "pearson")

## 
##  Pearson's product-moment correlation
## 
## data:  DatasetA$StudyHours and DatasetA$ExamScore
## t = 20.959, df = 98, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8606509 0.9346369
## sample estimates:
##      cor 
## 0.904214

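# Spearman correlation for Research Question 1; tied values in the data trigger
# the warning below, and R falls back to an approximate p-value
cor.test(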
  DatasetA$StudyHours,
  DatasetA$ExamScore,
  method = "spearman"
)

## Warning in cor.test.default(DatasetA$StudyHours, DatasetA$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties

## 
##  Spearman's rank correlation rho
## 
## data:  DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9008825
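The warning printed above is expected: tied values prevent R from computing an exact p-value, so `cor.test()` falls back to a large-sample approximation. Passing the documented `exact = FALSE` argument requests that approximation directly, which silences the warning without changing ρ or the p-value. A sketch on invented tied data (not the assignment data):

```r
# Invented data with ties in both variables
x <- c(1, 2, 2, 3, 4, 5, 5, 6)
y <- c(2, 1, 3, 4, 4, 6, 7, 8)

# exact = FALSE asks for the approximate p-value up front, so no
# "Cannot compute exact p-value with ties" warning is raised
res <- cor.test(x, y, method = "spearman", exact = FALSE)
res$estimate   # same rho as cor(x, y, method = "spearman")
res$p.value
```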

# Spearman correlation for Research Question 2
cor.test(
  DatasetB$ScreenTime,
  DatasetB$SleepingHours,
  method = "spearman"
)

## 
##  Spearman's rank correlation rho
## 
## data:  DatasetB$ScreenTime and DatasetB$SleepingHours
## S = 259052, p-value = 3.521e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.5544674

# Scatterplots with a fitted linear regression line and confidence band
ggscatter(
  DatasetA,
  x = "StudyHours",
  y = "ExamScore",
  add = "reg.line",
  conf.int = TRUE
)

ggscatter(
  DatasetB,
  x = "ScreenTime",
  y = "SleepingHours",
  add = "reg.line",
  conf.int = TRUE
)