Load Libraries

library(readxl)
library(ggpubr)
## Loading required package: ggplot2
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Import Dataset

# Adjust the path to wherever Dataset6.4.xlsx is saved on your machine
data <- read_excel("C:/Users/Admin/Downloads/Dataset6.4.xlsx")

Check Data

head(data)
## # A tibble: 6 × 3
##   Student_ID Stress_Pre Stress_Post
##        <dbl>      <dbl>       <dbl>
## 1          1       53.5       45.5 
## 2          2       37.4       33.9 
## 3          3       35.8        9.49
## 4          4       89.0       82.8 
## 5          5       30.5       26.8 
## 6          6       42.5       26.9
str(data)
## tibble [35 × 3] (S3: tbl_df/tbl/data.frame)
##  $ Student_ID : num [1:35] 1 2 3 4 5 6 7 8 9 10 ...
##  $ Stress_Pre : num [1:35] 53.5 37.4 35.8 89 30.5 ...
##  $ Stress_Post: num [1:35] 45.48 33.92 9.49 82.77 26.82 ...

Create Variables

Stress_Pre  <- data$Stress_Pre
Stress_Post <- data$Stress_Post

Differences <- Stress_Post - Stress_Pre
print(Differences)
##  [1]  -8.044361120  -3.529426078 -26.350753567  -6.276567859  -3.701374633
##  [6] -15.552084134  -4.356073741  -3.647770620  -0.008788193  -5.302543769
## [11]  -8.806996646  -3.200139192  -1.385585292 -19.179181634   3.899627158
## [16]  -6.612429249  -2.533382079  -1.317457905  -0.782853924 -32.441286182
## [21]  -4.132848699  -6.536275900  -6.896524983  -5.238701383 -36.697172771
## [26] -28.957287115  -9.817187881 -10.508880209 -19.386322836 -10.616491219
## [31] -13.700632408  -7.433337532 -20.087719095  -7.326665368 -15.099150128
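A short numeric summary of the difference scores complements the printed vector. The sketch below re-enters the 35 values printed above (rounded to nine decimals) as `diffs`, a name introduced here only so the example runs without the Excel file; in the analysis itself you would use `Differences` directly.

```r
# Post - Pre differences, copied from the printed output above
diffs <- c(
  -8.044361120,  -3.529426078, -26.350753567,  -6.276567859,  -3.701374633,
  -15.552084134, -4.356073741,  -3.647770620,  -0.008788193,  -5.302543769,
  -8.806996646,  -3.200139192,  -1.385585292, -19.179181634,   3.899627158,
  -6.612429249,  -2.533382079,  -1.317457905,  -0.782853924, -32.441286182,
  -4.132848699,  -6.536275900,  -6.896524983,  -5.238701383, -36.697172771,
  -28.957287115, -9.817187881, -10.508880209, -19.386322836, -10.616491219,
  -13.700632408, -7.433337532, -20.087719095,  -7.326665368, -15.099150128
)

n_decrease <- sum(diffs < 0)   # students whose stress fell
n_increase <- sum(diffs > 0)   # students whose stress rose

cat("Decreased:", n_decrease, " Increased:", n_increase, "\n")
cat("Median change:", round(median(diffs), 2), "\n")
cat("Mean change:", round(mean(diffs), 2), "\n")

# With no missing values, the mean difference equals Post mean minus Pre mean
cat("Post mean - Pre mean:", round(41.4913 - 51.53601, 2), "\n")
```

With these values, 34 of the 35 differences are negative and the median change is about -6.90 points.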

Calculate Descriptive Statistics for Each Time Point

cat("Pre-Test Mean: ", mean(Stress_Pre, na.rm = TRUE), "\n")
## Pre-Test Mean:  51.53601
cat("Pre-Test Median: ", median(Stress_Pre, na.rm = TRUE), "\n")
## Pre-Test Median:  47.24008
cat("Pre-Test SD: ", sd(Stress_Pre, na.rm = TRUE), "\n\n")
## Pre-Test SD:  17.21906
cat("Post-Test Mean: ", mean(Stress_Post, na.rm = TRUE), "\n")
## Post-Test Mean:  41.4913
cat("Post-Test Median: ", median(Stress_Post, na.rm = TRUE), "\n")
## Post-Test Median:  40.84836
cat("Post-Test SD: ", sd(Stress_Post, na.rm = TRUE), "\n")
## Post-Test SD:  18.88901

Histogram of Difference Scores

hist(Differences,
     main = "Histogram of Difference Scores",
     xlab = "Difference (Post - Pre)",
     col = "lightblue",
     border = "black",
     breaks = 10)

Interpretation

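The histogram displays the distribution of the difference scores. If it appears roughly symmetric and bell-shaped, the normality assumption is plausible; if it appears skewed or irregular, the assumption may be violated. Here most differences fall between about -10 and 0, while a few students show much larger decreases (down to about -36.7), so the distribution has a long left tail (negative skew).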
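A normal Q–Q plot is a common companion to the histogram for judging normality; it is an addition here, not part of the original analysis. The sketch reuses the printed difference scores as `diffs` (an illustrative name). Points bending away from the reference line in the lower left would reflect the long left tail.

```r
# Post - Pre differences, copied from the printed output above
diffs <- c(
  -8.044361120,  -3.529426078, -26.350753567,  -6.276567859,  -3.701374633,
  -15.552084134, -4.356073741,  -3.647770620,  -0.008788193,  -5.302543769,
  -8.806996646,  -3.200139192,  -1.385585292, -19.179181634,   3.899627158,
  -6.612429249,  -2.533382079,  -1.317457905,  -0.782853924, -32.441286182,
  -4.132848699,  -6.536275900,  -6.896524983,  -5.238701383, -36.697172771,
  -28.957287115, -9.817187881, -10.508880209, -19.386322836, -10.616491219,
  -13.700632408, -7.433337532, -20.087719095,  -7.326665368, -15.099150128
)

# Draw the Q-Q plot with a reference line through the quartiles
qqnorm(diffs, main = "Normal Q-Q Plot of Difference Scores")
qqline(diffs, col = "red")

# The same coordinates without drawing, for inspecting individual points
qq <- qqnorm(diffs, plot.it = FALSE)
head(cbind(theoretical = qq$x, observed = qq$y))
```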

Boxplot

boxplot(Differences,
        main = "Boxplot of Differences",
        col = "lightgreen",
        border = "black")

Interpretation

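The boxplot shows the spread of the difference scores and flags potential outliers: points more than 1.5 × IQR beyond the box are drawn individually past the whiskers. Here two students fall below the lower whisker, with decreases of about 32.4 and 36.7 points, consistent with the long left tail seen in the histogram. Outliers like these suggest, but do not by themselves prove, that the differences are not normally distributed.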
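The outliers drawn by `boxplot()` can also be listed numerically with `boxplot.stats()`, which applies the same 1.5 × IQR rule. This sketch uses the printed difference scores as `diffs` (an illustrative name) and reports which positions, in `Student_ID` order, were flagged.

```r
# Post - Pre differences, copied from the printed output above
diffs <- c(
  -8.044361120,  -3.529426078, -26.350753567,  -6.276567859,  -3.701374633,
  -15.552084134, -4.356073741,  -3.647770620,  -0.008788193,  -5.302543769,
  -8.806996646,  -3.200139192,  -1.385585292, -19.179181634,   3.899627158,
  -6.612429249,  -2.533382079,  -1.317457905,  -0.782853924, -32.441286182,
  -4.132848699,  -6.536275900,  -6.896524983,  -5.238701383, -36.697172771,
  -28.957287115, -9.817187881, -10.508880209, -19.386322836, -10.616491219,
  -13.700632408, -7.433337532, -20.087719095,  -7.326665368, -15.099150128
)

# Same whisker rule that boxplot() uses by default (coef = 1.5)
bs <- boxplot.stats(diffs, coef = 1.5)

bs$stats                    # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out                      # values flagged as outliers
which(diffs %in% bs$out)    # their positions in the data
```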

Shapiro–Wilk Test of Normality

shapiro.test(Differences)
## 
##  Shapiro-Wilk normality test
## 
## data:  Differences
## W = 0.87495, p-value = 0.0008963

Interpretation

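The Shapiro–Wilk test checks whether the difference scores depart from a normal distribution. If p > .05, there is no evidence against normality (which is not proof that the data are normal); if p < .05, normality is rejected. Here W = 0.875 and p < .001, so the difference scores are not normally distributed.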
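Rather than reading W and p off the printout, they can be taken from the object that `shapiro.test()` returns. A minimal sketch, again using the printed difference scores as the illustrative vector `diffs`:

```r
# Post - Pre differences, copied from the printed output above
diffs <- c(
  -8.044361120,  -3.529426078, -26.350753567,  -6.276567859,  -3.701374633,
  -15.552084134, -4.356073741,  -3.647770620,  -0.008788193,  -5.302543769,
  -8.806996646,  -3.200139192,  -1.385585292, -19.179181634,   3.899627158,
  -6.612429249,  -2.533382079,  -1.317457905,  -0.782853924, -32.441286182,
  -4.132848699,  -6.536275900,  -6.896524983,  -5.238701383, -36.697172771,
  -28.957287115, -9.817187881, -10.508880209, -19.386322836, -10.616491219,
  -13.700632408, -7.433337532, -20.087719095,  -7.326665368, -15.099150128
)

sw <- shapiro.test(diffs)

# Extract the statistic and p-value from the "htest" object
W <- unname(sw$statistic)
p <- sw$p.value
cat(sprintf("W = %.3f, p = %.4f\n", W, p))

if (p < 0.05) {
  cat("Normality of the difference scores is rejected at alpha = .05\n")
} else {
  cat("No evidence against normality at alpha = .05\n")
}
```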

Select the Correct Test

If the Shapiro–Wilk test is not significant (p > .05), a paired-samples t-test is appropriate. If it is significant (p < .05), the nonparametric Wilcoxon signed-rank test is appropriate. Because p < .001 here, the Wilcoxon signed-rank test is used.
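The decision rule above can be wrapped in a small function. `choose_paired_test()` is a hypothetical helper written for this sketch, not part of any package. It works on the Pre − Post differences, relying on the fact that a paired test is the corresponding one-sample test on the differences, so `wilcox.test(d)` gives the same result as `wilcox.test(Stress_Pre, Stress_Post, paired = TRUE)`.

```r
# Post - Pre differences, copied from the printed output above
diffs <- c(
  -8.044361120,  -3.529426078, -26.350753567,  -6.276567859,  -3.701374633,
  -15.552084134, -4.356073741,  -3.647770620,  -0.008788193,  -5.302543769,
  -8.806996646,  -3.200139192,  -1.385585292, -19.179181634,   3.899627158,
  -6.612429249,  -2.533382079,  -1.317457905,  -0.782853924, -32.441286182,
  -4.132848699,  -6.536275900,  -6.896524983,  -5.238701383, -36.697172771,
  -28.957287115, -9.817187881, -10.508880209, -19.386322836, -10.616491219,
  -13.700632408, -7.433337532, -20.087719095,  -7.326665368, -15.099150128
)

# Hypothetical helper: pick the paired test from a Shapiro-Wilk check on d
choose_paired_test <- function(d, alpha = 0.05) {
  sw_p <- shapiro.test(d)$p.value
  if (sw_p > alpha) {
    res <- t.test(d, mu = 0)        # same as t.test(pre, post, paired = TRUE)
    name <- "paired-samples t-test"
  } else {
    res <- wilcox.test(d, mu = 0)   # same as wilcox.test(pre, post, paired = TRUE)
    name <- "Wilcoxon signed-rank test"
  }
  list(test = name, shapiro_p = sw_p, result = res)
}

choice <- choose_paired_test(-diffs)   # -diffs is Pre - Post
choice$test
choice$result
```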

Statistical Test

wilcox.test(Stress_Pre, Stress_Post, paired = TRUE)
## 
##  Wilcoxon signed rank exact test
## 
## data:  Stress_Pre and Stress_Post
## V = 620, p-value = 2.503e-09
## alternative hypothesis: true location shift is not equal to 0
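The statistic V is the sum of the ranks of |Pre − Post| for the students whose difference is positive (that is, whose stress went down). With 35 pairs the maximum is 35 × 36 / 2 = 630, so V = 620 means nearly all of the rank mass lies on the side of a decrease. This sketch recomputes V by hand from the printed difference scores (re-entered as the illustrative vector `diffs`) and checks it against `wilcox.test()`.

```r
# Post - Pre differences, copied from the printed output above
diffs <- c(
  -8.044361120,  -3.529426078, -26.350753567,  -6.276567859,  -3.701374633,
  -15.552084134, -4.356073741,  -3.647770620,  -0.008788193,  -5.302543769,
  -8.806996646,  -3.200139192,  -1.385585292, -19.179181634,   3.899627158,
  -6.612429249,  -2.533382079,  -1.317457905,  -0.782853924, -32.441286182,
  -4.132848699,  -6.536275900,  -6.896524983,  -5.238701383, -36.697172771,
  -28.957287115, -9.817187881, -10.508880209, -19.386322836, -10.616491219,
  -13.700632408, -7.433337532, -20.087719095,  -7.326665368, -15.099150128
)

d <- -diffs          # Pre - Post, the direction used by wilcox.test(Stress_Pre, Stress_Post, ...)
d <- d[d != 0]       # zero differences are dropped (there are none here)

r <- rank(abs(d))    # rank the absolute differences
V <- sum(r[d > 0])   # sum of ranks where stress decreased

n <- length(d)
cat("V =", V, "out of a maximum of", n * (n + 1) / 2, "\n")

# The built-in one-sample signed-rank test on d gives the same V
wilcox.test(d)$statistic
```

The only increase (student 15, +3.90) has the 10th-smallest absolute difference, which accounts for the 10 rank points missing from the maximum.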

Calculate the Effect Size (r = Z/√N for the Wilcoxon Signed-Rank Test)

library(rstatix)
## 
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
## 
##     filter
# Long format: one row per student per time point, with pairs in the same order
df_long <- data.frame(
  id = rep(seq_along(Stress_Pre), 2),
  time = rep(c("Pre", "Post"), each = length(Stress_Pre)),
  stress = c(Stress_Pre, Stress_Post)
)

wilcox_effsize(df_long, stress ~ time, paired = TRUE, id = id)
## # A tibble: 1 × 7
##   .y.    group1 group2 effsize    n1    n2 magnitude
## * <chr>  <chr>  <chr>    <dbl> <int> <int> <ord>    
## 1 stress Post   Pre      0.844    35    35 large
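According to the rstatix documentation, `wilcox_effsize()` reports r = Z/√N, where Z comes from coin's signed-rank test and N is the number of pairs; this is not the rank-biserial correlation. The sketch below reproduces the reported value by hand from the normal approximation to V (assuming no ties or zero differences, which holds for these data) and, for contrast, computes the matched-pairs rank-biserial correlation. `diffs` again holds the printed difference scores.

```r
# Post - Pre differences, copied from the printed output above
diffs <- c(
  -8.044361120,  -3.529426078, -26.350753567,  -6.276567859,  -3.701374633,
  -15.552084134, -4.356073741,  -3.647770620,  -0.008788193,  -5.302543769,
  -8.806996646,  -3.200139192,  -1.385585292, -19.179181634,   3.899627158,
  -6.612429249,  -2.533382079,  -1.317457905,  -0.782853924, -32.441286182,
  -4.132848699,  -6.536275900,  -6.896524983,  -5.238701383, -36.697172771,
  -28.957287115, -9.817187881, -10.508880209, -19.386322836, -10.616491219,
  -13.700632408, -7.433337532, -20.087719095,  -7.326665368, -15.099150128
)

d <- -diffs                          # Pre - Post
n <- length(d)
V <- sum(rank(abs(d))[d > 0])        # signed-rank statistic (620 here)

# Normal approximation to V, no tie or continuity correction
mu_V <- n * (n + 1) / 4
sd_V <- sqrt(n * (n + 1) * (2 * n + 1) / 24)
z <- (V - mu_V) / sd_V

# r = |Z| / sqrt(N), with N = number of pairs
r <- abs(z) / sqrt(n)

# Matched-pairs rank-biserial: (favourable - unfavourable rank sum) / total rank sum
total <- n * (n + 1) / 2
r_rb <- (V - (total - V)) / total

cat(sprintf("z = %.3f, r = %.3f, rank-biserial = %.3f\n", z, r, r_rb))
```

For these data the two indices differ noticeably (about 0.84 versus about 0.97), so the label attached to the reported value matters.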

Interpretation

Stress scores were significantly lower at post-test (Mdn = 40.85) than at pre-test (Mdn = 47.24), Wilcoxon signed-rank V = 620, p < .001; 34 of the 35 students showed a decrease. The effect size was large (r = 0.84).