Matej Suhalj

Research Question:
Are there differences in math, reading and writing scores for those 3 exams?

mydata <- read.table("./original_data4.csv", header = TRUE, sep = ",")
head(mydata)
##   ID Gender MathScore ReadingScore WritingScore
## 1  0 female        72           72           74
## 2  1 female        69           90           88
## 3  2 female        90           95           93
## 4  3   male        47           57           44
## 5  4   male        76           78           75
## 6  5 female        71           83           78

Unit of observation: one student

Description of data:

The data set was taken from Kaggle.com (Students exam scores: Extended dataset) and the sample size is 999 students. I will choose a random sample of 200 students.

#Random sample of 200 units. Needed for accurate hypothesis testing later on.
set.seed(1) 
mydata <- mydata[sample(nrow(mydata), 200), ]
library(psych)
mydata$GenderF <- factor(mydata$Gender,
                         levels = c("male", "female"),
                         labels = c(0, 1))
head(mydata)
##      ID Gender MathScore ReadingScore WritingScore GenderF
## 836 835 female        60           64           74       1
## 679 678   male        81           75           78       0
## 129 128   male        82           82           74       0
## 930 929 female        48           56           51       1
## 509 508   male        79           78           77       0
## 471 470 female        83           85           90       1
library(rstatix)
## Warning: package 'rstatix' was built under R version 4.3.2
## 
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
## 
##     filter
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.3.2
## Warning: package 'ggplot2' was built under R version 4.3.2
## Warning: package 'lubridate' was built under R version 4.3.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ ggplot2::%+%()   masks psych::%+%()
## ✖ ggplot2::alpha() masks psych::alpha()
## ✖ dplyr::filter()  masks rstatix::filter(), stats::filter()
## ✖ dplyr::lag()     masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
#Wide format of data is changed to long format.
mydata_long <- mydata %>%
                 pivot_longer(
                   cols = c("MathScore", "ReadingScore", "WritingScore"),
                   names_to = "Exams", 
                   values_to = "Points") %>%
                 convert_as_factor(Exams)


mydata_long <- as.data.frame(mydata_long)

tail(mydata_long)
##      ID Gender GenderF        Exams Points
## 595 586 female       1    MathScore     55
## 596 586 female       1 ReadingScore     73
## 597 586 female       1 WritingScore     73
## 598  15 female       1    MathScore     69
## 599  15 female       1 ReadingScore     75
## 600  15 female       1 WritingScore     78
library(ggpubr)
## Warning: package 'ggpubr' was built under R version 4.3.2
#Boxplot for each variable.
ggboxplot(mydata_long, 
          x = "Exams", 
          y = "Points", 
          add = "jitter")

library(rstatix)

#Descriptive statistics for each variable.
mydata_long %>%
  group_by(Exams) %>%
  get_summary_stats(Points, type = "common")
## # A tibble: 3 × 11
##   Exams        variable     n   min   max median   iqr  mean    sd    se    ci
##   <fct>        <fct>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 MathScore    Points     200    23    99     65  21    64.8  15.7  1.11  2.19
## 2 ReadingScore Points     200    24   100     68  21    68.1  15.1  1.07  2.11
## 3 WritingScore Points     200    15   100     68  22.2  67.4  15.8  1.12  2.21

If I will use rANOVA I will compare means. If I will use Friedman ANOVA I will compare medians.

library(tidyverse)
library(ggpubr)
library(rstatix)

#Finding outliers.
mydata_long %>%
  group_by(Exams) %>%
  identify_outliers(`Points`)
## # A tibble: 5 × 7
##   Exams           ID Gender GenderF Points is.outlier is.extreme
##   <fct>        <int> <chr>  <fct>    <int> <lgl>      <lgl>     
## 1 MathScore      842 female 1           23 TRUE       FALSE     
## 2 ReadingScore   596 male   0           24 TRUE       FALSE     
## 3 ReadingScore    76 male   0           26 TRUE       FALSE     
## 4 WritingScore   596 male   0           15 TRUE       FALSE     
## 5 WritingScore    76 male   0           22 TRUE       FALSE
#Removing outliers.
mydata_long <- mydata_long %>%
  filter(!ID %in% c(842, 76, 596))
#Checking if there are anymore outliers
mydata_long %>%
  group_by(Exams) %>%
  identify_outliers(`Points`)
## # A tibble: 1 × 7
##   Exams        ID Gender GenderF Points is.outlier is.extreme
##   <fct>     <int> <chr>  <fct>    <int> <lgl>      <lgl>     
## 1 MathScore   338 female 1           24 TRUE       FALSE
#Removing outliers #2
mydata_long <- mydata_long %>%
  filter(!ID == (338))
#Checking if there are anymore outliers #2
mydata_long %>%
  group_by(Exams) %>%
  identify_outliers(`Points`)
## # A tibble: 1 × 7
##   Exams        ID Gender GenderF Points is.outlier is.extreme
##   <fct>     <int> <chr>  <fct>    <int> <lgl>      <lgl>     
## 1 MathScore   466 female 1           26 TRUE       FALSE
#Removing outliers #3
mydata_long <- mydata_long %>%
  filter(!ID == (466))
#Checking if there are anymore outliers #3
mydata_long %>%
  group_by(Exams) %>%
  identify_outliers(`Points`)
## [1] Exams      ID         Gender     GenderF    Points     is.outlier is.extreme
## <0 rows> (or 0-length row.names)

Assumptions to perform a parametric test (rANOVA):
- All variables are numeric.
- Normality: The variables in the population are normally distributed (Shapiro-Wilk test).
- Sphericity: the variances of the differences between the examines variables are equal (Mauchly test for sphericity).
- No significant outliers.

library(rstatix)

#Checking normality with Shapiro-Wilk test
mydata_long %>%
  group_by(Exams) %>%
  shapiro_test(Points)
## # A tibble: 3 × 4
##   Exams        variable statistic     p
##   <fct>        <chr>        <dbl> <dbl>
## 1 MathScore    Points       0.994 0.587
## 2 ReadingScore Points       0.992 0.378
## 3 WritingScore Points       0.990 0.183

Variable Points needs to be normally distributed within each of the stated exams.
Example of MathScore:
H0: Variable Points is normally distributed within MathScore.
H1: Variable Points is NOT normally distributed within MathScore.

I cannot reject any of the H0 hypothesis, therefore normality is not violated and I can continue with rANOVA.

rANOVA

Hypothesis for rANOVA
H0: Mean of all 3 observed variables (MathScore, ReadingScore and WritingScore) are the same.
H1: At least one mean out of those 3 variables is different.
I reject H0 at p<0.05 and accept H1.

Hypothesis for Sphericity
H0:Sphericity is met.
H1:Sphericity in NOT met.
I reject H0 at p<0.05 and accept H1.

library(rstatix)

#rANOVA
ANOVA_results <- anova_test(dv = Points, #Dependent variable
                            wid = ID, #Subject identifier
                            within = Exams, #Within-subject factor variable
                            data = mydata_long)

ANOVA_results #Summary of results.
## ANOVA Table (type III tests)
## 
## $ANOVA
##   Effect DFn DFd      F        p p<.05   ges
## 1  Exams   2 388 18.285 2.58e-08     * 0.009
## 
## $`Mauchly's Test for Sphericity`
##   Effect    W       p p<.05
## 1  Exams 0.54 1.4e-26     *
## 
## $`Sphericity Corrections`
##   Effect   GGe       DF[GG]    p[GG] p[GG]<.05   HFe       DF[HF]    p[HF]
## 1  Exams 0.685 1.37, 265.69 2.18e-06         * 0.688 1.38, 266.94 2.08e-06
##   p[HF]<.05
## 1         *
get_anova_table(ANOVA_results, correction = "auto")
## ANOVA Table (type III tests)
## 
##   Effect  DFn    DFd      F        p p<.05   ges
## 1  Exams 1.37 265.69 18.285 2.18e-06     * 0.009

I rejected rANOVA hypothesis based on this p-value (2.18*10-06).

library(effectsize)
## 
## Attaching package: 'effectsize'
## The following objects are masked from 'package:rstatix':
## 
##     cohens_d, eta_squared
## The following object is masked from 'package:psych':
## 
##     phi
interpret_eta_squared(0.009, rules = "cohen1992")
## [1] "very small"
## (Rules: cohen1992)

The effect size is vary small.

library(rstatix)

#Comparing all paires of variables
pwc <- mydata_long %>%
  pairwise_t_test(Points ~ Exams, 
                  paired = TRUE,
                  p.adjust.method = "bonferroni")

pwc
## # A tibble: 3 × 10
##   .y.    group1  group2    n1    n2 statistic    df       p   p.adj p.adj.signif
## * <chr>  <chr>   <chr>  <int> <int>     <dbl> <dbl>   <dbl>   <dbl> <chr>       
## 1 Points MathSc… Readi…   195   195     -4.97   194 1.48e-6 4.44e-6 ****        
## 2 Points MathSc… Writi…   195   195     -3.92   194 1.2 e-4 3.6 e-4 ***         
## 3 Points Readin… Writi…   195   195      2.02   194 4.5 e-2 1.34e-1 ns

I cannot reject H0 for pair-wise comparison of ReadingScore and WritingScore, therefore I cannot claim that they differ in scores. For the other two comparison I can reject HO and conclude that their scores differ in comparison to one another.

library(rstatix)
pwc <- pwc %>% 
       add_y_position(fun = "median", step.increase = 0.35)

library(ggpubr)
ggboxplot(mydata_long, x = "Exams", y = "Points", add = "point", ylim=c(0, 100)) +
  stat_pvalue_manual(pwc, hide.ns = FALSE) +
  stat_summary(fun = mean, geom = "point", shape = 16, size = 4,
               aes(group = Exams), color = "darkred",
               position = position_dodge(width = 0.8)) +
  stat_summary(fun = mean, colour = "darkred", 
               position = position_dodge(width = 0.8),
               geom = "text", vjust = 0.5, hjust = -2,
               aes(label = round(after_stat(y), digits = 2), group = Exams)) +
  labs(subtitle = get_test_label(ANOVA_results,  detailed = TRUE),
       caption = get_pwc_label(pwc))

Conclusion:
Based on the sample data, I found that there are differences between math, reading and writing exams (p<0.05). The effect size is 0,009, which indicates that it’s very small, which also means that there are very small differences between scores in those 3 exams.

Now I will show the alternative test which I would have used if normality was violated - Friedman ANOVA.

Friedman ANOVA

H0: Location distribution of all 3 variables are the same.
H1: At least 1 location distribution is different.

I reject H0 at p<0.05.

library(rstatix)

#Friedman ANOVA.
FriedmanANOVA <- friedman_test(Points ~ Exams | ID,
                               data = mydata_long)

#Results
FriedmanANOVA 
## # A tibble: 1 × 6
##   .y.        n statistic    df         p method       
## * <chr>  <int>     <dbl> <dbl>     <dbl> <chr>        
## 1 Points   195      22.8     2 0.0000114 Friedman test
library(effectsize)
effectsize::kendalls_w(Points ~ Exams | ID,
                       data = mydata_long)
## Warning: 32 block(s) contain ties.
## Kendall's W |       95% CI
## --------------------------
## 0.06        | [0.02, 1.00]
## 
## - One-sided CIs: upper bound fixed at [1.00].
interpret_kendalls_w(0.06)
## [1] "slight agreement"
## (Rules: landis1977)

There is slight agreement in effect size, which just confirms my hypothesis testing where I rejected H0.

library(rstatix)

#Wilcoxon signed rank tests - comparing all possible paires.
paires_nonpar <- wilcox_test(Points ~ Exams, 
                             paired = TRUE, 
                             p.adjust.method = "bonferroni",
                             data = mydata_long)

paires_nonpar
## # A tibble: 3 × 9
##   .y.    group1       group2     n1    n2 statistic       p   p.adj p.adj.signif
## * <chr>  <chr>        <chr>   <int> <int>     <dbl>   <dbl>   <dbl> <chr>       
## 1 Points MathScore    Readin…   195   195     5126. 1.87e-6 5.61e-6 ****        
## 2 Points MathScore    Writin…   195   195     6073  2.47e-4 7.41e-4 ***         
## 3 Points ReadingScore Writin…   195   195     9519  6.8 e-2 2.05e-1 ns

I cannot reject H0 for pair-wise comparison of ReadingScore and WritingScore, therefore I cannot claim that they differ in scores. For the other two comparison I can reject HO and conclude that their scores differ in comparison to one another.

library(rstatix)
comparisons <- paires_nonpar %>% 
                 add_y_position(fun = "median", step.increase = 0.35)

library(ggpubr)
ggboxplot(mydata_long, x = "Exams", y = "Points", add = "point", ylim=c(0, 100)) +
  stat_pvalue_manual(comparisons, hide.ns = FALSE) +
  stat_summary(fun = median, geom = "point", shape = 16, size = 4,
               aes(group = Exams), color = "darkred",
               position = position_dodge(width = 0.8)) +
  stat_summary(fun = median, colour = "darkred", 
               position = position_dodge(width = 0.8),
               geom = "text", vjust = 0.5, hjust = -8,
               aes(label = round(after_stat(y), digits = 2), group = Exams)) +
  labs(subtitle = get_test_label(FriedmanANOVA,  detailed = TRUE),
       caption = get_pwc_label(comparisons))

Conclusion (Friedman ANOVA):
Based on the sample data, I found that there are differences between math, reading and writing exams (p<0.05). The effect size is 0,06, which indicates that there is slight agreement with the statement that scores from those 3 exams differ between them.