Matej Suhalj

Research Question:
Are there differences in math, reading and writing exams?

mydata <- read.table("./original_data4.csv", header = TRUE, sep = ",")
head(mydata)

##   ID Gender MathScore ReadingScore WritingScore
## 1  0 female        72           72           74
## 2  1 female        69           90           88
## 3  2 female        90           95           93
## 4  3   male        47           57           44
## 5  4   male        76           78           75
## 6  5 female        71           83           78

Unit of observation: one student

Description of data:

ID: student identificator
Gender: male, female
MathScore: results on a math exam
ReadingScore: results on a reading exam
WritingScore: results on a writing exam

The data set was taken from Kaggle.com (Students exam scores: Extended dataset) and the sample size is 999 students. I will choose a random sample of 200 students.

#Random sample of 200 units. Needed for accurate hypothesis testing later on.
set.seed(1) 
mydata <- mydata[sample(nrow(mydata), 200), ]

library(psych)
mydata$GenderF <- factor(mydata$Gender,
                         levels = c("male", "female"),
                         labels = c(0, 1))
head(mydata)

##      ID Gender MathScore ReadingScore WritingScore GenderF
## 836 835 female        60           64           74       1
## 679 678   male        81           75           78       0
## 129 128   male        82           82           74       0
## 930 929 female        48           56           51       1
## 509 508   male        79           78           77       0
## 471 470 female        83           85           90       1

library(rstatix)

## Warning: package 'rstatix' was built under R version 4.3.2

## 
## Attaching package: 'rstatix'

## The following object is masked from 'package:stats':
## 
##     filter

library(tidyverse)

## Warning: package 'tidyverse' was built under R version 4.3.2

## Warning: package 'ggplot2' was built under R version 4.3.2

## Warning: package 'lubridate' was built under R version 4.3.2

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ ggplot2::%+%()   masks psych::%+%()
## ✖ ggplot2::alpha() masks psych::alpha()
## ✖ dplyr::filter()  masks rstatix::filter(), stats::filter()
## ✖ dplyr::lag()     masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

#Wide format of data is changed to long format.
mydata_long <- mydata %>%
                 pivot_longer(
                   cols = c("MathScore", "ReadingScore", "WritingScore"),
                   names_to = "Exams", 
                   values_to = "Points") %>%
                 convert_as_factor(Exams)


mydata_long <- as.data.frame(mydata_long)

tail(mydata_long)

##      ID Gender GenderF        Exams Points
## 595 586 female       1    MathScore     55
## 596 586 female       1 ReadingScore     73
## 597 586 female       1 WritingScore     73
## 598  15 female       1    MathScore     69
## 599  15 female       1 ReadingScore     75
## 600  15 female       1 WritingScore     78

library(ggpubr)

## Warning: package 'ggpubr' was built under R version 4.3.2

#Boxplot for each variable.
ggboxplot(mydata_long, 
          x = "Exams", 
          y = "Points", 
          add = "jitter")

library(tidyverse)
library(ggpubr)
library(rstatix)

#Finding outliers.
mydata_long %>%
  group_by(Exams) %>%
  identify_outliers(`Points`)

## # A tibble: 5 × 7
##   Exams           ID Gender GenderF Points is.outlier is.extreme
##   <fct>        <int> <chr>  <fct>    <int> <lgl>      <lgl>     
## 1 MathScore      842 female 1           23 TRUE       FALSE     
## 2 ReadingScore   596 male   0           24 TRUE       FALSE     
## 3 ReadingScore    76 male   0           26 TRUE       FALSE     
## 4 WritingScore   596 male   0           15 TRUE       FALSE     
## 5 WritingScore    76 male   0           22 TRUE       FALSE

#Removing outliers.
mydata_long <- mydata_long %>%
  filter(!ID == (842))
mydata_long <- mydata_long %>%
  filter(!ID == (76))
mydata_long <- mydata_long %>%
  filter(!ID == (596))

#Checking if there are anymore outliers
mydata_long %>%
  group_by(Exams) %>%
  identify_outliers(`Points`)

## # A tibble: 1 × 7
##   Exams        ID Gender GenderF Points is.outlier is.extreme
##   <fct>     <int> <chr>  <fct>    <int> <lgl>      <lgl>     
## 1 MathScore   338 female 1           24 TRUE       FALSE

mydata_long <- mydata_long %>%
  filter(!ID == (338))

#Checking if there are anymore outliers #2
mydata_long %>%
  group_by(Exams) %>%
  identify_outliers(`Points`)

## # A tibble: 1 × 7
##   Exams        ID Gender GenderF Points is.outlier is.extreme
##   <fct>     <int> <chr>  <fct>    <int> <lgl>      <lgl>     
## 1 MathScore   466 female 1           26 TRUE       FALSE

mydata_long <- mydata_long %>%
  filter(!ID == (466))

#Checking if there are anymore outliers #3
mydata_long %>%
  group_by(Exams) %>%
  identify_outliers(`Points`)

## [1] Exams      ID         Gender     GenderF    Points     is.outlier is.extreme
## <0 rows> (or 0-length row.names)

Assumptions to perform a parametric test (rANOVA):
- All variables are numeric.
- Normality: The variables in the population are normally distributed (Shapiro-Wilk test).
- Sphericity: the variances of the differences between the examines variables are equal (Mauchly test for sphericity).
- No significant outliers.

library(rstatix)

#Checking normality with Shapiro-Wilk test
mydata_long %>%
  group_by(Exams) %>%
  shapiro_test(Points)

## # A tibble: 3 × 4
##   Exams        variable statistic     p
##   <fct>        <chr>        <dbl> <dbl>
## 1 MathScore    Points       0.994 0.587
## 2 ReadingScore Points       0.992 0.378
## 3 WritingScore Points       0.990 0.183

Variable Points needs to be normally distributed within each of the stated Exams.
Example of MathScore:
H0: Variable Points in normally distributed within MathScore.
H1: Variable Points in NOT normally distributed within MathScore.

H0 of WritingScore is rejected at p<0.05, therefore normality is violated and I will use Friedman ANOVA.

I will still first show the parametric test - rANOVA

H0: Mean of all the variables (MathScore, ReadingScore and WritingScore) are the same. H1: At least one mean out of those 3 is different.

library(rstatix)

#rANOVA
ANOVA_results <- anova_test(dv = Points, #Dependent variable
                            wid = ID, #Subject identifier
                            within = Exams, #Within-subject factor variable
                            data = mydata_long)

ANOVA_results #Summary of results.

## ANOVA Table (type III tests)
## 
## $ANOVA
##   Effect DFn DFd      F        p p<.05   ges
## 1  Exams   2 388 18.285 2.58e-08     * 0.009
## 
## $`Mauchly's Test for Sphericity`
##   Effect    W       p p<.05
## 1  Exams 0.54 1.4e-26     *
## 
## $`Sphericity Corrections`
##   Effect   GGe       DF[GG]    p[GG] p[GG]<.05   HFe       DF[HF]    p[HF]
## 1  Exams 0.685 1.37, 265.69 2.18e-06         * 0.688 1.38, 266.94 2.08e-06
##   p[HF]<.05
## 1         *

get_anova_table(ANOVA_results, correction = "auto")

## ANOVA Table (type III tests)
## 
##   Effect  DFn    DFd      F        p p<.05   ges
## 1  Exams 1.37 265.69 18.285 2.18e-06     * 0.009

I reject H0 at p<0.05.

After the show of the descriptive statistics, I will show the more accurate test my case - Friedman ANOVA.

library(rstatix)

#Descriptive statistics for each variable.
mydata_long %>%
  group_by(Exams) %>%
  get_summary_stats(Points, type = "common")

## # A tibble: 3 × 11
##   Exams        variable     n   min   max median   iqr  mean    sd    se    ci
##   <fct>        <fct>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 MathScore    Points     195    29    99     66  19.5  65.8  14.6  1.05  2.07
## 2 ReadingScore Points     195    31   100     69  21    69.0  14.1  1.01  2.00
## 3 WritingScore Points     195    33   100     68  22    68.4  14.6  1.05  2.06

It is more accurate in my case if I compare medians and not means, because normallity was violated, therefore Friedman ANOVA is the more appropriate test (non-parametric test).

Friedman ANOVA

library(rstatix)

#Friedman ANOVA.
FriedmanANOVA <- friedman_test(Points ~ Exams | ID,
                               data = mydata_long)

#Results
FriedmanANOVA

## # A tibble: 1 × 6
##   .y.        n statistic    df         p method       
## * <chr>  <int>     <dbl> <dbl>     <dbl> <chr>        
## 1 Points   195      22.8     2 0.0000114 Friedman test

H0: Location distribution of all 3 variables are the same.
H1: At least 1 location distribution is different.

I reject H0 at p<0.001.

library(effectsize)

## 
## Attaching package: 'effectsize'

## The following objects are masked from 'package:rstatix':
## 
##     cohens_d, eta_squared

## The following object is masked from 'package:psych':
## 
##     phi

effectsize::kendalls_w(Points ~ Exams | ID,
                       data = mydata_long)

## Warning: 32 block(s) contain ties.

## Kendall's W |       95% CI
## --------------------------
## 0.06        | [0.02, 1.00]
## 
## - One-sided CIs: upper bound fixed at [1.00].

interpret_kendalls_w(0.05)

## [1] "slight agreement"
## (Rules: landis1977)

There is slight agrement in effect size, which just confirms my hyposthesis testing where I rejected H0.

library(rstatix)

#Wilcoxon signed rank tests - comparing all possible paires.
paires_nonpar <- wilcox_test(Points ~ Exams, 
                             paired = TRUE, 
                             p.adjust.method = "bonferroni",
                             data = mydata_long)

paires_nonpar

## # A tibble: 3 × 9
##   .y.    group1       group2     n1    n2 statistic       p   p.adj p.adj.signif
## * <chr>  <chr>        <chr>   <int> <int>     <dbl>   <dbl>   <dbl> <chr>       
## 1 Points MathScore    Readin…   195   195     5126. 1.87e-6 5.61e-6 ****        
## 2 Points MathScore    Writin…   195   195     6073  2.47e-4 7.41e-4 ***         
## 3 Points ReadingScore Writin…   195   195     9519  6.8 e-2 2.05e-1 ns

All the p values are statistically significant. No location distributions in all 3 combinations are the same.

library(rstatix)
comparisons <- paires_nonpar %>% 
                 add_y_position(fun = "median", step.increase = 0.35)

library(ggpubr)
ggboxplot(mydata_long, x = "Exams", y = "Points", add = "point", ylim=c(0, 18)) +
  stat_pvalue_manual(comparisons, hide.ns = FALSE) +
  stat_summary(fun = median, geom = "point", shape = 16, size = 4,
               aes(group = Exams), color = "darkred",
               position = position_dodge(width = 0.8)) +
  stat_summary(fun = median, colour = "darkred", 
               position = position_dodge(width = 0.8),
               geom = "text", vjust = 0.5, hjust = -8,
               aes(label = round(after_stat(y), digits = 2), group = Exams)) +
  labs(subtitle = get_test_label(FriedmanANOVA,  detailed = TRUE),
       caption = get_pwc_label(comparisons))

Conclusion:
Based on the sample data, I found that there are differences between math, reading and writing exams (p<0.001). The effect size is 0,05, which means there is slight agreement in the statement that there is no difference between those 3 exams.

Homework 1

2024-01-06

Matej Suhalj

Friedman ANOVA