Mean Difference in Math Score Between Students Who Took Test Preparation Course vs Students Who Did Not

Mun Kar Yenc Natalie s3991774, Livia Nathania Fireta s3980951, Qingqing Wang s3886626

2023-05-23

Introduction

This dataset is used to examine the factors that affect a student’s test scores in math, reading and writing.

According to the study by Briggs (2009), the test preparation course had the largest effect on the math score. Therefore, this study concentrates on the effect of the test preparation course on math scores.

Problem Statement

Data

library(tidyverse); library(car)  # read_csv(), glimpse(), %>% and leveneTest()
stud_perf_exam <- read_csv("~/Applied Analytics/stud_perf_exam.csv")

glimpse(stud_perf_exam)
## Rows: 1,000
## Columns: 8
## $ gender                        <chr> "female", "female", "female", "male", "m…
## $ `race/ethnicity`              <chr> "group B", "group C", "group B", "group …
## $ `parental level of education` <chr> "bachelor's degree", "some college", "ma…
## $ lunch                         <chr> "standard", "standard", "standard", "fre…
## $ `test preparation course`     <chr> "none", "completed", "none", "none", "no…
## $ `math score`                  <dbl> 72, 69, 90, 47, 76, 71, 88, 40, 64, 38, …
## $ `reading score`               <dbl> 72, 90, 95, 57, 78, 83, 95, 43, 64, 60, …
## $ `writing score`               <dbl> 74, 88, 93, 44, 75, 78, 92, 39, 67, 50, …
stud_perf_exam$`test preparation course`<- stud_perf_exam$`test preparation course` %>% 
  factor(levels= c("none","completed"))

Data Source: https://www.kaggle.com/datasets/spscientist/students-performance-in-exams

The data is taken from a high school in the United States.

Data (cont.)

Important Variables:

Preprocessing The Dataset:

The dataset is imported into RMarkdown using the read_csv() function. The structure and data type of each variable are inspected using the glimpse() function. The key grouping variable, test preparation course, is converted to a factor using factor(), with the levels stated explicitly ("none", "completed") but not ordered.
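The order in which the factor levels are stated matters later, because t.test() with a formula subtracts the second level's mean from the first level's mean. A minimal sketch on a toy data set (the four scores below are invented for illustration and are not taken from the real data):

```r
# Toy data: invented values, used only to show how level order sets the sign
prep  <- c("none", "completed", "none", "completed")
score <- c(60, 70, 62, 74)

# State the levels explicitly so "none" is the first (reference) level
prep_f <- factor(prep, levels = c("none", "completed"))
levels(prep_f)

# With a formula, t.test() estimates mean(first level) - mean(second level)
res <- t.test(score ~ prep_f, var.equal = TRUE)
diff_est <- unname(res$estimate[1] - res$estimate[2])
diff_est  # mean(none) - mean(completed)
```

Because "none" comes first, a negative difference here means the completed group scored higher.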

Descriptive Statistics for Test Preparation Course

math_testprep <- stud_perf_exam %>% 
  group_by(`test preparation course`) %>%
  summarise(Min = min(`math score`,na.rm = TRUE),
            Q1 = quantile(`math score`,probs = .25,na.rm = TRUE),
            Median = median(`math score`, na.rm = TRUE),
            Q3 = quantile(`math score`,probs = .75,na.rm = TRUE),
            Max = max(`math score`,na.rm = TRUE),
            Mean = mean(`math score`, na.rm = TRUE),
            SD = sd(`math score`, na.rm = TRUE),
            IQR =IQR(`math score`, na.rm = TRUE),
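            Range = Max - Min,
            n = n())

knitr::kable(math_testprep)

| test preparation course | Min | Q1 | Median | Q3    | Max | Mean     | SD       | IQR   | Range | n   |
|:------------------------|----:|---:|-------:|------:|----:|---------:|---------:|------:|------:|----:|
| none                    |   0 | 54 |     64 | 74.75 | 100 | 64.07788 | 15.19238 | 20.75 |   100 | 642 |
| completed               |  23 | 60 |     69 | 79.00 | 100 | 69.69553 | 14.44470 | 19.00 |    77 | 358 |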

Descriptive Statistics for Test Preparation Course (cont.)

Missing Value

colSums(is.na(stud_perf_exam))
##                      gender              race/ethnicity 
##                           0                           0 
## parental level of education                       lunch 
##                           0                           0 
##     test preparation course                  math score 
##                           0                           0 
##               reading score               writing score 
##                           0                           0

Each column of the dataset is scanned for missing values using the colSums() and is.na() functions. No missing values are detected, so no further action is needed.

Boxplot for Outliers for Test Preparation Course
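A boxplot conventionally flags values that lie more than 1.5 × IQR beyond the quartiles. As a rough sketch, those fences can be computed from the quartiles reported in the descriptive statistics table. The column names below are hypothetical, and boxplot() in R uses hinges that can differ slightly from quantile(), so the fences on an actual plot may not match exactly.

```r
# Min, Q1, Q3 and Max copied from the descriptive statistics table
fences <- data.frame(
  group = c("none", "completed"),
  min   = c(0, 23),
  q1    = c(54, 60),
  q3    = c(74.75, 79),
  max   = c(100, 100)
)

# Fences at 1.5 * IQR below Q1 and above Q3
iqr <- fences$q3 - fences$q1
fences$lower_fence <- fences$q1 - 1.5 * iqr
fences$upper_fence <- fences$q3 + 1.5 * iqr

# A group has at least one flagged point if its min or max crosses a fence
fences$has_low_outlier  <- fences$min < fences$lower_fence
fences$has_high_outlier <- fences$max > fences$upper_fence
print(fences)
```

Under this rule both groups have at least one low-scoring point below the lower fence, and neither has a point above the upper fence.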

Boxplot for Outliers for Test Preparation Course (cont.)

Q-Q Plot for Test Preparation Course

## [1]  43 632

Q-Q Plot for Test Preparation Course (cont.)

## [1] 299 238

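All the data points fall within the expected range in the Q-Q plots. Therefore, it is reasonable to assume that the math scores in each group are approximately normally distributed. With 642 and 358 students in the two groups, the central limit theorem also supports comparing the group means with a t-test.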
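For readers who want to reproduce this kind of check, here is a minimal sketch using base R's qqnorm() and qqline(). The scores are simulated rather than taken from the real dataset, and the correlation between theoretical and sample quantiles is only an informal summary of how straight the Q-Q line is, not a formal test.

```r
set.seed(1)
# Simulated scores standing in for one group (assumption: not the real data)
sim_scores <- rnorm(300, mean = 65, sd = 15)

# Theoretical vs sample quantiles without drawing the plot
qq <- qqnorm(sim_scores, plot.it = FALSE)

# Informal straightness measure: values near 1 suggest approximate normality
qq_cor <- cor(qq$x, qq$y)
qq_cor

# Draw the Q-Q plot with a reference line through the quartiles
qqnorm(sim_scores)
qqline(sim_scores)
```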

Homogeneity of Variances for Test Preparation Course

leveneTest(`math score`~`test preparation course`, data = stud_perf_exam)
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   1   0.533 0.4655
##       998

Levene’s test is used to compare the variance of math scores between the group that did not take the test preparation course and the group that completed it. The test reports a p-value, which is compared to the standard 0.05 significance level.

The p-value for Levene’s test on math score is p = 0.47, which is greater than the 0.05 significance level. We fail to reject the null hypothesis of equal variances, so it is reasonable to assume equal variance for the math score in the two groups.
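To make the mechanics of the test concrete: with center = median, Levene's test amounts to a one-way ANOVA on each score's absolute deviation from its group median. The sketch below reproduces that idea in base R on simulated data. The data frame `sim` and its values are assumptions for illustration, not the real dataset.

```r
set.seed(42)
# Simulated two-group data (assumption: illustrative values only)
sim <- data.frame(
  group = factor(rep(c("none", "completed"), times = c(60, 40)),
                 levels = c("none", "completed")),
  score = c(rnorm(60, mean = 64, sd = 15), rnorm(40, mean = 70, sd = 14))
)

# Absolute deviation of each score from its own group's median
group_median <- ave(sim$score, sim$group, FUN = median)
sim$abs_dev  <- abs(sim$score - group_median)

# One-way ANOVA on the absolute deviations gives the Levene F and p-value
levene_fit <- anova(lm(abs_dev ~ group, data = sim))
f_value <- levene_fit[["F value"]][1]
p_value <- levene_fit[["Pr(>F)"]][1]
levene_fit
```

A large p-value in this table is read the same way as in the leveneTest() output: no evidence against equal variances.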

Two-Sample t-test for Test Preparation Course

The two-sample t-test has the following hypotheses:

\[ \begin{aligned} \mu_1 &= \text{The mean math score of students who completed the test preparation course}\\ \mu_2 &= \text{The mean math score of students who did not complete the test preparation course} \end{aligned} \]

\[ \begin{aligned} H_0&: \mu_1 = \mu_2 \\ H_A&: \mu_1 \neq \mu_2 \end{aligned} \]

Null Hypothesis (H0): The mean math scores of the none and completed test preparation course groups are equal, so the difference between the group means is zero.

Alternative Hypothesis (HA): The mean math scores of the none and completed test preparation course groups differ, so the difference between the group means is not zero.
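For reference, the equal-variance (pooled) form of the two-sample t statistic can be written as follows. The symbols \(\bar{x}_i\), \(s_i\) and \(n_i\) (sample mean, sample standard deviation and size of group \(i\)) are notation introduced here for illustration. Swapping the order of the two groups only flips the sign of \(t\).

\[ \begin{aligned} s_p^2 &= \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \\ t &= \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad df = n_1 + n_2 - 2 \end{aligned} \]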

Two-sample t-test for Test Preparation Course (cont.)

t.test(`math score`~`test preparation course`, 
       data = stud_perf_exam, 
       var.equal = TRUE, 
       alternative = "two.sided")
## 
##  Two Sample t-test
## 
## data:  math score by test preparation course
## t = -5.7046, df = 998, p-value = 1.536e-08
## alternative hypothesis: true difference in means between group none and group completed is not equal to 0
## 95 percent confidence interval:
##  -7.550077 -3.685221
## sample estimates:
##      mean in group none mean in group completed 
##                64.07788                69.69553

Since equal variances can be assumed for math scores, the var.equal argument of the t.test() function is set to TRUE. Because none is the first factor level, t.test() reports the difference as none minus completed: the sample estimate is 64.07788 - 69.69553 = -5.61765. We are 95% confident that the true difference in means lies between -7.55 and -3.69. Equivalently, students who completed the course scored between 3.69 and 7.55 points higher on average. The p-value is 1.54e-08, which is lower than 0.05, so we reject the null hypothesis. There is a statistically significant difference in mean math score between the two groups.
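As a cross-check, the reported test statistic and confidence interval can be recomputed by hand from the group sizes, means and standard deviations in the descriptive statistics table. This is a sketch using the pooled-variance formula. The variable names are hypothetical, and small discrepancies from rounding of the inputs are expected.

```r
# Summary statistics copied from the descriptive statistics table
n_none <- 642; mean_none <- 64.07788; sd_none <- 15.19238
n_comp <- 358; mean_comp <- 69.69553; sd_comp <- 14.44470

# Pooled variance and standard error under the equal-variance assumption
df         <- n_none + n_comp - 2
pooled_var <- ((n_none - 1) * sd_none^2 + (n_comp - 1) * sd_comp^2) / df
se         <- sqrt(pooled_var * (1 / n_none + 1 / n_comp))

# Difference in the same direction as t.test(): none minus completed
mean_diff <- mean_none - mean_comp
t_stat    <- mean_diff / se
p_value   <- 2 * pt(-abs(t_stat), df)              # two-sided p-value
ci        <- mean_diff + c(-1, 1) * qt(0.975, df) * se

round(c(t = t_stat, df = df, lower = ci[1], upper = ci[2]), 4)
p_value
```

The values agree with the t.test() output above (t = -5.7046, df = 998, 95% CI from -7.55 to -3.69).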

Discussion

Discussion (cont.)

References

Briggs DC (2007) ‘The Effect of Admissions Test Preparation: Evidence from NELS:88’, Chance, 14(1):10-18.

Seshapanpu J (2018) Students Performance in Exams, Kaggle website, accessed 19 May 2023. https://www.kaggle.com/datasets/spscientist/students-performance-in-exams

Tafakori L (2023) ‘Sampling: Randomly Representative’ [Course Module, MATH1324], RMIT University, Melbourne.

Tafakori L (2023) ‘Testing the Null: Data on Trial’ [Course Module, MATH1324], RMIT University, Melbourne.