Inferential Statistical Analysis on Student Performance Dataset

1. Hypothesis Test: Test Preparation and Math Scores

Research Question: Is there a significant difference in the mean math scores between students who completed the test preparation course and those who did not?

Hypotheses:

Null Hypothesis (H0): The mean math scores of students who completed the test preparation course are equal to those who did not.

Alternative Hypothesis (H1): The mean math scores of the two groups are not equal.

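Statistical Test: Independent two-sample t-test (Welch's version, which t.test() in R uses by default and which does not assume equal variances in the two groups)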

library(tidyverse)

# Load the dataset
url <- "https://raw.githubusercontent.com/Naik-Khyati/data_621/main/blogs/blog1/StudentsPerformance.csv"
student_performance <- read.csv(url)

# Independent t-test
t_test_result <- t.test(math.score ~ test.preparation.course, data = student_performance)
t_test_result

## 
##  Welch Two Sample t-test
## 
## data:  math.score by test.preparation.course
## t = 5.787, df = 770.08, p-value = 1.043e-08
## alternative hypothesis: true difference in means between group completed and group none is not equal to 0
## 95 percent confidence interval:
##  3.712041 7.523257
## sample estimates:
## mean in group completed      mean in group none 
##                69.69553                64.07788

Result:

The p-value from the independent t-test is compared to the significance level (commonly 0.05). If the p-value is less than 0.05, we reject the null hypothesis.

Interpretation: The p-value obtained is 1.043e-08 (t = 5.787, df = 770.08). Since this is far below the significance level of 0.05, we reject the null hypothesis. Students who completed the test preparation course averaged 69.70 in math versus 64.08 for those who did not, a difference of about 5.62 points (95 percent confidence interval: 3.71 to 7.52).
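The decision rule described in the Result paragraph can also be applied in code instead of being read off the printed output. The following is a minimal sketch, assuming simulated stand-in data: the `scores` data frame, its group sizes, means, and standard deviations are hypothetical and are not the real dataset. It extracts the p-value and confidence interval from the object returned by `t.test()` and recomputes the Welch statistic and degrees of freedom by hand as a check.

```r
# Hypothetical stand-in data: the column names mirror the real dataset,
# but the values are simulated purely for illustration.
set.seed(1)
scores <- data.frame(
  test.preparation.course = rep(c("completed", "none"), times = c(40, 60)),
  math.score = c(rnorm(40, mean = 70, sd = 14), rnorm(60, mean = 64, sd = 15))
)

res <- t.test(math.score ~ test.preparation.course, data = scores)

# Components of the returned "htest" object
p_value   <- res$p.value
ci        <- res$conf.int   # 95% CI for the difference in means
alpha     <- 0.05
reject_h0 <- p_value < alpha

# Recompute the Welch t statistic by hand: the difference in group means
# divided by the standard error built from each group's own variance.
x  <- scores$math.score[scores$test.preparation.course == "completed"]
y  <- scores$math.score[scores$test.preparation.course == "none"]
nx <- length(x)
ny <- length(y)
vx <- var(x) / nx
vy <- var(y) / ny
t_manual <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation to the degrees of freedom
df_manual <- (vx + vy)^2 / (vx^2 / (nx - 1) + vy^2 / (ny - 1))

# Both comparisons should be TRUE
all.equal(unname(res$statistic), t_manual)
all.equal(unname(res$parameter), df_manual)
```

On the real data, the same accessors (`$p.value`, `$conf.int`, `$statistic`, `$parameter`) can be applied to `t_test_result`.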

2. Linear Regression: Reading Scores predicting Writing Scores

Research Question: Can we predict a student’s writing score based on their reading score?

Hypotheses:

Null Hypothesis (H0): There is no linear relationship between reading and writing scores.

Alternative Hypothesis (H1): There is a linear relationship between reading and writing scores (the slope of the regression line is not zero).

Statistical Test: Linear Regression

# Linear regression
linear_model <- lm(writing.score ~ reading.score, data = student_performance)
summary(linear_model)

## 
## Call:
## lm(formula = writing.score ~ reading.score, data = student_performance)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.9573  -2.9573   0.0363   3.1026  15.0557 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -0.667554   0.693792  -0.962    0.336    
## reading.score  0.993531   0.009814 101.233   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.529 on 998 degrees of freedom
## Multiple R-squared:  0.9113, Adjusted R-squared:  0.9112 
## F-statistic: 1.025e+04 on 1 and 998 DF,  p-value: < 2.2e-16

Result:

The p-value associated with the coefficient of the reading score is used to determine the significance of the relationship. If this p-value is less than 0.05, we reject the null hypothesis.

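Interpretation: The p-value obtained for the coefficient of the reading score is less than 2e-16. Since this is far below 0.05, we reject the null hypothesis. This indicates a significant linear relationship between reading and writing scores: each additional point in reading score is associated with an increase of about 0.99 points in the predicted writing score, and reading scores account for about 91% of the variance in writing scores (R-squared = 0.9113).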
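The quantities used in this interpretation can be pulled from a fitted model directly. The sketch below is illustrative only and assumes simulated stand-in data: the `toy` data frame and the noise level of 4.5 are hypothetical choices, not values taken from the real dataset. It extracts the slope, its p-value, and R-squared, and shows that with a single predictor R-squared equals the squared Pearson correlation.

```r
# Hypothetical stand-in data; values are simulated for illustration only.
set.seed(2)
reading <- round(runif(200, min = 30, max = 100))
toy <- data.frame(
  reading.score = reading,
  writing.score = reading + rnorm(200, mean = 0, sd = 4.5)
)

fit <- lm(writing.score ~ reading.score, data = toy)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(summary(fit))

slope     <- coef_table["reading.score", "Estimate"]
slope_p   <- coef_table["reading.score", "Pr(>|t|)"]
r_squared <- summary(fit)$r.squared

# Decision at the 0.05 significance level
reject_h0 <- slope_p < 0.05

# With one predictor, R-squared is the squared Pearson correlation
r <- cor(toy$reading.score, toy$writing.score)
all.equal(r_squared, r^2)
```

The same extraction works on `linear_model` from the analysis, where the printed summary shows the p-value only as `<2e-16`.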

Real-Life Uses:

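Educational Interventions: Students who completed the test preparation course scored about 5.6 points higher in math on average. This suggests that interventions aimed at improving test preparation may raise math performance, although the data are observational, so the difference shows an association rather than a proven causal effect.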

Predictive Modeling: Understanding the relationship between reading and writing scores can inform educational strategies, allowing educators to predict and support students’ writing performance based on their reading scores.
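As a worked example of such a prediction, the sketch below plugs the coefficients reported in the regression summary into the fitted line. The reading score of 70 is a hypothetical input chosen for illustration, and the band of two residual standard errors is only a rough approximation of the prediction uncertainty.

```r
# Coefficients and residual standard error copied from the summary() output
intercept   <- -0.667554
slope       <-  0.993531
residual_se <-  4.529

# Hypothetical reading score, chosen only for illustration
reading_score <- 70

# Point prediction from the fitted line
predicted_writing <- intercept + slope * reading_score

# Rough band of two residual standard errors around the prediction.
# This ignores uncertainty in the coefficients, which
# predict(linear_model, newdata = ..., interval = "prediction") accounts for.
rough_band <- predicted_writing + c(-2, 2) * residual_se

round(predicted_writing, 2)
round(rough_band, 2)
```

For a reading score of 70 this gives a predicted writing score of about 68.88, with a rough band of about 59.82 to 77.94.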

Conclusion:

The inferential statistical analysis shows that completing the test preparation course is associated with a significantly higher mean math score, and that reading scores are a strong linear predictor of writing scores.

These findings can guide educators, policymakers, and institutions aiming to improve student outcomes. For example, resources can be allocated to test preparation programs, and educators can tailor writing support to students’ reading abilities.

Khyati Naik

November 26, 2023
