Abstract

This project investigates how various social, familial, and behavioral factors are correlated with academic performance and average free time among Portuguese high school students. Using the Student Performance dataset compiled by Cortez and Silva in 2008, we explore three specific research questions: the first two concerning the influence of parental education and alcohol consumption on students’ final Portuguese grades (G3), and the last one focusing on how one’s relationship status influences the amount of free time students report. From the results, we found that higher levels of father’s education are strongly correlated with better academic performance, while weekday alcohol consumption is negatively correlated with grades. However, weekend drinking shows no significant effect. Finally, students’ relationship status does not appear to influence how much free time they report. These findings provide insight into which factors may meaningfully impact Portuguese high school students’ academic outcomes and time management, and suggest potential areas for school level intervention and student support.

Introduction

This project focuses on how different factors influence school performance in Portuguese secondary school students. The dataset our team used is the Student Performance dataset compiled by Cortez and Silva in 2008. The dataset contains 649 observations and 30 features collected through questionnaires from two Portuguese high schools, with features covering aspects of family background, social environment, academic records, and types of lifestyles. The primary objective of our analysis is to examine how different social and behavioral factors relate to the final Portuguese grade (G3), which is measured on a scale from 0 to 20. By answering the three focused research questions below, we aim to understand how various background and behavioral variables correlate with or predict academic performance. The results from this analysis could help inform educational strategies and potentially help students with their stress and mental health.

Research Question 1

After statistically controlling for weekly study time (on a 1–4 numeric scale where 1 indicates less than 2 hours and 4 indicates more than 10 hours) and internet access (coded as “yes” or “no”), how is the father’s highest education level (on a 0–4 numeric scale where 0 indicates no formal education and 4 indicates higher education) associated with Portuguese high school students’ final grades in Portuguese class (G3), which are measured on a 0–20 numeric scale?

Hypothesis:

\(H_0\): After controlling for weekly study time and internet access, the father’s highest education level (0 = no formal education to 4 = higher education) has no significant association with students’ final Portuguese grade (G3, measured on a 0–20 scale).

\(H_1\): After controlling for weekly study time and internet access, the father’s highest education level (0 = no formal education to 4 = higher education) is significantly positively associated with students’ final Portuguese grade (G3, 0–20 scale).

Research Question 2

Based on the number of alcoholic beverages consumed by Portuguese high school students per weekday and weekend (on a 1-5 numeric scale where 1 indicates very low and 5 indicating very high), how does it affect Portuguese high school students’ final grades in Portuguese class (G3), which are measured on a 0–20 numeric scale?

Hypothesis:

\(H_0\): There is no significant association between students’ alcohol consumption levels during weekdays and weekends (1 = very low to 5 = very high) and their final Portuguese grade (G3, 0–20 scale).

\(H_1\): Students’ alcohol consumption levels during weekdays and weekends (1 = very low to 5 = very high) are significantly associated with their final Portuguese grade (G3, 0–20 scale).

Research Question 3

Is there a statistically significant difference in reported average free time (on a 1-5 categorical scale where 1 indicates very low and 5 indicates very high) between Portuguese high school students who are in romantic relationships (coded as 1), and those who are not in romantic relationships (coded as 0)?

Hypothesis:

\(H_0\): There is no significant difference in average free time between students in a romantic relationship and those who are not.

\(H_1\): There is a significant difference in average free time (1 = very low to 5 = very high) between students in a romantic relationship and students who are not in romantic relationships.

Data Loading and Processing

Data Loading

library(dplyr)
library(tidyr)
library(readr)
library(ggplot2)
library(knitr)
library(MASS)  # For AIC/BIC 
library(car) # For vif

por_df <- read_delim("C:/Users/jaych/Downloads/PSTAT100-Final-Project-main/PSTAT100-Final-Project-main/data/student-por.csv", delim = ";")
head(por_df)

por_df <- por_df %>%
  mutate(
    school = as.factor(school),
    sex = as.factor(sex),
    address = as.factor(address),
    famsize = as.factor(famsize),
    Pstatus = as.factor(Pstatus),
    Mjob = as.factor(Mjob),
    Fjob = as.factor(Fjob),
    reason = as.factor(reason),
    guardian = as.factor(guardian),
    schoolsup = as.factor(schoolsup),
    famsup = as.factor(famsup),
    paid = as.factor(paid),
    activities = as.factor(activities),
    nursery = as.factor(nursery),
    higher = as.factor(higher),
    internet = as.factor(internet),
    romantic = as.factor(romantic),
    # ordered factor from 1-5 for very high, low etc.
    Dalc = factor(Dalc, levels = 1:5, ordered = TRUE), 
    Walc = factor(Walc, levels = 1:5, ordered = TRUE)
  )

Data Processing and Cleaning

The Student Performance dataset does not contain any missing values or fields with unclear meanings. Therefore, no data cleaning or preprocessing was necessary for this project.

Research Question 1

Modeling Process

The aim is to observe if, after controlling for study time and internet access, father’s education significantly influenced the student’s final Portuguese grade. The question at hand calls for a partial significance test, which works by having one one the models be a subset to another. Therefore, if there is a significant difference, it is due to the predictors added in between.

The models were both linear and used the following predictors:

Model 1: Study time and internet access.
Model 2: Model 1’s predictors as well as the education level of the student’s father.

The two linear models were compared using an ANOVA test. Since the only predictor added in model 2 was the father’s education, if there was a significant decrease in residual variance–as indicated by the resulting p-value–it would be due to this predictor.

Results

anova_kable <- function(model1, model2) {
  aov_out <- anova(model1, model2)

  Model <- rownames(aov_out)
  Res.Df <- aov_out[, "Res.Df"]
  RSS <- signif(aov_out[, "RSS"], 5)
  Df <- ifelse(is.na(aov_out[, "Df"]), "", aov_out[, "Df"])
  Sum_of_Sq <- ifelse(is.na(aov_out[, "Sum of Sq"]), "", signif(aov_out[, "Sum of Sq"], 5))
  F <- ifelse(is.na(aov_out[, "F"]), "", signif(aov_out[, "F"], 4))

  # Extract p-values as numeric, then format with signif( ,2)
  pvals_raw <- aov_out$`Pr(>F)`
  pvals_signif <- ifelse(
    is.na(pvals_raw),
    "",
    formatC(signif(pvals_raw, 2), format = "e", digits = 2)
  )

  Signif <- cut(as.numeric(pvals_raw),
                breaks = c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
                labels = c("***", "**", "*", ".", ""),
                right = TRUE)
  Signif[is.na(Signif)] <- ""

  df <- data.frame(
    Model = Model,
    Res.Df = Res.Df,
    RSS = RSS,
    Df = Df,
    `Sum of Sq` = Sum_of_Sq,
    F = F,
    `Pr(>F)` = pvals_signif,
    Signif = Signif,
    stringsAsFactors = FALSE
  )

  knitr::kable(df, align = "c",
               caption = "ANOVA Comparison Table")
}

lm_table <- function(model, digits = 2, pval_sci_threshold = 0.01, caption = "Linear Model Summary") {
  df <- broom::tidy(model) %>%
    dplyr::select(term, estimate, std.error, statistic, p.value) %>%
    dplyr::rename(
      Term = term,
      Estimate = estimate,
      `Std. Error` = std.error,
      `t value` = statistic,
      `Pr(>|t|)` = p.value
    )

  # Format values
  df$Estimate <- signif(df$Estimate, digits)
  df$`Std. Error` <- signif(df$`Std. Error`, digits)
  df$`t value` <- signif(df$`t value`, digits)

  # Format p-values with smart thresholding
  df$`Pr(>|t|)` <- ifelse(
    is.na(df$`Pr(>|t|)`),
    "",
    ifelse(
      df$`Pr(>|t|)` < pval_sci_threshold,
      formatC(signif(df$`Pr(>|t|)`, digits), format = "e", digits = digits),
      formatC(round(df$`Pr(>|t|)`, digits), format = "f", digits = digits)
    )
  )

  # Add significance stars
  pval_raw <- broom::tidy(model)$p.value
  df$Signif <- cut(pval_raw,
                   breaks = c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
                   labels = c("***", "**", "*", ".", ""),
                   right = TRUE)
  df$Signif[is.na(df$Signif)] <- ""

  # Return styled kable
  knitr::kable(df, align = "c", caption = caption)
}

Here were the results:

f1 <- G3 ~ studytime + internet
f2 <- G3 ~ studytime + internet + Fedu

model1 <- lm(data=por_df, formula = f1)
model2 <- lm(data=por_df, formula = f2)
anova_parsig <- anova(model1, model2)

anova_kable(model1, model2)

ANOVA Comparison Table
Model	Res.Df	RSS	Df	Sum.of.Sq	F	Pr..F.	Signif
1	646	6207.3
2	645	5995.9	1	211.42	22.74	2.30e-06	***

Interpretation

With a p-value of approximately 2.30 * \(10^{-6}\), there is sufficient evidence to reject the null hypothesis,
\(H_0\): that after controlling for weekly study time and internet access, the father’s highest education level has no significant association with students’ final Portuguese grade. Therefore, opting instead for the alternative hypothesis, the father’s highest education level does play a role in the students’ final Portuguese grade.

Visualization and Communication (Model Checking)

Model checking was performed to ensure this result from the previous section was accurate. To assess whether this model is appropriate, we examined Model 2, which contains the same predictors as Model 1 in addition to father’s education. For the partial significance test to be valid, Model 2 must satisfy the standard linear regression assumptions: independence, no structure in residuals, normality of residuals, and homoscedasticity (equal variance across fitted values).

Independence is satisfied because the dataset includes the full population of students from each school during that year. Additionally, because this is an observational study with no temporal or treatment structure, residual independence is expected and visually confirmed. It only remains to be seen whether the remaining conditions—normality and equal variance—hold true.

shapiro_pval <- function(resids, digits=3){
  shap <- shapiro.test(resids)
  signif(shap$p.value, digits)
}

qqplott <- function(residuals){
  ggplot(data.frame(residuals), aes(sample = residuals)) +
    stat_qq() +
    stat_qq_line() +
    labs(title = "Q-Q Plot of Residuals", x = "Theoretical Quantiles", y = "Sample Quantiles") +
    theme_minimal()
}

hist_resids <- function(resids, bins = 30) {
  df <- data.frame(residuals = resids)
  ggplot(df, aes(x = residuals)) +
    geom_histogram(bins = bins, fill = "steelblue", color = "black", alpha = 0.7) +
    labs(title = "Student Grade Residuals", x = "Residuals", y = "Count") +
    theme_minimal()
}


residplot_fitval <- function(model) {
  df <- data.frame(Fitted = fitted(model), Residuals = residuals(model))
  ggplot(df, aes(x = Fitted, y = Residuals)) +
    geom_point() +
    geom_hline(yintercept = 0, linetype = "dashed", color = "blue") +
    labs(title = "Residuals vs. Fitted Values", x = "Fitted Values", y = "Residuals") +
    theme_minimal()
}

resids <- model2$residuals

Normality of Residuals

hist_resids(resids)

qqplott(resids)

shap_test <- shapiro.test(resids)
shap_pval <- signif(shap_test$p.value, 3)
off_values_percent <- round(100 * (sum(resids < -7)/nrow(por_df)), 2)

The Q-Q plot indicates a deviation from normality in the residuals, and likewise, a Shapiro-wilk test result of 2.83^{-15}(p-value < 0.05) corroborates this deviation. Nevertheless, as evident through the histogram, the deviation is primarily due to a relatively small group of students who scored significantly lower than predicted, causing a mild left skew. These values represent approximately 2.62% of the residuals. Aside from this small subset, the residuals are normally distributed and centered around zero.

Given the large sample size and that the deviation stems from a small portion of the data, the normality assumption is considered reasonably satisfied for inference purposes.

Equal Variances Across Fitted Values

residplot_fitval(model2)

The residuals appear to have approximately constant variance across the range of fitted values, with no clear funneling or systematic spread. Thus, there is no strong evidence to reject the homoscedasticity assumption.

Hence, all assumptions for the partial significance test performed were reasonably satisfied. Therefore, the previously found p-value of approximately 2.30 * \(10^{-6}\) can be trusted.

Conclusion and Recommendation

After controlling for study time and internet access, the father’s education level remains a significant positive predictor of students’ final Portuguese grades. This likely reflects the academic support that more educated parents are able to provide at home; eg, helping with homework and explaining difficult topics. To support those lacking this advantage, schools should consider expanding tutoring and afterschool programs to foster a more equitable learning environment. Additionally, making greater use of school libraries—through activities like book clubs or guest author visits—could help motivate students to read more and strengthen their Portuguese language skills.

Research Question 2

Modeling Process

To investigate the association between alcohol consumption and students’ final Portuguese grades (G3), we employed a two-way ANOVA. This method allows us to examine the main effects of weekday alcohol consumption (Dalc) and weekend alcohol consumption (Walc) on G3, as well as their interaction effect. The model is specified as G3 ~ Dalc * Walc, which includes both main effects and the interaction term. Both Dalc and Walc are treated as ordered factors on a 1-5 scale, representing consumption levels from very low to very high.

Results

The ANOVA test was performed to assess the significance of weekday and weekend alcohol consumption, and their interaction, on the final grade. The results are summarized in the table below.

model_q2 <- aov(G3 ~ Dalc * Walc, data = por_df)

anova_kable_q2 <- function(aov_model, caption = "Two-Way ANOVA Summary") {
  aov_summary <- summary(aov_model)[[1]]
  
  df <- data.frame(
    Term = rownames(aov_summary),
    Df = aov_summary$Df,
    `Sum Sq` = signif(aov_summary$`Sum Sq`, 5),
    `Mean Sq` = signif(aov_summary$`Mean Sq`, 4),
    `F value` = signif(aov_summary$`F value`, 4),
    `Pr(>F)` = formatC(aov_summary$`Pr(>F)`, format = "f", digits = 4),
    stringsAsFactors = FALSE
  )
  
  p_values <- aov_summary$`Pr(>F)`
  df$Signif <- cut(p_values,
                   breaks = c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
                   labels = c("***", "**", "*", ".", ""),
                   right = TRUE)
  df$Signif[is.na(df$Signif)] <- ""

  knitr::kable(df, align = "c", caption = caption)
}

#ANOVA table for the Q2 model
anova_kable_q2(model_q2, caption = "ANOVA for Alcohol Consumption's Effect on Final Grades")

ANOVA for Alcohol Consumption’s Effect on Final Grades
Term	Df	Sum.Sq	Mean.Sq	F.value	Pr..F.	Signif
Dalc	4	327.51	81.880	8.282	0.0000	***
Walc	4	40.11	10.030	1.014	0.3992
Dalc:Walc	14	206.78	14.770	1.494	0.1077
Residuals	626	6188.90	9.886	NA	NA

Interpretation

The results from the two-way ANOVA indicate that weekday alcohol consumption (Dalc) has a statistically significant main effect on students’ final grades (G3), with a p-value of 0.0000. This suggests that higher levels of weekday drinking are significantly associated with differences in academic performance.

In contrast, weekend alcohol consumption (Walc) does not have a statistically significant effect on grades, with a p-value of 0.3992, which is well above the conventional 0.05 threshold. Therefore, weekend drinking alone does not show a significant association with final grades.

The interaction term Dalc:Walc has a p-value of 0.1077, which is also not statistically significant. This indicates that the effect of weekday alcohol consumption on grades does not significantly differ across levels of weekend alcohol consumption, and vice versa.

Visualization and Communication (Model Checking)

To better understand the relationship between alcohol consumption and academic performance, an interaction plot is generated. This plot helps to visualize the main effects of Dalc and Walc and the lack of a significant interaction between them:

interaction.plot(x.factor = por_df$Dalc, 
                 trace.factor = por_df$Walc, 
                 response = por_df$G3,
                 fun = mean,
                 type = "b",
                 xlab = "Weekday Alcohol Consumption (Dalc)",
                 ylab = "Mean Final Grade (G3)",
                 col = c("blue", "red", "green", "purple", "orange"),
                 pch = c(19, 17, 15, 12, 8),
                 trace.label = "Weekend Alcohol Consumption (Walc)",
                 main = "Interaction Plot of Alcohol Consumption and Final Grades")

The interaction plot displays mean final grades by levels of weekday and weekend alcohol consumption. For all levels of weekend alcohol use, there is a general downward trend in academic performance as weekday alcohol use increases. The lines representing different levels of weekend alcohol use are mostly parallel, suggesting no strong interaction between weekday and weekend drinking. One exception is the sharp increase for Walc = 1 at Dalc = 3, which may indicate variability or a small sample at that point. Overall, the plot suggests that weekday drinking is more consistently associated with lower grades than weekend drinking, and that the two do not strongly interact.

To check the assumptions of the ANOVA model, we analyze the residuals:

Normality of Residuals

resids_q2 <- residuals(model_q2)

# Histogram of residuals
hist(resids_q2, 
     main = "Histogram of Residuals for Q2 Model", 
     xlab = "Residuals",
     col = "lightblue",
     border = "black")

qqplott(resids_q2)

The histogram of the residuals shows a distribution that is approximately normal as it is roughly symmetric and bell-shaped, centered around at 0. The Q–Q plot supports this conclusion for the most part, as the residuals closely follow the theoretical quantile line in the center of the distribution. However, there is some noticeable deviation in the lower tail, suggesting potential skewness or the presence of outliers. Despite these tail deviations, the residuals largely conform to the normality assumption required for ANOVA. Overall, the assumption of normality appears reasonably satisfied.

Equal Variances Across Fitted Values

residplot_fitval(model_q2)

While the Shapiro-Wilk test returned a highly significant p-value (\(p \approx 4.94 \times 10^{-15}\)), indicating a deviation from normality in the residuals, visual assessments such as the histogram and Q–Q plot suggest that the deviation is relatively minor and occurs primarily in the tails.

Additionally, the residuals appear to have approximately constant variance across the range of fitted values, with no clear funneling or systematic patterns—supporting the assumption of homoscedasticity. The presence of vertically aligned points is expected in an ANOVA model due to discrete factor levels, and the vertical spread within those columns is reasonably consistent.

Given the large sample size, ANOVA is generally robust to mild departures from normality. Therefore, despite the significant p-value from the Shapiro-Wilk test, there is no strong evidence of a meaningful violation of model assumptions. The assumptions of normality and homoscedasticity are considered reasonably satisfied, supporting the trustworthiness of the ANOVA results.

Conclusion

Our two-way ANOVA analysis examined the relationship between Portuguese high school students’ alcohol consumption and their final grades in Portuguese class. The results revealed a significant negative association between weekday alcohol consumption and final grades, suggesting that higher levels of drinking during the school week are associated with lower academic performance.

In contrast, weekend alcohol consumption did not show a statistically significant effect on grades. The interaction between weekday and weekend alcohol use was not significant, indicating that the effect of weekday drinking on academic outcomes does not vary meaningfully across levels of weekend drinking.

These findings suggest that weekday drinking may be disruptive to students’ academic success, possibly due to its interference with school responsibilities, sleep quality, and cognitive performance. While the study does not establish causality, the strong association points to a behavioral pattern that may warrant intervention. Future research or policy efforts may consider addressing weekday alcohol consumption to support better academic outcomes.

Research Question 3

Modeling Process

When looking at Research Question 3, it was evident that I would be comparing the average amount of free time between two groups. Those groups being students in relationships versus students who are not in relationships. The first thing I did was create side by side density histograms showing the distributions of average free time between the two groups. I made sure it was a density histogram because there were more students not in relationships (412) than those in relationships (239). This ensured that the distributions could be compared 1 to 1 without being affected by unequal weighting. The next plot I created was an overlayed density plot of both distributions which allows the viewer to see the similarities and differences in distribution in both plots. I created these plots to get some preliminary insights on how the distributions looked before I used a t-test to see whether I could reject or fail to reject the null hypothesis that there is no significant difference in average free time between students in a romantic relationship and those who are not.

ggplot(por_df, aes(x = freetime)) +
  geom_histogram(aes(y = ..density.., fill = as.factor(romantic)), 
                 alpha = 0.7, binwidth = 1, color = "black") +
  facet_wrap(~ romantic) +
  labs(title = "Density Plot of Freetime Based on Relationship Status", 
       x = "Freetime", 
       y = "Density",
       fill = "In a Relationship") +
  theme_minimal()

## Warning: The dot-dot notation (`..density..`) was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(density)` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

ggplot(por_df, aes(x = freetime, fill = as.factor(romantic))) +
  geom_density(alpha = 0.4, bw = 0.35) +
  labs(title = "Density Estimate of Freetime (In Relationship vs Not in Relationship)",
       x = "Freetime",
       y = "Density",
       color = "Year")

Results

Summary of Welch’s t-test for Freetime by Romantic Status
Statistic	Value
Mean (No)	3.159
Mean (Yes)	3.218
Variance (No)	1.122
Variance (Yes)	1.079
t-statistic	-0.693
df	505.830
p-value	0.488
95% CI Lower	-0.226
95% CI Upper	0.108

We use a Welch’s two-sided t-test because the inputted data is independent, the sample sizes between the groups differ, and the variances are unequal. Based on the t-test, we get a t statistic of -0.67501, and a p value of 0.5. Because this is a two tailed test with a = 0.05, our alpha endpoints are approximately -1.96 and 1.96. The t statistic of -0.67501 falls between this interval, so we fail to reject the null hypothesis.

Interpretation

Based on the two-sided Welch’s t-test, we fail to reject the null hypothesis. This indicates that there is no significant difference in average free time between students in relationships and students not in relationships. This finding agrees with the density histograms and density plot because when looking at the models, the two groups appear to follow similar distributions. Running the t-test confirms the observation that there is no statistically significant difference in average free time.

Conclusion

The analysis for our third research question explored whether being in a romantic relationship affects the amount of average free time reported by Portuguese high school students. After comparing distributions and using Welch’s two sided t-test, the study found no statistically significant difference in free time between students who are in romantic relationships and those who are not (p = 0.488). Both the visualizations (density plots) and summary statistics (means and variances) supported this result, indicating similar distributions across groups.

Based on our findings, we can recommend that different services such as student support services and wellness programs can focus more broadly on time management strategies, balanced lifestyles, and stress management that apply to all students rather than tailoring efforts based on relationship status.

Although our findings give us actionable insights into average free time vs. relationship status, apparent limitations or bias can be the result of self-reported data and simplistic categorization, with our findings overlooking nuances such as quality, duration, or intensity of the relationship that could influence time availability.

To possibly improve or add to future insights and analysis, we could incorporate interviews or surveys that provide more detailed context around students’ relationships and time use, or utilize time-series data to help assess how changes in relationship status affect students’ behaviors and time availability across different periods.

Final Conclusion and Recommendations

Our project aimed to explore how various social, familial, and behavioral factors influence the academic performance and lifestyle of Portuguese high school students, using the Student Performance dataset compiled by Cortez and Silva (2008). Specifically, we examined the effects of parental education, alcohol consumption, and relationship status on students’ final grades in Portuguese and their reported free time. After our in depth analysis, we can address key insights, recommendations, limitations, and future directions.

Key Insights

Parental Education: We found strong evidence that a student’s father’s education level is positively associated with their final Portuguese grade, even after controlling for weekly study time and internet access. This suggests that higher parental education may provide academic advantages, likely through increased support, expectations, or access to educational resources.
Alcohol Consumption: Weekday alcohol consumption was significantly negatively associated with academic performance, whereas weekend alcohol consumption had no significant effect. This highlights the particular academic risk associated with drinking during school days, likely due to its impact on focus, sleep, and class participation.
Relationship Status and Free Time: No significant difference was found in average free time between students who are in romantic relationships and those who are not. This implies that relationship status alone may not meaningfully affect how much free time students perceive themselves to have.

Actionable Recommendations

Expand Academic Support: To help close the gap for students whose parents have lower education levels, schools should consider investing in academic mentoring, tutoring programs, and parent engagement initiatives that provide families with tools to support learning at home.
Target Weekday Alcohol Use: Educational campaigns should specifically address the risks of weekday alcohol consumption and its negative academic consequences. Collaborations with parents, counselors, and local health authorities could reinforce this messaging and offer preventive resources.
Universal Time Management Programs: Given that free time did not significantly vary with relationship status, time management resources, such as programs, apps, or advisory sessions, should be offered broadly to all students rather than tailored to specific demographic groups.

Limitations

While the findings offer meaningful insights, several limitations must be acknowledged:

The data is mostly observational, limiting our ability to imply causality.
Many variables, such as free time, alcohol use, and relationship status, rely on self-reported data, which may introduce bias or measurement error.
The dataset represents students from two Portuguese schools during a specific time period, which is a relative generalization to other regions or educational systems.

Future Directions

Qualitative research might provide deeper context, especially around family support structures and relationship dynamics.
Future analyses could also explore interaction between multiple variables, such as whether the impact of alcohol use varies by parental education or relationship status.
Expanding this type of analysis to other subjects or countries could validate the findings and help create more elaborate and valid student support strategies.

In summary, this project sheds light on several critical factors affecting student outcomes and well-being. While further research can be beneficial, the results provide a great foundation for insights aimed at improving academic performance and supporting student development in meaningful ways.

Understanding What Shapes Academic Outcomes in Portuguese High School Students

Modeling Student Performance Using Regression, ANOVA, and Hypothesis Testing

Jay Chang

2025-05-20

Abstract

Introduction

Research Question 1

Research Question 2

Research Question 3

Data Loading and Processing

Research Question 1

Modeling Process

Results

Interpretation

Visualization and Communication (Model Checking)

Normality of Residuals

Equal Variances Across Fitted Values

Conclusion and Recommendation

Research Question 2

Modeling Process

Results

Interpretation

Visualization and Communication (Model Checking)

Normality of Residuals

Equal Variances Across Fitted Values

Conclusion

Research Question 3

Modeling Process

Results

Interpretation

Conclusion

Final Conclusion and Recommendations

Key Insights

Actionable Recommendations

Limitations

Future Directions