1 - Introduction

The Exame Nacional do Ensino Médio (ENEM) is one of Brazil’s most consequential large-scale educational assessments, taken annually by millions of students who have completed or are completing secondary education. Originally created in 1998 as a diagnostic tool to evaluate the quality of basic education, the ENEM has since evolved into a high-stakes exam that plays a central role in shaping educational trajectories across the country. Today, ENEM scores are used in multiple national programs, including public university admissions (via Sisu), private university scholarships (Prouni), and government-funded student loans (FIES). Because of its broad range of functions and massive participation, the ENEM is arguably the most influential assessment in contemporary Brazilian education.

1.1 - Structure of the ENEM

The ENEM is divided into five major components:

  • Languages, Codes, and Their Technologies;
  • Human Sciences;
  • Natural Sciences;
  • Mathematics and Their Technologies;
  • Essay.

The first four components consist of standardized multiple-choice tests, while the essay is a written argumentative text evaluated by trained raters. The essay portion holds substantial weight within the overall scoring system and often serves as a decisive factor in university admissions.

1.2 - Essay Competencies

The ENEM essay is scored on five competencies, each worth up to 200 points, for a maximum total of 1,000:

  1. C1 – Formal Written Portuguese
  2. C2 – Understanding & Developing the Topic
  3. C3 – Argumentation & Interpretation
  4. C4 – Cohesion & Coherence
  5. C5 – Social Intervention Proposal
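To make the scoring concrete, here is a minimal sketch of how the five competency scores combine into the essay total, assuming the total is simply their sum. The three essays below are made-up values for illustration, not rows from the microdata.

```r
# Hypothetical competency scores for three essays (illustrative values only)
toy <- data.frame(
  scoreC1 = c(160, 120, 200),
  scoreC2 = c(200, 120, 200),
  scoreC3 = c(120,  80, 160),
  scoreC4 = c(160, 120, 200),
  scoreC5 = c(120,  40, 160)
)

# Assumed rule: essay total = sum of the five competencies (at most 5 * 200 = 1000)
toy$essay_score <- rowSums(toy[, paste0("scoreC", 1:5)])
toy$essay_score
```

The same row sum could be compared against the `essay_score` column of the real data as a consistency check.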

1.3 - Research Questions

This project addresses three questions:

  • Do ENEM essay scores vary across Brazil’s regions?
  • How do regions differ across the five competencies?
  • What can regional patterns tell us about broader educational inequalities?

1.4 - Data Preparation

library(tidyverse)
library(geobr)
library(sf)

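# Load the results file; the first row of the CSV holds the column names,
# so read it as a header instead of dropping it afterwards
RESULTADOS_2024 <- read.table("~/Downloads/microdados_enem_2024/DADOS/RESULTADOS_2024.csv",
                              header = TRUE, sep = ";", quote = "\"")
enem <- RESULTADOS_2024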

names(enem) <- c(
  "ID","year","school","countyID","county","stateID","state","dpadm",
  "locationID","sfe","countytestID","countytest","statetestID","statetest",
  "attCN","attCH","attLC","attMT","testIDcn","testIDch","testIDlc","testIDmt",
  "scoreCN","scoreCH","scoreLC","scoreMT","answersCN","answersCH","answersLC",
  "answersMT","TP_LINGUA","rubricCN","rubricCH","rubricLC","rubricMT",
  "status_essay","scoreC1","scoreC2","scoreC3","scoreC4","scoreC5","essay_score"
)

# Assign regions
enem_reg <- enem %>%
  mutate(region = case_when(
    state %in% c("AC","AP","AM","PA","RO","RR","TO") ~ "North",
    state %in% c("AL","BA","CE","MA","PB","PE","PI","RN","SE") ~ "Northeast",
    state %in% c("DF","GO","MT","MS") ~ "Central-West",
    state %in% c("ES","MG","RJ","SP") ~ "Southeast",
    state %in% c("PR","RS","SC") ~ "South",
    TRUE ~ NA_character_
  ))

# Select relevant columns
enem_sel <- enem_reg %>%
  select(region, state, scoreC1:scoreC5, essay_score) %>%
  mutate(across(scoreC1:essay_score, as.numeric))
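Because every later result depends on the state-to-region assignment, it may be worth checking that the five groups cover each federative unit exactly once. The sketch below restates the groups from the `case_when()` call as a named list; the names `regions`, `all_states`, and `state_to_region` are introduced here for illustration only.

```r
# State abbreviations grouped by region, copied from the case_when() call
regions <- list(
  "North"        = c("AC", "AP", "AM", "PA", "RO", "RR", "TO"),
  "Northeast"    = c("AL", "BA", "CE", "MA", "PB", "PE", "PI", "RN", "SE"),
  "Central-West" = c("DF", "GO", "MT", "MS"),
  "Southeast"    = c("ES", "MG", "RJ", "SP"),
  "South"        = c("PR", "RS", "SC")
)

all_states <- unlist(regions, use.names = FALSE)

# Brazil has 27 federative units (26 states plus the Federal District)
length(all_states)         # number of abbreviations listed
anyDuplicated(all_states)  # 0 means no state appears in two regions

# Lookup vector: state abbreviation -> region name
state_to_region <- setNames(rep(names(regions), lengths(regions)), all_states)
state_to_region[c("SP", "BA", "AM")]
```

A lookup vector like this is an alternative to `case_when()`: `state_to_region[state]` returns `NA` for any code outside the list, which matches the `TRUE ~ NA_character_` fallback.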

2 - Visualizations and Analysis

2.1 - Mean Essay Score by Region (Table)

mean_scores <- enem_sel %>%
  drop_na(region) %>%
  group_by(region) %>%
  summarise(across(scoreC1:essay_score, ~ mean(.x, na.rm = TRUE)))

mean_scores

2.2 - Map: Mean Essay Scores by Region

regions_br <- read_region(year = 2020)

region_mapping <- c(
  "North"="Norte","Northeast"="Nordeste","Central-West"="Centro Oeste",
  "Southeast"="Sudeste","South"="Sul"
)

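# Translate the English region labels to the names used by geobr, then join;
# unname() keeps the lookup result a plain character column
map_data <- regions_br %>%
  left_join(mean_scores %>% mutate(region_geobr = unname(region_mapping[region])),
            by = c("name_region" = "region_geobr"))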

centroids <- st_centroid(map_data)

ggplot(map_data) +
  geom_sf(aes(fill = essay_score), color = "black") +
  geom_sf_text(data = centroids, aes(label = round(essay_score,1)), size = 4) +
  scale_fill_viridis_c() +
  labs(title = "ENEM Mean Essay Score by Region",
       fill = "Mean Essay Score") +
  theme_minimal()

The map displays pronounced regional differences in essay performance. The Southeast leads with the highest average score (≈ 651.5), followed by the South and Central-West. In contrast, the North (≈ 572.3) and Northeast (≈ 594.7) score substantially lower, revealing a strong north–south educational divide. The pattern mirrors socioeconomic and infrastructural disparities historically documented in Brazilian education.
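The size of the divide can be read directly off the three means quoted above. The following sketch does only that arithmetic; the values are the approximate figures from the text, and expressing the gap as a share of the 1,000-point scale is an illustrative choice, not part of the original analysis.

```r
# Regional mean essay scores quoted in the text (approximate values)
means <- c(Southeast = 651.5, Northeast = 594.7, North = 572.3)

# Gap between the Southeast and each of the other two regions, in points
gaps <- means[["Southeast"]] - means[c("Northeast", "North")]
round(gaps, 1)

# The same gaps as a percentage of the 1,000-point essay scale
round(100 * gaps / 1000, 1)
```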

2.3 - Bar Plot: Mean Competency Scores by Region

# Drop rows without a region so the plots do not show an "NA" group
enem_long <- enem_sel %>%
  drop_na(region) %>%
  pivot_longer(scoreC1:scoreC5, names_to = "competency", values_to = "score")

ggplot(enem_long, aes(region, score, fill = competency)) +
  stat_summary(fun = "mean", geom = "bar", position = "dodge") +
  labs(title = "Mean of Essay Competencies by Region",
       x = "Region", y = "Mean Score") +
  theme_minimal()

The bar plot shows the same regional ranking on every competency: the Southeast scores highest and the North lowest. Within each region, C2 (topic development) is the highest-scoring competency, while C5 (proposal for intervention) tends to be the lowest, particularly in northern regions. The similarity in patterns suggests that regions differ in overall proficiency rather than in the structure of their writing strengths and weaknesses.

2.4 - Line Plot: Regional Profiles Across Competencies

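# Mean score per region and competency; .groups = "drop" returns an ungrouped table
line_data <- enem_long %>%
  group_by(region, competency) %>%
  summarise(mean_score = mean(score, na.rm = TRUE), .groups = "drop")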

ggplot(line_data, aes(competency, mean_score, group = region, color = region)) +
  geom_line(linewidth = 1.2) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_point(size = 2.5) +
  labs(title = "Mean Scores of ENEM Essay Competencies by Region",
       x = "Competency", y = "Mean Score") +
  theme_minimal()

All regions show a similar shape: a strong peak at C2 and a decline toward C5. The Southeast has the highest mean on every competency, whereas the North has the lowest, especially in C3 and C5. The nearly parallel lines suggest that the regional disparities are systemic rather than tied to particular writing skills.
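One way to put a number on "nearly parallel" is to subtract each region's own average from its competency means and compare what remains. The sketch below illustrates the idea on made-up means for two hypothetical regions; `toy_means` and the region labels are assumptions for illustration, not ENEM results.

```r
# Hypothetical mean scores (illustrative only, not the ENEM results):
# rows are regions, columns are competencies
toy_means <- rbind(
  RegionA = c(C1 = 130, C2 = 150, C3 = 125, C4 = 135, C5 = 110),
  RegionB = c(C1 = 115, C2 = 135, C3 = 110, C4 = 120, C5 = 95)
)

# Subtract each region's own average so only the profile shape remains
centered <- sweep(toy_means, 1, rowMeans(toy_means))
centered

# Largest between-region difference in the centered profiles;
# a value near 0 means the two lines are parallel
max(abs(centered["RegionA", ] - centered["RegionB", ]))
```

With the real data, the same steps could be applied to a region-by-competency matrix built from `line_data`. Larger differences at C3 or C5 would point to skill-specific gaps on top of the overall level difference.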

3 - Conclusion

This quantitative analysis shows clear and consistent regional disparities in ENEM 2024 essay scores. Regions differ in overall writing proficiency rather than in specific competencies: all regions follow the same performance pattern, with C2 highest and C5 lowest. The sharp contrast between the Southeast and South on one side and the North and Northeast on the other mirrors historical socioeconomic inequalities, making the ENEM essay a reflection of broader structural disparities in Brazilian education.

4 - References

Instituto Nacional de Estudos e Pesquisas Educacionais Anísio Teixeira (INEP). (n.d.). Matriz de referência do ENEM. INEP. https://download.inep.gov.br/download/enem/matriz_referencia.pdf

Instituto Nacional de Estudos e Pesquisas Educacionais Anísio Teixeira (INEP). (2025). Microdados do ENEM 2024 [Dataset]. Portal Brasileiro de Dados Abertos. https://dados.gov.br/dados/conjuntos-dados/inep-microdados-do-enem

Instituto Nacional de Estudos e Pesquisas Educacionais Anísio Teixeira (INEP). (2025). A redação no ENEM 2025: Cartilha do(a) participante [PDF]. Ministério da Educação. https://download.inep.gov.br/publicacoes/institucionais/avaliacoes_e_exames_da_educacao_basica/a_redacao_no_enem_2025_cartilha_do_participante.pdf

Pereira, R. H. M., & Gonçalves, C. N. (2019). geobr: An R package to easily access shapefiles of the Brazilian Institute of Geography and Statistics (R package). GitHub repository.