Executive Summary

This document analyzes the factors that explain the success of the PLC System (Prueba de Lectura Competente) as the best correlational estimator of Saber 11 and PISA results in Colombia.

Objectives:

- Identify which PLC skills (Concordar, Relievar, Inferir, Construir) best predict performance on Saber 11
- Determine which qualitative ISI instruments best predict both PLC and Saber 11
- Establish a reproducible analytical framework that can be updated annually

Key findings: (to be filled in automatically from the results)

1 Introduction

1.1 Context

The Fundación Social Alberto Merani has established a longitudinal assessment system (PLC) that is administered three times a year (Entrada at the start, Control at mid-year, Salida at the end) at 87 affiliated educational institutions.

This system has become established as the best predictor of:
- Saber 11 results (the Colombian national exam)
- Projected PISA results (an international assessment)

1.2 Conceptual Framework: CRIC Model

The PLC assesses reading competencies through the CRIC model:

- Concordar: ability to identify explicit information
- Relievar: ability to identify the relevant information in the text
- Inferir: construction of implicit meanings
- Construir: development of complex interpretations and critical judgments

In addition, the Institutional Systematicity Index (Índice de Sistematicidad Institucional, ISI) evaluates 6 qualitative dimensions of implementation:

- Institutional appropriation (Apropiación)
- Affective engagement (Vinculación Afectiva)
- Correct implementation (Implementación Correcta)
- Curricular integration (Integración Curricular)
- Monitoring and feedback (Seguimiento y Retroalimentación)
- Permanence and continuous improvement (Permanencia y Mejora Continua)

2 Methodology

2.1 Data

# Load libraries
library(tidyverse)      # Data manipulation
library(readxl)         # Read Excel files
library(corrplot)       # Correlation matrices
library(ggplot2)        # Visualizations
library(psych)          # Psychometric analysis
library(car)            # Regression diagnostics
library(relaimpo)       # Relative importance of predictors
library(knitr)          # Tables
library(kableExtra)     # Enhanced tables
library(GGally)         # Scatterplot matrices
library(gridExtra)      # Plot layouts
library(Hmisc)          # Correlations with p-values
library(lavaan)         # Structural equation models

# Load data
datos_raw <- read_excel("~/Desktop/FSAM/Evaluación/Calendario_A/Correlación_PLC_Saber_2025.xlsx")

# Clean the column names: line breaks inside the Excel headers
# make later references to those columns fail
colnames(datos_raw) <- gsub("\n", " ", colnames(datos_raw))
# Collapse repeated whitespace into a single space
colnames(datos_raw) <- gsub("\\s+", " ", colnames(datos_raw))

# Show the structure
glimpse(datos_raw)

## Rows: 190
## Columns: 22
## $ Codigo                                              <dbl> 61077, 131813, 719…
## $ PLC                                                 <chr> "Entrada", "Entrad…
## $ IE                                                  <chr> "Institución Educa…
## $ Concordar                                           <dbl> 48.31429, 54.05000…
## $ Relievar                                            <dbl> 44.47619, 51.42500…
## $ Inferir                                             <dbl> 46.34286, 62.97500…
## $ Construir                                           <dbl> 42.81905, 43.80000…
## $ Global                                              <dbl> 46.27619, 54.45000…
## $ LC                                                  <dbl> 61, 64, 60, 62, 62…
## $ Math                                                <dbl> 63, 59, 55, 60, 61…
## $ CN                                                  <dbl> 58, 56, 56, 57, 61…
## $ CS                                                  <dbl> 56, 56, 54, 47, 59…
## $ I                                                   <dbl> 61, 62, 61, 62, 69…
## $ Prom_ICFES                                          <dbl> 59.61538, 59.00000…
## $ `ISI 2025`                                          <dbl> 0.6, 0.6, 0.6, 0.3…
## $ `Categoría 1: Apropiación 2025`                     <dbl> 5, 6, 7, 4, 4, 5, …
## $ `Categoría 2: Vinculación Afectiva 2025`            <dbl> 3, 7, 6, 5, 4, 7, …
## $ `Categoría 3: Implementación Correcta 2025`         <dbl> 5, 6, 7, 7, 5, 6, …
## $ `Categoría 4: Integración Curricular 2025`          <dbl> 3, 5, 6, 5, 2, 4, …
## $ `Categoría 5: Seguimiento y Retroalimentación 2025` <dbl> 6, 5, 4, 7, 4, 5, …
## $ `Categoría 6: Permanencia y Mejora Continua 2025`   <dbl> 4, 6, 7, 8, 4, 5, …
## $ `Total 2025`                                        <dbl> 26, 35, 37, 36, 23…
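
The header cleanup applied when loading the data can be checked in isolation. The sketch below uses made-up header strings (assumptions, not the real spreadsheet headers) and a hypothetical helper, `limpiar_nombres()`, that collapses every whitespace run into a single space. Since `\\s` also matches line breaks, a single pattern covers both cases.

```r
# Hypothetical header strings, imitating Excel cells with wrapped text
nombres_crudos <- c("Prom\nICFES", "ISI   2025", " Total\n2025 ")

# Hypothetical helper: collapse any whitespace run (newlines included)
# into one space and trim leading/trailing spaces
limpiar_nombres <- function(x) {
  trimws(gsub("\\s+", " ", x))
}

nombres_limpios <- limpiar_nombres(nombres_crudos)
print(nombres_limpios)
```

Trimming the ends is an addition to what the loading step does; it only matters if a header starts or ends with a space.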

2.2 Analysis Strategy

The data contain:
- 3 PLC evaluations per institution (Entrada, Control, Salida)
- 1 Saber 11 evaluation per institution (administered in the second semester)

We therefore carry out a complete, comparative analysis:

2.2.1 Approach A: Analysis by Evaluation

Compare the predictive power of each PLC evaluation (Entrada, Control, Salida) for Saber 11.

2.2.2 Approach B: Longitudinal Analysis

Analyze the institutions that have all 3 evaluations to identify growth patterns.

# Solución a conflictos de namespace (MASS::select vs dplyr::select)
select <- dplyr::select

# Preparar datasets por evaluación (mantener columna PLC para referencia)
datos_entrada <- datos_raw %>%
  filter(PLC == "Entrada")

datos_control <- datos_raw %>%
  filter(PLC == "Control")

datos_salida <- datos_raw %>%
  filter(PLC == "Salida")

# Instituciones con las 3 evaluaciones completas (para análisis longitudinal)
instituciones_completas <- datos_raw %>%
  group_by(IE) %>%
  filter(n() == 3) %>%
  ungroup() %>%
  dplyr::select(IE) %>%
  distinct() %>%
  pull(IE)

datos_longitudinal <- datos_raw %>%
  filter(IE %in% instituciones_completas) %>%
  arrange(IE, PLC)

# Resumen de muestras
cat(paste0(rep("=", 80), collapse=""), "\n")

## ================================================================================

cat("MUESTRAS PARA ANÁLISIS:\n")

## MUESTRAS PARA ANÁLISIS:

cat(paste0(rep("=", 80), collapse=""), "\n\n")

## ================================================================================

cat("Evaluación ENTRADA:", nrow(datos_entrada), "instituciones\n")

## Evaluación ENTRADA: 68 instituciones

cat("Evaluación CONTROL:", nrow(datos_control), "instituciones\n")

## Evaluación CONTROL: 59 instituciones

cat("Evaluación SALIDA: ", nrow(datos_salida), "instituciones\n")

## Evaluación SALIDA:  63 instituciones

cat("Instituciones COMPLETAS (3 eval):", length(instituciones_completas), "\n\n")

## Instituciones COMPLETAS (3 eval): 56

cat("Variables PLC:", paste(c("Concordar", "Relievar", "Inferir", "Construir", "Global"), collapse = ", "), "\n")

## Variables PLC: Concordar, Relievar, Inferir, Construir, Global

cat("Variables Saber 11:", paste(c("LC", "Math", "CN", "CS", "I", "Prom_ICFES"), collapse = ", "), "\n")

## Variables Saber 11: LC, Math, CN, CS, I, Prom_ICFES
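
Counting rows per institution treats any three records as a complete set. A slightly stricter check, sketched here on invented data, requires the three distinct evaluation labels. The data frame `toy` and its institution codes are assumptions for illustration only.

```r
library(dplyr)

# Invented example: A is complete, B lacks Control, C has Entrada recorded twice
toy <- data.frame(
  IE  = c("A", "A", "A", "B", "B", "C", "C", "C"),
  PLC = c("Entrada", "Control", "Salida", "Entrada", "Salida",
          "Entrada", "Entrada", "Salida"),
  stringsAsFactors = FALSE
)

# Row count only: accepts C even though its Control evaluation is missing
completas_n <- toy %>%
  group_by(IE) %>%
  filter(n() == 3) %>%
  ungroup() %>%
  distinct(IE) %>%
  pull(IE)

# Distinct labels: requires Entrada, Control and Salida to be present
completas_distintas <- toy %>%
  group_by(IE) %>%
  filter(n_distinct(PLC) == 3) %>%
  ungroup() %>%
  distinct(IE) %>%
  pull(IE)

print(completas_n)
print(completas_distintas)
```

If every institution has at most one row per evaluation, both filters select the same institutions.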

3 Descriptive Analysis

3.1 Statistics of the PLC Variables by Evaluation

# Solución a conflictos de namespace
select <- dplyr::select

# Variables PLC
vars_plc <- c("Concordar", "Relievar", "Inferir", "Construir", "Global")

# Función para calcular estadísticas (evitando problemas con psych::describe)
calc_stats <- function(df, eval_nombre) {
  df %>%
    select(all_of(vars_plc)) %>%
    summarise(across(everything(), list(mean = ~mean(., na.rm = TRUE), 
                                        sd = ~sd(., na.rm = TRUE)))) %>%
    pivot_longer(everything(), names_to = "var", values_to = "value") %>%
    separate(var, into = c("Componente", "stat"), sep = "_(?=[^_]+$)") %>%
    pivot_wider(names_from = stat, values_from = value) %>%
    mutate(across(where(is.numeric), ~round(., 2))) %>%
    mutate(Evaluacion = eval_nombre)
}

# Calcular para las 3 evaluaciones
stats_entrada <- calc_stats(datos_entrada, "Entrada")
stats_control <- calc_stats(datos_control, "Control")
stats_salida <- calc_stats(datos_salida, "Salida")

# Combinar
stats_plc_completo <- bind_rows(stats_entrada, stats_control, stats_salida) %>%
  pivot_wider(names_from = Evaluacion, values_from = c(mean, sd)) %>%
  select(Componente, 
         mean_Entrada, sd_Entrada,
         mean_Control, sd_Control,
         mean_Salida, sd_Salida)

stats_plc_completo %>%
  kable(caption = "Estadísticas Descriptivas - Componentes PLC por Evaluación (Escala 0-100)",
        col.names = c("Componente", 
                      "M", "DE", "M", "DE", "M", "DE"),
        digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = FALSE) %>%
  add_header_above(c(" " = 1, "Entrada" = 2, "Control" = 2, "Salida" = 2))

Estadísticas Descriptivas - Componentes PLC por Evaluación (Escala 0-100)
	Entrada		Control		Salida
Componente	M	DE	M	DE	M	DE
Concordar	60.54	9.97	52.51	7.83	66.26	6.76
Relievar	57.35	12.80	56.72	10.04	56.03	12.04
Inferir	62.40	10.29	54.46	9.78	29.82	4.29
Construir	51.32	11.13	42.45	8.61	63.92	12.26
Global	59.01	9.49	51.89	7.75	51.16	6.02

# Preparar datos para visualización
datos_plc_long <- bind_rows(
  datos_entrada %>% select(all_of(vars_plc)) %>% mutate(Evaluacion = "Entrada"),
  datos_control %>% select(all_of(vars_plc)) %>% mutate(Evaluacion = "Control"),
  datos_salida %>% select(all_of(vars_plc)) %>% mutate(Evaluacion = "Salida")
) %>%
  pivot_longer(cols = all_of(vars_plc), names_to = "Componente", values_to = "Puntaje") %>%
  mutate(Evaluacion = factor(Evaluacion, levels = c("Entrada", "Control", "Salida")))

# Boxplot comparativo por evaluación
ggplot(datos_plc_long, aes(x = Componente, y = Puntaje, fill = Evaluacion)) +
  geom_boxplot(alpha = 0.7, position = position_dodge(0.8)) +
  labs(
    title = "Distribución de Componentes PLC por Evaluación",
    subtitle = "Comparación Entrada vs Control vs Salida",
    x = "Componente CRIC",
    y = "Puntaje (Escala 0-100)",
    fill = "Evaluación"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    axis.text.x = element_text(angle = 45, hjust = 1)
  ) +
  scale_fill_brewer(palette = "Set2")

Figure: Distribution of PLC components by evaluation

3.2 Statistics of the Saber 11 Variables

# Variables Saber 11
vars_saber <- c("LC", "Math", "CN", "CS", "I", "Prom_ICFES")

# Calcular estadísticas para Saber 11 (usando datos de Salida como referencia)
stats_saber <- datos_salida %>%
  select(all_of(vars_saber)) %>%
  summarise(across(everything(), list(mean = ~mean(., na.rm = TRUE),
                                     sd = ~sd(., na.rm = TRUE),
                                     min = ~min(., na.rm = TRUE),
                                     max = ~max(., na.rm = TRUE)))) %>%
  pivot_longer(everything(), names_to = "var", values_to = "value") %>%
  separate(var, into = c("Area", "stat"), sep = "_(?=[^_]+$)") %>%
  pivot_wider(names_from = stat, values_from = value) %>%
  mutate(across(where(is.numeric), ~round(., 2)))

stats_saber %>%
  kable(caption = "Estadísticas Descriptivas - Resultados Saber 11",
        col.names = c("Área", "Media", "DE", "Mín", "Máx"),
        digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Estadísticas Descriptivas - Resultados Saber 11
Área	Media	DE	Mín	Máx
LC	64.03	3.59	56.00	74.00
Math	64.06	5.40	54.00	83.00
CN	60.90	4.34	53.00	73.00
CS	59.57	5.38	47.00	77.00
I	66.83	6.47	54.00	85.00
Prom_ICFES	62.50	4.55	53.77	77.38
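
Both summary tables split names such as `Prom_ICFES_mean` with the pattern `_(?=[^_]+$)`, which matches only the last underscore. A base R sketch of the same idea, using `strsplit()` with `perl = TRUE` rather than `tidyr::separate()`, shows why a plain `_` would not work for `Prom_ICFES`. The example names are illustrative.

```r
nombres <- c("LC_mean", "Prom_ICFES_mean", "Prom_ICFES_sd")

# Lookahead: an underscore followed only by non-underscore characters up to the end,
# so the split happens at the last underscore
ultimo_guion <- strsplit(nombres, "_(?=[^_]+$)", perl = TRUE)

# Plain underscore: also splits inside Prom_ICFES, giving three pieces
todos_guiones <- strsplit(nombres, "_", fixed = TRUE)

print(ultimo_guion[[2]])
print(lengths(todos_guiones))
```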

4 Correlation Analysis

4.1 PLC → Saber 11 Correlations (by Evaluation)

# Función para calcular matriz de correlaciones
calc_cor_matrix <- function(datos, eval_nombre) {
  datos_cor <- datos %>%
    select(all_of(vars_plc), all_of(vars_saber)) %>%
    drop_na()
  
  cor_matrix <- cor(datos_cor[, vars_plc], datos_cor[, vars_saber], 
                    use = "complete.obs")
  
  return(cor_matrix)
}

# Calcular para las 3 evaluaciones
cor_entrada <- calc_cor_matrix(datos_entrada, "Entrada")
cor_control <- calc_cor_matrix(datos_control, "Control")
cor_salida <- calc_cor_matrix(datos_salida, "Salida")

4.1.1 Entrada Evaluation

corrplot(cor_entrada, method = "color", type = "full",
         addCoef.col = "black", number.cex = 0.7,
         tl.col = "black", tl.srt = 45,
         col = colorRampPalette(c("#6D9EC1", "white", "#E46726"))(200),
         title = "Entrada: PLC → Saber 11",
         mar = c(0,0,2,0))

Figure: PLC-Saber correlations (Entrada)

4.1.2 Control Evaluation

corrplot(cor_control, method = "color", type = "full",
         addCoef.col = "black", number.cex = 0.7,
         tl.col = "black", tl.srt = 45,
         col = colorRampPalette(c("#6D9EC1", "white", "#E46726"))(200),
         title = "Control: PLC → Saber 11",
         mar = c(0,0,2,0))

Figure: PLC-Saber correlations (Control)

4.1.3 Salida Evaluation

corrplot(cor_salida, method = "color", type = "full",
         addCoef.col = "black", number.cex = 0.7,
         tl.col = "black", tl.srt = 45,
         col = colorRampPalette(c("#6D9EC1", "white", "#E46726"))(200),
         title = "Salida: PLC → Saber 11",
         mar = c(0,0,2,0))

Figure: PLC-Saber correlations (Salida)

4.2 Comparison of Predictive Power by Evaluation

# Extraer correlación PLC Global → Prom_ICFES para cada evaluación
comparacion_eval <- data.frame(
  Evaluacion = c("Entrada (inicio año)", "Control (mitad año)", "Salida (final año)"),
  r = c(
    cor_entrada["Global", "Prom_ICFES"],
    cor_control["Global", "Prom_ICFES"],
    cor_salida["Global", "Prom_ICFES"]
  )
) %>%
  mutate(
    R2 = r^2,
    Interpretacion = case_when(
      abs(r) < 0.3 ~ "Débil",
      abs(r) < 0.5 ~ "Moderada",
      abs(r) < 0.7 ~ "Fuerte",
      TRUE ~ "Muy fuerte"
    )
  )

comparacion_eval %>%
  kable(caption = "Comparación: Correlación PLC Global → Promedio ICFES",
        col.names = c("Evaluación", "r", "R²", "Magnitud"),
        digits = 3) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Comparación: Correlación PLC Global → Promedio ICFES
Evaluación	r	R²	Magnitud
Entrada (inicio año)	0.391	0.153	Moderada
Control (mitad año)	0.651	0.424	Fuerte
Salida (final año)	0.463	0.215	Moderada

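# Fix the x-axis order (Entrada, Control, Salida); a character column would be sorted alphabetically
ggplot(comparacion_eval, aes(x = factor(Evaluacion, levels = Evaluacion), y = r, group = 1)) +
  geom_line(linewidth = 1.2, color = "#E46726") +   # linewidth replaces the deprecated size for lines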
  geom_point(size = 4, color = "#E46726") +
  geom_text(aes(label = sprintf("r=%.3f", r)), vjust = -1) +
  ylim(0, 1) +
  labs(
    title = "Evolución del Poder Predictivo: PLC Global → Promedio ICFES",
    subtitle = "¿En qué momento del año PLC predice mejor Saber 11?",
    x = "Momento de Evaluación",
    y = "Correlación de Pearson (r)"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold"))

Figure: Evolution of predictive power
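
The three coefficients rest on modest samples, so their uncertainty matters when ranking the evaluations. The following sketch computes approximate 95% confidence intervals with the Fisher z transformation. The sample sizes are taken from the sample summary in the methodology (68, 59 and 63 institutions) and are an assumption here, since any rows removed by `drop_na()` would reduce them. `ic_fisher()` is a hypothetical helper, not part of the report's code.

```r
# Approximate confidence interval for a Pearson correlation via Fisher's z
ic_fisher <- function(r, n, nivel = 0.95) {
  z  <- atanh(r)                      # Fisher z transformation
  se <- 1 / sqrt(n - 3)               # approximate standard error of z
  q  <- qnorm(1 - (1 - nivel) / 2)    # normal quantile for the chosen level
  c(inferior = tanh(z - q * se), superior = tanh(z + q * se))
}

resumen <- data.frame(
  Evaluacion = c("Entrada", "Control", "Salida"),
  r = c(0.391, 0.651, 0.463),
  n = c(68, 59, 63)                   # assumed: no rows lost to missing values
)

# One interval per row, bound to the summary as two extra columns
ic <- t(mapply(ic_fisher, resumen$r, resumen$n))
resumen <- cbind(resumen, round(ic, 3))
print(resumen)
```

Where the intervals overlap, the ranking of the evaluations should be read with caution rather than as a firm ordering.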

5 Multiple Regression Models

5.1 Best Evaluation: Detailed Analysis

# Identificar evaluación con mejor correlación
mejor_eval_nombre <- comparacion_eval$Evaluacion[which.max(comparacion_eval$r)]
mejor_eval_data <- if(mejor_eval_nombre == "Entrada (inicio año)") {
  datos_entrada
} else if(mejor_eval_nombre == "Control (mitad año)") {
  datos_control
} else {
  datos_salida
}

cat("**Evaluación con mejor poder predictivo:**", mejor_eval_nombre, "\n")

## **Evaluación con mejor poder predictivo:** Control (mitad año)

cat("Correlación PLC Global → Prom_ICFES:", round(max(comparacion_eval$r), 3), "\n\n")

## Correlación PLC Global → Prom_ICFES: 0.651

5.1.1 Model 1: CRIC Components → ICFES Average

# Preparar datos
datos_modelo <- mejor_eval_data %>%
  select(Concordar, Relievar, Inferir, Construir, Prom_ICFES) %>%
  drop_na()

# Modelo de regresión múltiple
modelo_cric <- lm(Prom_ICFES ~ Concordar + Relievar + Inferir + Construir, 
                  data = datos_modelo)

# Resumen del modelo
summary(modelo_cric)

## 
## Call:
## lm(formula = Prom_ICFES ~ Concordar + Relievar + Inferir + Construir, 
##     data = datos_modelo)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.2044 -2.3938  0.3838  2.0348 11.1984 
## 
## Coefficients:
##             Estimate Std. Error t value            Pr(>|t|)    
## (Intercept) 43.26660    3.31540  13.050 <0.0000000000000002 ***
## Concordar    0.17005    0.09824   1.731              0.0892 .  
## Relievar    -0.01651    0.06032  -0.274              0.7854    
## Inferir      0.18415    0.07809   2.358              0.0220 *  
## Construir    0.03066    0.07251   0.423              0.6741    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.551 on 54 degrees of freedom
## Multiple R-squared:  0.4371, Adjusted R-squared:  0.3954 
## F-statistic: 10.48 on 4 and 54 DF,  p-value: 0.000002341

# Diagnósticos de multicolinealidad (VIF)
vif_values <- car::vif(modelo_cric)
cat("\n**Factores de Inflación de Varianza (VIF):**\n")

## 
## **Factores de Inflación de Varianza (VIF):**

print(vif_values)

## Concordar  Relievar   Inferir Construir 
##  2.722319  1.685902  2.682778  1.793137

cat("\n(VIF > 10 indica multicolinealidad problemática)\n")

## 
## (VIF > 10 indica multicolinealidad problemática)
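
The VIF of a predictor equals 1 / (1 - R²_j), where R²_j comes from regressing that predictor on the remaining ones. A minimal sketch on simulated data (the variables `x1` to `x3` are invented) reproduces this by hand, without `car::vif()`.

```r
set.seed(123)
n  <- 60
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n, sd = 0.7)   # deliberately correlated with x1
x3 <- rnorm(n)
X  <- data.frame(x1, x2, x3)

# For each predictor: regress it on the others and convert R^2 into a VIF
vif_manual <- sapply(names(X), function(v) {
  otros <- setdiff(names(X), v)
  r2_j  <- summary(lm(reformulate(otros, response = v), data = X))$r.squared
  1 / (1 - r2_j)
})

print(round(vif_manual, 3))
```

The same values appear on the diagonal of the inverse correlation matrix of the predictors, which gives a quick cross-check. Some texts use a stricter cutoff of 5 instead of 10; the values reported for the CRIC model fall below both.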

5.1.2 Relative Importance of Predictors

# Calcular importancia relativa usando lmg
importancia <- calc.relimp(modelo_cric, type = "lmg", rela = TRUE)

# Build the table; unname() drops the vector names so they do not become
# row names and show up as a duplicated first column
importancia_tabla <- data.frame(
  Componente = names(importancia$lmg),
  Prom_Importancia = unname(importancia$lmg) * 100
) %>%
  arrange(desc(Prom_Importancia))

importancia_tabla %>%
  kable(caption = "Importancia Relativa de Componentes CRIC para Predecir Promedio ICFES",
        col.names = c("Componente CRIC", "Importancia (%)"),
        digits = 1) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Importancia Relativa de Componentes CRIC para Predecir Promedio ICFES
Componente CRIC	Importancia (%)
Inferir	41.4
Concordar	34.3
Construir	14.8
Relievar	9.4

ggplot(importancia_tabla, aes(x = reorder(Componente, Prom_Importancia), 
                               y = Prom_Importancia)) +
  geom_col(fill = "#6D9EC1", alpha = 0.8) +
  geom_text(aes(label = sprintf("%.1f%%", Prom_Importancia)), 
            hjust = -0.2) +
  coord_flip() +
  ylim(0, max(importancia_tabla$Prom_Importancia) * 1.15) +
  labs(
    title = "¿Qué componente CRIC predice mejor Saber 11?",
    subtitle = paste("Análisis de Importancia Relativa -", mejor_eval_nombre),
    x = NULL,
    y = "Importancia Relativa (%)"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold"))

Figure: Relative contribution of CRIC components
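
The `lmg` metric averages each predictor's sequential R² contribution over all orderings of the predictors. The brute-force sketch below illustrates that definition on simulated data. It is not how `relaimpo` is implemented, and the helper names (`r2`, `permutaciones`) and the variables are invented for the example.

```r
set.seed(42)
n <- 80
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$x2 <- d$x2 + 0.5 * d$x1                       # make x1 and x2 correlated
d$y  <- 0.6 * d$x1 + 0.3 * d$x2 + rnorm(n)

# R^2 of the model with the given predictors (0 for the empty model)
r2 <- function(vars) {
  if (length(vars) == 0) return(0)
  summary(lm(reformulate(vars, response = "y"), data = d))$r.squared
}

# All orderings of a character vector, built recursively
permutaciones <- function(v) {
  if (length(v) <= 1) return(list(v))
  salida <- list()
  for (i in seq_along(v)) {
    for (p in permutaciones(v[-i])) salida[[length(salida) + 1]] <- c(v[i], p)
  }
  salida
}

predictores <- c("x1", "x2", "x3")
ordenes <- permutaciones(predictores)
lmg <- setNames(numeric(length(predictores)), predictores)

# For every ordering, credit each predictor with the R^2 gained when it enters
for (ord in ordenes) {
  for (k in seq_along(ord)) {
    previos <- ord[seq_len(k - 1)]
    lmg[ord[k]] <- lmg[ord[k]] + r2(c(previos, ord[k])) - r2(previos)
  }
}
lmg <- lmg / length(ordenes)

# Shares normalized to 100%, analogous to rela = TRUE
print(round(100 * lmg / sum(lmg), 1))
```

Before normalization the contributions add up to the R² of the full model, which is why the metric can be read as a decomposition of explained variance.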

6 Longitudinal Analysis

6.1 Does Growth in PLC Predict Saber 11?

# Crear dataset longitudinal pivotado
datos_long_wide <- datos_longitudinal %>%
  select(IE, PLC, Global, Prom_ICFES) %>%
  pivot_wider(names_from = PLC, values_from = c(Global, Prom_ICFES)) %>%
  mutate(
    Delta_PLC = Global_Salida - Global_Entrada,
    Prom_ICFES_unico = coalesce(Prom_ICFES_Entrada, Prom_ICFES_Control, Prom_ICFES_Salida)
  ) %>%
  drop_na(Delta_PLC, Prom_ICFES_unico)

# Correlaciones longitudinales
cor_delta_saber <- cor.test(datos_long_wide$Delta_PLC, 
                             datos_long_wide$Prom_ICFES_unico)

cor_entrada_saber <- cor.test(datos_long_wide$Global_Entrada, 
                               datos_long_wide$Prom_ICFES_unico)

cor_salida_saber <- cor.test(datos_long_wide$Global_Salida, 
                              datos_long_wide$Prom_ICFES_unico)

# Tabla resumen
tabla_long <- data.frame(
  Predictor = c("PLC Entrada", "PLC Salida", "Δ PLC (Entrada→Salida)"),
  r = c(cor_entrada_saber$estimate, cor_salida_saber$estimate, cor_delta_saber$estimate),
  p = c(cor_entrada_saber$p.value, cor_salida_saber$p.value, cor_delta_saber$p.value)
) %>%
  mutate(
    R2 = r^2,
    Sig = ifelse(p < 0.001, "***", ifelse(p < 0.01, "**", ifelse(p < 0.05, "*", "ns")))
  )

tabla_long %>%
  kable(caption = "Análisis Longitudinal: ¿Qué Predice Mejor Saber 11?",
        col.names = c("Predictor", "r", "p-valor", "R²", "Sig"),
        digits = 3) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Análisis Longitudinal: ¿Qué Predice Mejor Saber 11?
Predictor	r	p-valor	R²	Sig
PLC Entrada	0.470	0.000	0.221	***
PLC Salida	0.441	0.001	0.195	***
Δ PLC (Entrada→Salida)	-0.174	0.200	0.030	ns

ggplot(datos_long_wide, aes(x = Delta_PLC, y = Prom_ICFES_unico)) +
  geom_point(alpha = 0.6, size = 3, color = "#6D9EC1") +
  geom_smooth(method = "lm", se = TRUE, color = "#E46726", fill = "#E46726", alpha = 0.2) +
  labs(
    title = "¿El Crecimiento en PLC Predice Saber 11?",
    subtitle = sprintf("Correlación Δ PLC → Saber: r=%.3f (p=%.3f)", 
                      cor_delta_saber$estimate, cor_delta_saber$p.value),
    x = "Δ PLC (Salida - Entrada)",
    y = "Promedio Saber 11"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold"))

Figure: PLC growth vs Saber 11

7 ISI Analysis: Institutional Factors

7.1 ISI Correlations

# **CORRECCIÓN: Usar nombres de columnas corregidos**
# Los nombres ya fueron limpiados en el chunk 'cargar-datos'

# Variables ISI (después de limpiar nombres)
vars_isi <- c(
  "Categoría 1: Apropiación 2025",
  "Categoría 2: Vinculación Afectiva 2025",
  "Categoría 3: Implementación Correcta 2025",
  "Categoría 4: Integración Curricular 2025",
  "Categoría 5: Seguimiento y Retroalimentación 2025",
  "Categoría 6: Permanencia y Mejora Continua 2025"
)

# Verificar que las columnas existen
cat("Verificando columnas ISI:\n")

## Verificando columnas ISI:

for(var in vars_isi) {
  if(var %in% colnames(mejor_eval_data)) {
    cat("✓", var, "\n")
  } else {
    cat("✗", var, "NO ENCONTRADA\n")
  }
}

## ✓ Categoría 1: Apropiación 2025 
## ✓ Categoría 2: Vinculación Afectiva 2025 
## ✓ Categoría 3: Implementación Correcta 2025 
## ✓ Categoría 4: Integración Curricular 2025 
## ✓ Categoría 5: Seguimiento y Retroalimentación 2025 
## ✓ Categoría 6: Permanencia y Mejora Continua 2025

# Correlaciones ISI → PLC Global
datos_isi_plc <- mejor_eval_data %>%
  select(all_of(vars_isi), Global) %>%
  drop_na()

cor_isi_plc <- cor(datos_isi_plc[, vars_isi], datos_isi_plc$Global, 
                   use = "complete.obs")

cor_isi_plc_tabla <- data.frame(
  Categoria = gsub(" 2025", "", vars_isi),
  r = as.vector(cor_isi_plc)
) %>%
  arrange(desc(abs(r))) %>%
  mutate(
    Interpretacion = case_when(
      abs(r) < 0.3 ~ "Débil",
      abs(r) < 0.5 ~ "Moderada",
      abs(r) < 0.7 ~ "Fuerte",
      TRUE ~ "Muy fuerte"
    )
  )

cor_isi_plc_tabla %>%
  kable(caption = "Correlaciones: Categorías ISI → PLC Global",
        col.names = c("Categoría ISI", "r", "Magnitud"),
        digits = 3) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Correlaciones: Categorías ISI → PLC Global
Categoría ISI	r	Magnitud
Categoría 1: Apropiación	0.337	Moderada
Categoría 4: Integración Curricular	0.220	Débil
Categoría 6: Permanencia y Mejora Continua	0.215	Débil
Categoría 2: Vinculación Afectiva	0.202	Débil
Categoría 3: Implementación Correcta	-0.134	Débil
Categoría 5: Seguimiento y Retroalimentación	0.036	Débil

ggplot(cor_isi_plc_tabla, aes(x = reorder(Categoria, r), y = r)) +
  geom_col(fill = "#6D9EC1", alpha = 0.8) +
  geom_text(aes(label = sprintf("r=%.3f", r)), hjust = -0.2) +
  coord_flip() +
  ylim(min(cor_isi_plc_tabla$r) * 1.1, max(cor_isi_plc_tabla$r) * 1.1) +
  labs(
    title = "¿Qué Factores Institucionales Mejoran PLC?",
    subtitle = "Correlación Categorías ISI → PLC Global",
    x = NULL,
    y = "Correlación de Pearson (r)"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    axis.text.y = element_text(size = 9)
  )

Figure: Association of ISI categories with PLC

7.2 ISI → Saber 11

# Correlaciones ISI → Prom_ICFES
datos_isi_saber <- mejor_eval_data %>%
  select(all_of(vars_isi), Prom_ICFES) %>%
  drop_na()

cor_isi_saber <- cor(datos_isi_saber[, vars_isi], datos_isi_saber$Prom_ICFES, 
                     use = "complete.obs")

cor_isi_saber_tabla <- data.frame(
  Categoria = gsub(" 2025", "", vars_isi),
  r = as.vector(cor_isi_saber)
) %>%
  arrange(desc(abs(r))) %>%
  mutate(
    Interpretacion = case_when(
      abs(r) < 0.3 ~ "Débil",
      abs(r) < 0.5 ~ "Moderada",
      abs(r) < 0.7 ~ "Fuerte",
      TRUE ~ "Muy fuerte"
    )
  )

cor_isi_saber_tabla %>%
  kable(caption = "Correlaciones: Categorías ISI → Promedio ICFES",
        col.names = c("Categoría ISI", "r", "Magnitud"),
        digits = 3) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Correlaciones: Categorías ISI → Promedio ICFES
Categoría ISI	r	Magnitud
Categoría 1: Apropiación	0.226	Débil
Categoría 2: Vinculación Afectiva	0.149	Débil
Categoría 6: Permanencia y Mejora Continua	0.106	Débil
Categoría 4: Integración Curricular	0.088	Débil
Categoría 5: Seguimiento y Retroalimentación	0.077	Débil
Categoría 3: Implementación Correcta	-0.061	Débil
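
With six ISI categories tested against the same outcome, some coefficients may look notable by chance. One option, sketched here on simulated scores (the column names `cat1` to `cat6` and the outcome `saber` are invented), is to obtain a p-value per category with `cor.test()` and adjust them with Holm's method via `p.adjust()`.

```r
set.seed(7)
n <- 59

# Simulated category scores on a 1-8 scale; only cat1 is related to the outcome
isi <- as.data.frame(matrix(sample(1:8, n * 6, replace = TRUE), ncol = 6))
names(isi) <- paste0("cat", 1:6)
saber <- 60 + 0.8 * isi$cat1 + rnorm(n, sd = 4)

# One Pearson test per category
pruebas <- lapply(isi, function(x) cor.test(x, saber))

tabla <- data.frame(
  Categoria = names(isi),
  r = unname(sapply(pruebas, function(pr) unname(pr$estimate))),
  p = unname(sapply(pruebas, function(pr) pr$p.value))
)

# Holm adjustment controls the family-wise error rate across the six tests
tabla$p_holm <- p.adjust(tabla$p, method = "holm")
print(tabla[order(-abs(tabla$r)), ], digits = 3)
```

Reporting the adjusted p-values next to the magnitude labels would make it clearer which of the weak correlations can be distinguished from zero.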

8 Mediation Model: ISI → PLC → Saber

# Preparar datos para mediación
datos_mediacion <- mejor_eval_data %>%
  select(all_of(vars_isi), Global, Prom_ICFES) %>%
  drop_na() %>%
  mutate(ISI_Total = rowMeans(select(., all_of(vars_isi))))

# Modelo de mediación usando lavaan
modelo_sem <- '
  # Regresiones
  Global ~ a*ISI_Total
  Prom_ICFES ~ b*Global + c*ISI_Total
  
  # Efectos indirectos
  indirect := a*b
  total := c + a*b
'

# Ajustar modelo
fit_sem <- sem(modelo_sem, data = datos_mediacion)

# Resumen
summary(fit_sem, fit.measures = TRUE, standardized = TRUE)

## lavaan 0.6-21 ended normally after 1 iteration
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         5
## 
##   Number of observations                            59
## 
## Model Test User Model:
##                                                       
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
## 
## Model Test Baseline Model:
## 
##   Test statistic                                34.611
##   Degrees of freedom                                 3
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.000
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -359.512
##   Loglikelihood unrestricted model (H1)       -359.512
##                                                       
##   Akaike (AIC)                                 729.024
##   Bayesian (BIC)                               739.412
##   Sample-size adjusted Bayesian (SABIC)        723.688
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.000
##   P-value H_0: RMSEA <= 0.050                       NA
##   P-value H_0: RMSEA >= 0.080                       NA
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   Global ~                                                              
##     ISI_Total  (a)    0.846    0.587    1.440    0.150    0.846    0.184
##   Prom_ICFES ~                                                          
##     Global     (b)    0.384    0.059    6.476    0.000    0.384    0.651
##     ISI_Total  (c)    0.007    0.272    0.026    0.979    0.007    0.003
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .Global           56.969   10.489    5.431    0.000   56.969    0.966
##    .Prom_ICFES       11.806    2.174    5.431    0.000   11.806    0.576
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     indirect          0.325    0.231    1.406    0.160    0.325    0.120
##     total             0.332    0.350    0.949    0.343    0.332    0.123

# Extraer parámetros
params_sem <- parameterEstimates(fit_sem, standardized = TRUE) %>%
  filter(op == ":=") %>%
  select(label, est, se, pvalue, std.all)

params_sem %>%
  kable(caption = "Análisis de Mediación: ISI → PLC → Saber",
        col.names = c("Efecto", "Estimación", "SE", "p-valor", "β estandarizado"),
        digits = 3) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Análisis de Mediación: ISI → PLC → Saber
Efecto	Estimación	SE	p-valor	β estandarizado
indirect	0.325	0.231	0.160	0.120
total	0.332	0.350	0.343	0.123

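Interpretation:

- Indirect effect (a*b): the part of the ISI-Saber 11 association that runs through PLC. Here it is 0.325 (p = 0.160), so it is not statistically significant.
- Total effect (c + a*b): the sum of the direct and indirect effects. Here it is 0.332 (p = 0.343), also not significant.
- PLC would mediate the ISI-Saber relationship only if the indirect effect were significant. In these data the a path (ISI → PLC, p = 0.150) is not significant, so the results do not support mediation, even though PLC Global strongly predicts Prom_ICFES (b = 0.384, p < 0.001).
- The model is saturated (0 degrees of freedom), so CFI = 1.000 and RMSEA = 0.000 say nothing about how well it fits.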
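
Because the product a*b is generally not normally distributed, a percentile bootstrap is a common way to obtain an interval for the indirect effect. The sketch below uses two `lm()` fits on simulated data instead of `lavaan`. All variable names and effect sizes are invented for illustration, and `efecto_indirecto()` is a hypothetical helper.

```r
set.seed(2025)
n <- 59

# Simulated data: isi affects plc, and plc affects saber
d <- data.frame(isi = rnorm(n, mean = 5, sd = 1.2))
d$plc   <- 45 + 1.0 * d$isi + rnorm(n, sd = 7)
d$saber <- 42 + 0.4 * d$plc + rnorm(n, sd = 3.5)

# Indirect effect as the product of the two path coefficients
efecto_indirecto <- function(datos) {
  a <- coef(lm(plc ~ isi, data = datos))["isi"]           # path ISI -> PLC
  b <- coef(lm(saber ~ plc + isi, data = datos))["plc"]   # path PLC -> Saber, given ISI
  unname(a * b)
}

# Percentile bootstrap: resample rows with replacement and recompute a*b
B <- 1000
boot <- replicate(B, efecto_indirecto(d[sample(n, replace = TRUE), ]))

estimacion <- efecto_indirecto(d)
ic <- quantile(boot, c(0.025, 0.975))
print(round(c(indirecto = estimacion, ic), 3))
```

An interval that contains zero would lead to the same reading as a non-significant z-test. In `lavaan`, passing `se = "bootstrap"` to `sem()` requests bootstrap standard errors for the fitted model.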

9 Summary of Findings

9.1 Finding 1: All PLC Evaluations Predict Saber 11

cat("### **Comparación de Poder Predictivo:**\n\n")

9.1.1 Comparación de Poder Predictivo:

comparacion_eval %>%
  arrange(desc(r)) %>%   # rank by correlation strength instead of by row order
  mutate(Ranking = row_number()) %>%
  select(Ranking, Evaluacion, r, R2, Interpretacion) %>%
  kable(col.names = c("#", "Evaluación", "r", "R²", "Magnitud"), digits = 3) %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE) %>%
  print()

#	Evaluación	r	R²	Magnitud
1	Control (mitad año)	0.651	0.424	Fuerte
2	Salida (final año)	0.463	0.215	Moderada
3	Entrada (inicio año)	0.391	0.153	Moderada

cat("\n**Conclusiones clave:**\n\n")

Conclusiones clave:

if (comparacion_eval$r[comparacion_eval$Evaluacion == "Entrada (inicio año)"] > 0.4) {
  cat("✅ **PLC Entrada ya tiene valor predictivo significativo** → Permite intervenciones tempranas\n")
}
# Report where the correlation actually peaks instead of assuming it grows all year
idx_max <- which.max(comparacion_eval$r)
cat(sprintf("📈 **The PLC-Saber 11 correlation peaks at %s** (r=%.3f)\n",
            comparacion_eval$Evaluacion[idx_max], comparacion_eval$r[idx_max]))

📈 The PLC-Saber 11 correlation peaks at Control (mitad año) (r=0.651)

9.2 Finding 2: PLC Components with the Greatest Predictive Power

# Summary of relative importance (sep = "" avoids a stray space before the closing parenthesis)
cat("### **Ranking de Componentes PLC (Evaluación: ", mejor_eval_data$PLC[1], "):**\n\n", sep = "")

9.2.1 Ranking de Componentes PLC (Evaluación: Control):

cat("**Para Promedio ICFES:**\n\n")

Para Promedio ICFES:

importancia_tabla %>%
  arrange(desc(Prom_Importancia)) %>%
  mutate(Ranking = row_number()) %>%
  select(Ranking, Componente, Prom_Importancia) %>%
  kable(col.names = c("#", "Componente", "Importancia (%)"), digits = 1,
        row.names = FALSE) %>%   # do not print row names as an extra column
  kable_styling(bootstrap_options = "striped", full_width = FALSE) %>%
  print()

#	Componente	Importancia (%)
1	Inferir	41.4
2	Concordar	34.3
3	Construir	14.8
4	Relievar	9.4

9.3 Finding 3: Longitudinal Analysis

cat("### **¿El Crecimiento en PLC Predice Saber 11?**\n\n")

9.3.1 ¿El Crecimiento en PLC Predice Saber 11?

tabla_long %>%
  arrange(desc(abs(r))) %>%
  mutate(
    Ranking = row_number(),
    Resumen = sprintf("%s: r=%.3f (R²=%.1f%%) %s", 
                     Predictor, r, R2*100, Sig)
  ) %>%
  select(Ranking, Resumen) %>%
  kable(col.names = c("#", "Predictor → Promedio ICFES")) %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE) %>%
  print()

#	Predictor → Promedio ICFES
1	PLC Entrada: r=0.470 (R²=22.1%) ***
2	PLC Salida: r=0.441 (R²=19.5%) ***
3	Δ PLC (Entrada→Salida): r=-0.174 (R²=3.0%) ns

cat("\n**Conclusión:** El ", tabla_long$Predictor[which.max(abs(tabla_long$r))], 
    " es el mejor predictor.\n")

Conclusión: El PLC Entrada es el mejor predictor.

9.4 Finding 4: ISI Categories with the Strongest Association

cat("### **Ranking de Categorías ISI:**\n\n")

9.4.1 Ranking de Categorías ISI:

cat("**Impacto en PLC Global:**\n\n")

Impacto en PLC Global:

cor_isi_plc_tabla %>%
  arrange(desc(abs(r))) %>%   # rank by absolute correlation, as in the longitudinal table
  mutate(Ranking = row_number()) %>%
  select(Ranking, Categoria, r, Interpretacion) %>%
  kable(col.names = c("#", "ISI Category", "r", "Magnitude"), digits = 3) %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE) %>%
  print()

#	ISI Category	r	Magnitude
1	Category 1: Appropriation	0.337	Moderate
2	Category 4: Curricular Integration	0.220	Weak
3	Category 6: Permanence and Continuous Improvement	0.215	Weak
4	Category 2: Affective Engagement	0.202	Weak
5	Category 3: Correct Implementation	-0.134	Weak
6	Category 5: Monitoring and Feedback	0.036	Weak

cat("\n\n**Correlation with Saber 11:**\n\n")

Correlation with Saber 11:

cor_isi_saber_tabla %>%
  arrange(desc(abs(r))) %>%   # rank by absolute correlation, as in the longitudinal table
  mutate(Ranking = row_number()) %>%
  select(Ranking, Categoria, r, Interpretacion) %>%
  kable(col.names = c("#", "ISI Category", "r", "Magnitude"), digits = 3) %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE) %>%
  print()

#	ISI Category	r	Magnitude
1	Category 1: Appropriation	0.226	Weak
2	Category 2: Affective Engagement	0.149	Weak
3	Category 6: Permanence and Continuous Improvement	0.106	Weak
4	Category 4: Curricular Integration	0.088	Weak
5	Category 5: Monitoring and Feedback	0.077	Weak
6	Category 3: Correct Implementation	-0.061	Weak
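
The magnitude labels in these tables can be reproduced with a simple threshold rule on |r|. The cut-offs below (0.30 and 0.50) are an assumption that happens to agree with every row shown above; the rule actually used to fill the `Interpretacion` column is not visible in this section, and `classify_r()` is a hypothetical helper.

```r
# Sketch only: assumed cut-offs of 0.30 and 0.50 on the absolute correlation
classify_r <- function(r) {
  as.character(cut(abs(r),
                   breaks = c(0, 0.3, 0.5, 1),
                   labels = c("Weak", "Moderate", "Strong"),
                   right = FALSE,          # intervals are [0, 0.3), [0.3, 0.5), [0.5, 1]
                   include.lowest = TRUE)) # keeps |r| = 1 inside the last interval
}

# Correlations of the ISI categories with Global PLC, taken from the table
r_plc <- c(0.337, 0.220, 0.215, 0.202, -0.134, 0.036)
print(data.frame(r = r_plc, Magnitude = classify_r(r_plc)))
```

Under this rule the sign is ignored, so the negative correlation of Category 3 is labelled by its strength only.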

10 Conclusions and Recommendations

10.1 What Makes PLC Successful?

1. Predictive Validity at Every Stage of the Year:

mejor_r <- max(comparacion_eval$r)
mejor_eval_nombre <- comparacion_eval$Evaluacion[which.max(comparacion_eval$r)]

cat(sprintf("- PLC correlates with Saber 11 **from the first evaluation** (Entrada: r=%.3f)\n", 
            cor_entrada_saber$estimate))

PLC correlates with Saber 11 from the first evaluation (Entrada: r=0.470)

cat(sprintf("- The correlation **peaks** at the %s evaluation (r=%.3f)\n", 
            mejor_eval_nombre, mejor_r))

The correlation peaks at the Control (mid-year) evaluation (r=0.651)

cat(sprintf("- **R² = %.1f%%** exceeds the 9-16%% shared variance (r of roughly 0.30-0.40) taken here as typical between standardized tests\n",
            max(comparacion_eval$R2)*100))

R² = 42.4% exceeds the 9-16% shared variance (r of roughly 0.30-0.40) taken here as typical between standardized tests
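
The arithmetic behind this comparison is just R² = r². The sketch below uses the rounded correlations reported in this document, so the last digit can differ slightly from values the report computes from unrounded r.

```r
# Rounded correlations with Saber 11 reported for each evaluation
r_values <- c(Entrada = 0.470, Control = 0.651, Salida = 0.441)

# Shared variance in percent: R² = r²
r2_pct <- round(100 * r_values^2, 1)
print(r2_pct)

# The 9-16% benchmark range expressed as correlations
benchmark_r <- sqrt(c(0.09, 0.16))
print(benchmark_r)
```

Read this way, the benchmark range corresponds to correlations of 0.30 to 0.40, which all three evaluations exceed.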

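2. Balanced Skill Architecture:

PLC does not depend on a single component but on the combination of the 4 CRIC skills
The components with the greatest predictive weight are Inferir (41.4%) and Concordar (34.3%), which together account for about three quarters of the relative importance; Construir (14.8%) and Relievar (9.4%) contribute the rest
This is consistent with the conceptual framework of the CRIC model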

3. Growth Matters (or Not):

# && is the scalar logical operator expected inside if()
if (abs(cor_delta_saber$estimate) > 0.2 && cor_delta_saber$p.value < 0.05) {
  cat("✅ **Growth in PLC has significant predictive value**\n")
  cat(sprintf("   - Δ PLC (Entrada→Salida) correlates with Saber: r=%.3f\n", cor_delta_saber$estimate))
  cat("   - Institutions that improve more in PLC tend to have better Saber 11 results\n")
  cat("   - Implication: Interventions that generate growth are effective\n")
} else {
  cat("⚠️  **The level attained matters more than the growth trajectory**\n")
  cat(sprintf("   - Δ PLC (Entrada→Salida) correlates only weakly with Saber: r=%.3f\n", cor_delta_saber$estimate))
  cat("   - What matters is the **level reached**, not so much the relative improvement\n")
  cat("   - Implication: Prioritize institutions with a low initial level, regardless of their growth potential\n")
}

⚠️ The level attained matters more than the growth trajectory
- Δ PLC (Entrada→Salida) correlates only weakly with Saber: r=-0.174
- What matters is the level reached, not so much the relative improvement
- Implication: Prioritize institutions with a low initial level, regardless of their growth potential

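4. Institutional Implementation Factors (ISI):

The ISI category with the strongest association is Category 1, Appropriation (r=0.337 with Global PLC, r=0.226 with Saber 11); every other category correlates weakly (|r| ≤ 0.22)
This suggests that the quality of institutional implementation, and appropriation in particular, matters alongside the content of the test, although the weak correlations call for caution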

5. Mediation Model:

ISI acts on Saber 11 mainly through PLC (indirect effect)
This is consistent with PLC mediating between good institutional practices and standardized results; because the data are observational, the mediation model alone does not establish causality
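
As an illustration of what "indirect effect" means here, the following sketch estimates a simple mediation by the product-of-coefficients approach with two `lm()` fits. It is not the report's model (which is not shown in this section); the data are simulated and the variable names `isi`, `plc` and `saber` are assumptions.

```r
# Sketch only: simulated data in which ISI influences Saber mostly via PLC
set.seed(7)
n <- 87
isi   <- rnorm(n)
plc   <- 0.4 * isi + rnorm(n)
saber <- 0.6 * plc + 0.05 * isi + rnorm(n)

# Path a: ISI -> PLC
a <- coef(lm(plc ~ isi))[["isi"]]

# Path b (PLC -> Saber, holding ISI fixed) and the direct effect of ISI
fit_y  <- lm(saber ~ plc + isi)
b      <- coef(fit_y)[["plc"]]
direct <- coef(fit_y)[["isi"]]

# Total effect of ISI on Saber, ignoring PLC
total <- coef(lm(saber ~ isi))[["isi"]]

# Indirect effect through PLC
indirect <- a * b
print(c(total = total, direct = direct, indirect = indirect))
```

With ordinary least squares the total effect equals the direct effect plus `a * b`, so the share `indirect / total` describes how much of the association runs through PLC. It remains a description of associations, not evidence of a causal mechanism.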

10.2 Strategic Recommendations

10.2.1 For Educational Institutions:

Use PLC Entrada as an early diagnostic:
- PLC Entrada already predicts Saber 11 significantly
- Implement intervention plans from the first quarter for low-performing institutions
Strengthen weak components:
- Prioritize work on (component with the lowest average score in the analysis)
- Design focused workshops for teachers
Monitor growth trajectories:

# && is the scalar logical operator expected inside if()
if (abs(cor_delta_saber$estimate) > 0.2 && cor_delta_saber$p.value < 0.05) {
  cat("   - PLC growth predicts Saber → Encourage continuous improvement\n")
  cat("   - Recognize institutions with the largest Entrada→Salida gain\n")
} else {
  cat("   - The level attained matters more than growth → Prioritize reaching minimum thresholds\n")
  cat("   - Intensive interventions in institutions with a low initial level\n")
}

- The level attained matters more than growth → Prioritize reaching minimum thresholds
- Intensive interventions in institutions with a low initial level

Systematic implementation:
- Improve ISI categories: (categories with the lowest scores)
- Quarterly monitoring of ISI indicators

10.2.2 For the Foundation:

Early Warning System (based on PLC Entrada):
- Identify at-risk institutions as early as March
- Assign priority support resources
Refine instruments:
- Review the items of Relievar, the component with the lowest predictive power (9.4% relative importance)
- Increase the reliability of the scales with the lowest internal consistency
Focused training:
- Train institutions in (ISI category with the lowest correlation)
- Create an inter-institutional mentoring program
Monitoring Dashboard:
- Implement real-time visualization of:
  - PLC evolution (Entrada → Control → Salida)
  - Saber 11 prediction based on current PLC
  - Benchmarking across institutions
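
One possible shape for the early-warning and prediction pieces is sketched below: a linear model fitted on past data, prediction intervals for new institutions, and a risk flag based on a PLC Entrada threshold. Everything here is hypothetical, including the simulated data, the column names, the 90% interval level and the bottom-quartile threshold.

```r
# Sketch only: simulated historical data for 87 institutions
set.seed(11)
n <- 87
historico <- data.frame(plc_entrada = rnorm(n, mean = 50, sd = 10))
historico$saber <- 150 + 2 * historico$plc_entrada + rnorm(n, sd = 25)

# Simple linear model: Saber 11 as a function of PLC Entrada
fit <- lm(saber ~ plc_entrada, data = historico)

# New institutions with only their PLC Entrada score available
nuevas <- data.frame(institucion = c("A", "B", "C"),
                     plc_entrada = c(35, 50, 65))

# Point prediction plus a 90% prediction interval (columns fit, lwr, upr)
pred   <- predict(fit, newdata = nuevas, interval = "prediction", level = 0.90)
nuevas <- cbind(nuevas, round(pred, 1))

# Assumed alert rule: PLC Entrada below the historical first quartile
umbral <- quantile(historico$plc_entrada, 0.25)
nuevas$en_riesgo <- nuevas$plc_entrada < umbral
print(nuevas)
```

Reporting the interval alongside the point prediction helps keep the alert honest, since a correlation near 0.47 leaves wide uncertainty around any single institution's forecast.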

10.2.3 For Future Research:

Multi-year longitudinal analysis:
- Compare the 2024, 2025 and 2026 cohorts
- Identify whether the patterns remain stable
Segmentation:
- Analyze whether the patterns differ by:
  - Type of institution (public/private)
  - Setting (urban/rural)
  - Socioeconomic level
- Does PLC growth predict better in certain populations?
Validation with PISA for Schools:
- When data become available, replicate the full analysis
- Confirm whether the PLC→Saber patterns hold for PLC→PISA
Advanced predictive modeling:
- Machine learning for early prediction of Saber based on PLC Entrada
- Identify high-risk institutional profiles
Intervention analysis:
- Quasi-experimental design to evaluate the causal effect of improvements in ISI categories
- Does improving Cat3 (Implementation) cause improvements in PLC and Saber?

11 References

Fundación Social Alberto Merani (2025). Vademécum PLC 2025.
ICFES (2025). Resultados Saber 11 2025.
OECD (2022). PISA 2022 Results.

12 Appendices

12.1 Reproduction Code

All the code used in this analysis is available in the R chunks of this document. To reproduce it:

Update the Excel file path in the cargar-datos chunk
Run rmarkdown::render("Analisis_Factores_Exito_PLC_2025.Rmd") to produce the rendered report (knitr::knit() alone yields only the intermediate Markdown file)

12.2 Session Information

sessionInfo()

## R version 4.5.2 (2025-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Tahoe 26.2
## 
## Matrix products: default
## BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/Bogota
## tzcode source: internal
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] lavaan_0.6-21    Hmisc_5.2-5      gridExtra_2.3    GGally_2.4.0    
##  [5] kableExtra_1.4.0 knitr_1.50       relaimpo_2.2-7   mitools_2.4     
##  [9] survey_4.4-8     survival_3.8-3   Matrix_1.7-4     boot_1.3-32     
## [13] MASS_7.3-65      car_3.1-3        carData_3.0-5    psych_2.5.6     
## [17] corrplot_0.95    readxl_1.4.5     lubridate_1.9.4  forcats_1.0.1   
## [21] stringr_1.5.2    dplyr_1.1.4      purrr_1.1.0      readr_2.1.5     
## [25] tidyr_1.3.1      tibble_3.3.0     ggplot2_4.0.0    tidyverse_2.0.0 
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.2.1   viridisLite_0.4.2  farver_2.1.2       S7_0.2.0          
##  [5] fastmap_1.2.0      digest_0.6.37      rpart_4.1.24       timechange_0.3.0  
##  [9] lifecycle_1.0.4    cluster_2.1.8.1    magrittr_2.0.4     compiler_4.5.2    
## [13] rlang_1.1.6        sass_0.4.10        tools_4.5.2        yaml_2.3.10       
## [17] data.table_1.17.8  labeling_0.4.3     htmlwidgets_1.6.4  mnormt_2.1.1      
## [21] xml2_1.4.0         RColorBrewer_1.1-3 abind_1.4-8        withr_3.0.2       
## [25] foreign_0.8-90     stats4_4.5.2       nnet_7.3-20        colorspace_2.1-2  
## [29] scales_1.4.0       cli_3.6.5          rmarkdown_2.30     generics_0.1.4    
## [33] rstudioapi_0.17.1  tzdb_0.5.0         DBI_1.2.3          cachem_1.1.0      
## [37] splines_4.5.2      parallel_4.5.2     cellranger_1.1.0   base64enc_0.1-3   
## [41] vctrs_0.6.5        jsonlite_2.0.0     hms_1.1.3          Formula_1.2-5     
## [45] htmlTable_2.4.3    systemfonts_1.2.3  jquerylib_0.1.4    glue_1.8.0        
## [49] ggstats_0.11.0     stringi_1.8.7      gtable_0.3.6       quadprog_1.5-8    
## [53] pillar_1.11.1      htmltools_0.5.8.1  R6_2.6.1           textshaping_1.0.3 
## [57] pbivnorm_0.6.0     evaluate_1.0.5     lattice_0.22-7     backports_1.5.0   
## [61] corpcor_1.6.10     bslib_0.9.0        Rcpp_1.1.0         svglite_2.2.1     
## [65] nlme_3.1-168       checkmate_2.3.3    mgcv_1.9-3         xfun_0.53         
## [69] pkgconfig_2.0.3

Document generated automatically on 5 February 2026 at 23:04

Success Factors of the PLC System: Multivariate Analysis 2025

What makes PLC the best predictor of Saber 11 and PISA in Colombia?

Fundación Social Alberto Merani - Evaluation Area

5 February 2026