The following map shows the distribution of an infrastructure quality index across São Paulo.
School Infrastructure Data
School-level data were obtained from the 2024 School Census, accessed through the Base dos Dados data lake. The analysis covers only active schools that report high school enrollment.
# Set the billing ID and the municipality code
basedosdados::set_billing_id('vinicius-projetos-r')
mun_code <- '3550308'

# Query to download infrastructure data from the School Census
census_query = paste("
  SELECT id_escola, esgoto_rede_publica, agua_potavel, energia_inexistente,
         banheiro_pne, biblioteca, cozinha, laboratorio_ciencias,
         laboratorio_informatica, quadra_esportes, sala_diretoria, sala_leitura,
         sala_atendimento_especial, equipamento_computador, equipamento_copiadora,
         equipamento_impressora, equipamento_tv, banda_larga
  FROM `basedosdados.br_inep_censo_escolar.escola`
  WHERE ano = 2024
    AND id_municipio = '", mun_code, "'
    AND tipo_situacao_funcionamento = '1'
    AND etapa_ensino_medio = 1", sep = "")

# Collapse whitespace in the query, run it, and store the result as a data frame
census_base = as.data.frame(
  basedosdados::read_sql(query = gsub("\\s+", " ", census_query))
)

# Convert every column except the school ID to numeric
for (col in c(2:dim(census_base)[2])) {
  census_base[, col] = as.numeric(census_base[, col])
}
Geolocation
School locations were determined by their latitude and longitude coordinates, sourced from Inep’s Catálogo das Escolas (School Catalog).
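As a rough illustration of how latitude and longitude columns become a spatial object that can later be joined to polygons, the sketch below uses the sf package. The data frame, its column names (`latitude`, `longitude`) and the values are hypothetical, and treating the catalog coordinates as WGS 84 (EPSG:4326) is an assumption. The reprojection to SIRGAS 2000 (EPSG:4674) is included because that is the reference system geobr uses for its boundaries.

```r
library(sf)

# Hypothetical school coordinates (illustrative values, not real schools)
schools <- data.frame(
  id_escola   = c("A", "B"),
  latitude    = c(-23.55, -23.60),
  longitude   = c(-46.63, -46.70),
  infra_index = c(55, 47)
)

# Build point geometries; coords are given as (x = longitude, y = latitude).
# Assumption: the coordinates are expressed in WGS 84 (EPSG:4326).
schools_sf <- st_as_sf(schools, coords = c("longitude", "latitude"), crs = 4326)

# Reproject to SIRGAS 2000 (EPSG:4674) so the points share a CRS with
# the polygons they will be joined to
schools_sf <- st_transform(schools_sf, 4674)
```

Spatial joins in sf expect both layers to share a coordinate reference system, so reprojecting the points before the join avoids a CRS mismatch error.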
A composite infrastructure index was built for each school using a two-parameter logistic (2PL) Item Response Theory (IRT) model. The resulting scores were then standardized to a scale with a mean of 50 and a standard deviation of 10. A higher index value corresponds to better school infrastructure quality. The index incorporates
Official geographic boundaries for the municipality and for IBGE’s Weighting Areas were obtained with the R package geobr. The value shown for each region is the mean index of the schools located within that Weighting Area.
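The index averaged here is described as a 2PL IRT score rescaled to a mean of 50 and a standard deviation of 10. The base-R sketch below shows only the two formulas involved: the 2PL item response function and the linear rescaling. It does not estimate a model (in practice that would be done with an IRT package such as mirt; the source does not say which one was used). The function names and the latent scores are hypothetical, and rescaling with the sample mean and standard deviation is an assumption about how the standardization was done.

```r
# 2PL item response function: probability that a school with latent
# infrastructure level `theta` has an item (e.g. a library) given the
# item's discrimination `a` and difficulty `b`
p_2pl <- function(theta, a, b) {
  1 / (1 + exp(-a * (theta - b)))
}

# Linear rescaling of latent scores to mean 50 and standard deviation 10
# (assumption: the sample mean and sample SD are used)
rescale_index <- function(theta) {
  50 + 10 * (theta - mean(theta, na.rm = TRUE)) / sd(theta, na.rm = TRUE)
}

# Hypothetical latent scores for five schools (illustrative values only)
theta_hat <- c(-1.2, -0.3, 0.0, 0.4, 1.1)
infra_index <- rescale_index(theta_hat)
```

Because the rescaling is linear, it preserves the ordering and relative spacing of schools; it only changes the units in which the index is reported.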
# Packages used in this step
library(geobr)
library(sf)
library(dplyr)

# Look up the state and municipality names
estate = lookup_muni(code_muni = as.character(mun_code))$name_state
muni = lookup_muni(code_muni = as.character(mun_code))$name_muni

# Get the municipality's boundaries
muni_boundary = read_municipality(code_muni = mun_code)

# Get the weighting areas' boundaries
weighting_areas = read_weighting_area() %>%
  filter(code_muni == mun_code)

# Join tables: match each school to the weighting area it falls in
schools_in_areas <- st_join(final_base_sf, weighting_areas, join = st_intersects)

# Calculate the mean index of each weighting area
areas_with_index <- left_join(
  weighting_areas,
  schools_in_areas %>%
    st_drop_geometry() %>%
    group_by(code_weighting) %>%
    summarise(
      mean_area = mean(infra_index, na.rm = TRUE),
      n_schools = n()
    ) %>%
    ungroup(),
  by = 'code_weighting'
)
Maps
Schools with better infrastructure are concentrated in São Paulo’s central areas, whereas the lowest area averages are found on the city’s periphery. The map also suggests that infrastructure quality tends to decline gradually, rather than abruptly, from the center toward the periphery.
# Packages used in this step
library(ggplot2)
library(scales)

# Range of the area means, used for the legend limits and breaks
min_val <- min(areas_with_index$mean_area, na.rm = TRUE)
max_val <- max(areas_with_index$mean_area, na.rm = TRUE)

ggplot() +
  # Weighting areas colored by their mean infrastructure index
  geom_sf(data = areas_with_index, aes(fill = mean_area), color = "white", linewidth = 0.2) +
  scale_fill_distiller(
    name = "",
    palette = "PRGn",
    direction = 1,
    na.value = "grey",
    limits = c(min_val, max_val),
    breaks = c(min_val, max_val),
    labels = label_number(accuracy = 1)
  ) +
  # Municipality outline drawn on top
  geom_sf(data = muni_boundary, fill = NA, color = "white", linewidth = 0.5) +
  labs(
    title = "The Quality of School Infrastructure",
    subtitle = paste("Distribution of infrastructure levels in", muni),
    caption = "Source: School Census 2024; Base dos Dados."
  ) +
  theme_void() +
  theme(
    plot.title = element_text(family = "pf", size = 100, face = "bold", color = "#222222", hjust = 0.5),
    plot.subtitle = element_text(family = "pf", size = 50, color = "#555555", hjust = 0.5),
    legend.text = element_text(family = "pf", size = 45),
    plot.caption = element_text(family = "pf", size = 40, hjust = 0.5),
    legend.position = 'bottom',
    legend.key.width = unit(1.8, "cm"),
    legend.key.size = unit(0.7, "cm")
  )