Introduction

I’ve always had a deep appreciation for education, instilled by my father, who consistently reminded me to be grateful for my own schooling and of the importance of educating young people to build a better society for each generation to come. My interest in sustainability has grown over the years as the environmental situation has become more and more dire. I wanted a topic that combined both interests, so this project attempts to uncover relationships between education and environmental variables across the world.

My dataset features over 3,200 observations and 54 variables, both categorical and quantitative. The data was obtained from Kaggle and was compiled exclusively from World Bank and UN data. It spans 2000 to 2018 and tracks 173 countries against sustainability metrics. My analysis focuses on 7 educational variables and 7 environmental variables.

Educational variables:

  1. Out of School (% Primary Age). Percentage of primary-aged children not enrolled in school. Higher values indicate barriers to basic education—linked to lower future awareness of health, environment, and civic issues.
  2. Compulsory Education (Years). Legal minimum number of years children must attend school.
  3. Primary Completion Rate (% of Age Group). Percentage of students completing the final grade of primary school. Indicates sustained engagement in school; a proxy for basic literacy.
  4. Pre-primary Enrollment (% Gross). Percentage of children enrolled in early childhood education.
  5. Primary Enrollment (% Gross). Percentage of children enrolled in primary school (regardless of age). Measures access to the most basic level of education.
  6. Secondary Enrollment (% Gross). Percentage of children enrolled in secondary education (regardless of age). Reflects more advanced education, linked to critical thinking, civic engagement, and environmental literacy.
  7. Pupil-Teacher Ratio (Primary). Number of students per primary school teacher. Lower ratios imply higher teaching quality.

Environmental variables:

  1. Adjusted Net Savings (Excl. Particulate, % of GNI). Net savings after accounting for physical and natural capital losses, excluding particulate pollution. Positive values suggest sustainable investment; negative values signal environmental degradation or unsustainable economics.
  2. CO2 Damage (% of GNI). Estimated economic damage from carbon emissions. Reflects the burden of climate-related costs; high values imply weak environmental policies or high fossil fuel reliance.
  3. Resource Depletion (% of GNI). Cost of depleting natural resources like oil, minerals, or forests. Indicates whether economic growth is coming at the expense of natural capital; lower is better.
  4. Particulate Emission Damage (% of GNI). Cost of health and productivity losses from air pollution.
  5. Net Forest Depletion (% of GNI). Economic cost of unsustainable forest harvesting. Reflects deforestation beyond natural regrowth; tied to logging, agriculture, and regulatory capacity.
  6. Renewable Energy Consumption (% of Total Final Energy). Share of renewables in overall energy use. Higher values signal cleaner national energy portfolios.
  7. Renewable Electricity Output (% of Total Electricity). Percentage of electricity generated from renewable sources. Measures green energy development.

Now let’s dive into some analysis!

1. Secondary School Enrollment vs. Renewable Energy Consumption: Bubble Scatter Plot

Do More Educated Populations Adopt More Renewable Energy?

This bubble scatter plot shows the relationship between secondary school enrollment and renewable energy consumption, with bubble size scaled to population size. The downward-sloping trendline shows an inverse relationship between secondary school enrollment and renewable energy consumption: countries with higher secondary enrollment tend to use a smaller share of renewable energy. Countries with the highest renewable energy consumption tend to be less populous, while low renewable energy consumers span all population sizes. So, do more educated populations adopt more renewable energy? No, this data suggests that more educated populations don’t show a strong tendency to consume more renewable energy. This could potentially be because more industrialized (and educated) countries still rely heavily on fossil fuels.
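A minimal sketch of how a bubble scatter plot like the one described could be built with ggplot2. The data frame `toy`, its column names, and its values are invented for illustration and are not the Kaggle dataset; the linear trendline is also an assumption about how the original trendline was fitted.

```r
library(ggplot2)

# Toy data invented for illustration; this is NOT the Kaggle dataset.
set.seed(1)
toy <- data.frame(
  secondary_enrollment = runif(40, min = 20, max = 110),  # % gross (hypothetical column)
  population = round(runif(40, min = 1e6, max = 2e8))     # hypothetical column
)
# Build a downward trend plus noise into the toy data so it resembles the plot described
toy$renewable_share <- pmax(
  0,
  80 - 0.5 * toy$secondary_enrollment + rnorm(40, sd = 10)
)

bubble_plot <- ggplot(toy, aes(x = secondary_enrollment, y = renewable_share)) +
  geom_point(aes(size = population), alpha = 0.5) +         # bubble size maps to population
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) + # linear trendline (assumed)
  scale_size_area(max_size = 12, name = "Population") +
  labs(
    x = "Secondary Enrollment (% Gross)",
    y = "Renewable Energy Consumption (% of Total Final Energy)"
  ) +
  theme_minimal()

# Slope of the fitted line; a negative value corresponds to an inverse relationship
trend_slope <- unname(
  coef(lm(renewable_share ~ secondary_enrollment, data = toy))["secondary_enrollment"]
)
```

Printing `bubble_plot` draws the chart. `scale_size_area()` is used so that bubble area, rather than radius, is proportional to population.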

2. Environmental & Educational Variable Correlation Matrix Heatmap

What environmental and educational variables are highly correlated?

This is a Lower Triangle Correlation Heatmap that displays the correlation coefficients between educational and environmental variables in 2015. Color intensity and numeric labels represent the strength and direction of relationships. Red signifies a strong positive correlation while blue signifies a strong negative correlation.
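One way a lower-triangle correlation heatmap can be produced is sketched below. The four toy variables and their values are invented for illustration, and the approach (masking the upper triangle of `cor()` output before plotting tiles) is an assumption, not necessarily how the original figure was made.

```r
library(ggplot2)

# Toy data invented for illustration; this is NOT the Kaggle dataset.
set.seed(2)
n <- 60
pupil_teacher <- runif(n, min = 10, max = 60)
toy <- data.frame(
  pupil_teacher = pupil_teacher,
  secondary_enrollment = 120 - 1.5 * pupil_teacher + rnorm(n, sd = 10),
  particulate_damage = 0.05 * pupil_teacher + rnorm(n, sd = 0.5),
  forest_depletion = 0.02 * pupil_teacher + rnorm(n, sd = 0.6)
)

# Pairwise correlations, then blank out the upper triangle
corr <- cor(toy, use = "pairwise.complete.obs")
corr[upper.tri(corr)] <- NA

# Reshape the matrix to long form: one row per (Var1, Var2) pair
corr_long <- as.data.frame(as.table(corr), responseName = "r")
corr_long <- corr_long[!is.na(corr_long$r), ]

heatmap_plot <- ggplot(corr_long, aes(x = Var2, y = Var1, fill = r)) +
  geom_tile(color = "white") +
  geom_text(aes(label = sprintf("%.2f", r)), size = 3) +  # numeric labels
  scale_fill_gradient2(
    low = "blue", mid = "white", high = "red",            # blue = negative, red = positive
    midpoint = 0, limits = c(-1, 1), name = "Correlation"
  ) +
  labs(x = NULL, y = NULL) +
  theme_minimal()
```

Fixing the fill limits at -1 and 1 keeps the color scale symmetric, so equally strong positive and negative correlations get equally intense colors.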

Three key positive relationships are evident. The 0.84 correlation between particulate damage & pupil-teacher ratio suggests that countries with more crowded classrooms (lower teaching quality) experience greater economic damage from air pollution. The 0.71 correlation between particulate damage & out-of-school rate implies that where more children are out of school, the economic cost of air pollution tends to be higher. The 0.55 correlation between forest depletion & pupil-teacher ratio suggests that poorer education quality may be associated with a higher cost of unsustainable forest harvesting.

Five key negative relationships are evident. The -0.84 correlation between particulate damage & secondary enrollment implies that countries with higher secondary school enrollment suffer less from particulate pollution. The -0.81 correlation between particulate damage & primary completion as well as the -0.70 correlation between particulate damage & pre-primary enrollment show a similar trend: when more students complete primary school (or are enrolled in pre-primary school), the cost of air pollution damage is lower. The -0.52 correlation between forest depletion & secondary enrollment, alongside the -0.52 correlation between forest depletion & primary completion, indicates that higher school participation is associated with less of the unsustainable forest use that negatively impacts savings.

This suggests that a lack of access to education may reduce environmental awareness or capacity. Thus, education may lead to cleaner energy use and better policies.

3. Pupil-Teacher Ratio vs Environmental Investment: Scatterplot

Is quality of education (smaller class sizes) related to sustainability awareness or investment?


This scatterplot analyzes the relationship between pupil-teacher ratio and an environmental investment score. The score combines 3 environmental variables: economic damage from natural resource depletion, CO2 emissions, and particulate emissions. Its values are negative, and higher values are better because they signify less damage. The downward-sloping trend line shows that higher pupil-teacher ratios (larger class sizes) are associated with lower environmental investment. This implies that countries with overcrowded classrooms tend to underinvest in environmental sustainability. While the relationship isn’t extremely strong (there’s some spread), the negative slope and confidence band suggest a statistically meaningful pattern. So, is quality of education (smaller class sizes) related to sustainability awareness or investment? Yes, to some extent. This data supports the idea that better quality education (smaller class sizes) is associated with higher environmental investment.
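The exact formula for the environmental investment score is not given here. As a sketch only, the code below assumes it is the negative sum of the three damage measures, which matches the description of negative values where higher is better. The data frame, column names, and values are invented for illustration.

```r
library(dplyr)
library(ggplot2)

# Toy data invented for illustration; this is NOT the Kaggle dataset.
toy <- data.frame(
  pupil_teacher      = c(12, 18, 25, 33, 41, 50),
  resource_depletion = c(0.5, 1.0, 2.0, 3.5, 5.0, 7.0),  # % of GNI
  co2_damage         = c(1.0, 1.2, 1.5, 2.0, 2.4, 3.0),  # % of GNI
  particulate_damage = c(0.1, 0.3, 0.6, 1.0, 1.5, 2.0)   # % of GNI
)

# ASSUMPTION: score = negative sum of the three damage measures,
# so values closer to zero mean less damage.
scored <- toy %>%
  mutate(env_investment = -(resource_depletion + co2_damage + particulate_damage))

score_plot <- ggplot(scored, aes(x = pupil_teacher, y = env_investment)) +
  geom_point() +
  # Stating the formula explicitly avoids ggplot2's informational message about it
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(
    x = "Pupil-Teacher Ratio (Primary)",
    y = "Environmental Investment Score"
  ) +
  theme_minimal()

# Linear fit underlying the trend line
fit <- lm(env_investment ~ pupil_teacher, data = scored)
slope <- unname(coef(fit)["pupil_teacher"])
```

`summary(fit)` reports the slope's standard error and p-value, which is a more direct check of statistical significance than reading the confidence band off the plot.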

4. Compulsory Enrollment and Renewable Energy Consumption by Income Level: Box Plot

How do compulsory education levels and renewable energy consumption vary across income groups?

These side-by-side boxplots show how compulsory education and renewable energy consumption vary by income level. Years of compulsory education clearly increase with income level, with a plateau between upper-middle and high income. Renewable energy consumption clearly decreases as income level increases, with low-income countries having a far larger median than lower-middle, upper-middle, and high-income countries. Despite greater wealth and resources, high-income countries rely less on renewables, likely due to legacy fossil fuel infrastructure. So, how do compulsory education levels and renewable energy consumption vary across income groups? Compulsory education tends to increase with income level, reflecting stronger educational institutions and access. Renewable energy consumption decreases with income: low-income nations use more renewable energy (likely out of necessity and limited access to fossil fuels/modern grids), while wealthier countries still depend on non-renewables despite having the capacity to invest in clean tech. Thus, wealth brings better education, but not necessarily cleaner energy use.
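A sketch of side-by-side boxplots by income group, assuming the two measures are stacked into long form and shown in separate facets. The income labels, column names, and values are invented for illustration.

```r
library(ggplot2)

# Toy data invented for illustration; this is NOT the Kaggle dataset.
income_levels <- c("Low income", "Lower middle income",
                   "Upper middle income", "High income")
toy <- data.frame(
  # An ordered factor keeps the boxes in low-to-high income order on the x axis
  income = factor(rep(income_levels, each = 5), levels = income_levels),
  compulsory_years = c(6, 7, 7, 8, 6,   8, 9, 9, 10, 8,
                       10, 11, 10, 12, 9,   10, 11, 12, 10, 11),
  renewable_share = c(70, 85, 60, 90, 75,   40, 55, 30, 45, 50,
                      20, 25, 15, 30, 10,   10, 20, 15, 5, 25)
)

# Stack the two measures so each gets its own facet
long <- rbind(
  data.frame(income = toy$income,
             metric = "Compulsory Education (Years)",
             value = toy$compulsory_years),
  data.frame(income = toy$income,
             metric = "Renewable Energy Consumption (%)",
             value = toy$renewable_share)
)

box_plot <- ggplot(long, aes(x = income, y = value)) +
  geom_boxplot() +
  facet_wrap(~ metric, scales = "free_y") +  # separate y scales: years vs percent
  labs(x = "Income Level", y = NULL) +
  theme_minimal()

# Group medians, i.e. the center line of each box
renewable_medians <- tapply(toy$renewable_share, toy$income, median)
```

Free y scales matter here because the two measures have different units, so a shared axis would flatten the compulsory-education boxes.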

5. Enrollment Level vs. Renewable Electricity Over Time: Line Chart (Time Series)

Are more educated countries expanding renewable power?

These line plots compare renewable electricity output over time by enrollment group (high vs. low secondary school enrollment), globally and by continent. On a global level, high enrollment countries (≥80% secondary enrollment) show a steady increase in renewable electricity output over time, rising from ~25% in 2000 to ~28% in 2015. Low enrollment countries consistently produce a higher percentage of renewable electricity, averaging around 43–45%. While less-educated countries may currently use more renewable energy, more-educated countries are either steady in renewable electricity output or slightly increasing. In Africa, South America, and North America, low enrollment countries dominate renewable electricity output, likely due to natural resource availability (for example, hydroelectric power in Africa). It’s important to note that Europe doesn’t have a “low enrollment” trend because all of its secondary enrollment rates are greater than 80%. So, are more educated countries expanding renewable power? Yes, there is a slight upward shift on a global level; however, renewable electricity output has remained roughly the same across continents from 2000 to 2015.
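The grouping behind a chart like this can be sketched as follows. The 80% threshold comes from the text; everything else (the toy countries, column names, values, and classifying each country-year rather than each country) is an assumption for illustration.

```r
library(dplyr)
library(ggplot2)

# Toy data invented for illustration; this is NOT the Kaggle dataset.
toy <- data.frame(
  country = rep(c("A", "B", "C", "D"), each = 3),
  year = rep(c(2000, 2008, 2015), times = 4),
  secondary_enrollment = c(95, 97, 99,  85, 88, 90,  40, 45, 50,  60, 70, 75),
  renewable_electricity = c(20, 22, 24,  30, 31, 32,  50, 48, 49,  40, 41, 39)
)

# Label each row as high or low enrollment, then average within group and year
trend <- toy %>%
  mutate(
    enrollment_group = if_else(secondary_enrollment >= 80,
                               "High (>= 80%)", "Low (< 80%)")
  ) %>%
  group_by(enrollment_group, year) %>%
  summarise(mean_renewable = mean(renewable_electricity), .groups = "drop")

line_plot <- ggplot(trend, aes(x = year, y = mean_renewable,
                               color = enrollment_group)) +
  geom_line() +
  geom_point() +
  labs(
    x = "Year",
    y = "Renewable Electricity Output (% of Total Electricity)",
    color = "Secondary Enrollment"
  ) +
  theme_minimal()
```

Adding a continent column to the data and `facet_wrap(~ continent)` to the plot would give the per-continent panels.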

6. CO2 Damage vs. Women in Parliament by Secondary Enrollment: Stacked Bar Chart

Does gender equality in education/policy leadership improve environmental sustainability?

This visualization is a stacked bar chart that shows average CO2 economic damage broken down by proportion of women in parliament, with stacks representing girls’ secondary school enrollment levels. As the proportion of women in parliament increases, the total CO2 damage generally decreases. Across all categories, countries in the two lowest female secondary enrollment bands (<50% and 50–80%) contribute the highest portions of CO2 damage. The 20–30% women in parliament group has the highest economic CO2 damage. These are likely industrializing countries that are not yet developed enough to have environmental policy that reduces CO2 economic damage. The 30–50% women in parliament group not only has the lowest total CO2 damage, but also a more balanced share across education levels. Countries where girls’ secondary enrollment is 100%+ consistently contribute smaller portions of CO2 damage in each group. This implies that when female education and political leadership are both strong, environmental outcomes improve. So, does gender equality in education/policy leadership improve environmental sustainability? Yes. This supports the narrative that investing in girls’ education and empowering women in governance leads to more sustainable environmental policies and outcomes.
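Binning two continuous variables and stacking the averages can be sketched with `cut()` and `geom_col()`. Only some bin edges are stated in the text (20–30% and 30–50% for parliament; <50%, 50–80%, and 100%+ for enrollment); the remaining edges, the column names, and the values below are assumptions made for illustration.

```r
library(dplyr)
library(ggplot2)

# Toy data invented for illustration; this is NOT the Kaggle dataset.
toy <- data.frame(
  women_parliament = c(5, 8, 15, 18, 25, 28, 35, 45),      # % of seats
  girls_secondary  = c(40, 60, 45, 85, 70, 105, 90, 110),  # % gross
  co2_damage       = c(2.0, 1.8, 2.2, 1.5, 2.6, 1.0, 1.2, 0.6)
)

parliament_labels <- c("<10%", "10-20%", "20-30%", "30-50%")
enrollment_labels <- c("<50%", "50-80%", "80-100%", "100%+")

binned <- toy %>%
  mutate(
    # Left-closed bins, e.g. 20 falls in "20-30%"; some edges are assumed
    parliament_group = cut(women_parliament, breaks = c(0, 10, 20, 30, 50),
                           right = FALSE, include.lowest = TRUE,
                           labels = parliament_labels),
    enrollment_group = cut(girls_secondary, breaks = c(0, 50, 80, 100, Inf),
                           right = FALSE, labels = enrollment_labels)
  ) %>%
  group_by(parliament_group, enrollment_group) %>%
  summarise(mean_co2_damage = mean(co2_damage), .groups = "drop")

bar_plot <- ggplot(binned, aes(x = parliament_group, y = mean_co2_damage,
                               fill = enrollment_group)) +
  geom_col() +            # geom_col() stacks the fill groups by default
  scale_fill_viridis_d(name = "Girls' Secondary Enrollment") +
  labs(x = "Women in Parliament", y = "Average CO2 Damage (% of GNI)") +
  theme_minimal()
```

Because each bar stacks group averages, its total height is a sum of averages across enrollment bands and not the overall average for that parliament group, which is worth keeping in mind when comparing totals.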

7. Primary Children Out of School vs. Particulate Damage: Stacked Area Chart

Do higher out-of-school rates correlate with higher environmental harm?

This stacked area chart shows the change in average particulate damage over time, segmented by percent of children out of school. There is a clear positive relationship: countries with higher percentages of out-of-school children (especially the 30–50% and 50%+ groups) have substantially higher levels of particulate emission damage. Over time, as education improves (fewer children out of school), overall particulate damage declines sharply, especially post-2007. Countries with fewer than 10% of primary-age children out of school consistently show the lowest particulate damage, with little fluctuation over time. These are likely countries with strong infrastructure and effective environmental and educational systems. Countries with better educational access for children tend to experience lower environmental harm, as reflected by lower particulate emission damage. Thus, higher out-of-school rates correlate with higher environmental degradation.
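A stacked area chart of this kind can be sketched with `geom_area()`, which stacks the fill groups by default. The "10-30%" band, the column names, and all values are assumptions for illustration; only the <10%, 30–50%, and 50%+ groups are named in the text.

```r
library(ggplot2)

# Toy data invented for illustration; this is NOT the Kaggle dataset.
groups <- c("<10%", "10-30%", "30-50%", "50%+")
toy <- expand.grid(
  year = c(2000, 2005, 2010, 2015),
  out_of_school_group = groups,
  stringsAsFactors = FALSE
)
# An ordered factor controls the stacking and legend order
toy$out_of_school_group <- factor(toy$out_of_school_group, levels = groups)
# expand.grid varies year fastest, so each run of four values is one group
toy$particulate_damage <- c(
  0.2, 0.2, 0.2, 0.2,   # <10%
  0.6, 0.5, 0.4, 0.3,   # 10-30%
  1.2, 1.1, 0.8, 0.6,   # 30-50%
  2.0, 1.8, 1.2, 0.9    # 50%+
)

area_plot <- ggplot(toy, aes(x = year, y = particulate_damage,
                             fill = out_of_school_group)) +
  geom_area() +
  scale_fill_viridis_d(name = "Children Out of School") +
  labs(x = "Year", y = "Average Particulate Damage (% of GNI)") +
  theme_minimal()

# Height of the full stack in each year
yearly_total <- tapply(toy$particulate_damage, toy$year, sum)
```

In a stacked area chart only the bottom band and the total can be read directly against the y axis; the other bands have to be judged by their thickness.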

8. Shiny App
tags$iframe(
  src = "https://kennycodez.shinyapps.io/pshiny1/", 
  width = "100%", 
  height = "800px", 
  frameborder = "0"
)

Conclusion

We have gained key insights into the relationship between environmental and educational variables and their implications globally. From this analysis we have found that more educated populations don’t show a strong tendency to consume more renewable energy, and we have identified which educational and environmental variables are highly correlated. We see that better quality education (smaller class sizes) is associated with higher environmental investment and that wealth brings better education, but not necessarily cleaner energy use. There is a global upward shift towards renewable power, but on a continent level rates have stayed approximately the same. We found that investing in girls’ education and empowering women in governance leads to lower economic CO2 damage. We also see that higher out-of-school rates correlate with higher environmental degradation.

In creating these visualizations, I kept accessibility in mind and used color palettes and combinations designed for color-blind viewers. I also kept Schwabish’s 5 principles of a good data visualization in mind. 1) Show the data in the clearest/most purposeful way: I chose visualizations that fit the data I was trying to interpret. 2) Reduce the clutter: I filtered the data and ensured graphs were clean. 3) Integrate the graphics and text: I made sure to analyze each visualization and draw a key takeaway from the data. 4) Use a small-multiples approach: I included side-by-side comparisons so that differences between graphs are easy to examine. 5) Start everything with gray: I kept the viewer in mind and prioritized objective data interpretation.

In general, we can see that education can, in many instances, positively impact the environment and mitigate problems stemming from lower levels of education. However, education is only one of the multitude of variables that impact the environment. Sometimes there is a correlation, sometimes there isn’t. It’s important to remain critical and aware of the growing environmental crisis and acknowledge the components we can control to help save the earth we know and love.

  1. Forest depletion vs. pupil-teacher ratio
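# Load the packages used below (geom_hex() also requires the hexbin package to be installed)
library(dplyr)
library(ggplot2)

# Prepare and clean data: keep rows with both values present and give the columns short names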
hex_data <- sustainability %>%
  filter(
    !is.na(`Pupil.teacher.ratio..primary...SE.PRM.ENRL.TC.ZS`),
    !is.na(`Adjusted.savings..net.forest.depletion....of.GNI....NY.ADJ.DFOR.GN.ZS`)
  ) %>%
  mutate(
    PupilTeacher = `Pupil.teacher.ratio..primary...SE.PRM.ENRL.TC.ZS`,
    Forest_Depletion = `Adjusted.savings..net.forest.depletion....of.GNI....NY.ADJ.DFOR.GN.ZS`
  )

# Create hexbin plot
ggplot(hex_data, aes(x = PupilTeacher, y = Forest_Depletion)) +
  geom_hex(bins = 30) +
  scale_fill_viridis_c(option = "C") +
  labs(
    title = "Pupil-Teacher Ratio vs Forest Depletion",
    subtitle = "Each hex shows the number of country-year observations",
    x = "Pupil-Teacher Ratio (Primary)",
    y = "Forest Depletion (% of GNI)",
    fill = "Observations"
  ) +
  theme_minimal()
