This Post Addresses 2 Main Questions:
What proportion of Sustainable Development Goal (SDG) Indicator Data is missing for each Small Island Developing State (SIDS)?
How recent is the available data?
1. Missing Data
In na_heat_map.Rmd:
- Data is read in, one goal at a time
- The data is cleaned
- The median value is calculated for each SDG Indicator and SIDS
- The data is joined
- A heat map is created visualizing missing SDG data for SIDS (measured in % of missing data per goal)
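The steps above can be sketched in base R roughly as follows. This is a minimal illustration, not the code from na_heat_map.Rmd. The toy data frame, its column names (country, goal, indicator, year, value), and the rule that an indicator counts as missing when a SIDS has no non-NA value for it are all assumptions. It also assumes every SIDS has at least one row per indicator; if missing indicators are simply absent from the data, the grid would need to be completed first.

```r
# Toy long-format data: one row per SIDS, goal, indicator, and year.
# All names and values here are made up for illustration.
sdg <- data.frame(
  country   = c("A", "A", "A", "A", "B", "B", "B", "B"),
  goal      = c(7, 7, 7, 5, 7, 7, 7, 5),
  indicator = c("7.1.1", "7.1.1", "7.2.1", "5.1.1",
                "7.1.1", "7.1.1", "7.2.1", "5.1.1"),
  year      = c(2019, 2020, 2020, 2020, 2019, 2020, 2020, 2020),
  value     = c(90, 94, NA, 1, NA, NA, 12, NA)
)

# Median value per SIDS and indicator; NA when no year has a value.
median_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else median(x, na.rm = TRUE)
}
medians <- aggregate(value ~ country + goal + indicator, data = sdg,
                     FUN = median_or_na, na.action = na.pass)

# Percentage of indicators with no data, per SIDS (rows) and goal (columns).
pct_missing <- function(x) round(100 * mean(is.na(x)))
sdg_na_percentages_matrix <- tapply(
  medians$value,
  list(medians$country, paste("Goal", medians$goal)),
  pct_missing
)

print(sdg_na_percentages_matrix)
```

The result is a numeric matrix with one row per SIDS and one column per goal, which is the shape a heat map function such as pheatmap takes as its first argument.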
This is the result
f <- pheatmap(sdg_na_percentages_matrix, treeheight_row = 0, treeheight_col = 0, display_numbers = TRUE, fontsize_number = 6, angle_col = 45, na_col = "red", main = "Missing SDG Data for SIDS, (measured in % of Missing Data per Goal)")

Visualization Summary
- How to interpret
- When looking at the top left box, we see that Trinidad and Tobago is missing 29% of the indicator data for SDG 7
- Goal 7 is to ensure access to affordable, reliable, sustainable and modern energy for all
- Goal 7 has six indicators, including
- Proportion of population with access to electricity, renewable energy share in the total final energy consumption, etc.
- Trinidad and Tobago has data for 4 of the 6 indicators
- Overall Picture
- A significant amount of data is missing
- Many SIDS are missing data for over 80% of the indicators for some SDGs
- Goal 7 has the least missing data
- Goal 5, Gender Equality, has the most missing data
- Goal 3, Good Health and Well-Being, and Goal 11, Sustainable Cities and Communities, seem to have the most variance in missing data among SIDS
- New Caledonia, French Polynesia, American Samoa, Guam, the Northern Mariana Islands, Sint Maarten, the U.S. Virgin Islands, Bonaire Sint Eustatius and Saba, Aruba, Curaçao, the British Virgin Islands, Montserrat, Anguilla, Puerto Rico, and Niue have the most missing data across all SDGs
- Limitations
- This visualization does not demonstrate:
- The year the data was collected
- Whether data was collected for different ages, genders, and geographic contexts
- The quality of the data
2. Most Recent Year Data was Collected
In time.Rmd:
- Data is read in, one goal at a time
- Data is cleaned
- The most recent year of data is selected for each SIDS and Indicator
- The data is visualized to show, for each SIDS, how many indicators have their most recent data in each year
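The "most recent year" step above can be sketched in base R along these lines. This is an illustrative sketch, not the code from time.Rmd: the toy data frame and its column names (country, indicator, year, value) are assumptions.

```r
# Toy long-format data; names and values are illustrative only.
sdg <- data.frame(
  country   = c("A", "A", "A", "A", "B", "B", "B"),
  indicator = c("7.1.1", "7.1.1", "7.2.1", "5.1.1",
                "7.1.1", "7.2.1", "7.2.1"),
  year      = c(2005, 2021, 2019, 2021, 2005, 2018, 2021),
  value     = c(90, 94, 30, NA, 80, 10, NA)
)

# Keep only rows that actually hold a value.
observed <- sdg[!is.na(sdg$value), ]

# Most recent year with data for each SIDS and indicator.
latest <- aggregate(year ~ country + indicator, data = observed, FUN = max)

# Number of indicators per SIDS whose latest data falls in each year.
counts <- as.data.frame(table(country = latest$country, year = latest$year),
                        responseName = "n")
counts <- counts[counts$n > 0, ]

print(latest)
print(counts)
```

Rows with an NA value are dropped before taking the maximum, so a year only counts as "most recent" if a value was actually recorded for it.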
This is the result
circular_stacked_bar_chart
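A chart of this kind could be drawn with ggplot2 roughly as below. This is a sketch under assumptions, not the plotting code from time.Rmd: the counts data frame is hypothetical, and the viridis color scale is a guess chosen because it runs from purple (older years) to yellow (recent years).

```r
library(ggplot2)

# Hypothetical counts: indicators per SIDS, by most recent year with data.
counts <- data.frame(
  country = c("A", "A", "A", "B", "B", "C", "C"),
  year    = c(2005, 2019, 2021, 2005, 2021, 2018, 2021),
  n       = c(20, 130, 200, 40, 90, 60, 150)
)

p <- ggplot(counts, aes(x = country, y = n, fill = year)) +
  geom_col() +              # segments stack within each SIDS by default
  scale_fill_viridis_c() +  # continuous purple-to-yellow scale (assumed)
  coord_polar() +           # wrap the stacked bars around a circle
  labs(x = NULL, y = "Indicators with data", fill = "Most recent year")

print(p)
```

Each bar's total length is the number of indicators with any data for that SIDS, and each colored segment shows how many of those indicators were last updated in a given year.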

Visualization Summary
- How to interpret
- If every SIDS had 2021 data for every indicator (which is not the case), every bar would be the same length (537 data points) and the same bright yellow color throughout
- Since some SIDS have more missing data than others, and the most recent year of data for each SIDS/Indicator varies, we see variation in colors and lengths of bars
- Each line in the circle represents 50 data points
- Mauritius is one of the SIDS with the least missing indicator data, with data for ~350 of the 537 indicators. This is consistent with the heat map above, where Mauritius has light blue/yellow colors throughout the goals, representing small percentages of missing data.
- Overall Picture
- Almost every SIDS has some indicators where the most recent data is from ~ 2005 (purple)
- Almost every SIDS has some indicators where the most recent data is from 2021 (bright yellow)
- The majority of the most recent available data is from within the last 5 years
- Limitations
- This visualization does not break down data by goal
Thanks for Reading!
Check out the GitHub repo for this project for more information and for the application of some machine learning techniques to this data.