For this section, we selected the Human Development Index (HDI) dataset covering 1990 to 2022, sourced from the worldhdi repository and available at https://zenodo.org/records/14006889. The data include HDI values for countries and regions worldwide over time. Our goals are to explore trends in human development, identify the countries with the greatest improvements, and compare progress across Africa, Asia, and Latin America.
We opened the dataset, which is distributed in R format, in RStudio and exported it as a CSV file so that we could upload it to GitHub.
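The conversion described above can be sketched roughly as follows. The file names are placeholders, and the sketch assumes the download is an `.rds` file holding a single data frame; an `.rda` or `.RData` file would need `load()` instead of `readRDS()`. A small made-up table stands in for the real download so the example runs on its own.

```r
# Stand-in for the downloaded object (values are made up for illustration).
toy <- data.frame(
  country  = c("A", "B"),
  hdi_1990 = c(0.5, 0.6),
  hdi_2022 = c(0.7, 0.8)
)

# Hypothetical file names; the real ones are not given in this report.
rds_path <- file.path(tempdir(), "world_hdi.rds")
csv_path <- file.path(tempdir(), "world_hdi.csv")
saveRDS(toy, rds_path)

# Read the R-format file and write it back out as plain CSV.
hdi_from_r <- readRDS(rds_path)
write.csv(hdi_from_r, csv_path, row.names = FALSE)

# Read the CSV again to confirm the round trip kept the same shape.
hdi_check <- read.csv(csv_path)
```

Comparing `dim(hdi_check)` with `dim(hdi_from_r)` is a quick way to confirm that no rows or columns were lost in the export.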
We read the dataset directly from GitHub using read_csv(), so the analysis can be rerun without a local copy of the file.
library(tidyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(readr)
library(stringr)
library(ggplot2)
hdi_raw <- read_csv("https://raw.githubusercontent.com/arutam-antunish/DATA607/refs/heads/main/world_hdi.csv")
## Rows: 210 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): country, tier_hdi, iso3c
## dbl (14): hdi_rank, hdi_1990, hdi_2000, hdi_2010, hdi_2015, hdi_2019, hdi_20...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(hdi_raw)
We reshaped the data from wide to long format with pivot_longer(), so that each row holds one country-year observation. The year is taken from the hdi_YYYY column names and stored as an integer in Year, HDI is stored as numeric, and a whitespace-trimmed copy of the country name is kept in Country. Rows with a missing HDI value are dropped.
hdi_tidy <- hdi_raw %>%
  pivot_longer(
    cols = matches("^hdi_\\d{4}$"),
    names_to = "Year",
    names_prefix = "hdi_",
    values_to = "HDI") %>%
  mutate(Year = as.integer(Year),
         HDI = as.numeric(HDI),
         Country = str_trim(country)) %>%
  filter(!is.na(HDI))
View(hdi_tidy)
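To make the reshaping step concrete, here is a minimal sketch on a made-up two-country table that uses the same hdi_YYYY column pattern. The values and country names are invented for illustration only.

```r
library(tidyr)
library(dplyr)

# Toy wide table: one row per country, one column per year (values are made up).
toy_wide <- tibble(
  country  = c("A", "B"),
  hdi_1990 = c(0.50, NA),
  hdi_2022 = c(0.70, 0.80)
)

toy_long <- toy_wide %>%
  pivot_longer(
    cols = matches("^hdi_\\d{4}$"),  # only columns named hdi_ plus four digits
    names_to = "Year",
    names_prefix = "hdi_",           # strip the prefix so only the year remains
    values_to = "HDI"
  ) %>%
  mutate(Year = as.integer(Year)) %>%
  filter(!is.na(HDI))                # country B has no 1990 value, so that row is dropped

toy_long
```

The two wide rows become three long rows, because the missing 1990 value for country B is removed by the final filter.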
We then removed exact duplicate rows and kept only HDI values in the valid range from 0 to 1. These checks guard against repeated records and out-of-range values before analysis.
hdi_tidy <- hdi_tidy %>%
  distinct() %>%
  filter(HDI >= 0 & HDI <= 1)
View(hdi_tidy)
We removed the columns holding average HDI growth rates and the 2015-2022 rank change, which are not needed for our analysis. We also dropped Country, the trimmed copy created earlier, so that only the original country column remains. Because dropping columns can make previously distinct rows identical, we reapplied distinct() and the HDI range filter.
hdi_tidy <- hdi_tidy %>%
  select(-starts_with("avg_growth_")) %>%
  select(-Country) %>%
  select(-rank_change_2015_2022) %>%
  distinct() %>%
  filter(HDI >= 0 & HDI <= 1)
View(hdi_tidy)
We selected the top 10 countries with the greatest HDI improvement between 1990 and 2022 and visualized them using a horizontal bar chart. This highlights which nations have made the most progress in human development over the past three decades.
hdi_change <- hdi_tidy %>%
  filter(Year %in% c(1990, 2022)) %>%
  pivot_wider(names_from = Year, values_from = HDI, names_prefix = "hdi_") %>%
  mutate(hdi_change = hdi_2022 - hdi_1990) %>%
  arrange(desc(hdi_change))
View(hdi_change)
top10_hdi_change <- hdi_change %>% slice_max(hdi_change, n = 10)
ggplot(top10_hdi_change, aes(x = reorder(country, hdi_change), y = hdi_change)) +
  geom_col(fill = "#2E86AB") +
  coord_flip() +
  labs(title = "Top 10 Countries by HDI Improvement (1990–2022)",
       x = "Country", y = "HDI Change") +
  theme_classic()
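Two details of the ranking are worth checking. A country that lacks a 1990 or 2022 value gets an NA change after pivot_wider(), and slice_max() keeps ties by default, so it can return more than the requested number of rows. The sketch below illustrates both points on a made-up table; the names toy_long, missing_endpoint, and top_n_change are illustrative and not part of the analysis above.

```r
library(dplyr)
library(tidyr)

# Toy long table (values are made up): country B has no 1990 observation.
toy_long <- tibble(
  country = c("A", "A", "B", "C", "C"),
  Year    = c(1990L, 2022L, 2022L, 1990L, 2022L),
  HDI     = c(0.50, 0.70, 0.80, 0.40, 0.45)
)

toy_change <- toy_long %>%
  filter(Year %in% c(1990, 2022)) %>%
  pivot_wider(names_from = Year, values_from = HDI, names_prefix = "hdi_") %>%
  mutate(hdi_change = hdi_2022 - hdi_1990)

# Countries that lack one endpoint and so have no computable change.
missing_endpoint <- toy_change %>% filter(is.na(hdi_change))

# with_ties = FALSE caps the result at n rows even when changes are tied.
top_n_change <- toy_change %>%
  filter(!is.na(hdi_change)) %>%
  slice_max(hdi_change, n = 2, with_ties = FALSE)
```

Listing `missing_endpoint` makes it explicit which countries are excluded from the ranking because one of the two years is unavailable.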
The chart displays the average 2022 HDI for three regions: Africa, Asia, and Latin America. Each bar is the mean HDI of five countries that we assigned to the region by hand, so it compares these samples rather than complete regional averages.
hdi_regions <- hdi_tidy %>%
  filter(Year == 2022) %>%
  mutate(region = case_when(
    country %in% c("Nigeria", "Kenya", "South Africa", "Ethiopia", "Rwanda") ~ "Africa",
    country %in% c("India", "China", "Indonesia", "Vietnam", "Philippines") ~ "Asia",
    country %in% c("Brazil", "Mexico", "Argentina", "Colombia", "Chile") ~ "Latin America",
    TRUE ~ NA_character_)) %>%
  filter(!is.na(region))
region_summary <- hdi_regions %>%
  group_by(region) %>%
  summarise(avg_hdi = mean(HDI, na.rm = TRUE))
ggplot(region_summary, aes(x = region, y = avg_hdi, fill = region)) +
  geom_col() +
  labs(title = "Average HDI by Region (2022)",
       x = "Region", y = "Average HDI") +
  theme_classic()
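Because the regions are assigned by exact name matching, a country whose name is spelled differently in the data would silently drop out of its regional mean. The sketch below shows one way to check for that on made-up data. The alternative spelling in toy_2022 is invented for the demonstration; whether the real dataset spells any of these names differently is not something we have verified here.

```r
library(dplyr)

# Hand-made lookup, shortened to four countries for the example.
region_lookup <- tibble(
  country = c("Nigeria", "Kenya", "India", "Vietnam"),
  region  = c("Africa", "Africa", "Asia", "Asia")
)

# Toy 2022 data (HDI values are made up); one name is deliberately spelled differently.
toy_2022 <- tibble(
  country = c("Nigeria", "Kenya", "India", "Viet Nam"),
  HDI     = c(0.55, 0.60, 0.64, 0.73)
)

# Lookup names with no exact match in the data.
unmatched <- region_lookup %>% anti_join(toy_2022, by = "country")

# Number of countries that actually contribute to each regional mean.
region_counts <- toy_2022 %>%
  inner_join(region_lookup, by = "country") %>%
  count(region)
```

If `unmatched` is non-empty or a region in `region_counts` has fewer countries than intended, the lookup names need to be aligned with the spelling used in the dataset.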
### Global HDI Distribution by Country in 2022
We joined HDI data for 2022 to a spatial map of countries using ISO3 codes. The resulting choropleth map uses color intensity to represent HDI levels, allowing a clear visual comparison of global development.
install.packages(c("rnaturalearth", "rnaturalearthdata", "sf"), repos = "https://cloud.r-project.org")
## Installing packages into 'C:/Users/aruta/AppData/Local/R/win-library/4.5'
## (as 'lib' is unspecified)
## package 'rnaturalearth' successfully unpacked and MD5 sums checked
## package 'rnaturalearthdata' successfully unpacked and MD5 sums checked
## package 'sf' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\aruta\AppData\Local\Temp\RtmpkrNs7l\downloaded_packages
library(rnaturalearth)
library(rnaturalearthdata)
##
## Attaching package: 'rnaturalearthdata'
## The following object is masked from 'package:rnaturalearth':
##
## countries110
library(sf)
## Linking to GEOS 3.13.1, GDAL 3.11.0, PROJ 9.6.0; sf_use_s2() is TRUE
library(ggplot2)
library(dplyr)
hdi_2022 <- hdi_tidy %>%
  filter(Year == 2022)
world <- ne_countries(scale = "medium", returnclass = "sf")
world_hdi <- world %>% left_join(hdi_2022, by = c("iso_a3" = "iso3c"))
ggplot(world_hdi) +
  # linewidth (not size) sets the border thickness in ggplot2 3.4.0 and later
  geom_sf(aes(fill = HDI), color = "gray70", linewidth = 0.1) +
  scale_fill_viridis_c(option = "C", na.value = "white") +
  labs(title = "Human Development Index (HDI) by Country - 2022",
       fill = "HDI") +
  theme_classic()
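Since left_join() keeps every map polygon, any country whose ISO3 code has no match in the HDI table is drawn in white, which looks the same as a genuinely missing value. The sketch below shows how the unmatched codes on each side could be listed. It uses made-up tables so that it runs without the map packages, and the "-99" placeholder code is an assumption about how some map sources mark countries without a standard code, to be verified against the actual `world` object.

```r
library(dplyr)

# Toy stand-ins for the map attributes and the HDI table (all values are made up).
toy_map <- tibble(
  name   = c("Alpha", "Beta", "Gamma"),
  iso_a3 = c("AAA", "BBB", "-99")   # "-99" mimics a placeholder code (assumption)
)
toy_hdi <- tibble(
  iso3c = c("AAA", "BBB", "GGG"),
  HDI   = c(0.9, 0.7, 0.8)
)

# Map entries that would be drawn white because no HDI row matches their code.
map_without_hdi <- toy_map %>% anti_join(toy_hdi, by = c("iso_a3" = "iso3c"))

# HDI rows that never reach the map because no polygon carries their code.
hdi_without_map <- toy_hdi %>% anti_join(toy_map, by = c("iso3c" = "iso_a3"))
```

Running the same two anti-joins on `world` and `hdi_2022` would show whether white areas on the map reflect missing HDI data or mismatched country codes.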
This project explored global human development trends using HDI data from 1990 to 2022. Through targeted cleaning and structured analysis, we identified countries with the greatest HDI improvement, compared regional averages, and visualized global disparities on a map.