The topic for my project is “USA vs. Latin America: Statistical Overview of Global Development Indicators.” The data set contains World Development Indicators extracted from the World Bank Database (WBD) for the years 1960 to 2021. WBD is an online platform that holds a vast amount of global economic, social, and environmental data for countries and regions around the world. The data set I chose has information about 268 countries with 17,272 observations and 50 variables: 48 of the variables are numerical, one is a character column (country), and one is a date column (date). The variables I will be using are as follows: country, date, the agricultural and forest land percentages, land area, GDP, population, the political stability estimate, and intentional homicides per 100,000 people. I also created two more variables to help me analyze this data further. I created GDP_Per_capita by dividing the GDP by the population, and year by extracting the year from the date column. I will use these variables to answer the following questions:
Does GDP per capita affect political stability?
What do the USA and the selected Latin American countries use their land for?
Once everything was planned out, I cleaned the data set by using filter() to keep the countries I wanted to work with and to remove NAs from the columns I plot. I also used mutate() to create the GDP_Per_capita column and extracted a year column from the date.
I chose this topic because I wanted to see how Latin American countries compare to the United States. As a US-born Nicaraguan and Latino male who has been in the US for the entirety of my life, I wanted to see what I might learn from this assignment about other Latin American countries. I wish I could have explored Nicaragua in this assignment, but thanks to this class, I can now do it in my free time. I picked this data set because I have recently discovered my interest in indicators that affect the world.
Loading Libraries
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(gganimate)
library(gifski)
library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
Rows: 17272 Columns: 50
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): country
dbl (48): agricultural_land%, forest_land%, land_area, avg_precipitation, t...
date (1): date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Changing the name of the columns
colnames(WBDI)[3] <- 'agrucultural_land_percentage'
colnames(WBDI)[4] <- 'forest_land_percentage'
head(WBDI) # Checking the first 6 rows
# A tibble: 6 × 50
country date agrucultural_land_pe…¹ forest_land_percentage land_area
<chr> <date> <dbl> <dbl> <dbl>
1 Afghanistan 1960-01-01 NA NA NA
2 Afghanistan 1961-01-01 57.9 NA 652230
3 Afghanistan 1962-01-01 58.0 NA 652230
4 Afghanistan 1963-01-01 58.0 NA 652230
5 Afghanistan 1964-01-01 58.1 NA 652230
6 Afghanistan 1965-01-01 58.1 NA 652230
# ℹ abbreviated name: ¹agrucultural_land_percentage
# ℹ 45 more variables: avg_precipitation <dbl>, `trade_in_services%` <dbl>,
# control_of_corruption_estimate <dbl>, control_of_corruption_std <dbl>,
# `access_to_electricity%` <dbl>, `renewvable_energy_consumption%` <dbl>,
# electric_power_consumption <dbl>, CO2_emisions <dbl>,
# other_greenhouse_emisions <dbl>, population_density <dbl>,
# `inflation_annual%` <dbl>, real_interest_rate <dbl>, …
Cleaning and Filtering data to Desired Countries
V_Countries <- WBDI |>
  filter(country %in% c("Brazil", "Argentina", "Colombia", "Peru", "Mexico", "United States")) |>
  filter(!is.na(political_stability_estimate)) |>
  mutate(GDP_Per_capita = GDP_current_US / population) |> # Creating a column for GDP per capita by dividing the GDP by the population
  mutate(Pop_M = population / 1000000) # Simplifying the population (in millions)
The formula suggests that for every one-unit (one US dollar) increase in GDP_Per_capita for the United States, political_stability_estimate decreases by 0.00001384. The intercept, 1.101, would be the level of political_stability_estimate for the United States if GDP_Per_capita were 0.
According to the summary of the linear model, the p-value is 0.01807, which makes the relationship statistically significant at the conventional 0.05 level. The adjusted R-squared is 0.1937, which means that about 19% of the variation in political_stability_estimate is explained by GDP_Per_capita.
The formula suggests that for every one-unit (one US dollar) increase in GDP_Per_capita for Mexico, political_stability_estimate decreases by 0.00006061. The intercept, -0.05193, would be the level of political_stability_estimate for Mexico if GDP_Per_capita were 0.
According to the summary of the linear model, the p-value is 0.03582, which makes the relationship statistically significant at the conventional 0.05 level. The adjusted R-squared is 0.1481, which means that about 15% of the variation in political_stability_estimate is explained by GDP_Per_capita.
The formula suggests that for every one-unit (one US dollar) increase in GDP_Per_capita for Brazil, political_stability_estimate decreases by 0.00001315. The intercept, -0.1196, would be the level of political_stability_estimate for Brazil if GDP_Per_capita were 0.
According to the summary of the linear model, the p-value is 0.4369, which means the relationship is not statistically significant. The adjusted R-squared is -0.01649, which means that essentially none of the variation in political_stability_estimate is explained by GDP_Per_capita.
The formula suggests that for every one-unit (one US dollar) increase in GDP_Per_capita for Argentina, political_stability_estimate increases by 0.00004655. The intercept, -0.5023, would be the level of political_stability_estimate for Argentina if GDP_Per_capita were 0.
According to the summary of the linear model, the p-value is 0.0001198, which makes the relationship statistically significant at the conventional 0.05 level. The adjusted R-squared is 0.4741, which means that about 47% of the variation in political_stability_estimate is explained by GDP_Per_capita.
The formula suggests that for every one-unit (one US dollar) increase in GDP_Per_capita for Peru, political_stability_estimate increases by 0.0001181. The intercept, -1.2642489, would be the level of political_stability_estimate for Peru if GDP_Per_capita were 0.
According to the summary of the linear model, the p-value is 0.00009714, which makes the relationship statistically significant at the conventional 0.05 level. The adjusted R-squared is 0.4837, which means that about 48% of the variation in political_stability_estimate is explained by GDP_Per_capita.
The formula suggests that for every one-unit (one US dollar) increase in GDP_Per_capita for Colombia, political_stability_estimate increases by 0.0001814. The intercept, -2.3953149, would be the level of political_stability_estimate for Colombia if GDP_Per_capita were 0.
According to the summary of the linear model, the p-value is 0.0001634, which makes the relationship statistically significant at the conventional 0.05 level. The adjusted R-squared is 0.4596, which means that about 46% of the variation in political_stability_estimate is explained by GDP_Per_capita.
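The slope, intercept, p-value, and adjusted R-squared discussed for each country can all be read from one simple linear regression per country. The code below is a minimal sketch of that workflow, not the exact code behind the numbers above: the data frame `toy` and its values are made up for illustration, and only the column names mirror the ones used in this report.

```r
# Toy data: made-up values, NOT the World Bank figures used in this report.
set.seed(1)
toy <- data.frame(
  country = rep(c("A", "B"), each = 20),
  GDP_Per_capita = rep(seq(1000, 20000, length.out = 20), times = 2)
)
# Country A gets a negative slope, country B a positive one, plus noise.
toy$political_stability_estimate <- ifelse(
  toy$country == "A",
  1.0 - 0.00005 * toy$GDP_Per_capita,
  -1.5 + 0.00010 * toy$GDP_Per_capita
) + rnorm(nrow(toy), sd = 0.1)

# Fit one simple linear regression per country.
fits <- lapply(split(toy, toy$country), function(d) {
  lm(political_stability_estimate ~ GDP_Per_capita, data = d)
})

# Collect the quantities interpreted in the text into one table.
model_table <- do.call(rbind, lapply(names(fits), function(nm) {
  s <- summary(fits[[nm]])
  data.frame(
    country = nm,
    intercept = unname(coef(fits[[nm]])[1]),
    slope = unname(coef(fits[[nm]])[2]),
    slope_p_value = s$coefficients["GDP_Per_capita", "Pr(>|t|)"],
    adj_r_squared = s$adj.r.squared
  )
}))
print(model_table)
```

With a single predictor, the p-value of the slope is the same as the overall F-test p-value printed at the bottom of `summary()`, so either one can be reported.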
Facet Wrap of Linear Models
IL <- V_Countries |>
  ggplot(aes(x = GDP_Per_capita, y = political_stability_estimate)) +
  geom_point() +
  geom_smooth(color = 'skyblue2') +
  facet_wrap(~ country)
IL
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
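The message above shows that geom_smooth() fell back to its default loess smoother, so the curves in the facets are not the straight lines from the linear models. If the goal is to display the fitted regression lines, passing method = "lm" is one way to do it. The sketch below uses a made-up data frame (`toy`) rather than the World Bank data; only the column names match this report.

```r
library(ggplot2)

# Toy data: made-up values, not the World Bank figures.
set.seed(42)
toy <- data.frame(
  country = rep(c("A", "B", "C"), each = 15),
  GDP_Per_capita = rep(seq(2000, 30000, length.out = 15), times = 3)
)
toy$political_stability_estimate <-
  -1 + 0.00005 * toy$GDP_Per_capita + rnorm(nrow(toy), sd = 0.2)

# method = "lm" draws a least-squares line per facet instead of a loess curve.
lm_facets <- ggplot(toy, aes(x = GDP_Per_capita, y = political_stability_estimate)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, color = "skyblue2") +
  facet_wrap(~ country)
lm_facets
```

Because the smoother is fitted separately inside each facet, every panel shows its own slope and confidence band, which lines up with the per-country interpretations.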
Creating a Column For Year
V_Countries$year <- format(V_Countries$date, "%Y")
V_Countries$year <- as.numeric(as.character(V_Countries$year)) # Converting the data type to numeric because format() returns the year as a string.
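As a small illustration of why the conversion step is needed, the sketch below runs the same idea on a few made-up dates (`toy_dates` is a hypothetical vector, not part of the data set). It shows that format() hands back text and that a single conversion is enough to get a number.

```r
# Toy dates in the same first-of-January format as the date column.
toy_dates <- as.Date(c("1996-01-01", "2002-01-01", "2021-01-01"))

# format() returns character, so the year needs one conversion to be numeric.
year_chr <- format(toy_dates, "%Y")
year_int <- as.integer(year_chr)

print(class(year_chr))
print(year_int)
```

Since format() already returns a character vector, the extra as.character() call is harmless but not required. lubridate::year() is another option that returns the year as a number directly.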
Animated Bubble Chart
Land_CO <- V_Countries |>
  ggplot(aes(x = GDP_Per_capita, y = Pop_M, color = country, size = political_stability_estimate)) +
  geom_point() +
  scale_x_log10() +
  labs(title = 'Visualizing Global Trends: US vs Latin Countries Year: {frame_time}',
       x = 'GDP per Capita',
       y = 'Population (In Millions)',
       color = "Countries",
       size = "Political Stability",
       caption = "Source: World Bank Database") +
  transition_time(date) +
  ease_aes('linear') +
  theme_bw() + # Complete theme goes first so it does not reset the title size below
  theme(plot.title = element_text(size = 9)) +
  scale_color_manual(values = c("United States" = '#68228B', "Mexico" = '#EE2C2C', "Brazil" = '#76EE00', "Peru" = '#EE1289', "Colombia" = '#EE1', "Argentina" = '#00688B'))
# I had to comment out the code below, so I can render and upload to Rpubs.
# animation <- animate(Land_CO, height = 450, fps = 9, resolution = 1500, renderer = gifski_renderer())
# animation
# anim_save("Land_CO_Animated_Bubble_Chart.gif", animation = animation)
I created this chart to visualize the change in political stability from 1996 to 2021 for the selected countries. I also wanted to track GDP per capita and population over the same period to look for any correlation. What stood out to me was Colombia’s political instability, which is visible for the duration of the animation, along with how the United States’ political stability slowly decreases as its GDP per capita and population increase.
Creating Data for Stacked Bar Graph
Bar <- V_Countries |>
  filter(!is.na(agrucultural_land_percentage)) |> # Removing NAs to graph (bare column names, so the values are tested).
  filter(!is.na(forest_land_percentage)) |>
  filter(!is.na(land_area)) |>
  filter(year == 2021) |> # Focusing on latest available data.
  mutate(remaining_land_percentage = 100 - agrucultural_land_percentage - forest_land_percentage) |> # Getting the percentage of the remaining unlabeled land.
  mutate(Total_land = 100)
# Grouping the land types into one column
Bar_Long <- gather(Bar, key = "land_type", value = "percentage_land", 'agrucultural_land_percentage', 'forest_land_percentage', 'remaining_land_percentage')
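When filtering out missing values, it matters whether the column name inside is.na() is quoted. A quoted name is just a string, which is never NA, so the filter keeps every row; a bare name tests the column's values. The sketch below shows the difference on a made-up tibble (`toy_land` and its column names are hypothetical), and also reshapes the result with pivot_longer(), which tidyr now recommends over gather().

```r
library(dplyr)
library(tidyr)

# Toy data: made-up percentages, not World Bank values.
toy_land <- tibble(
  country = c("A", "B", "C"),
  agricultural_pct = c(40, NA, 25),
  forest_pct = c(30, 50, 60)
)

# Quoted name: is.na("agricultural_pct") tests a literal string, which is
# never NA, so every row is kept.
kept_quoted <- toy_land |> filter(!is.na("agricultural_pct"))

# Bare name: tests the column values, so country B is dropped.
kept_bare <- toy_land |> filter(!is.na(agricultural_pct))

# Reshape to one row per country and land type for a stacked bar chart.
toy_long <- kept_bare |>
  mutate(remaining_pct = 100 - agricultural_pct - forest_pct) |>
  pivot_longer(
    cols = c(agricultural_pct, forest_pct, remaining_pct),
    names_to = "land_type",
    values_to = "percentage_land"
  )
print(toy_long)
```

Each country's three shares sum to 100 by construction, which is what lets the stacked bars reach the same height.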
Interactive Stacked Bar Graph
Stacked <- Bar_Long |>
  ggplot(aes(x = country, y = percentage_land, fill = land_type,
             text = paste0("Country: ", country, "<br>",
                           "Percentage Used: ", round(percentage_land, 2), "<br>",
                           "Type Of Land: ", land_type, "<br>",
                           "Population(Millions): ", round(population/1e6, 1), "<br>",
                           "Land Area (KM^2): ", land_area, "<br>"))) +
  geom_bar(stat = "identity") +
  labs(title = "Land Use Distribution (%): Agricultural vs. Forest vs. Other Land (2021)",
       x = "Countries \n Source: World Bank Database",
       y = "Total Land(%)",
       fill = "Use Of Land",
       caption = "Source: World Bank Database") +
  theme_dark() +
  scale_fill_manual(values = c('yellow', 'orange', 'red'))
ggplotly(Stacked, tooltip = "text")
I created this stacked bar chart to compare how the countries’ land is distributed. What caught my eye was how Peru, Brazil, and Colombia are mostly forest. After doing some background research, I found that this is because the Amazon rainforest extends into all three countries. Something else that caught my eye was the high share of agricultural land in the United States. According to the USDA, Mexico and the United States are agricultural powerhouses. The stacked bar graph gives me a glimpse into these countries’ geography.
Line Plot
Line_P <- V_Countries |>
  filter(!is.na(intentional_homicides))
F_plot <- Line_P |>
  ggplot(aes(x = year, y = intentional_homicides, color = country)) +
  geom_line(size = 1) +
  labs(x = "Year",
       y = "Intentional Homicides (Per 100,000 people)",
       color = "Countries",
       caption = "Source: World Bank Database",
       title = "Intentional Homicides Over Time \n (1996 - 2021)") +
  theme_bw() +
  scale_color_brewer(palette = "Accent")
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
ggplotly(F_plot)
The line graph shows the trend in intentional homicides in the US and the Latin American countries from 1996 to 2021. I used the intentional homicides variable for the y-axis and the year variable for the x-axis. From what I can tell from the plot, the Latin American countries tend to have higher intentional homicide rates than the United States.
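A custom tooltip on a line plot like this one usually needs an explicit group aesthetic. When a per-row text aesthetic is added, ggplot2 tends to treat every row as its own group, and a line cannot be drawn through a single point, so the lines vanish. The sketch below is one possible fix under that assumption, shown on a made-up data frame (`toy`) with the same column names as this report; it is not the code used for the plot above.

```r
library(ggplot2)
library(plotly)

# Toy data: made-up values, not the World Bank figures.
toy <- data.frame(
  country = rep(c("A", "B"), each = 4),
  year = rep(2018:2021, times = 2),
  intentional_homicides = c(5.0, 5.2, 6.4, 6.8, 25.1, 24.3, 22.9, 26.0),
  population = c(320e6, 322e6, 324e6, 326e6, 48e6, 49e6, 50e6, 51e6)
)

tooltip_plot <- ggplot(toy, aes(
  x = year, y = intentional_homicides, color = country,
  group = country, # keeps one line per country despite the per-row text
  text = paste0(
    "Country: ", country, "<br>",
    "Year: ", year, "<br>",
    "Homicides per 100,000: ", intentional_homicides, "<br>",
    "Population (Millions): ", round(population / 1e6, 1)
  )
)) +
  geom_line(linewidth = 1) +
  theme_bw()

ggplotly(tooltip_plot, tooltip = "text")
```

The sketch also uses linewidth instead of size for the line thickness, as the deprecation warning above recommends.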
Essay
The visualizations I created illustrate the differences between the United States and Latin American countries in political stability, land distribution, and intentional homicides. In the animated bubble chart, two patterns caught my attention: the slow decrease in the United States’ political stability and the persistent instability of Colombia. The United States, despite having a high GDP per capita, has experienced a gradual decline in political stability. This trend may be due to income inequality, political polarization, and increased social unrest (IMF). In contrast, Colombia’s political instability has been more severe. A major factor has been widespread political corruption, in which many politicians were found to have collaborated with paramilitary groups. This scandal significantly undermined public trust in political institutions and highlights the issues affecting Colombia’s governance (Gillin).
The stacked bar graph shows land distribution. The high percentage of forest in Brazil, Peru, and Colombia is due to the Amazon rainforest found in those countries. In contrast, land in the United States and Mexico is mostly agricultural, making them agricultural powerhouses. Both countries are experiencing population growth and actively participate in the global trade market.
Finally, I created a line graph to see homicides over time, and what caught my eye at once was the high intentional homicide rate for Colombia in 2002. After conducting some background research, I found that the rate was high in 2002 due to a drug war and conflict with the guerrilla group FARC (Vallejo). In addition, Mexico’s increase from 2007 all the way to 2021 made me question what was going on at that time. I found that this was due to the fragmentation of criminal organizations following the Mexican government’s intensified anti-drug efforts. This disruption created power vacuums that resulted in more violence, as groups competed for control over territories and illicit markets (dlewis). The contrast in homicide rates between the Latin American countries and the United States is astonishing, and the high level of drug-related and criminal activity in these countries contributes to much of the difference.
In conclusion, while creating these visualizations, I gained valuable insights into the stark contrasts between the United States and Latin American countries in terms of political stability, land use, and intentional homicides. I had hoped to explore Nicaragua further and include variables like the “Doing Business” index, but unfortunately, the data was too limited. I also tried to add population to the tooltip for the line graph, but doing so caused the graph to disappear. Despite these challenges, this project has pushed me to see what else I am capable of.
Bibliography
“How Does Political Instability Affect Economic Growth?” IMF, 2011, www.imf.org/en/Publications/WP/Issues/2016/12/31/How-Does-Political-Instability-Affect-Economic-Growth-24570. Accessed 16 May 2025.
Gillin, Joel. “Understanding the Causes of Colombia’s Conflict: Weak, Corrupt State Institutions.” Colombia Reports, 13 Jan. 2015, colombiareports.com/understanding-colombias-conflict-weak-corrupt-state-institutions/.
Vallejo, Katherine, et al. “Trends of Rural/Urban Homicide in Colombia, 1992-2015: Internal Armed Conflict and Hints for Postconflict.” BioMed Research International, vol. 2018, Oct. 2018, pp. 1–11, https://doi.org/10.1155/2018/6120909.
dlewis. “Is Mexico Becoming More Peaceful? Mexico Peace Index 2021.” Vision of Humanity, June 2021, www.visionofhumanity.org/why-is-mexico-becoming-more-peaceful/. Accessed 15 May 2025.