Project_3

Author

Jonathan RH

The topic for my project is “USA vs. Latin America: Statistical Overview of Global Development Indicators.” The data set contains World Development Indicators extracted from the World Bank Database (WBD) for 1960 through 2021. The WBD is an online platform that holds a vast amount of global economic, social, and environmental data for countries and regions around the world. The data set I chose has information about 268 countries, with 17,272 observations and 50 variables: 48 are numerical, one is a character column (country), and one is a date. The variables I will be using are as follows: country, date, land type, GDP, population, political stability estimate, and intentional homicides per 100,000 people. I also created two more variables to help me analyze this data further: GDP_Per_capita, computed by dividing GDP by population, and year, extracted from the date column. I will use these variables to answer the following questions:

Does GDP per capita affect political stability?

What do the USA and other Latin countries use their land for?

Once everything was planned out, I cleaned the data set by using filter() to keep the countries I wanted to work with and to remove NAs from the columns I plot. I also used mutate() to create a column for GDP_Per_capita, and I created the year column from the date.

I chose this topic because I wanted to see how Latin countries compare to the United States. As a US-born Nicaraguan and Latino male who has been in the US for the entirety of my life, I wanted to see what I might learn from this assignment about other Latin countries. I wish I could have explored Nicaragua in this assignment, but thanks to this class, I can now do it in my free time. I picked this data set because I have recently discovered my interest in indicators that affect the world.

Loading Libraries

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(gganimate)
library(gifski)
library(plotly)


Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout

Bring in the dataset

WBDI <- read_csv("world_bank_development_indicators.csv")

Rows: 17272 Columns: 50
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr   (1): country
dbl  (48): agricultural_land%, forest_land%, land_area, avg_precipitation, t...
date  (1): date

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Changing the name of the columns

colnames(WBDI)[3] <- 'agrucultural_land_percentage'
colnames(WBDI)[4] <- 'forest_land_percentage'

head(WBDI) # Checking the first 6 rows

# A tibble: 6 × 50
  country     date       agrucultural_land_pe…¹ forest_land_percentage land_area
  <chr>       <date>                      <dbl>                  <dbl>     <dbl>
1 Afghanistan 1960-01-01                   NA                       NA        NA
2 Afghanistan 1961-01-01                   57.9                     NA    652230
3 Afghanistan 1962-01-01                   58.0                     NA    652230
4 Afghanistan 1963-01-01                   58.0                     NA    652230
5 Afghanistan 1964-01-01                   58.1                     NA    652230
6 Afghanistan 1965-01-01                   58.1                     NA    652230
# ℹ abbreviated name: ¹agrucultural_land_percentage
# ℹ 45 more variables: avg_precipitation <dbl>, `trade_in_services%` <dbl>,
#   control_of_corruption_estimate <dbl>, control_of_corruption_std <dbl>,
#   `access_to_electricity%` <dbl>, `renewvable_energy_consumption%` <dbl>,
#   electric_power_consumption <dbl>, CO2_emisions <dbl>,
#   other_greenhouse_emisions <dbl>, population_density <dbl>,
#   `inflation_annual%` <dbl>, real_interest_rate <dbl>, …

Cleaning and Filtering data to Desired Countries

V_Countries <- WBDI |>
  filter(country %in% c("Brazil", "Argentina", "Colombia", "Peru", "Mexico", "United States")) |>
  filter(!is.na(political_stability_estimate)) |>
  mutate(GDP_Per_capita = GDP_current_US/population) |> # Creating the GDP_Per_capita column by dividing GDP by population
  mutate(Pop_M = population/1000000) # Expressing the population in millions

United States Linear Model

US_data <- V_Countries |>
  filter(country=="United States") 

cor(US_data$political_stability_estimate, US_data$GDP_Per_capita)

[1] -0.4783129

US_LM <- lm(political_stability_estimate~ GDP_Per_capita, data = US_data)

US_LM


Call:
lm(formula = political_stability_estimate ~ GDP_Per_capita, data = US_data)

Coefficients:
   (Intercept)  GDP_Per_capita  
     1.101e+00      -1.384e-05

summary(US_LM)


Call:
lm(formula = political_stability_estimate ~ GDP_Per_capita, data = US_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.75681 -0.09775  0.02043  0.23584  0.48440 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)     1.101e+00  2.832e-01   3.889 0.000791 ***
GDP_Per_capita -1.384e-05  5.416e-06  -2.555 0.018066 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3049 on 22 degrees of freedom
Multiple R-squared:  0.2288,    Adjusted R-squared:  0.1937 
F-statistic: 6.526 on 1 and 22 DF,  p-value: 0.01807

The correlation is -0.48, which shows a moderate negative relationship.

The formula is:

political_stability_estimate = -0.00001384(GDP_Per_capita) + 1.101

The formula suggests that for every one-dollar increase in GDP_Per_capita for the United States, political_stability_estimate decreases by 0.00001384. The intercept, 1.101, would be the level of political_stability_estimate for the United States if GDP_Per_capita were 0.

According to the summary of the linear model, the p-value is 0.01807, which makes the relationship statistically significant at the 0.05 level. The adjusted R-squared is 0.1937, which means that about 19% of the variation in political_stability_estimate can be explained by GDP_Per_capita.
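The numbers above can be cross-checked by hand. The sketch below reuses only values printed in the United States output; the variable names and the 50,000 GDP per capita used for the prediction are hypothetical and chosen purely for illustration. In a regression with a single predictor, the squared correlation equals the multiple R-squared.

```r
# Values copied from the United States output above
r         <- -0.4783129   # correlation
slope     <- -1.384e-05   # GDP_Per_capita coefficient
intercept <-  1.101       # (Intercept)

# With one predictor, R-squared is the squared correlation
r_squared <- r^2
round(r_squared, 4)       # 0.2288, matching "Multiple R-squared"

# The slope is easier to read per 1,000 dollars of GDP per capita
slope_per_1000 <- slope * 1000
slope_per_1000

# Predicted stability at a hypothetical GDP per capita of 50,000
predicted <- intercept + slope * 50000
predicted
```

Rescaling the slope does not change the fit; it only makes the size of the effect easier to read.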

Mexico Linear Model

MEX_data <- V_Countries |>
  filter(country=="Mexico") 

cor(MEX_data$political_stability_estimate, MEX_data$GDP_Per_capita)

[1] -0.4303164

MEX_LM <- lm(political_stability_estimate~ GDP_Per_capita, data = MEX_data)

MEX_LM


Call:
lm(formula = political_stability_estimate ~ GDP_Per_capita, data = MEX_data)

Coefficients:
   (Intercept)  GDP_Per_capita  
    -5.193e-02      -6.061e-05

summary(MEX_LM)


Call:
lm(formula = political_stability_estimate ~ GDP_Per_capita, data = MEX_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.58756 -0.11265 -0.01161  0.04914  0.48047 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)  
(Intercept)    -5.193e-02  2.566e-01  -0.202   0.8415  
GDP_Per_capita -6.061e-05  2.711e-05  -2.236   0.0358 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2246 on 22 degrees of freedom
Multiple R-squared:  0.1852,    Adjusted R-squared:  0.1481 
F-statistic:     5 on 1 and 22 DF,  p-value: 0.03582

The correlation is -0.43, which shows a moderate negative relationship.

The formula is:

political_stability_estimate = -0.00006061(GDP_Per_capita) - 0.05193

The formula suggests that for every one-dollar increase in GDP_Per_capita for Mexico, political_stability_estimate decreases by 0.00006061. The intercept, -0.05193, would be the level of political_stability_estimate for Mexico if GDP_Per_capita were 0.

According to the summary of the linear model, the p-value is 0.03582, which makes the relationship statistically significant at the 0.05 level. The adjusted R-squared is 0.1481, which means that about 15% of the variation in political_stability_estimate can be explained by GDP_Per_capita.

Brazil Linear Model

BRA_data <- V_Countries |>
  filter(country=="Brazil") 

cor(BRA_data$political_stability_estimate, BRA_data$GDP_Per_capita)

[1] -0.1664585

BRA_LM <- lm(political_stability_estimate~ GDP_Per_capita, data = BRA_data)

BRA_LM


Call:
lm(formula = political_stability_estimate ~ GDP_Per_capita, data = BRA_data)

Coefficients:
   (Intercept)  GDP_Per_capita  
    -1.196e-01      -1.315e-05

summary(BRA_LM)


Call:
lm(formula = political_stability_estimate ~ GDP_Per_capita, data = BRA_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.47849 -0.14885 -0.08782  0.17501  0.48454 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)    -1.196e-01  1.400e-01  -0.854    0.402
GDP_Per_capita -1.315e-05  1.661e-05  -0.792    0.437

Residual standard error: 0.2471 on 22 degrees of freedom
Multiple R-squared:  0.02771,   Adjusted R-squared:  -0.01649 
F-statistic: 0.627 on 1 and 22 DF,  p-value: 0.4369

The correlation is -0.17, which shows a weak negative relationship.

The formula is:

political_stability_estimate = -0.00001315(GDP_Per_capita) - 0.1196

The formula suggests that for every one-dollar increase in GDP_Per_capita for Brazil, political_stability_estimate decreases by 0.00001315. The intercept, -0.1196, would be the level of political_stability_estimate for Brazil if GDP_Per_capita were 0.

According to the summary of the linear model, the p-value is 0.4369, which means the relationship is not statistically significant. The adjusted R-squared is -0.01649, which means that essentially none of the variation in political_stability_estimate can be explained by GDP_Per_capita.
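A negative adjusted R-squared can look like a mistake, so here is a short sketch of where it comes from. It uses the standard adjustment formula and the multiple R-squared printed above. The sample size of 24 is inferred from the 22 residual degrees of freedom plus two estimated coefficients, and the variable names are hypothetical.

```r
r_squared <- 0.02771   # Multiple R-squared from the Brazil summary
n <- 24                # inferred: 22 residual df + 2 coefficients
p <- 1                 # one predictor (GDP_Per_capita)

# Adjusted R-squared penalizes the fit for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
round(adj_r_squared, 4)   # -0.0165
```

When the multiple R-squared is very close to zero, the penalty pushes the adjusted value slightly below zero, which is read as "no explanatory power" rather than as a negative percentage.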

Argentina Linear Model

AR_data <- V_Countries |>
  filter(country=="Argentina") 

cor(AR_data$political_stability_estimate, AR_data$GDP_Per_capita)

[1] 0.7049632

AR_LM <- lm(political_stability_estimate~ GDP_Per_capita, data = AR_data)

AR_LM


Call:
lm(formula = political_stability_estimate ~ GDP_Per_capita, data = AR_data)

Coefficients:
   (Intercept)  GDP_Per_capita  
    -5.023e-01       4.655e-05

summary(AR_LM)


Call:
lm(formula = political_stability_estimate ~ GDP_Per_capita, data = AR_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.39436 -0.06880 -0.01411  0.07347  0.26439 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)    -5.023e-01  1.004e-01  -5.002 5.25e-05 ***
GDP_Per_capita  4.655e-05  9.986e-06   4.662  0.00012 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1693 on 22 degrees of freedom
Multiple R-squared:  0.497, Adjusted R-squared:  0.4741 
F-statistic: 21.74 on 1 and 22 DF,  p-value: 0.0001198

The correlation is 0.70, which shows a strong positive relationship.

The formula is:

political_stability_estimate = 0.00004655(GDP_Per_capita) - 0.5023

The formula suggests that for every one-dollar increase in GDP_Per_capita for Argentina, political_stability_estimate increases by 0.00004655. The intercept, -0.5023, would be the level of political_stability_estimate for Argentina if GDP_Per_capita were 0.

According to the summary of the linear model, the p-value is 0.0001198, which makes the relationship statistically significant. The adjusted R-squared is 0.4741, which means that about 47% of the variation in political_stability_estimate can be explained by GDP_Per_capita.

Peru Linear Model

PE_data <- V_Countries |>
  filter(country=="Peru") 

cor(PE_data$political_stability_estimate, PE_data$GDP_Per_capita)

[1] 0.7114107

PE_LM <- lm(political_stability_estimate~ GDP_Per_capita, data = PE_data)

PE_LM


Call:
lm(formula = political_stability_estimate ~ GDP_Per_capita, data = PE_data)

Coefficients:
   (Intercept)  GDP_Per_capita  
    -1.2642489       0.0001181

summary(PE_LM)


Call:
lm(formula = political_stability_estimate ~ GDP_Per_capita, data = PE_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.40760 -0.12905  0.01211  0.14464  0.48963 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)    -1.264e+00  1.275e-01  -9.919 1.40e-09 ***
GDP_Per_capita  1.180e-04  2.486e-05   4.748 9.71e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2354 on 22 degrees of freedom
Multiple R-squared:  0.5061,    Adjusted R-squared:  0.4837 
F-statistic: 22.54 on 1 and 22 DF,  p-value: 9.714e-05

The correlation is 0.71, which shows a strong positive relationship.

The formula is:

political_stability_estimate = 0.0001181(GDP_Per_capita) - 1.2642489

The formula suggests that for every one-dollar increase in GDP_Per_capita for Peru, political_stability_estimate increases by 0.0001181. The intercept, -1.2642489, would be the level of political_stability_estimate for Peru if GDP_Per_capita were 0.

According to the summary of the linear model, the p-value is 0.00009714, which makes the relationship statistically significant. The adjusted R-squared is 0.4837, which means that about 48% of the variation in political_stability_estimate can be explained by GDP_Per_capita.

Colombia Linear Model

CO_data <- V_Countries |>
  filter(country=="Colombia") 

cor(CO_data$political_stability_estimate, CO_data$GDP_Per_capita)

[1] 0.6950712

CO_LM <- lm(political_stability_estimate~ GDP_Per_capita, data = CO_data)

CO_LM


Call:
lm(formula = political_stability_estimate ~ GDP_Per_capita, data = CO_data)

Coefficients:
   (Intercept)  GDP_Per_capita  
    -2.3953149       0.0001814

summary(CO_LM)


Call:
lm(formula = political_stability_estimate ~ GDP_Per_capita, data = CO_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.47537 -0.32269 -0.09816  0.32043  0.73189 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)    -2.395e+00  2.245e-01 -10.670 3.66e-10 ***
GDP_Per_capita  1.814e-04  4.001e-05   4.535 0.000163 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3804 on 22 degrees of freedom
Multiple R-squared:  0.4831,    Adjusted R-squared:  0.4596 
F-statistic: 20.56 on 1 and 22 DF,  p-value: 0.0001634

The correlation is 0.70, which shows a strong positive relationship.

The formula is:

political_stability_estimate = 0.0001814(GDP_Per_capita) - 2.3953149

The formula suggests that for every one-dollar increase in GDP_Per_capita for Colombia, political_stability_estimate increases by 0.0001814. The intercept, -2.3953149, would be the level of political_stability_estimate for Colombia if GDP_Per_capita were 0.

According to the summary of the linear model, the p-value is 0.0001634, which makes the relationship statistically significant. The adjusted R-squared is 0.4596, which means that about 46% of the variation in political_stability_estimate can be explained by GDP_Per_capita.
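The six models above repeat the same steps for each country. As a possible shortcut, the sketch below fits one model per country in a loop and collects the key numbers in a single table. The data frame `toy` is made-up data standing in for V_Countries, and `fit_one` is a hypothetical helper name, so the printed values will not match the real results.

```r
# Made-up data standing in for V_Countries (two fictional countries)
set.seed(1)
toy <- data.frame(
  country = rep(c("A", "B"), each = 10),
  GDP_Per_capita = c(seq(1000, 10000, length.out = 10),
                     seq(20000, 60000, length.out = 10))
)
toy$political_stability_estimate <-
  ifelse(toy$country == "A",
         -1 + 1e-4 * toy$GDP_Per_capita,   # stability rises with GDP per capita
          1 - 1e-5 * toy$GDP_Per_capita) + # stability falls with GDP per capita
  rnorm(20, sd = 0.02)

# Hypothetical helper: fit one country's model and return a one-row summary
fit_one <- function(d) {
  fit <- lm(political_stability_estimate ~ GDP_Per_capita, data = d)
  s <- summary(fit)
  data.frame(
    country       = d$country[1],
    slope         = unname(coef(fit)["GDP_Per_capita"]),
    r             = cor(d$political_stability_estimate, d$GDP_Per_capita),
    adj_r_squared = s$adj.r.squared,
    p_value       = s$coefficients["GDP_Per_capita", "Pr(>|t|)"]
  )
}

# Split by country, fit each piece, and stack the rows
results <- do.call(rbind, lapply(split(toy, toy$country), fit_one))
results
```

With the real data, replacing `toy` with V_Countries should give one row per country, which makes the sign change between the United States, Mexico, and Brazil on one side and Argentina, Peru, and Colombia on the other easier to see at a glance.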

Facet Wrap of Linear Models

IL <- V_Countries |>
  ggplot(aes(x = GDP_Per_capita, y = political_stability_estimate)) +
  geom_point() +
  geom_smooth(method = "lm", color = 'skyblue2') + # Linear fit, to match the lm() models above
  facet_wrap(~ country)
IL

`geom_smooth()` using formula = 'y ~ x'

Creating a Column For Year

V_Countries$year <- format(V_Countries$date, "%Y")
V_Countries$year <- as.numeric(V_Countries$year) # Converting to numeric because format() returns the year as a string.
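As a small illustration of why the conversion is needed, the sketch below runs the same two steps on two example dates. The vector names are hypothetical.

```r
# Two example dates in the same format as the date column
dates <- as.Date(c("1996-01-01", "2021-01-01"))

# format() returns character strings, not numbers
years_chr <- format(dates, "%Y")
class(years_chr)

# Converting lets the year be used in comparisons and on a numeric axis
years <- as.integer(years_chr)
years
```

Since the tidyverse already loads lubridate, `year(dates)` would be another way to get a numeric year in one step.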

Animated Bubble Chart

Land_CO <- V_Countries |>
  ggplot(aes(x = GDP_Per_capita, y = Pop_M, color = country, size = political_stability_estimate)) +
  geom_point() +
  scale_x_log10() +
  labs(title = 'Visualizing Global Trends: US vs Latin Countries  Year: {frame_time}', 
       x = 'GDP per Capita', 
       y = 'Population (In Millions)',
       color = "Countries",
       size = "Political Stability",
       caption = "Source: World Bank Database") +
  transition_time(date) +
  ease_aes('linear') +
  theme_bw() + # Applying the complete theme first so it does not overwrite the title size below
  theme(plot.title = element_text(size = 9)) +
  scale_color_manual(values = c("United States" = '#68228B',
  "Mexico" = '#EE2C2C',
  "Brazil" = '#76EE00',
  "Peru" = '#EE1289',
  "Colombia" = '#EEEE11', # Six-digit form of the shorthand '#EE1'
  "Argentina" = '#00688B'))

#I had to comment out the code below, so I can render and upload to Rpubs.

#animation <- animate(Land_CO, height = 450, fps = 9, resolution = 1500, renderer = gifski_renderer())
#animation
#anim_save("Land_CO_Animated_Bubble_Chart.gif", animation = animation)

I created this chart to visualize the change in political stability from 1996 to 2021 for the selected countries. I also wanted to see how GDP per capita and population changed over the years and whether they move together with political stability. What stood out to me was the political instability in Colombia, which is visible for the duration of the animation, along with how the United States’ political stability slowly decreases as GDP per capita and population increase.

Creating Data for Stacked Bar Graph

Bar <- V_Countries |>
  filter(!is.na(agrucultural_land_percentage)) |> # Removing NAs to graph. The column name is unquoted so is.na() tests the column values, not a string.
  filter(!is.na(forest_land_percentage)) |>
  filter(!is.na(land_area)) |>
  filter(year == 2021) |> # Focusing on the latest available data.
  mutate(remaining_land_percentage = 100 - agrucultural_land_percentage - forest_land_percentage) |> # Getting the percentage of the remaining unlabeled land.
  mutate(Total_land = 100) 

#Grouping the land types into one column
Bar_Long <- gather(Bar, key = "land_type", value= "percentage_land", 'agrucultural_land_percentage','forest_land_percentage', 'remaining_land_percentage')
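The tidyr documentation describes gather() as superseded by pivot_longer(). As a sketch of the same reshaping step with the newer function, the example below uses a small made-up data frame, `toy_bar`, in place of Bar. Only the column names are taken from the code above, and the percentages are invented.

```r
library(tidyr)

# Made-up land percentages for two fictional countries
toy_bar <- data.frame(
  country = c("A", "B"),
  agrucultural_land_percentage = c(40, 30),
  forest_land_percentage       = c(35, 60),
  remaining_land_percentage    = c(25, 10)
)

# Stack the three percentage columns into land_type / percentage_land pairs
toy_long <- pivot_longer(
  toy_bar,
  cols = c(agrucultural_land_percentage,
           forest_land_percentage,
           remaining_land_percentage),
  names_to  = "land_type",
  values_to = "percentage_land"
)
toy_long
```

Each country ends up with three rows, one per land type, which is the long layout the stacked bar chart needs for the fill aesthetic.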

Interactive Stacked Bar Graph

Stacked <- Bar_Long |>
  ggplot(aes(x = country, 
             y = percentage_land, 
             fill = land_type,
             text = paste0("Country: ", country, "<br>",
             "Percentage Used: ", round(percentage_land, 2), "<br>",
             "Type Of Land: ", land_type, "<br>",
             "Population(Millions): ", round(population/1e6, 1), "<br>",
             "Land Area (KM^2): ", land_area, "<br>"
               ))) +
  geom_bar(stat = "identity") +
  labs(title = "Land Use Distribution (%): Agricultural vs. Forest vs. Other Land (2021)",
       x = "Countries \n
       Source: World Bank Database",
       y = "Total Land(%)",
       fill = "Use Of Land",
       caption = "Source: World Bank Database") +
  theme_dark()+
  scale_fill_manual(values = c('yellow','orange','red'))

ggplotly(Stacked, tooltip = "text")

I created this stacked bar chart to compare how the countries’ land is distributed. What caught my eye was how Peru, Brazil, and Colombia are mostly forest. After doing some background research, I found that this is because the Amazon rainforest covers part of each of these three Latin countries. Something else that caught my eye was the high share of agricultural land in the United States. According to the USDA, Mexico and the United States are agricultural powerhouses. The stacked bar graph gives me a glimpse into these countries’ geography.

Line Plot

Line_P <- V_Countries |>
  filter(!is.na(intentional_homicides))

F_plot <- Line_P |>
  ggplot(aes(x = year,
         y = intentional_homicides,
         color = country)) +
  geom_line(linewidth = 1) + # linewidth replaces size for lines as of ggplot2 3.4.0
  labs(x = "Year",
       y = "Intentional Homicides (Per 100,000 people)",
       color = "Countries",
       caption = "Source: World Bank Database",
       title = "Intentional Homicides Over Time \n (1996 - 2021)")+
  theme_bw() +
  scale_color_brewer(palette = "Accent")

ggplotly(F_plot)

The line graph that I created shows the trend in intentional homicides in the US and Latin countries from 1996 to 2021. I used the intentional homicides variable for the y-axis and the year variable for the x-axis. From what I can tell from the plot, the Latin countries tend to have higher intentional homicide rates than the United States.

Essay

The visualizations I created show the differences between the United States and Latin countries in political stability, land distribution, and intentional homicides. In the animated bubble chart, two patterns that caught my attention were the slow decrease in the United States’ political stability and the persistent instability of Colombia. The United States, despite having a high GDP per capita, has experienced a gradual decline in political stability. This trend may be due to income inequality, political polarization, and increased social unrest (IMF). In contrast, Colombia’s political instability has been more severe. A major factor has been widespread political corruption, where many politicians were found to have collaborated with paramilitary groups. This scandal significantly undermined public trust in political institutions and highlights the issues that are affecting Colombia’s governance (Gillin).

The stacked bar graph shows land distribution. The high percentage of forest in Brazil, Peru, and Colombia is due to the Amazon Rainforest found in those countries. The United States and Mexico, by contrast, have land that is mostly agricultural, making them agricultural powerhouses. Both countries are experiencing population growth and actively participate in the global trade market.

Finally, I created a line graph to see homicides over time, and what caught my eye at once was the high intentional homicide rate for Colombia in 2002. After conducting some background research, I found out the homicide rate was high in 2002 due to a drug war and conflict with the guerrilla group FARC (Vallejo). In addition, Mexico’s increase from 2007 all the way to 2021 made me question what was going on at that time. I found that this was due to the fragmentation of criminal organizations following the Mexican government’s intensified anti-drug efforts. This disruption created power vacuums resulting in more violence, with groups competing for control over territories and illicit markets (dlewis). The contrast in homicide rates between the Latin countries and the United States is astonishing, and the high level of drug-related and criminal activity in these Latin countries appears to be a major contributor.

In conclusion, while creating these visualizations, I gained valuable insights into the stark contrasts between the United States and Latin American countries in terms of political stability, land use, and intentional homicides. I had hoped to explore Nicaragua further and include variables like the “Doing Business” index, but unfortunately, the data was too limited. I also tried to add population to the tooltip for the line graph, but doing so caused the graph to disappear. Despite these challenges, this project has pushed me to see what else I am capable of.
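On the tooltip problem mentioned above: a common cause of lines disappearing after a text aesthetic is added is that ggplot2 then treats every row as its own group, so there is nothing to connect. I have not confirmed that this is what happened in this project, so the sketch below is only one possible fix. It adds an explicit group = country, and `toy` is made-up data standing in for Line_P.

```r
library(ggplot2)
library(plotly)

# Made-up data standing in for Line_P (two fictional countries)
toy <- data.frame(
  country = rep(c("A", "B"), each = 3),
  year = rep(2019:2021, times = 2),
  intentional_homicides = c(5.0, 5.2, 6.1, 25.0, 24.1, 23.5),
  population = c(3.0e8, 3.1e8, 3.2e8, 5.0e7, 5.1e7, 5.2e7)
)

p <- ggplot(toy, aes(x = year,
                     y = intentional_homicides,
                     color = country,
                     group = country, # keeps each country's points connected
                     text = paste0("Country: ", country, "<br>",
                                   "Population (Millions): ",
                                   round(population / 1e6, 1)))) +
  geom_line(linewidth = 1)

# Show only the custom text in the hover tooltip
ggplotly(p, tooltip = "text")
```

If the lines still do not appear, checking `layer_data(p)$group` shows how many groups ggplot2 actually built.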

Bibliography

“How Does Political Instability Affect Economic Growth?” IMF, 2011, www.imf.org/en/Publications/WP/Issues/2016/12/31/How-Does-Political-Instability-Affect-Economic-Growth-24570. Accessed 16 May 2025.

Gillin, Joel. “Understanding the Causes of Colombia’s Conflict: Weak, Corrupt State Institutions.” Colombia News | Colombia Reports, 13 Jan. 2015, colombiareports.com/understanding-colombias-conflict-weak-corrupt-state-institutions/.

Vallejo, Katherine, et al. “Trends of Rural/Urban Homicide in Colombia, 1992-2015: Internal Armed Conflict and Hints for Postconflict.” BioMed Research International, vol. 2018, Oct. 2018, pp. 1–11, https://doi.org/10.1155/2018/6120909.

dlewis. “Is Mexico Becoming More Peaceful? Mexico Peace Index 2021.” Vision of Humanity, June 2021, www.visionofhumanity.org/why-is-mexico-becoming-more-peaceful/. Accessed 15 May 2025.

Sources

gather() code : https://tidyr.tidyverse.org/reference/gather.html

Animated bubble chart: https://r-graph-gallery.com/package/gganimate.html

Rendering the animated plot: https://gganimate.com/reference/animate.html