Project 2: World GDP and Population Growth, 2012-2016
Author
Michael Desir
Visualizing Growth in Population and GDP for 31 Countries, 2012-2016
What is this all about?
A country’s place on the world stage depends largely on what it produces and how consistently it can do so. One popular measure of production is a nation’s gross domestic product (GDP going forward). This is a comprehensive measure of the value of all final goods and services produced within a country’s borders over a given period. It includes personal consumption, business investment, government spending and net exports. In sum, GDP measures how much a country brings to the table. This project makes use of the following variables from a World Bank World Economic Indicators dataset:
year: Ranging between 2012 and 2016
country_name: From a group of 31 countries
country_code: A nation’s unique 3-letter ISO3C designation
region: World Bank regional classifications
population_total_millions: A nation’s population, divided by 1 million
gdp_current_us_dollars_billions: A nation’s GDP in current (nominal) US dollars, in billions
military_spending_gdp_percentage: The percentage of a nation’s GDP that is attributed to military spending
Dataset: World Bank World Economic Indicators, https://databank.worldbank.org/embed/Population-and-GDP-by-Country/id/29c4df41
This dataset required mild cleaning. The column names needed to be stripped of economic codes and simplified for processing. To add a regional classification for each country, I merged data from the Nations dataset (also from the World Bank) using a left join that matched every row for each country in my dataset to the region it belongs to according to the Nations dataset. I then deleted the collective (aggregate) rows from my dataset so I could focus on individual countries. Because my dataset came in with metadata in the first and second columns, the data columns were read in as text, so after cleaning I had to convert them to numeric. After that, I could start designing models.
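The join and type conversion described above can be sketched in base R as follows. This is a minimal illustration, not the code used in the project: the two small data frames are hypothetical stand-ins for the real tables, the "World" row is an assumed example of a collective entry, and dropping rows with no matched region is only one possible way to remove aggregates.

```r
# Hypothetical miniature of the GDP table; values arrive as text.
gdp <- data.frame(
  country_name = c("China", "India", "United States", "World"),
  country_code = c("CHN", "IND", "USA", "WLD"),
  population_total_millions = c("1354.2", "1274.5", "313.9", NA),
  stringsAsFactors = FALSE
)

# Hypothetical miniature of the Nations table (one row per country-year).
nations <- data.frame(
  iso3c  = c("CHN", "CHN", "IND", "USA"),
  year   = c(2012, 2013, 2012, 2012),
  region = c("East Asia & Pacific", "East Asia & Pacific",
             "South Asia", "North America"),
  stringsAsFactors = FALSE
)

# Keep one region per country so the join does not duplicate rows.
region_lookup <- unique(nations[, c("iso3c", "region")])

# Left join: every row of gdp is kept; unmatched codes get region = NA.
joined <- merge(gdp, region_lookup,
                by.x = "country_code", by.y = "iso3c", all.x = TRUE)

# Assumption: aggregates have no region in the lookup, so drop NA regions.
countries <- joined[!is.na(joined$region), ]

# Convert the text column to numeric.
countries$population_total_millions <-
  as.numeric(countries$population_total_millions)
```

Deduplicating the lookup table first matters because the Nations data has one row per country and year; joining against it directly would multiply the rows in the GDP table.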
library(lubridate)
library(plotly)
library(extrafont)
library(leaflet)
library(sf)
New names:
• `` -> `...5`
Rows: 191 Columns: 16
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (15): Time Code, Country Name, Country Code, ...5, GDP (current US$) [NY...
dbl  (1): Time
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 5275 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): iso2c, iso3c, country, region, income
dbl (5): year, gdp_percap, population, birth_rate, neonat_mortal_rate
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Dataset Management
Restructure columns
gdp2 <- gdp1 %>%
  select(-"...5", -"Time Code", -"GDP per capita (constant LCU) [NY.GDP.PCAP.KN]") %>%
  rename("Year" = "Time") %>%
  rename("GDP current US dollars billions" = "GDP (current US$) [NY.GDP.MKTP.CD]") %>%
  rename("GDP per capita thousands" = "GDP per capita (current US$) [NY.GDP.PCAP.CD]") %>%
  rename("Population total millions" = "Population, total [SP.POP.TOTL]") %>%
  rename("GDP per capita growth percentage" = "GDP per capita growth (annual %) [NY.GDP.PCAP.KD.ZG]") %>%
  rename("GDP per capita PPP intnl dollars thousands" = "GDP per capita, PPP (current international $) [NY.GDP.PCAP.PP.CD]") %>%
  rename("GDP PPP intnl dollars billions" = "GDP, PPP (current international $) [NY.GDP.MKTP.PP.CD]") %>%
  rename("GDP growth percentage" = "GDP growth (annual %) [NY.GDP.MKTP.KD.ZG]") %>%
  rename("GDP current local currency billions" = "GDP (current LCU) [NY.GDP.MKTP.CN]") %>%
  rename("Government spending GDP percentage" = "General government final consumption expenditure (% of GDP) [NE.CON.GOVT.ZS]") %>%
  rename("Military spending GDP percentage" = "Military expenditure (% of GDP) [MS.MIL.XPND.GD.ZS]") %>%
  select("Year", "Country Name", "Country Code", "Population total millions", everything())
names(gdp2) <- tolower(names(gdp2))
names(gdp2) <- gsub(" ", "_", names(gdp2))
gdp2 <- gdp2[-c(1), ]
#select("Time","Country Name","Country Code")
head(gdp2, 10)
# A tibble: 10 × 13
year country_name country_code population_total_mil…¹ gdp_current_us_dolla…²
<dbl> <chr> <chr> <chr> <chr>
1 2012 China CHN 1354.2 8532
2 2012 India IND 1274.5 1828
3 2012 Indonesia IDN 250.2 918
4 2012 Korea, Rep. KOR 50.2 1278
5 2012 Saudi Arabia SAU 30.8 742
6 2012 Turkiye TUR 75.2 881
7 2012 United King… GBR 63.7 2707
8 2012 United Stat… USA 313.9 16254
9 2012 Brunei Daru… BRN 0.4 19
10 2012 Israel ISR 7.9 262
# ℹ abbreviated names: ¹population_total_millions,
# ²gdp_current_us_dollars_billions
# ℹ 8 more variables: gdp_per_capita__thousands <chr>,
# gdp_per_capita_growth_percentage <chr>,
# gdp_per_capita_ppp_intnl_dollars_thousands <chr>,
# gdp_ppp_intnl_dollars_billions <chr>, gdp_growth_percentage <chr>,
# gdp_current_local_currency_billions <chr>, …
Call:
lm(formula = gdp_mill ~ population_total_millions, data = main_2016)
Residuals:
Min 1Q Median 3Q Max
-5.9499 -0.6233 -0.5638 -0.2956 16.3759
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.578671 0.658537 0.879 0.38678
population_total_millions 0.005727 0.001829 3.131 0.00396 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.353 on 29 degrees of freedom
Multiple R-squared: 0.2526, Adjusted R-squared: 0.2268
F-statistic: 9.8 on 1 and 29 DF, p-value: 0.00396
Equation: GDP = 0.0057(population) + 0.579
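(Please note that population was passed to the model in millions. GDP was passed as gdp_mill, and the size of the residuals suggests it is measured in trillions of US dollars, so the results should be interpreted accordingly.)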
The relationship between GDP and population has a p-value of 0.004, which indicates that the observed association is unlikely to be due to chance alone.
This linear regression model provides insight into the relationship between a country’s population and its gross domestic product (GDP). The adjusted R-squared of 0.2268 means that population explains only about 23% of the variation in GDP, so the correlation is weak but present. It should be noted, however, that most points sit in a relatively small region of the graph, with the United States, China, and India as notable outliers that likely have a large influence on the fitted line.
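As a quick check on how the reported figures fit together, the sketch below recomputes a few of them from the numbers printed in the model summary. Nothing here refits the model; the variable names and the helper predict_gdp are hypothetical, and the example population of 100 million is chosen only for illustration.

```r
# Values copied from the summary() output above.
intercept <- 0.578671
slope     <- 0.005727
slope_se  <- 0.001829
r_squared <- 0.2526
n <- 31   # 29 residual degrees of freedom + 2 estimated coefficients
p <- 1    # one predictor (population)

# Fitted GDP (in the model's GDP units) for a population given in millions.
predict_gdp <- function(pop_millions) intercept + slope * pop_millions

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# The t value is the estimate divided by its standard error.
t_value <- slope / slope_se

# With one predictor, the correlation coefficient is the square root of
# R-squared (positive here because the slope is positive).
r <- sqrt(r_squared)

predict_gdp(100)
round(adj_r_squared, 4)
round(t_value, 3)
round(r, 3)
```

The correlation coefficient of roughly 0.5 and the adjusted R-squared of roughly 0.23 describe the same fit: a moderate linear association that still leaves most of the variation in GDP unexplained by population.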
reg <- ggplot(temp,
              aes(population_total_millions, gdp_current_us_dollars_billions,
                  color = region,
                  label = country_name
                  #text = paste("<b>Population (Millions):</b>",population_total_millions,"<br><b>GDP ($ Billions):</b>",gdp_current_us_dollars_billions,"<br><b>Region:</b>",region,"<br><b>Country:</b>",country_name)
              )
) +
  geom_smooth(color = "purple4", fill = "lightgray") +
  geom_point(size = 2) +
  labs(x = "Country Population in Millions",
       y = "Country GDP in Billions $",
       title = "Relating Population to GDP, 2016",
       subtitle = "The nations of the United States, China and India were removed from this plot because they were such extreme outliers that all other points were indistinguishable.",
       caption = "World Bank World Economic Indicators",
       color = "Region") +
  theme_minimal() +
  theme(plot.background = element_rect(fill = "aliceblue"),
        plot.title = element_text(color = "maroon", family = "Times New Roman", face = "bold"),
        panel.background = element_rect(fill = "#696969"),
        axis.text = element_text(color = "#3c4142"),
        axis.title = element_text(color = "#3c4142", size = 12),
        legend.text = element_text(color = "#3c4142"),
        legend.title = element_text(color = "#3c4142")
  ) +
  scale_color_manual(values = c("lightblue3", "maroon", "green4", "orange2", "pink", "purple"))
ggplotly(reg) %>%
  ## the labs() caption is lost when the plot is passed to ggplotly(), so add it as a plotly annotation
  layout(margin = list(l = 50, r = 50, b = 100, t = 50),
         annotations = list(x = 1, y = -0.3,
                            text = "From World Bank World Economic Indicators",
                            xref = 'paper', yref = 'paper',
                            xanchor = 'right', yanchor = 'auto',
                            xshift = 0, yshift = 0,
                            font = list(size = 11),
                            showarrow = FALSE)
  )
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Warning: The following aesthetics were dropped during statistical transformation: label.
ℹ This can happen when ggplot fails to infer the correct grouping structure in
the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
variable into a factor?
The United States, China and India were removed from this plot because they were such extreme outliers that all other points were indistinguishable.
This plot is consistent with the findings of the linear regression model, as it shows little correlation between a country’s population and its gross domestic product. What it does show, however, is that most of the countries in this dataset had relatively similar GDP values across populations ranging between 1 and 265 million people. There is also an observable correlation between the two variables among countries with populations of 0-100 million.
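The outlier removal described above can be sketched as a simple filter on country codes. This is an assumed reconstruction rather than the project's own code: the data frame main_2012 is a hypothetical stand-in built from a few of the 2012 rows printed earlier (the plot itself uses 2016 data), and temp is assumed to have been created in a similar way.

```r
# A few 2012 rows taken from the head() output shown earlier.
main_2012 <- data.frame(
  country_code = c("CHN", "IND", "IDN", "KOR", "USA"),
  population_total_millions = c(1354.2, 1274.5, 250.2, 50.2, 313.9),
  gdp_current_us_dollars_billions = c(8532, 1828, 918, 1278, 16254)
)

# Countries excluded from the plot because they dwarf everything else.
outliers <- c("USA", "CHN", "IND")

# Keep only the rows whose country code is not in the outlier list.
temp <- main_2012[!main_2012$country_code %in% outliers, ]

# Compare the axis ranges before and after the filter.
range(main_2012$gdp_current_us_dollars_billions)
range(temp$gdp_current_us_dollars_billions)
```

Comparing the two ranges shows why the filter helps: once the three largest economies are removed, the remaining points no longer collapse into one corner of the plot.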
Assuming "longitude" and "latitude" are longitude and latitude, respectively
map_w_popup
Firstly, this map exposes the bias in this dataset. What I didn’t notice from the get-go was that the providers of this dataset did not note why/how this data was collected or how it should be used. This map, however, makes it evident that virtually all of the data points are in Asia and the Middle East, with only a few notable world powers (and Monaco) outside these regions. This is not representative of the world GDP rankings at any time in history.
What is notably missing from this map is any variation in the size of the bubbles based on some variable, like population or GDP. This is because every formula I could create that leaflet would accept made the United States’ bubble approximately the size of the plot and rendered the rest of the pop-ups useless.
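One possible way around this sizing problem, offered only as a hedged sketch, is to take logarithms before scaling and then map the result onto a fixed range of pixel radii, so the largest economy cannot grow beyond a chosen maximum. The function scale_radius and its min_px and max_px defaults are hypothetical, the GDP values are the 2012 figures printed earlier, and the commented leaflet call assumes a data frame named map_data with the column names shown.

```r
# Map a strictly positive variable onto a bounded marker radius (in pixels).
# Logs are taken first so one very large value cannot swamp the rest.
scale_radius <- function(x, min_px = 4, max_px = 30) {
  lx  <- log10(x)
  rng <- range(lx, na.rm = TRUE)
  # If every value is identical, give all markers the same mid-sized radius.
  if (rng[1] == rng[2]) return(rep((min_px + max_px) / 2, length(x)))
  min_px + (lx - rng[1]) / (rng[2] - rng[1]) * (max_px - min_px)
}

# 2012 GDP in billions of US dollars, from the head() output shown earlier.
gdp_billions <- c(USA = 16254, CHN = 8532, IND = 1828, ISR = 262, BRN = 19)
radii <- scale_radius(gdp_billions)
round(radii, 1)

# Hypothetical use on the map (object and column names are assumptions):
# leaflet(map_data) |>
#   addTiles() |>
#   addCircleMarkers(lng = ~longitude, lat = ~latitude,
#                    radius = ~scale_radius(gdp_current_us_dollars_billions))
```

Because addCircleMarkers takes its radius in pixels rather than map units, the markers keep the same on-screen size at every zoom level. The trade-off is that a log scale compresses differences, so the bubbles show rank and order of magnitude rather than proportional GDP.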
Plot Middle East military spending
military <- main %>%
  filter(region == "Middle East & North Africa")
military <- military[order(military$year), ]
sau <- military %>% filter(country_code == "SAU")
omn <- military %>% filter(country_code == "OMN")
are <- military %>% filter(country_code == "ARE")
kwt <- military %>% filter(country_code == "KWT")
isr <- military %>% filter(country_code == "ISR")
lbn <- military %>% filter(country_code == "LBN")
paints <- c("#87cefa", "red", "green", "orange", "pink", "purple")
text_col <- list(color = "black", fontWeight = "bold")
highchart() |>
  hc_title(text = "Middle East Military Spending % of GDP",
           style = list(color = "black", fontWeight = "bold", fontSize = 20)) |>
  hc_subtitle(text = "This plot visualizes changes in military spending as a percentage of total GDP <br>for the Middle East's biggest spenders between 2012 and 2016.",
              align = "right") |>
  hc_yAxis(title = list(text = "Military Spending % of GDP", style = text_col)) |>
  hc_caption(text = "From World Bank World Economic Indicators", align = "right") |>
  hc_add_series(data = sau$military_spending_gdp_percentage, name = "Saudi Arabia", type = "line", yAxis = 0) |>
  hc_add_series(data = omn$military_spending_gdp_percentage, name = "Oman", type = "line", yAxis = 0) |>
  hc_add_series(data = are$military_spending_gdp_percentage, name = "United Arab Emirates", type = "line", yAxis = 0) |>
  hc_add_series(data = kwt$military_spending_gdp_percentage, name = "Kuwait", type = "line", yAxis = 0) |>
  hc_add_series(data = isr$military_spending_gdp_percentage, name = "Israel", type = "line", yAxis = 0) |>
  hc_add_series(data = lbn$military_spending_gdp_percentage, name = "Lebanon", type = "line", yAxis = 0) |>
  hc_xAxis(categories = sau$year, tickInterval = 1, title = list(text = "Year")) |>
  hc_colors(paints) |>
  hc_legend(backgroundColor = "black",
            borderRadius = 11,
            itemStyle = list(color = "snow")) |>
  hc_add_theme(hc_theme(chart = list(backgroundColor = 'snow')))
This plot shows that Saudi Arabia and Oman devoted the largest shares of their national GDP to military spending in the region. Saudi Arabia’s share peaked above the rest in 2015, when it led a military intervention into Yemen. Oman was forced to invest a large amount of money relative to its GDP because of Yemen’s civil war and the resulting instability on their shared border.
In conclusion: This project brought me into the study of economic growth. Though this dataset was not as comprehensive as I hoped it would be, I was able to find a weak but statistically significant correlation between a country’s population and its GDP. I wish that I had been able to find a formula for my GIS plot that could accurately and intuitively represent national GDPs. I was also able to study changes in the Middle East’s military spending as a share of gross domestic product throughout a tumultuous period.