library(tidyverse)
## Warning: package 'ggplot2' was built under R version 4.4.1
## Warning: package 'dplyr' was built under R version 4.4.1
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Load the dataset

military <- read.csv("military_expenditure.csv")

Introduction

Is there any correlation between a country’s military expenditure in US dollars and its military expenditure as a percentage of overall government expenditure, or as a percentage of GDP (gross domestic product)? Military expenditure is the amount of money a country allocates to its military. GDP measures the monetary value of the final goods and services produced in a country in a given period, and it is commonly used to gauge the size and strength of an economy (IMF, 2019). This dataset from worldbank.org lists each country’s military expenditure in US dollars, as a percentage of general government expenditure, and as a percentage of GDP from 1970 to 2020. Thus, we are looking at a raw figure (military expenditure in US dollars) and two proportions (military expenditure as percentages of general government expenditure and GDP). There are a few categorical variables as well, including region and income level (low, lower middle, upper middle, and high).

This dataset can be found here: https://data.worldbank.org/indicator/MS.MIL.XPND.CD?end=2022&start=1960&view=chart

Clean the dataset and create sub datasets

military_new <- na.omit(military) # Remove rows with a missing value in any column
military_exp <- military_new[!is.na(military_new$Military.expenditure..current.USD.), ] # Keep rows with a US dollar figure; index with military_new (not military) so the logical vector matches the rows being subset
military_cor <- military_exp[!is.na(military_exp$Military.expenditure....of.general.government.expenditure.), ] # Keep rows with a % of general government expenditure figure
military2 <- military_cor[military_cor$country != "World", ] # Remove the aggregate rows where the country is "World"
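
The cleaning steps above can be sketched on a small toy data frame. The column names usd, pct_gov, and pct_gdp are shortened stand-ins (not the real World Bank column names) and the values are made up. The sketch only illustrates the difference between dropping every incomplete row with na.omit() and filtering on specific columns.

```r
# Toy stand-in for the World Bank extract; names and values are assumptions.
toy <- data.frame(
  country = c("A", "B", "C", "World"),
  usd     = c(100, NA, 300, 1000),
  pct_gov = c(5, 6, NA, 7),
  pct_gdp = c(1, 2, 3, NA)
)

# na.omit() drops every row that has an NA in ANY column.
complete <- na.omit(toy)

# Column-specific filtering keeps rows where only the chosen columns are present.
# The logical vector is built from the same data frame that is being subset.
usd_and_gov <- toy[!is.na(toy$usd) & !is.na(toy$pct_gov), ]

# Drop the aggregate row by name.
countries_only <- usd_and_gov[usd_and_gov$country != "World", ]

print(nrow(complete))       # rows with no missing values at all
print(nrow(usd_and_gov))    # rows with both usd and pct_gov present
print(nrow(countries_only)) # same, minus the "World" aggregate
```

Because na.omit() already removes any row with a missing value, column-specific filters applied afterward do not remove anything further. Filtering on selected columns first would keep more rows when only some of the variables are needed.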

Filter the cleaned dataset by income level and most recent year

low <- military2 |>
  filter(incomeLevel == "Low income",
         year == "2020")
sample(low$country, 15) # Sample 15 random low income countries from the low dataset
##  [1] "Central African Republic" "Congo, Dem. Rep."        
##  [3] "Rwanda"                   "Mozambique"              
##  [5] "Guinea"                   "Ethiopia"                
##  [7] "Malawi"                   "Niger"                   
##  [9] "Sierra Leone"             "Guinea-Bissau"           
## [11] "Togo"                     "Liberia"                 
## [13] "Burundi"                  "Madagascar"              
## [15] "Burkina Faso"
lower_middle <- military2 |>
  filter(incomeLevel == "Lower middle income",
         year == "2020")
sample(lower_middle$country, 15) # Sample 15 random lower middle income countries from the lower_middle dataset
##  [1] "Senegal"            "Cameroon"           "Ukraine"           
##  [4] "Nepal"              "Nicaragua"          "Zambia"            
##  [7] "Philippines"        "Cote d'Ivoire"      "Cabo Verde"        
## [10] "Morocco"            "Sri Lanka"          "Pakistan"          
## [13] "Mauritania"         "Iran, Islamic Rep." "Bolivia"
upper_middle <- military2 |>
  filter(incomeLevel == "Upper middle income",
         year == "2020")
sample(upper_middle$country, 15) # Sample 15 random upper middle income countries from the upper_middle dataset
##  [1] "Serbia"                 "Belarus"                "Jordan"                
##  [4] "Moldova"                "Bosnia and Herzegovina" "Malaysia"              
##  [7] "Fiji"                   "Albania"                "South Africa"          
## [10] "Jamaica"                "Kazakhstan"             "North Macedonia"       
## [13] "Ecuador"                "Guatemala"              "Argentina"
high <- military2 |>
  filter(incomeLevel == "High income",
         year == "2020")
sample(high$country, 15) # Sample 15 random high income countries from the high dataset
##  [1] "Poland"            "Finland"           "Portugal"         
##  [4] "France"            "Greece"            "Oman"             
##  [7] "Latvia"            "Israel"            "United States"    
## [10] "Cyprus"            "Norway"            "Australia"        
## [13] "Croatia"           "United Kingdom"    "Brunei Darussalam"

Create new dataset from sampled countries

military_income <- military2 |> filter(country %in% c("Liberia", "Central African Republic", "Niger", "Mozambique", "Guinea", "Afghanistan", "Chad", "Malawi", "Togo", "Gambia, The", "Guinea-Bissau", "Madagascar", "Sierra Leone", "Congo, Dem. Rep.", "Sudan", "Morocco", "Nigeria", "Pakistan", "Papua New Guinea", "Eswatini", "Tunisia", "Haiti", "Tajikistan", "Egypt, Arab Rep.", "Timor-Leste", "Zambia", "Tanzania", "Cote d'Ivoire", "Kenya", "El Salvador", "Colombia", "Azerbaijan", "Namibia", "Montenegro", "Jordan", "Romania", "Peru", "Guyana", "Dominican Republic", "Thailand", "North Macedonia", "Iraq", "Mexico", "Jamaica", "Armenia"))
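
Note that sample() draws a different set each time the document is knit, which may be why the hard-coded list above does not exactly match the printed samples (Afghanistan and Chad, for example, are in the list but not in the printed low income sample). Below is a minimal sketch of one way to make the draw reproducible by fixing a seed and filtering on the stored result. The seed value and the short country vector are arbitrary choices for illustration.

```r
# Hypothetical pool of country names standing in for low$country.
countries <- c("Burundi", "Chad", "Guinea", "Liberia",
               "Malawi", "Niger", "Rwanda", "Togo")

# Fixing the seed makes the random draw repeatable across runs.
set.seed(2020)
first_draw <- sample(countries, 3)

set.seed(2020)
second_draw <- sample(countries, 3)

# The two draws are identical because the seed was reset.
print(identical(first_draw, second_draw))

# Filter on the stored sample instead of retyping the names by hand.
toy <- data.frame(country = countries, usd = seq_along(countries))
sampled_rows <- toy[toy$country %in% first_draw, ]
print(sampled_rows)
```

Storing the sample in a variable and filtering with %in% keeps the printed sample and the analysis dataset in sync.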

Boxplots of Military Expenditure by Country Income Level

boxplot(Military.expenditure..current.USD. ~ incomeLevel, data = military_income,
        ylab = "Military Expenditure (in US dollars)", xlab = "Income Level")

*Please note that I did not include high income countries, because they spend so much more on their militaries that their values compressed the other income levels’ boxplots and made the visualization unreadable.
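
One possible way to keep high income countries in the same figure is a logarithmic y-axis, which base R's boxplot() supports through its log argument. The sketch below uses made-up spending values and is only meant to show the idea, not to reproduce the real data.

```r
# Made-up spending figures (US dollars) for four income levels.
income_levels <- c("Low income", "Lower middle income",
                   "Upper middle income", "High income")
toy <- data.frame(
  income = factor(rep(income_levels, each = 4), levels = income_levels),
  usd = c(2e7, 5e7, 9e7, 2e8,
          1e8, 4e8, 9e8, 3e9,
          5e8, 2e9, 6e9, 1.2e10,
          3e9, 2e10, 6e10, 7.8e11)
)

# log = "y" draws the y-axis on a log scale, so boxes that differ by
# orders of magnitude stay readable side by side. Values must be positive.
bp <- boxplot(usd ~ income, data = toy, log = "y",
              ylab = "Military Expenditure (US dollars, log scale)",
              xlab = "Income Level")

# boxplot() invisibly returns the group names and counts it plotted.
print(bp$names)
print(bp$n)
```

On a log scale, equal vertical distances represent equal ratios rather than equal dollar differences, so the axis label should say so.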

Scatterplot Military Expenditure as % of GDP vs % of General Government Spending by Income Level

ggplot(military_income, aes(x = Military.expenditure....of.GDP., y = Military.expenditure....of.general.government.expenditure., color = incomeLevel)) +
  geom_point(alpha = 0.7) +
  scale_color_manual(values = c("red", "yellow", "green")) +
  labs(title = "World Military Expenditure as % of GDP vs % of General Government Spending",
       caption = "Source: Data.worldbank.org",
       x = "Military Expenditure (% of GDP)",
       y = "Military Expenditure (% of General Government Spending)",
       color = "Country Income Level") 

Summary of new dataset

summary(military_income)
##    country             iso3c              iso2c                year     
##  Length:619         Length:619         Length:619         Min.   :1988  
##  Class :character   Class :character   Class :character   1st Qu.:1999  
##  Mode  :character   Mode  :character   Mode  :character   Median :2007  
##                                                           Mean   :2006  
##                                                           3rd Qu.:2013  
##                                                           Max.   :2020  
##  Military.expenditure..current.USD.
##  Min.   :5.351e+04                 
##  1st Qu.:5.110e+07                 
##  Median :1.758e+08                 
##  Mean   :1.264e+09                 
##  3rd Qu.:1.389e+09                 
##  Max.   :1.250e+10                 
##  Military.expenditure....of.general.government.expenditure.
##  Min.   : 0.000                                            
##  1st Qu.: 4.238                                            
##  Median : 6.423                                            
##  Mean   : 7.672                                            
##  3rd Qu.:10.176                                            
##  Max.   :27.396                                            
##  Military.expenditure....of.GDP. adminregion        incomeLevel       
##  Min.   :0.00062                 Length:619         Length:619        
##  1st Qu.:1.16856                 Class :character   Class :character  
##  Median :1.56807                 Mode  :character   Mode  :character  
##  Mean   :1.91512                                                      
##  3rd Qu.:2.50366                                                      
##  Max.   :6.42731

One interesting statistic from this summary is that the minimum military expenditure as a percentage of general government expenditure is 0%, while the maximum is 27.4%. This shows how widely the share of government spending devoted to the military varies, with some countries devoting a drastically larger share to national defense than others.

Military Expenditure mean

mean(military_income$Military.expenditure..current.USD.)
## [1] 1264237026

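The mean, or average, military expenditure across all country-year observations in this new dataset is $1,264,237,026. The median from the summary above is far lower, at about $175.8 million, which shows that a small number of large spenders pull the mean upward.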
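
Since the dataset has one row per country per year, mean() over the whole column averages observations rather than countries. A small sketch with made-up numbers shows how the overall mean, the median, and a mean of per-country means can differ. The data frame and its values are hypothetical.

```r
# Hypothetical country-year observations (spending in arbitrary units).
toy <- data.frame(
  country = c("A", "A", "A", "B", "B", "C"),
  year    = c(2018, 2019, 2020, 2019, 2020, 2020),
  usd     = c(10, 20, 30, 100, 200, 3000)
)

# Mean and median over all rows (country-year observations).
mean_all_rows   <- mean(toy$usd)    # 560
median_all_rows <- median(toy$usd)  # 65

# Mean per country first, then the mean of those country means.
per_country <- aggregate(usd ~ country, data = toy, FUN = mean)
mean_of_country_means <- mean(per_country$usd)

print(per_country)
print(c(mean_all_rows, median_all_rows, mean_of_country_means))
```

A large gap between mean and median, as in the summary output above, is a sign of a right-skewed distribution.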

Let’s create a dataset from just 2020 now

military_2020 <- military2 |>
  filter(year == "2020")

Linear Regressions

ggplot(military_2020, aes(x = Military.expenditure..current.USD., y = Military.expenditure....of.general.government.expenditure.)) +
  geom_point() + 
  geom_smooth(method = "lm", se = TRUE, color = "red") +
  labs(x = "Military Expenditure (in US Dollars)",
        y = "% of General Government Expenditure",
        title = "Linear Regression: 2020 Military Spending vs Military % of Government Spending") +
  theme_test()
## `geom_smooth()` using formula = 'y ~ x'

lm_model <- lm(Military.expenditure..current.USD. ~ Military.expenditure....of.general.government.expenditure., data = military_2020)

summary(lm_model)
## 
## Call:
## lm(formula = Military.expenditure..current.USD. ~ Military.expenditure....of.general.government.expenditure., 
##     data = military_2020)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -6.047e+10 -5.482e+10 -5.333e+10 -4.122e+10  1.227e+12 
## 
## Coefficients:
##                                                             Estimate Std. Error
## (Intercept)                                                5.413e+10  2.849e+10
## Military.expenditure....of.general.government.expenditure. 2.331e+08  3.506e+09
##                                                            t value Pr(>|t|)  
## (Intercept)                                                  1.900   0.0599 .
## Military.expenditure....of.general.government.expenditure.   0.066   0.9471  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.813e+11 on 117 degrees of freedom
## Multiple R-squared:  3.777e-05,  Adjusted R-squared:  -0.008509 
## F-statistic: 0.004419 on 1 and 117 DF,  p-value: 0.9471
ggplot(military_2020, aes(x = Military.expenditure..current.USD., y = Military.expenditure....of.GDP.)) +
  geom_point() + 
  geom_smooth(method = "lm", se = TRUE, color = "red") +
  labs(x = "Military Expenditure (in US Dollars)",
        y = "Military % of GDP",
        title = "Linear Regression: 2020 Military Spending vs Military Percentage of GDP") +
  theme_test()
## `geom_smooth()` using formula = 'y ~ x'

lm_model_2 <- lm(Military.expenditure..current.USD. ~ Military.expenditure....of.GDP., data = military_2020)

summary(lm_model_2)
## 
## Call:
## lm(formula = Military.expenditure..current.USD. ~ Military.expenditure....of.GDP., 
##     data = military_2020)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -1.537e+11 -5.176e+10 -4.264e+10 -3.381e+10  1.222e+12 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)
## (Intercept)                     2.997e+10  2.769e+10   1.082    0.281
## Military.expenditure....of.GDP. 1.200e+10  1.037e+10   1.156    0.250
## 
## Residual standard error: 1.802e+11 on 117 degrees of freedom
## Multiple R-squared:  0.0113, Adjusted R-squared:  0.002852 
## F-statistic: 1.337 on 1 and 117 DF,  p-value: 0.2498
ggplot(military_income, aes(x = Military.expenditure..current.USD., y = Military.expenditure....of.GDP.)) +
  geom_point() + 
  geom_smooth(method = "lm", se = TRUE, color = "red") +
  labs(x = "Military Expenditure (in US Dollars)",
        y = "Military % of GDP",
        title = "Linear Regression: Military Spending vs Military Percentage of GDP") +
  theme_test()
## `geom_smooth()` using formula = 'y ~ x'

lm_model_3 <- lm(Military.expenditure..current.USD. ~ Military.expenditure....of.GDP., data = military_income)

summary(lm_model_3)
## 
## Call:
## lm(formula = Military.expenditure..current.USD. ~ Military.expenditure....of.GDP., 
##     data = military_income)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -4.974e+09 -9.249e+08 -4.373e+08  1.899e+08  9.961e+09 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     -536782646  152143556  -3.528  0.00045 ***
## Military.expenditure....of.GDP.  940422191   67674710  13.896  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.983e+09 on 617 degrees of freedom
## Multiple R-squared:  0.2384, Adjusted R-squared:  0.2371 
## F-statistic: 193.1 on 1 and 617 DF,  p-value: < 2.2e-16
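
In the three regressions above, the plots put military expenditure in US dollars on the x-axis, while the lm() calls use it as the response variable. For a simple linear regression, swapping the two roles changes the slope and intercept but leaves R-squared unchanged. The sketch below illustrates this with made-up numbers; x and y are placeholder names.

```r
# Made-up paired data.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Regress y on x, then x on y.
fit_yx <- lm(y ~ x)
fit_xy <- lm(x ~ y)

slope_yx <- unname(coef(fit_yx)["x"])
slope_xy <- unname(coef(fit_xy)["y"])

r2_yx <- summary(fit_yx)$r.squared
r2_xy <- summary(fit_xy)$r.squared

# The slopes differ, but R-squared is the same in both directions,
# and the product of the two slopes equals R-squared.
print(c(slope_yx = slope_yx, slope_xy = slope_xy))
print(c(r2_yx = r2_yx, r2_xy = r2_xy, product = slope_yx * slope_xy))
```

In practice this means the R-squared and p-values reported above describe the same relationship shown in the plots, but the printed coefficients belong to the model with US dollars as the response.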

Let’s try the larger dataset now

upper_middle_total <- military2 |>
  filter(incomeLevel == "Upper middle income")

Statistical Analysis: The Final Linear Regression

ggplot(military_income, aes(x = Military.expenditure....of.general.government.expenditure., y = Military.expenditure....of.GDP.)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE, color = "green") +
  labs(x = "Military Expenditure (% of General Government Spending)",
       y = "Military Expenditure (% of GDP)",
       title = "Linear Regression: Military Expenditure % of Government Expenditure vs % of GDP",
       caption = "Source: Data.worldbank.org") +
  theme_test()
## `geom_smooth()` using formula = 'y ~ x'

Something that stands out to me about this scatterplot is that once the x-values pass about 7%, the points disperse more and more, both above and below the regression line. This suggests that once a country’s military expenditure as a percentage of general government spending passes roughly 7%, its military expenditure as a percentage of GDP becomes more variable and harder to predict. Another way to convey this finding is that the linear relationship is tighter on the left side of the scatterplot and loosens as x increases.
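
A spread that widens as x grows is usually called heteroscedasticity. One rough way to check for it is to compare the spread of the residuals below and above a cutoff. The sketch below uses deterministic made-up data whose noise grows with x; the cutoff of 7 mirrors the visual impression above and is an assumption, not a tested threshold.

```r
# Made-up data: the deviation from the line grows in proportion to x.
x <- seq(0.5, 25, by = 0.5)
sign_pattern <- rep(c(-1, 1), length.out = length(x))
y <- 0.4 + 0.2 * x + 0.05 * x * sign_pattern

fit <- lm(y ~ x)
res <- residuals(fit)

# Compare residual spread on either side of the assumed cutoff.
cutoff  <- 7
sd_low  <- sd(res[x < cutoff])
sd_high <- sd(res[x >= cutoff])
print(c(sd_low = sd_low, sd_high = sd_high))

# A residuals-versus-fitted plot shows the same fan shape visually.
plot(fitted(fit), res,
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted (toy data)")
abline(h = 0, lty = 2)
```

When the spread clearly increases, the slope estimate is still usable, but the standard errors and p-values from an ordinary lm() summary should be interpreted with more caution.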

lm_model_final <- lm(Military.expenditure....of.GDP. ~ Military.expenditure....of.general.government.expenditure., data = military_income)

summary(lm_model_final)
## 
## Call:
## lm(formula = Military.expenditure....of.GDP. ~ Military.expenditure....of.general.government.expenditure., 
##     data = military_income)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.6305 -0.3788 -0.0899  0.3651  3.2029 
## 
## Coefficients:
##                                                            Estimate Std. Error
## (Intercept)                                                 0.38176    0.04881
## Military.expenditure....of.general.government.expenditure.  0.19986    0.00536
##                                                            t value Pr(>|t|)    
## (Intercept)                                                  7.822 2.27e-14 ***
## Military.expenditure....of.general.government.expenditure.  37.285  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6539 on 617 degrees of freedom
## Multiple R-squared:  0.6926, Adjusted R-squared:  0.6921 
## F-statistic:  1390 on 1 and 617 DF,  p-value: < 2.2e-16

When analyzing low, lower middle and upper middle income countries within this dataset, there is a statistically significant positive correlation between military expenditure as a percentage of GDP and as a percentage of general government spending over 1988-2020, the years that remain in the sample after cleaning. As the share of general government spending that a country in any of these income levels devotes to its military increases, so does its military spending as a percentage of GDP.
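
For a regression with a single predictor, the significance test on the slope is equivalent to a test of the Pearson correlation. A minimal sketch with made-up numbers, using cor.test() from base R's stats package, shows that the two p-values agree and that the squared correlation equals R-squared.

```r
# Made-up paired percentages.
x <- c(2, 4, 6, 8, 10, 12, 14)
y <- c(0.8, 1.3, 1.4, 2.1, 2.2, 3.0, 3.1)

# Pearson correlation test.
ct <- cor.test(x, y)

# Simple linear regression of y on x.
fit <- lm(y ~ x)
slope_p <- summary(fit)$coefficients["x", "Pr(>|t|)"]

r         <- unname(ct$estimate)
r_squared <- summary(fit)$r.squared

print(c(correlation = r, r_squared = r_squared))
print(c(cor_test_p = ct$p.value, slope_p = slope_p))
```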

The linear regression equation is: military spending as a percentage of GDP = 0.20 × (military spending as a percentage of general government expenditure) + 0.38.

An example would be the following: In 2001, Colombia spent 12.07% of their total government expenditures on their military. Plugging 12.07 into our equation, we obtain: 0.20(12.07) + 0.38 = 2.79. So from our model, we would estimate that Colombia’s military expenditure as a percentage of their GDP was 2.79% in 2001. That’s the predicted value. The observed value is 3.32%, leaving us with a residual of 3.32 - 2.79 = 0.53 percentage points for that data point.
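
The arithmetic in this example can be checked in a few lines of R. The sketch uses the unrounded coefficients from the model summary above together with the Colombia figures quoted in the text; the variable names are just illustrative.

```r
# Coefficients as printed in summary(lm_model_final).
intercept <- 0.38176
slope     <- 0.19986

# Colombia, 2001, as quoted in the text.
pct_gov  <- 12.07  # military share of general government expenditure (%)
observed <- 3.32   # observed military share of GDP (%)

# Predicted value and residual (observed minus predicted).
predicted  <- intercept + slope * pct_gov
residual_1 <- observed - predicted

print(round(predicted, 2))   # 2.79
print(round(residual_1, 2))  # 0.53
```

With the fitted model object available, predict(lm_model_final, newdata = ...) would give the same prediction without retyping the coefficients.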

The estimated intercept is 0.38, meaning that when military expenditure as a percentage of general government spending is zero, the predicted military expenditure as a percentage of GDP is 0.38%, assuming of course that the model applies in this range. The coefficient of 0.20 means that for each one percentage point increase in military expenditure as a share of general government expenditure, the country’s military expenditure as a percentage of GDP increases by approximately 0.20 percentage points, on average. Both the intercept and the slope have extremely small p-values, well below 0.001 (***), indicating that they are statistically significant at any conventional significance level.

The residual standard error is 0.65, which gives an idea of the average distance between the observed and predicted percentage of GDP values. Also, the adjusted R-squared value is 0.6921, meaning that about 69.21% of the variation in military expenditure as a percentage of GDP can be explained by military expenditure as a percentage of general government expenditure. Lastly, the F-statistic is 1390 with a p-value of < 2.2e-16, meaning that the model as a whole is statistically significant and that a country’s military expenditure as a percentage of general government expenditure is a useful predictor of that country’s military expenditure as a percentage of GDP.
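
To make these summary statistics less of a black box, here is a small sketch that recomputes R-squared, adjusted R-squared, and the residual standard error by hand and compares them with what summary() reports. The data are made up and the variable names are placeholders.

```r
# Made-up data with one predictor.
x <- c(2, 4, 6, 8, 10, 12)
y <- c(0.9, 1.1, 1.7, 1.9, 2.6, 2.7)

fit <- lm(y ~ x)
n   <- length(y)

# Sums of squares: unexplained (residual) and total.
ss_res <- sum(residuals(fit)^2)
ss_tot <- sum((y - mean(y))^2)

# R-squared: share of total variation explained by the model.
r_squared <- 1 - ss_res / ss_tot

# Adjusted R-squared with one predictor (n - 2 residual degrees of freedom).
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - 2)

# Residual standard error: typical size of a residual, in units of y.
rse <- sqrt(ss_res / (n - 2))

print(c(r_squared = r_squared, adj_r_squared = adj_r_squared, rse = rse))
```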

Residual Histogram

hist(residuals(lm_model_final))

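This histogram of the residuals is roughly bell-shaped and centered on zero, which is consistent with approximately normal errors. Residuals near +1 and -1 have very similar frequencies around 75, for instance. A histogram alone cannot confirm constant variance, however, and the scatterplot above suggests that the spread of the residuals grows at higher x-values.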
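
Beyond a histogram, a normal Q-Q plot and a formal test are common ways to look at residual normality. The sketch below uses R's built-in cars dataset purely as a stand-in model, since the military data is not bundled here; the same calls could be pointed at lm_model_final.

```r
# Stand-in model on a built-in dataset (not the military data).
fit <- lm(dist ~ speed, data = cars)
res <- residuals(fit)

# Histogram of residuals, as in the text.
hist(res, breaks = 10, main = "Residuals (stand-in model)", xlab = "Residual")

# Normal Q-Q plot: points close to the line suggest approximate normality.
qqnorm(res)
qqline(res)

# Shapiro-Wilk test: a small p-value is evidence against normality.
normality <- shapiro.test(res)
print(normality$p.value)

# Least-squares residuals with an intercept always average to zero.
print(mean(res))
```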

Conclusion

The first finding from the boxplot visualization is that a country’s income level is strongly connected to its military spending. Low income countries spend far less on their militaries than any other income level, while high income countries spend far more. Lower middle and upper middle income countries fall somewhere in between and are roughly similar to each other.

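When looking at data from only 2020, neither regression found a statistically significant linear relationship between military expenditure in US dollars and either percentage (p = 0.95 and p = 0.25). However, when looking at the sampled countries over several decades, there is a modest positive correlation between military expenditure in US dollars and as a percentage of gross domestic product (R-squared of about 0.24), perhaps suggesting that a well-funded military is correlated with a strong economy. The two analyses also use different sets of countries, since the 2020 data includes high income countries and the multi-year sample does not, so the comparison should be read with care. This also reminds us of the importance of a large sample with enough variation in it.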

It then occurred to me to try something different. I realized I was trying to find a correlation between a raw figure (US dollars) and a proportion, or percentage. But just because a country may spend a lot or a little on their military does not mean that that would then reflect a high percentage of their total government spending or GDP, nor render any kind of trend. So instead, I performed a linear regression to see if there was a correlation between two proportions, or two percentages, and in this case, the military expenditure percentages of general government expenditure and of GDP.

This approach resulted in the strongest correlation yet. Regardless of whether a country is low, lower middle or upper middle income, there is a statistically significant correlation between military expenditure as a percentage of general government expenditure and as a percentage of GDP.

Implications

The implication of this positive correlation is that the higher the percentage of government spending a country devotes to its military, the higher its military spending tends to be as a percentage of GDP. It’s not necessarily the amount of money a government spends on their military that matters, but the proportion of their total expenditures. Furthermore, there are countries that may spend more money on their military than other countries, but it’s still a smaller percentage of their general government spending, because they have more money to spend overall. This is important to remember as we compare and analyze raw figures against proportions, no matter what we are studying.
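
A tiny worked example makes the raw-figure versus proportion point concrete. The two countries and all of the dollar amounts below are hypothetical.

```r
# Hypothetical budgets in US dollars.
spending <- data.frame(
  country          = c("Country A", "Country B"),
  military_usd     = c(50e9, 2e9),
  gov_spending_usd = c(1000e9, 20e9)
)

# Military spending as a percentage of general government spending.
spending$pct_gov <- 100 * spending$military_usd / spending$gov_spending_usd

# Country A spends 25 times more in dollars but a smaller share of its budget.
print(spending)
```

Here Country A spends far more in dollars (5% of its budget) than Country B (10% of its budget), which is why the raw figure and the proportion can rank countries in opposite orders.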

Future Analysis

The United States really skewed this initial dataset, because of how much we spend on our military. This made creating visualizations challenging since the rest of the data would shrink visually. Still, there should be a way to include our country in a productive way. The other main challenge with this data analysis was filtering several times to find an appropriately-sized dataset to explore. I frequently filtered by income level, year, and country to try to find some kind of a significant result that wasn’t too small or too big in size. By randomly sampling 15 countries from low, lower middle and upper middle income levels, I was able to create boxplots from my selections.

I am curious about what other economic factors and metrics we could study to see if they are correlated with military spending. Variables such as employment rate, poverty rate, inflation rate, or even murder rate may be related to how much a country allocates to their military. After all, several of those statistics are closely tied to a country’s economic output. Additionally, fear, terrorism and self-defense play a role in influencing political decisions such as resource allocation. It would also be interesting to see if other categorical variables such as governmental system (capitalism, socialism, etc.) or sex of government leaders are correlated with military expenditures. Then we could examine whether male- or female-led countries are more likely to have larger militaries.

So, why is it difficult to find a strong correlate of military spending that holds true across the world? There is probably an abundance of political and cultural factors that make this challenging to answer. Every country prioritizes their goals differently. In America for instance, we maintain military bases across the world, so it makes sense that our military budget is so high. On the other hand, a country like Japan is more health-conscious and focuses more on preventative measures to stop the spread of viruses. One thing is more apparent than anything, however, and that is that high income countries dominate all other income levels in military spending.

References

“Military Expenditure (Current USD).” World Bank Open Data, data.worldbank.org/indicator/MS.MIL.XPND.CD?end=2022&start=1960&view=chart. Accessed 13 Dec. 2024.

Callen, Tim. “Gross Domestic Product: An Economy’s All.” IMF, 15 June 2019, www.imf.org/en/Publications/fandd/issues/Series/Back-to-Basics/gross-domestic-product-GDP.