U.S. Foreign Aid by Funding Agency

Author

Cheslav Lukashanets

The dataset for this project is a summary of U.S. foreign aid by funding agency (2001–present time). I plan to focus on which funding agencies/accounts spent/contributed the most money, and where these funds were spent around the world.

variables: - funding agency ID/acronym/name - funding account ID/name - country code/name - transaction type ID/name - fiscal year - current/constant amount (in U.S. dollars, “constant”: adjusted for inflation)

source: foreignassistance.gov/data, one of six pre-made subsets of the cumulative dataset. I considered using the complete dataset, but thought it too big (3.66GB) to comfortably clean.

# load libraries
library(tidyverse)
library(dplyr)
library(dslabs)
library(treemap)
library(RColorBrewer)

# load dataset
setwd("C:/Users/chesl/Desktop/DATA110")
csv <- read_csv("us_foreign_aid_funding_agency.csv")

# create dataset for cleaning
df <- csv

# clean var names
names(df) <- tolower(names(df))
names(df) <- gsub(" ", "_", names(df))

# rename important vars
names(df)[names(df) == "funding_agency_name"] <- "funding_agency"
names(df)[names(df) == "funding_account_name"] <- "funding_account"
names(df)[names(df) == "country_name"] <- "country"
names(df)[names(df) == "transaction_type_name"] <- "transaction_type"

# filter for important vars
df <- df[, c("funding_agency", "funding_account", "country", "transaction_type", "fiscal_year", "current_amount", "constant_amount")]

# remove NA values from important vars
df <- df %>%
  filter(!is.na(funding_agency),
         !is.na(funding_account),
         !is.na(country),
         !is.na(transaction_type),
         !is.na(fiscal_year),
         !is.na(current_amount),
         !is.na(constant_amount))

options(scipen = 20)

# subset data
subset1 <- df |>
  group_by(funding_agency) |>
  mutate(sum_amount = sum(constant_amount)) |> # total funds spent by U.S. agency
  filter((funding_agency != "Department of State and U.S. Agency for International Development") & (sum_amount > 2.50e10)) |> # filter out joint funds, and agencies with comparatively low total funds [for visual clarity]
  mutate(n_accounts = length(unique(funding_account))) # number of funding accounts per U.S. agency

treemap(subset1,
        index = "funding_agency",
        vSize = "sum_amount",
        vColor = "n_accounts",
        type = "dens",
        palette = "Blues",
        title = "U.S. government agencies by foreign aid funded & number of funding accounts (2001–present day)",
        title.legend = "Number of funding accounts (per agency)")

# subset data
subset2 <- df |>
  group_by(country) |>
  filter((country != "World") & (sum(constant_amount) > 2.00e10)) |> # total funds received by country
  summarize(country_total = sum(constant_amount))

plot2 <- ggplot(subset2, aes(y = country, x = country_total/1.00e9)) +
  geom_bar(stat = "identity") +
  labs(title = "U.S. foreign aid received by countries", x = "Total aid ($ billions)", y = "Top countries")
plot2

The dataset required sparse cleaning: I renamed its vars (tolower(), gsub(), names()[] indexes), selected the most important vars (filter()), and removed all NA from them (filter(!is.na())). The first treemap represents which government agencies have spent the most funds on foreign aid in total, as well as how many funding accounts were involved. Per the data, the Department of State has funded almost half of all foreign aid (since 2000), and the U.S.A.I.D. funded most of the other, with smaller contributions by the Dept.s of Agriculture, Defense, and Treasury. The second graph shows in which countries the most amount of money was spent (since 2000). Per the data, >$400B was spent in Afghanistan, followed by Israel, Iraq, Ukraine, and Egypt. I did try to arrange the graph in order, but I was unsuccessful.