Sourced from https://datacatalog.worldbank.org/search/dataset/0066219/Contract-Awards-in-Investment-Project-Financing
“The data available here now includes all contract awards financed by
The World Bank under Investment Project Financing (IPF) operations. The
data source is STEP (Systematic Tracking of Exchanges in Procurement),
which is required to be used by Borrowers in all IPF operations subject
to the World Bank’s Procurement Regulations. Data is entered by
Borrowers. ‘Supplier Country / Economy’ represents the place of supplier
registration, which may or may not be the supplier’s actual country /
economy of origin. Information does not include awards to
subcontractors, nor does it account for cofinancing.
Please note that for contracts awarded to joint-ventures of multiple
companies, the total contract value was split equally amongst the
members of the joint-venture.”
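The equal split for joint ventures can be shown with a minimal base-R sketch. The contract value, member names and member countries are hypothetical. The sketch also assumes that each member is recorded as its own supplier row. Under that assumption, a joint venture whose members are registered in different countries contributes to several borrower/supplier country pairs at once.

```r
# Hypothetical contract awarded to a three-member joint venture (made-up values)
contract_value <- 900000
jv_members <- data.frame(
  supplier         = c("Member A", "Member B", "Member C"),
  supplier_country = c("Kenya", "France", "Kenya")
)

# Split the total value equally, one row per member (assumed record layout)
jv_members$amount <- contract_value / nrow(jv_members)
jv_members
```

Each member is credited with one third of the contract value, so the per-member amounts still sum to the original total.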
library(tidyverse)
contract_awards <- readr::read_csv("Contract_Awards_in_Investment_Project_Financing.csv", show_col_types = FALSE)
# Optional overview of every column. Calling Hmisc through its namespace avoids
# attaching it, which would mask dplyr::summarize() and dplyr::src()
# Hmisc::describe(contract_awards)
# Simplify column names
contract_awards <- contract_awards |>
  rename(
    review_type      = `Review type`,
    project_id       = `Project ID`,
    borrower_country = `Borrower Country / Economy`,
    supplier_country = `Supplier Country / Economy`
  )
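Using the World Bank Contract Awards data set, let's determine
whether contracts in which the supplier and borrower are from the same
country correlate positively with higher financing approval, correlate
negatively, or show no correlation. Here, financing is measured as the
number of distinct projects (proj_count) with at least one contract
awarded to a given borrower/supplier country pair.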
Assumption: limit the data set to rows where Review type = Post,
narrowing the scope to decisively awarded and financed projects.
post_contracts <- contract_awards |>
  # Refer to the column directly; the piped data frame is already in scope
  dplyr::filter(review_type == "Post")
nrow(contract_awards)
## [1] 247762
nrow(post_contracts)
## [1] 224654
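The two nrow() results above show how much the Post-only filter removes. Here is a small arithmetic check; the numbers are copied from that output and the variable names are illustrative.

```r
# Row counts reported by nrow() above
total_rows <- 247762
post_rows  <- 224654

dropped_rows <- total_rows - post_rows            # rows whose review type is not "Post"
share_kept   <- round(100 * post_rows / total_rows, 1)

dropped_rows  # → 23108
share_kept    # → 90.7
```

The filter therefore keeps roughly 90.7% of the contract awards. It also drops any rows with a missing review type, because dplyr::filter() discards rows where the condition evaluates to NA.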
Reduce the data set to the subset of columns needed to make the
determination: project ID, borrower country, supplier country, and
region.
post_contracts <- post_contracts |>
dplyr::select(project_id, borrower_country, supplier_country, Region)
# head(post_contracts, n = 20)
# Count distinct project IDs for each Region / borrower country / supplier country combination
result <- post_contracts |>
dplyr::group_by(Region, borrower_country, supplier_country) |>
dplyr::summarize(proj_count = dplyr::n_distinct(project_id))
## `summarise()` has grouped output by 'Region', 'borrower_country'. You can
## override using the `.groups` argument.
# Add a column that indicates that a borrower and supplier country are the same
result <- result |>
mutate(local = (borrower_country == supplier_country))
result
## # A tibble: 3,522 × 5
## # Groups: Region, borrower_country [149]
## Region borrower_country supplier_country proj_count local
## <chr> <chr> <chr> <int> <lgl>
## 1 AFRICA Africa Burundi 1 FALSE
## 2 AFRICA Africa Congo, Democratic Re… 1 FALSE
## 3 AFRICA Africa France 1 FALSE
## 4 AFRICA Africa Kenya 1 FALSE
## 5 AFRICA Africa Uganda 1 FALSE
## 6 AFRICA Africa United Kingdom 1 FALSE
## 7 EAST ASIA AND PACIFIC Cambodia Australia 3 FALSE
## 8 EAST ASIA AND PACIFIC Cambodia Bangladesh 2 FALSE
## 9 EAST ASIA AND PACIFIC Cambodia Cambodia 25 TRUE
## 10 EAST ASIA AND PACIFIC Cambodia Cameroon 1 FALSE
## # ℹ 3,512 more rows
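The grouped summary counts distinct projects, not contracts. A project with many contracts to the same supplier country therefore counts once for that pair. Below is a minimal base-R sketch of the same counting and `local` flag, using hypothetical toy contracts. One difference from the dplyr version: aggregate() drops rows with missing values by default, whereas group_by() keeps NA as its own group.

```r
# Hypothetical toy contracts (not from the data set); project P1 has two contracts
contracts <- data.frame(
  project_id       = c("P1", "P1", "P2", "P3", "P4"),
  borrower_country = c("Cambodia", "Cambodia", "Cambodia", "Cambodia", "Kenya"),
  supplier_country = c("Cambodia", "Cambodia", "Australia", "Cambodia", "France")
)

# Count distinct projects per borrower/supplier pair,
# mirroring dplyr::n_distinct(project_id) in the grouped summary above
counts <- aggregate(
  project_id ~ borrower_country + supplier_country,
  data = contracts,
  FUN  = function(ids) length(unique(ids))
)
names(counts)[names(counts) == "project_id"] <- "proj_count"

# Flag pairs where the supplier is registered in the borrowing country
counts$local <- counts$borrower_country == counts$supplier_country
counts
```

The Cambodia/Cambodia pair gets proj_count = 2 from three contracts, because P1 is counted once. If a supplier country were missing, the `==` comparison would yield NA rather than FALSE for that row.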
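What would the distribution look like on a weighted (bubble) plot,
with point size proportional to proj_count?
The expectation is that a positive correlation would appear as the
largest points concentrating on the local (borrower = supplier) pairs.
Those pairs form a 45 degree line only if both axes list the countries
in the same order. With the default ordering, each axis includes only
the countries present on it, so local points will generally not line
up diagonally.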
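Reading the plot against a 45-degree line only works if both axes place each country at the same position. Below is a minimal base-R sketch, using hypothetical borrower/supplier pairs, that gives both columns one shared set of factor levels. Every local pair then sits at the same position i on both axes.

```r
# Hypothetical borrower/supplier pairs (not from the data set)
pairs <- data.frame(
  borrower_country = c("Cambodia", "Cambodia", "Kenya", "Kenya"),
  supplier_country = c("Cambodia", "Australia", "Kenya", "France")
)

# One shared, sorted set of levels for both axes
shared_levels <- sort(union(pairs$borrower_country, pairs$supplier_country))
pairs$borrower_country <- factor(pairs$borrower_country, levels = shared_levels)
pairs$supplier_country <- factor(pairs$supplier_country, levels = shared_levels)

# Local pairs: same country on both sides
local <- as.character(pairs$borrower_country) == as.character(pairs$supplier_country)

# With shared levels, a local pair has the same integer position on x and y,
# so it lies on the y = x diagonal
same_position <- as.integer(pairs$borrower_country) == as.integer(pairs$supplier_country)
identical(local, same_position)
```

To apply this in the bubble plots, convert both columns to factors with shared levels. Then add scale_x_discrete(drop = FALSE) and scale_y_discrete(drop = FALSE) so unused levels stay on each axis. geom_abline(slope = 1, intercept = 0) could then draw the reference diagonal. The trade-off is a much wider x axis, since it would also list every supplier country.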
AFRICA <- result |> dplyr::filter(Region == "AFRICA")
E_ASIA <- result |> dplyr::filter(Region == "EAST ASIA AND PACIFIC")
ES_AFRICA <- result |> dplyr::filter(Region == "Eastern and Southern Africa")
EC_ASIA <- result |> dplyr::filter(Region == "EUROPE AND CENTRAL ASIA")
LATIN <- result |> dplyr::filter(Region == "LATIN AMERICA AND CARIBBEAN")
# Region values listed by Hmisc::describe():
#   lowest : AFRICA, EAST ASIA AND PACIFIC, Eastern and Southern Africa, EUROPE AND CENTRAL ASIA, LATIN AMERICA AND CARIBBEAN
#   highest: LATIN AMERICA AND CARIBBEAN, MIDDLE EAST AND NORTH AFRICA, OTHER, SOUTH ASIA, Western and Central Africa
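# Title each plot by region and rotate the borrower-country labels so they do not overlap
rotate_x <- theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
ggplot(AFRICA, aes(x = borrower_country, y = supplier_country, size = proj_count, color = local)) + geom_point() + ggtitle("AFRICA") + rotate_x
ggplot(E_ASIA, aes(x = borrower_country, y = supplier_country, size = proj_count, color = local)) + geom_point() + ggtitle("EAST ASIA AND PACIFIC") + rotate_x
ggplot(ES_AFRICA, aes(x = borrower_country, y = supplier_country, size = proj_count, color = local)) + geom_point() + ggtitle("Eastern and Southern Africa") + rotate_x
ggplot(EC_ASIA, aes(x = borrower_country, y = supplier_country, size = proj_count, color = local)) + geom_point() + ggtitle("EUROPE AND CENTRAL ASIA") + rotate_x
ggplot(LATIN, aes(x = borrower_country, y = supplier_country, size = proj_count, color = local)) + geom_point() + ggtitle("LATIN AMERICA AND CARIBBEAN") + rotate_x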
The scatter plots show a loose but unconvincing correlation,
suggesting the relationship may be nothing more than coincidence.
Based on this visual inspection, I would conclude there is no
correlation between a supplier being local to the borrower and
financing.
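The conclusion above rests on visual inspection. One way to complement it with a number is sketched below, using hypothetical toy counts (not the real data). It computes a point-biserial correlation between the local flag and proj_count, which is simply the Pearson correlation with the flag coded 0/1. It also runs a Wilcoxon rank-sum test, which does not assume normally distributed counts. The toy values, the variable names and the choice of these two methods are assumptions for illustration, not the author's method.

```r
# Hypothetical toy counts (not from the data set): distinct projects per pair
toy <- data.frame(
  local      = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE),
  proj_count = c(25, 12, 30, 3, 2, 1, 1, 4)
)

# Point-biserial correlation: Pearson correlation between a 0/1 flag and a count
r_pb <- cor(as.numeric(toy$local), toy$proj_count)

# Rank-based comparison of proj_count between local and non-local pairs;
# exact = FALSE uses the normal approximation, which tolerates tied counts
rank_test <- wilcox.test(proj_count ~ local, data = toy, exact = FALSE)

r_pb
rank_test$p.value
```

On the real data, the same calls could be pointed at result, e.g. cor(as.numeric(result$local), result$proj_count, use = "complete.obs"). Keep in mind that each borrower has at most one local pair per region, so local rows are far fewer than non-local ones.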