The data that I’m using is from the Stockholm International Peace Research Institute (SIPRI), which, among other things, tracks international weapons transfers; the records I downloaded run from 1975 to the present day. SIPRI is the source used by the UN and is linked on its website. Because there is so much information in SIPRI's databases, I had to narrow down the data I would download, so on their website I filtered to look only at weapons sold by the United States. The variables in the data set are “recipient” (the nation receiving weapons), “supplier” (the nation selling weapons, which in my case was always the United States), the year the weapons were ordered, the number ordered, the weapon designation (the name of the weapon(s) being bought), a weapon description, the number delivered, the year(s) the weapons were delivered, the “status” (which here refers to whether the weapon was new or second-hand), comments (additional information about the weapons), and three more complicated variables: SIPRI TIV per unit, SIPRI TIV for total order, and SIPRI TIV of delivered weapons. SIPRI TIV stands for Stockholm International Peace Research Institute trend indicator value. It measures the military capability of a weapon or of a whole purchase, and so its value to the receiving country, rather than the price paid. Part of why it exists is to account for changes in the market and in weapons capability over time. The variables I ended up using were the value of the weapons delivered (SIPRI TIV of delivered weapons), the year the weapons were ordered, and the nation receiving them.
Load in data
library(tidyverse)
library(knitr)
setwd("/Users/marieadelegrosso/Desktop/Desktop - Marie’s MacBook Air (2)/Data")
weapons <- read_csv("weapons register 2000s.csv")
Rows: 3006 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): Recipient, Supplier, Weapon designation, Weapon description, Year(s...
dbl (6): Year of order, Number ordered, Number delivered, SIPRI TIV per unit...
head(weapons)
# A tibble: 6 × 13
Recipient Supplier `Year of order` `Number ordered` `Weapon designation`
<chr> <chr> <dbl> <dbl> <chr>
1 Afghanistan United Stat… 2014 12 MD-500E
2 Afghanistan United Stat… 2015 12 MD-500E
3 Afghanistan United Stat… 2012 4 C-130H Hercules
4 Afghanistan United Stat… 2016 1673 HMMWV-UA
5 Afghanistan United Stat… 2014 222 MaxxPro
6 Afghanistan United Stat… 2016 433 HMMWV-UA
# ℹ 8 more variables: `Weapon description` <chr>, `Number delivered` <dbl>,
# `Year(s) of delivery` <chr>, status <chr>, Comments <chr>,
# `SIPRI TIV per unit` <dbl>, `SIPRI TIV for total order` <dbl>,
# `SIPRI TIV of delivered weapons` <dbl>
Look at totals of weapons delivered by country and year of order
byrecipient <- weapons |>
  group_by(Recipient, `Year of order`) |> # group by recipient country and year of order
  summarise(
    SIPRI_total_value = sum(`SIPRI TIV of delivered weapons`), # combined value of all delivered weapons for each country each year
    delivery_total_count = sum(`Number delivered`), # total number of weapons delivered
    avg_count = mean(`Number delivered`), # average number of weapons delivered per order
    avg_SIPRI = mean(`SIPRI TIV of delivered weapons`), # average value of delivered weapons per order
    .groups = "drop" # remove the grouping structure after summarizing
  ) |>
  arrange(Recipient) # sort alphabetically by recipient country
byrecipient
years <- byrecipient |>
  # create a new factor column that groups years into ranges to increase legibility
  mutate(`Decade of order` = dplyr::case_when(
    `Year of order` <= 1980 ~ "1975-1980",
    `Year of order` > 1980 & `Year of order` <= 1990 ~ "1980-1990",
    `Year of order` > 1990 & `Year of order` <= 2000 ~ "1990-2000",
    `Year of order` > 2000 & `Year of order` <= 2010 ~ "2000-2010",
    `Year of order` > 2010 & `Year of order` <= 2020 ~ "2010-2020",
    `Year of order` > 2020 & `Year of order` <= 2025 ~ "2020-2025"
  )) |>
  mutate(`Decade of order` = factor(`Decade of order`,
    levels = c("1975-1980", "1980-1990", "1990-2000", "2000-2010", "2010-2020", "2020-2025"))) # Used this post to remember how to do this https://forum.posit.co/t/dplyr-way-s-and-base-r-way-s-of-creating-age-group-from-age/89226/4
years
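The same year ranges can also be built with base R's `cut()`, a common alternative to a long `case_when()`. This is a minimal sketch, assuming the breaks below mirror the `<=` upper bounds used in the `case_when()` version: `right = TRUE` closes each interval on the right, so 1980 falls in "1975-1980" in both versions. The example years are made up.

```r
# Hypothetical example years, including both edges of a range
example_years <- c(1975, 1980, 1981, 1999, 2010, 2025)

decade_labels <- c("1975-1980", "1980-1990", "1990-2000",
                   "2000-2010", "2010-2020", "2020-2025")

# right = TRUE gives intervals (a, b], matching `Year of order` <= upper bound
decade_of_order <- cut(example_years,
                       breaks = c(-Inf, 1980, 1990, 2000, 2010, 2020, 2025),
                       labels = decade_labels,
                       right = TRUE)

print(decade_of_order)
```

Years after 2025 become NA, which matches the `case_when()` version, since it also returns NA when no condition matches.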
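linear <- lm(SIPRI_total_value ~ avg_SIPRI, delivery_total_count, data = years)
# Looking to see if the estimated number of weapons purchased is related to the value of the purchase.
# I realized that the correlation is not particularly consistent; I would have had to categorize weapon type to make this interesting.
# Note: the unnamed second argument is matched to lm()'s `subset` argument (see the Call line below),
# so delivery_total_count selects rows instead of entering the model as a predictor.
summary(linear)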
Call:
lm(formula = SIPRI_total_value ~ avg_SIPRI, data = years, subset = delivery_total_count)
Residuals:
Min 1Q Median 3Q Max
-590.16 -46.48 -31.14 -3.11 1417.02
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 29.35470 6.16434 4.762 2.15e-06 ***
avg_SIPRI 1.91107 0.04059 47.079 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 191.1 on 1202 degrees of freedom
(76 observations deleted due to missingness)
Multiple R-squared: 0.6484, Adjusted R-squared: 0.6481
F-statistic: 2216 on 1 and 1202 DF, p-value: < 2.2e-16
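The `Call:` line shows that `delivery_total_count`, passed without a name, was matched to `lm()`'s `subset` argument rather than added as a predictor. A minimal sketch of the difference follows, using a small made-up data frame (`toy` and its values are hypothetical, not SIPRI data). Putting the second variable into the formula with `+` is the usual way to include it as a predictor.

```r
# Hypothetical toy data with the same column names as `years`
toy <- data.frame(
  SIPRI_total_value    = c(10, 40, 25, 90, 60, 15),
  avg_SIPRI            = c(5, 20, 25, 30, 30, 15),
  delivery_total_count = c(2, 3, 1, 6, 4, 1)
)

# Unnamed second argument: lm() treats it as `subset`, i.e. as row indices.
# Here that selects rows 2, 3, 1, 6, 4, 1, so row 1 is used twice and row 5 not at all.
as_subset <- lm(SIPRI_total_value ~ avg_SIPRI, delivery_total_count, data = toy)
print(as_subset$call)   # the call records subset = delivery_total_count

# Both variables as predictors in the formula
two_predictors <- lm(SIPRI_total_value ~ avg_SIPRI + delivery_total_count, data = toy)
summary(two_predictors)
```

Since SIPRI_total_value is built from the same orders as avg_SIPRI, a strong fit is expected in either version; the sketch only shows which argument receives `delivery_total_count`.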
What the model means
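The p-value for the slope was reported as < 2.2e-16, meaning it is smaller than 0.00000000000000022 (R does not print p-values below that threshold). A p-value this small does not mean there is no correlation; it means there is very strong evidence that the slope on avg_SIPRI is not zero, and the multiple R-squared of 0.6484 says the model explains about 65% of the variation in SIPRI_total_value. Part of this relationship is built in: SIPRI_total_value is the sum over a country's orders in a year and avg_SIPRI is the mean over the same orders, so the total always equals the average multiplied by the number of orders. This is notable because it shows that high-value years are not driven only by high-value weapons; they often also reflect a larger number of orders. Because delivery_total_count was passed to lm() without a name, it was used as the subset argument (see the Call line), so these estimates come from rows selected by those counts rather than from the full table and should be read with caution.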
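avg_SIPRI is the average SIPRI TIV of delivered weapons per order (per row of the original data) for a given country and year of order, not a per-unit value
SIPRI_total_value is the sum of the SIPRI TIV of all weapons delivered to a country for orders placed in a given year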
The model equation is: SIPRI_total_value = 1.91107 (avg_SIPRI) + 29.35470
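Because both columns are computed from the same orders, SIPRI_total_value always equals avg_SIPRI multiplied by the number of orders in that country-year, which is one reason the two are so strongly related. A minimal sketch with a few made-up orders (`toy_orders`, its values, and the helper column `n_orders` are hypothetical):

```r
library(dplyr)

# Hypothetical orders: two for country A in 2001, three for country B in 2001
toy_orders <- tibble(
  Recipient = c("A", "A", "B", "B", "B"),
  `Year of order` = c(2001, 2001, 2001, 2001, 2001),
  `SIPRI TIV of delivered weapons` = c(10, 30, 5, 5, 20)
)

by_country_year <- toy_orders |>
  group_by(Recipient, `Year of order`) |>
  summarise(
    SIPRI_total_value = sum(`SIPRI TIV of delivered weapons`),  # sum over orders
    avg_SIPRI = mean(`SIPRI TIV of delivered weapons`),         # mean over the same orders
    n_orders = n(),                                             # number of orders (rows)
    .groups = "drop"
  )

# total = mean * number of orders, for every country-year
by_country_year |> mutate(check = avg_SIPRI * n_orders)
```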
View Linear Regression Graphs
plot(linear) #visualizing this to see if I potentially want to use this data
Single out 10 countries receiving the most weapons from the US in total over all time
topten <- years |>
  # Narrow the table down to the top recipients of US weapons.
  # United Arab Emirates appeared twice in the hand-typed list, so only nine distinct countries are selected here.
  filter(Recipient %in% c("Saudi Arabia", "United Arab Emirates", "Qatar",
                          "United Kingdom", "Japan", "Australia", "Israel",
                          "Taiwan", "Egypt")) |>
  arrange(SIPRI_total_value)
topten
# A tibble: 233 × 7
Recipient `Year of order` SIPRI_total_value delivery_total_count avg_count
<chr> <dbl> <dbl> <dbl> <dbl>
1 United Arab… 2004 1 50 50
2 Japan 1977 1.42 2 2
3 Saudi Arabia 1990 2.19 73 73
4 Japan 1982 4.28 1 1
5 Taiwan 1982 5 2 2
6 Japan 2008 7.2 24 24
7 United King… 2023 7.63 763 763
8 United King… 2019 8 4 4
9 Australia 1999 8.97 299 299
10 Australia 2023 9 105 52.5
# ℹ 223 more rows
# ℹ 2 more variables: avg_SIPRI <dbl>, `Decade of order` <fct>
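The filter above hard-codes the country names. A minimal sketch of picking the top recipients from the data instead, assuming "top" means the highest all-time sum of SIPRI_total_value; `toy_years`, its values, and the cutoff of 2 are hypothetical (on the real `years` table the cutoff would be `n = 10`).

```r
library(dplyr)

# Hypothetical stand-in for `years`
toy_years <- tibble(
  Recipient = c("A", "A", "B", "C", "C", "D"),
  `Year of order` = c(2001, 2005, 2003, 1999, 2010, 2012),
  SIPRI_total_value = c(100, 50, 300, 20, 10, 80)
)

# Rank recipients by their all-time total and keep the names of the top n
top_recipients <- toy_years |>
  group_by(Recipient) |>
  summarise(all_time_total = sum(SIPRI_total_value, na.rm = TRUE), .groups = "drop") |>
  slice_max(all_time_total, n = 2, with_ties = FALSE) |>
  pull(Recipient)

# Keep every year of data for those recipients
top_subset <- toy_years |>
  filter(Recipient %in% top_recipients)
```

Deriving the list from the data means the ranking and the filter cannot drift apart, and a name cannot be entered twice by accident.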
Create stacked bar graph
plot1 <- topten |>
  ggplot() +
  geom_bar(aes(x = Recipient, y = SIPRI_total_value, fill = `Decade of order`), # specify stacked bar graph and fill
           position = "stack", stat = "identity") +
  labs(fill = "Year of Weapons Order", # specify labels for graph, title and source
       x = "Country Receiving Weapons",
       y = "Total Value of Weapons Sold (in SIPRI TIV)",
       title = "Top 10 Countries Receiving Weapons from the United States from 1975-2025",
       caption = "Source: Stockholm International Peace Research Institute") +
  scale_fill_manual(values = c( # change to pretty colors
    "1975-1980" = "#615055",
    "1980-1990" = "#946e83",
    "1990-2000" = "#b4a6ab",
    "2000-2010" = "#cdd5d1",
    "2010-2020" = "#B6CBBA",
    "2020-2025" = "#9EC1A3")) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1)) # Used The R Graph Gallery to remember how to do this because I couldn't find it in my notes
plot1
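In ggplot2, `geom_col()` is shorthand for `geom_bar(stat = "identity")` and stacks bars by default. A minimal sketch with made-up values (`toy_plot_data` is hypothetical) that also uses `layer_data()` to check that the stacked heights add up to each country's total:

```r
library(ggplot2)

# Hypothetical values for two countries
toy_plot_data <- data.frame(
  Recipient = c("A", "A", "B"),
  `Decade of order` = c("1990-2000", "2000-2010", "2000-2010"),
  SIPRI_total_value = c(10, 20, 5),
  check.names = FALSE   # keep the space in the column name
)

p <- ggplot(toy_plot_data,
            aes(x = Recipient, y = SIPRI_total_value, fill = `Decade of order`)) +
  geom_col() +          # same as geom_bar(stat = "identity"); position defaults to "stack"
  theme_minimal()

# Computed data for the first layer: the top segment's ymax equals each country's total
stacked <- layer_data(p)
```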
Methods/Mini Essay
Question A
I started out pretty simply with data cleaning. Because I was able to specify what data I wanted before I downloaded the CSV, my data didn’t include much information that I needed to remove. Initially, I filtered out NA values, but the variables I ended up using did not have any, and filtering on the other columns was unnecessary. I did change some formatting: I grouped the data by recipient and year of order and added columns for several summary statistics (total and average value, total and average number delivered) that I found useful in deciding what to do with the data. My last step was adding a column with year ranges, because I was working with a wide range of dates and needed fewer categories.
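A quick way to confirm which columns contain missing values before deciding whether to filter them is to count NAs per column. A minimal sketch on a made-up table (`toy_weapons` is hypothetical; on the real data the same call would be `colSums(is.na(weapons))`):

```r
# Hypothetical rows with the same kinds of columns as the SIPRI download
toy_weapons <- data.frame(
  Recipient = c("A", "B", "C"),
  `SIPRI TIV of delivered weapons` = c(12, 4.5, 30),
  Comments = c(NA, "second-hand", NA),
  check.names = FALSE   # keep the spaces in the column name
)

na_counts <- colSums(is.na(toy_weapons))  # number of NAs in each column
print(na_counts)
```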
Question B
The visualization shows the value of the weapons sold by the United States to the top 10 receiving countries, divided up by decade. That means we not only have information on the value of the weapons the United States has sold to each country over the last 50 years, but we can also see when those sales happened. It’s important to know that because we are using SIPRI TIV, the value is adjusted for inflation and for what weapons were available at the time. It’s interesting to see the degree to which weapons sales increased in the 2000-2010 and 2010-2020 ranges.
Question C
Showing the most commonly sold weapons in each time period would have been a really interesting use of interactivity. Unfortunately, it was too complicated to measure that empirically without using other databases or more complex grouping.
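As a possible starting point for that idea, the most-delivered weapon designation in each period can be found by summing units delivered per designation and keeping the top row within each period. This is a minimal sketch, assuming "most commonly sold" means most units delivered; it ignores weapon type, which is the harder grouping problem described above. `toy_weapons` and its values are hypothetical.

```r
library(dplyr)

# Hypothetical orders
toy_weapons <- tibble(
  `Weapon designation` = c("X-1", "X-1", "Y-2", "Y-2", "Z-3"),
  `Year of order` = c(1995, 1998, 1999, 2005, 2007),
  `Number delivered` = c(10, 5, 12, 40, 7)
)

most_delivered <- toy_weapons |>
  # bin years into the same ranges used for the stacked bar graph
  mutate(period = cut(`Year of order`,
                      breaks = c(-Inf, 1980, 1990, 2000, 2010, 2020, 2025),
                      labels = c("1975-1980", "1980-1990", "1990-2000",
                                 "2000-2010", "2010-2020", "2020-2025"))) |>
  group_by(period, `Weapon designation`) |>
  summarise(units = sum(`Number delivered`, na.rm = TRUE), .groups = "drop_last") |>
  slice_max(units, n = 1, with_ties = FALSE) |>   # top designation within each period
  ungroup()
```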