The War and The Data
One of the most significant geopolitical crises of the twenty-first century is the war between Russia and Ukraine, which is characterized by a complex interplay of historical, cultural, and strategic variables. In this project, we will use data analysis to learn more about the state of the conflict.
The dataset, 2022 Russia Ukraine War, was obtained from Kaggle. It describes Russian equipment losses, death toll, military wounded, and prisoners of war over the last two years.
Personnel Losses
Personnel losses are the darkest facet of war, affecting the lives of thousands of young men and their relatives.
Now, we will evaluate the losses of Russian personnel using the dataset russia_losses_personnel.csv, which contains 703 rows and 5 columns of personnel casualty figures recorded throughout the conflict. Let’s start the data analysis by loading the dataset and checking its header.
# Load packages
library(bslib)
library(tidyverse)
library(kableExtra)
library(plotly)
# Load data
personnel <- read_csv("russia_losses_personnel.csv")
# Show header
kable(head(personnel))
date | day | personnel | personnel* | POW |
---|---|---|---|---|
2022-02-25 | 2 | 2800 | about | 0 |
2022-02-26 | 3 | 4300 | about | 0 |
2022-02-27 | 4 | 4500 | about | 0 |
2022-02-28 | 5 | 5300 | about | 0 |
2022-03-01 | 6 | 5710 | about | 200 |
2022-03-02 | 7 | 5840 | about | 200 |
Let’s summarize the data on a monthly basis to facilitate analysis and visualization of personnel losses over time.
# Calculate daily personnel losses from the cumulative counts
personnel_daily <- personnel %>%
  mutate(personnel = personnel - lag(personnel, default = 0))
# Aggregate daily data by month and year
personnel_monthly <- personnel_daily %>%
  mutate(month_year = paste(year(date), month(date), sep = "-")) %>%
  group_by(month_year) %>%
  summarise(personnel = sum(personnel)) %>%
  mutate(month_year_date = ym(month_year))
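To make the differencing step concrete, here is a minimal sketch that applies the same `lag()` subtraction to the six cumulative values shown in the header above. The `toy` and `toy_daily` names are illustrative only. Note that with `default = 0` the first row keeps its full cumulative value, so the first "daily" figure covers everything reported up to that date.

```r
library(dplyr)

# First six cumulative personnel values from the header table
toy <- tibble(
  date = as.Date(c("2022-02-25", "2022-02-26", "2022-02-27",
                   "2022-02-28", "2022-03-01", "2022-03-02")),
  personnel = c(2800, 4300, 4500, 5300, 5710, 5840)
)

# Subtract the previous day's cumulative total to get the daily increment
toy_daily <- toy %>%
  mutate(daily = personnel - lag(personnel, default = 0))

toy_daily$daily
# 2800 1500 200 800 410 130

# Sanity check: the increments add back up to the last cumulative value
sum(toy_daily$daily) == last(toy$personnel)
# TRUE
```

The final check is a cheap way to confirm that no losses were dropped or double counted by the transformation.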
We will create a data visualization to check the monthly personnel losses during the war. The graph includes the losses and a trend line to help us understand the overall trend of the losses over time. The data available is from February 2022 to January 2024.
# Create the personnel losses plot
personnel_plot <- personnel_monthly %>%
  ggplot(aes(
    x = month_year_date,
    y = personnel,
    text = paste0(
      "losses: ",
      scales::number(personnel, big.mark = ","),
      "<br>",
      "date: ",
      month_year
    )
  )) +
  geom_bar(stat = "identity", fill = "skyblue") +
  geom_smooth(
    aes(group = 1,
        text = paste0(
          "trend: ", scales::number(..y.., big.mark = ",", accuracy = 1)
        )),
    method = "auto",
    se = FALSE,
    color = "black"
  ) +
  labs(x = "date",
       y = "losses") +
  theme_minimal()
# Create the title and subtitle
plot_title <- "<b>Russia Personnel Losses</b>"
plot_subtitle <- "<sup>Monthly data and trend of losses over time</sup>"
# Add title and interactivity
ggplotly(personnel_plot, tooltip = "text") %>%
  layout(margin = list(t = 100),
         title = list(text = paste0(plot_title,
                                    "<br>",
                                    plot_subtitle))) %>%
  card(full_screen = TRUE)
The graph shows that the personnel losses and the trend line have increased over time; this indicates that the losses are still significant and the war is far from over.
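The trend line comes from `geom_smooth(method = "auto")`, which for a series with fewer than 1,000 points uses a LOESS fit. As a rough sketch of what that smoother computes, the snippet below fits `stats::loess()` directly to a hypothetical monthly series. The numbers are simulated for illustration only and are not the Kaggle data.

```r
# Hypothetical monthly losses: a rising trend plus noise (simulated, not real data)
set.seed(1)
month_index <- 1:24
losses <- 5000 + 900 * month_index + rnorm(24, sd = 3000)

# Fit a LOESS smoother with the default span and degree
fit <- loess(losses ~ month_index)

# Smoothed value for each month; this plays the role of the trend line
trend <- predict(fit)

# Compare raw and smoothed values side by side
head(data.frame(month_index, losses = round(losses), trend = round(trend)))
```

Because LOESS is a local fit, the trend can bend with the data instead of forcing a single straight line through all months.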
Equipment Losses
Equipment losses are also a crucial aspect of the war. For example, losing a dozen aircraft can significantly impact a country’s military capabilities.
Now, let’s analyze the equipment losses of the Russian military. We will use the dataset named russia_losses_equipment.csv. It contains 703 rows and 19 columns with information on equipment losses throughout the conflict. We will start by loading the dataset and checking its header.
# Load data
equipment <- read_csv("russia_losses_equipment.csv")
# Show header
kable(head(equipment))
date | day | aircraft | helicopter | tank | APC | field artillery | MRL | military auto | fuel tank | drone | naval ship | anti-aircraft warfare | special equipment | mobile SRBM system | greatest losses direction | vehicles and fuel tanks | cruise missiles | submarines |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2022-02-25 | 2 | 10 | 7 | 80 | 516 | 49 | 4 | 100 | 60 | 0 | 2 | 0 | NA | NA | NA | NA | NA | NA |
2022-02-26 | 3 | 27 | 26 | 146 | 706 | 49 | 4 | 130 | 60 | 2 | 2 | 0 | NA | NA | NA | NA | NA | NA |
2022-02-27 | 4 | 27 | 26 | 150 | 706 | 50 | 4 | 130 | 60 | 2 | 2 | 0 | NA | NA | NA | NA | NA | NA |
2022-02-28 | 5 | 29 | 29 | 150 | 816 | 74 | 21 | 291 | 60 | 3 | 2 | 5 | NA | NA | NA | NA | NA | NA |
2022-03-01 | 6 | 29 | 29 | 198 | 846 | 77 | 24 | 305 | 60 | 3 | 2 | 7 | NA | NA | NA | NA | NA | NA |
2022-03-02 | 7 | 30 | 31 | 211 | 862 | 85 | 40 | 355 | 60 | 3 | 2 | 9 | NA | NA | NA | NA | NA | NA |
Let’s summarize the data on a monthly basis to facilitate analysis and visualization of equipment losses over time.
# Calculate daily equipment losses from the cumulative counts
equipment_daily <- equipment %>%
  select(c("date", "aircraft", "helicopter", "tank", "MRL")) %>%
  mutate(across(-date, ~. - lag(., default = 0)))
# Aggregate daily data by month and year
equipment_monthly <- equipment_daily %>%
  mutate(month_year = paste(year(date), month(date), sep = "-")) %>%
  group_by(month_year) %>%
  summarise(
    aircraft = sum(aircraft),
    helicopter = sum(helicopter),
    tank = sum(tank),
    MRL = sum(MRL)
  ) %>%
  mutate(month_year_date = ym(month_year))
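As a small worked example of the step above, the sketch below differences two of the cumulative columns from the header rows and sums them per month. It is a variation rather than the code used in this post: it groups by `lubridate::floor_date()` instead of building a "year-month" string, and the `toy` and `toy_monthly` names are illustrative.

```r
library(dplyr)
library(lubridate)

# Cumulative aircraft and tank counts from the header table
toy <- tibble(
  date = as.Date(c("2022-02-25", "2022-02-26", "2022-02-27",
                   "2022-02-28", "2022-03-01", "2022-03-02")),
  aircraft = c(10, 27, 27, 29, 29, 30),
  tank = c(80, 146, 150, 150, 198, 211)
)

toy_monthly <- toy %>%
  # Turn every cumulative column (all except date) into daily increments
  mutate(across(-date, ~ .x - lag(.x, default = 0))) %>%
  # Group by the first day of each month, which stays a proper Date
  group_by(month = floor_date(date, "month")) %>%
  summarise(across(c(aircraft, tank), sum))

toy_monthly
# February 2022: aircraft 29, tank 150
# March 2022:    aircraft 1,  tank 61
```

Keeping the month as a Date avoids a second parsing step and sorts chronologically without extra work.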
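Before creating the plot, we will create a table that assigns one reference model and an approximate unit cost to each of the four main equipment types. Note that the M270 is a US-built system, so its price serves only as a rough proxy for Russian MRL.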
# Create cost table
equipment_cost <-
  data.frame(
    "equipment" = c("aircraft",
                    "helicopter",
                    "tank",
                    "MRL"),
    "cost in USD millions" = c(50,
                               10,
                               6,
                               4.6),
    "unit" = c(
      "[Sukhoi Su-34](https://en.wikipedia.org/wiki/Sukhoi_Su-34)",
      "[Kamov Ka-50](https://en.wikipedia.org/wiki/Kamov_Ka-50)",
      "[T-14 Armata](https://en.wikipedia.org/wiki/T-14_Armata)",
      "[M270 Multiple Launch Rocket System](https://en.wikipedia.org/wiki/M270_Multiple_Launch_Rocket_System)"
    ),
    check.names = FALSE
  )
# Display table
kable(equipment_cost)
equipment | cost in USD millions | unit |
---|---|---|
aircraft | 50.0 | Sukhoi Su-34 |
helicopter | 10.0 | Kamov Ka-50 |
tank | 6.0 | T-14 Armata |
MRL | 4.6 | M270 Multiple Launch Rocket System |
Let’s apply the unit cost to the equipment monthly losses to calculate the total cost.
# Multiply the monthly equipment losses by the unit cost
equipment_monthly <- equipment_monthly %>%
  mutate(
    aircraft = aircraft * equipment_cost$`cost in USD millions`[equipment_cost$equipment == "aircraft"],
    helicopter = helicopter * equipment_cost$`cost in USD millions`[equipment_cost$equipment == "helicopter"],
    tank = tank * equipment_cost$`cost in USD millions`[equipment_cost$equipment == "tank"],
    MRL = MRL * equipment_cost$`cost in USD millions`[equipment_cost$equipment == "MRL"]
  )
# Pivot the monthly data to long format (one row per month and equipment type)
equipment_long <- pivot_longer(
  equipment_monthly,
  cols = -c(month_year, month_year_date),
  names_to = "equipment",
  values_to = "cost"
)
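The cost calculation can be checked by hand on a single month. The sketch below uses the cumulative counts reported for 2022-02-28 in the header (29 aircraft, 29 helicopters, 150 tanks, 21 MRL) as the February totals and the unit costs from the table above. It pivots first and looks up each cost in a named vector, which is an alternative arrangement to the code in this post; `unit_cost`, `feb_losses`, and `feb_long` are illustrative names.

```r
library(dplyr)
library(tidyr)

# Unit costs in USD millions, taken from the cost table
unit_cost <- c(aircraft = 50, helicopter = 10, tank = 6, MRL = 4.6)

# February 2022 unit losses, read from the 2022-02-28 header row
feb_losses <- tibble(
  month_year = "2022-2",
  aircraft = 29, helicopter = 29, tank = 150, MRL = 21
)

feb_long <- feb_losses %>%
  # One row per equipment type
  pivot_longer(-month_year, names_to = "equipment", values_to = "units") %>%
  # Look up the unit cost by equipment name and multiply
  mutate(cost = units * unname(unit_cost[equipment]))

feb_long$cost
# 1450.0 290.0 900.0 96.6

sum(feb_long$cost)
# 2736.6 (USD millions)
```

Looking costs up by name keeps the multiplication in one place, so adding a fifth equipment type would only require a new entry in `unit_cost`.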
Using the main equipment (aircraft, helicopter, tank, and MRL) losses and their respective costs, we can plot the monthly equipment cost losses over time.
# Create the equipment plot
equipment_plot <-
  ggplot(
    equipment_long,
    aes(
      # Use the Date column so the months plot in chronological order
      # (the character month_year would sort "2022-10" before "2022-2")
      x = month_year_date,
      y = cost,
      color = equipment,
      group = equipment,
      text = paste0(
        "equipment: ",
        equipment,
        "<br>",
        "cost: ",
        scales::number(cost, big.mark = ",", accuracy = 1),
        "<br>",
        "date: ",
        month_year
      )
    )
  ) +
  geom_line() +
  scale_color_brewer(palette = "Set1") +
  labs(x = "date",
       y = "cost in USD millions") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
# Create the title and subtitle
plot_title <- "<b>Russia Equipment Losses</b>"
plot_subtitle <- "<sup>Monthly cost of equipment losses over time</sup>"
# Add title and interactivity
ggplotly(equipment_plot, tooltip = "text") %>%
  layout(
    margin = list(b = 100,
                  t = 100),
    legend = list(
      x = 0.2,
      y = -0.3,
      orientation = "h"
    ),
    title = list(text = paste0(plot_title,
                               "<br>",
                               plot_subtitle))
  ) %>%
  card(full_screen = TRUE)
To conclude, let’s plot the cumulative and monthly total cost of the main equipment (aircraft, helicopter, tank and MRL) losses over time.
# Get the total cost of equipment losses
equipment_monthly <- equipment_monthly %>%
  arrange(month_year_date) %>%
  mutate(total = aircraft + helicopter + tank + MRL) %>%
  mutate(cumtotal = cumsum(total))
# Create the total cost plot
total_cost_plot <-
  ggplot(equipment_monthly, aes(x = month_year_date)) +
  geom_bar(aes(
    y = total,
    text = paste0(
      "cost: ",
      scales::number(total, big.mark = ",", accuracy = 1),
      "<br>",
      "date: ",
      month_year
    )
  ),
  stat = "identity",
  fill = "skyblue") +
  geom_line(aes(y = cumtotal,
                text = round(cumtotal))) +
  labs(x = "date",
       y = "cost in USD millions") +
  theme_minimal()
# Create the title and subtitle
plot_title <- "<b>Russia Equipment Total Losses</b>"
plot_subtitle <- "<sup>Monthly total and cumulative cost of equipment losses over time</sup>"
# Add interactivity and title
ggplotly(total_cost_plot, tooltip = "text") %>%
  layout(
    margin = list(t = 100),
    title = list(text = paste0(plot_title,
                               "<br>",
                               plot_subtitle))
  ) %>%
  card(full_screen = TRUE)
The plots above show the heavy monthly cost of losing just four types of military equipment over the last two years of conflict. The cumulative cost exceeds 61 billion USD.
Losses Relationship
We will fit a linear model to the data to understand the relationship between personnel and equipment losses. This will allow us to determine whether there is a significant relationship between the two and to what extent equipment losses can explain personnel losses.
# Keep only columns with less than 80% missing values
equipment <- equipment[, colMeans(is.na(equipment)) < 0.8]
# Join equipment and personnel data by date and drop non-predictor columns
all_data <- equipment %>%
  left_join(personnel, by = "date") %>%
  select(-c(1,2,13,16,18,19))
# Fit linear model
model <- lm(personnel ~ ., data = all_data)
# Show summary
summary(model)
Call:
lm(formula = personnel ~ ., data = all_data)
Residuals:
Min 1Q Median 3Q Max
-14306.7 -3419.1 -491.8 3036.8 13393.9
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 56990.636 11797.946 4.831 1.71e-06 ***
aircraft 386.151 90.337 4.275 2.21e-05 ***
helicopter -1159.190 81.359 -14.248 < 2e-16 ***
tank 86.975 8.704 9.992 < 2e-16 ***
APC -8.061 5.260 -1.533 0.12588
`field artillery` -9.851 3.497 -2.817 0.00501 **
MRL 515.522 29.299 17.595 < 2e-16 ***
drone -11.006 4.283 -2.569 0.01042 *
`naval ship` 97.860 471.118 0.208 0.83552
`anti-aircraft warfare` -250.008 35.635 -7.016 5.96e-12 ***
`special equipment` 160.307 23.460 6.833 1.97e-11 ***
`vehicles and fuel tanks` -42.171 4.056 -10.398 < 2e-16 ***
`cruise missiles` 127.362 3.959 32.170 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5050 on 625 degrees of freedom
(65 observations deleted due to missingness)
Multiple R-squared: 0.9979, Adjusted R-squared: 0.9978
F-statistic: 2.42e+04 on 12 and 625 DF, p-value: < 2.2e-16
The linear model summary shows a statistically significant relationship between personnel and equipment losses. The adjusted R-squared of 0.9978 indicates that the equipment variables explain about 99.8% of the variation in personnel losses. Because the model is fit on cumulative counts, which all grow over time, part of this very strong fit reflects their shared upward trend.
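One caveat worth checking is that the model above is fit on cumulative counts. The sketch below uses simulated, independent daily series (hypothetical data, not the Kaggle dataset) to illustrate how regressing one cumulative total on another can give a very high R-squared even when the daily increments are unrelated. The variable names and Poisson rates are assumptions chosen for illustration.

```r
set.seed(42)
n <- 700

# Two independent daily series: by construction they are unrelated
daily_personnel <- rpois(n, lambda = 500)
daily_tanks <- rpois(n, lambda = 10)

# R-squared when regressing the cumulative totals on each other
r2_cumulative <- summary(lm(cumsum(daily_personnel) ~ cumsum(daily_tanks)))$r.squared

# R-squared when regressing the daily increments on each other
r2_daily <- summary(lm(daily_personnel ~ daily_tanks))$r.squared

c(cumulative = r2_cumulative, daily = r2_daily)
```

In this simulation the cumulative fit is close to 1 while the daily fit is close to 0. Refitting the model on daily or monthly increments would be one way to check how much of the relationship in the real data goes beyond the common trend.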
Conclusion
The Russo-Ukrainian war has resulted in significant losses for the Russian military. The data analysis has provided us with a better understanding of the number of personnel and equipment losses throughout the conflict and their relationship.
Let’s hope for a peaceful resolution to the conflict and end the suffering of the people involved.
To financially support Ukraine and get more info on the conflict, check StandWithUkraine.