Russo-Ukrainian War Analysis

Created by apps.hodatascience.com.br

Published

February 8, 2024

The War and The Data

fig source: tinyurl.com/ycy27djz

One of the most significant geopolitical crises of the twenty-first century is the war between Russia and Ukraine, which is characterized by a complex interplay of historical, cultural, and strategic variables. In this project, we will use data analysis to learn more about the state of the conflict.

The dataset, 2022 Russia Ukraine War, was obtained from Kaggle. It describes Russian equipment losses, death toll, military wounded, and prisoners of war over the last two years.

Personnel Losses

Personnel losses are the darkest facet of war, affecting the lives of thousands of young men and their relatives.

Now, we will evaluate the losses of Russian personnel utilizing the dataset russia_losses_personnel.csv, which contains 703 rows and 5 columns and encompasses information on personnel casualties throughout the conflict. Let’s start the data analysis by loading the dataset and checking its header.

# Load packages
library(bslib)
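library(tidyverse)
library(lubridate)  # explicit load for year(), month(), and ym(); tidyverse attaches it only from version 2.0.0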
library(kableExtra)
library(plotly)

# Load data
personnel <- read_csv("russia_losses_personnel.csv")

# Show header
kable(head(personnel))
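| date | day | personnel | personnel* | POW |
|---|---|---|---|---|
| 2022-02-25 | 2 | 2800 | about | 0 |
| 2022-02-26 | 3 | 4300 | about | 0 |
| 2022-02-27 | 4 | 4500 | about | 0 |
| 2022-02-28 | 5 | 5300 | about | 0 |
| 2022-03-01 | 6 | 5710 | about | 200 |
| 2022-03-02 | 7 | 5840 | about | 200 |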

Let’s summarize the data on a monthly basis to facilitate analysis and visualization of personnel losses over time.

# Convert cumulative totals into daily personnel losses
personnel_daily <- personnel %>%
  mutate(personnel = personnel - lag(personnel, default = 0))

# Aggregate daily data by month and year
personnel_monthly <- personnel_daily %>%
  mutate(month_year = paste(year(date), month(date), sep = "-")) %>%
  group_by(month_year) %>%
  summarise(personnel = sum(personnel)) %>%
  mutate(month_year_date = ym(month_year))
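The two steps above can be sketched on a handful of rows. This is a minimal base R illustration, assuming the six cumulative totals shown in the header; the names `toy`, `daily`, and `monthly` are hypothetical and not part of the original analysis.

```r
# Six cumulative totals copied from the header of the personnel dataset
toy <- data.frame(
  date = as.Date(c("2022-02-25", "2022-02-26", "2022-02-27",
                   "2022-02-28", "2022-03-01", "2022-03-02")),
  personnel = c(2800, 4300, 4500, 5300, 5710, 5840)
)

# Daily increment: each total minus the previous one (0 before the first row)
toy$daily <- diff(c(0, toy$personnel))

# Sum the increments within each calendar month
toy$month <- format(toy$date, "%Y-%m")
monthly <- aggregate(daily ~ month, data = toy, FUN = sum)

print(monthly)
```

Because the first row is differenced against zero, the monthly sums add back up to the last cumulative total, which is a quick sanity check on the conversion.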

We will create a data visualization to check the monthly personnel losses during the war. The graph includes the losses and a trend line to help us understand the overall trend of the losses over time. The data available is from February 2022 to January 2024.

# Create the personnel losses plot
personnel_plot <- personnel_monthly %>%
  ggplot(aes(
    x = month_year_date,
    y = personnel,
    text = paste0(
      "losses: ",
      scales::number(personnel, big.mark = ","),
      "<br>",
      "date: ",
      month_year
    )
  )) +
  geom_col(fill = "skyblue") +
  geom_smooth(
    aes(group = 1,
        # after_stat() replaces the deprecated ..y.. notation
        text = after_stat(paste0(
          "trend: ", scales::number(y, big.mark = ",", accuracy = 1)
        ))),
    method = "auto",
    se = FALSE,
    color = "black"
  ) +
  labs(x = "date",
       y = "losses") +
  theme_minimal()

# Create the title and subtitle
plot_title <- "<b>Russia Personnel Losses</b>"
plot_subtitle <- "<sup>Monthly data and trend of losses over time</sup>"

# Add title and interactivity
ggplotly(personnel_plot, tooltip = "text") %>%
  layout(margin = list(t = 100),
         title = list(text = paste0(plot_title,
                                    "<br>",
                                    plot_subtitle))) %>%
  card(full_screen = TRUE)

The graph shows that monthly personnel losses, and the trend line fitted to them, have risen over time. This indicates that the losses remain significant and that the war is far from over.
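The trend line comes from `geom_smooth()` with `method = "auto"`, which, according to the ggplot2 documentation, selects a LOESS smoother for groups with fewer than 1,000 observations. The sketch below shows the same kind of smoother in base R. The monthly values are simulated for illustration and the names `months`, `losses`, and `trend` are hypothetical.

```r
# Simulated monthly losses (illustrative values, not taken from the dataset)
set.seed(1)
months <- 1:24
losses <- 5000 + 800 * months + rnorm(24, sd = 3000)

# Fit a local regression (LOESS) of losses on the month index
fit <- loess(losses ~ months)

# Smoothed values at the observed months: the analogue of the trend line
trend <- predict(fit)
print(round(trend))
```

The smoothed values follow the broad direction of the series while damping month-to-month noise, which is why the trend line is easier to read than the bars alone.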

Equipment Losses

Equipment losses are also a crucial aspect of the war. For example, losing a dozen aircraft can significantly impact a country’s military capabilities.

Now, let’s analyze the equipment losses of the Russian military. We will use the dataset named russia_losses_equipment.csv, which contains 703 rows and 19 columns of information on equipment losses throughout the conflict. We will start by loading the dataset and checking its header.

# Load data
equipment <- read_csv("russia_losses_equipment.csv")

# Show header
kable(head(equipment))
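| date | day | aircraft | helicopter | tank | APC | field artillery | MRL | military auto | fuel tank | drone | naval ship | anti-aircraft warfare | special equipment | mobile SRBM system | greatest losses direction | vehicles and fuel tanks | cruise missiles | submarines |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2022-02-25 | 2 | 10 | 7 | 80 | 516 | 49 | 4 | 100 | 60 | 0 | 2 | 0 | NA | NA | NA | NA | NA | NA |
| 2022-02-26 | 3 | 27 | 26 | 146 | 706 | 49 | 4 | 130 | 60 | 2 | 2 | 0 | NA | NA | NA | NA | NA | NA |
| 2022-02-27 | 4 | 27 | 26 | 150 | 706 | 50 | 4 | 130 | 60 | 2 | 2 | 0 | NA | NA | NA | NA | NA | NA |
| 2022-02-28 | 5 | 29 | 29 | 150 | 816 | 74 | 21 | 291 | 60 | 3 | 2 | 5 | NA | NA | NA | NA | NA | NA |
| 2022-03-01 | 6 | 29 | 29 | 198 | 846 | 77 | 24 | 305 | 60 | 3 | 2 | 7 | NA | NA | NA | NA | NA | NA |
| 2022-03-02 | 7 | 30 | 31 | 211 | 862 | 85 | 40 | 355 | 60 | 3 | 2 | 9 | NA | NA | NA | NA | NA | NA |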

Let’s summarize the data on a monthly basis to facilitate analysis and visualization of equipment losses over time.

# Convert cumulative totals into daily equipment losses
equipment_daily <- equipment %>%
  select(c("date", "aircraft", "helicopter", "tank", "MRL")) %>%
  mutate(across(-date, ~ .x - lag(.x, default = 0)))

# Aggregate daily data by month and year
equipment_monthly <- equipment_daily %>%
  mutate(month_year = paste(year(date), month(date), sep = "-")) %>%
  group_by(month_year) %>%
  summarise(
    aircraft = sum(aircraft),
    helicopter = sum(helicopter),
    tank = sum(tank),
    MRL = sum(MRL)
  ) %>%
  mutate(month_year_date = ym(month_year))
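The final `ym()` step matters because the text key `month_year` does not sort chronologically. The short base R sketch below illustrates the point with three hypothetical keys built the same way, with the year followed by an unpadded month.

```r
# Month keys built the same way as `month_year`: year and unpadded month
keys <- c("2022-2", "2022-10", "2023-1")

# As text, "2022-10" sorts before "2022-2"
text_order <- sort(keys, method = "radix")

# Parsed as dates (first day of each month), the order is chronological
as_dates <- as.Date(paste0(keys, "-1"), format = "%Y-%m-%d")
date_order <- keys[order(as_dates)]

print(text_order)
print(date_order)
```

Any plot or cumulative calculation that depends on the order of months should therefore use the parsed date column rather than the text key.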

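Before creating the plot, we will create a table of approximate unit costs, using one reference model per equipment category as a cost proxy. Note that the M270 is a United States system, so it serves here only as a price reference for the MRL category.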

# Create cost table
equipment_cost <-
  data.frame(
    "equipment" = c("aircraft",
                    "helicopter",
                    "tank",
                    "MRL"),
    "cost in USD millions" =  c(50,
                                  10,
                                  6,
                                  4.6),
    "unit" = c(
      "[Sukhoi Su-34](https://en.wikipedia.org/wiki/Sukhoi_Su-34)",
      "[Kamov Ka-50](https://en.wikipedia.org/wiki/Kamov_Ka-50)",
      "[T-14 Armata](https://en.wikipedia.org/wiki/T-14_Armata)",
      "[M270 Multiple Launch Rocket System](https://en.wikipedia.org/wiki/M270_Multiple_Launch_Rocket_System)"
    ),
    check.names = FALSE
  )

# Display table
kable(equipment_cost)
| equipment | cost in USD millions | unit |
|---|---|---|
| aircraft | 50.0 | Sukhoi Su-34 |
| helicopter | 10.0 | Kamov Ka-50 |
| tank | 6.0 | T-14 Armata |
| MRL | 4.6 | M270 Multiple Launch Rocket System |

Let’s apply the unit costs to the monthly equipment losses to calculate the cost of each category.

# Multiply the equipment monthly losses by the unit cost
equipment_monthly <- equipment_monthly %>%
  mutate(
    aircraft = aircraft * equipment_cost$`cost in USD millions`[equipment_cost$equipment == "aircraft"],
    helicopter = helicopter * equipment_cost$`cost in USD millions`[equipment_cost$equipment == "helicopter"],
    tank = tank * equipment_cost$`cost in USD millions`[equipment_cost$equipment == "tank"],
    MRL = MRL * equipment_cost$`cost in USD millions`[equipment_cost$equipment == "MRL"]
  )

# Pivot the monthly data
equipment_long <- pivot_longer(
  equipment_monthly,
  cols = -c(month_year, month_year_date),
  names_to = "equipment",
  values_to = "cost"
)
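The multiplication and reshaping above can be sketched in base R with a named vector of unit costs. The unit costs are the ones from the cost table; the monthly counts and the names `unit_cost`, `counts`, `cost`, and `long` are hypothetical and only illustrate the mechanics.

```r
# Unit costs in USD millions, copied from the cost table
unit_cost <- c(aircraft = 50, helicopter = 10, tank = 6, MRL = 4.6)

# Hypothetical monthly loss counts (illustrative, not taken from the dataset)
counts <- data.frame(
  month = c("2022-03", "2022-04"),
  aircraft = c(2, 1),
  helicopter = c(3, 0),
  tank = c(10, 20),
  MRL = c(5, 10)
)

# Multiply each count column by its matching unit cost
cost <- counts
for (name in names(unit_cost)) {
  cost[[name]] <- counts[[name]] * unit_cost[[name]]
}

# Stack the cost columns into long format: one row per month and equipment type
long <- data.frame(
  month = rep(cost$month, times = length(unit_cost)),
  equipment = rep(names(unit_cost), each = nrow(cost)),
  cost = unlist(cost[names(unit_cost)], use.names = FALSE)
)
print(long)
```

Looking costs up by name keeps the multiplication tied to the equipment label rather than to column position, so adding a category only requires one more entry in the vector.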

Using the losses of the main equipment (aircraft, helicopter, tank, and MRL) and their respective unit costs, we can plot the monthly cost of equipment losses over time.

# Create the equipment plot
equipment_plot <-
  ggplot(
    equipment_long,
    aes(
      x = month_year_date,  # date column keeps months in chronological order
      y = cost,
      color = equipment,
      group = equipment,
      text = paste0(
        "equipment: ",
        equipment,
        "<br>",
        "cost: ",
        scales::number(cost, big.mark = ",", accuracy = 1),
        "<br>",
        "date: ",
        month_year
      )
    )
  ) +
  geom_line() +
  scale_color_brewer(palette = "Set1") +
  labs(x = "date",
       y = "cost in USD millions") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Create the title and subtitle
plot_title <- "<b>Russia Equipment Losses</b>"
plot_subtitle <- "<sup>Monthly cost of equipment losses over time</sup>"

# Add title and interactivity
ggplotly(equipment_plot, tooltip = "text") %>%
  layout(
    margin = list(b = 100,
                  t = 100),
    legend = list(
      x = 0.2,
      y = -0.3,
      orientation = "h"
    ),
    title = list(text = paste0(plot_title,
                               "<br>",
                               plot_subtitle))
  ) %>%
  card(full_screen = TRUE)

To conclude, let’s plot the cumulative and monthly total cost of the main equipment (aircraft, helicopter, tank and MRL) losses over time.

# Get the total cost of equipment losses
equipment_monthly <- equipment_monthly %>%
  arrange(month_year_date) %>%
  mutate(total = aircraft + helicopter + tank + MRL) %>%
  mutate(cumtotal = cumsum(total))

# Create the total cost plot
total_cost_plot <-
  ggplot(equipment_monthly, aes(x = month_year_date)) +
  geom_col(aes(
    y = total,
    text = paste0(
      "cost: ",
      scales::number(total, big.mark = ",", accuracy = 1),
      "<br>",
      "date: ",
      month_year
    )
  ),
  fill = "skyblue") +
  geom_line(aes(y = cumtotal,
                text = round(cumtotal))) +
  labs(x = "date",
       y = "cost in USD millions") +
  theme_minimal()

# Create the title and subtitle
plot_title <- "<b>Russia Equipment Total Losses</b>"
plot_subtitle <- "<sup>Monthly total and cumulative cost of equipment losses over time</sup>"

# Add interactivity and title
ggplotly(total_cost_plot, tooltip = "text") %>%
  layout(
    margin = list(t = 100),
    title = list(text = paste0(plot_title,
                               "<br>",
                               plot_subtitle))
  ) %>%
  card(full_screen = TRUE)

The plots above show how much just four categories of military equipment contribute to the monthly cost of losses over the last two years of conflict. Furthermore, the cumulative cost exceeds 61 billion USD.
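The cumulative figure is a running sum of the monthly totals, which are expressed in USD millions. As a minimal sketch of that arithmetic, assuming made-up monthly totals and hypothetical column names, the rows are first put in chronological order, then accumulated, then converted to billions.

```r
# Hypothetical monthly totals in USD millions, deliberately out of order
monthly <- data.frame(
  month = as.Date(c("2022-04-01", "2022-02-01", "2022-03-01")),
  total = c(2500, 1200, 3400)
)

# Order chronologically before accumulating
monthly <- monthly[order(monthly$month), ]

# Running total: each entry adds the current month to all earlier months
monthly$cumtotal <- cumsum(monthly$total)

# Values are in USD millions, so divide by 1,000 to express billions
monthly$cumtotal_billions <- monthly$cumtotal / 1000
print(monthly)
```

The ordering step matters: a cumulative sum over unsorted months still ends at the same grand total, but the intermediate values would no longer describe the path over time.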

Losses Relationship

We will fit a linear model to the data to understand the relationship between personnel and equipment losses. This will allow us to determine whether there is a significant relationship between the two kinds of losses and to what extent equipment losses can explain personnel losses.

# Keep columns with less than 80% missing values
equipment <- equipment[, colMeans(is.na(equipment)) < 0.8]

# Join personnel data by date and drop unused columns by position
all_data <- equipment %>%
  left_join(personnel, by = "date") %>%
  select(-c(1, 2, 13, 16, 18, 19))

# Fit linear model
model <- lm(personnel ~ ., data = all_data)

# Show summary
summary(model)

Call:
lm(formula = personnel ~ ., data = all_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-14306.7  -3419.1   -491.8   3036.8  13393.9 

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)    
(Intercept)               56990.636  11797.946   4.831 1.71e-06 ***
aircraft                    386.151     90.337   4.275 2.21e-05 ***
helicopter                -1159.190     81.359 -14.248  < 2e-16 ***
tank                         86.975      8.704   9.992  < 2e-16 ***
APC                          -8.061      5.260  -1.533  0.12588    
`field artillery`            -9.851      3.497  -2.817  0.00501 ** 
MRL                         515.522     29.299  17.595  < 2e-16 ***
drone                       -11.006      4.283  -2.569  0.01042 *  
`naval ship`                 97.860    471.118   0.208  0.83552    
`anti-aircraft warfare`    -250.008     35.635  -7.016 5.96e-12 ***
`special equipment`         160.307     23.460   6.833 1.97e-11 ***
`vehicles and fuel tanks`   -42.171      4.056 -10.398  < 2e-16 ***
`cruise missiles`           127.362      3.959  32.170  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5050 on 625 degrees of freedom
  (65 observations deleted due to missingness)
Multiple R-squared:  0.9979,    Adjusted R-squared:  0.9978 
F-statistic: 2.42e+04 on 12 and 625 DF,  p-value: < 2.2e-16

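The linear model summary shows a statistically significant relationship between personnel and equipment losses. The adjusted R-squared of 0.998 indicates that the equipment variables explain about 99.8% of the variation in personnel losses. This is a very strong relationship, although both sets of figures are cumulative totals that rise over time, so part of the fit reflects their shared upward trend.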
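Since the model is fitted on cumulative totals, it is worth seeing how much of a high R-squared can come from accumulation alone. The sketch below is an illustration on simulated data, not a re-analysis of the dataset: it generates two unrelated daily series with hypothetical names, then compares the fit on cumulative totals with the fit on daily increments.

```r
# Two unrelated series of daily losses (simulated, not real data)
set.seed(42)
n <- 300
equip_inc <- rpois(n, lambda = 5)
pers_inc <- rpois(n, lambda = 500)

# Cumulative totals, the form in which the dataset reports losses
equip_cum <- cumsum(equip_inc)
pers_cum <- cumsum(pers_inc)

# R-squared when regressing cumulative totals on cumulative totals
r2_levels <- summary(lm(pers_cum ~ equip_cum))$r.squared

# R-squared when regressing daily increments on daily increments
r2_daily <- summary(lm(pers_inc ~ equip_inc))$r.squared

print(c(levels = r2_levels, daily = r2_daily))
```

In this simulation the cumulative fit is close to 1 while the daily fit is close to 0, even though the two series are independent by construction. Fitting the same model on the daily or monthly differences computed earlier would be one way to check how much of the relationship remains once the shared trend is removed.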

Conclusion

The Russo-Ukrainian war has resulted in significant losses for the Russian military. The data analysis has provided us with a better understanding of the number of personnel and equipment losses throughout the conflict and their relationship.

Let’s hope for a peaceful resolution to the conflict and an end to the suffering of the people involved.

To financially support Ukraine and get more info on the conflict, check StandWithUkraine.