This is an analysis of state-sponsored cyber operations.
The data presented here was taken from the Cyber Operations Tracker of the Council on Foreign Relations (CFR), a database of publicly known state-sponsored incidents since 2005.
The analysis was done using R, as a part of an excellent MOOC by the Knight Center I participated in.
As always, a bunch of libraries were used:
library(dplyr)
library(lubridate)
library(ggplot2)
library(anytime)
library(ggthemes)
library(gghighlight)
library(sf)
library(readr)
library(rnaturalearth)
library(leaflet)
library(tidyr)
library(DT)
# Source: https://www.cfr.org/interactive/cyber-operations/export-incidents?_format=csv
Cyber_operations <- read.csv("https://www.cfr.org/interactive/cyber-operations/export-incidents?_format=csv", stringsAsFactors = FALSE)
# Source: rnaturalearth library
world <- st_as_sf(countries110) %>%
filter(sovereignt!="Antarctica") %>%
filter(type != "Dependency") %>%
filter(type != "Disputed") %>%
filter(type != "Indeterminate")
I created several dataframes to play around with, and to make the plotting and mapping easier later on. I changed the names of some states (nothing personal) so that they match the country names in the rnaturalearth dataframe. I also adjusted the total number of operations per state so that it includes dual-sponsored operations.
This is the only real chunk of code in this document, so take a minute to enjoy it.
Lean_database <- Cyber_operations %>%
filter(Sponsor!="") %>%
filter(Date!="") %>%
select(Sponsor, Victims, Type, Title, Description, Date) %>%
arrange(Sponsor,Date)
UnknownSponsor<- Cyber_operations %>%
filter(Sponsor=="") %>%
select(Victims, Type, Description, Title, Date)
SingleSponsor <- Cyber_operations %>%
select(Sponsor, Type, Date) %>%
filter (!Sponsor %in% c(grep(",", Cyber_operations$Sponsor, value=TRUE)),
Sponsor!="",
Type!="") %>%
mutate(Year = year(anydate(Date))) %>%
mutate(Sponsor=case_when(
Sponsor=="Iran (Islamic Republic of)" ~ "Iran",
Sponsor=="Korea (Democratic People's Republic of)" ~ "North Korea",
Sponsor=="Korea (Republic of)" ~ "South Korea",
Sponsor=="Russian Federation" ~ "Russia",
Sponsor=="United States" ~ "United States of America",
TRUE ~ Sponsor)) %>%
select(-Date) %>%
arrange(Sponsor, Type, Year)
Tidy_by_year <- SingleSponsor %>%
group_by(Sponsor, Year) %>%
summarize(Cases=n())
Tidy_total <- SingleSponsor %>%
group_by(Sponsor) %>%
summarize(Cases=n()) %>%
mutate(Cases= case_when(
Sponsor == "United States of America" ~ Cases+5,
Sponsor == "Israel" ~ Cases + 3,
Sponsor == "China" ~ Cases+1,
Sponsor == "Russia" ~ Cases+1,
Sponsor == "United Kingdom" ~ Cases+1,
Sponsor == "Taiwan" ~ Cases+1,
TRUE ~ Cases+0))
Geo_Cyber <- left_join(world, Tidy_total, by=c("admin"="Sponsor"))
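The adjustments in Tidy_total are typed in as fixed numbers. They could probably also be derived from the data. The following is only a sketch under assumptions, not the code used for this analysis: `count_all_sponsors` is a hypothetical helper, and it assumes that co-sponsors are always separated by commas in the Sponsor column (the same assumption the grep-based filter makes). It does not apply the name changes made in SingleSponsor.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper (not part of the original analysis):
# count operations per state, crediting every co-sponsor of a
# jointly sponsored operation.
# Assumption: co-sponsors are comma-separated in the Sponsor column.
count_all_sponsors <- function(operations) {
  operations %>%
    filter(Sponsor != "") %>%
    # Split "A, B" into one row for A and one row for B
    separate_rows(Sponsor, sep = ",\\s*") %>%
    mutate(Sponsor = trimws(Sponsor)) %>%
    count(Sponsor, name = "Cases")
}
```

Calling `count_all_sponsors(Cyber_operations)` should return one row per state with a Cases column, which could then be renamed and joined to `world` in the same way as Tidy_total.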
To begin, I filtered the database down to operations with known sponsors and dates (the vast majority: 233 out of 262). Then, an interactive datatable was produced using the DT library.
datatable(Lean_database,
extensions = 'Buttons',
options = list(dom = 'Bfrtip',
buttons = c('copy', 'csv', 'excel', 'pdf', 'print')))
This chart represents cyber operations by state per year.
The dominance of China and Russia is clearly visible.
This chart makes it possible to see clearly how each state's involvement evolved over time.
China is the most consistent actor, while Russia has sharply increased its involvement since 2013.
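The code for these charts is not shown in this post. As a rough sketch of how a per-year chart could be drawn from Tidy_by_year, here is one option using ggplot2 only. `plot_cases_by_year` is a hypothetical helper, and the small-multiples layout is my assumption, not necessarily the design of the charts described above.

```r
library(ggplot2)

# Hypothetical helper (illustration only): one small panel per sponsor,
# showing the number of known operations per year.
# Expects columns Sponsor, Year and Cases, as in Tidy_by_year.
plot_cases_by_year <- function(cases_by_year) {
  ggplot(cases_by_year, aes(x = Year, y = Cases)) +
    geom_line() +
    # Points keep sponsors with a single active year visible
    geom_point(size = 1) +
    facet_wrap(~ Sponsor) +
    labs(x = "Year",
         y = "Known operations",
         title = "State-sponsored cyber operations per year")
}
```

With the dataframes built earlier, `plot_cases_by_year(Tidy_by_year)` would return a ggplot object that can be printed or restyled further, for example with a theme from ggthemes.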
Another possible way of illustrating the data is by using a map.
In this case, an interactive map was utilized, using rnaturalearth geometry and the leaflet library.
As expected, China, Russia - and to a lesser extent, Iran - immediately pop out.
# palette
pal <- colorNumeric("Greens", domain=Geo_Cyber$Cases)
#popup note
popup_spend <- paste0("<strong>", Geo_Cyber$admin,
"</strong><br /> Known Cyber Operations Since 2005: ", Geo_Cyber$Cases)
Geo_Cyber %>%
leaflet() %>%
addTiles() %>%
setView(0, 0, zoom = 2) %>%
addPolygons(data = Geo_Cyber,
fillColor = ~pal(Geo_Cyber$Cases),
fillOpacity = 0.5,
weight = 0.2,
smoothFactor = 0.2,
popup = ~popup_spend) %>%
addLegend(pal = pal,
values = Geo_Cyber$Cases,
na.label = "NA",
bins= 6,
position = "bottomright",
title = "Cyber Operations")
It is important to emphasize that cyber operations can mean different things. The CFR database lists six types of attack.
Out of the six, espionage is by far the most common goal.
However, it’s important to remember that different states pursue different cyber strategies.
Examining the four leading actors in the database, we can clearly see that while all of them are heavily invested in espionage, some pursue more diverse cyber goals, mainly through sabotage and DDoS.
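The breakdown by type can be reproduced from SingleSponsor with a few dplyr verbs. The following is a minimal sketch, not the original code: `types_for_top_sponsors` is a hypothetical helper, and defining "leading" as the sponsors with the most single-sponsor operations is my assumption.

```r
library(dplyr)

# Hypothetical helper (illustration only): operation types for the
# n most active sponsors, with each type's share of that sponsor's total.
# Expects columns Sponsor and Type, as in SingleSponsor.
types_for_top_sponsors <- function(operations, n_top = 4) {
  # Sponsors ranked by number of operations, most active first
  top <- operations %>%
    count(Sponsor, sort = TRUE) %>%
    head(n_top) %>%
    pull(Sponsor)

  operations %>%
    filter(Sponsor %in% top) %>%
    count(Sponsor, Type, name = "Cases") %>%
    group_by(Sponsor) %>%
    mutate(Share = Cases / sum(Cases)) %>%
    ungroup() %>%
    arrange(Sponsor, desc(Cases))
}
```

`types_for_top_sponsors(SingleSponsor)` would give one row per sponsor and type, which is a convenient shape for a faceted bar chart.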
The database allows us to examine collaborations in the cyber realm.
Using the grep function, it’s even pretty easy. Just search for that comma:
MultipleSponsors <- Cyber_operations %>%
select(Sponsor, Type, Date) %>%
filter(Sponsor %in% c(grep(",", Cyber_operations$Sponsor, value=TRUE)))
The US is the clear leader in cyber cooperation with allies, namely Israel, the UK, and Taiwan. Interestingly, 2018 is the first year that features a joint Chinese-Russian cyber operation.
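To see which partnerships appear most often, the jointly sponsored operations can be tallied per combination of states. This is a sketch under assumptions rather than the code behind the observations above: `count_joint_operations` is a hypothetical helper, and it assumes co-sponsors are comma-separated, sorting the names so that "A, B" and "B, A" count as the same partnership.

```r
library(dplyr)

# Hypothetical helper (illustration only): count jointly sponsored
# operations per combination of states.
# Assumption: co-sponsors are comma-separated in the Sponsor column.
count_joint_operations <- function(operations) {
  joint <- operations %>%
    filter(grepl(",", Sponsor, fixed = TRUE))

  # Normalise the order of the names within each combination
  joint$Partners <- vapply(
    strsplit(joint$Sponsor, ",", fixed = TRUE),
    function(states) paste(sort(trimws(states)), collapse = " + "),
    character(1)
  )

  joint %>%
    count(Partners, name = "Operations", sort = TRUE)
}
```

Applied to the data above, `count_joint_operations(MultipleSponsors)` would list each partnership with its number of known joint operations.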
The database also contains many cyber operations with unknown sponsors. It’s possible that a decent analysis of the victims and the types, coupled with a comparison to the known methods of each actor, would shed some statistical light on the instigators.
Perhaps in a future project.