Introduction

This is an analysis of state-sponsored cyber operations.
The data presented here was taken from the Cyber Operations Tracker of the Council on Foreign Relations, a database of publicly known state-sponsored incidents since 2005.

The analysis was done in R, as part of an excellent MOOC by the Knight Center that I participated in.

This is the second version of the analysis, updated with some tips from other participants in the MOOC (thanks!).

Libraries

As always, a bunch of libraries were used:

library(lubridate)
library(ggplot2)
library(stringr)
library(anytime)
library(ggthemes)
library(gghighlight)
library(dplyr)
library(sf)
library(readr)
library(rnaturalearth)
library(leaflet)
library(tidyr)
library(DT)
library(plotly)

Data

CFR database of cyber operations

source: https://www.cfr.org/interactive/cyber-operations/export-incidents?_format=csv

Cyber_operations <- read.csv("https://www.cfr.org/interactive/cyber-operations/export-incidents?_format=csv", stringsAsFactors = FALSE)

World map data

source: rnaturalearth library

world <- st_as_sf(countries110) %>% 
  filter(sovereignt!="Antarctica") %>% 
  filter(type != "Dependency")  %>% 
  filter(type != "Disputed") %>% 
  filter(type != "Indeterminate")

Tidying and creating different dataframes

I created several dataframes to play around with and to make the plotting and mapping easier later on. I changed the names of some states (nothing personal) so that they match the names in the rnaturalearth dataframe. I also adjusted the total number of operations per state to include dual-sponsored operations.
This is the only real chunk of code in this document, so take a minute to enjoy it.
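
The chunk refers to two objects, Spon_List and Vic_List, whose construction is not shown. Here is a minimal sketch of one way they might be built. It assumes each is a single string with every sponsor (or victim) entry pasted together after renaming, so that str_count() can count how often a state name occurs. The helper rename_states() and the toy rows are hypothetical, not part of the original analysis.

```r
library(dplyr)
library(stringr)

# Toy stand-in for Cyber_operations (placeholder rows, not real CFR records)
ops <- tibble(
  Sponsor = c("China", "Russian Federation", "United States, Israel", ""),
  Victims = c("United States", "Ukraine, United States",
              "Iran (Islamic Republic of)", "Germany")
)

# Hypothetical helper: map CFR spellings onto rnaturalearth's `admin` names.
# fixed() treats each name literally, so the parentheses are not read as regex.
# Assumes the input uses "United States", never "United States of America".
rename_states <- function(x) {
  lookup <- c(
    "Iran (Islamic Republic of)" = "Iran",
    "Korea (Democratic People's Republic of)" = "North Korea",
    "Korea (Republic of)" = "South Korea",
    "Russian Federation" = "Russia",
    "United States" = "United States of America"
  )
  for (old in names(lookup)) {
    x <- str_replace_all(x, fixed(old), lookup[[old]])
  }
  x
}

# One long string per role; dual-sponsored operations contribute every name they list
Spon_List <- ops$Sponsor %>% rename_states() %>% paste(collapse = "; ")
Vic_List  <- ops$Victims %>% rename_states() %>% paste(collapse = "; ")

# Counting one state by name
str_count(Vic_List, fixed("United States of America"))
```

One caveat of this kind of substring counting: a name that is contained in another name (for example "Niger" inside "Nigeria") would be counted for both, so the result is only as good as the name matching.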

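#adding the number of attacks sponsored and suffered, to be plotted and mapped later
#note: Spon_List and Vic_List are not defined in the code shown here; they are assumed to be
#single strings holding every sponsor / victim entry, with state names matching `admin`
world <- world %>%
  mutate(attacked = ifelse(str_count(Spon_List, admin) > 0, str_count(Spon_List, admin), NA),
         was.attacked = ifelse(str_count(Vic_List, admin) > 0, str_count(Vic_List, admin), NA))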

#creating a lean dataframe to be used as datatable 
Lean_database <- Cyber_operations %>% 
  filter(Sponsor!="") %>% 
  filter(Date!="") %>% 
  select(Sponsor, Victims, Type, Title, Description, Date) %>% 
  arrange(Sponsor,Date)

#dataframe of attacks with unknown sponsors
UnknownSponsor <- Cyber_operations %>%
  filter(Sponsor=="") %>% 
  select(Victims, Type, Description, Title, Date)

#dataframe of single-sponsored attacks
SingleSponsor <- Cyber_operations %>% 
  select(Sponsor, Type, Date) %>%
  filter(!str_detect(Sponsor, ","),
         Sponsor != "",
         Type != "") %>%
  mutate(Year = year(anydate(Date))) %>% 
  mutate(Sponsor=case_when(
    Sponsor=="Iran (Islamic Republic of)" ~ "Iran",
    Sponsor=="Korea (Democratic People's Republic of)" ~ "North Korea",
    Sponsor=="Korea (Republic of)" ~ "South Korea",
    Sponsor=="Russian Federation" ~ "Russia",
    Sponsor=="United States" ~ "United States of America",
    TRUE ~ Sponsor)) %>% 
  select(-Date) %>% 
  arrange(Sponsor, Type, Year)

#attacks by state per year
Tidy_by_year <- SingleSponsor %>% 
  group_by(Sponsor, Year) %>% 
  summarize(Cases=n())

#attacks with multiple known sponsors
MultipleSponsors <- Cyber_operations %>% 
  select(Sponsor, Type, Date) %>%
  filter(str_detect(Sponsor, ","))

Cyber operations with known sponsors

To begin, I filtered the database down to operations with known sponsors and dates, which make up the vast majority (233 out of 262). I then produced an interactive datatable using the DT library.
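
A minimal sketch of how such a table might be produced with DT. The rows are placeholders with the same columns as Lean_database, and the options (column filters, page length) are assumptions rather than the settings used in the original.

```r
library(DT)

# Placeholder rows standing in for Lean_database (not real CFR records)
lean_demo <- data.frame(
  Sponsor = c("State A", "State B"),
  Victims = c("State C", "State D"),
  Type = c("Espionage", "Sabotage"),
  Title = c("Example operation 1", "Example operation 2"),
  Description = c("Placeholder description.", "Placeholder description."),
  Date = c("2015-01-01", "2017-06-15"),
  stringsAsFactors = FALSE
)

# Interactive table with a search box per column and 10 rows per page
ops_table <- datatable(
  lean_demo,
  rownames = FALSE,
  filter = "top",
  options = list(pageLength = 10)
)
```

Printing ops_table in an R Markdown document or at the console renders the widget.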

Plotting cyber operations

Operations per year

This chart represents cyber operations by state per year.
The dominance of China and Russia is clearly visible.
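
A chart of this kind could be drawn roughly as follows. This is a sketch on placeholder counts shaped like Tidy_by_year (Sponsor, Year, Cases); the stacked-column layout and the theme are assumptions, not necessarily what the original chart used.

```r
library(ggplot2)

# Placeholder counts shaped like Tidy_by_year (not real CFR figures)
by_year_demo <- data.frame(
  Sponsor = rep(c("State A", "State B"), each = 3),
  Year = rep(2016:2018, times = 2),
  Cases = c(4, 6, 5, 1, 3, 7)
)

# One stacked column per year, one colour per sponsoring state
ops_per_year <- ggplot(by_year_demo, aes(x = Year, y = Cases, fill = Sponsor)) +
  geom_col() +
  scale_x_continuous(breaks = 2016:2018) +
  labs(x = NULL, y = "Operations", fill = "Sponsor") +
  theme_minimal()
```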

State involvement over time

This chart clearly shows how each state's involvement evolved over time.
China is the most consistent actor, while Russia has sharply increased its involvement since 2013.
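
One way to make individual states stand out in a line chart is gghighlight, which is among the libraries loaded above. Whether the original chart used it is an assumption; the sketch below uses placeholder counts and an arbitrary threshold.

```r
library(ggplot2)
library(gghighlight)

# Placeholder counts per state and year (not real CFR figures)
involvement_demo <- data.frame(
  Sponsor = rep(c("State A", "State B", "State C"), each = 4),
  Year = rep(2015:2018, times = 3),
  Cases = c(5, 6, 5, 6,
            1, 2, 6, 9,
            1, 0, 1, 1)
)

# Lines for states that reach at least 5 cases in some year keep their colour;
# the remaining lines are greyed out by gghighlight
involvement_plot <- ggplot(involvement_demo,
                           aes(x = Year, y = Cases, colour = Sponsor)) +
  geom_line() +
  gghighlight(max(Cases) >= 5) +
  labs(x = NULL, y = "Operations") +
  theme_minimal()
```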

Global cyber operations, 2005-2018

Another way of illustrating the data is with a map.
In this case I built an interactive map using rnaturalearth geometry and the leaflet library.
As expected, China and Russia, and to a lesser extent Iran, immediately stand out.
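
A minimal sketch of such a choropleth. To stay self-contained it uses three hand-drawn squares instead of the rnaturalearth geometry, with placeholder counts in an attacked column like the one added to world earlier; the palette, labels and legend title are assumptions.

```r
library(sf)
library(leaflet)

# Helper for the demo only: a 10 x 10 degree square starting at longitude x0
square <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 10, 0), c(x0 + 10, 10), c(x0, 10), c(x0, 0)
  )))
}

# Placeholder "countries" with placeholder counts; NA means no recorded operations
world_demo <- st_sf(
  admin = c("State A", "State B", "State C"),
  attacked = c(12L, 3L, NA),
  geometry = st_sfc(square(0), square(15), square(30), crs = 4326)
)

# Continuous colour scale; states without recorded operations are drawn in grey
pal <- colorNumeric(palette = "Reds", domain = world_demo$attacked,
                    na.color = "#d9d9d9")

ops_map <- leaflet(world_demo) %>%
  addPolygons(
    fillColor = ~pal(attacked),
    fillOpacity = 0.8,
    color = "white",
    weight = 1,
    label = ~paste0(admin, ": ",
                    ifelse(is.na(attacked), "none recorded", attacked))
  ) %>%
  addLegend(pal = pal, values = ~attacked, title = "Operations sponsored")
```

Swapping world_demo for the world dataframe, and attacked for was.attacked, would give the corresponding map of targets.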

Targets of cyber operations

The data reveals not only the main sponsors of cyber operations, but also the main targets.
The United States turns out to be the biggest target of state-sponsored cyber operations.
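
Because one operation can hit several victims, counting targets means splitting the Victims column first. A sketch with tidyr, assuming victims are separated by commas in the same way as sponsors; the rows are placeholders.

```r
library(dplyr)
library(tidyr)

# Placeholder rows (not real CFR records)
victims_demo <- tibble(
  Title = c("Op 1", "Op 2", "Op 3"),
  Victims = c("State A", "State A, State B", "State C, State A")
)

# One row per (operation, victim), then a count per victim
victim_counts <- victims_demo %>%
  separate_rows(Victims, sep = ",\\s*") %>%
  filter(Victims != "") %>%
  count(Victims, sort = TRUE, name = "Operations")
```

Note that a state name that itself contains a comma would be split incorrectly by this approach, so names may need to be normalised first.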

Different types of cyber operations

Global distribution

Cyber operations can mean different things: the CFR database lists six types of attack.
Out of the six, espionage is by far the most common goal.
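
The share of each type can be computed with a simple count. The sketch below uses placeholder rows, and the type labels are illustrative rather than the exact strings used in the CFR export.

```r
library(dplyr)

# Placeholder rows; an empty Type mimics operations without a recorded type
types_demo <- tibble(
  Type = c("Espionage", "Espionage", "Espionage", "Sabotage", "DDoS", "")
)

# Count operations per type and convert the counts to shares of the total
type_share <- types_demo %>%
  filter(Type != "") %>%
  count(Type, sort = TRUE, name = "Operations") %>%
  mutate(Share = Operations / sum(Operations))
```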

Different states, different actions

However, it’s important to remember that different states pursue different cyber strategies.
Examining the four leading actors in the database, we can see that while all of them are heavily invested in espionage, some pursue more diverse cyber goals, mainly through sabotage and DDoS.
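
A comparison like this can be built by keeping only the most active sponsors and computing each one's mix of operation types. The sketch below is hypothetical: it uses placeholder rows shaped like SingleSponsor and keeps the top two states instead of four to stay small.

```r
library(dplyr)
library(ggplot2)

# Placeholder rows shaped like SingleSponsor (not real CFR records)
single_demo <- tibble(
  Sponsor = c(rep("State A", 4), rep("State B", 3), rep("State C", 2), "State D"),
  Type = c("Espionage", "Espionage", "Espionage", "Sabotage",
           "Espionage", "DDoS", "Sabotage",
           "Espionage", "Espionage",
           "Espionage")
)

n_top <- 2  # the analysis looks at four leading actors; two suffice for the demo

# Names of the most active sponsors
top_sponsors <- single_demo %>%
  count(Sponsor, sort = TRUE) %>%
  slice_head(n = n_top) %>%
  pull(Sponsor)

# Share of each operation type within each leading sponsor
type_mix <- single_demo %>%
  filter(Sponsor %in% top_sponsors) %>%
  count(Sponsor, Type, name = "Operations") %>%
  group_by(Sponsor) %>%
  mutate(Share = Operations / sum(Operations)) %>%
  ungroup()

# Stacked columns that each sum to 100%
type_mix_plot <- ggplot(type_mix, aes(x = Sponsor, y = Share, fill = Type)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent) +
  labs(x = NULL, y = "Share of operations")
```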

Multiple sponsors

The database allows us to examine collaborations in the cyber realm.
The US is the clear leader in cyber cooperation with allies, namely Israel, the UK, and Taiwan. Interestingly, 2018 is the first year to feature a joint Chinese-Russian cyber operation.
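
To see which states appear together, the comma-separated Sponsor field of the multi-sponsor operations can be normalised so that the order of names does not matter, and then counted. A sketch on placeholder rows shaped like MultipleSponsors:

```r
library(dplyr)
library(tidyr)

# Placeholder rows (not real CFR records); note the differing name order
multi_demo <- tibble(
  Title = c("Op 1", "Op 2", "Op 3"),
  Sponsor = c("State A, State B", "State B, State A", "State A, State C")
)

# Split the sponsors, sort them within each operation, and count each combination
sponsor_sets <- multi_demo %>%
  mutate(op_id = row_number()) %>%
  separate_rows(Sponsor, sep = ",\\s*") %>%
  group_by(op_id) %>%
  summarise(Partners = paste(sort(unique(Sponsor)), collapse = " + "),
            .groups = "drop") %>%
  count(Partners, sort = TRUE, name = "Operations")
```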

Known Unknowns

The database also contains many cyber operations with unknown sponsors. It’s possible that a careful analysis of the victims and the types, coupled with a comparison to the known methods of actors, would shed some statistical light on the instigators.
Perhaps in a future project.