Factor Analysis of Information Risk (FAIR)

Danae Martinez

Overview

FAIR is a methodology for analyzing cybersecurity risk. Here, we will refer to risk as the total dollar amount of expected loss over a given timeframe. In general terms, the FAIR methodology works by breaking risk down into its individual components. These components can then be measured or estimated numerically, allowing risk as a whole to be calculated quantitatively.

The purpose of Factor Analysis of Information Risk is to enable the effective and efficient management of loss event probability and the probable loss associated with such events. Information risk occurs at the intersection of two probabilities: the probability that an action will occur that has the potential to inflict harm on an asset, and the probable loss associated with that harmful event. Threat agents are specialized types of objects with the ability to inflict harm on other objects (e.g. humans, animals, environmental elements such as wind and temperature, and human-made objects).

We will analyze the different factors that influence risk. The FAIR methodology decomposes the computation of risk across several levels, starting from the first level, Loss Event Frequency and Probable Loss Magnitude, and then examining the asset; the threat agent's capability compared with the vulnerability and the strength of the security controls; the probability that the agent comes into contact with, and actually acts against, the asset; the organization's capability to react to the event; and the impact on stakeholders.

The actual calculation of risk often takes the form of a Monte Carlo method, which supplies random inputs to our model. The model then transforms those inputs in accordance with the FAIR calculation rules and produces outputs, which can be analyzed to determine the potential range of risk values.

Monte Carlo experiments within the context of FAIR are used to estimate loss by performing calculations on random inputs. We generate random inputs according to a particular distribution and run them through a complex or arbitrary formula that we could not otherwise analyze. The output can then be used to infer what the expected output population looks like. We will be using the pyfair Python package to build the FAIR models.
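To make this concrete, here is a minimal sketch of building such a model, based on pyfair's documented interface; the node names are pyfair's, and the parameter values are purely illustrative:

```python
from pyfair import FairModel

# Build a model that runs 10,000 Monte Carlo simulations.
model = FairModel(name="Basic Model", n_simulations=10_000)

# Supply two nodes; pyfair generates random inputs from these distributions.
model.input_data('Loss Event Frequency', low=20, mode=100, high=900)
model.input_data('Loss Magnitude', constant=3_000_000)

# Run the FAIR calculations and retrieve the simulated results.
model.calculate_all()
results = model.export_results()
```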

Nodes

Risk in FAIR is broken down into a series of what pyfair calls nodes for calculation. The user supplies two or more of these nodes to generate random data, which in turn allows us to calculate the empirical distribution of risk and of the other nodes.

The general rule in pyfair is that for any node to be calculable, its child nodes must either be calculable themselves or be supplied directly. The nodes can be described as follows:

  1. Contact Frequency (C): A vector with elements representing the number of threat actor contacts that could potentially yield a threat within a given timeframe. All elements must be positive numbers. This must be supplied, not calculated.

The frequency at which a threat agent comes into contact with an object derives from two different types of contact: random and targeted. Contact frequency is determined by a combination of threat, object, and environmental characteristics.

  2. Probability of Action (A): A vector with elements representing the probability that a threat actor will proceed after coming into contact with an organization. This must be supplied, not calculated. Example: the probability that a contact results in action being taken against a resource.

Once contact takes place, the decision of whether or not to act against an object depends on how valuable the target appears in the eyes of the threat agent, how vulnerable it appears to be, and how likely it is that the threat agent would be detected.

  3. Threat Capability (TC): A vector of unitless elements describing the relative level of expertise and resources of a threat actor (relative to a Control Strength). All elements must be numbers between 0 and 1. This must be supplied, not calculated.

Factors that drive threat capability include a combination of the knowledge, experience, and resources of the threat agent. FAIR provides a set of scales that a human analyst can use to define the capability distribution of threat communities. For example:

  • 0.02 - Very low (e.g. absolute novices)
  • 0.16 - Low (e.g. must use simple tools and follow a cookbook)
  • 0.5 - Average (e.g. able to use common tools and techniques)
  • 0.84 - Above average (e.g. able to use advanced tools and techniques)
  • 0.98 - True experts (e.g. able to create new exploits and techniques)
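As an illustration, an analyst could translate this scale into a BetaPert input for pyfair; the anchor values below are one possible reading of the scale, not a prescription:

```python
from pyfair import FairModel

model = FairModel(name="TC Example", n_simulations=10_000)

# A threat community of mostly average actors, ranging from "low" (0.16)
# up to "above average" (0.84) on the FAIR capability scale.
model.input_data('Threat Capability', low=0.16, mode=0.50, high=0.84)
```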

  4. Control Strength (CS): A vector of unitless elements describing the relative strength of a given control (relative to the Threat Capability of a given actor). All elements must be numbers between 0 and 1. This must be supplied, not calculated.

Consider password strength as a simple example of how we can approach this. We can estimate that a password eight characters long, comprising a mixture of upper and lowercase letters, numbers, and special characters, will resist the cracking attempts of some percentage of the general threat agent population.

  5. Threat Event Frequency (TEF): A vector of elements representing the number of times a particular threat occurs, whether or not it results in a loss. All elements must be positive. Supplied directly, or calculated by multiplying the Contact Frequency vector by the Probability of Action vector:

\[TEF = C \times A\]

  6. Vulnerability (V): A vector of elements with each value representing the probability that a potential threat actually results in a loss. Supplied directly, or via the following operation:

\[ V^{*}_i = \left\{ \begin{array}{ll} 1 & \mbox{if } TC_i \geq CS_i,\\ 0 & \mbox{if } TC_i < CS_i, \end{array} \right. \qquad V_i = \bar{V^{*}} \mbox{ for all } i. \]

For each element of the vector, we determine whether Threat Capability is greater than or equal to Control Strength. In other words, a 1 is recorded where the threat overwhelms the control, and a 0 where the control withstands the threat.

We then take this intermediate array of ones and zeros and compute its average. This average represents the proportion of simulations in which the threat overcame the control. The resulting scalar is then assigned to every element of a vector for the sake of computational consistency.

  7. Loss Event Frequency or Exposure (LEF): A vector of elements representing the number of times a particular loss occurs during a given time frame (generally one year). All elements must be positive. Supplied directly, or calculated by multiplying the Threat Event Frequency (TEF) vector by the Vulnerability (V) vector:

\[LEF = TEF \times V\]
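The arithmetic from supplied nodes down to LEF can be sketched directly with numpy; the distributions below are illustrative stand-ins, not pyfair's internals:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n = 10_000  # number of Monte Carlo simulations

# Supplied nodes (illustrative distributions).
C = rng.normal(loc=100, scale=10, size=n)   # Contact Frequency
A = rng.uniform(0.1, 0.3, size=n)           # Probability of Action
TC = rng.uniform(0.0, 1.0, size=n)          # Threat Capability
CS = rng.uniform(0.4, 0.9, size=n)          # Control Strength

# Derived nodes, following the FAIR rules above.
TEF = C * A                                 # Threat Event Frequency
step = (TC >= CS).astype(float)             # 1 where the threat overcomes the control
V = np.full(n, step.mean())                 # average broadcast back to a vector
LEF = TEF * V                               # Loss Event Frequency
```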

  8. Secondary Loss Event Frequency (SLEF): A matrix of probabilities with each row representing a single simulation and each column representing the probability that a particular secondary loss type will occur. All matrix elements must be between 0 and 1. This must be supplied, not calculated.

In other words, the SLEF is the percentage of the time you can expect the secondary loss to materialize.

What percent of the time do you anticipate a cost caused by a secondary stakeholder?

Note that FAIR names this factor a frequency but measures it as a percentage.

  9. Secondary Loss Event Magnitude (SLEM): A matrix of currency amounts with each row representing a single simulation and each column representing the amount of loss for a particular loss type. All matrix elements must be positive. This must be supplied, not calculated.

What losses, in terms of dollars and cents, could you experience from a secondary stakeholder?

  10. Secondary Loss (SL): A vector of currency losses attributable to secondary factors such as:
  • Reputational (Public relations costs)

  • Legal (Legal defense costs)

  • Civil, criminal, or contractual fines and judgements

  • Notification costs

  • Credit monitoring

  • Covering secondary stakeholder monetary loss

  • The effects of regulatory sanctions

  • Lost market share

  • Diminished stock price

  • Increased cost of capital

Supplied directly, or via the following element-wise (Hadamard) product:

\[ \textbf{SL} = \textbf{SLEF} \circ \textbf{SLEM} \]

Let’s write the matrix \(\textbf{SL}\) as

\[ \textbf{SL} = \begin{pmatrix} SL_{1,1} & \cdots & SL_{1,n} \\ \vdots & \ddots & \vdots \\ SL_{m,1} & \cdots & SL_{m,n} \end{pmatrix} \]

so that the vector of secondary losses is

\[ SL = \begin{pmatrix} \sum_{j=1}^n SL_{1,j} \\ \vdots \\ \sum_{j=1}^n SL_{m,j} \end{pmatrix} \]
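In numpy terms, this is an element-wise product followed by a row sum; a small sketch with illustrative random inputs:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
m, n_types = 10_000, 3  # simulations x secondary loss types

SLEF = rng.uniform(0.0, 1.0, size=(m, n_types))  # probability per loss type
SLEM = rng.uniform(1e4, 1e6, size=(m, n_types))  # currency amount per loss type

# Hadamard (element-wise) product, then sum each row into a single loss figure.
SL = (SLEF * SLEM).sum(axis=1)  # vector of length m
```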

  11. Primary Loss (PL): A vector of currency losses directly attributable to the threat. All elements must be positive. This must be supplied, not calculated.

  12. Loss Magnitude (LM): A vector of currency values describing the total loss for a single Loss Event. All elements must be positive. Supplied directly, or calculated as the sum of the Primary Loss (PL) vector and the Secondary Loss (SL) vector:

\[LM = PL + SL\]

If supplied directly, a BetaPert distribution would be appropriate for simulating this quantity.

  13. Risk: A vector of currency values representing the ultimate loss for a given time period. All elements must be positive. It is the result of multiplying the Loss Event Frequency (LEF) vector by the Loss Magnitude (LM) vector:

\[ R = LEF \times LM\]

In order to use the dashboard presented here, you must supply the target nodes and their distribution types. These distributions are:

  1. PERT distribution:

In probability and statistics, the PERT distribution is a family of continuous probability distributions defined by the minimum (low), most likely (mode), and maximum (high) values that a variable can take. It is a transformation of the four-parameter Beta distribution (a sampling sketch follows this list).

  • BetaPert: low, mode, and high (and optionally gamma)

Restrictions:

  • The High parameter must be equal to or greater than the Mode parameter
  • The Mode parameter must be equal to or greater than the Low parameter

  2. Normal distribution:
  • Normal: mean, stdev

  3. Constant distribution:
  • Constant: constant
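To see how the PERT parameters map onto a Beta distribution, here is a small sampling sketch; the parameterization below (with gamma = 4 giving the classic PERT) is one common convention, offered only as an illustration:

```python
import numpy as np

def beta_pert(low, mode, high, gamma=4.0, size=10_000, rng=None):
    """Sample a BetaPert by rescaling a Beta distribution to [low, high]."""
    rng = rng or np.random.default_rng()
    alpha = 1 + gamma * (mode - low) / (high - low)
    beta = 1 + gamma * (high - mode) / (high - low)
    return low + (high - low) * rng.beta(alpha, beta, size=size)

samples = beta_pert(low=0, mode=100, high=500)
```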

Note also that the following nodes must take values between 0 and 1:

  • TC: Threat Capability
  • CS: Control Strength
  • A: Probability of Action
  • V: Vulnerability
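In pyfair, the distribution is selected implicitly by the keyword arguments passed to input_data; a sketch based on the documented keywords, with illustrative values:

```python
from pyfair import FairModel

model = FairModel(name="Distribution Example", n_simulations=10_000)

# BetaPert: low, mode, high (optionally gamma).
model.input_data('Loss Magnitude', low=100_000, mode=500_000, high=2_000_000)

# Normal: mean, stdev.
model.input_data('Threat Event Frequency', mean=50, stdev=10)

# Constant: constant. Vulnerability must stay between 0 and 1.
model.input_data('Vulnerability', constant=0.3)
```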

Simulated cases

  • Step 1: Generate random values to supply

  • Step 2: Perform the mathematical operations following the FAIR rules to calculate Risk

  • Step 3: Analyze the risk outputs
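The three steps can be tied together in one end-to-end sketch; node names follow pyfair's documented interface, and all values are illustrative:

```python
from pyfair import FairModel, FairSimpleReport

# Step 1: supply target nodes and distributions (random values are generated).
model = FairModel(name="Simulated Case", n_simulations=10_000)
model.input_data('Threat Event Frequency', low=10, mode=50, high=200)
model.input_data('Vulnerability', constant=0.3)
model.input_data('Loss Magnitude', low=50_000, mode=200_000, high=1_000_000)

# Step 2: perform the FAIR calculations.
model.calculate_all()

# Step 3: analyze the risk outputs.
results = model.export_results()
print(results['Risk'].describe())  # summary statistics of simulated annual loss

# Optionally, render an HTML report of the model.
report = FairSimpleReport([model])
report.to_html('fair_report.html')
```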