Anomaly Detection in Cybersecurity

2026-06-08

What is Anomaly Detection?

Anomaly detection is the process of identifying data points that deviate significantly from expected behavior.

In cybersecurity, anomalies can indicate:

Unusual login attempts
Sudden spikes in network traffic
Unauthorized data access
Potential intrusion events

The main goal is to catch threats early and stop them before they cause damage.

Why Statistics?

Statistical methods give a systematic, repeatable way to distinguish “normal” from “abnormal” behavior and to flag outliers.

Key reasons statistics can be useful in cybersecurity:

Works without labeled attack data (unsupervised)
Can adapt to changing baselines when the statistics are recomputed over time
Fast, scalable, and interpretable
Widely used in Security Operations Centers
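Adapting to a changing baseline only works if the mean and standard deviation are recomputed as new data arrives, rather than computed once over the whole dataset. A minimal R sketch of that idea, assuming a trailing window of fixed length (the function name `rolling_zscore`, the window size, and the toy series are hypothetical):

```r
# Rolling z-score: each point is scored against the mean and standard
# deviation of the preceding `window` observations only, so the baseline
# moves with the data. Assumes window >= 2 and no missing values.
rolling_zscore <- function(x, window = 50) {
  z <- rep(NA_real_, length(x))          # first `window` points have no baseline
  if (length(x) <= window) return(z)
  for (i in (window + 1):length(x)) {
    ref <- x[(i - window):(i - 1)]       # trailing baseline, excludes point i
    s <- sd(ref)
    if (s > 0) z[i] <- (x[i] - mean(ref)) / s
  }
  z
}

# Hypothetical series: traffic level shifts from ~100 to ~200 at t = 101,
# with a single spike injected at t = 150
set.seed(42)
x <- c(rnorm(100, mean = 100, sd = 5), rnorm(100, mean = 200, sd = 5))
x[150] <- 300

z <- rolling_zscore(x, window = 30)
which(abs(z) > 2)   # indices flagged as potential anomalies
```

The first few points after the level shift at t = 101 are flagged; once the window holds mostly post-shift data, the new level is scored as normal, while the spike at t = 150 still stands out.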

The Z-Score Method

The Z-score measures how far a data point is from the mean in units of standard deviation.

\[z = \frac{x - \mu}{\sigma}\]

Where:

\(x\) = the observed value (e.g., packets per second in a network flow)
\(\mu\) = the mean of all values
\(\sigma\) = the standard deviation
A common threshold is \(|z| > 2\): points beyond it are flagged as potential anomalies.
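The formula and threshold translate directly into a few lines of R. A minimal sketch, assuming the mean and standard deviation are taken over the whole sample (the helper names `zscore` and `flag_anomalies` and the packet counts are hypothetical):

```r
# Z-score of each observation relative to the sample mean and standard deviation
zscore <- function(x) (x - mean(x)) / sd(x)

# Label observations whose absolute z-score exceeds the threshold
flag_anomalies <- function(x, threshold = 2) {
  ifelse(abs(zscore(x)) > threshold, "Anomaly", "Normal")
}

# Hypothetical packet counts: nine typical values and one spike
packets <- c(100, 102, 98, 101, 99, 100, 103, 97, 100, 250)

round(zscore(packets), 2)   # the spike scores about 2.84; all others stay below 0.4 in magnitude
flag_anomalies(packets)     # only the 10th value is labeled "Anomaly"
```

Here the mean is 115, so the spike's z-score is only about 2.84: the outlier itself inflates both the mean and the standard deviation. In a sample of size \(n\), no z-score can exceed \((n-1)/\sqrt{n}\) in magnitude, so in small samples even a very large spike cannot look extreme.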

Visualizing the Data

[Figures: Z-Score Distribution; Interactive Network Traffic Explorer]

R Code: Data Preparation

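library(dplyr)

# Load the flow-level network traffic data
df <- read.csv("Friday-WorkingHours-Afternoon-DDos.pcap_ISCX.csv")

set.seed(123)  # make slice_sample() reproducible

df <- df %>%
  # 'logins' holds the Flow Packets/s column; 'label' is the traffic class
  rename(logins = 'Flow.Packets.s', label = 'Label') %>%
  select(logins, label) %>%
  filter(is.finite(logins)) %>%         # drop NA, NaN and Inf values
  mutate(
    logins  = abs(logins),
    time    = row_number(),              # row index used as a time axis
    # z-scores use the full dataset; sampling happens afterwards
    zscore  = (logins - mean(logins)) / sd(logins),
    anomaly = ifelse(abs(zscore) > 2, "Anomaly", "Normal")
  ) %>%
  slice_sample(n = 500)                  # random subset of 500 rows for plotting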
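Detection itself needs no labels, but the prepared data keeps a `label` column, so the z-score flags can be cross-checked against it. A minimal base-R sketch on hypothetical toy data (the values and the label strings "BENIGN" and "DDoS" are assumptions standing in for the real data frame), mirroring the `mutate` step:

```r
# Hypothetical toy data: `logins` is packets per second, `label` the true class
toy <- data.frame(
  logins = c(10, 12, 11, 9, 13, 10, 11, 12, 10, 11, 95, 120),
  label  = c(rep("BENIGN", 10), "DDoS", "DDoS")
)

# Same z-score and threshold as in the data preparation step
toy$zscore  <- (toy$logins - mean(toy$logins)) / sd(toy$logins)
toy$anomaly <- ifelse(abs(toy$zscore) > 2, "Anomaly", "Normal")

# Confusion table: rows are z-score flags, columns are true labels
tab <- table(flag = toy$anomaly, label = toy$label)
tab
```

In this toy data the 120 packets/s flow is flagged (z ≈ 2.45) but the 95 packets/s flow is not (z ≈ 1.79): the two large values inflate the standard deviation and partly mask each other. Checks like this, when labels exist, show how well a fixed threshold actually separates attack traffic.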

Statistical Basis of Anomaly Detection

Under a normal distribution, we can quantify the probability of extreme values using the empirical rule:

\[P(\mu - 2\sigma \leq X \leq \mu + 2\sigma) \approx 0.9545\]

If traffic values are normally distributed, about 95.45% of observations fall within 2 standard deviations of the mean. The probability of landing outside this range is therefore:

\[P(|Z| > 2) \approx 0.0455\]

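So, under normality, about 4.55% of entirely benign observations would still exceed \(|Z| > 2\) by chance. This is the false-positive rate the threshold implies, not a measure of detection accuracy; real network traffic is often skewed and heavy-tailed, so the observed rate can differ.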
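Both probabilities can be checked directly with R's normal distribution function `pnorm`, and a quick simulation confirms the expected exceedance rate. A minimal sketch (the seed and sample size are arbitrary choices):

```r
# Tail probabilities of the standard normal distribution
p_within  <- pnorm(2) - pnorm(-2)   # P(-2 <= Z <= 2), about 0.9545
p_outside <- 2 * pnorm(-2)          # P(|Z| > 2), about 0.0455

# Simulation check: share of standard normal draws beyond |z| > 2
set.seed(1)
z <- rnorm(1e5)
mean(abs(z) > 2)                    # should be close to p_outside
```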

Conclusion

Z-score analysis is a simple but powerful method for detecting network anomalies without requiring labeled data.
Applied to real network traffic data, Z-scores successfully identified unusual packet behavior consistent with DDoS attack patterns.
Points beyond \(|Z| > 2\) accounted for approximately 4.55% of observations, matching the rate expected under a normal distribution.
Statistical anomaly detection offers a fast, scalable, and interpretable first line of defense in cybersecurity.

Z-score methods are most effective as part of a broader security pipeline, combined with machine learning and human review.