2026-06-08

What is Anomaly Detection?

  • Anomaly detection is the process of identifying data points that deviate significantly from expected behavior.

In cybersecurity, anomalies can represent:

  • Unusual login attempts
  • Sudden spikes in network traffic
  • Unauthorized data access
  • Potential intrusion events

The main purpose is to detect threats early, before they cause damage.

Why Statistics?

  • Statistical methods give a systematic, repeatable way to distinguish “normal” from “abnormal” behavior and to flag outliers.

Key reasons statistics can be useful in cybersecurity:

  • Works without labels marking which data came from an attack
  • Adapts to changing baselines over time
  • Fast, scalable, and interpretable
  • Widely used in Security Operations Centers

The Z-Score Method

The Z-score measures how far a data point is from the mean in units of standard deviation.

\[z = \frac{x - \mu}{\sigma}\]

Where:

  • \(x\) = the observed value (e.g. number of packets)
  • \(\mu\) = the mean of all values
  • \(\sigma\) = the standard deviation
  • A common threshold: \(|z| > 2\) (flags a potential anomaly)
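
To make the formula concrete, here is a minimal sketch in R. The `packets` vector is invented for illustration and is not taken from the dataset; `sd()` returns the sample standard deviation (with an n - 1 denominator), which stands in for \(\sigma\) here.

```r
# Hypothetical packet counts per interval (values invented for illustration)
packets <- c(98, 102, 101, 97, 103, 99, 100, 104, 96, 250)

mu    <- mean(packets)          # estimate of the mean
sigma <- sd(packets)            # sample standard deviation (n - 1 denominator)
z     <- (packets - mu) / sigma # z-score of every observation

# Keep only the observations whose |z| exceeds the threshold of 2
flagged <- packets[abs(z) > 2]

print(round(z, 2))
print(flagged)
```

Only the value 250 is flagged. Note that the outlier itself inflates both the mean and the standard deviation, so a single extreme value in a short series can never reach a very large z-score.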

Visualizing the Data

Z-Score Distribution

Interactive Network Traffic Explorer

R Code: Data Preparation

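library(dplyr)  # provides %>%, rename(), select(), filter(), mutate(), slice_sample()

# Load the flow-level CSV export of the capture
df <- read.csv("Friday-WorkingHours-Afternoon-DDos.pcap_ISCX.csv")

df <- df %>%
  # Note: `logins` holds flow packets per second, not login counts
  rename(logins = 'Flow.Packets.s', label = 'Label') %>%
  select(logins, label) %>%
  # Drop Inf / NaN / NA rates so mean() and sd() are well defined
  filter(is.finite(logins)) %>%
  mutate(
    logins  = abs(logins),
    time    = row_number(),
    # Mean and sd are computed over the full data set, before sampling
    zscore  = (logins - mean(logins)) / sd(logins),
    anomaly = ifelse(abs(zscore) > 2, "Anomaly", "Normal")
  ) %>%
  # Keep a random 500-row subset for plotting
  slice_sample(n = 500)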
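
One way to sanity-check the resulting flags is to cross-tabulate them against the ground-truth labels. The sketch below uses a small invented data frame, `toy`, with the same column names as the prepared data. The values and the label strings "BENIGN" and "DDoS" are assumptions made for illustration, not results from the real file.

```r
# Hypothetical stand-in for the prepared data frame (values invented)
toy <- data.frame(
  logins = c(120, 135, 110, 128, 140, 115, 125, 130, 118, 5000),
  label  = c(rep("BENIGN", 9), "DDoS")
)

# Same z-score and flagging rule as in the data preparation step
toy$zscore  <- (toy$logins - mean(toy$logins)) / sd(toy$logins)
toy$anomaly <- ifelse(abs(toy$zscore) > 2, "Anomaly", "Normal")

# Rows: z-score flag, columns: ground-truth label
counts <- table(flag = toy$anomaly, label = toy$label)
print(counts)
```

On real data, the off-diagonal cells of such a table show how many attack rows the threshold missed and how many benign rows it flagged.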

Statistical Basis of Anomaly Detection

Under a normal distribution, we can quantify the probability of extreme values using the empirical rule:

\[P(\mu - 2\sigma \leq X \leq \mu + 2\sigma) \approx 0.9545\]

This means that, if the traffic values are normally distributed, approximately 95.45% of observations fall within 2 standard deviations of the mean. Points outside this range satisfy:

\[P(|Z| > 2) \approx 0.0455\]

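So only about 4.55% of observations are expected to exceed the threshold by chance alone. This false-alarm rate holds only when the data are approximately normal, so \(|z| > 2\) is best treated as a screening threshold rather than proof of an attack.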
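
These probabilities can be checked with R's standard normal CDF, `pnorm()`. The sketch below also estimates how many rows of a 500-row sample would exceed the threshold by chance, assuming the data are exactly normal; the variable names are illustrative.

```r
# Probability mass within 2 standard deviations of the mean
p_within <- pnorm(2) - pnorm(-2)   # about 0.9545

# Two-sided tail probability P(|Z| > 2)
p_tail <- 2 * pnorm(-2)            # about 0.0455

# Expected number of chance flags in a sample of 500 rows
n <- 500
expected_flags <- n * p_tail

print(c(within = p_within, tail = p_tail, expected_flags = expected_flags))
```

A flag count far above this expectation suggests either genuine anomalies or a distribution that is not close to normal.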

Conclusion

  • Z-score analysis is a simple but powerful method for detecting network anomalies without requiring labeled data.

  • Applied to real network traffic data, Z-scores successfully identified unusual packet behavior consistent with DDoS attack patterns.

  • Points with \(|Z| > 2\) accounted for approximately 4.55% of observations, aligning with statistical expectations.

  • Statistical anomaly detection offers a fast, scalable, and interpretable first line of defense in cybersecurity.

Z-score methods are most effective as part of a broader security pipeline, combined with machine learning and human review.