Introduction

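Benford’s Law states that in many naturally occurring datasets, the leading digit \(d\) (1–9) occurs with probability \[ P(d) = \log_{10}\left(1 + \frac{1}{d}\right) \] so a leading 1 appears about 30% of the time, while a leading 9 appears less than 5% of the time.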

Because many genuine datasets follow this pattern, large deviations from it can help flag possible anomalies or fraud in data such as accounting records, tax data, or election results.

The Formula (in LaTeX)

Mathematically, the probability that a number has the first digit \(d\) is:

\[ P(d) = \log_{10}\left(1 + \frac{1}{d}\right) \]

for \(d = 1, 2, 3, \ldots, 9\).
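As a quick check that these nine values form a valid probability distribution, rewrite \(1 + \frac{1}{d}\) as \(\frac{d+1}{d}\); the product inside the logarithm then telescopes:

\[ \sum_{d=1}^{9} P(d) = \sum_{d=1}^{9} \log_{10}\left(\frac{d+1}{d}\right) = \log_{10}\left(\frac{2}{1} \cdot \frac{3}{2} \cdots \frac{10}{9}\right) = \log_{10}(10) = 1 \]

For example, \(P(1) = \log_{10}(2) \approx 0.301\) and \(P(9) = \log_{10}\left(\frac{10}{9}\right) \approx 0.046\).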

Generating Benford’s Law Data

The code below computes and displays the theoretical probability of each first digit (1–9) under Benford’s Law. The resulting `benford` data frame is reused throughout the presentation to build the plots.

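# Plotting and data-manipulation packages used throughout the presentation
library(ggplot2)
library(plotly)
library(dplyr)
# Theoretical Benford probability for each leading digit 1-9
benford <- data.frame(
  digit = 1:9,
  probability = log10(1 + 1 / (1:9))
)
benford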
##   digit probability
## 1     1  0.30103000
## 2     2  0.17609126
## 3     3  0.12493874
## 4     4  0.09691001
## 5     5  0.07918125
## 6     6  0.06694679
## 7     7  0.05799195
## 8     8  0.05115252
## 9     9  0.04575749

Simulated Dataset Example

The table below shows the counts and proportions of each leading digit (1–9) in 1,000 random draws sampled with the Benford probabilities.

##   digit count  prop
## 1     1   312 0.312
## 2     2   183 0.183
## 3     3   130 0.130
## 4     4    82 0.082
## 5     5    77 0.077
## 6     6    64 0.064
## 7     7    46 0.046
## 8     8    67 0.067
## 9     9    39 0.039
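The code that produced this table is not shown, so the following is only a minimal sketch of one way to generate such a sample. It assumes the draws come from `sample()` weighted by the Benford probabilities; the seed (`123`) and the names `n_draws`, `draws`, and `sim_counts` are hypothetical, so the exact counts will differ from the table above.

```r
# Theoretical Benford probabilities (same definition as the `benford` object)
benford <- data.frame(
  digit = 1:9,
  probability = log10(1 + 1 / (1:9))
)

set.seed(123)      # assumed seed; the presentation's seed is not shown
n_draws <- 1000    # number of simulated leading digits

# Draw leading digits with replacement, weighted by Benford probabilities
draws <- sample(benford$digit, size = n_draws, replace = TRUE,
                prob = benford$probability)

# tabulate() counts occurrences of 1..9, including digits with zero draws
counts <- tabulate(draws, nbins = 9)
sim_counts <- data.frame(
  digit = 1:9,
  count = counts,
  prop  = counts / n_draws
)

sim_counts
```

Using `tabulate()` with `nbins = 9` guarantees one row per digit even if some digit is never drawn, which keeps the table aligned with `benford` for later plots.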

Plotly 3D: Digit vs Probability vs Random Variation

An interactive 3D plot of each digit against its theoretical probability and a simulated random-variation term. It illustrates how real-world data might deviate slightly from the ideal Benford curve.
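The presentation does not say how its "random variation" axis is built, so this is a hedged sketch under one assumption: the variation is small Gaussian noise (`sd = 0.01`, chosen arbitrarily) drawn independently for each digit. The names `benford_3d`, `variation`, and `p3d` are hypothetical.

```r
library(plotly)

# Theoretical Benford probabilities
benford <- data.frame(digit = 1:9, probability = log10(1 + 1 / (1:9)))

# Hypothetical random-variation term: Gaussian noise around zero.
# The sd of 0.01 is an assumption, not taken from the presentation.
set.seed(42)
benford_3d <- benford
benford_3d$variation <- rnorm(nrow(benford), mean = 0, sd = 0.01)

# 3D scatter with connecting lines: x = digit, y = probability, z = variation
p3d <- plot_ly(benford_3d,
               x = ~digit, y = ~probability, z = ~variation,
               type = "scatter3d", mode = "markers+lines") %>%
  layout(scene = list(
    xaxis = list(title = "Digit"),
    yaxis = list(title = "Theoretical probability"),
    zaxis = list(title = "Random variation")
  ))

# Print p3d in RStudio or knit to HTML to view the interactive widget
```

For 3D traces, axis titles are set through `layout(scene = ...)` rather than the top-level `xaxis`/`yaxis` arguments used for 2D plotly charts.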

ggplot 1: Observed vs Expected Distribution

A 2D plot comparing the observed proportions from the simulated data with the expected probabilities from Benford’s Law.
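A minimal ggplot2 sketch of such a comparison, assuming observed proportions are drawn as bars and the Benford expectation as a red line with points. It regenerates a simulated sample so the block is self-contained; the seed and the names `observed` and `p_obs` are hypothetical.

```r
library(ggplot2)

benford <- data.frame(digit = 1:9, probability = log10(1 + 1 / (1:9)))

# Simulated sample (assumed seed; results will not match the table exactly)
set.seed(123)
draws <- sample(1:9, 1000, replace = TRUE, prob = benford$probability)
observed <- data.frame(digit = 1:9, prop = tabulate(draws, nbins = 9) / 1000)

# Bars: observed proportions; red line and points: Benford expectation.
# group = 1 lets geom_line connect points across a discrete x axis.
p_obs <- ggplot(observed, aes(x = factor(digit), y = prop)) +
  geom_col(fill = "steelblue", alpha = 0.7) +
  geom_line(data = benford,
            aes(x = factor(digit), y = probability, group = 1),
            colour = "red") +
  geom_point(data = benford,
             aes(x = factor(digit), y = probability),
             colour = "red") +
  labs(x = "Leading digit", y = "Proportion",
       title = "Observed vs expected (Benford's Law)")

# Print p_obs to draw the plot
```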

ggplot 2: Cumulative Distribution

Shows the cumulative proportions of the observed data alongside the cumulative Benford probabilities. Comparing the two curves shows whether the simulated data follows the same cumulative pattern as Benford’s prediction, which is useful for spotting subtle deviations or anomalies.
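One way to sketch this in ggplot2, assuming step curves for both distributions. As an added illustration (not something the presentation states it computes), the block also reports the largest vertical gap between the two cumulative curves, a Kolmogorov-Smirnov-style summary; no p-value is computed, since the standard KS test assumes a continuous distribution. The seed and the names `cum`, `max_gap`, and `p_cum` are hypothetical.

```r
library(ggplot2)

benford <- data.frame(digit = 1:9, probability = log10(1 + 1 / (1:9)))

# Simulated sample (assumed seed)
set.seed(123)
n_draws <- 1000
draws <- sample(1:9, n_draws, replace = TRUE, prob = benford$probability)

# Cumulative observed proportions and cumulative Benford probabilities
cum <- data.frame(
  digit    = 1:9,
  observed = cumsum(tabulate(draws, nbins = 9)) / n_draws,
  expected = cumsum(benford$probability)
)

# Largest absolute gap between the two cumulative curves
max_gap <- max(abs(cum$observed - cum$expected))

# Long format so both curves share one colour legend
cum_long <- rbind(
  data.frame(digit = cum$digit, cumulative = cum$observed, source = "Observed"),
  data.frame(digit = cum$digit, cumulative = cum$expected, source = "Expected (Benford)")
)

p_cum <- ggplot(cum_long, aes(x = digit, y = cumulative, colour = source)) +
  geom_step() +
  geom_point() +
  scale_x_continuous(breaks = 1:9) +
  labs(x = "Leading digit", y = "Cumulative proportion", colour = NULL,
       title = sprintf("Cumulative distribution (max gap = %.3f)", max_gap))

# Print p_cum to draw the plot
```

Both curves end at 1 at digit 9, so differences show up only in how quickly each curve rises across the intermediate digits.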