Randomness in collected data cannot be removed, so we use standard deviation as a metric to quantify the resulting variability.
Related summary statistics include:
- Mean / Average
- Range
- Interquartile Range
- Variance
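
As a rough illustration, the statistics listed above can all be computed with base R. The data vector `x` is a made-up example, not taken from any real data set.

```r
# Hypothetical sample of eight measurements (illustrative values only)
x <- c(4, 8, 15, 16, 23, 42, 7, 10)

x_mean  <- mean(x)           # mean / average
x_range <- max(x) - min(x)   # range expressed as a single number
x_iqr   <- IQR(x)            # interquartile range (Q3 - Q1)
x_var   <- var(x)            # sample variance (n - 1 denominator)
x_sd    <- sd(x)             # sample standard deviation

print(c(mean = x_mean, range = x_range, IQR = x_iqr,
        variance = x_var, sd = x_sd))
```

Note that `sd(x)` is the square root of `var(x)`, so the two always move together.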
2025-11-09
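Standard deviation measures the typical distance of data values from the mean. More precisely, it is the square root of the average squared distance from the mean, so for a population of \(N\) values: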
\[ \sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2} \]
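Where:
- \(\sigma\) is the population standard deviation
- \(N\) is the number of values in the population
- \(x_i\) is the \(i\)-th value
- \(\mu\) is the population mean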
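
The population formula can be checked by hand in R. This is a minimal sketch that assumes the made-up vector `x` represents an entire population. R's built-in `sd()` divides by \(N - 1\) rather than \(N\), so it has to be rescaled to match \(\sigma\).

```r
# Hypothetical data treated as a complete population (illustrative values only)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

N     <- length(x)                     # number of values
mu    <- mean(x)                       # population mean
sigma <- sqrt(sum((x - mu)^2) / N)     # population standard deviation

# sd() uses an N - 1 denominator, so rescale it to recover sigma
sigma_from_sd <- sd(x) * sqrt((N - 1) / N)

print(sigma)          # 2
print(sigma_from_sd)  # same value, up to floating-point rounding
```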
Standard Deviation provides great value by:
However, standard deviation can struggle since it:
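When only a sample of \(n\) observations is available, the sample standard deviation \(s\) uses the sample mean \(\bar{x}\) and divides by \(n - 1\):
\[ s = \sqrt{ \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 } \]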
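
The sample formula is what R's `sd()` computes. The sketch below writes it out step by step for a small made-up sample and confirms that the result agrees with the built-in function.

```r
# Hypothetical sample of five measurements (illustrative values only)
x <- c(12.1, 9.8, 11.4, 10.3, 10.9)

n     <- length(x)                          # sample size
x_bar <- mean(x)                            # sample mean
s     <- sqrt(sum((x - x_bar)^2) / (n - 1)) # sample standard deviation

# The manual result should match the built-in sd()
print(c(manual = s, builtin = sd(x)))
```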
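Why use \(n - 1\)? Deviations are measured from the sample mean \(\bar{x}\), which is itself computed from the same data, so they are on average smaller than deviations from the true mean \(\mu\). Dividing by \(n - 1\) instead of \(n\) (Bessel's correction) compensates for this and makes the sample variance \(s^2\) an unbiased estimator of the population variance.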
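
One way to explore the \(n - 1\) question is a small simulation. The sketch below draws many small samples from a normal population with a known variance and compares the average of the "divide by \(n\)" and "divide by \(n - 1\)" variance estimates. The seed, sample size, and population variance are arbitrary choices made for illustration.

```r
set.seed(42)        # arbitrary seed for reproducibility

true_var <- 4       # population variance (sigma = 2), chosen for illustration
n        <- 5       # small sample size, where the bias is easiest to see
reps     <- 20000   # number of simulated samples

# For each simulated sample, sum the squared deviations from its own mean
ss <- replicate(reps, {
  x <- rnorm(n, mean = 0, sd = sqrt(true_var))
  sum((x - mean(x))^2)
})

var_div_n   <- mean(ss / n)        # average "divide by n" estimate
var_div_nm1 <- mean(ss / (n - 1))  # average "divide by n - 1" estimate

print(c(divide_by_n = var_div_n,
        divide_by_n_minus_1 = var_div_nm1,
        true_variance = true_var))
```

With these settings, the "divide by \(n\)" average should land near \(\frac{n - 1}{n} \cdot 4 = 3.2\), while the "divide by \(n - 1\)" average should land near the true variance of 4.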
Below is a simulated 3D scatterplot illustrating spread across three features:
The graph below uses 100 variables, each observed 50 times, to showcase non-uniform variability.
Different data sets also show different amounts of variation, as shown below.
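
A setup with 100 variables observed 50 times each could be simulated along the following lines. This is only a sketch under assumed settings: the normal distribution, the range of true standard deviations, and the seed are all illustrative choices, not the configuration behind the original graph.

```r
set.seed(1)       # arbitrary seed for reproducibility

n_vars <- 100     # number of variables
n_obs  <- 50      # observations per variable

# Assumed design: each variable gets its own true standard deviation
true_sd <- seq(0.5, 5, length.out = n_vars)

# 50 x 100 matrix: column j holds the observations of variable j
obs <- sapply(true_sd, function(s) rnorm(n_obs, mean = 0, sd = s))

# Estimated standard deviation of each variable
est_sd <- apply(obs, 2, sd)

plot(true_sd, est_sd,
     xlab = "True standard deviation",
     ylab = "Sample standard deviation",
     main = "Spread differs across 100 variables")
abline(0, 1, lty = 2)   # reference line where the estimate equals the truth
```

Points scatter around the dashed line because 50 observations give only an estimate of each variable's true spread.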
library(plotly)  # provides plot_ly()

set.seed(123)    # make the simulated draws reproducible

## Simulate Data
n <- 1000
df <- data.frame(
  x = rnorm(n),
  y = rnorm(n),
  z = rnorm(n)
)

## Plot Data
plot_ly(df, x = ~x, y = ~y, z = ~z,
        type = "scatter3d",
        mode = "markers",
        marker = list(size = 3, color = ~z))
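
The spread visible in the scatterplot can also be summarized numerically. The sketch below re-creates the same simulated data frame so that it runs on its own, then computes the sample standard deviation of each feature. Because `rnorm()` defaults to a standard deviation of 1, each value should come out close to 1.

```r
set.seed(123)   # same seed as the plotting code, so the data match

n  <- 1000
df <- data.frame(
  x = rnorm(n),
  y = rnorm(n),
  z = rnorm(n)
)

# Sample standard deviation of each of the three features
feature_sd <- sapply(df, sd)

print(round(feature_sd, 3))
```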