Variability
- Also called dispersion, variability refers to “how spread out” data are.
- Range is one measure of variability, but it doesn’t capture how the data are dispersed within that range.
- Variance is a better measure of variability.
- It measures how far the values are dispersed around the mean.
- Standard deviation is an even better measure.
- It is the square root of the variance, so it is expressed in the original units of the variable and can be related directly to the observed values.
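To illustrate why the range alone can mislead, here is a minimal sketch using two made-up vectors (`a` and `b` are hypothetical data, not from any dataset). They share the same minimum and maximum, yet their values are spread differently inside that range.

```r
# Two hypothetical samples with the same minimum (0) and maximum (10)
a <- c(0, 5, 5, 5, 10)   # most values sit at the center
b <- c(0, 0, 5, 10, 10)  # most values sit at the extremes

# The range (max minus min) cannot tell them apart
range_a <- diff(range(a))
range_b <- diff(range(b))

# The standard deviation reflects the difference in spread
sd_a <- sd(a)
sd_b <- sd(b)

cat("range:", range_a, "vs", range_b, "\n")
cat("sd:   ", sd_a, "vs", sd_b, "\n")
```

Both ranges are equal, while `sd(b)` is larger than `sd(a)` because more of the values in `b` lie far from the mean.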
Variance = \( s^2 \)
- Variance is (approximately) the mean of the squared deviations from the mean; the sample version divides by n - 1 rather than n.
- A deviation from the mean is: \( Y_i - \bar{Y} \)
- The simple sum of deviations from the mean is worthless:
- \( \sum\limits_{i=1}^n (Y_i - \bar{Y}) = 0 \)
- That’s why we square the deviations first.
- Add them up, and divide by n - 1 (or, equivalently, multiply by 1/(n - 1)).
- \( s^2 = \frac{1}{n-1} \sum\limits_{i=1}^n (Y_i - \bar{Y})^2 \)
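The steps above can be sketched in R. This is an illustrative walk-through rather than a recommended way to compute variance: it uses the `speed` column of the built-in `cars` data, and the variable names (`y`, `deviations`, `s2`) are chosen here for the example. The last line compares the hand-computed value with R's own `var()`, which also uses the n - 1 denominator.

```r
# Speeds from the built-in cars dataset
y <- cars$speed
n <- length(y)

# Step 1: deviations from the mean
deviations <- y - mean(y)

# The plain sum of deviations is zero (up to floating-point rounding),
# so it carries no information about spread
sum(deviations)

# Step 2: square the deviations so positives and negatives do not cancel
squared_devs <- deviations^2

# Step 3: add them up and divide by n - 1
s2 <- sum(squared_devs) / (n - 1)

# Check against the built-in function
all.equal(s2, var(y))
```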
Standard deviation = \( s \)
- Standard deviation, then, is just the square root of the variance.
- \( s = \sqrt{ \frac{1}{n-1} \sum\limits_{i=1}^n (Y_i - \bar{Y})^2 } \)
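As a small sketch of the same formula in R, the snippet below computes the standard deviation of `cars$dist` by hand and compares it with the built-in `sd()`. The variable names are illustrative. In the `cars` data, `dist` is a stopping distance in feet, so `s` is also in feet, while the variance `s^2` is in squared feet.

```r
# Stopping distances (in feet) from the built-in cars dataset
y <- cars$dist
n <- length(y)

# Square root of the sum of squared deviations divided by n - 1
s <- sqrt(sum((y - mean(y))^2) / (n - 1))

# Check against the built-in functions
all.equal(s, sd(y))
all.equal(s^2, var(y))
```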
summary(cars)
##      speed           dist
##  Min.   : 4.0   Min.   :  2
##  1st Qu.:12.0   1st Qu.: 26
##  Median :15.0   Median : 36
##  Mean   :15.4   Mean   : 43
##  3rd Qu.:19.0   3rd Qu.: 56
##  Max.   :25.0   Max.   :120
You can also embed plots, for example:
plot(cars)