Mean
The sample mean of a numerical variable is the sum of all observations divided by the number of observations:
\[ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]
The mean follows the tail
Median: the number in the middle
The median splits an ordered data set in half. If there is an odd number of observations, the median is the middle value. If there is an even number of observations, the median is the average of the two middle values.
 [1]  0  0  0  0  0  0  1  1  1  1  1  2  2  3  3  3  4  4  5  5  5  6  6
[24]  7  7  7  9  9  9 10 10 10 11 11 12 14 14 16 17 22 25 25 25 26 26 27
[47] 29 42 43 64
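As a quick check on the 50 sorted values above, the following Python sketch computes both the mean and the median with the standard-library `statistics` module. The variable names are illustrative and do not come from the original notes.

```python
from statistics import mean, median

# The 50 observations printed above, already sorted.
values = [
    0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6,
    7, 7, 7, 9, 9, 9, 10, 10, 10, 11, 11, 12, 14, 14, 16, 17, 22, 25, 25,
    25, 26, 26, 27, 29, 42, 43, 64,
]

n = len(values)                 # number of observations
sample_mean = mean(values)      # sum of observations divided by n
sample_median = median(values)  # average of the 25th and 26th values

print(n, sample_mean, sample_median)
```

The mean (11.6) sits well above the median (7) because a few large values in the right tail pull it upward.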
1. Sort the series in ascending order.
2. If the series has an odd number \((n)\) of entries, the median is the value at position \(\frac{n+1}{2}\).
3. Example: find the median of the series \(2,4,5,(6),7,9,9\). Here \(n=7\), so the median is at position \(\frac{7+1}{2}=4\).
4. The median is \(6\).
1. Sort the series in ascending order.
2. If the series has an even number \((n)\) of entries, the median is the average of the two middle numbers, at positions \(\frac{n}{2}\) and \(\frac{n}{2}+1\).
3. Example: find the median of the numbers \(2,2,4,6,7,8\). Here \(n=6\), so the middle positions are \(3\) and \(4\).
4. The median is the average of the third and fourth numbers: \(\frac{4+6}{2}=5\).
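The two procedures above can be sketched as one small Python function. This is a minimal illustration rather than a reference implementation; the name `median_of` is hypothetical, and the function is applied to the two example series from the steps.

```python
def median_of(series):
    """Return the median by sorting and picking the middle position(s)."""
    ordered = sorted(series)  # step 1: ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty series is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the value at 1-based position (n + 1) / 2.
        return ordered[mid]
    # Even n: average of the values at 1-based positions n / 2 and n / 2 + 1.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_median = median_of([2, 4, 5, 6, 7, 9, 9])  # middle (4th) value
even_median = median_of([2, 2, 4, 6, 7, 8])    # average of 3rd and 4th values
print(odd_median, even_median)
```

Python lists are 0-indexed, so the 1-based position \(\frac{n+1}{2}\) corresponds to index `n // 2` when \(n\) is odd.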
The median is resistant to outliers: an extreme value changes the middle of the ordered data very little, if at all.
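To illustrate, the sketch below takes the odd-length example series and replaces its largest value with an extreme one. The value 900 is an arbitrary choice made for this illustration.

```python
from statistics import mean, median

series = [2, 4, 5, 6, 7, 9, 9]
with_outlier = [2, 4, 5, 6, 7, 9, 900]  # largest value replaced by an extreme one

median_before, median_after = median(series), median(with_outlier)
mean_before, mean_after = mean(series), mean(with_outlier)

print(median_before, median_after)  # the median stays at 6
print(mean_before, mean_after)      # the mean jumps from 6 to about 133.3
```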
The weighted mean is like the mean, except that some observations influence it more than others. We assign a weight to each observation to describe its relative importance.
The weighted mean of observations \(x_1, x_2,...,x_n\) using weights \(w_1, w_2,...,w_n\) is given by
\[ \bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} \]
The simple mean is a weighted mean where all the weights are 1.
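A minimal Python sketch of the weighted mean follows. It assumes the usual definition (the sum of weight times observation, divided by the sum of the weights). The grades and credit hours are hypothetical numbers chosen only for illustration.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical course grades weighted by credit hours.
grades = [90, 80, 70]
credit_hours = [3, 2, 1]
course_average = weighted_mean(grades, credit_hours)  # (270 + 160 + 70) / 6

# With all weights equal to 1, the result is the simple mean.
simple_average = weighted_mean(grades, [1, 1, 1])
print(course_average, simple_average)
```

The grade with the largest weight pulls the weighted result above the simple mean of 80.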
The midrange of a data set is a measure of center: the value midway between the maximum and minimum values in the original data set. It is found by adding the maximum data value to the minimum data value and dividing the sum by \(2\):
\[ \text{Midrange} = \frac{\text{maximum data value} + \text{minimum data value}}{2} \]
The range of a set of data is the difference between the maximum and the minimum data values.
\(\text{range} = \text{maximum} - \text{minimum}\)
The range is sensitive to outliers. A single high or low value will affect the range significantly.
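The sketch below computes the range and the midrange for a small data set and then shows how one extreme value changes both. The helper names and the added value 400 are assumptions made for this illustration.

```python
def data_range(data):
    """Difference between the maximum and the minimum."""
    return max(data) - min(data)


def midrange(data):
    """Value midway between the maximum and the minimum."""
    return (max(data) + min(data)) / 2


data = [5, 5, 9, 10, 15, 16, 20, 30, 40]
with_outlier = data + [400]  # one extreme value added for illustration

print(data_range(data), midrange(data))                  # 35 and 22.5
print(data_range(with_outlier), midrange(with_outlier))  # 395 and 202.5
```

Both measures depend only on the two extreme values, which is why a single outlier moves them so far.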
\[ \text{Percentile of value } x = \frac{\text{number of values less than } x}{\text{total number of values}} \times 100 \]
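The percentile formula can be applied directly, as in this sketch. The function name `percentile_of` is hypothetical, and the data set is a small example.

```python
def percentile_of(x, data):
    """Percent of values in data that are strictly less than x."""
    if not data:
        raise ValueError("data must not be empty")
    below = sum(1 for value in data if value < x)
    return below / len(data) * 100


data = [5, 5, 9, 10, 15, 16, 20, 30, 40]
p = percentile_of(15, data)  # 4 of the 9 values are below 15
print(round(p, 1))
```

With this definition the value 15 falls at roughly the 44th percentile of the data.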
Three Quartiles \((Q_1, Q_2, Q_3)\)
Outliers in the context of a box plot
In the context of a box plot, an outlier is defined as an observation that is more than \(1.5 \times IQR\) above \(Q_3\) or more than \(1.5 \times IQR\) below \(Q_1\). Such points are marked with a dot or an asterisk in a box plot.
Data: \([5, 5, 9, 10, 15, 16, 20, 30, 40]\)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   5.00    9.00   15.00   16.67   20.00   40.00
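As a sketch, the quartiles, the IQR, and the box plot outlier rule can be reproduced for this data set with Python's standard library. The choice of `method="inclusive"` is an assumption: it happens to match the quartiles in the summary above for this data, but other quantile conventions can give different values.

```python
from statistics import quantiles

data = [5, 5, 9, 10, 15, 16, 20, 30, 40]

# Quartiles by linear interpolation between data points (inclusive method).
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Box plot fences: 1.5 * IQR below Q1 and above Q3.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q2, q3, iqr)
print(lower_fence, upper_fence, outliers)
```

Here the IQR is 11, the fences are -7.5 and 36.5, and only the value 40 lies beyond them.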
Calculating the Standard Deviation
The standard deviation is the square root of the variance. It is roughly the average distance of the observations from the mean.
\[ \bbox[yellow,5px] { \color{black}{s= \sqrt{\frac{1}{n-1}\sum(x_i-\bar x)^2}} } \]
Calculate the SD of \([0, 1]\)
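One way to work this exercise is to follow the sample formula step by step, as in the sketch below, and compare the result with `statistics.stdev` from the standard library. It assumes the two values are treated as a sample, so the divisor is \(n-1\).

```python
from math import sqrt
from statistics import stdev

data = [0, 1]
n = len(data)
x_bar = sum(data) / n                                  # mean: 0.5
squared_deviations = [(x - x_bar) ** 2 for x in data]  # [0.25, 0.25]
s = sqrt(sum(squared_deviations) / (n - 1))            # sqrt(0.5 / 1)

library_s = stdev(data)  # sample standard deviation from the standard library
print(round(s, 4), round(library_s, 4))
```

Both approaches give a sample standard deviation of about 0.7071.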
Notice the spread of the distributions.
\[ \bbox[yellow,5px] { \color{black}{\sigma= \sqrt{\frac{1}{N}\sum(x_i-\mu)^2}} } \]
Variance of a Sample and Population
The variance of a set of values is a measure of variation equal to the square of the standard deviation: \(s^2\) for a sample and \(\sigma^2\) for a population.
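The difference between the sample and population versions comes down to the divisor (\(n-1\) versus \(N\)). The sketch below shows this with the standard-library `statistics` module on a small example data set.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [5, 5, 9, 10, 15, 16, 20, 30, 40]

sample_var = variance(data)       # sum of squared deviations divided by n - 1
population_var = pvariance(data)  # sum of squared deviations divided by N

# In both cases the variance is the square of the standard deviation.
sample_sd_squared = stdev(data) ** 2
population_sd_squared = pstdev(data) ** 2

print(sample_var, round(population_var, 2))
```

For this data the sum of squared deviations is 1112, so the sample variance is 1112 / 8 = 139 and the population variance is 1112 / 9, about 123.56.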
Coefficient of Variation
The coefficient of variation (CV) for a set of nonnegative sample or population data, expressed as a percent, describes the standard deviation relative to the mean, and is given by the following:
\[ \text{Sample: } CV = \frac{s}{\bar{x}} \times 100\% \\ \text{Population: } CV = \frac{\sigma}{\mu} \times 100\% \]
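A minimal sketch of the sample CV in Python follows. The function name is hypothetical, and the data set is a small example of nonnegative values.

```python
from statistics import mean, stdev


def coefficient_of_variation(sample):
    """Sample CV in percent: standard deviation divided by the mean, times 100."""
    x_bar = mean(sample)
    if x_bar == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(sample) / x_bar * 100


data = [5, 5, 9, 10, 15, 16, 20, 30, 40]
cv = coefficient_of_variation(data)
print(round(cv, 1))
```

For this data the standard deviation is about 71% of the mean, which indicates a large spread relative to the center.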
Probabilities of falling within 1, 2, and 3 standard deviations of the mean in a normal distribution: about 68%, 95%, and 99.7%, respectively.
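These probabilities can be checked numerically. The sketch below uses the identity \(P(|Z| < k) = \operatorname{erf}(k/\sqrt{2})\) for a standard normal \(Z\); the helper name `prob_within` is hypothetical.

```python
from math import erf, sqrt


def prob_within(k):
    """P(|Z| < k) for a standard normal Z, via the error function."""
    return erf(k / sqrt(2))


probabilities = {k: prob_within(k) for k in (1, 2, 3)}
for k, p in probabilities.items():
    print(k, round(p, 4))
```

The results are approximately 0.6827, 0.9545, and 0.9973. Because any normal variable can be standardized, the same probabilities hold for every mean and standard deviation.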
Consider a normally distributed random variable \(x\) with mean \(\mu\) and standard deviation \(\sigma\): \(x \sim N(\mu, \sigma)\)
Two-step linear transformation of \(x\)
The Z-score of an observation is defined as the number of standard deviations it falls above or below the mean: \(Z = \frac{x-\mu}{\sigma}\). If the observation is one standard deviation above the mean, its Z-score is 1. If it is 1.5 standard deviations below the mean, its Z-score is -1.5.
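As a small illustration, the Z-score is the observation minus the mean, divided by the standard deviation. The distribution used below (mean 100, standard deviation 10) is a hypothetical example, not data from these notes.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Hypothetical distribution with mean 100 and standard deviation 10.
z_above = z_score(110, mu=100, sigma=10)  # one SD above the mean
z_below = z_score(85, mu=100, sigma=10)   # 1.5 SDs below the mean
print(z_above, z_below)
```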
The normal distribution model describes a symmetric, unimodal, bell-shaped curve. It is specified by two parameters: the mean \((\mu)\) and the standard deviation \((\sigma)\).
\[ \bbox[yellow,5px]
{
\color{black}{\text{Density at } z = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}z^2\right), \quad -\infty<z<+\infty}
}
\]
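The standard normal density can be evaluated directly from this formula. The sketch below is a plain transcription into Python, compared against `statistics.NormalDist` from the standard library; the function name is hypothetical.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def standard_normal_density(z):
    """Density of the standard normal distribution at z."""
    return exp(-0.5 * z * z) / sqrt(2 * pi)


peak = standard_normal_density(0)           # highest point of the bell curve
left = standard_normal_density(-1.5)
right = standard_normal_density(1.5)        # equal to left, by symmetry
library_peak = NormalDist(0, 1).pdf(0)      # cross-check against the library

print(round(peak, 4), round(left, 4), round(right, 4))
```

The density peaks at \(z = 0\) with a value of about 0.3989 and is symmetric around zero.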
Comparing distributions of median household income for counties by population gain status
Source: OpenIntroOrg