Independent events
Example
- What is the probability of getting two consecutive heads?
- \( A = \{\mbox{Head on flip 1}\} \) ~ \( P(A) = .5 \)
- \( B = \{\mbox{Head on flip 2}\} \) ~ \( P(B) = .5 \)
- \( A \cap B = \{\mbox{Head on flips 1 and 2}\} \)
- \( P(A \cap B) = P(A)P(B) = .5 \times .5 = .25 \)
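A minimal R sketch (simulation size and set-up chosen here for illustration) that simulates pairs of independent fair flips and checks that the proportion of two-head outcomes is close to \( .25 \):
nosim <- 10000                               # number of simulated pairs of flips
flip1 <- rbinom(nosim, size = 1, prob = .5)  # 1 = head on flip 1
flip2 <- rbinom(nosim, size = 1, prob = .5)  # 1 = head on flip 2
mean(flip1 == 1 & flip2 == 1)                # should be close to .25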
Example
- Volume 309 of Science reports on a physician who was on trial for expert testimony in a criminal trial
- Based on an estimated prevalence of sudden infant death syndrome of \( 1 \) out of \( 8,543 \), Dr Meadow testified that the probability of a mother having two children with SIDS was \( \left(\frac{1}{8,543}\right)^2 \)
- The mother on trial was convicted of murder
Example: continued
- For the purposes of this class, the principal mistake was to assume that the events of SIDS within a family are independent
- That is, \( P(A_1 \cap A_2) \) is not necessarily equal to \( P(A_1)P(A_2) \)
- Biological processes that have a believed genetic or familial environmental component, of course, tend to be dependent within families
- (There are many other statistical points of discussion for this case.)
Useful fact
We will use the following fact extensively in this class:
If a collection of random variables \( X_1, X_2, \ldots, X_n \) are independent, then their joint distribution is the product of their individual densities or mass functions
That is, if \( f_i \) is the density for random variable \( X_i \) we have that
\[
f(x_1,\ldots, x_n) = \prod_{i=1}^n f_i(x_i)
\]
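As a small sketch of this fact (using two independent standard normals, an assumption made only for this illustration), the joint density at a point is the product of the marginal densities evaluated there:
x1 <- 1; x2 <- -0.5     # an arbitrary point (x1, x2)
dnorm(x1) * dnorm(x2)   # joint density of two independent standard normals at (x1, x2)
prod(dnorm(c(x1, x2)))  # same value, written as a product over the marginals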
IID random variables
- Random variables are said to be iid if they are independent and identically distributed
- iid random variables are the default model for random samples
- Many of the important theories of statistics are founded on assuming that variables are iid
Example
- Suppose that we flip a biased coin with success probability \( p \) a total of \( n \) times; what is the joint density of the collection of outcomes?
- These random variables are iid with densities \( p^{x_i} (1 - p)^{1-x_i} \)
- Therefore
\[
f(x_1,\ldots,x_n) = \prod_{i=1}^n p^{x_i} (1 - p)^{1-x_i} = p^{\sum x_i} (1 - p)^{n - \sum x_i}
\]
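A brief sketch (the outcomes and the value of \( p \) below are hypothetical) evaluating this joint density both as the product of the individual mass functions and in the collapsed form:
p <- .3                          # assumed success probability
x <- c(1, 0, 1, 1, 0)            # hypothetical outcomes of n = 5 flips
n <- length(x)
prod(p^x * (1 - p)^(1 - x))      # product of the individual mass functions
p^sum(x) * (1 - p)^(n - sum(x))  # collapsed form; same value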
Correlation
- The covariance between two random variables \( X \) and \( Y \) is defined as
\[
Cov(X, Y) = E[(X - \mu_x)(Y - \mu_y)] = E[X Y] - E[X]E[Y]
\]
- The following are useful facts about covariance
- \( Cov(X, Y) = Cov(Y, X) \)
- \( Cov(X, Y) \) can be negative or positive
- \( |Cov(X, Y)| \leq \sqrt{Var(X) Var(Y)} \)
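A simulation sketch (the dependence between x and y is constructed here purely for illustration) of the identity \( Cov(X, Y) = E[XY] - E[X]E[Y] \), the symmetry, and the bound:
set.seed(1)
x <- rnorm(10000)
y <- x + rnorm(10000)                    # y depends on x by construction
mean(x * y) - mean(x) * mean(y)          # empirical E[XY] - E[X]E[Y]
cov(x, y); cov(y, x)                     # symmetric; agrees with the line above up to the n/(n-1) factor
abs(cov(x, y)) <= sqrt(var(x) * var(y))  # the covariance bound holds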
Correlation
- The correlation between \( X \) and \( Y \) is
\[
Cor(X, Y) = Cov(X, Y) / \sqrt{Var(X) Var(Y)}
\]
- \( -1 \leq Cor(X, Y) \leq 1 \)
- \( Cor(X, Y) = \pm 1 \) if and only if \( X = a + bY \) for some constants \( a \) and \( b \neq 0 \)
- \( Cor(X, Y) \) is unitless
- \( X \) and \( Y \) are uncorrelated if \( Cor(X, Y) = 0 \)
- The closer \( Cor(X,Y) \) is to \( 1 \), the more positively correlated \( X \) and \( Y \) are
- The closer \( Cor(X,Y) \) is to \( -1 \), the more negatively correlated they are
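Another simulation sketch (again with data constructed only for illustration) showing that correlation lies in \( [-1, 1] \) and hits \( \pm 1 \) exactly for linear relationships:
set.seed(2)
x <- rnorm(1000)
y <- x + rnorm(1000)
cor(x, y)          # strictly between -1 and 1 here
cor(x, 2 + 3 * x)  # exactly 1 for a positive linear relationship
cor(x, 2 - 3 * x)  # exactly -1 for a negative linear relationship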
Some useful results
Let \( \{X_i\}_{i=1}^n \) be a collection of random variables
- When the \( \{X_i\} \) are uncorrelated \[ Var\left(\sum_{i=1}^n a_i X_i + b\right) = \sum_{i=1}^n a_i^2 Var(X_i) \]
A commonly used special case of these properties is that if a collection of random variables \( \{X_i\} \) are uncorrelated, then the variance of the sum is the sum of the variances
\[
Var\left(\sum_{i=1}^n X_i \right) = \sum_{i=1}^n Var(X_i)
\]
Therefore, it is sums of variances that tend to be useful, not sums of standard deviations; that is, the standard deviation of the sum of a bunch of independent random variables is the square root of the sum of the variances, not the sum of the standard deviations
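A simulation sketch (independent normals with variances chosen here for illustration) checking that the variance of the sum is close to the sum of the variances, while the standard deviations do not add:
set.seed(3)
nosim <- 10000
x1 <- rnorm(nosim, sd = 1)  # Var = 1
x2 <- rnorm(nosim, sd = 2)  # Var = 4
x3 <- rnorm(nosim, sd = 3)  # Var = 9
var(x1 + x2 + x3)           # close to 1 + 4 + 9 = 14
sd(x1 + x2 + x3)            # close to sqrt(14), not 1 + 2 + 3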
The sample mean
Suppose \( X_i \) are iid with variance \( \sigma^2 \)
\[
\begin{eqnarray*}
Var(\bar X) & = & Var \left( \frac{1}{n}\sum_{i=1}^n X_i \right)\\ \\
& = & \frac{1}{n^2} Var\left(\sum_{i=1}^n X_i \right)\\ \\
& = & \frac{1}{n^2} \sum_{i=1}^n Var(X_i) \\ \\
& = & \frac{1}{n^2} \times n\sigma^2 \\ \\
& = & \frac{\sigma^2}{n}
\end{eqnarray*}
\]
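A simulation sketch (standard normals and a sample size chosen for illustration) comparing the variance of many simulated sample means with \( \sigma^2 / n \):
set.seed(4)
n <- 10; nosim <- 10000
xbars <- apply(matrix(rnorm(nosim * n), nosim), 1, mean)  # nosim sample means, each of size n
var(xbars)                                                # close to sigma^2 / n = 1 / 10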
Some comments
- When the \( X_i \) are independent with a common variance, \( Var(\bar X) = \frac{\sigma^2}{n} \)
- \( \sigma/\sqrt{n} \) is called the standard error of the sample mean
- The standard error of the sample mean is the standard deviation of the distribution of the sample mean
- \( \sigma \) is the standard deviation of the distribution of a single observation
- An easy way to remember this: the sample mean has to be less variable than a single observation, so its standard deviation is that of a single observation divided by \( \sqrt{n} \)
The sample variance
- The sample variance is defined as
\[
S^2 = \frac{\sum_{i=1}^n (X_i - \bar X)^2}{n-1}
\]
- The sample variance is an estimator of \( \sigma^2 \)
- The numerator has a version that's quicker for calculation
\[
\sum_{i=1}^n (X_i - \bar X)^2 = \sum_{i=1}^n X_i^2 - n \bar X^2
\]
- The sample variance is (nearly) the mean of the squared deviations from the mean
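A quick sketch (hypothetical data) verifying the shortcut form of the numerator and comparing the result with R's built-in var:
x <- c(2.3, 1.9, 3.1, 2.8, 2.2)         # hypothetical data
n <- length(x)
sum((x - mean(x))^2)                    # numerator, definitional form
sum(x^2) - n * mean(x)^2                # shortcut form; same value
sum((x - mean(x))^2) / (n - 1); var(x)  # sample variance two ways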
The sample variance is unbiased
\[
\begin{eqnarray*}
E\left[\sum_{i=1}^n (X_i - \bar X)^2\right] & = & \sum_{i=1}^n E\left[X_i^2\right] - n E\left[\bar X^2\right] \\ \\
& = & \sum_{i=1}^n \left\{Var(X_i) + \mu^2\right\} - n \left\{Var(\bar X) + \mu^2\right\} \\ \\
& = & \sum_{i=1}^n \left\{\sigma^2 + \mu^2\right\} - n \left\{\sigma^2 / n + \mu^2\right\} \\ \\
& = & n \sigma^2 + n \mu ^ 2 - \sigma^2 - n \mu^2 \\ \\
& = & (n - 1) \sigma^2
\end{eqnarray*}
\]
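A simulation sketch (standard normals, so \( \sigma^2 = 1 \)) illustrating the unbiasedness: the average of many sample variances is close to \( \sigma^2 \):
set.seed(5)
n <- 10; nosim <- 10000
svars <- apply(matrix(rnorm(nosim * n), nosim), 1, var)  # nosim sample variances, each from n observations
mean(svars)                                              # close to sigma^2 = 1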
Hoping to avoid some confusion
- Suppose \( X_i \) are iid with mean \( \mu \) and variance \( \sigma^2 \)
- \( S^2 \) estimates \( \sigma^2 \)
- The calculation of \( S^2 \) involves dividing by \( n-1 \)
- \( S / \sqrt{n} \) estimates \( \sigma / \sqrt{n} \), the standard error of the mean
- \( S / \sqrt{n} \) is called the sample standard error (of the mean)
Example
library(UsingR)  # the father.son data set is in the UsingR package
data(father.son)
x <- father.son$sheight
n <- length(x)
# sample variance (by hand and via var), estimated variance of the mean, sd, and standard error of the mean
round(c(sum((x - mean(x))^2)/(n - 1), var(x), var(x)/n, sd(x), sd(x)/sqrt(n)), 2)