Definition
Given a random sample \(X_1, X_2, ..., X_n\), the order statistics \(Y_1 \le Y_2 \le ... \le Y_n\) are the sample values arranged in increasing order.
Theorem
Let \(Y_1 < Y_2 < ... < Y_n\) be the order statistics of \(n\) independent observations from a continuous cumulative distribution function \(F(x)\) and pdf \[f(x) = F'(x)\] Then the density function of the \(r\)th order statistic is given by: \[g_r(y) = \frac{n!}{(r-1)!(n-r)!}\left[F(y)\right]^{r-1}\left[ 1 - F(y) \right]^{n-r}f(y)\]
Example
Let \(Y_1 < ... < Y_6\) be the order statistics associated with a random sample of \(n = 6\) independent observations from \[f(x) = \frac{1}{2}x, \qquad 0 < x < 2\] What is the pdf of the 1st, 2nd, and 4th order statistics?
Solution
The first thing we need to do in order to use the theorem above is find the cdf. \[F(x) = \int_0^{x}\frac{1}{2}t\,dt = \frac{1}{4}x^2, \qquad 0 \le x \le 2\]
When \(r = 1\)
\[g_1(y) = \frac{6!}{5!}\left[ y^2/4 \right]^0\left[1 - y^2/4\right]^5 \frac{y}{2}\] \[ = 3y\left[ 1 - \frac{y^2}{4} \right]^5\]
When \(r = 2\)
\[g_2(y) = \frac{6!}{4!}\left[ y^2/4 \right]^1\left[1 - y^2/4\right]^4 \frac{y}{2}\] \[ = \frac{15}{4}y^3\left[1 - \frac{y^2}{4}\right]^4\]
When \(r = 4\)
\[g_4(y) = \frac{6!}{3!2!}\left[ y^2/4 \right]^3\left[1 - y^2/4\right]^2 \frac{y}{2} = \frac{15}{32}y^7\left[ 1 - \frac{y^2}{4} \right]^2\]
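As a quick sanity check (a sketch not in the original example), each of these densities should integrate to 1 over the support \(0 < y < 2\):
g1 <- function(y) 3 * y * (1 - y^2/4)^5
g2 <- function(y) (15/4) * y^3 * (1 - y^2/4)^4
g4 <- function(y) (15/32) * y^7 * (1 - y^2/4)^2
# each integral should return (approximately) 1
integrate(g1, lower = 0, upper = 2)$value
integrate(g2, lower = 0, upper = 2)$value
integrate(g4, lower = 0, upper = 2)$value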
The Basic Idea
The basic idea here is that we want to estimate the parameter of the distribution from which the data came. An intuitive approach is to find the parameter \(\theta\) that maximizes the likelihood of obtaining the observed data. Given a random sample \(X_1, X_2, ..., X_n\), each with pdf \(f(x; \theta)\), the joint density is the product of the marginals, so we can set up the following likelihood function, viewed as a function of \(\theta\) that we wish to maximize,
\[L(\theta) = P(X_1 = x_1, X_2 = x_2, ..., X_n = x_n) = \prod_i f(x_i;\theta)\]
We then carry out the usual maximization process to find the MLE of \(\theta\).
Example
Let \(X_1, X_2, ..., X_n\) be a random sample from a normal distribution with unknown mean \(\mu\) and variance \(\sigma^2\). Find the MLE of \(\mu\) and \(\sigma^2\).
Solution
Recall the pdf of the normal
\[f(x; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right]\]
In our context, we will let \(\mu = \theta_1\) and \(\sigma^2 = \theta_2\), and so have that
\[f(x_i; \theta_1, \theta_2) = \frac{1}{\sqrt{\theta_2}\sqrt{2\pi}}\exp\left[ -\frac{(x_i - \theta_1)^2}{2\theta_2} \right]\]
and so we can write the likelihood function as:
\[L(\theta) = f(x_1)\cdot f(x_2)\cdots f(x_n) = \prod_i \frac{1}{\sqrt{2\pi\theta_2}}\exp\left[ -\frac{(x_i - \theta_1)^2}{2\theta_2} \right]\] \[ = \left(\frac{1}{\sqrt{2\pi\theta_2}}\right)^n \prod_i \exp\left[ -\frac{(x_i - \theta_1)^2}{2\theta_2} \right]\] \[ = \frac{1}{(\theta_2)^{n/2}(2\pi)^{n/2}}\exp\left[ -\frac{1}{2\theta_2}\sum_i(x_i - \theta_1)^2 \right]\]
Now to simplify the maximization mathematically we can take the natural log of this, called the log-likelihood,
\[l(\theta) = -\frac{n}{2}\ln(\theta_2) - \frac{n}{2}\ln(2\pi) - \frac{\sum_i (x_i - \theta_1)^2}{2\theta_2}\]
At this point we can finish the maximization by taking the partial derivatives, and solving for each of the parameters.
\[\frac{\partial l}{\partial \theta_1} =\frac{\partial}{\partial \theta_1}\left[ -\frac{n}{2}\ln(\theta_2) - \frac{n}{2}\ln(2\pi) - \frac{\sum x_i^2}{2\theta_2} + \frac{2\theta_1\sum x_i}{2\theta_2} - \frac{n\theta_1^2}{2\theta_2}\right]\] \[ = 0 - 0 - 0 + \frac{\sum x_i}{\theta_2} - \frac{2n\theta_1}{2\theta_2} = 0\] \[\Rightarrow \hat{\theta}_1 = \sum x_i / n \;\;\therefore \;\; \hat{\mu} = \bar{X}\]
We now substitute \(\hat{\theta}_1 = \bar{x}\) and solve for the MLE of \(\theta_2\)
\[\frac{\partial l}{\partial \theta_2} = -\frac{n}{2}\frac{1}{\theta_2} + \frac{\sum_i (x_i - \bar{x})^2}{2\theta_2^2} = 0\] \[ \Rightarrow \hat{\theta}_2 = \hat{\sigma}^2 = \frac{\sum (x_i - \bar{x})^2}{n}\]
Note that we should check the second derivatives to verify that we have indeed attained a maximum.
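As a quick numerical check (a sketch with simulated data, not part of the derivation), we can maximize the log-likelihood with optim() and compare against the closed-form MLEs:
set.seed(42)
x <- rnorm(200, mean = 5, sd = 2)
# negative log-likelihood in terms of theta_1 = mu and theta_2 = sigma^2
negloglik <- function(par) -sum(dnorm(x, mean = par[1], sd = sqrt(par[2]), log = TRUE))
fit <- optim(par = c(0, 1), fn = negloglik, method = "L-BFGS-B", lower = c(-Inf, 1e-8))
fit$par                            # numerical MLEs of mu and sigma^2
c(mean(x), mean((x - mean(x))^2))  # closed-form: x-bar and (1/n) * sum of squared deviations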
Definitions
\(E(X^k)\) is the \(k\)th theoretical moment about the origin.
\(E[(X - \mu)^k]\) is the \(k\)th theoretical moment about the mean.
\(M_k = \frac{1}{n}\sum X_i^k\) is the \(k\)th sample moment about the origin.
\(M_k^* = \frac{1}{n}\sum (X_i - \bar{X})^k\) is the \(k\)th sample moment about the mean.
The basic idea is to set the sample and theoretical moments equal to each other; the resulting system of equations then allows us to solve for all of the parameters we need.
Example
Let’s estimate \(\mu\) and \(\sigma^2\) as we did with the MLE, but this time using the method of moments (MME).
First let us derive the theoretical moments. We are interested in estimating two parameters, so we need two equations.
\[E(X) = \mu\]
and using: \(Var(X) = E(X^2) - [E(X)]^2 \Rightarrow E(X^2) = Var(X) + [E(X)]^2\)
\[E(X^2) = \sigma^2 + \mu^2\]
And now we compute the sample moments \(M_1\) and \(M_2\),
\[M_1 = \frac{1}{n}\sum_i X_i = \bar{X}\]
Setting it equal to the first theoretical moment we just get \[\hat{\mu} = \bar{X}\]
and now
\[M_2 = \frac{1}{n}\sum X_i^2\]
Setting it equal to the second theoretical moment we get
\[\frac{1}{n}\sum X_i^2 = \mu^2 + \sigma^2\] \[\frac{1}{n}\sum X_i^2 = \bar{X}^2 + \sigma^2\] \[\frac{1}{n}\sum X_i^2 - \bar{X}^2 = \sigma^2\] \[\therefore \;\; \hat{\sigma}^2 = \frac{1}{n}\sum(X_i - \bar{X})^2\]
Both these estimators coincide with their MLE counterparts.
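A small illustration with simulated data (a sketch, not part of the notes), computing the method-of-moments estimates directly from the first two sample moments:
set.seed(7)
x <- rnorm(100, mean = 10, sd = 3)
M1 <- mean(x)             # first sample moment about the origin
M2 <- mean(x^2)           # second sample moment about the origin
mu_hat <- M1              # from E(X) = mu
sigma2_hat <- M2 - M1^2   # from E(X^2) = sigma^2 + mu^2
c(mu_hat, sigma2_hat)
c(mean(x), mean((x - mean(x))^2))  # matches the MLE counterparts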
Definition
Let \(X_1, X_2, ..., X_n\) be a random sample from a probability distribution with unknown parameter \(\theta\). Then the statistic \[Y = u(X_1, X_2, ..., X_n)\] is said to be sufficient for \(\theta\) if the conditional distribution of \(X_1, ..., X_n\) given the statistic \(Y\) does not depend on the parameter \(\theta\).
Theorem
Let \(X_1, X_2, ..., X_n\) denote random variables with joint density function \(f(x_1, x_2, ..., x_n; \theta)\), which depends on the parameter \(\theta\). Then the statistic \(Y = u(X_1, ..., X_n)\) is sufficient for \(\theta\) if and only if the pdf can be factorized into two components, that is: \[f(x_1, x_2, ..., x_n; \theta) = \phi[u(x_1, x_2, ..., x_n); \theta]\, h(x_1, x_2, ..., x_n)\]
where
\(\phi\) is a function that depends on the data only through \(u\), and
the function \(h(x_1, x_2, ..., x_n)\) does not depend on the parameter \(\theta\).
Example
Let \(X_1, X_2, ..., X_n\) denote a random sample from a Poisson distribution with \(\lambda > 0\). Find a sufficient statistic for \(\lambda\).
Solution
By the factorization theorem, the goal is to factorize the joint distribution so that we can express it as a product of \(\phi\) and \(h\); moreover, the function \(u\) through which \(\phi\) depends on the data is the sufficient statistic.
First we figure out the joint distribution. We know this is a random sample and so we have independence between our observations. We can write the joint as follows:
\[f(x_1, x_2, ..., x_n; \lambda) = \prod_{i=1}^n \frac{e^{-\lambda}\lambda^{x_i}}{x_i!} = \frac{1}{\prod_i x_i !}\left[ e^{-n\lambda}\lambda^{\sum_i x_i} \right]\]
And so by the factorization theorem, \(Y = \sum X_i\) is a sufficient statistic for \(\lambda\); moreover, any one-to-one transformation of it is also sufficient, so \(\bar{X}\) is sufficient as well.
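A small simulation sketch (not from the notes) illustrating the definition: conditioning on the sufficient statistic \(Y = \sum X_i\), the distribution of \(X_1\) no longer depends on \(\lambda\) (with \(n = 5\) and \(Y = 10\) it should be Binomial(10, 1/5) for any \(\lambda\)):
set.seed(1)
cond_x1 <- function(lambda, n = 5, total = 10, reps = 2e5) {
  x <- matrix(rpois(n * reps, lambda), ncol = n)
  keep <- rowSums(x) == total     # condition on the sufficient statistic
  table(x[keep, 1]) / sum(keep)   # empirical distribution of X_1 given Y = total
}
round(cond_x1(lambda = 1), 3)
round(cond_x1(lambda = 3), 3)     # approximately the same as for lambda = 1
round(dbinom(0:10, size = 10, prob = 1/5), 3)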
Z-Confidence Interval
Let \(X_1, X_2, ..., X_n\) be a random sample from a normal distribution with mean \(\mu\) and variance \(\sigma^2\), so that the sampling distribution of \(\bar{X}\) is given by, \[\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)\]
and standardizing
\[Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N\left(0, 1\right)\]
Let’s also assume that the population variance \(\sigma^2\) is known.
Then a \((1 - \alpha)100\%\) confidence interval for the mean \(\mu\) is given by: \[\bar{X} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)\]
This is called a Z-interval for the mean. Note that we require that:
The sample upon which we are building the interval comes from a normal distribution
The population variance \(\sigma^2\) is known.
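Assuming both conditions hold, a minimal by-hand sketch with simulated data (the data and \(\sigma\) here are assumed purely for illustration):
set.seed(2)
sigma <- 4                                    # known population sd
x <- rnorm(30, mean = 12, sd = sigma)
z_crit <- qnorm(p = 0.975)                    # for a 95% interval
mean(x) + c(-1, 1) * z_crit * sigma / sqrt(length(x))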
Knowing \(\sigma^2\) is almost never the case in practice, so instead we rely on a t-interval.
t-Confidence Interval
Since we generally do not know the population variance we approximate it with the sample variance \[S^2 = \frac{1}{n-1}\sum (X_i - \bar{X})^2\] and obtain \(S\) from \(S = \sqrt{S^2}\)
The next question at hand is to figure out how \[T = \frac{\bar{X} - \mu}{S/\sqrt{n}}\] is distributed.
Using the facts that \(\bar{X}\) and \(S^2\) are independent for normal samples and that \((n-1)S^2/\sigma^2 \sim \chi^2_{n-1}\), one can show that \(T \sim t_{n-1}\).
And so we can build a \((1-\alpha)100\%\) t-interval for the mean with
\[\bar{X} \pm t_{\alpha/2, n-1}(s/\sqrt{n})\]
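A by-hand sketch with simulated data (assumed purely for illustration), checked against the built-in t.test():
set.seed(3)
x <- rnorm(20, mean = 50, sd = 10)
n <- length(x)
t_crit <- qt(p = 0.975, df = n - 1)              # for a 95% interval
mean(x) + c(-1, 1) * t_crit * sd(x) / sqrt(n)    # X-bar +/- t * s / sqrt(n)
t.test(x)$conf.int                               # built-in check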
General Breakdown of C.I
For a single mean, check to see if the population variance \(\sigma^2\) is known: if so, use the Z-interval above; otherwise use the t-interval.
For two populations, confidence intervals break down into two categories: when the populations are dependent and when they are independent. The independent case further breaks down by whether the two populations share a common variance \(\sigma^2\) or not. First we look at the case of dependent samples.
Paired t-Interval
In the case of paired t-intervals we want to build a C.I. around the mean change of a sample before and after a treatment, hence two dependent populations. We do so with the following: \[\bar{d} \pm t_{n-1, \alpha/2}\frac{S_d}{\sqrt{n}}\]
Where:
\(\bar{d}\) is the sample mean difference of the paired samples
\(S_d\) is the sample standard deviation of the differences
\(t_{n-1, \alpha/2}\) is the critical value from the t-distribution with \(n-1\) degrees of freedom for a \((1-\alpha)100\%\) confidence level
Example
Suppose we have the following results before and after a treatment.
before <- c(1.94, 1.44, 1.56, 1.58, 2.06, 1.66, 1.75,
1.77, 1.78, 1.92, 1.25, 1.93, 2.04, 1.62, 2.08)
after <- c(1.27, 1.63, 1.47, 1.39, 1.93, 1.26, 1.71,
1.67, 1.28, 1.85, 1.02, 1.34, 2.02, 1.59, 1.97)
Now we calculate the vector of differences
d <- before - after
We now calculate \(\bar{d}\)
d_bar <- mean(d); d_bar
## [1] 0.1986667
Now calculate the sample standard deviation of the differences
s_d <- sd(d); s_d
## [1] 0.2382935
Now simply build the C.I \[\bar{d} \pm t_{n-1, \alpha/2}\frac{S_d}{\sqrt{n}}\]
# first find the t value
t_c <- qt(df = 14, p = .975); t_c
## [1] 2.144787
ci <- d_bar + c(-t_c *(s_d/sqrt(15)), t_c * (s_d/sqrt(15))); ci
## [1] 0.0667041 0.3306292
From this interval we can also carry out a hypothesis test, namely:
\[H_0: \mu_d = 0 \;\; vs \;\; H_a: \mu_d \neq 0\]
Since 0 is not in the 95% C.I., at level \(\alpha = .05\) we reject the null and conclude that there is a change after the treatment. Moreover, since we defined the difference as \(before - after\), the C.I. being entirely positive implies that the measurements decreased after the treatment.
We can use built in R functions to perform this test
t.test(x = before, y = after, paired = TRUE)
##
## Paired t-test
##
## data: before and after
## t = 3.2289, df = 14, p-value = 0.006062
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.0667041 0.3306292
## sample estimates:
## mean of the differences
## 0.1986667
Pooled t-Interval
Now we suppose the case where the two populations are independent and have a common variance \(\sigma^2\). In this case we have what is called the Pooled t-interval (and associated Pooled t-test). We can build a \((1 - \alpha)100\%\) confidence interval with the following: \[(\bar{X} - \bar{Y}) \pm t_{n + m - 2, \alpha/2}S_p\sqrt{1/n + 1/m}\]
Where:
\(\bar{X}\) and \(\bar{Y}\) are means of the respective independent samples
\(n\) and \(m\) are the sample sizes of the samples respectively
\(S_p\) is the pooled sample standard deviation, where the pooled variance is given by,
\[S_p^2 = \frac{(n-1)S_x^2 + (m-1)S_y^2}{n + m - 2}\]
Assumptions Made
Samples are independent
Populations have equal variance \(\sigma^2\)
Samples come from approximately normal distributions
Example
Let us consider the same data as in the paired t-test but assume independence.
d # difference vector
## [1] 0.67 -0.19 0.09 0.19 0.13 0.40 0.04 0.10 0.50 0.07 0.23
## [12] 0.59 0.02 0.03 0.11
d_bar # difference mean
## [1] 0.1986667
We now must compute the pooled variance
n <- length(before); m <- length(after)
s_pooled <- sqrt(((n-1)*var(before) + (m-1)*var(after))/(n + m - 2))  # pooled standard deviation
Now we simply plug in values to get the C.I
t_val <- qt(df = n+m-2, p = .975)
ci <- d_bar + c(-t_val*s_pooled*sqrt(1/n + 1/m), t_val*s_pooled*sqrt(1/n + 1/m))
cat("Confidence Interval ", ci)
## Confidence Interval -0.005850599 0.4031839
We can check this against the built-in test in R
t.test(x = before, y = after, var.equal = TRUE, paired = FALSE)$conf.int
## [1] -0.005850599 0.403183932
## attr(,"conf.level")
## [1] 0.95
Welch’s t-Interval (Independent Samples, Unequal Variances)
In this case we still have independent samples, but we do not assume that the variance of these is equal. Instead we build the confidence interval as follows:
\[\bar{X} - \bar{Y} \pm t_{\alpha/2, r}\sqrt{S_x^2/n + S_y^2/m}\]
Where \(r\) is the degrees of freedom, approximated by:
\[r = \frac{\left( S_x^2/n + S_y^2/m \right)^2}{\frac{(S_x^2/n)^2}{n - 1} + \frac{(S_y^2/m)^2}{m - 1}}\]
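Computing the approximate degrees of freedom and the interval by hand (a sketch reusing the before and after vectors from above, to check against t.test() below):
vx <- var(before); vy <- var(after)
n <- length(before); m <- length(after)
# Welch-Satterthwaite approximation to the degrees of freedom
r <- (vx/n + vy/m)^2 / ((vx/n)^2/(n - 1) + (vy/m)^2/(m - 1))
r
(mean(before) - mean(after)) + c(-1, 1) * qt(p = 0.975, df = r) * sqrt(vx/n + vy/m)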
t.test(x = before, y = after, var.equal = FALSE, paired = FALSE)
##
## Welch Two Sample t-test
##
## data: before and after
## t = 1.9898, df = 26.775, p-value = 0.05691
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.006273094 0.403606427
## sample estimates:
## mean of x mean of y
## 1.758667 1.560000