Review of Probability and Distributions
- The properties of mean, variance, and covariance
- The additivity property of some distributions
- The moment-generating function as an analytic tool
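The additivity bullet can be checked numerically. This is a minimal sketch, assuming two independent Poisson variables with hypothetical means 2 and 3. Convolving their pmfs should reproduce the Poisson pmf with mean 5. That is what the MGF argument \(m(t)=e^{\lambda_1(e^t-1)}e^{\lambda_2(e^t-1)}=e^{(\lambda_1+\lambda_2)(e^t-1)}\) predicts. The function names are illustrative only.

```python
import math


def poisson_pmf(y: int, lam: float) -> float:
    """P(Y = y) for Y ~ Poisson(lam)."""
    return math.exp(-lam) * lam**y / math.factorial(y)


def pmf_of_sum(y: int, lam1: float, lam2: float) -> float:
    """P(Y1 + Y2 = y) for independent Y1 ~ Poisson(lam1) and Y2 ~ Poisson(lam2),
    computed by convolving the two pmfs: sum over k of P(Y1 = k) * P(Y2 = y - k)."""
    return sum(poisson_pmf(k, lam1) * poisson_pmf(y - k, lam2) for k in range(y + 1))


if __name__ == "__main__":
    lam1, lam2 = 2.0, 3.0  # hypothetical means
    for y in range(6):
        # the two columns should agree up to floating-point rounding
        print(y, round(pmf_of_sum(y, lam1, lam2), 6), round(poisson_pmf(y, lam1 + lam2), 6))
```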
Sampling Distributions and the Central Limit Theorem (Chapter 7)
- The sampling distribution of the mean of a sample from a finite population
- The sampling distribution of the mean of a sample from a normal population
- The sampling distribution of the mean of a sample from a general population with finite variance
- The sampling distribution of the scaled variance of a sample from a normal population
- The independence of the mean and variance of a sample from a normal population: https://www2.stat.duke.edu/courses/Fall18/sta611.01/Lecture/lec12_mean_var_indep.pdf
- The definition of the t-distribution
- The definition of the F-distribution
- Do exercises 7.37, 7.38, and 7.39
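For the finite-population bullet, the variance of the sample mean under sampling without replacement is \(\frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}\), where \(\sigma^2\) uses divisor \(N\). The sketch below checks this formula by brute force. It enumerates every sample of size \(n\) from a small hypothetical population and uses exact fractions to avoid rounding. The function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def exact_variance_of_sample_mean(population, n):
    """Variance of the sample mean over all equally likely samples of size n
    drawn without replacement, found by enumerating every sample."""
    means = [Fraction(sum(s), n) for s in combinations(population, n)]
    center = sum(means) / len(means)
    return sum((m - center) ** 2 for m in means) / len(means)


def formula_variance_of_sample_mean(population, n):
    """sigma^2 / n * (N - n) / (N - 1), with sigma^2 the population variance (divisor N)."""
    N = len(population)
    mu = Fraction(sum(population), N)
    sigma2 = sum((Fraction(y) - mu) ** 2 for y in population) / N
    return sigma2 / n * Fraction(N - n, N - 1)


if __name__ == "__main__":
    pop = [1, 2, 3, 4, 5]  # hypothetical population
    print(exact_variance_of_sample_mean(pop, 2), formula_variance_of_sample_mean(pop, 2))
```

For the population {1, 2, 3, 4, 5} with \(n=2\), both computations give 3/4.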
Estimation
- Point estimate of a parameter
- Bias and mean square error of a point estimate
- Do 8.1, 8.6, 8.8, 8.10, 8.12, 8.14, 8.15
- Confidence interval of a parameter: one-sided or two-sided; pivotal quantity (a function of the data and the parameter whose distribution does not depend on the parameter)
- Do 8.43, 8.47
- Large-sample confidence intervals based on the approximate pivotal quantity \(Z=\frac{\hat{\theta}-\theta}{\sigma_{\hat{\theta}}}\), which is approximately standard normal for large samples. Applet: https://www.lock5stat.com/StatKey/sampling_1_cat/sampling_1_cat.html
- Small-sample confidence intervals for \(\mu\) and \(\mu_1 -\mu_2\), assuming normal distributions
- Confidence intervals for \(\sigma^2\) of a normal population: two-sided and one-sided
- Do 8.95, 8.102
- Do 8.129, 8.130, 8.131, 8.132
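As one instance of the large-sample pivotal method, the sketch below builds a two-sided interval for a binomial proportion as \(\hat{p}\pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}\). It plugs \(\hat{p}\) into \(\sigma_{\hat{\theta}}\), which is the usual large-sample choice. The counts \(y=60\), \(n=100\) and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_proportion_ci(y: int, n: int, conf: float = 0.95):
    """Two-sided large-sample CI for a binomial proportion p, based on the
    approximate pivot Z = (p_hat - p) / se with plug-in se = sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = y / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


if __name__ == "__main__":
    print(large_sample_proportion_ci(60, 100))  # hypothetical data
```

With these numbers the 95% interval is roughly (0.504, 0.696). Lowering `conf` narrows it.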
Properties of Point Estimators and Methods of Estimation
- Relative efficiency of two unbiased estimators: \(\text{eff}(\hat{\theta}_1, \hat{\theta}_2)=\frac{V(\hat{\theta}_2)}{V(\hat{\theta}_1)}\)
- Do 9.1, 9.2, 9.8 (efficiency)
- Consistency (convergence in probability), properties of consistency (Theorem 9.2, page 451)
- Any unbiased estimator whose variance tends to 0 as \(n\to\infty\) is consistent. (Prove this using Tchebysheff’s theorem.)
- Show that, for a normal population, the sample mean as an estimator of the population mean is both consistent (the LLN) and efficient.
- Show that for a normal distribution, the sample variance as an estimator of the population variance is consistent (the LLN) but not efficient.
- Do 9.21, 9.30, 9.32
- Sufficiency: Let \(Y_1, Y_2, ..., Y_n\) be a random sample from a population with unknown parameter \(\theta\). Then the statistic \(U=g(Y_1, Y_2, ..., Y_n)\) is said to be sufficient for \(\theta\) if the conditional distribution of \(Y_1, Y_2, ..., Y_n\), given \(U\), does not depend on \(\theta\).
- Likelihood of a sample: Let \(Y_1, Y_2, ..., Y_n\) be a random sample from a population with unknown parameter \(\theta\). If \(Y_1, Y_2, ..., Y_n\) are discrete random variables, the likelihood of the sample, \(L(\theta)=L(y_1, y_2, ..., y_n|\theta)\), is defined to be the joint probability mass function \(p(y_1, y_2, ..., y_n|\theta)\). If they are continuous random variables, the likelihood \(L(y_1, y_2, ..., y_n|\theta)\) is defined to be the joint probability density function \(f(y_1, y_2, ..., y_n|\theta)\).
- Let \(U\) be a statistic based on a random sample \(Y_1, Y_2, ..., Y_n\). Then \(U\) is a sufficient statistic for the estimation of a parameter \(\theta\) if and only if the likelihood \(L(y_1, y_2, ..., y_n|\theta)\) can be factored into two nonnegative functions, \[L(y_1, y_2, ..., y_n|\theta)=g(u,\theta)\cdot h(y_1, y_2, ..., y_n)\] where \(g\) is a function only of \(u\) and \(\theta\) and \(h\) is not a function of \(\theta\).
- Example: Given a sample from the exponential distribution with mean \(\theta\), find a sufficient statistic for \(\theta\).
- Do 9.37, 9.38, 9.39, 9.45, 9.49, 9.50, 9.52, 9.53, 9.54
- The Rao-Blackwell theorem (R-BT) and minimum-variance unbiased estimators (MVUEs): (1) The theorem shows how a sufficient statistic can be used to turn an unbiased estimator into an unbiased estimator with no larger variance; see page 464. (2) If a function of the minimal sufficient statistic is unbiased, then it is the MVUE (page 466; strictly, the sufficient statistic must also be complete).
- Use a sufficient statistic to form a pivotal quantity which can be used to construct a confidence interval. Example 9.10, pages 468-469
- Do 9.59, 9.60, 9.63, 9.65
- The method of moments: equate the \(k\)th sample moment \(\frac{1}{n}\sum_{i=1}^{n} Y_i^k\) to the \(k\)th population moment \(E(Y^k)\) for \(k=1, 2, ..., p\), where \(p\) is the number of parameters. Solving these \(p\) equations for the \(p\) parameters yields the method-of-moments estimates.
- The method of maximum likelihood: the value of the parameter that maximizes the likelihood function \(L(\theta)\) is called the maximum likelihood estimate (MLE) of the parameter.
- Do 9.80, 9.82, 9.92, 9.93, 9.104
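For the exponential example, take the density with mean \(\theta\), \(f(y\mid\theta)=\frac{1}{\theta}e^{-y/\theta}\) for \(y>0\). The likelihood then factors in the form required by the factorization criterion:
\[L(y_1, y_2, ..., y_n|\theta)=\prod_{i=1}^{n}\frac{1}{\theta}e^{-y_i/\theta}=\underbrace{\theta^{-n}e^{-u/\theta}}_{g(u,\theta)}\cdot\underbrace{1}_{h(y_1, y_2, ..., y_n)},\qquad u=\sum_{i=1}^{n}y_i,\]
valid when all \(y_i>0\). Hence \(U=\sum_{i=1}^{n}Y_i\) (equivalently \(\bar{Y}\)) is sufficient for \(\theta\).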
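To contrast the two estimation methods on one model, the sketch below uses a Uniform\((0,\theta)\) population. This model is an assumed illustration, not one named in the outline. Since \(E(Y)=\theta/2\), the method of moments gives \(2\bar{y}\). The likelihood \(L(\theta)=\theta^{-n}\) for \(\theta\ge\max y_i\) (and 0 otherwise) is maximized at \(\max y_i\). A crude grid search confirms the MLE numerically. The sample values and function names are hypothetical.

```python
from statistics import mean


def mom_uniform(sample):
    """Method of moments for Uniform(0, theta): solve mean(sample) = theta / 2."""
    return 2 * mean(sample)


def likelihood_uniform(theta, sample):
    """L(theta) = theta^(-n) if every observation lies in [0, theta], else 0."""
    if theta <= 0 or max(sample) > theta:
        return 0.0
    return theta ** (-len(sample))


def mle_uniform(sample):
    """L is 0 below max(sample) and decreasing above it, so the MLE is the largest observation."""
    return max(sample)


def mle_by_grid(sample, lo, hi, steps=10_000):
    """Numerical check: evaluate L on an evenly spaced grid and return the maximizing theta."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda t: likelihood_uniform(t, sample))


if __name__ == "__main__":
    sample = [0.8, 2.6, 1.9, 3.1, 0.4]  # hypothetical data
    print(mom_uniform(sample), mle_uniform(sample), mle_by_grid(sample, 1.0, 5.0, 4000))
```

The two methods need not agree. Here the moment estimate is 3.52 while the MLE is 3.1.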
Hypothesis Testing
- Elements of a statistical test: null hypothesis, alternative hypothesis, test statistic, and rejection region. For an introduction, we consider hypotheses involving a single parameter. The null hypothesis, denoted by \(H_0\), specifies a particular value for the parameter. The alternative hypothesis, denoted by \(H_a\), can be any of three possibilities: the parameter is greater than, less than, or not equal to the value specified under the null hypothesis. The test statistic is a function of the data on which the statistical decision is based. The rejection region, denoted by RR, specifies the values of the test statistic for which the null hypothesis is rejected in favor of the alternative hypothesis. If, for a particular sample, the computed value of the test statistic falls in the rejection region, we reject the null hypothesis; otherwise, we do not reject it. Usually, the rejection region has the form \(RR = \{t: t\le k\}\) or \(RR = \{t: t\ge k\}\), where \(t\) is the value of the test statistic and \(k\) is a threshold determined by some criterion.
- Type I and type II errors: a type I error is made if the null hypothesis is rejected when it is true; a type II error is made if the null hypothesis is not rejected when it is false. Since both are events, we can talk about the probability of each. We use \(\alpha\) to denote the probability of a type I error (called the level or significance level of the test) and \(\beta\) to denote the probability of a type II error. The quality of a test is measured by \(\alpha\) and \(\beta\): the best test would have both equal to 0 and the worst would have both equal to 1. For a fixed sample size, it is not possible to make both smaller at the same time, since \(\alpha\) and \(\beta\) are inversely related (page 493). In practice, we control \(\alpha\) at a given level, say 0.05, and then determine the rejection region.
- Example: We are interested in testing whether or not a coin is balanced based on the number of heads \(Y\) on 36 tosses of the coin. If we use the rejection region \(|y-18|\ge4\), what is the value of \(\alpha\)? If the probability that the coin lands on a head is in fact 0.7, what is the value of \(\beta\)?
- Read Examples 10.1, 10.2, and 10.4.
- Do 10.2, 10.3
- Determine the sample size for a lower-tail test for a population mean (with standard deviation known), page 509
- Relationships between hypothesis-testing procedures and confidence intervals: a two-tailed test corresponds to a two-sided confidence interval, an upper-tail test to a lower confidence bound, and a lower-tail test to an upper confidence bound
- Another way to report the results of a statistical test: the attained significance level, or \(p\)-value. This is a statistic giving the smallest value of \(\alpha\) for which the observed data lead to rejection of the null hypothesis.
- Do 10.50, 10.51, 10.52, 10.54
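- Testing variances: one population (\(\chi^2\) test) or two populations (\(F\) test)
- Do 10.78, 10.79, 10.83
- Power of a test: the probability of rejecting \(H_0\); at a parameter value under \(H_a\), power \(=1-\beta\)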
- Neyman-Pearson Lemma for a simple null hypothesis (\(H_0:\theta=\theta_0\)) versus a simple alternative hypothesis (\(H_a:\theta=\theta_1\)): the most powerful \(\alpha\)-level test must have the rejection region \(RR = \{y: \frac{L(\theta_0)}{L(\theta_1)}<k\}\), where \(k\) is determined for a given \(\alpha\).
- Example 10.22, p. 543
- The uniformly most powerful test (UMPT) for a composite alternative hypothesis: first, find the most powerful test against an arbitrary simple alternative \(\theta=\theta_1\) in \(H_a\). If the resulting rejection region does not depend on the chosen value \(\theta_1\), the test is the UMPT. Read Example 10.23, p. 544.
- Likelihood ratio tests: the Neyman-Pearson lemma does not apply if there is an additional unknown parameter. In this case, we use a very general methodology called the likelihood ratio test. The unknown parameter not being tested (called the nuisance parameter) is first estimated by maximum likelihood, separately under the null hypothesis (parameter set \(\Omega_0\)) and over the entire parameter space \(\Omega\). The likelihood ratio test statistic is
\[\lambda = \frac{\max_{\theta \in \Omega_0} L(\theta)}{\max_{\theta \in \Omega} L(\theta)}\] with rejection region \(\lambda < k\), where \(k\) is determined for a given \(\alpha\).
- Example 10.24, p. 550, and Example 10.25
- Do 10.105, 10.106
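The coin example on 36 tosses can be worked exactly from the binomial distribution. \(\alpha\) sums \(P(Y=y\mid p=0.5)\) over the rejection region \(|y-18|\ge 4\). \(\beta\) sums \(P(Y=y\mid p=0.7)\) over its complement. The numbers 36, 18, 4, and 0.7 come from the example. The helper names are hypothetical.

```python
from math import comb


def binom_pmf(y: int, n: int, p: float) -> float:
    """P(Y = y) for Y ~ Binomial(n, p)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


def coin_test_error_rates(n: int = 36, center: int = 18, k: int = 4, p_alt: float = 0.7):
    """Return (alpha, beta) for the rejection region RR = {y : |y - center| >= k}.

    alpha = P(Y in RR | p = 0.5)       (H0 true but rejected)
    beta  = P(Y not in RR | p = p_alt) (H0 false but not rejected)
    """
    rr = {y for y in range(n + 1) if abs(y - center) >= k}
    alpha = sum(binom_pmf(y, n, 0.5) for y in rr)
    beta = sum(binom_pmf(y, n, p_alt) for y in range(n + 1) if y not in rr)
    return alpha, beta


if __name__ == "__main__":
    alpha, beta = coin_test_error_rates()
    print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
```

Widening the region (smaller `k`) raises \(\alpha\) and lowers \(\beta\), which illustrates the inverse relation noted above.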
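The sample-size bullet for a one-tailed test of a mean with known \(\sigma\) can be sketched with the standard formula \(n=\frac{(z_\alpha+z_\beta)^2\sigma^2}{(\mu_a-\mu_0)^2}\), rounded up. The squared difference makes the same expression serve upper- and lower-tail tests. The inputs below (\(\alpha=\beta=0.05\), \(\sigma=1\), \(\mu_0=10\), \(\mu_a=9.5\)) and the function name are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def z_test_sample_size(alpha: float, beta: float, sigma: float, mu0: float, mua: float) -> int:
    """Smallest n such that a one-tailed z test at level alpha has type II
    error probability at most beta when the true mean is mua (sigma known)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(1 - beta)
    n = (z_alpha + z_beta) ** 2 * sigma**2 / (mua - mu0) ** 2
    return ceil(n)


if __name__ == "__main__":
    # lower-tail test H0: mu = 10 vs Ha: mu < 10, detecting mu = 9.5
    print(z_test_sample_size(0.05, 0.05, 1.0, 10.0, 9.5))
```

For these inputs the formula gives about 43.3, so \(n=44\).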
Introduction to Bayesian Methods for Inference
- Prior and posterior: In the Bayesian approach, the parameter \(\theta\) is viewed as a random variable with a probability distribution, called the prior distribution of \(\theta\). The conditional distribution of \(\theta\) given the data is called the posterior distribution of \(\theta\), denoted by \(g^*(\theta|y_1, y_2, ..., y_n)\). If the prior and posterior distributions belong to the same family (for example, both are beta distributions), the prior is called a conjugate prior for the data distribution.
- Bayes estimator: under squared-error loss, the Bayes estimator of \(\theta\) is the mean of the posterior distribution.
- Bayesian credible intervals: if \(P^*(a<\theta<b)=1-\alpha\), where \(P^*\) denotes probability under the posterior distribution, the interval \((a,b)\) is called a \(100(1-\alpha)\%\) credible interval for \(\theta\). Such an interval is not unique; it may be of interest to find the one with the shortest length.
- Bayesian test of hypotheses: Consider \(H_0: \theta \in \Omega_0\) vs. \(H_a: \theta \in \Omega_a\). If \(P^*(\theta \in \Omega_0)< P^*(\theta \in \Omega_a)\), reject \(H_0\).
- Do 16.8-16.12, 16.17, 16.23
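The sketch below ties the Bayesian bullets together using the standard conjugate pair of a beta prior and binomial data. A Beta\((a,b)\) prior with \(y\) successes in \(n\) trials gives a Beta\((a+y,\,b+n-y)\) posterior. The posterior mean is the Bayes estimator. A posterior probability such as \(P^*(\theta>0.5)\) is found by midpoint-rule integration and can drive the Bayesian test. The uniform prior, the data \(y=7\), \(n=10\), and the function names are hypothetical.

```python
from math import exp, lgamma, log


def beta_binomial_update(a: float, b: float, y: int, n: int):
    """Beta(a, b) prior + y successes in n trials -> Beta(a + y, b + n - y) posterior."""
    return a + y, b + n - y


def posterior_mean(a: float, b: float) -> float:
    """Mean of a Beta(a, b) distribution: the Bayes estimator under squared-error loss."""
    return a / (a + b)


def beta_pdf(t: float, a: float, b: float) -> float:
    """Beta(a, b) density at 0 < t < 1, computed on the log scale for stability."""
    log_const = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_const + (a - 1) * log(t) + (b - 1) * log(1 - t))


def posterior_prob_between(lo: float, hi: float, a: float, b: float, steps: int = 100_000) -> float:
    """P*(lo < theta < hi) by the midpoint rule, for 0 <= lo < hi <= 1."""
    h = (hi - lo) / steps
    return h * sum(beta_pdf(lo + (k + 0.5) * h, a, b) for k in range(steps))


if __name__ == "__main__":
    a, b = beta_binomial_update(1, 1, 7, 10)  # uniform prior, hypothetical data
    p_upper = posterior_prob_between(0.5, 1.0, a, b)
    print(a, b, posterior_mean(a, b), p_upper)
    # Bayesian test of H0: theta <= 0.5 vs Ha: theta > 0.5
    print("reject H0" if p_upper > 1 - p_upper else "do not reject H0")
```

Here the posterior is Beta(8, 4) with mean 2/3. Since \(P^*(\theta>0.5)\) exceeds \(P^*(\theta\le 0.5)\), the rule in the outline rejects \(H_0\).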