Inference 3: Interval Estimation
Sometimes providing a point estimate or testing a hypothesis is not the ideal method of inference. Rather, one may be interested in an interval (or a set) that efficiently captures the underlying parameter. For example, in a diagnostic test for a particular disease, one requires an interval (range) of possible test results which efficiently detects the occurrence or non-occurrence of the disease. In other words, we require a random set which captures the underlying parameter with high probability (rather than a random point which is close to the underlying parameter in an appropriate sense). Such estimates are called interval estimates.
Definition: [Interval Estimate]
An interval estimate for a real-valued parameter \(\theta\) is a pair of functions of the sample observations, \(L({\bf x})=L(x_{1},\ldots, x_{n})\) and \(U({\bf x})= U(x_{1},\ldots, x_{n})\), that satisfy \(L({\bf x}) \leq U({\bf x})\) for each point \({\bf x}\) in the support. If the realization of \({\bf X}\) is \({\bf x}\), then we infer that the interval \([L({\bf x}),U({\bf x})]\) contains \(\theta\). The random interval \([L({\bf X}), U({\bf X})]\) is called an interval estimator of \(\theta\).
Note: In some particular examples, \(L({\bf x})\) can be \(-\infty\) or \(U({\bf x})\) can be \(\infty\); in such cases we obtain a one-sided interval estimate. Further, instead of a closed interval, one may obtain an open or a semi-closed interval estimate as well.
Definition: [Confidence Coefficient]
The confidence coefficient of an interval estimator \([L({\bf X}),U({\bf X})]\) of a parameter \(\theta\), usually denoted by \((1-\alpha)\), is the smallest probability, over \(\theta\in\Theta\), that the random interval captures the true parameter \(\theta\). Notationally, the confidence coefficient is \((1-\alpha)= \inf_{\theta\in\Theta} P_{\theta} \left( [L({\bf X}),U({\bf X})] \ni \theta\right)\).
An interval estimator, together with its confidence coefficient, is called a confidence interval.
Note: We can generalize the idea of confidence intervals to confidence sets. A random set \(S({\bf X})\) is said to be a confidence set for a parameter vector \(\boldsymbol{\theta}\in \boldsymbol{\Theta} \subseteq \mathbb{R}^{k}\), with confidence coefficient \((1-\alpha)\), if \(P_{\boldsymbol{\theta}} \left(S({\bf X})\ni \boldsymbol{\theta} \right)\geq (1-\alpha)\) for each \(\boldsymbol{\theta}\in\boldsymbol{\Theta}\).
A confidence interval can be interpreted as a special type of confidence set, where \(S({\bf X})\) is an interval.
Interpretation of Confidence Sets:
A confidence set \(S({\bf X})\) with confidence coefficient \((1-\alpha)\) can be interpreted as follows: if repeated random samples, that is, repeated realizations of \({\bf X}\), are taken a large (theoretically infinite) number of times, then in at least \((1-\alpha)100\%\) of the cases the realization of the confidence set, \(S({\bf x})\), will contain the true parameter \(\boldsymbol{\theta}\).
Example 1: Let \(X_{1},\ldots, X_{n}\) be a random sample from a \(\mathtt{Normal}(\mu,\sigma^{2})\) distribution. Consider the interval estimate \([\bar{X}-c,\bar{X}+c]\) of \(\mu\) for some constant \(c\geq 0\). Find \(c\) such that the confidence coefficient is \((1-\alpha)\).
When \(\sigma^{2}\) is known, it can be seen that \(c=\sigma\tau_{\alpha/2}/\sqrt{n}\), where \(\tau_{\alpha/2}\) is the upper \(\alpha/2\) point of the standard normal distribution.
Note:
- One could also choose an interval estimate of the type \([\bar{X}-c_{1},\bar{X}+c_{2}]\), where \(c_{1},c_{2}\geq 0\). Then any \(c_{1}, c_{2}\) satisfying \(\Phi \left(c_{1} \sqrt{n}/\sigma \right) - \Phi \left(-c_{2} \sqrt{n}/\sigma \right)= (1-\alpha)\) would provide a valid confidence interval with confidence coefficient \((1-\alpha)\).
- When \(\sigma^2\) is unknown, one can use the fact that \(\sqrt{n}(\bar{X}-\mu)/S^{\star} \sim t_{(n-1)}\) to find \(c_{1},c_{2}\); see the numerical sketch after this note.
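The following is a minimal numerical sketch of Example 1, assuming NumPy and SciPy are available; the sample and the parameter values below are hypothetical illustration choices.

```python
import numpy as np
from scipy import stats

def normal_mean_ci(x, sigma=None, alpha=0.05):
    """Two-sided (1 - alpha) CI [xbar - c, xbar + c] for a normal mean.

    Uses c = sigma * tau_{alpha/2} / sqrt(n) when sigma is known, and the
    t_{(n-1)} analogue with S* in place of sigma when it is unknown.
    """
    x = np.asarray(x)
    n = len(x)
    xbar = x.mean()
    if sigma is not None:
        c = sigma * stats.norm.ppf(1 - alpha / 2) / np.sqrt(n)
    else:
        s_star = x.std(ddof=1)  # S*, the sample sd with divisor (n - 1)
        c = s_star * stats.t.ppf(1 - alpha / 2, df=n - 1) / np.sqrt(n)
    return xbar - c, xbar + c

rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=1.5, size=40)   # hypothetical data
print(normal_mean_ci(sample, sigma=1.5))           # sigma known
print(normal_mean_ci(sample))                      # sigma unknown
```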
Example 2: Let \(X_{1},\ldots, X_{n}\) be a random sample from a \(\mathtt{Normal}(\mu,\sigma^{2})\) distribution. Consider the interval estimate \([c_{1} S^{\star 2}, c_{2} S^{\star 2}]\) of \(\sigma^{2}\) for some constants \(0<c_{1}\leq c_{2}\). Find \(c_{1},c_{2}\) such that the confidence coefficient is \((1-\alpha)\).
Using the result that \((n-1) \sigma^{-2}S^{\star 2}\sim \chi^{2}_{(n-1)}\), one can show that any \(c_{1},c_{2}\) satisfying \[P\left( \frac{ (n-1)}{c_{2}} < W \leq \frac{ (n-1)}{c_{1}} \mid W\sim \chi^{2}_{(n-1)} \right)= (1-\alpha),\] leads to a valid confidence interval.
In particular, one may choose \(c_{1},c_{2}\) such that \[ P\left( W \geq \frac{ (n-1)}{c_{1}} \mid W\sim \chi^{2}_{(n-1)} \right)= P\left( W \leq \frac{ (n-1)}{c_{2}} \mid W\sim \chi^{2}_{(n-1)} \right) =\frac{\alpha}{2},\] which leads to the interval \(\left[(n-1) S^{\star 2}/ \chi^{2}_{(n-1),1-\alpha/2}, (n-1) S^{\star 2}/ \chi^{2}_{(n-1),\alpha/2} \right]\), where \(\chi^{2}_{(n-1),p}\) denotes the \(p\)-quantile of the \(\chi^{2}_{(n-1)}\) distribution.
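A companion sketch for Example 2, under the same assumptions; `stats.chi2.ppf(p, df)` returns the \(p\)-quantile \(\chi^{2}_{(n-1),p}\) used above.

```python
import numpy as np
from scipy import stats

def normal_var_ci(x, alpha=0.05):
    """Equal-tailed (1 - alpha) CI for sigma^2 based on a normal sample."""
    x = np.asarray(x)
    n = len(x)
    s2_star = x.var(ddof=1)  # S*^2, the sample variance with divisor (n - 1)
    lower = (n - 1) * s2_star / stats.chi2.ppf(1 - alpha / 2, df=n - 1)
    upper = (n - 1) * s2_star / stats.chi2.ppf(alpha / 2, df=n - 1)
    return lower, upper

rng = np.random.default_rng(0)
print(normal_var_ci(rng.normal(0.0, 2.0, size=30)))  # true sigma^2 = 4
```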
Methods of Finding Confidence Intervals
1. Method of Pivots
Definition: [Pivot]
Let \({\bf X}\sim f_{\bf X}(\cdot; \theta)\). A random variable \(T({\bf X}; \theta)\) is called a pivot if the distribution of \(T({\bf X}; \theta)\) does not depend on \(\theta\).
Examples:
- Let \(X_{1},\ldots, X_{n}\) be a random sample from a location family with location parameter \(\theta\), i.e., \(X_{i}=W_{i}+\theta\), where \(W_{i}, ~i=1,\ldots,n\), are i.i.d. from a distribution free of \(\theta\). Then any function of \(X_{i}-\theta,~i=1,\ldots,n\), is a pivot.
- Let \(X_{1},\ldots, X_{n}\) be a random sample from a scale family with scale parameter \(\theta\), i.e., \(X_{i}=\theta W_{i}\), where \(W_{i}, ~i=1,\ldots,n\), are i.i.d. from a distribution free of \(\theta\). Then any function of \(X_{i}/\theta,~i=1,\ldots,n\), is a pivot.
- If \(X\) has a continuous distribution with cdf \(F_{X}(\cdot;\theta)\), then \(F_{X}(X;\theta)\) has a \(\mathtt{Uniform}(0,1)\) distribution. When \(n\) i.i.d. samples \(X_{1},\ldots, X_{n}\) are available, one may take \(T({\bf X}; \theta) = - \sum_{i=1}^{n}\log F_{X}(X_{i}; \theta)\) as a pivot; it can be shown that \(T({\bf X}; \theta)\) is distributed as \(\mathtt{Gamma}(n,1)\). A simulation check of this pivot follows.
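As a quick sanity check of the last pivot, the following sketch simulates \(T({\bf X};\theta)\) for an \(\mathtt{Exp}(\theta)\) sample (the values of \(\theta\), \(n\), and the number of replications are arbitrary) and compares it with the \(\mathtt{Gamma}(n,1)\) distribution via a Kolmogorov-Smirnov test.

```python
import numpy as np
from scipy import stats

# Under the true theta, T = -sum_i log F(X_i; theta) should be Gamma(n, 1).
rng = np.random.default_rng(0)
theta, n, reps = 2.0, 5, 10000               # arbitrary illustration values
x = rng.exponential(scale=1 / theta, size=(reps, n))
t = -np.log(stats.expon.cdf(x, scale=1 / theta)).sum(axis=1)

# A large p-value indicates the simulated pivot is consistent with Gamma(n, 1)
print(stats.kstest(t, stats.gamma(a=n).cdf))
```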
A pivot may yield a confidence interval when supplemented with some additional conditions. The following theorem provides a set of sufficient conditions under which a pivot yields a confidence interval.
Theorem 1
Let \(T({\bf X}; \theta)\) be a pivot such that, for each \(\theta\), \(T({\bf X}; \theta)\) is a statistic, and, as a function of \(\theta\), \(T({\bf x}; \theta)\) is strictly monotone at each \({\bf x}\in \mathbb{R}^{n}\). Let \(\Lambda\subseteq \mathbb{R}\) be the range of \(T({\bf X}; \theta)\), and suppose that for each \(\lambda\in\Lambda\) and \({\bf x}\in \mathbb{R}^{n}\) the equation \(\lambda=T({\bf x}; \theta)\) is solvable with respect to \(\theta\). Then one can construct a confidence interval for \(\theta\) at any level.
[Proof of Theorem 1]
Remarks:
- A sufficient condition for the equation \(\lambda=T({\bf x}; \theta)\) to be solvable is that \(T\) is continuous and strictly monotone in \(\theta\). For example, let \(F_{S}(\cdot;\theta)\) be the cdf of a (continuous) statistic \(S({\bf X})\). If \(F_{S}(s;\theta)\) is strictly increasing in \(\theta\), then for any \(\alpha\in (0,1)\) one may choose \(\alpha_{1},\alpha_{2}\) such that \(\alpha_{1}+\alpha_{2}=\alpha\), and the confidence interval \([L({\bf X}), U({\bf X})]\) such that \[ F_{S}(S({\bf x}) ; L({\bf x}))=\alpha_{1}, \quad \text{and} \quad F_{S}(S({\bf x}) ; U({\bf x}))=1-\alpha_{2}, \quad \text{for each } {\bf x}.\]
- The monotonicity assumption in the above theorem ensures that the confidence set obtained from the pivot \(T\) is of interval type. If all the other assumptions of Theorem 1 are satisfied except monotonicity, one still obtains a confidence set, but it may not be of interval type.
Example 3: Let \(X_{1},\ldots, X_{n}\) be a random sample from the location exponential distribution with location parameter \(\theta\) and scale parameter \(1\). Obtain a \((1-\alpha)\) confidence interval for \(\theta\) based on the complete sufficient statistic \(X_{(1)}\).
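A sketch of one solution, under the convention that all the confidence mass is placed on one tail of the pivot (other tail allocations are equally valid): writing \(X_{i}=\theta+W_{i}\) with \(W_{i}\) i.i.d. \(\mathtt{Exp}(1)\), the pivot \(X_{(1)}-\theta=\min_{i}W_{i}\) follows an \(\mathtt{Exp}(n)\) distribution, so \[ P_{\theta}\left(0\leq X_{(1)}-\theta\leq \tfrac{1}{n}\log\tfrac{1}{\alpha}\right)=1-e^{-n\cdot\frac{1}{n}\log\frac{1}{\alpha}}=1-\alpha, \] and hence \(\left[X_{(1)}-\frac{1}{n}\log\frac{1}{\alpha},\; X_{(1)}\right]\) is a \((1-\alpha)\) confidence interval for \(\theta\).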
Examples 1 and 2: Revisit.
2. Test Inversion
There is a strong correspondence between hypothesis testing and interval estimation. From a test one can obtain a confidence set and, conversely, from a confidence interval one can obtain a test. We first illustrate this with an example.
Example 4: Consider the problem of testing \(H_{0}:\mu=\mu_{0}\) against \(H_{1}:\mu \neq \mu_{0}\) for a \(\mathtt{Normal}(\mu,1)\) population at level \(\alpha\), based on a random sample of size \(n\). Recall that the LR test is of the form \[ \phi({\bf x}) =\begin{cases} 1 & \text{if} \quad |T({\bf x})| >c \\ 0 & \text{otherwise}, \end{cases} \quad \text{where}\quad T({\bf x})=\sqrt{n} (\bar{x}-\mu_{0}), \] and \[ P_{H_{0}}(|T({\bf X})|>c)=\alpha.\] As \(T({\bf X})\sim N(0,1)\) under \(H_{0}\), we have the choice \(c=\tau_{\alpha/2}\), where \(\tau_{\alpha/2}\) is the upper \(\alpha/2\) point of the \(N(0,1)\) distribution. Together, these imply that
\[\begin{aligned} &P_{\mu_{0}}\left( - \tau_{\alpha/2} \leq \sqrt{n} (\bar{X}-\mu_{0}) \leq \tau_{\alpha/2}\right)\\ &\quad =P_{\mu_{0}}\left( \bar{X}-\frac{\tau_{\alpha/2}}{\sqrt{n}}\leq \mu_{0} \leq \bar{X} + \frac{\tau_{\alpha/2}}{\sqrt{n}} \right)=(1-\alpha) \end{aligned}\]However, observe that the above probability statement is true for any choice of \(\mu_{0}\). Thus we can rewrite the above statement as
\[\begin{aligned} P_{\mu}\left( \bar{X}-\frac{\tau_{\alpha/2}}{\sqrt{n}}\leq \mu \leq \bar{X} + \frac{\tau_{\alpha/2}}{\sqrt{n}} \right)=(1-\alpha), \end{aligned}\]which yields a confidence interval for \(\mu\).
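The inversion can also be carried out mechanically: scan a grid of candidate values \(\mu_{0}\) and keep those for which the realized statistic falls in the acceptance region. A sketch with hypothetical data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=1.0, size=25)   # hypothetical N(mu, 1) sample
n, xbar, alpha = len(x), x.mean(), 0.05
tau = stats.norm.ppf(1 - alpha / 2)           # upper alpha/2 point of N(0, 1)

# Keep every mu0 whose acceptance region {|sqrt(n)(xbar - mu0)| <= tau} contains x
grid = np.linspace(xbar - 2, xbar + 2, 100001)
accepted = grid[np.abs(np.sqrt(n) * (xbar - grid)) <= tau]
print(accepted.min(), accepted.max())                     # inverted test
print(xbar - tau / np.sqrt(n), xbar + tau / np.sqrt(n))   # closed form
```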
Note:
In the above example, we crucially use the fact that when \(\theta\) takes a particular value \(\theta_{0}\) and the samples are generated from \(f_{\theta_{0}}\), the probability that \({\bf X}\) lies in the acceptance region, say \(A(\theta_{0})\), is at least \((1-\alpha)\).
If we fix a realization \({\bf x}\), then (given \(\theta=\theta_{0}\)) the test statistic \(T({\bf x})\) either falls inside or outside the acceptance region. However, if we keep \({\bf x}\) fixed and vary \(\theta\), then the acceptance region varies.
Now, consider the set of possible values of \(\theta\) such that a particular realization \({\bf x}\) belongs to \(A(\theta)\). In the above example, it is the set of \(\mu\) values such that \(\sqrt{n}(\bar{x}-\mu)\in [-\tau_{\alpha/2},\tau_{\alpha/2}]\), i.e., \(\mu \in \left[\bar{x}-\tau_{\alpha/2}/\sqrt{n} ,\bar{x}+\tau_{\alpha/2}/\sqrt{n}\right]\).
Note that this set does not depend on a particular \(\theta\), but on \({\bf x}\) only. Let us call it \(C({\bf x})\). One can interpret \(C({\bf x})\) as \(A^{-1}({\bf x})\). Thus \(C({\bf X})\) is a random set based on \({\bf X}\) only.
The following theorem states that this set \(C({\bf X})\) forms a valid confidence set with confidence coefficient \((1-\alpha)\).
Theorem 2
For each \(\theta_{0}\in \Theta\), let \(A(\theta_{0})\) be the acceptance region of a level-\(\alpha\) test of \(H_{0}:\theta=\theta_{0}\). For each \({\bf x}\), define \(C({\bf x})\subseteq \Theta\) such that \(C({\bf x})=\{\theta: {\bf x} \in A(\theta)\}\). Then the random set \(C({\bf X})\) is a \((1-\alpha)\) confidence set.
[Proof of Theorem 2]
Remark:
If the test under consideration is a UMP test, then it can be shown that the corresponding confidence set is the smallest (in an appropriate sense).
As we have seen in hypothesis testing, the alternative hypothesis plays a crucial role in determining the acceptance region of a UMP test. Consequently, the form of the confidence set obtained from the acceptance region of a test is also determined by the type of alternative hypothesis.
The above procedure does not guarantee, in general, that the confidence set obtained by inverting an acceptance region will be of interval type.
Example 5: Let \(X_{1},\ldots,X_{n}\) be a random sample from the \(\mathtt{Uniform}(0,\theta)\) distribution. A UMP level \(\alpha\) test for testing \(H_{0}:\theta = \theta_{0}\) against \(H_{1}:\theta>\theta_{0}\) has the form \[ \phi({\bf x})=\begin{cases} 1 & \text{if} \quad x_{(n)}>c \\ 0 & \text{otherwise}, \end{cases} \quad \text{where}\quad P_{\theta_{0}}(X_{(n)}>c)=\alpha.\] From the size \(\alpha\) condition, we get \(c=\theta_{0} (1-\alpha)^{1/n}\). Now, if we fix \({\bf x}\), then \[C({\bf x})=\{\theta: x_{(n)}\leq \theta (1-\alpha)^{1/n}\}=\left[(1-\alpha)^{-1/n}x_{(n)},\infty\right). \] Therefore, by the above theorem, \(\left[(1-\alpha)^{-1/n}X_{(n)},\infty\right)\) is a \((1-\alpha)\) confidence interval.
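A short simulation sketch (with an arbitrary choice of \(\theta\)) confirming the frequentist coverage of this one-sided interval:

```python
import numpy as np

# Coverage of [ (1 - alpha)^{-1/n} X_(n), infinity ) for Uniform(0, theta)
rng = np.random.default_rng(0)
theta, n, alpha, reps = 3.0, 10, 0.05, 100000   # arbitrary illustration values
x_max = rng.uniform(0, theta, size=(reps, n)).max(axis=1)
lower = (1 - alpha) ** (-1 / n) * x_max
print(np.mean(lower <= theta))                  # should be close to 0.95
```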
Example 6: Let \(X_{1},\ldots,X_{n}\) be a random sample from a continuous exponential family of distributions with pdf \[f_{\bf X}({\bf x};\theta)=\exp\left[Q(\theta)T({\bf x})+ S({\bf x}) + D(\theta) \right], \] where \(Q(\theta)\) is non-decreasing in \(\theta\). Consider the problem of testing \(H_{0}:\theta=\theta_{0}\) against \(H_{1}:\theta<\theta_{0}\).
We have seen that in this situation \(f_{\bf X}(\cdot;\theta)\) has MLR in \(T({\bf X})\), and the UMP level \(\alpha\) test is \[\phi({\bf x})=\begin{cases} 1 & \text{if} \quad T({\bf x})<k \\ 0 & \text{otherwise}, \end{cases} \quad \text{where}\quad P_{\theta_{0}}(T({\bf X})<k)=\alpha.\] The acceptance region is then \(A(\theta_{0})=\{{\bf x}: T({\bf x})\geq k\}\). As \(k\) depends on \(\theta_{0}\), we may write \(k=k(\theta_{0})\).
Again, by the properties of MLR families, \(T({\bf X})\) is stochastically increasing in \(\theta\): if \(\theta_{1}>\theta_{2}\), then \(P_{\theta_{1}}(T({\bf X})\geq k)\geq P_{\theta_{2}}(T({\bf X})\geq k)\) for any fixed \(k\).
Further, by the size condition, \(P_{\theta_{2}}(T({\bf X})\geq k(\theta_{2}))=P_{\theta_{1}}(T({\bf X})\geq k(\theta_{1}))=1-\alpha \geq P_{\theta_{2}}(T({\bf X})\geq k(\theta_{1}))\) for \(\theta_{1}>\theta_{2}\), which implies \(k(\theta_{2})\leq k(\theta_{1})\). Thus, \(k(\theta)\) can be taken to be a non-decreasing function of \(\theta\).
Therefore, \(C({\bf x})=\left\{\theta: T({\bf x})\geq k(\theta) \right\}\) must be of the form \((-\infty, k^{-1}(T({\bf x}))]\), where \(k^{-1}(T({\bf x}))=\sup_{\theta}\left\{\theta: k(\theta)\leq T({\bf x})\right\}\).
Method of Evaluating Confidence Intervals:
For a particular parameter \(\theta\), there may exist many confidence intervals (CIs), arising from different approaches: different pivots or different test procedures. Among them, it is desirable to obtain the CI that has the shortest length and the largest confidence coefficient. Often the confidence coefficient is set to a pre-assigned level; in that case, it is desirable to obtain the CI with the shortest length among all CIs with the same confidence coefficient.
However, the shortest-length CI may not exist. In Theorem 3, we demonstrate a procedure for obtaining the shortest interval based on a particular random variable \(T({\bf X},\theta)\) (a pivot or a test procedure), under suitable conditions. However, this procedure does not guarantee that the resulting CI is the shortest among all possible CIs: there may exist some other random variable \(T^{\star}({\bf X},\theta)\) which leads to a better CI.
Theorem 3
Let \(X\) be a continuous random variable with unimodal pdf \(f_{X}(\cdot)\). If the interval \([a,b]\) satisfies

i. \(\int_{a}^{b} f_{X}(x)\,dx=1-\alpha\),
ii. \(f_{X}(a)=f_{X}(b)>0\), and
iii. \(a\leq x^{\star} \leq b\), where \(x^{\star}\) is the mode of \(f_{X}(\cdot)\),

then \([a,b]\) is the shortest among all intervals that satisfy i.
The proof is omitted due to time constraints. Interested students may read the proof of Theorem 9.3.2 in Casella and Berger.
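Although the proof is omitted, Theorem 3 can be exploited numerically: parametrize the candidate interval by its left-tail mass \(\alpha_{1}\), so that condition i. holds by construction, and minimize the length \(b-a\). The following is a sketch using SciPy; the \(\mathtt{Gamma}(5,1)\) pdf is just an arbitrary unimodal example.

```python
import numpy as np
from scipy import stats, optimize

def shortest_interval(dist, alpha=0.05):
    """Shortest interval [a, b] carrying (1 - alpha) probability under dist.

    Writing [a, b] = [F^{-1}(a1), F^{-1}(a1 + 1 - alpha)] makes condition i.
    hold automatically; we then minimize the length over a1 in (0, alpha).
    """
    def length(a1):
        return dist.ppf(a1 + 1 - alpha) - dist.ppf(a1)
    res = optimize.minimize_scalar(length, bounds=(1e-10, alpha - 1e-10),
                                   method="bounded")
    return dist.ppf(res.x), dist.ppf(res.x + 1 - alpha)

g = stats.gamma(a=5)                 # skewed unimodal example
a, b = shortest_interval(g)
print((a, b), g.pdf(a), g.pdf(b))    # condition ii. holds approximately: f(a) ~= f(b)
print(g.ppf(0.025), g.ppf(0.975))    # the equal-tailed interval is longer
```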
Example 1: Revisit.
Example 7: Let \(X_{1}, \ldots, X_{n}\) be a random sample of size \(n\) from the double exponential distribution with pdf \[f_{X}(x; \theta)= \frac{\theta}{2} \exp\left\{ -\theta|x| \right\}, \quad x\in \mathbb{R}. \] Observe that the distribution of \(W=\theta X\) is free of \(\theta\). We may choose the pivot in such a way that it is a function of the complete sufficient statistic \(\sum_{i=1}^{n}|X_{i}|\). Let \(T({\bf X}; \theta)=\sum_{i=1}^{n}|W_{i}|=\theta \sum_{i=1}^{n}|X_{i}|\) be the chosen pivot. Clearly \(|W_{i}|\) follows an exponential distribution with parameter 1, so \(\sum_{i}|W_{i}|\) follows a \(\mathtt{Gamma}(n,1)\) distribution. We find \(\lambda_{i}, i=1,2\), in such a way that \[\begin{aligned} &P(\lambda_{1} \leq T({\bf X};\theta) \leq \lambda_{2} \mid T({\bf X};\theta)\sim \mathtt{Gamma}(n,1)) \\ & \qquad =P\left(\frac{\lambda_{1}}{\sum_{i} |X_{i}|}\leq \theta \leq \frac{\lambda_{2}}{\sum_{i} |X_{i}|} \right) = (1-\alpha), \end{aligned}\]and the length of the CI, \((\lambda_{2}- \lambda_{1})/\sum_{i} |X_{i}|\), is minimized. For a given realization \({\bf x}\), minimizing the length of the CI is equivalent to minimizing \((\lambda_{2}-\lambda_{1})\), and therefore Theorem 3 is applicable here.
Example 8: Let \(X_{1}, \ldots, X_{n}\) be a random sample of size \(n\) from the double exponential distribution with pdf \[f_{X}(x; \theta)= \frac{1}{2\theta} \exp\left\{ -\frac{|x|}{\theta} \right\}, \quad x\in \mathbb{R}. \] As in Example 7, we consider the pivot \(T({\bf X}; \theta)=\sum_{i=1}^{n}|W_{i}|= \sum_{i=1}^{n}|X_{i}|/\theta\), which has a \(\mathtt{Gamma}(n,1)\) distribution. Now we need to find \(\lambda_{i},i=1,2\), such that \[\begin{equation} \begin{aligned} &P(\lambda_{1} \leq T({\bf X};\theta) \leq \lambda_{2} \mid T({\bf X};\theta)\sim \mathtt{Gamma}(n,1)) \\ & \qquad =P\left(\frac{\sum_{i} |X_{i}|}{\lambda_{2}}\leq \theta \leq \frac{\sum_{i} |X_{i}|}{\lambda_{1}} \right) = (1-\alpha), \end{aligned} \tag{*} \end{equation}\] and the length of the CI, \((\lambda_{1}^{-1}- \lambda_{2}^{-1})\sum_{i} |X_{i}|\), is minimized. Observe that Theorem 3 is not directly applicable here, since the length is no longer proportional to \((\lambda_{2}-\lambda_{1})\). In such cases, one may minimize the length w.r.t. \((\lambda_{1},\lambda_{2})\) subject to the constraint (*).
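A numerical sketch of this constrained minimization, again parametrizing by the left-tail mass of the pivot so that constraint (*) holds automatically (the values of \(n\) and \(\alpha\) are arbitrary):

```python
import numpy as np
from scipy import stats, optimize

# Minimize the CI length factor (1/lambda1 - 1/lambda2) subject to (*),
# with lambda1 = F^{-1}(a1), lambda2 = F^{-1}(a1 + 1 - alpha), F = Gamma(n, 1) cdf.
n, alpha = 10, 0.05
g = stats.gamma(a=n)

def length_factor(a1):
    return 1 / g.ppf(a1) - 1 / g.ppf(a1 + 1 - alpha)

res = optimize.minimize_scalar(length_factor, bounds=(1e-10, alpha - 1e-10),
                               method="bounded")
lam1, lam2 = g.ppf(res.x), g.ppf(res.x + 1 - alpha)
print(lam1, lam2)   # realized CI: [sum|x_i| / lam2, sum|x_i| / lam1]
```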
- Instead of minimizing the length of the CI, one may also minimize the expected length of the CI. The procedure for finding the shortest-length CI is also applicable for finding the shortest expected-length CI.