10.4 - The generalized likelihood ratio test

Key points

  • A way to find tests of parameter(s) in the presence of nuisance parameter(s).
  • Looks at the ratio of restricted to unrestricted maximized likelihoods.
  • Asymptotic chi-square distribution regardless of the distribution of \(Y_i\), if needed.
  • No guarantees about UMP-ness (this property is typically not available anyway for two-sided tests).

Test statistic and decision

Let \[ \Lambda = \frac{\max_{\Theta \in \Omega_0} L(\Theta)} {\max_{\Theta \in \Omega_0 \cup \Omega_a} L(\Theta)}. \]

Reject \(H_0\) when \(\Lambda \le c\).

  • \(\Omega_0\): null (restricted) parameter space with \(r_0\) free parameters.
  • \(\Omega_0 \cup \Omega_a\): unrestricted parameter space with \(r\) free parameters.

Asymptotically, if needed, \[ -2\ln\Lambda \sim \chi^2_{r-r_0}. \]

Reject for large values of \(-2\ln(\Lambda)\), where “large” is defined relative to the \(\chi^2_{r-r_0}\) distribution.

Note on the GLRT test statistic

Rejecting when \[ \Lambda = \frac{\max_{\Theta\in \Omega_0} L(\Theta)} {\max_{\Theta\in \Omega_0\cup\Omega_a} L(\Theta)} \le c \]

is equivalent to rejecting when \[ \frac{\max_{\Theta\in \Omega_0\cup\Omega_a} L(\Theta)} {\max_{\Theta\in \Omega_0} L(\Theta)} \ge c'. \]

Exponential example

Suppose we use i.i.d. exponential data to test

\[ H_0:\ \beta = 5 \] \[ H_a:\ \beta \neq 5 \]

  • \(\Theta=\{\beta\}\)
  • \(\Omega_0=\{5\}\)
  • \(\Omega_a=\mathbb{R}^+ \setminus \{5\}\)
  • \(\Omega_0\cup\Omega_a=\mathbb{R}^+\)

Multinomial goodness of fit

Examples:

  • Are crimes at the University of Minnesota evenly distributed across the four seasons?
Fall Winter Spring Summer
32 17 30 24
  • Are genetic traits independent?
  • If I roll a fair die 100 times and observe the following table, is this evidence of an unfair die?
1 2 3 4 5 6
18 15 14 19 12 22

Multinomial goodness of fit

  • Given \((Y_1,\ldots,Y_k) \sim MULT(n; p_1,p_2,\ldots,p_k),\) test:

\[ H_0: p_1=p_{10},\ldots,p_k=p_{k0} \] \[ H_a: \text{at least one } p_i\neq p_{i0}. \]

  • The likelihood is: \[ L(\Theta) = \frac{n!}{y_1!y_2!\cdots y_k!} p_1^{y_1} p_2^{y_2}\cdots p_k^{y_k}. \]

Restricted MLE

Under \(H_0\), \(p_1=p_{10},\ldots,p_k=p_{k0}\), so there are no free parameters over which to optimize!

\[ \max_{\Theta\in\Omega_0} L(\Theta) = \frac{n!}{y_1! \cdots y_k!} p_{10}^{y_1} \cdots p_{k0}^{y_k}. \]

Unrestricted MLEs

  • Under \(H_a\), \(p_i\in (0,1)\) with the constraint \(\sum_{i=1}^k p_i=1\).
  • To incorporate this constraint, we add a LaGrange multiplier to our log-likelihood:

\[\ln(L(\Theta)) = c + y_1 \ln(p_1) + y_2 \ln(p_2) + ... + y_k \ln(p_k) + \lambda(p_1+p_2+...+p_k -1)\]

\[\frac{\partial}{\partial p_i} \ln L(\Theta) = \frac{y_i}{p_i} + \lambda \stackrel{set}{=} 0\implies p_i = -\frac{y_i}{\lambda}\] \[\frac{\partial}{\partial \lambda} \ln L(\Theta) = p_1+p_2+...+p_k -1 \stackrel{set}{=}0 \]

\[\implies -\frac{y_1}{\lambda} - \frac{y_2}{\lambda} - ... - \frac{y_k}{\lambda} = 1 \implies \lambda = -\sum_{i=1}^n y_i = -n\]

\[\implies \hat p_i = \frac{y_i}{n}.\]

GLRT test statistic

\[\Lambda = \frac{\max_{\Theta \in \Omega_0} L(\Theta)} {\max_{\Theta \in \Omega_0 \cup \Omega_a} L(\Theta)} = \frac{L(p_{10}, \ldots, p_{k0})}{L(\hat p_1, \ldots, \hat p_k)}\]

\[= \frac{ \frac{n!}{y_1!\cdots y_k!}\, p_{10}^{y_1} p_{20}^{y_2}\cdots p_{k0}^{y_k} }{ \frac{n!}{y_1!\cdots y_k!}\, \hat p_1^{y_1} \hat p_2^{y_2}\cdots \hat p_k^{y_k} }\]

\[= \left( \frac{p_{10}}{\hat p_1} \right)^{y_1} \left( \frac{p_{20}}{\hat p_2} \right)^{y_2} \cdots \left( \frac{p_{k0}}{\hat p_k} \right)^{y_k}\]

\[ = \left( \frac{n p_{10}}{y_1} \right)^{y_1} \left( \frac{n p_{20}}{y_2} \right)^{y_2} \cdots \left( \frac{n p_{k0}}{y_k} \right)^{y_k} \]

\[ = \left( \frac{E_1}{O_1} \right)^{O_1} \left( \frac{E_2}{O_2} \right)^{O_2} \cdots \left( \frac{E_k}{O_k} \right)^{O_k}\leftarrow \mbox{Distribution unknown!} \]

Apply the asymptotic property: \[ -2\ln \Lambda = -2\sum_{i=1}^k O_i \ln\left(\frac{E_i}{O_i}\right) \sim \chi^2_{\text{df}}, \]

where \(df\)=(number of free parameters under \(H_a\)) – (number of free parameters under \(H_0\)) = \((k-1)-0\).

Testing actual data

observed_crimes <- c(32, 17, 30, 24)
(expected_crimes <- rep(0.25*103, 4))
[1] 25.75 25.75 25.75 25.75
(Lambda <- -2*sum(observed_crimes*log(expected_crimes/observed_crimes)))
[1] 5.577245
1 - pchisq(Lambda, df = 4 - 1)
[1] 0.1340908

In JMP:

Two-sided normal test

  • Let \(Y_1,\ldots,Y_n \sim N(\mu,\sigma^2),\) with both parameters unknown. We want to test:

\[ H_0:\mu=\mu_0 \]

\[ H_a:\mu\neq\mu_0. \]

  • \(\Theta = \{\mu,\sigma^2\}\) with \(\sigma^2\) the “nuisance parameter”
  • \(\Omega_0 = \{\mu_0,\mathbb{R}^+\}\)
  • \(\Omega_a=\{\mathbb{R}\setminus \mu_0,\mathbb{R}^+\}\)
  • \(\Omega_0\cup\Omega_a=\{\mathbb{R},\mathbb{R}^+\}\)

Restricted MLEs

\[ \max_{\Theta\in\Omega_0} L(\Theta) = \max_{\mu=\mu_0,\ \sigma^2\in\mathbb{R}^+} L(\mu,\sigma^2) = \max_{\sigma^2} (2\pi\sigma^2)^{-n/2} \exp\left( -\frac{1}{2\sigma^2}\sum_{i=1}^n (y_i-\mu_0)^2 \right) \]

\[ \ln L(\mu_0,\sigma^2) = -\frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i-\mu_0)^2 \]

\[ \frac{d}{d\sigma^2}\ln L(\mu_0,\sigma^2) = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^n (y_i-\mu_0)^2 = 0 \]

\[ \Rightarrow \hat\sigma_0^2 = \frac{1}{n} \sum_{i=1}^n (y_i-\mu_0)^2 \]

Restricted MLEs:

\[ \hat\Theta_0=(\mu_0,\hat\sigma_0^2) = \left( \mu_0, \frac{\sum_{i=1}^n (y_i-\mu_0)^2}{n} \right) \]

Unrestricted MLEs

\[ \max_{\Theta\in\{\Omega_0\cup\Omega_a\}} L(\Theta) = \max_{\mu\in\mathbb{R},\ \sigma^2\in\mathbb{R}^+} L(\mu,\sigma^2) \]

\[ = \max_{\mu,\sigma^2} \left[(2\pi\sigma^2)^{-n/2} \exp\left( -\frac{1}{2\sigma^2}\sum_{i=1}^n (y_i-\mu)^2 \right)\right] \]

We’ve done this before… the unrestricted MLEs are:

\[ \hat\Theta_a=\left\{\hat\mu_a=\bar y,\hat\sigma_a^2=\frac{\sum_{i=1}^n (y_i-\bar y)^2}{n}\right\} \]

GLRT test statistic

\[ \frac {\max_{\Theta\in\{\Omega_0\cup\Omega_a\}} L(\Theta)} {\max_{\Theta\in\Omega_0} L(\Theta)} = \frac{L(\hat\Theta_a)}{L(\hat\Theta_0)} \]

\[ = \frac {(2\pi\hat\sigma_a^2)^{-n/2} \exp\left(-\frac{1}{2\hat\sigma_a^2}\sum_{i=1}^n (y_i-\bar y)^2\right)} {(2\pi\hat\sigma_0^2)^{-n/2} \exp\left(-\frac{1}{2\hat\sigma_0^2}\sum_{i=1}^n (y_i-\mu_0)^2\right)} \] \[ = \frac{(\sum_{1}^{n}(y_i-\bar y)^2)^{-n/2}} {(\sum_{1}^{n}(y_i-\mu_0)^2)^{-n/2}} \exp\!\left( -\frac{(\sum_{1}^{n}(y_i-\bar y)^2)} {2\,\sum_{1}^{n}(y_i-\bar y)^2/n} + \frac{\sum_{1}^{n}(y_i-\mu_0)^2} {2\,\sum_{1}^{n}(y_i-\mu_0)^2/n} \right) = \left( \frac{\sum_{1}^{n}(y_i-\mu_0)^2} {\sum_{1}^{n}(y_i-\bar y)^2} \right)^{n/2} \exp\!\left( -\frac{n}{2}+\frac{n}{2} \right) \]

\[ = \left( \frac{\sum_{1}^{n}(y_i-\bar y+\bar y-\mu_0)^2} {\sum_{1}^{n}(y_i-\bar y)^2} \right)^{n/2} = \left( \frac{\sum_{1}^{n}(y_i-\bar y)^2 +2(\bar y-\mu_0)\underbrace{\sum_{1}^{n}(y_i-\bar y)}_{0} +n(\bar y-\mu_0)^2} {\sum_{1}^{n}(y_i-\bar y)^2} \right)^{n/2} \]

\[ = \left( 1+\frac{n(\bar y-\mu_0)^2} {\sum_{1}^{n}(y_i-\bar y)^2} \right)^{n/2} \;>\; c \ \text{when}\ \frac{n(\bar y-\mu_0)^2} {\sum_{1}^{n}(y_i-\bar y)^2} \;>\; c' \]

GLRT test statistic

Reject when:

\[ \frac{n(\bar y-\mu_0)^2}{\sum_{i=1}^n (y_i-\bar y)^2} >c' \]

is equivalent to reject when:

\[ \frac{\sqrt{n}(\bar y-\mu_0)}{\sqrt{\sum_{i=1}^n (y_i-\bar y)^2}} >c'' \quad\text{OR}\quad \frac{\sqrt{n}(\bar y-\mu_0)}{\sqrt{\sum_{i=1}^n (y_i-\bar y)^2}}< c'' \]

is equivalent to reject when:

\[ \frac{\sqrt{n}(\bar y-\mu_0)}{\sqrt{\frac{\sum_{i=1}^n (y_i-\bar y)^2}{n-1}}} >c''' \quad\text{OR}\quad \frac{\sqrt{n}(\bar y-\mu_0)}{\sqrt{\frac{\sum_{i=1}^n (y_i-\bar y)^2}{n-1}}} <c''' \]

GLRT test statistic

But note, under \(H_0\):

\[ \frac{\sqrt{n}(\bar y-\mu_0)}{\sqrt{\frac{\sum_{i=1}^n (y_i-\bar y)^2}{n-1}}} = \frac{\frac{\bar y-\mu_0}{\sigma/\sqrt{n}}}{ \sqrt{\frac{(n-1)S^2}{\sigma^2(n-1)}}} = \frac{N(0,1)}{\sqrt{\chi^2_{n-1}/(n-1)}} \sim t_{n-1}! \]

\(\Rightarrow\) the GLRT test statistic has an exact \(t_{n-1}\) distribution when testing two-sided alternatives for \(\mu\) in the presence of unknown \(\sigma^2\), if the data come from a normal distribution.

Simple logistic regression

  • Let \(Y_i \mid x_i \sim BERN(p_i),\) with:

\[logit(p_i)=\beta_0+\beta_1 x_i\]

  • Assume binary \(x_i\) such that:
\(y_i=1\) \(y_i=0\)
\(x_i=1\) \(a\) \(b\)
\(x_i=0\) \(c\) \(d\)
  • Want to test:

\[H_0:\beta_1=0\] \[H_a:\beta_1\neq 0\]

Simple logistic regression

\[ logit( p_i)=\ln \left(\frac{p_i}{1-p_i}\right)=\beta_0+\beta_1 x_i \]

\[ \Rightarrow p_i=\mbox{expit}(\beta_0+\beta_1 x_i) =\frac{\exp(\beta_0+\beta_1 x_i)}{1+\exp(\beta_0+\beta_1 x_i)} \]

\[ \Rightarrow 1-p_i =1-\mbox{expit}(\beta_0+\beta_1 x_i) =\frac{1}{1+\exp(\beta_0+\beta_1 x_i)} \]

General likelihood

Likelihood, in general, with \(\Theta=\{\beta_0,\beta_1\}\):

\[ L(\Theta) =\prod_{i=1}^n \mbox{expit}(\beta_0+\beta_1 x_i)^{y_i} \left(1-\mbox{expit}(\beta_0+\beta_1 x_i)\right)^{1-y_i} \]

\[ = \mbox{expit}(\beta_0)^{c} \mbox{expit}(\beta_0+\beta_1)^{a} \left(1-\mbox{expit}(\beta_0)\right)^{d} \left(1-\mbox{expit}(\beta_0+\beta_1)\right)^{b} \]

\(\mathcal{y}_i = 1\) \(\mathcal{y}_i = 0\)
\(x_i=1\) \(a\) \(b\)
\(x_i=0\) \(c\) \(d\)

Restricted MLE

Under \(H_0\), \(\beta_1=0\) and the restricted likelihood becomes:

\[ L(\beta_0,0) = \mbox{expit}(\beta_0)^{c} \mbox{expit}(\beta_0)^{a} \left(1-\mbox{expit}(\beta_0)\right)^{d} \left(1-\mbox{expit}(\beta_0)\right)^{b} \]

\[ = \mbox{expit}(\beta_0)^{c+a} \left(1-\mbox{expit}(\beta_0)\right)^{d+b} \]

\[ \ln L(\beta_0,0) = (c+a)\ln\left(\frac{e^{\beta_0}}{1+e^{\beta_0}}\right) + (b+d)\ln\left(\frac{1}{1+e^{\beta_0}}\right) \]

\[ = (c+a)(\beta_0-\ln(1+e^{\beta_0})) - (b+d)\ln(1+e^{\beta_0}) \]

Restricted MLE

\[ \ln L(\beta_0,0) = (c+a)(\beta_0-\ln(1+e^{\beta_0})) - (b+d)\ln(1+e^{\beta_0}) \] \[ \frac{d}{d \beta_0}\ln L(\beta_0,0) =(c+a)\left(1-\frac{e^{\beta_0}}{1+e^{\beta_0}}\right) - (b+d)\frac{e^{\beta_0}}{1+e^{\beta_0}} \stackrel{set}{=}0 \]

\[ \Rightarrow c+a = (c+a+b+d)\mbox{expit}(\beta_0) \]

\[ \Rightarrow \hat\beta_{0,\text{restricted}} = logit\left( \frac{c+a}{c+a+b+d} \right) \]

= overall log-odds of \(y_i=1\)

\(\mathcal{y}_i = 1\) \(\mathcal{y}_i = 0\)
\(x_i=1\) \(a\) \(b\)
\(x_i=0\) \(c\) \(d\)

Unrestricted MLEs

To find the unrestricted MLEs:

\[ L(\beta_0,\beta_1) = \mbox{expit}(\beta_0)^{c} \mbox{expit}(\beta_0+\beta_1)^{a} \left(1-\mbox{expit}(\beta_0)\right)^{d} \left(1-\mbox{expit}(\beta_0+\beta_1)\right)^{b} \] \[ \ln L(\beta_0,\beta_1) = c \ln\!\left( \frac{e^{\beta_0}}{1+e^{\beta_0}} \right) + a \ln\!\left( \frac{e^{\beta_0+\beta_1}}{1+e^{\beta_0+\beta_1}} \right) + d \ln\!\left( \frac{1}{1+e^{\beta_0}} \right) + b \ln\!\left( \frac{1}{1+e^{\beta_0+\beta_1}} \right) \]

\[ = c\left(\beta_0-\ln(1+e^{\beta_0})\right) + a\left(\beta_0+\beta_1-\ln(1+e^{\beta_0+\beta_1})\right) - d\ln(1+e^{\beta_0}) - b\ln(1+e^{\beta_0+\beta_1}) \]

Unrestricted MLEs

\[ \frac{\partial}{\partial \beta_0}\ln(L(\beta_0,\beta_1)) = c\left( 1-\frac{e^{\beta_0}}{1+e^{\beta_0}} \right) + a\left( 1-\frac{e^{\beta_0+\beta_1}}{1+e^{\beta_0+\beta_1}} \right) - d\frac{e^{\beta_0}}{1+e^{\beta_0}} - b\frac{e^{\beta_0+\beta_1}}{1+e^{\beta_0+\beta_1}} \]

\[ = c+a-(c+d)\mbox{expit}(\beta_0)-(a+b)\mbox{expit}(\beta_0+\beta_1) \stackrel{set}{=}0 \]

\[ \frac{\partial}{\partial \beta_1}\ln(L(\beta_0,\beta_1)) = a\left( 1-\frac{e^{\beta_0+\beta_1}}{1+e^{\beta_0+\beta_1}} \right) - b\frac{e^{\beta_0+\beta_1}}{1+e^{\beta_0+\beta_1}} \]

\[ = a-(a+b)\mbox{expit}(\beta_0+\beta_1) \stackrel{set}{=}0 \;\Rightarrow\; \frac{a}{a+b} = \mbox{expit}(\beta_0+\beta_1) \]

\[ \Rightarrow c+a-(c+d)\mbox{expit}(\beta_0)-(a+b)\cdot\frac{a}{a+b} =0 \;\Rightarrow\; c=(c+d)\mbox{expit}(\beta_0) \;\Rightarrow\; \hat\beta_0=logit\!\left(\frac{c}{c+d}\right) \]

\[ \Rightarrow logit\!\left(\frac{a}{a+b}\right) = logit\!\left(\frac{c}{c+d}\right) + \beta_1 \;\Rightarrow\; \hat\beta_1 = logit\!\left(\frac{a}{a+b}\right) - logit\!\left(\frac{c}{c+d}\right) \]

Unrestricted MLEs

\(\mathcal{y}_i = 1\) \(\mathcal{y}_i = 0\)
\(x_i=1\) \(a\) \(b\)
\(x_i=0\) \(c\) \(d\)
  • \(\hat\beta_0=logit\left(\frac{c}{c+d}\right)\); log-odds of \(Y=1\) when \(X=0\)
  • \(\hat\beta_1=logit\left(\frac{a}{a+b}\right)-logit\left(\frac{c}{c+d}\right)\);difference in log-odds (aka odds ratio), for a “one-unit increase in \(X\)”, i.e., comparing \(X=0\) to \(X=1\)

Example: smoking and low birth weight

LBW (\(y_i=1\)) Normal (\(y_i=0\))
Smoking mom (\(x_i=1\)) 21 154
Non-smoking mom (\(x_i=0\)) 106 2219

\[ \hat{\beta}_{0,\text{restricted}} = logit\!\left( \frac{21+106}{21+106+154+2219} \right) = -2.928 \]

\[ \hat{\beta}_0 = logit\!\left( \frac{106}{106+2219} \right) = -3.0414 \]

\[ \hat{\beta}_1 = logit\!\left( \frac{21}{21+154} \right) - logit\!\left( \frac{106}{106+2219} \right)=1.0489 \]

Test statistic

\[ \Lambda = \frac{ \mbox{expit}(\hat{\beta}_{0,\text{restricted}})^{21+106} \left(1-\mbox{expit}(\hat{\beta}_{0,\text{restricted}})\right)^{154+2219} }{ \mbox{expit}(\hat{\beta}_0)^{106} \mbox{expit}(\hat{\beta}_0+\hat{\beta}_1)^{21} \left(1-\mbox{expit}(\hat{\beta}_0)\right)^{2219} \left(1-\mbox{expit}(\hat{\beta}_0+\hat{\beta}_1)\right)^{154} } \]

\[ -2\ln(\Lambda) = 14.14088 \]

  • Under \(H_0\), 1 parameter can vary
  • Under \(H_a\), 2 parameters can vary

\[\implies -2 \ln(\Lambda) \rightarrow_d \chi^2_{2-1}\]

Test-statistic and p-value

  • R:
logit <- \(p) log(p/(1-p))
expit <- \(z) exp(z)/(1+exp(z))

b0_r <- logit((21+106)/2500)
b0 <- logit(106/(106+2219))
b1 <- logit(21/(21+154))-logit(106/(106+2219))


(Lam <- expit(b0_r)^(21+106)*(1-expit(b0_r))^(154+2219) / 
    (expit(b0)^106 * expit(b0+b1)^21 * (1-expit(b0))^2219 * (1-expit(b0+b1))^154))
[1] 0.0008498576
-2*log(Lam)
[1] 14.14088
1-pchisq(-2*log(Lam), df = 2-1)
[1] 0.0001696171

Test-statistic and p-value

  • JMP: