\(1\). Consider Example \(2\) of Lecture \(3\) (page \(6\)). Suppose that \(n\) Hershey’s Kisses were tossed and each outcome was recorded (as opposed to simply recording the total number of Kisses that landed on their base).

  1. Write down the likelihood function, where the data consist of observations on the individual outcomes, denoted \(y_1, \dots, y_n\). State the assumptions that you have made in the construction of the likelihood function, and whether these assumptions are likely to be satisfied.

         Based on the information provided, \(y_1, \dots, y_n\) can be modeled as Bernoulli random variables, and the assumptions of the Bernoulli model are:

  1. Each trial has two possible outcomes (the Kiss lands on its base or it does not);
  2. The trials are independent (whether one Kiss lands on its base has no influence on any other Kiss);
  3. On each trial, the probability of success (landing on the base) is \(\pi\) and the probability of failure (not landing on the base) is \(1-\pi\), where \(\pi \in [0,1]\) is the same for every trial.

         These assumptions are plausible as long as all the Kisses are tossed in the same way onto the same surface and the tosses do not interfere with one another.

         Then the likelihood function is:

\[L(\mathbf{y}; \pi) = \left\{ \begin{array}{ll} \prod^n_{i=1} \pi^{y_i} \left(1-\pi \right)^{1-y_i} = \pi^{\sum^n_{i=1} y_i} \left( 1-\pi\right)^{n-\sum^n_{i=1}y_i}, \hspace{0.5cm} &\text{$y_i=1$ or $0$; $0<\pi<1$}\\ 0, \hspace{0.5cm} &\text{otherwise} \\ \end{array} \right.\]

\[y_i = \left\{ \begin{array}{ll} 1, &\text{the Kiss lands on its base} \\ 0, &\text{otherwise} \end{array} \right.\]
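
         As an illustration only (the individual tosses are not reported in the problem), the log-likelihood can be evaluated and maximized numerically in R; the vector y below is hypothetical:

# hypothetical individual outcomes, for illustration only #
y <- c(1, 0, 1, 1, 0, 1, 0, 1, 1, 1)

# Bernoulli log-likelihood as a function of pi #
bern_loglik <- function(pi, y) sum(y)*log(pi) + (length(y) - sum(y))*log(1 - pi)

# the maximizer should equal the sample proportion mean(y) #
optimize(bern_loglik, interval = c(0.001, 0.999), y = y, maximum = TRUE)$maximum
mean(y)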

  1. Calculate the Fisher information in a single observation.

         Since \(y_1, \dots , y_n\) are independent and identically distributed as Bernoulli, \[i_n(\theta) = nE\left[ -\frac{\partial^2}{\partial \theta^2} \log p_\theta \left( Y_1 \right) \right]\]

         As a result, the Fisher information in a single observation is: \[i_1(\theta) = E\left[ -\frac{\partial^2}{\partial \theta^2} \log p_\theta \left( Y_1 \right) \right]\]

\[\begin{equation} \begin{split} \text{For a single observation ($n=1$):} \hspace{0.5cm} \frac{\partial}{\partial \pi} \log p_\pi \left(y \right) &= \frac{ y}{\pi}- \frac{1-y}{1-\pi} \\ \frac{\partial^2}{\partial \pi^2} \log p_\pi \left(y \right) &= - \left[\frac{ y }{\pi^2}+\frac{1-y}{\left( 1-\pi \right)^2} \right] \\ i_1\left( \pi \right) &= E\left[ \frac{y }{\pi^2} + \frac{1-y}{\left( 1-\pi \right)^2} \right] \\ &= \frac{1}{\pi^2}E\left(y\right) + \frac{1}{\left(1-\pi\right)^2}-\frac{1}{\left(1-\pi\right)^2}E\left(y\right) \\ &= \frac{\pi}{\pi^2} + \frac{1}{\left( 1-\pi\right)^2} - \frac{\pi}{\left(1-\pi\right)^2} \\ &= \frac{1}{\pi} + \frac{1}{\left( 1-\pi\right)^2} - \frac{\pi}{\left(1-\pi\right)^2} \\ &= \frac{\left(1-\pi\right)^2+\pi-\pi^2}{\pi\left(1-\pi\right)^2} \\ &= \frac{1-2\pi+\pi^2+\pi-\pi^2}{\pi\left(1-\pi\right)^2} \\ &= \frac{1-\pi}{\pi\left(1-\pi\right)^2} \\ &= \frac{1}{\pi \left(1-\pi\right)} \\ \end{split} \end{equation}\]
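
         The closed form \(i_1(\pi) = 1/\{\pi(1-\pi)\}\) can be checked numerically; the sketch below compares it with a Monte Carlo estimate of \(E\left[-\partial^2 \log p_\pi(Y)/\partial \pi^2\right]\) at an arbitrary value of \(\pi\) chosen for illustration:

# numerical check of i_1(pi) = 1/(pi*(1-pi)) at an arbitrary pi #
set.seed(1)
pi_true <- 0.3
y_sim <- rbinom(1e5, size = 1, prob = pi_true)

# Monte Carlo estimate of E[ -d^2/dpi^2 log p_pi(Y) ] #
mean(y_sim / pi_true^2 + (1 - y_sim) / (1 - pi_true)^2)

# analytic value #
1 / (pi_true * (1 - pi_true))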

  1. Does modeling individual outcomes result in a different total Fisher information from modeling the total number of Kisses that landed on their base? If not, give an explanation (not proof) of why this is so (without using the concept of sufficiency if you have taken MATH \(164\)).

\(2\). [Adapted from problem \(1.10\) of Agresti, second edition] A sample of \(100\) women suffering from dysmenorrhea was taken. A new analgesic is claimed to provide greater relief than a standard one. After using each analgesic in a crossover experiment, \(40\) reported greater relief with the standard analgesic and \(60\) reported greater relief with the new one.

  1. Of interest is the proportion of women who experience greater relief using the new analgesic than the standard analgesic, denoted \(\pi\). Construct the likelihood function, clearly defining all the quantities that appear in your likelihood function. Also, state any assumptions that you have made and whether these assumptions are likely to be satisfied.

\[L(\pi) = \left\{ \begin{array}{ll} \pi^{60} \left(1-\pi \right)^{40}, \hspace{1cm} &\text{$0<\pi<1$} \\ 0, \hspace{1cm} &\text{otherwise.} \\ \end{array} \right.\]
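
A quick numerical check (for illustration only) that this likelihood is maximized at \(\hat{\pi} = 60/100 = 0.6\), using optimize() on \(\log L(\pi)\):

# log of L(pi) for 60 of 100 women preferring the new analgesic #
loglik_2 <- function(pi) 60*log(pi) + 40*log(1 - pi)

# numerical maximizer; it should equal 60/100 = 0.6 #
optimize(loglik_2, interval = c(0.001, 0.999), maximum = TRUE)$maximum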

  1. State the hypotheses for testing whether the new analgesic provides greater relief than the standard analgesic. Which of the three tests would you select to test the hypotheses? Explain.

\[\mathbb{H_0}: \pi=0.5 \hspace{0.5cm} \text{versus} \hspace{0.5cm} \mathbb{H_a}: \pi>0.5\]

  1. Test the hypotheses at the \(5\%\) level using (i) the Wald test, (ii) the likelihood-ratio test, and (iii) the score test. You have to show the steps in your calculations. Do not use any R packages to calculate the test statistics.
  1. The Wald test: \[\begin{equation} \begin{split} \hat{\pi} &= \frac{60}{100} = \frac{3}{5} \\ \hat{se}\left(\hat{\pi}\right) &= \sqrt{\frac{3/5 \times (1-3/5)}{100}} = 0.049 \\ \text{Wald statistic:} \hspace{.5cm} T_w &= \frac{0.6 - 0.5}{0.049} \approx 2.041 \\ \text{P-value} &= Pr(T_w > 2.041) \approx 0.021 \\ \end{split} \end{equation}\]
# set up variables #
x <- 60
n <- 100
pi_0 <- 0.5

# pi_hat #
pi_hat <- x/n
pi_hat
## [1] 0.6
se_hat_pi_hat <- sqrt(pi_hat*(1-pi_hat))/sqrt(n)
se_hat_pi_hat
## [1] 0.04898979
# test statistics #
tw <- (pi_hat - pi_0) / se_hat_pi_hat
tw
## [1] 2.041241
# p-value #
pnorm(tw, lower.tail=F)
## [1] 0.02061342
  1. The likelihood-ratio test: \[\begin{equation} \begin{split} \text{Test statistic:} \hspace{0.5cm} T_{LR} &= -2\left[\log L(0.5) - \log L(0.6) \right] \approx 4.03 \\ \text{P-value} &= Pr(T_{LR} > 4.03) \approx 0.045 \\ \end{split} \end{equation}\]
# test statistic #
tlr <- -2*(log(pi_0^x*(1-pi_0)^(n-x))-log(pi_hat^x*(1-pi_hat)^(n-x)))
tlr
## [1] 4.027103
sqrt(tlr)
## [1] 2.006764
# p-value #
pchisq(tlr, df=1, lower.tail=F)
## [1] 0.04477477
  1. The score test: \[\begin{equation} \begin{split} u(0.5) &= \frac{60}{0.5} - \frac{40}{1-0.5} = 40 \\ i(0.5) &= \frac{100}{0.5 \times \left(1-0.5 \right)} = 400 \\ T_s &= \frac{40}{\sqrt{400}} = 2 \\ \text{P-value} &= Pr(T_s > 2) \approx 0.023 \end{split} \end{equation}\]
# u(pi) #
upi <- x/pi_0 - (n-x)/(1-pi_0)
upi
## [1] 40
# i(pi) #
ipi <- n / (pi_0*(1-pi_0))
ipi
## [1] 400
# ts #
ts <- upi / sqrt(ipi) 
ts
## [1] 2
# p-value #
pnorm(ts, lower.tail=F)
## [1] 0.02275013
  1. Construct a \(95\%\) confidence interval for \(\pi\) using (i) the Wald confidence interval, (ii) the likelihood-ratio confidence interval, and (iii) the Wilson confidence interval.
  1. The Wald CI: \[\begin{equation} \begin{split} 0 &< \pi < \hat{\pi} + \frac{Z_{0.05}}{\sqrt{i_n\left(\hat{\pi} \right)}} \\ 0 &< \pi < 0.6 + \frac{1.645}{\sqrt{ 100 / (0.6 \times (1-0.6))}} \\ 0 &< \pi < 0.681 \\ \end{split} \end{equation}\]
wup <- pi_hat+qnorm(0.05,lower.tail=F) / (sqrt(n/(pi_hat*(1-pi_hat))))
wup
## [1] 0.680581
  1. The LR CI (the endpoints are found numerically; see the uniroot() sketch below): \[\begin{equation} \begin{split} -2\left[ 60\log\pi + 40\log\left(1-\pi \right) - 60\log 0.6 - 40\log 0.4 \right] &< 3.841\\ 60\log\pi + 40\log\left(1-\pi \right) - 60\log 0.6 - 40\log 0.4 &> -\frac{3.841}{2} \\ 60\log\pi + 40\log\left(1-\pi \right) + 30.650 + 36.652 &> -1.921 \\ 0.502 < &\pi < 0.693 \\ \implies \text{the one-sided CI:} \hspace{1cm} 0<&\pi<0.693 \\ \end{split} \end{equation}\]
qchisq(.05, df=1, lower.tail=F)
## [1] 3.841459
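
The endpoints \(0.502\) and \(0.693\) reported above solve \(-2\left[l(\pi)-l(\hat{\pi})\right]=3.841\); a minimal sketch locating them with uniroot(), reusing x, n, and pi_hat from the chunks above (the name lr_gap is introduced here for illustration only):

# difference between the LR statistic at pi and the 5% chi-square cutoff #
lr_gap <- function(pi) {
  -2*(x*log(pi) + (n - x)*log(1 - pi) - x*log(pi_hat) - (n - x)*log(1 - pi_hat)) -
    qchisq(0.05, df = 1, lower.tail = FALSE)
}

# sign changes on either side of pi_hat give the interval endpoints #
uniroot(lr_gap, interval = c(0.01, pi_hat))$root   # lower endpoint, approx 0.502 #
uniroot(lr_gap, interval = c(pi_hat, 0.99))$root   # upper endpoint, approx 0.693 #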
  1. The Wilson CI: \[\begin{equation} \begin{split} 0 &< \pi < \tilde{\pi} + \frac{Z_{0.05} \sqrt{100}}{100+Z^2_{0.05}} \sqrt{\hat{\pi}\left(1-\hat{\pi} \right) + \frac{Z^2_{0.05}}{4n}} \\ 0 &< \pi < \frac{60 + 0.5 Z^2_{0.05}}{100 + Z^2_{0.05}} + \frac{Z_{0.05} \sqrt{100}}{100+Z^2_{0.05}} \sqrt{0.6 \left(1-0.6 \right) + \frac{Z^2_{0.05}}{4 \times 100}} \\ 0 &< \pi < 0.677 \\ \end{split} \end{equation}\]
z <- qnorm(0.05,lower.tail=F)
z
## [1] 1.644854
pi_t <- (x+0.5*z^2) / (n + z^2) 
pi_t
## [1] 0.5973657
sup <- pi_t + (z*sqrt(n) / (n+z^2)) * sqrt(pi_hat * (1-pi_hat) + z^2 / (n*4))  
sup
## [1] 0.6769219
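
The object tbt passed to knitr::kable() below is not constructed in the chunks above. The data frame here is one possible way to assemble it from quantities already computed; the exact construction used originally is not shown, so this layout is an assumption (the LR upper bound 0.693 is taken from the likelihood-ratio interval above):

# summary table of the three tests; bounds taken from the intervals above #
tbt <- data.frame(
  Test          = c("Wald Test", "LR Test", "Score Test"),
  Statistic     = c(tw, tlr, ts),
  "P-value"     = c(pnorm(tw, lower.tail = FALSE),
                    pchisq(tlr, df = 1, lower.tail = FALSE),
                    pnorm(ts, lower.tail = FALSE)),
  "Lower Bound" = c(0, 0, 0),
  "Upper Bound" = c(wup, 0.693, sup),   # 0.693 from the LR interval above #
  check.names   = FALSE
)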
knitr::kable(tbt, col.names=colnames(tbt), digits = 3, align="c")

| Test | Statistic | P-value | Lower Bound | Upper Bound |
|:----------:|:-----:|:-----:|:---:|:-----:|
| Wald Test  | 2.041 | 0.021 | 0 | 0.681 |
| LR Test    | 4.027 | 0.045 | 0 | 0.693 |
| Score Test | 2.000 | 0.023 | 0 | 0.677 |

  1. State your conclusions and any caveats.

\(3\). [Adapted from problem \(1.17\) of Agresti, second edition] Assume that \(y_1, \dots , y_n\) are independently drawn from a Poisson(\(\mu\)) distribution.

  1. Construct the likelihood function and find the maximum likelihood estimator of \(\mu\).

\[\begin{equation} \begin{split} \text{For the Poisson model, the likelihood function is} \\ L(\mu; y) &=\prod^n_{i=1} \frac{\mu^{y_i} e^{-\mu}}{y_i !}\\ &= \frac{\mu^{\sum^n_{i=1}y_i}e^{-n\mu}}{y_1 ! y_2 ! \dots y_n !}\\ \text{and the log-likelihood (up to an additive constant) is} \\ l\left(\mu ; y \right) &= \sum^n_{i=1} y_i \log \mu - n\mu \\ \text{Setting the score to zero gives the maximum likelihood estimator:} \\ \frac{\partial}{\partial \mu} l\left(\mu ; y \right) &= \frac{\sum^n_{i=1} y_i}{\mu} - n = 0 \\ \hat{\mu} &= \frac{\sum^n_{i=1} y_i}{n} = \bar{y} \\ \end{split} \end{equation}\]
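
A quick numerical check of \(\hat{\mu} = \bar{y}\) using simulated Poisson data; the sample size and rate below are arbitrary and only for illustration:

# simulate Poisson data and maximize the log-likelihood numerically #
set.seed(2)
y_sim <- rpois(200, lambda = 4)   # arbitrary sample, for illustration only #

pois_loglik <- function(mu) sum(y_sim)*log(mu) - length(y_sim)*mu

# the maximizer should agree with the sample mean #
optimize(pois_loglik, interval = c(0.01, 20), maximum = TRUE)$maximum
mean(y_sim)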

  1. Construct and compare the Wald, score, and likelihood-ratio statistics for testing

\[\mathbb{H_0}: \mu=\mu_0 \hspace{0.5cm} \text{versus} \hspace{0.5cm} \mathbb{H_1}: \mu \ne \mu_0\]

  1. The Wald Test: \[\begin{equation} \begin{split} \hat{\mu} &= \frac{\sum^n_{i=1} y_i}{n} = \bar{y}\\ u(\hat{\mu}) &= \frac{\sum^n_{i=1} y_i}{\hat{\mu}} - n = 0 \\ i(\mu) &= -E\left[-\frac{\sum^n_{i=1} y_i}{\mu^2} \right] = \frac{1}{\mu^2}\, nE\left[y_i \right] = \frac{n\mu}{\mu^2} = \frac{n}{\mu} \\ i(\hat{\mu}) &= \frac{n}{\hat{\mu}} = \frac{n}{\bar{y}} \\ \sqrt{\frac{1}{i\left(\hat{\mu}\right)}} &= \sqrt{\frac{\hat{\mu}}{n}} = \sqrt{\frac{\bar{y}}{n}} \\ \text{Wald statistic:} \hspace{0.5cm} T_w &= \frac{\bar{y} - \mu_0} {\sqrt{\bar{y}/n} } \\ \end{split} \end{equation}\]

  2. Score Test: \[\begin{equation} \begin{split} u(\mu_0) &= \frac{\sum^n_{i=1} y_i}{\mu_0} - n \\ i(\mu_0) &= -E\left[-\frac{\sum^n_{i=1} y_i}{\mu_0^2} \right] \\ &=\frac{1}{\mu_0^2} nE\left[y_i \right] \\ &= \frac{1}{\mu_0^2} n \mu \\ &= \frac{n}{\mu_0} \\ T_s &= \frac{\frac{\sum^n_{i=1} y_i}{\mu_0} - n }{\sqrt{\frac{n}{\mu_0}}} \\ &= \frac{\frac{n\bar{y}}{\mu_0}-n}{\sqrt{\frac{n}{\mu_0}}} \\ &= \frac{\frac{n^2\bar{y}-n^2\mu_0}{n\mu_0}}{\sqrt{\frac{n}{\mu_0}}} \\ &= \frac{n\times\left(\bar{y}-\mu_0\right)}{\mu_0} \times \frac{\sqrt{\mu_0}}{\sqrt{n}} \\ &= \frac{\bar{y} - \mu_0} {\sqrt{\frac{\mu_0}{n}} } \\ \end{split} \end{equation}\]

  3. Likelihood Ratio Test: \[\begin{equation} \begin{split} l\left(\mu_0 \right) &= \sum^n_{i=1} y_i \log \mu_0 - n \mu_0 \\ l\left(\hat{\mu}\right) &= \sum^n_{i=1} y_i \log \left(\frac{\sum^n_{i=1} y_i }{n} \right) - \sum^n_{i=1} y_i \\ T_{LR} &= -2\left[ l\left(\mu_0\right) - l\left(\hat{\mu}\right) \right] \\ &= -2 \left[ \left(\sum^n_{i=1} y_i \log \mu_0 - n \mu_0\right) - \left( \sum^n_{i=1} y_i \log \left(\frac{\sum^n_{i=1} y_i }{n} \right) - \sum^n_{i=1} y_i\right)\right] \\ &= -2\left[n\bar{y}\log\mu_0-n\mu_0-n\bar{y}\log\bar{y}+n\bar{y} \right] \end{split} \end{equation}\]

         Comparison: the Wald and score statistics have the same form \((\bar{y}-\mu_0)/\widehat{\text{se}}\) and differ only in whether the variance \(\mu/n\) of \(\bar{y}\) is estimated by \(\bar{y}/n\) (Wald) or evaluated at the null value \(\mu_0/n\) (score); the likelihood-ratio statistic instead compares the log-likelihood at \(\hat{\mu}=\bar{y}\) with its value at \(\mu_0\). Under \(H_0\) all three are asymptotically equivalent.
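
         The three statistics can be evaluated side by side; the sketch below uses simulated Poisson data and an arbitrary null value \(\mu_0 = 4\) (both are assumptions for illustration only):

# Wald, score, and likelihood-ratio statistics for H0: mu = mu_0 in the Poisson model #
set.seed(3)
y_sim <- rpois(200, lambda = 4)   # simulated data, for illustration only #
mu_0  <- 4                        # arbitrary null value #
ybar  <- mean(y_sim)
n_y   <- length(y_sim)

t_wald  <- (ybar - mu_0) / sqrt(ybar / n_y)
t_score <- (ybar - mu_0) / sqrt(mu_0 / n_y)
t_lr    <- -2 * (n_y*ybar*log(mu_0) - n_y*mu_0 - n_y*ybar*log(ybar) + n_y*ybar)

# t_wald and t_score are referred to N(0,1); t_lr to a chi-square with 1 df #
c(Wald = t_wald, Score = t_score, LR = t_lr)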


\(4\). According to Benford’s Law, the leading digit in lists of numbers from many naturally occurring data sets is distributed in a non-uniform manner. Specifically, if \(D_1\) denotes the leading digit of a number, then

\[P(D_1 = i) = \log_{10} \left(\frac{i+1}{i} \right), \hspace{1cm} i=1,\dots,9.\]

Does Benford’s law hold true for stock market data? In each of the following, clearly state your hypotheses (all notation should be defined), the test statistic used and its distribution under the null hypothesis, and your conclusion in the context of the problem.

  1. Consider the daily returns on the Standard and Poor’s (S&P) 500 Index for January 2, 1986, through December 29, 1995. There were 2524 trading days with nonzero returns during these 10 years. The observed distribution of leading digits for the absolute daily returns is as follows:

| Digit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|:-----:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| Count | 735 | 432 | 273 | 266 | 200 | 175 | 169 | 148 | 126 |


         Is there evidence of deviation from Benford’s Law?

\[H_0: P(D_1=i) = \log_{10}\left(\frac{i+1}{i}\right), \; i=1,\dots,9 \hspace{0.5cm} \text{versus} \hspace{0.5cm} H_a: \text{the leading digits do not follow Benford's Law}\]

\[\begin{equation} \begin{split} P(D_1=1) &= \log_{10}2 = 0.301 \hspace{1cm} &E_1 = 0.301 \times 2524 = 759.80 \\ P(D_1=2) &= 0.176 \hspace{1cm} &E_2 = 444.45 \\ P(D_1=3) &= 0.125 \hspace{1cm} &E_3 = 315.34 \\ P(D_1=4) &= 0.097 \hspace{1cm} &E_4 = 244.60 \\ P(D_1=5) &= 0.079 \hspace{1cm} &E_5 = 199.85 \\ P(D_1=6) &= 0.067 \hspace{1cm} &E_6 = 168.97 \\ P(D_1=7) &= 0.058 \hspace{1cm} &E_7 = 146.37 \\ P(D_1=8) &= 0.051 \hspace{1cm} &E_8 = 129.11 \\ P(D_1=9) &= 0.046 \hspace{1cm} &E_9 = 115.49 \\ \text{$\chi^2$ test statistic:} \hspace{1cm} \chi^2 &= \sum^9_{i=1} \frac{\left(O_i-E_i \right)^2}{E_i} \hspace{0.5cm} \text{($O_i$ = observed count, $E_i = 2524\,P(D_1=i)$)} \\ &= \text{calculated "by hand" in R shown below} \\ &= 16.15, \hspace{0.5cm} \text{df} = 9-1 = 8 \\ \text{P-value} &\approx 0.04 \\ \end{split} \end{equation}\]

         Since the P-value \(\approx 0.04 < 0.05\), there is evidence at the \(5\%\) level that the leading digits of the absolute daily returns deviate from Benford's Law.

# probability and expected nonzero returns for i=1 to 9 #
pd1 <- function(x) {
  out_mat <- matrix(,ncol=2, nrow=9)
  for (i in 1:9){
    out_mat[i,1] <- log10((x[i]+1)/x[i])
    out_mat[i,2] <- out_mat[i,1]*2524
  }
  obs <- matrix(c(735, 432, 273, 266, 200, 175, 169, 148, 126), ncol=1)
  chi2 <- (obs-out_mat[,2])^2 / out_mat[,2]
  out_mat <- cbind(x,out_mat,obs, chi2)
  colnames(out_mat) <- c("Digit","Probability", "Expected Return Days", "Observation Return Days","X Square")
  return(out_mat)
}
x <- c(1,2,3,4,5,6,7,8,9)

knitr:: kable(pd1(x), digits=3, align="c")

| Digit | Probability | Expected Return Days | Observation Return Days | X Square |
|:-----:|:-----------:|:--------------------:|:-----------------------:|:--------:|
| 1 | 0.301 | 759.800 | 735 | 0.809 |
| 2 | 0.176 | 444.454 | 432 | 0.349 |
| 3 | 0.125 | 315.345 | 273 | 5.686 |
| 4 | 0.097 | 244.601 | 266 | 1.872 |
| 5 | 0.079 | 199.853 | 200 | 0.000 |
| 6 | 0.067 | 168.974 | 175 | 0.215 |
| 7 | 0.058 | 146.372 | 169 | 3.498 |
| 8 | 0.051 | 129.109 | 148 | 2.764 |
| 9 | 0.046 | 115.492 | 126 | 0.956 |

# chi square score #
sum(pd1(x)[,5])
## [1] 16.15026
# P-value #
pchisq(sum(pd1(x)[,5]), df=9-1, lower.tail=F)
## [1] 0.0402792
  1. The distribution of leading digits for the S&P returns, separated by the magnitude of the return, is as follows:

| Digit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|:-----:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| \(\lvert \text{Return} \rvert \ge 0.1\%\) | 698 | 387 | 232 | 228 | 162 | 136 | 124 | 108 | 92 |
| \(\lvert \text{Return} \rvert < 0.1\%\) | 37 | 45 | 41 | 38 | 39 | 39 | 45 | 40 | 34 |

         Does Benford’s law hold for days with absolute returns no less than 0.1%? Is there evidence to support the conjecture that the distribution of leading digits for very small returns is uniform?

# probability and expected nonzero returns for i=1 to 9 #
pd1 <- function(x) {
  out_mat <- matrix(,ncol=2, nrow=9)
  o <- c(698, 387, 232, 228, 162, 136, 124, 108, 92)
  for (i in 1:9){
    out_mat[i,1] <- log10((x[i]+1)/x[i])
    out_mat[i,2] <- out_mat[i,1]*sum(o)
  }
  obs <- matrix(o, ncol=1)
  chi2 <- (out_mat[,2]-obs)^2 / out_mat[,2]
  out_mat <- cbind(x, out_mat,obs, chi2)
  colnames(out_mat) <- c("digit","Probability", "Expected Return Days", "Observation Return Days","X Square")
  return(out_mat)
}
x <- c(1,2,3,4,5,6,7,8,9)


knitr:: kable(pd1(x), digits=3, align="c")

| digit | Probability | Expected Return Days | Observation Return Days | X Square |
|:-----:|:-----------:|:--------------------:|:-----------------------:|:--------:|
| 1 | 0.301 | 652.332 | 698 | 3.197 |
| 2 | 0.176 | 381.590 | 387 | 0.077 |
| 3 | 0.125 | 270.742 | 232 | 5.544 |
| 4 | 0.097 | 210.004 | 228 | 1.542 |
| 5 | 0.079 | 171.586 | 162 | 0.536 |
| 6 | 0.067 | 145.074 | 136 | 0.568 |
| 7 | 0.058 | 125.669 | 124 | 0.022 |
| 8 | 0.051 | 110.848 | 108 | 0.073 |
| 9 | 0.046 | 99.156 | 92 | 0.517 |

# chi square score #
sum(pd1(x)[,5])
## [1] 12.07466
# P-value #
pchisq(sum(pd1(x)[,5]), df=9-1, lower.tail=F)
## [1] 0.1479035

\[H_0: P(D_1=i) = \frac{1}{9}, \; i=1,\dots,9 \hspace{0.5cm} \text{versus} \hspace{0.5cm} H_a: \text{the leading digits of very small returns are not uniformly distributed}\]

\[\begin{equation} \begin{split} \text{Total count} &= 37+45+41+38+38+39+45+40+34 = 357 \\ E_1 &= \dots = E_9 = \frac{1}{9} \times 357 = 39.667 \\ \text{$\chi^2$ test statistic:} \hspace{1cm} \chi^2 &= \sum^9_{i=1} \frac{\left(O_i-E_i \right)^2}{E_i} \\ &= \text{calculated "by hand" in R shown below} \\ &= 2.622, \hspace{0.5cm} \text{df} = 9-1 = 8 \\ \text{P-value} &\approx 0.956 \\ \end{split} \end{equation}\]

         Since the P-value \(\approx 0.956\), there is no evidence against the conjecture that the leading digits on days with very small returns are uniformly distributed.

# probability and expected nonzero returns for i=1 to 9 #
pd1 <- function(x) {
  out_mat <- matrix(,ncol=2, nrow=9)
  o <- c(37, 45, 41, 38, 38, 39, 45, 40, 34)
  for (i in 1:9){
    out_mat[i,1] <- 1/length(x)
    out_mat[i,2] <- 1/9*sum(o)
  }
  obs <- matrix(o, ncol=1)
  chi2 <- (out_mat[,2]-obs)^2 / out_mat[,2]
  out_mat <- cbind(x, out_mat,obs, chi2)
  colnames(out_mat) <- c("digit","Probability", "Expected Return Days", "Observation Return Days","X Square")
  return(out_mat)
}
x <- c(1,2,3,4,5,6,7,8,9)


knitr:: kable(pd1(x), digits=3, align="c")

| digit | Probability | Expected Return Days | Observation Return Days | X Square |
|:-----:|:-----------:|:--------------------:|:-----------------------:|:--------:|
| 1 | 0.111 | 39.667 | 37 | 0.179 |
| 2 | 0.111 | 39.667 | 45 | 0.717 |
| 3 | 0.111 | 39.667 | 41 | 0.045 |
| 4 | 0.111 | 39.667 | 38 | 0.070 |
| 5 | 0.111 | 39.667 | 38 | 0.070 |
| 6 | 0.111 | 39.667 | 39 | 0.011 |
| 7 | 0.111 | 39.667 | 45 | 0.717 |
| 8 | 0.111 | 39.667 | 40 | 0.003 |
| 9 | 0.111 | 39.667 | 34 | 0.810 |

# chi square score #
sum(pd1(x)[,5])
## [1] 2.621849
# P-value #
pchisq(sum(pd1(x)[,5]), df=9-1, lower.tail=F)
## [1] 0.9558066

\[\begin{equation} \begin{split} H_0: &\text{ the leading digits on days with very small returns from 1986 to 1995 follow a uniform distribution} \\ H_a: &\text{ the leading-digit probabilities are linearly ordered} \end{split} \end{equation}\]

\[H_0: \left( \pi_1, \dots, \pi_9 \right) = \left(\frac{1}{9}, \dots, \frac{1}{9}\right)\]

\[H_a: \text{$\pi_j$'s are linearly ordered}\]

\[\begin{equation} \begin{split} \mu^0 &= \frac{1}{9} \times (1+\dots+9) = 5 \\ \mu^0_2 &= \frac{1}{9} \times \left[ (1-5)^2+\dots+(9-5)^2 \right]= 6.667 \\ X^2_1 &= \text{calculated "by hand" in R below} \\ &= 0.136 \\ \text{P-value} &\approx 0.712 \\ \end{split} \end{equation}\]

\[\begin{equation} \begin{split} X^2-X^2_1 &= 2.486 \\ \text{P-value} &\approx 0.928 \end{split} \end{equation}\]

mumu <- function(x, o){
  mu0 <- 1/9 * sum(x); 
  mu00 <- c()
  gsj <- c()
  ngsj <- c()
  for (i in 1:length(x)){
   mu00[i] <- (x[i]-mu0)^2
  }
  mu02 <- 1/9*sum(mu00)
  for (j in 1:length(x)){
    gsj[j] <- (x[j]-mu0) / sqrt(mu02)
    ngsj[j] <- o[j]*gsj[j]
  }
  return((1/sqrt(sum(o)) * sum(ngsj))^2)
}
x <- c(1,2,3,4,5,6,7,8,9)
o <- c(37, 45, 41, 38, 38, 39, 45, 40, 34)

# X^2_1 #
mumu(x, o)
## [1] 0.1361345
# P-value #
pchisq(mumu(x,o),df=1, lower.tail=F)
## [1] 0.7121545
#X^2 - X^2_1#
sum(pd1(x)[,5])-mumu(x,o)
## [1] 2.485714
#p-value#
pchisq(sum(pd1(x)[,5])-mumu(x,o), df=9-2, lower.tail=F)
## [1] 0.9281689