\(1\). Consider Example \(2\) of Lecture \(3\) (page \(6\)). Suppose that \(n\) Hershey’s Kisses were tossed and each outcome was recorded (as opposed to simply recording the total number of Kisses that landed on their base).

  1. Write down the likelihood function, where the data consist of observations on the individual outcomes, denoted \(y_1, \dots, y_n\). State the assumptions that you have made in the construction of the likelihood function, and whether these assumptions are likely to be satisfied.

         Based on the information provided, \(y_1, \dots, y_n\) can be modeled as Bernoulli random variables, and the assumptions of the Bernoulli model are:

  1. Each trial has two possible outcomes (the Kiss lands on its base or it does not);
  2. The trials are independent (whether one Kiss lands on its base has no influence on any other Kiss);
  3. On each trial, the probability of success (landing on the base) is \(\pi\) and the probability of failure (not landing on the base) is \(1-\pi\), where \(\pi \in [0,1]\) is the same for every trial.

         These assumptions are plausible as long as all the Kisses are tossed in the same way onto the same surface and the tosses do not interfere with one another.

         Then the likelihood function is:

\[L(\mathbf{y}; \pi) = \left\{ \begin{array}{ll} \prod^n_{i=1} \pi^{y_i} \left(1-\pi \right)^{1-y_i} = \pi^{\sum^n_{i=1} y_i} \left( 1-\pi\right)^{n-\sum^n_{i=1}y_i}, \hspace{0.5cm} &\text{$y_i=1$ or $0$; $0<\pi<1$}\\ 0, \hspace{0.5cm} &\text{otherwise} \\ \end{array} \right.\]

\[y_i = \left\{ \begin{array}{ll} 1, &\text{the Kiss lands on its base} \\ 0, &\text{otherwise} \end{array} \right.\]
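
         As an illustration only (the individual tosses are not reported in the problem), the log-likelihood can be evaluated and maximized numerically in R; the vector y below is hypothetical:

# hypothetical individual outcomes, for illustration only #
y <- c(1, 0, 1, 1, 0, 1, 0, 1, 1, 1)

# Bernoulli log-likelihood as a function of pi #
bern_loglik <- function(pi, y) sum(y)*log(pi) + (length(y) - sum(y))*log(1 - pi)

# the maximizer should equal the sample proportion mean(y) #
optimize(bern_loglik, interval = c(0.001, 0.999), y = y, maximum = TRUE)$maximum
mean(y)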

  1. Calculate the Fisher information in a single observation.

         Since \(y_1, \dots , y_n\) are independent and identically distributed as Bernoulli, \[i_n(\theta) = nE\left[ -\frac{\partial^2}{\partial \theta^2} \log p_\theta \left( Y_1 \right) \right]\]

         As a result, the Fisher information in a single observation is: \[i_1(\theta) = E\left[ -\frac{\partial^2}{\partial \theta^2} \log p_\theta \left( Y_1 \right) \right]\]

\[\begin{equation} \begin{split} \text{For a single observation ($n=1$):} \hspace{0.5cm} \frac{\partial}{\partial \pi} \log p_\pi \left(y \right) &= \frac{ y}{\pi}- \frac{1-y}{1-\pi} \\ \frac{\partial^2}{\partial \pi^2} \log p_\pi \left(y \right) &= - \left[\frac{ y }{\pi^2}+\frac{1-y}{\left( 1-\pi \right)^2} \right] \\ i_1\left( \pi \right) &= E\left[ \frac{y }{\pi^2} + \frac{1-y}{\left( 1-\pi \right)^2} \right] \\ &= \frac{1}{\pi^2}E\left(y\right) + \frac{1}{\left(1-\pi\right)^2}-\frac{1}{\left(1-\pi\right)^2}E\left(y\right) \\ &= \frac{\pi}{\pi^2} + \frac{1}{\left( 1-\pi\right)^2} - \frac{\pi}{\left(1-\pi\right)^2} \\ &= \frac{1}{\pi} + \frac{1}{\left( 1-\pi\right)^2} - \frac{\pi}{\left(1-\pi\right)^2} \\ &= \frac{\left(1-\pi\right)^2+\pi-\pi^2}{\pi\left(1-\pi\right)^2} \\ &= \frac{1-2\pi+\pi^2+\pi-\pi^2}{\pi\left(1-\pi\right)^2} \\ &= \frac{1-\pi}{\pi\left(1-\pi\right)^2} \\ &= \frac{1}{\pi \left(1-\pi\right)} \\ \end{split} \end{equation}\]
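
         The closed form \(i_1(\pi) = 1/\{\pi(1-\pi)\}\) can be checked numerically; the sketch below compares it with a Monte Carlo estimate of \(E\left[-\partial^2 \log p_\pi(Y)/\partial \pi^2\right]\) at an arbitrary value of \(\pi\) chosen for illustration:

# numerical check of i_1(pi) = 1/(pi*(1-pi)) at an arbitrary pi #
set.seed(1)
pi_true <- 0.3
y_sim <- rbinom(1e5, size = 1, prob = pi_true)

# Monte Carlo estimate of E[ -d^2/dpi^2 log p_pi(Y) ] #
mean(y_sim / pi_true^2 + (1 - y_sim) / (1 - pi_true)^2)

# analytic value #
1 / (pi_true * (1 - pi_true))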

  1. Does modeling individual outcomes result in a different total Fisher information from modeling the total number of Kisses that landed on their base? If not, give an explanation (not proof) of why this is so (without using the concept of sufficiency if you have taken MATH \(164\)).

\(2\). [Adapted from problem \(1.10\) of Agresti, second edition] A sample of \(100\) women suffering from dysmenorrhea was taken. A new analgesic is claimed to provide greater relief than a standard one. After using each analgesic in a crossover experiment, \(40\) reported greater relief with the standard analgesic and \(60\) reported greater relief with the new one.

  1. Of interest is the proportion of women who experience greater relief using the new analgesic than the standard analgesic, denoted \(\pi\). Construct the likelihood function, clearly defining all the quantities that appear in your likelihood function. Also, state any assumptions that you have made and whether these assumptions are likely to be satisfied.

\[L(\pi) = \left\{ \begin{array}{ll} \pi^{60} \left(1-\pi \right)^{40}, \hspace{1cm} &\text{$0<\pi<1$} \\ 0, \hspace{1cm} &\text{otherwise.} \\ \end{array} \right.\]
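
A quick numerical check (for illustration only) that this likelihood is maximized at \(\hat{\pi} = 60/100 = 0.6\), using optimize() on \(\log L(\pi)\):

# log of L(pi) for 60 of 100 women preferring the new analgesic #
loglik_2 <- function(pi) 60*log(pi) + 40*log(1 - pi)

# numerical maximizer; it should equal 60/100 = 0.6 #
optimize(loglik_2, interval = c(0.001, 0.999), maximum = TRUE)$maximum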

  1. State the hypotheses for testing whether the new analgesic provides greater relief than the standard analgesic. Which of the three tests would you select to test the hypotheses? Explain.

\[\mathbb{H_0}: \pi=0.5 \hspace{0.5cm} \text{versus} \hspace{0.5cm} \mathbb{H_a}: \pi>0.5\]

  1. Test the hypotheses at the \(5\%\) level using (i) the Wald test, (ii) the likelihood-ratio test, and (iii) the score test. You have to show the steps in your calculations. Do not use any R packages to calculate the test statistics.
  1. The Wald test: \[\begin{equation} \begin{split} \hat{\pi} &= \frac{60}{100} = \frac{3}{5} \\ \hat{se}\left(\hat{\pi}\right) &= \sqrt{\frac{3/5 \times (1-3/5)}{100}} = 0.049 \\ \text{Wald statistic:} \hspace{.5cm} T_w &= \frac{0.6 - 0.5}{0.049} \approx 2.041 \\ \text{P-value} &= Pr(T_w > 2.041) \approx 0.021 \\ \end{split} \end{equation}\]
# set up variables #
x <- 60
n <- 100
pi_0 <- 0.5

# pi_hat #
pi_hat <- x/n
pi_hat
## [1] 0.6
se_hat_pi_hat <- sqrt(pi_hat*(1-pi_hat))/sqrt(n)
se_hat_pi_hat
## [1] 0.04898979
# test statistics #
tw <- (pi_hat - pi_0) / se_hat_pi_hat
tw
## [1] 2.041241
# p-value #
pnorm(tw, lower.tail=F)
## [1] 0.02061342
  1. The likelihood-ratio test: \[\begin{equation} \begin{split} \text{Test statistic:} \hspace{0.5cm} T_{LR} &= -2\left[\log L(0.5) - \log L(0.6) \right] \approx 4.03 \\ \text{P-value} &= Pr(T_{LR} > 4.03) \approx 0.045 \\ \end{split} \end{equation}\]
# test statistic #
tlr <- -2*(log(pi_0^x*(1-pi_0)^(n-x))-log(pi_hat^x*(1-pi_hat)^(n-x)))
tlr
## [1] 4.027103
sqrt(tlr)
## [1] 2.006764
# p-value #
pchisq(tlr, df=1, lower.tail=F)
## [1] 0.04477477
  1. The score test: \[\begin{equation} \begin{split} u(0.5) &= \frac{60}{0.5} - \frac{40}{1-0.5} = 40 \\ i(0.5) &= \frac{100}{0.5 \times \left(1-0.5 \right)} = 400 \\ T_s &= \frac{40}{\sqrt{400}} = 2 \\ \text{P-value} &= Pr(T_s > 2) \approx 0.023 \end{split} \end{equation}\]
# u(pi) #
upi <- x/pi_0 - (n-x)/(1-pi_0)
upi
## [1] 40
# i(pi) #
ipi <- n / (pi_0*(1-pi_0))
ipi
## [1] 400
# ts #
ts <- upi / sqrt(ipi) 
ts
## [1] 2
# p-value #
pnorm(ts, lower.tail=F)
## [1] 0.02275013
  1. Construct a \(95\%\) confidence interval for \(\pi\) using (i) the Wald confidence interval, (ii) the likelihood-ratio confidence interval, and (iii) the Wilson confidence interval.
  1. The Wald CI: \[\begin{equation} \begin{split} 0 &< \pi < \hat{\pi} + \frac{Z_{0.05}}{\sqrt{i_n\left(\hat{\pi} \right)}} \\ 0 &< \pi < 0.6 + \frac{1.645}{\sqrt{ 100 / (0.6 \times (1-0.6))}} \\ 0 &< \pi < 0.681 \\ \end{split} \end{equation}\]
wup <- pi_hat+qnorm(0.05,lower.tail=F) / (sqrt(n/(pi_hat*(1-pi_hat))))
wup
## [1] 0.680581
  1. The LR CI (the endpoints are found numerically; see the uniroot() sketch below): \[\begin{equation} \begin{split} -2\left[ 60\log\pi + 40\log\left(1-\pi \right) - 60\log 0.6 - 40\log 0.4 \right] &< 3.841\\ 60\log\pi + 40\log\left(1-\pi \right) - 60\log 0.6 - 40\log 0.4 &> -\frac{3.841}{2} \\ 60\log\pi + 40\log\left(1-\pi \right) + 30.650 + 36.652 &> -1.921 \\ 0.502 < &\pi < 0.693 \\ \implies \text{the one-sided CI:} \hspace{1cm} 0<&\pi<0.693 \\ \end{split} \end{equation}\]
qchisq(.05, df=1, lower.tail=F)
## [1] 3.841459
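
The endpoints \(0.502\) and \(0.693\) reported above solve \(-2\left[l(\pi)-l(\hat{\pi})\right]=3.841\); a minimal sketch locating them with uniroot(), reusing x, n, and pi_hat from the chunks above (the name lr_gap is introduced here for illustration only):

# difference between the LR statistic at pi and the 5% chi-square cutoff #
lr_gap <- function(pi) {
  -2*(x*log(pi) + (n - x)*log(1 - pi) - x*log(pi_hat) - (n - x)*log(1 - pi_hat)) -
    qchisq(0.05, df = 1, lower.tail = FALSE)
}

# sign changes on either side of pi_hat give the interval endpoints #
uniroot(lr_gap, interval = c(0.01, pi_hat))$root   # lower endpoint, approx 0.502 #
uniroot(lr_gap, interval = c(pi_hat, 0.99))$root   # upper endpoint, approx 0.693 #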
  1. The Wilson CI: \[\begin{equation} \begin{split} 0 &< \pi < \tilde{\pi} + \frac{Z_{0.05} \sqrt{100}}{100+Z^2_{0.05}} \sqrt{\hat{\pi}\left(1-\hat{\pi} \right) + \frac{Z^2_{0.05}}{4n}} \\ 0 &< \pi < \frac{60 + 0.5 Z^2_{0.05}}{100 + Z^2_{0.05}} + \frac{Z_{0.05} \sqrt{100}}{100+Z^2_{0.05}} \sqrt{0.6 \left(1-0.6 \right) + \frac{Z^2_{0.05}}{4 \times 100}} \\ 0 &< \pi < 0.677 \\ \end{split} \end{equation}\]
z <- qnorm(0.05,lower.tail=F)
z
## [1] 1.644854
pi_t <- (x+0.5*z^2) / (n + z^2) 
pi_t
## [1] 0.5973657
sup <- pi_t + (z*sqrt(n) / (n+z^2)) * sqrt(pi_hat * (1-pi_hat) + z^2 / (n*4))  
sup
## [1] 0.6769219
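
The object tbt passed to knitr::kable() below is not constructed in the chunks above. The data frame here is one possible way to assemble it from quantities already computed; the exact construction used originally is not shown, so this layout is an assumption (the LR upper bound 0.693 is taken from the likelihood-ratio interval above):

# summary table of the three tests; bounds taken from the intervals above #
tbt <- data.frame(
  Test          = c("Wald Test", "LR Test", "Score Test"),
  Statistic     = c(tw, tlr, ts),
  "P-value"     = c(pnorm(tw, lower.tail = FALSE),
                    pchisq(tlr, df = 1, lower.tail = FALSE),
                    pnorm(ts, lower.tail = FALSE)),
  "Lower Bound" = c(0, 0, 0),
  "Upper Bound" = c(wup, 0.693, sup),   # 0.693 from the LR interval above #
  check.names   = FALSE
)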
knitr::kable(tbt, col.names=colnames(tbt), digits = 3, align="c")

| Test | Statistic | P-value | Lower Bound | Upper Bound |
|:----------:|:-----:|:-----:|:---:|:-----:|
| Wald Test  | 2.041 | 0.021 | 0 | 0.681 |
| LR Test    | 4.027 | 0.045 | 0 | 0.693 |
| Score Test | 2.000 | 0.023 | 0 | 0.677 |

  1. State your conclusions and any caveats.

\(3\). [Adapted from problem \(1.17\) of Agresti, second edition] Assume that \(y_1, \dots , y_n\) are independently drawn from a Poisson(\(\mu\)) distribution.

  1. Construct the likelihood function and find the maximum likelihood estimator of \(\mu\).

\[\begin{equation} \begin{split} \text{For the Poisson model, the likelihood function is} \\ L(\mu; y) &=\prod^n_{i=1} \frac{\mu^{y_i} e^{-\mu}}{y_i !}\\ &= \frac{\mu^{\sum^n_{i=1}y_i}e^{-n\mu}}{y_1 ! y_2 ! \dots y_n !}\\ \text{and the log-likelihood (up to an additive constant) is} \\ l\left(\mu ; y \right) &= \sum^n_{i=1} y_i \log \mu - n\mu \\ \text{Setting the score to zero gives the maximum likelihood estimator:} \\ \frac{\partial}{\partial \mu} l\left(\mu ; y \right) &= \frac{\sum^n_{i=1} y_i}{\mu} - n = 0 \\ \hat{\mu} &= \frac{\sum^n_{i=1} y_i}{n} = \bar{y} \\ \end{split} \end{equation}\]
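
A quick numerical check of \(\hat{\mu} = \bar{y}\) using simulated Poisson data; the sample size and rate below are arbitrary and only for illustration:

# simulate Poisson data and maximize the log-likelihood numerically #
set.seed(2)
y_sim <- rpois(200, lambda = 4)   # arbitrary sample, for illustration only #

pois_loglik <- function(mu) sum(y_sim)*log(mu) - length(y_sim)*mu

# the maximizer should agree with the sample mean #
optimize(pois_loglik, interval = c(0.01, 20), maximum = TRUE)$maximum
mean(y_sim)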

  1. Construct and compare the Wald, score, and likelihood-ratio statistics for testing

\[\mathbb{H_0}: \mu=\mu_0 \hspace{0.5cm} \text{versus} \hspace{0.5cm} \mathbb{H_1}: \mu \ne \mu_0\]

  1. The Wald Test: \[\begin{equation} \begin{split} \hat{\mu} &= \frac{\sum^n_{i=1} y_i}{n} = \bar{y}\\ u(\hat{\mu}) &= \frac{\sum^n_{i=1} y_i}{\hat{\mu}} - n = 0 \\ i(\mu) &= -E\left[-\frac{\sum^n_{i=1} y_i}{\mu^2} \right] = \frac{1}{\mu^2}\, nE\left[y_i \right] = \frac{n\mu}{\mu^2} = \frac{n}{\mu} \\ i(\hat{\mu}) &= \frac{n}{\hat{\mu}} = \frac{n}{\bar{y}} \\ \sqrt{\frac{1}{i\left(\hat{\mu}\right)}} &= \sqrt{\frac{\hat{\mu}}{n}} = \sqrt{\frac{\bar{y}}{n}} \\ \text{Wald statistic:} \hspace{0.5cm} T_w &= \frac{\bar{y} - \mu_0} {\sqrt{\bar{y}/n} } \\ \end{split} \end{equation}\]

  2. Score Test: \[\begin{equation} \begin{split} u(\mu_0) &= \frac{\sum^n_{i=1} y_i}{\mu_0} - n \\ i(\mu_0) &= -E\left[-\frac{\sum^n_{i=1} y_i}{\mu_0^2} \right] \\ &=\frac{1}{\mu_0^2} nE\left[y_i \right] \\ &= \frac{1}{\mu_0^2} n \mu \\ &= \frac{n}{\mu_0} \\ T_s &= \frac{\frac{\sum^n_{i=1} y_i}{\mu_0} - n }{\sqrt{\frac{n}{\mu_0}}} \\ &= \frac{\frac{n\bar{y}}{\mu_0}-n}{\sqrt{\frac{n}{\mu_0}}} \\ &= \frac{\frac{n^2\bar{y}-n^2\mu_0}{n\mu_0}}{\sqrt{\frac{n}{\mu_0}}} \\ &= \frac{n\times\left(\bar{y}-\mu_0\right)}{\mu_0} \times \frac{\sqrt{\mu_0}}{\sqrt{n}} \\ &= \frac{\bar{y} - \mu_0} {\sqrt{\frac{\mu_0}{n}} } \\ \end{split} \end{equation}\]

  3. Likelihood Ratio Test: \[\begin{equation} \begin{split} l\left(\mu_0 \right) &= \sum^n_{i=1} y_i \log \mu_0 - n \mu_0 \\ l\left(\hat{\mu}\right) &= \sum^n_{i=1} y_i \log \left(\frac{\sum^n_{i=1} y_i }{n} \right) - \sum^n_{i=1} y_i \\ T_{LR} &= -2\left[ l\left(\mu_0\right) - l\left(\hat{\mu}\right) \right] \\ &= -2 \left[ \left(\sum^n_{i=1} y_i \log \mu_0 - n \mu_0\right) - \left( \sum^n_{i=1} y_i \log \left(\frac{\sum^n_{i=1} y_i }{n} \right) - \sum^n_{i=1} y_i\right)\right] \\ &= -2\left[n\bar{y}\log\mu_0-n\mu_0-n\bar{y}\log\bar{y}+n\bar{y} \right] \end{split} \end{equation}\]

         Comparison: the Wald and score statistics have the same form \((\bar{y}-\mu_0)/\widehat{\text{se}}\) and differ only in whether the variance \(\mu/n\) of \(\bar{y}\) is estimated by \(\bar{y}/n\) (Wald) or evaluated at the null value \(\mu_0/n\) (score); the likelihood-ratio statistic instead compares the log-likelihood at \(\hat{\mu}=\bar{y}\) with its value at \(\mu_0\). Under \(H_0\) all three are asymptotically equivalent.
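
         The three statistics can be evaluated side by side; the sketch below uses simulated Poisson data and an arbitrary null value \(\mu_0 = 4\) (both are assumptions for illustration only):

# Wald, score, and likelihood-ratio statistics for H0: mu = mu_0 in the Poisson model #
set.seed(3)
y_sim <- rpois(200, lambda = 4)   # simulated data, for illustration only #
mu_0  <- 4                        # arbitrary null value #
ybar  <- mean(y_sim)
n_y   <- length(y_sim)

t_wald  <- (ybar - mu_0) / sqrt(ybar / n_y)
t_score <- (ybar - mu_0) / sqrt(mu_0 / n_y)
t_lr    <- -2 * (n_y*ybar*log(mu_0) - n_y*mu_0 - n_y*ybar*log(ybar) + n_y*ybar)

# t_wald and t_score are referred to N(0,1); t_lr to a chi-square with 1 df #
c(Wald = t_wald, Score = t_score, LR = t_lr)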


\(4\). According to Benford’s Law, the leading digit in lists of numbers from many naturally occurring data sets is distributed in a non-uniform manner. Specifically, if \(D_1\) denotes the leading digit of a number, then

\[P(D_1 = i) = \log_{10} \left(\frac{i+1}{i} \right), \hspace{1cm} i=1,\dots,9.\]

Does Benford’s law hold true for stock market data? In each of the following, clearly state your hypotheses (all notation should be defined), the test statistic used and its distribution under the null hypothesis, and your conclusion in the context of the problem.

  1. Consider the daily returns on the Standard and Poor’s (S&P) 500 Index for January 2, 1986, through December 29, 1995. There were 2524 trading days with nonzero returns during these 10 years. The observed distribution of leading digits for the absolute daily returns is as follows:

| Digit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|:-----:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| Count | 735 | 432 | 273 | 266 | 200 | 175 | 169 | 148 | 126 |


         Is there evidence of deviation from Benford’s Law?

\[H_0: P(D_1=i) = \log_{10}\left(\frac{i+1}{i}\right), \; i=1,\dots,9 \hspace{0.5cm} \text{versus} \hspace{0.5cm} H_a: \text{the leading digits do not follow Benford's Law}\]

\[\begin{equation} \begin{split} P(D_1=1) &= \log_{10}2 = 0.301 \hspace{1cm} &E_1 = 0.301 \times 2524 = 759.80 \\ P(D_1=2) &= 0.176 \hspace{1cm} &E_2 = 444.45 \\ P(D_1=3) &= 0.125 \hspace{1cm} &E_3 = 315.34 \\ P(D_1=4) &= 0.097 \hspace{1cm} &E_4 = 244.60 \\ P(D_1=5) &= 0.079 \hspace{1cm} &E_5 = 199.85 \\ P(D_1=6) &= 0.067 \hspace{1cm} &E_6 = 168.97 \\ P(D_1=7) &= 0.058 \hspace{1cm} &E_7 = 146.37 \\ P(D_1=8) &= 0.051 \hspace{1cm} &E_8 = 129.11 \\ P(D_1=9) &= 0.046 \hspace{1cm} &E_9 = 115.49 \\ \text{$\chi^2$ test statistic:} \hspace{1cm} \chi^2 &= \sum^9_{i=1} \frac{\left(O_i-E_i \right)^2}{E_i} \hspace{0.5cm} \text{($O_i$ = observed count, $E_i = 2524\,P(D_1=i)$)} \\ &= \text{calculated "by hand" in R shown below} \\ &= 16.15, \hspace{0.5cm} \text{df} = 9-1 = 8 \\ \text{P-value} &\approx 0.04 \\ \end{split} \end{equation}\]

         Since the P-value \(\approx 0.04 < 0.05\), there is evidence at the \(5\%\) level that the leading digits of the absolute daily returns deviate from Benford's Law.

# probability and expected nonzero returns for i=1 to 9 #
pd1 <- function(x) {
  out_mat <- matrix(,ncol=2, nrow=9)
  for (i in 1:9){
    out_mat[i,1] <- log10((x[i]+1)/x[i])
    out_mat[i,2] <- out_mat[i,1]*2524
  }
  obs <- matrix(c(735, 432, 273, 266, 200, 175, 169, 148, 126), ncol=1)
  chi2 <- (obs-out_mat[,2])^2 / out_mat[,2]
  out_mat <- cbind(x,out_mat,obs, chi2)
  colnames(out_mat) <- c("Digit","Probability", "Expected Return Days", "Observation Return Days","X Square")
  return(out_mat)
}
x <- c(1,2,3,4,5,6,7,8,9)

knitr:: kable(pd1(x), digits=3, align="c")

| Digit | Probability | Expected Return Days | Observation Return Days | X Square |
|:-----:|:-----------:|:--------------------:|:-----------------------:|:--------:|
| 1 | 0.301 | 759.800 | 735 | 0.809 |
| 2 | 0.176 | 444.454 | 432 | 0.349 |
| 3 | 0.125 | 315.345 | 273 | 5.686 |
| 4 | 0.097 | 244.601 | 266 | 1.872 |
| 5 | 0.079 | 199.853 | 200 | 0.000 |
| 6 | 0.067 | 168.974 | 175 | 0.215 |
| 7 | 0.058 | 146.372 | 169 | 3.498 |
| 8 | 0.051 | 129.109 | 148 | 2.764 |
| 9 | 0.046 | 115.492 | 126 | 0.956 |

# chi square score #
sum(pd1(x)[,5])
## [1] 16.15026
# P-value #
pchisq(sum(pd1(x)[,5]), df=9-1, lower.tail=F)
## [1] 0.0402792
  1. The distribution of leading digits for the S&P returns, separated by the magnitude of the return, is as follows:

| Digit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|:-----:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| \(\lvert \text{Return} \rvert \ge 0.1\%\) | 698 | 387 | 232 | 228 | 162 | 136 | 124 | 108 | 92 |
| \(\lvert \text{Return} \rvert < 0.1\%\) | 37 | 45 | 41 | 38 | 39 | 39 | 45 | 40 | 34 |

         Does Benford’s law hold for days with absolute returns no less than 0.1%? Is there evidence to support the conjecture that the distribution of leading digits for very small returns is uniform?

# probability and expected nonzero returns for i=1 to 9 #
pd1 <- function(x) {
  out_mat <- matrix(,ncol=2, nrow=9)
  o <- c(698, 387, 232, 228, 162, 136, 124, 108, 92)
  for (i in 1:9){
    out_mat[i,1] <- log10((x[i]+1)/x[i])
    out_mat[i,2] <- out_mat[i,1]*sum(o)
  }
  obs <- matrix(o, ncol=1)
  chi2 <- (out_mat[,2]-obs)^2 / out_mat[,2]
  out_mat <- cbind(x, out_mat,obs, chi2)
  colnames(out_mat) <- c("digit","Probability", "Expected Return Days", "Observation Return Days","X Square")
  return(out_mat)
}
x <- c(1,2,3,4,5,6,7,8,9)


knitr:: kable(pd1(x), digits=3, align="c")

| digit | Probability | Expected Return Days | Observation Return Days | X Square |
|:-----:|:-----------:|:--------------------:|:-----------------------:|:--------:|
| 1 | 0.301 | 652.332 | 698 | 3.197 |
| 2 | 0.176 | 381.590 | 387 | 0.077 |
| 3 | 0.125 | 270.742 | 232 | 5.544 |
| 4 | 0.097 | 210.004 | 228 | 1.542 |
| 5 | 0.079 | 171.586 | 162 | 0.536 |
| 6 | 0.067 | 145.074 | 136 | 0.568 |
| 7 | 0.058 | 125.669 | 124 | 0.022 |
| 8 | 0.051 | 110.848 | 108 | 0.073 |
| 9 | 0.046 | 99.156 | 92 | 0.517 |

# chi square score #
sum(pd1(x)[,5])
## [1] 12.07466
# P-value #
pchisq(sum(pd1(x)[,5]), df=9-1, lower.tail=F)
## [1] 0.1479035

\[H_0: P(D_1=i) = \frac{1}{9}, \; i=1,\dots,9 \hspace{0.5cm} \text{versus} \hspace{0.5cm} H_a: \text{the leading digits of very small returns are not uniformly distributed}\]

\[\begin{equation} \begin{split} \text{Total count} &= 37+45+41+38+38+39+45+40+34 = 357 \\ E_1 &= \dots = E_9 = \frac{1}{9} \times 357 = 39.667 \\ \text{$\chi^2$ test statistic:} \hspace{1cm} \chi^2 &= \sum^9_{i=1} \frac{\left(O_i-E_i \right)^2}{E_i} \\ &= \text{calculated "by hand" in R shown below} \\ &= 2.622, \hspace{0.5cm} \text{df} = 9-1 = 8 \\ \text{P-value} &\approx 0.956 \\ \end{split} \end{equation}\]

         Since the P-value \(\approx 0.956\), there is no evidence against the conjecture that the leading digits on days with very small returns are uniformly distributed.

# probability and expected nonzero returns for i=1 to 9 #
pd1 <- function(x) {
  out_mat <- matrix(,ncol=2, nrow=9)
  o <- c(37, 45, 41, 38, 38, 39, 45, 40, 34)
  for (i in 1:9){
    out_mat[i,1] <- 1/length(x)
    out_mat[i,2] <- 1/9*sum(o)
  }
  obs <- matrix(o, ncol=1)
  chi2 <- (out_mat[,2]-obs)^2 / out_mat[,2]
  out_mat <- cbind(x, out_mat,obs, chi2)
  colnames(out_mat) <- c("digit","Probability", "Expected Return Days", "Observation Return Days","X Square")
  return(out_mat)
}
x <- c(1,2,3,4,5,6,7,8,9)


knitr:: kable(pd1(x), digits=3, align="c")

| digit | Probability | Expected Return Days | Observation Return Days | X Square |
|:-----:|:-----------:|:--------------------:|:-----------------------:|:--------:|
| 1 | 0.111 | 39.667 | 37 | 0.179 |
| 2 | 0.111 | 39.667 | 45 | 0.717 |
| 3 | 0.111 | 39.667 | 41 | 0.045 |
| 4 | 0.111 | 39.667 | 38 | 0.070 |
| 5 | 0.111 | 39.667 | 38 | 0.070 |
| 6 | 0.111 | 39.667 | 39 | 0.011 |
| 7 | 0.111 | 39.667 | 45 | 0.717 |
| 8 | 0.111 | 39.667 | 40 | 0.003 |
| 9 | 0.111 | 39.667 | 34 | 0.810 |

# chi square score #
sum(pd1(x)[,5])
## [1] 2.621849
# P-value #
pchisq(sum(pd1(x)[,5]), df=9-1, lower.tail=F)
## [1] 0.9558066

\[\begin{equation} \begin{split} H_0: &\text{ the leading digits on days with very small returns from 1986 to 1995 follow a uniform distribution} \\ H_a: &\text{ the leading-digit probabilities are linearly ordered} \end{split} \end{equation}\]

\[H_0: \left( \pi_1, \dots, \pi_9 \right) = \left(\frac{1}{9}, \dots, \frac{1}{9}\right)\]

\[H_a: \text{$\pi_j$'s are linearly ordered}\]

\[\begin{equation} \begin{split} \mu^0 &= \frac{1}{9} \times (1+\dots+9) = 5 \\ \mu^0_2 &= \frac{1}{9} \times \left[ (1-5)^2+\dots+(9-5)^2 \right]= 6.667 \\ X^2_1 &= \text{calculated "by hand" in R below} \\ &= 0.136 \\ \text{P-value} &\approx 0.712 \\ \end{split} \end{equation}\]

\[\begin{equation} \begin{split} X^2-X^2_1 &= 2.486 \\ \text{P-value} &\approx 0.928 \end{split} \end{equation}\]

mumu <- function(x, o){
  mu0 <- 1/9 * sum(x); 
  mu00 <- c()
  gsj <- c()
  ngsj <- c()
  for (i in 1:length(x)){
   mu00[i] <- (x[i]-mu0)^2
  }
  mu02 <- 1/9*sum(mu00)
  for (j in 1:length(x)){
    gsj[j] <- (x[j]-mu0) / sqrt(mu02)
    ngsj[j] <- o[j]*gsj[j]
  }
  return((1/sqrt(sum(o)) * sum(ngsj))^2)
}
x <- c(1,2,3,4,5,6,7,8,9)
o <- c(37, 45, 41, 38, 38, 39, 45, 40, 34)

# X^2_1 #
mumu(x, o)
## [1] 0.1361345
# P-value #
pchisq(mumu(x,o),df=1, lower.tail=F)
## [1] 0.7121545
#X^2 - X^2_1#
sum(pd1(x)[,5])-mumu(x,o)
## [1] 2.485714
#p-value#
pchisq(sum(pd1(x)[,5])-mumu(x,o), df=9-2, lower.tail=F)
## [1] 0.9281689