\(1\). We would like to find \(\sigma^2=E\left[ X^2 \right]\) where \(X\) has density \(f(x) = c\exp\left(-|x|^3/3\right)\) (\(c\) is some unknown proportionality constant that makes \(f\) a density).

\((a)\) Use importance sampling with standardized weights and a standard Normal envelope to estimate \(\sigma^2\) \(100\) times each for sample sizes \(n = 10^k\) (\(k = 2, \dots, 6\)). Record the mean and the variance of the estimates for each sample size value.

Since the area under the density curve must be one,

\[\int^\infty_{-\infty} c \exp \left(-\frac{|x|^3}{3} \right) dx = 1.\]

Writing out the absolute value,

\[f(x) = \left\{ \begin{array}{ll} c \exp\left( \frac{x^3}{3} \right), & -\infty < x < 0, \\ c \exp\left( -\frac{x^3}{3} \right), & 0 \le x < \infty. \\ \end{array} \right.\]
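Neither the standardized weights nor the simulations below require \(c\), but a closed form is useful as a reference value for part \((c)\). With the substitution \(u = x^3/3\),

\[\int^\infty_{-\infty} \exp\left(-\frac{|x|^3}{3}\right) dx = 2\int^\infty_{0} \exp\left(-\frac{x^3}{3}\right) dx = 2 \cdot 3^{-2/3}\, \Gamma\left( \tfrac{1}{3} \right) \approx 2.575797,\]

so \(c \approx 0.388229\). Furthermore, since \(\frac{d}{dx}\left[ -\exp\left(-x^3/3\right) \right] = x^2 \exp\left(-x^3/3\right)\),

\[\sigma^2 = E\left[ X^2 \right] = 2c \int^\infty_{0} x^2 \exp\left(-\frac{x^3}{3}\right) dx = 2c \approx 0.776459.\]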

With a standard Normal envelope \(g = \phi\), the standardized-weight (self-normalized) importance sampling estimator is

\[\hat{\sigma}^2 = \sum^n_{i=1} h(X_i)\, w_i, \qquad w_i = \frac{w^*(X_i)}{\sum^n_{j=1} w^*(X_j)}, \qquad w^*(x) = \frac{f^*(x)}{g(x)},\]

where \(X_1, \dots, X_n \sim N(0,1)\), \(h(x) = x^2\), and \(f^*(x) = \exp(-|x|^3/3)\) is the unnormalized target; the unknown constant \(c\) cancels in the standardized weights. In R:

set.seed(10)
m.est.a <- function(n){
  # draw n samples from the standard Normal envelope
  X <- rnorm(n)
  # integrand h(x) = x^2
  h <- X^2
  # unstandardized weights: unnormalized target over envelope density
  w.star <- exp(-abs(X)^3/3)/dnorm(X, 0, 1)
  # standardized weights
  w <- w.star/sum(w.star)
  mu <- sum(h*w)
  return(mu)
}
m.est.a <- Vectorize(m.est.a)

size <- 10^(2:6)

results.m.est.a <- matrix(ncol=5, nrow = 100, dimnames = list(c(1:100),size))
for (n in 1:5){
  for (k in 1:100){
  results.m.est.a[k,n] <- m.est.a(size[n])
  }
}

# compute mean and variance of the sigma^2 estimates #
# mean #
mean.a <- apply(results.m.est.a, 2, mean)
mean.a
# variance #
variance.a <- apply(results.m.est.a, 2, var)
variance.a
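The closed-form values derived above can also be verified numerically with base R's integrate function, as an independent sanity check on the simulation output:

# numerator and denominator of sigma^2 = int x^2 f*(x) dx / int f*(x) dx
num <- integrate(function(x) x^2*exp(-abs(x)^3/3), -Inf, Inf)$value  # = 2 exactly
den <- integrate(function(x) exp(-abs(x)^3/3), -Inf, Inf)$value      # ~ 2.575797
num/den                                                              # sigma^2 ~ 0.776459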

\((b)\) Use importance sampling with standardized weights and the \(t\)-density with one degree of freedom as an envelope to estimate \(\sigma^2\) \(100\) times each for sample sizes \(n = 10^k\) (\(k = 2, \dots, 6\)). Record the mean and the variance of the estimates for each sample size value.

The only change from part \((a)\) is the envelope: \(g\) is now the \(t_1\) density.

set.seed(10)
m.est.b <- function(n){
  # draw n samples from the t-distribution with one degree of freedom
  X <- rt(n, 1)
  # integrand h(x) = x^2
  h <- X^2
  # unstandardized weights: unnormalized target over envelope density
  w.star <- exp(-abs(X)^3/3)/dt(X, 1)
  # standardized weights
  w <- w.star/sum(w.star)
  mu <- sum(h*w)
  return(mu)
}
m.est.b <- Vectorize(m.est.b)

size <- 10^(2:6)

results.m.est.b <- matrix(ncol=5, nrow = 100, dimnames = list(c(1:100),size))
for (n in 1:5){
  for (k in 1:100){
  results.m.est.b[k,n] <- m.est.b(size[n])
  }
}

# compute mean and variance of the sigma^2 estimates #
# mean #
mean.b <- apply(results.m.est.b, 2, mean)
mean.b
# variance #
variance.b <- apply(results.m.est.b, 2, var)
variance.b
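Both envelopes are safe choices here: the target's tails decay like \(\exp(-|x|^3/3)\), which is faster than both the Normal and the \(t_1\) tails, so the weight function \(w^*\) is bounded in either case. The heavier tails of the \(t_1\) envelope make it the more conservative of the two against weight degeneracy.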

\((c)\) Provide the best estimate for \(\sigma^2\) you can (with reasonable accuracy). That is, only report digits for which you can be (say 95%) confident that they are correct.

Note: If the higher sample sizes give you trouble please first check your code for efficiency. If that’s not the problem and you are still not able to run those simulations, go only to \(10^5\).
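One way to decide how many digits to report, sketched under the assumption that the replicate matrix results.m.est.b from part \((b)\) is still in memory: form an approximate \(95\%\) interval for the mean of the \(100\) replicates at the largest sample size, and report only the digits on which both endpoints agree.

est <- results.m.est.b[, 5]                      # 100 replicate estimates at n = 10^6
mean(est) + c(-2, 2)*sd(est)/sqrt(length(est))   # approximate 95% interval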

\(2\). The dataset “Normal.txt” contains observations \(x_1, \dots, x_{100}\) generated from a Normal distribution with (unknown) mean \(\mu\) and variance \(\sigma^2\). Recall that the likelihood function of Normal observations is

\[L(\mu,\sigma^2,x_1,\dots,x_n)=\left( \frac{1}{\sqrt{2\pi\sigma^2}} \right)^n \exp \left(-\frac{1}{2\sigma^2} \sum^n_{i=1} \left( x_i-\mu \right)^2 \right)\]

and the log-likelihood function is

\[\ell (\mu,\sigma^2,x_1,\dots,x_n) = -\frac{n}{2} \ln(2\pi)-\frac{n}{2}\ln(\sigma^2)-\frac{1}{2\sigma^2}\sum^n_{i=1} \left(x_i-\mu \right)^2\]

\((a)\) For the given data values \(x_1, \dots, x_{100}\) plot a contour plot of the likelihood function as a function of \(\mu\) and \(\sigma^2\). (Choose meaningful ranges for both axes so that the peak of the likelihood function can be seen in your plot.)

Note: Remind yourself of the maximum-likelihood estimates of a Normal sample, if necessary, and compute them for the given data. However, since this is an exercise in implementing the gradient ascent method, now forget those values again and let’s find them numerically.
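A minimal sketch of such a plot, assuming “Normal.txt” holds one observation per line (the file layout, the grid ranges, and object names such as loglik are assumptions to adjust as needed):

# read the data; n should be 100
x <- scan("Normal.txt")
n <- length(x)

# log-likelihood as a function of mu and sigma^2 (s2)
loglik <- function(mu, s2){
  -n/2*log(2*pi) - n/2*log(s2) - sum((x - mu)^2)/(2*s2)
}

# grid centered at the sample moments so the peak is visible
mu.grid <- seq(mean(x) - 0.5, mean(x) + 0.5, length.out = 200)
s2.grid <- seq(var(x)/2, 2*var(x), length.out = 200)
ll <- outer(mu.grid, s2.grid, Vectorize(loglik))

# contour the likelihood; shifting by max(ll) before exponentiating avoids underflow
contour(mu.grid, s2.grid, exp(ll - max(ll)),
        xlab = expression(mu), ylab = expression(sigma^2))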

\((b)\) Find the gradient of the log-likelihood function. That is, find

\[\nabla \ell = \left( \frac{\partial}{\partial \mu} \ell, \frac{\partial}{\partial \sigma^2} \ell \right)\]
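Differentiating the log-likelihood term by term (treating \(\sigma^2\) as a single variable) gives

\[\frac{\partial}{\partial \mu} \ell = \frac{1}{\sigma^2}\sum^n_{i=1} \left( x_i-\mu \right), \qquad \frac{\partial}{\partial \sigma^2} \ell = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum^n_{i=1} \left( x_i-\mu \right)^2.\]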

\((c)\) Using starting values \(s_1 = (3.7, 0.87)\), \(s_2 = (3.5, 0.6)\), and \(s_3 = (1.5, 0.8)\), implement a gradient ascent to find the maximum likelihood estimates of \(\mu\) and \(\sigma^2\). That is, implement the algorithm

\[X_{n+1}=X_n+\gamma\, \nabla \ell \left( X_n \right)\]

where

\[\gamma = \frac{0.01}{|\nabla \ell |}\]

(Note: the constant \(0.01\) will make the convergence slower, but will also make it less likely that the algorithm “gets stuck” after overstepping the goal; \(|\nabla \ell |\) is the Euclidean norm of the gradient vector.) To implement a stopping criterion for your algorithm, check whether the value of the log-likelihood changes by less than \(0.01\%\) at the update. That is, stop if

\[\frac{|\ell \left( X_{n+1} \right) - \ell \left( X_n \right) |}{|\ell \left( X_n \right) |} < 0.0001\]

Show the three starting points and search paths in your contour plot.
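A compact sketch of the ascent, reusing x, n, and loglik from the contour-plot sketch in 2(a) (the names grad and ascent and the iteration cap are my own choices):

# gradient of the log-likelihood, as derived in (b)
grad <- function(mu, s2){
  c(sum(x - mu)/s2, -n/(2*s2) + sum((x - mu)^2)/(2*s2^2))
}

# gradient ascent with step gamma = 0.01/|grad l| and a relative-change stopping rule
ascent <- function(start, tol = 1e-4, max.iter = 1e5){
  path <- matrix(start, nrow = 1)
  for (i in 1:max.iter){
    cur <- path[i, ]
    g <- grad(cur[1], cur[2])
    nxt <- cur + 0.01*g/sqrt(sum(g^2))   # step of length 0.01 uphill
    path <- rbind(path, nxt)
    # stop once the log-likelihood changes by less than 0.01%
    if (abs(loglik(nxt[1], nxt[2]) - loglik(cur[1], cur[2])) <
        tol*abs(loglik(cur[1], cur[2]))) break
  }
  path
}

# overlay the three starting points and search paths on the contour plot
for (s in list(c(3.7, 0.87), c(3.5, 0.6), c(1.5, 0.8))){
  p <- ascent(s)
  points(s[1], s[2], pch = 19)
  lines(p[, 1], p[, 2], col = "red")
}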