Mathematical Statistics Project

Question (ii)

Write the likelihood function of the model (for independent observations) and present a way to approximate the maximum likelihood estimator when \(θ = (β, σ^2)\). It is recommended to perform appropriate sequential maximization of the parameters and, after finding detailed solutions, considering at each step the rest constant, apply conditional maximization with the help of an appropriate algorithm; that starts from an initial point and revises the estimates at each full step, with the help of 4 successive substeps. If you want, you can compare this method you made with the results given by a numerical optimization algorithm, e.g. see optim() in R. As data use the 20 measurements \(z_i\) at points \((x_i, y_i)\) given in the auxiliary R file.

SOLUTION
The likelihood function of the model for n independent observations \(z_i\) is \[ L_z(\theta) = \prod\limits_{i=1}^nf_{\theta,\sigma^2}(z_i) = (2\pi\sigma^2)^{-\frac{n}{2}}e^{-\frac{1}{2\sigma^2}\sum\limits_{i=1}^n(z_i-\mu_i(\beta))^2} \] We deduce that, the m.l.e. \(\hat\theta = (\hat\beta, \hat{\sigma^2})\) of \(\theta = (\beta, \sigma^2)\) can be found by the following relations: \[ \hat\beta = \arg\min\sum\limits_{i=1}^n(z_i - \mu_i(\beta))^2 \] \[ \hat{\sigma^2} = \frac{1}{n}\sum\limits_{i=1}^n(z_i - \mu_i(\hat\beta))^2 \] so \(\sigma^2\) will be estimated after obtaining \(\hat\beta\).
Let \(g(\beta) = \sum\limits_{i=1}^n(z_i - \mu_i(\beta))^2 = \sum\limits_{i=1}^n (z_i - 4\pi\beta_1[(x_i-\beta_2)^2+(y_i-\beta_3)^2])^2\).

First Solution

Step 1. Let us consider \(\beta_2, \beta_3\) fixed. Then, if \(r_i^2(\beta_2,\beta_3) = (x_i-\beta_2)^2+(y_i-\beta_3)^2\), we have, by differentiation: \[ \frac{\partial g}{\partial \beta_1} = 0 \iff \beta^{\star}_1(\beta_2,\beta_3) = \frac{\sum\limits_{i=1}^n z_ir_i^2(\beta_2,\beta_3)}{4\pi\sum\limits_{i=1}^nr_i^4(\beta_2,\beta_3)} \space \space \space(\star) \]

Step 2. By substituting \((\star)\) in \(g(\beta)\), after calculations, we have that: \[ g(\beta_2,\beta_3) = \sum\limits_{i=1}^n [z_i - \frac{\sum\limits_{i=1}^n z_i r_i^2}{\sum\limits_{i=1}^n r_i^4} r_i^2]^2 = \sum\limits_{i=1}^n z_i^2 - \frac{(\sum\limits_{i=1}^n z_ir_i^2)^2}{\sum\limits_{i=1}^nr_i^4} \]

Step 3. We deduce that \(\min\limits_{\beta_2,\beta_3} g(\beta_2,\beta_3) = \max\limits_{\beta_2,\beta_3} \frac{(\sum\limits_{i=1}^n z_i r_i^2)^2}{\sum\limits_{i=1}^n r_i^4}\). So, in order to find \(\hat{\beta_2}, \hat{\beta_3}\) we will maximize the root of the function \(\frac{(\sum\limits_{i=1}^n z_i r_i^2)^2}{\sum\limits_{i=1}^n r_i^4}\), which is: \[ h(\beta_2, \beta_3) = \frac{\sum\limits_{i=1}^n z_i[(\beta_2-x_i)^2+(\beta_3-y_i)^2]}{\sqrt{\sum\limits_{i=1}^n [(\beta_2-x_i)^2+(\beta_3-y_i)^2]^2}} \]

Step 4. After obtaining \((\hat{\beta_2}, \hat{\beta_3}) = \max\limits_{\beta_2,\beta_3} h(\beta_2, \beta_3)\), we can find \(\hat{\beta_1}\) through \((\star)\), and, lastly, \(\hat{\sigma^2}\).

Algorithmic Implementation using our data

For the initial point of our algorithm, we will choose the position \((\beta_2, \beta_3)\) from our data that gives us the smallest inverse intensity \(z\), meaning the point where the power is at its highest.

initial <- c(x[which(min(z) == z)], y[which(min(z) == z)])

n <- 20

loglike <- function(b,sigma2) {
  (-n/2)*log(2*pi*sigma2)-(sum((z-4*pi*b[1]*((x-b[2])^2+(y-b[3])^2))^2))/(2*sigma2)
}

h <- function(c) {
  num <- sum(z*((c[1]-x)^2+(c[2]-y)^2))
  den <- sqrt(sum(((c[1]-x)^2+(c[2]-y)^2)^2))
  return(-num/den)
}

b2 <- optim(initial, h, method = "BFGS")$par[1]
b3 <- optim(initial, h, method = "BFGS")$par[2]

b1 <- sum(z*((b2-x)^2+(b3-y)^2))/(4*pi*sum(((b2-x)^2+(b3-y)^2)^2))

sigma2 <- sum((z - 4*pi*b1*((b2-x)^2+(b3-y)^2))^2)/n

l1 <- loglike(c(b1,b2,b3),sigma2)

print(c(b1, b2, b3, sigma2, l1))

## [1]  1.005402e-03  1.031820e+02  5.699525e+01  1.338179e+05 -1.464211e+02

Different starting points

Now, we can pick different starting points based on our data in order to examine any effects they may have on our results.

znew <- sort(z)
num <- c(2,3,10,19,20)

for (i in num) {
  theta2 <- optim(c(x[which(z==znew[i])],y[which(z==znew[i])]), h, method = "BFGS")$par[1]
  theta3 <- optim(c(x[which(z==znew[i])],y[which(z==znew[i])]), h, method = "BFGS")$par[2]

  theta1 <- sum(z*((theta2-x)^2+(theta3-y)^2))/(4*pi*sum(((theta2-x)^2+(theta3-y)^2)^2))
  theta4 <- sum((z - 4*pi*theta1*((theta2-x)^2+(theta3-y)^2))^2)/n
  l <- loglike(c(theta1,theta2,theta3),theta4)

  print(c(theta1, theta2, theta3, theta4, l))
}

## [1]  1.005392e-03  1.031102e+02  5.691996e+01  1.338164e+05 -1.464210e+02
## [1]  1.005392e-03  1.031090e+02  5.692348e+01  1.338164e+05 -1.464210e+02
## [1]  1.170996e-05  4.009705e+03 -7.468160e+03  2.453975e+07 -1.985368e+02
## [1]  1.111807e-07  7.518348e+03 -8.840184e+04  2.151072e+07 -1.972194e+02
## [1]  5.346495e-08 -4.423146e+03  1.280144e+05  2.157565e+07 -1.972495e+02

We chose a range of initial points for \((\beta_2,\beta_3)\), based on the inverse intensity observed at its point; more specifically, after sorting the inverse intensity vector \(z\), we picked the 2nd, 3rd, 10th, 19th and 20th element (as we have already used the first, i.e. the minimum) to obtain the points at which the power is at its highest, medium and lowest respectively. From the outcome above, we deduce that our results depend on the initial value set. It seems that, when when we choose points where the power is decreased, meaning we stray far from the source, our algorithm likely gets stuck in local minima. We observe that the solutions produced with the last three initial points give a smaller value to our log-likelihood function, as opposed to the previous two, which almost match the value found above.

Final Results

We will choose the solution that gives the maximum value to our log-likelihood function:

theta2 <- optim(c(x[which(z==znew[2])],y[which(z==znew[2])]), h, method = "BFGS")$par[1]
theta3 <- optim(c(x[which(z==znew[2])],y[which(z==znew[2])]), h, method = "BFGS")$par[2]

theta1 <- sum(z*((theta2-x)^2+(theta3-y)^2))/(4*pi*sum(((theta2-x)^2+(theta3-y)^2)^2))
theta4 <- sum((z - 4*pi*theta1*((theta2-x)^2+(theta3-y)^2))^2)/n
a <- loglike(c(theta1,theta2,theta3),theta4)

theta2 <- optim(c(x[which(z==znew[3])],y[which(z==znew[3])]), h, method = "BFGS")$par[1]
theta3 <- optim(c(x[which(z==znew[3])],y[which(z==znew[3])]), h, method = "BFGS")$par[2]

theta1 <- sum(z*((theta2-x)^2+(theta3-y)^2))/(4*pi*sum(((theta2-x)^2+(theta3-y)^2)^2))
theta4 <- sum((z - 4*pi*theta1*((theta2-x)^2+(theta3-y)^2))^2)/n
b <- loglike(c(theta1,theta2,theta3),theta4)

print(a<b)

## [1] TRUE

Our results are presented below: \[\begin{eqnarray*} \hat{\beta_1} &=& 0.001005392\\ \hat{\beta_2} &=& 103.109\\ \hat{\beta_3} &=& 56.92348\\ \hat{\sigma^2} &=& 133816.4 \end{eqnarray*}\] We observe that, according to the results above, the value of the log-likelihood function is: \(l(\hat\theta) = -146.4210\).

Second Solution: Conditional Maximization

We will first find analytical solutions for \(\beta_1, \beta_2, \beta_3\), that maximize our function \(g(\beta)\), by considering at each step the rest constant.

Step 2. Now, let us consider \(\beta_1, \beta_3\) fixed. Then, \[\begin{eqnarray*} \frac{\partial g}{\partial \beta_2} = 0 &\iff& \sum\limits_{i=1}^n 2(z_i - 4\pi\beta_1[(x_i-\beta_2)^2 + (y_i-\beta_3)^2])4\pi\beta_12(x_i-\beta_2) = 0 \\ &\iff& n\beta_2^3 + (-3\sum\limits_{i=1}^nx_i)\beta_2^2 + (\sum\limits_{i=1}^n [3x_i^2-\delta_i])\beta_2 + (\sum\limits_{i=1}^n \delta_ix_i-x_i^3) = 0 \end{eqnarray*}\] where \(\delta_i = \frac{z_i}{4\pi\beta_1} - c_i\)and \(c_i = (y_i - \beta_3)^2\). So, \(\hat\beta_2\) is the solution(s) of the above cubic equation, with coefficients dependent on \(\beta_1, \beta_3\).

Step 3. Similarly to Step 2, by considering \(\beta_1, \beta_2\) fixed, we get \(\beta_3\) by solving the following cubic equation with coefficients dependent on \(\beta_1, \beta_2\): \[ \frac{\partial g}{\partial \beta_2} = 0 \iff n\beta_3^3 + (-3\sum\limits_{i=1}^ny_i)\beta_3^2 + (\sum\limits_{i=1}^n [3y_i^2-k_i])\beta_3 + (\sum\limits_{i=1}^n k_iy_i-y_i^3) = 0 \] where \(k_i = \frac{z_i}{4\pi\beta_1} - l_i\) and \(l_i = (x_i - \beta_2)^2\).

Algorithmic Implementation using our data Having found analytical solutions for the parameters \(\beta_1, \beta_2, \beta_3\), we will now construct a conditional maximization algorithm of the following form:

We start at an initial value that we believe is close to the real solution, here: we will choose the position \((\beta_2, \beta_3)\) from our data that gives us the smallest inverse intensity \(z\), along with said intensity, that is \(\beta_1 = z\)
Then, in the n-th repetition of our algorithm, we update the parameters as follows:
1. \(\beta_1^{new} = \frac{\sum\limits_{i=1}^n z_ir_i^2(\beta_2^{old},\beta_3^{old})}{4\pi\sum\limits_{i=1}^nr_i^4(\beta_2^{old},\beta_3^{old})}\)
2. \(\beta_2^{new}\) is the solution of the cubic equation in Step 2 above, with updated coefficients dependent on \(\beta_1^{new}\) and \(\beta_3^{old}\)
3. \(\beta_3^{new}\) is the solution of the cubic equation in Step 2 above, with updated coefficients dependent on \(\beta_1^{new}\) and \(\beta_2^{new}\)
We update our parameters as described above until we reach convergence, i.e. \(\max\limits_{\beta_1,\beta_2,\beta_3}|\beta^{new}-\beta^{old}| <\epsilon\), or the maximum amount of iterations is executed.

Note that \(\beta_2^{new}\) and \(\beta_3^{new}\) are both found by solving a (different) cubic equation. We know that if all of the coefficients of the cubic equation are real numbers -which is the case- then it has at least one real root. More specifically, if \(\Delta = 18abcd -4b^3d+b^2c^2-4ac^3-27a^2d^2\) is the discriminant of the cubic \(ax^3 + bx^2 + cx + d\), then:

If \(\Delta > 0\), the cubic has three distinct real roots
If \(\Delta < 0\), the cubic has one real root and two non-real complex conjugate roots
If \(\Delta = 0\), the cubic has a multiple root

Here, we will check whether both equations have only one real root at every step. If that is the case, by discarding any complex roots, we will get only one solution and we can move on to the next step of the algorithm. If it’s not the case, our code will reflect that by displaying an appropriate message.

library(RConics)

initial <- c(z[which(min(z) == z)], x[which(min(z) == z)], y[which(min(z) == z)])

delta <- function(co) {
  18*co[1]*co[2]*co[3]*co[4] - 4*co[2]^3*co[4] + co[2]^2*co[3]^2 - 4*co[1]*co[3]^3 -27*co[1]^2*co[4]^2
}

cond_max <- function(b1,b2,b3,maxiter,eps){
  iter <- 0
  diff <- 1
  while((diff>eps) & (iter<maxiter)){
    b1bef <- b1
    b2bef <- b2
    b3bef <- b3
  
    
    b1 <- sum(z*((b2bef-x)^2+(b3bef-y)^2))/(4*pi*sum(((b2bef-x)^2+(b3bef-y)^2)^2))
    
    c <- (y-b3bef)^2
    d <- z/(4*pi*b1) - c
    fac1 <- c(n, -3*sum(x), 3*sum(x^2)-sum(d), sum(d*x - x^3))
    roots1 <- cubic(fac1)
    if (delta(fac1) < -eps) {
      b2 <- Re(roots1[Im(roots1)==0])
    } else {
      print("The discriminant is nonnegative!")
      b2 <- NaN
      break
    }
    
    
    l <- (x-b2)^2
    k <- z/(4*pi*b1) - l
    fac2 <- c(n, -3*sum(y), 3*sum(y^2)-sum(k), sum(k*y - y^3))
    roots2 <- cubic(fac2)
    if (delta(fac2) < -eps) {
      b3 <- Re(roots2[Im(roots2)==0])
    } else {
      print("The discriminant is nonnegative!")
      b3 <- NaN
      break
    }
    
    diff <- max(abs(b1-b1bef),abs(b2-b2bef),abs(b3-b3bef))
    iter <- iter + 1
  }
  return(c(b1,b2,b3,iter))
} 

res <- cond_max(initial[1], initial[2], initial[3], 100, 1e-5)
b1 <- res[1]
b2 <- res[2]
b3 <- res[3]

sigma2 <- sum((z - 4*pi*b1*((b2-x)^2+(b3-y)^2))^2)/n

l2 <- loglike(c(b1,b2,b3),sigma2)

print(c(b1, b2, b3, sigma2,l2))

## [1]  1.005392e-03  1.031090e+02  5.692348e+01  1.338164e+05 -1.464210e+02

Different starting points

for (i in 1:20) {
  resu <- cond_max(znew[i], x[which(z==znew[i])], y[which(z==znew[i])], 100, 1e-5)
  theta1 <- resu[1]
  theta2 <- resu[2]
  theta3 <- resu[3]

  theta4 <- sum((z - 4*pi*theta1*((theta2-x)^2+(theta3-y)^2))^2)/n

  l <- loglike(c(theta1,theta2,theta3),theta4)
  if (is.nan(theta2)==F && is.nan(theta3)==F) {
    print(c(theta1, theta2, theta3, theta4, l))
  }
}

## [1]  1.005392e-03  1.031090e+02  5.692348e+01  1.338164e+05 -1.464210e+02
## [1]  1.005392e-03  1.031090e+02  5.692348e+01  1.338164e+05 -1.464210e+02
## [1]  1.005392e-03  1.031090e+02  5.692348e+01  1.338164e+05 -1.464210e+02
## [1]  1.005392e-03  1.031090e+02  5.692348e+01  1.338164e+05 -1.464210e+02
## [1]  1.005392e-03  1.031090e+02  5.692348e+01  1.338164e+05 -1.464210e+02
## [1] "The discriminant is nonnegative!"
## [1] "The discriminant is nonnegative!"
## [1] "The discriminant is nonnegative!"
## [1]  1.005392e-03  1.031090e+02  5.692348e+01  1.338164e+05 -1.464210e+02
## [1] "The discriminant is nonnegative!"
## [1] "The discriminant is nonnegative!"
## [1] "The discriminant is nonnegative!"
## [1] "The discriminant is nonnegative!"
## [1]  1.005392e-03  1.031090e+02  5.692348e+01  1.338164e+05 -1.464210e+02
## [1]  1.005392e-03  1.031090e+02  5.692348e+01  1.338164e+05 -1.464210e+02
## [1] "The discriminant is nonnegative!"
## [1] "The discriminant is nonnegative!"
## [1] "The discriminant is nonnegative!"
## [1] "The discriminant is nonnegative!"
## [1] "The discriminant is nonnegative!"

We choose a range of initial points for \((\beta_1,\beta_2,\beta_3)\), utilizing all of our data, since we assume that some of the starting points won’t produce solutions due to our code being constructed to accommodate only negative discriminants. From the outcome above, we deduce that our results depend on the initial value set, as our assumptions are confirmed. The starting points that do, however, give a solution, all produce the same one.

Final Results

In the following table, we can see the results obtained from the First and Second Solution and compare them; it is clear that they are identical to a certain decimal point -a good indicator that we may have found the correct solution. It is worth mentioning that both solutions give (approximately) the same value to our log-likelihood function.

	\(\hat\beta_1\)	\(\hat\beta_2\)	\(\hat\beta_3\)	\(\hat{\sigma^2}\)	\(l(\theta)\)
First Solution	0.001005392	103.109	56.92348	133816.4	-146.4210
Second Solution	0.001005392	103.109	56.92348	133816.4	-146.4210

Verification: Newton-Raphson method

Manually

library(base)
library(pracma)

f <- function(b) {
  d <- z/(4*pi*b[1]) - (y-b[3])^2
  k <- z/(4*pi*b[1]) - (x-b[2])^2
  
  f1 <- (4*pi*sum(((b[2]-x)^2+(b[3]-y)^2)^2))*b[1] - sum(z*((b[2]-x)^2+(b[3]-y)^2))
  f2 <- n*b[2]^3 -3*sum(x)*b[2]^2 + (3*sum(x^2)-sum(d))*b[2] + sum(d*x - x^3)
  f3 <- n*b[3]^3 -3*sum(y)*b[3]^2 + (3*sum(y^2)-sum(k))*b[3] + sum(k*y - y^3)
  return(c(f1,f2,f3))
}

newt_raph <- function(b,maxiter,eps){
  iter <- 0
  diff <- 1
  while((diff>eps) & (iter<maxiter)){
    bold <- b
    
    b <- bold + solve(jacobian(f,bold),-f(bold))
    
    diff <- max(abs(b - bold))
    iter <- iter + 1
  }
  c(b, iter)
}

results <- newt_raph(c(0.0001, 100, 50), 100, 1e-5)
print(results)

## [1] 1.005392e-03 1.031090e+02 5.692348e+01 7.000000e+00

With the help of the “rootSolve” package

library(rootSolve)

model <- function(b){
  d <- z/(4*pi*b[1]) - (y-b[3])^2
  k <- z/(4*pi*b[1]) - (x-b[2])^2
  
  f1 <- (4*pi*sum(((b[2]-x)^2+(b[3]-y)^2)^2))*b[1] - sum(z*((b[2]-x)^2+(b[3]-y)^2))
  f2 <- n*b[2]^3 -3*sum(x)*b[2]^2 + (3*sum(x^2)-sum(d))*b[2] + sum(d*x - x^3)
  f3 <- n*b[3]^3 -3*sum(y)*b[3]^2 + (3*sum(y^2)-sum(k))*b[3] + sum(k*y - y^3)
  c(F1 = f1, F2 = f2, F3 = f3)
}

result <- multiroot(f = model, start = c(0.0001, 100, 50))
print(result)

## $root
## [1] 1.005392e-03 1.031090e+02 5.692348e+01
## 
## $f.root
##            F1            F2            F3 
## -3.051758e-05  7.152557e-07  3.576279e-07 
## 
## $iter
## [1] 9
## 
## $estim.precis
## [1] 1.053015e-05

Remarks & Conclusion

We observe that the results for \(\beta = (\beta_1, \beta_2, \beta_3)\) from the above solutions and the Newton-Raphson method are almost identical. We deduce that the solution \(\hat\theta = (\hat\beta_1, \hat\beta_2, \hat\beta_3, \hat{\sigma^2}) = (0.001005392, 103.109, 56.92348, 133816.4)\) is sufficiently close to the source of the nuclear weapons, and soon harmony will be restored.

Mathematical Statistics Project

National and Kapodistrian University of Athens

Lito Konidari

Academic Year 2021-2022

Introduction

Question (i)

Question (ii)

First Solution

Second Solution: Conditional Maximization

Verification: Newton-Raphson method

Remarks & Conclusion