## Problem 1a

$\nabla f(\theta) = (-2\theta_1, -4\theta_2, -6\theta_3)^{T}$

## Problem 1b

$\nabla^2 f(\theta) = \begin{equation*} \mathbf{}\left[\begin{matrix} -2 & 0 & 0\\ 0 & -4 & 0\\0 & 0 & -6\ \end{matrix}\right] \end{equation*}$

## Problem 1c

We will test if $$d^{T}\nabla f(\theta_0) > 0$$. See code below. The output is -16, so this implies that $$d = (1,1,1)^{T}$$ is not an ascent direction at the initial value $$\theta_0 = (4,2,0)^{T}$$

fn <- function(theta) {
theta1 = theta[1,]
theta2 = theta[2,]
theta3 = theta[3,]
val = theta1^2 + 2*theta2^2 + 3*theta3^2
return(-1*val)
}

theta1 = theta[1,]
theta2 = theta[2,]
theta3 = theta[3,]
val1 = -2*theta1
val2 = -4*theta2
val3 = -6*theta3
return(matrix(c(val1, val2, val3), ncol=1))
}
d = matrix(c(1,1,1), ncol=1)

theta.0 = matrix(c(4,2,0), ncol=1)

t(d) %*% gradient(theta.0)
##      [,1]
## [1,]  -16
#-16 No. It is not an ascent direction.

## Problem 1d

By definition of steep ascent ou direction $$d_n$$ is calculated by $$\nabla f(\theta_n)$$. We do this using the code below.

steep.d = gradient(theta.0)
steep.d
##      [,1]
## [1,]   -8
## [2,]   -8
## [3,]    0
#d = (-8, -8, 0)

Thus, by the code above we see that the direction would be $$d = (-8,-8,0)^{T}$$

## Problem 1e

We check if our new theta, call it $$\theta_1$$ has a greater function output value than our previous $$\theta_0$$. In other words, we will compute $$\theta_1$$ and check if $$f(\theta_1) > f(\theta_0)$$. If this is true, then no step-halving is required. Otherwise, step-halving will be required. We do this in the code below.

next.theta = theta.0 + steep.d

l1 = fn(theta.0) #-24
l2 = fn(next.theta) #-88
if(l2 < l1) {
print("Step Halving Required")
} else {
print("No step halve required")
}
## [1] "Step Halving Required"

## Problem 1f

Now we use Newton’s method to compute the direction at a certain point. According to class notes our direction is $$d_n = -(\nabla^2 f(\theta_n))^{-1} \nabla f(\theta_n)$$. We will compute this in the code below.

hessian <- function(theta) {
theta1 = theta[1,]
theta2 = theta[2,]
theta3 = theta[3,]
val = c(-2, 0, 0, 0, -4, 0, 0, 0, -6)
mat = matrix(val, nrow=3, byrow=TRUE)
return(mat)
}
h = hessian(theta.0)
h.inv = solve(h)
dir
##      [,1]
## [1,]   -4
## [2,]   -2
## [3,]    0
next.theta = theta.0 + dir
next.theta
##      [,1]
## [1,]    0
## [2,]    0
## [3,]    0

Thus, our direction would be $$d = (-4,-2,0)^{T}$$ and $$\theta_1 = (0,0,0)^{T}$$

## Problem 1g

fun <- function(teta){
#Objective function
f = -(teta[1] ^2 + 2 * teta[2]^2 + 3 * teta[3]^2)
g = -c(2 * teta[1], 4 * teta[2], 6 * teta[3] )
list(f = f, g = g)
}

# Steepest Ascent Algorithm
steep <- function(teta0, maxit = NULL, tolgrad = NULL, tolerr = NULL){
for(it in 1:maxit){
comp <- fun(teta0)
f0 <- comp$f d <- comp$g
teta1 <- teta0 + d
comp <- fun(teta1)
f1 <- comp$f halve = 0 while(f1 < f0 & halve <= 20) { halve = halve + 1 teta1 = teta0 + d/(2^halve) comp = fun(teta1) f1 = comp$f
}
#--- begin step-halving if needed
#Write the step-halving piece here
#--------End Step halving
# --- check convergence
mre = max(abs(teta1 - teta0)/pmax(1, teta1))
norm_grad = norm(as.matrix(comp$g), "F") print(c(it, halve, f1, norm_grad, mre)) print("__________________________-") if(norm_grad < tolgrad & mre < tolerr){ break } teta0 <- teta1 } } theta.init = c(1,1,1) y = steep(theta.init, maxit=20, tolgrad = 10^-2, tolerr = 10^-2) ## [1] 1.000000 2.000000 -1.000000 3.162278 1.500000 ## [1] "__________________________-" ## [1] 2.000000 2.000000 -0.250000 1.581139 0.750000 ## [1] "__________________________-" ## [1] 3.0000000 2.0000000 -0.0625000 0.7905694 0.3750000 ## [1] "__________________________-" ## [1] 4.0000000 2.0000000 -0.0156250 0.3952847 0.1875000 ## [1] "__________________________-" ## [1] 5.00000000 2.00000000 -0.00390625 0.19764235 0.09375000 ## [1] "__________________________-" ## [1] 6.0000000000 2.0000000000 -0.0009765625 0.0988211769 0.0468750000 ## [1] "__________________________-" ## [1] 7.0000000000 2.0000000000 -0.0002441406 0.0494105884 0.0234375000 ## [1] "__________________________-" ## [1] 8.000000e+00 2.000000e+00 -6.103516e-05 2.470529e-02 1.171875e-02 ## [1] "__________________________-" ## [1] 9.000000e+00 2.000000e+00 -1.525879e-05 1.235265e-02 5.859375e-03 ## [1] "__________________________-" ## [1] 1.000000e+01 2.000000e+00 -3.814697e-06 6.176324e-03 2.929688e-03 ## [1] "__________________________-" ## Problem 2a $d\pi(d\beta) = \frac{(x_i^{T})(d\beta)(\exp(x_i^{T}\beta))}{(1+\exp(x_i^{T}\beta))^2}$ ## Problem 2b $\frac{\partial\ell(\beta)}{\partial\beta_1} = \sum_{i=1}^n(y_i - \pi_i(\beta))$ $\frac{\partial\ell(\beta)}{\partial\beta_2} = \sum_{i=1}^n(y_i - \pi_i(\beta))log(D_i)$ $\frac{\partial\ell(\beta)}{\partial\beta_3} = \sum_{i=1}^n(y_i - \pi_i(\beta))S_i$ In general, if we wanted to write a formula that worked for all $$j$$ we would write: $\frac{\partial\ell(\beta)}{\partial\beta_j} = \sum_{i=1}^n(y_i - \pi_i(\beta))x_{ij}$ ## Problem 2c A general formula that holds is the following: $\frac{\partial^2\ell(\beta)}{\partial\beta_j \partial \beta_k}= \sum_{i=1}^{n}(\pi_i(\beta))[\pi_i(\beta) - 1]x_{ij}x_{ik}$ The above is derived by setting $$d\beta$$ equal to the vector $$e_j$$ for the appropriate choice of $$j$$ and $$k$$ ## Problem 2d ## Problem 2d Part(1 and 2) data = read.table("blowBF.txt", header=TRUE) head(data) ## D S y ## 1 9 0.0242120 0 ## 2 11 0.0305947 0 ## 3 9 0.0305947 0 ## 4 9 0.0341815 0 ## 5 5 0.0341815 0 ## 6 8 0.0341815 0 ## Problem 2d Part3 #X = matrix(0,nrow(data), 3) rowlist = list() for(i in 1:nrow(data)) { current.row1 = 1 current.row2 = log(data[i,1]) current.row3 = data[i,2] rowlist[[i]] = c(current.row1, current.row2, current.row3) } X = matrix(unlist(rowlist),ncol=3, byrow=TRUE) head(X) ## [,1] [,2] [,3] ## [1,] 1 2.197225 0.0242120 ## [2,] 1 2.397895 0.0305947 ## [3,] 1 2.197225 0.0305947 ## [4,] 1 2.197225 0.0341815 ## [5,] 1 1.609438 0.0341815 ## [6,] 1 2.079442 0.0341815 ## Problem 2d Part 4 data <- read.table(file = "blowBF.txt", header = TRUE) n = dim(data)[1] rowlist = list() #----------------------------- for(i in 1:nrow(data)) { current.row1 = 1 current.row2 = log(data[i,1]) current.row3 = data[i,2] rowlist[[i]] = c(current.row1, current.row2, current.row3) } X = matrix(unlist(rowlist),ncol=3, byrow=TRUE) #----------------------------- beta = c(-9.562079,3.197561,4.5085) Hessian <- function(n, X = NULL, beta = NULL){ beta = matrix(beta, nrow = 3, ncol = 1) E = exp(X %*% beta) P = E / (1 + E) #--- begin writing the code for the Hessian #diagonal entries H = matrix(0, 3, 3) for(j in 1:3) { sum = 0 for(i in 1:nrow(X)) { val = P[i,]*(P[i,] - 1)*X[i,j]*X[i,j] sum = sum + val } H[j,j] = sum } for(j in 1:3) { sum = 0 if(j == 3) k = 1 else k = (j+1) for(i in 1:nrow(X)) { val = P[i,]*(P[i,] - 1)*X[i,j]*X[i,k] sum = sum + val } H[j,k] = sum H[k,j] = sum } #---- end of computing Hessian return(H) } out = Hessian(n, X, beta) out ## [,1] [,2] [,3] ## [1,] -90.10259 -202.59774 -37.17247 ## [2,] -202.59774 -467.02995 -82.39195 ## [3,] -37.17247 -82.39195 -19.21699 ## Problem 2f We check to see if the Hessian evaluated at the beta.star is negative definite by checking if all the eigenvalues are negative. If the Hessian evaluated at beta.star is negative definite, this ensures that beta.star is a local maximum. e.values = eigen(out)$values
e.values
## [1]   -1.408105   -4.843684 -570.097749

Therefore, we can conclude that the Hessian evaluted at beta.star is negative definite which means that beta.star is a local maximum.

## Problem 2h

We will appeal to a theorem in class that states the following. For large n we have:
$SE(\hat{\theta_i}) \approx \sqrt{[-\nabla^{2}\ell(\hat{\theta})^{-1}]_{ii}}$

h.inv = solve(-1*out)

se.vec = rep(0,3)
for(i in 1:3) {
se.vec[i] = sqrt(h.inv[i,i])
}

se.vec
## [1] 0.7498812 0.2998976 0.5158705

Therefore, $SE(\beta_1) \approx 0.7498\\ SE(\beta_2) \approx 0.2998\\ SE(\beta_3) \approx 0.5158$