This blog post is adapted from an assignment I completed for DATA 609.
Ordinary least squares is a special case among machine learning algorithms because its model parameters can be computed from explicit formulas. In general, machine learning algorithms seek to minimize a cost/risk/loss function whose optima cannot be computed explicitly. In these cases, we need a plan for efficiently searching the objective function for minima. One such plan is called gradient descent.
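Before turning to Newton's method, here is a minimal sketch of gradient descent for a function of one variable. The function name GradDescent, the fixed learning rate alpha, and the stopping rule are illustrative choices of my own rather than a standard routine; the only point is the update \(x_{k+1} = x_k - \alpha f'(x_k)\).
#Minimal gradient descent sketch (illustrative): fprime is the derivative of
#the objective, alpha is a hand-picked learning rate, and D is the maximum
#difference between successive guesses permitted before terminating.
GradDescent <- function(xk, fprime, alpha, D){
  iters <- 0
  cond <- TRUE
  while(cond){
    xkp1 <- xk - alpha * fprime(xk) #step against the derivative
    d <- abs(xkp1 - xk)
    xk <- xkp1
    iters <- iters + 1
    cond <- d > D
  }
  return(list(x = xk, iters = iters))
}
#Example: GradDescent(3, function(x){x^3 - x^2}, alpha = 0.1, D = 0.001)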
A second method for finding minima is Newton's method, an algorithm for locating an extreme value of a function of one variable. The algorithm generates successive approximations of the value of \(x\) that optimizes the function, with Newton's formula giving the \((k+1)\)th approximation in terms of the \(k\)th. The general formula is
\[x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}\]
We can use Newton's method to find the minimum of the function \(f(x) = (3x^4-4x^3)/12\) on the interval \([-10, \, 10]\). Its first and second derivatives are \(f'(x) = x^3 - x^2\) and \(f''(x) = 3x^2 - 2x\), so Newton's formula becomes:
\[x_{k+1} = x_k - \frac{x^3_k-x^2_k}{3x^2_k-2x_k}\]
Implementing Newton’s formula:
#Original function:
ex1 <- function(x){
  res <- (3 * x^4 - 4 * x^3) / 12
  return(res)
}

#Newton's formula:
Newt <- function(x){
  res <- x - ((x^3 - x^2) / (3 * x^2 - 2 * x))
  return(res)
}
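As a quick sanity check (an example of my own, not part of the original assignment), a single application of Newt moves an initial guess of \(-3\) much closer to the critical points:
#One Newton step from an initial guess of -3:
Newt(-3)
## [1] -1.909091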
We can build the function NewtLoop, which uses Newt to perform Newton's algorithm for finding the value of \(x\) that minimizes \(f(x)\). When successive approximations differ by less than D, the loop stops. NewtLoop returns a list containing x, the near-optimal value of \(x\); value, the value of \(f\) at x; and iters, the number of iterations required to determine x.
#xk is the initial guess.
#D is the maximum difference between successive guesses permitted before terminating.
NewtLoop <- function(xk, D){
  res <- list()
  iters <- 0
  cond <- TRUE
  while(cond){
    xkp1 <- Newt(xk)      #next approximation
    d <- abs(xkp1 - xk)   #change from the previous approximation
    xk <- xkp1
    iters <- iters + 1
    cond <- d > D         #continue while the change exceeds D
  }
  res$x <- xk
  res$value <- ex1(xk)
  res$iters <- iters
  return(res)
}
print(NewtLoop(-3, 0.01))
## $x
## [1] -0.009076013
##
## $value
## [1] 2.509056e-07
##
## $iters
## [1] 10
With an initial guess less than \(0\), Newton's method returns a value for both \(x\) and \(f(x)\) near \(0\). Although \((0,0)\) is a critical point of the function, it is not an extremum. This is a common issue when using Newton's method to locate extrema: some other check must be used to verify that the critical points it locates are in fact optima.
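One simple check (my own addition, using the first derivative computed earlier) is to look at the sign of \(f'(x) = x^3 - x^2 = x^2(x - 1)\) on either side of the candidate point. At a minimum the derivative must change sign from negative to positive; here it is negative on both sides, so \(x = 0\) is not an extremum.
#Sign of the first derivative on either side of the candidate point
#(eps chosen by hand):
fprime <- function(x){x^2 * (x - 1)}
eps <- 0.1
fprime(0 - eps) #negative: f is decreasing to the left of 0
## [1] -0.011
fprime(0 + eps) #negative: f is still decreasing to the right of 0
## [1] -0.009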
print(NewtLoop(3, 0.01))
## $x
## [1] 1.000142
##
## $value
## [1] -0.08333332
##
## $iters
## [1] 6
With a positive initial guess, we come very close to the true global minimum of the function: the computed values are very close to the true values, \((1, \, -1/12)\).
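Here the usual second-derivative test (again my own addition) confirms that this critical point is a minimum:
#Second derivative of f at the candidate point:
fsecond <- function(x){3 * x^2 - 2 * x}
fsecond(1) #positive, so x = 1 is a local minimum of f
## [1] 1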
Newton's method is a useful starting point for understanding iterative methods for minimizing cost functions. Generalizations and variants of this process underlie machine learning algorithms across the field.