(Instructor : Nishant Panda)
We continue with root finding techniques. Today we will cover the most widely used root finding technique, called Newton's method (Newton-Raphson).
We assume that \(f\) is smooth (differentiable). Our goal is to find a root of \(f\), i.e. an \(x\) value for which \(f(x) = 0\). We illustrate the method with a simple example for which you already know the root.
First step : Plot.
x.seq <- seq(from = 0.1, to = 10, length.out = 101)
f <- sapply(x.seq, function(x) x^2 - 9)
library(ggplot2)
# put the x and y values in a data frame
plot_data <- data.frame("x" = x.seq, "f" = f)
# create the base layer of ggplot with geom_line
plot_f <- ggplot(plot_data, aes(x = x, y = f)) +
  geom_line(color = "red", size = 1.0) +
  labs(x = "x", y = "f(x) = x^2 - 9")
# beautify: increase font size in axes
plot_f + theme(axis.text.x = element_text(face = "bold", size = 12),
               axis.text.y = element_text(face = "bold", size = 12),
               axis.line = element_line(colour = "black",
                                        size = 1, linetype = "solid"))
Looks like the function is 0 around 3. This matches up with what we expect. Let us zoom in.
plot_f +
geom_hline(yintercept = 0.00, color = "green", size = 1) +
coord_cartesian(xlim = c(2,4), ylim = c(-1, 3))
Newton's method starts with an initial guess and updates its guess by using the derivative of \(f\). The derivative of \(f\) is given by \[ f'(x) = 2x \]
The equation of the tangent line to \(f\) at a point \(a\) is given by \[ y = f'(a)(x - a) + f(a). \] Can you derive this equation using the point-slope form of a line?
This is how Newton’s method works.
Since \(f\) is approximately equal to its tangent line \(y\) at the initial guess \(x^{(0)}\), \(f\) will be approximately \(0\) where \(y\) is \(0\). By setting \(y\) equal to \(0\) we get our updated guess \[ x^{(1)} = x^{(0)} - \frac{f\left(x^{(0)}\right)}{f'\left(x^{(0)}\right)} \]
In general starting at guess \(x^{(n)}\), our updated guess is \[ x^{(n+1)} = x^{(n)} - \frac{f\left(x^{(n)}\right)}{f'\left(x^{(n)}\right)} \]
We stop if \[ \lvert x^{(n+1)} - x^{(n)} \rvert < \epsilon, \] or after a pre-set maximum number of iterations \(M\) in case our guesses diverge.
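Before visualizing the iteration, here is a minimal sketch of this update rule as R code; the function name newton_1d and its defaults are our own illustrative choices, not part of the lecture.
# a minimal sketch of the 1-D Newton iteration (illustrative, not the package version used later)
newton_1d <- function(f, df, x0, eps = 1e-8, M = 100){
  x <- x0
  for(n in 1:M){
    # update the guess using the tangent line at the current guess
    x_new <- x - f(x)/df(x)
    # stop once successive guesses differ by less than eps
    if(abs(x_new - x) < eps) return(x_new)
    x <- x_new
  }
  x  # return the last guess if the tolerance was never met
}
# for f(x) = x^2 - 9 with starting guess 4 this should approach 3
newton_1d(function(x) x^2 - 9, function(x) 2*x, x0 = 4)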
Let us visualize this to find the root of our function \(f(x) = x^2 - 9\).
First we need to create a function that returns the tangent line to the function \(f\) at some point \(a\).
# Let us create our tangent line function at some point a
f_tangent <- function(x, f, df, a){
  # We need the function and its derivative df
  # a is the point where we create the tangent to the function
  # returns the tangent line
  df(a)*(x - a) + f(a)
}
We plot the tangent line to \(f\) at \(x^{(0)} = 4\) which will be our first guess of the root.
# add the tangent line at 4 to our data frame
plot_data$f.tan <- sapply(plot_data$x, f_tangent, f = function(x) x^2 - 9, df = function(x) 2*x, a = 4)
# create the base layer of ggplot with geom_line
plot_tangent_line <- ggplot(plot_data, aes(x = x)) +
  geom_line(aes(y = f), color = "red", size = 0.95) +
  geom_line(aes(y = f.tan), color = "blue", size = 0.95) +
  labs(x = "x", y = "f(x) and tangent line at 4")
plot_tangent_line
Now let us zoom in on \(2\leq x \leq 4\) and try to see where the tangent line is equal to \(0\).
# zoom in between 2 and 4
plot_tangent_line +
# add a horizontal line at 0
geom_hline(yintercept = 0.00, color = "green", size = 1) +
coord_cartesian(xlim = c(2,4), ylim = c(-5, 10))
Looks like the tangent line is \(0\) around \(3.1\). This will be our updated guess at the root. Understand the rationale behind this. If \(f\) is approximately equal to its tangent line at \(4\), then the root of \(f\) is approximately equal to the root of the tangent line. Can you see this in the plot above?
The exact point where the tangent line is \(0\) is given by \[ x^{(1)} = 4 - \frac{f\left(4\right)}{f'\left(4\right)} = 3.125 \]
# add the tangent line at 3.125 to our data frame
plot_data$f.tan <- sapply(plot_data$x, f_tangent, f = function(x) x^2 - 9, df = function(x) 2*x, a = 3.125)
# create the base layer of ggplot with geom_line
plot_tangent_line <- ggplot(plot_data, aes(x = x)) +
  geom_line(aes(y = f), color = "red", size = 0.95) +
  geom_line(aes(y = f.tan), color = "blue", size = 0.95) +
  labs(x = "x", y = "f(x) and tangent line at 3.125")
plot_tangent_line
Now let us zoom in around \(3\) and try to see where the tangent line is equal to \(0\).
# zoom in between 2.9 and 3.1
plot_tangent_line +
# add a horizontal line at 0
geom_hline(yintercept = 0.00, color = "green", size = 1) +
coord_cartesian(xlim = c(2.9,3.1), ylim = c(-2.5, 2.5))
Looks like the tangent line is \(0\) around \(3\). In just two steps we are very close. This will be our updated guess at the root. The exact point where the tangent line is \(0\) is given by \[
x^{(2)} = 3.125 - \frac{f\left(3.125\right)}{f'\left(3.125\right)}
= 3.0025
\]
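We can verify these two updates with direct arithmetic in R (a quick sanity check added here, not part of the original notes):
x1 <- 4 - (4^2 - 9)/(2*4)      # first update: 3.125
x2 <- x1 - (x1^2 - 9)/(2*x1)   # second update: 3.0025
c(x1, x2)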
We continue in this way until the change between successive guesses falls below our tolerance \(\epsilon\).
# Newton's method from package NLRoot
library(NLRoot)
# first argument is the function whose root we need to find
# second argument is the derivative of the function whose
# root we need to find.
# third argument is an initial guess
NIMfzero(function(x) x^2-9, function(x) 2*x, x0 = 4)
## [1] 3
## [1] 1.081801e-12
## [1] "finding root is successful"
Please read “Failure Analysis” in https://en.wikipedia.org/wiki/Newton%27s_method
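As a quick illustration of one failure mode (an extra example, not taken from the reading): if the derivative at the current guess is \(0\), the Newton step divides by zero. For our \(f(x) = x^2 - 9\) this happens at the starting guess \(x^{(0)} = 0\).
# f'(x) = 2x is 0 at x = 0, so the very first update is undefined
x0 <- 0
x0 - (x0^2 - 9)/(2*x0)  # Inf in R: the iteration breaks down immediately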
Recall that our motivation for studying root finding was to determine maxima. In other words, if \(g(x)\) is the function that we need to optimize, then we look for the roots of the function \(f(x) = g'(x)\). In order to use Newton's method we need \(f'(x) = g''(x)\), and thus we need our function \(g\) to be at least twice differentiable.
With this in mind, our update equation for the bisection method is \[ x^{(n+1)} = \frac{a_{n+1} + b_{n+1}}{2}, \] where \[ \left[a_{n+1}, b_{n+1}\right] = \begin{cases} \left[a_{n}, x^{(n)}\right], & \text{if } g'(a_{n})\,g'(x^{(n)}) \leq 0 \\ \left[x^{(n)}, b_{n}\right], & \text{if } g'(a_{n})\,g'(x^{(n)}) > 0 \end{cases} \] and our update equation for Newton's method is \[ x^{(n+1)} = x^{(n)} - \frac{g'\left(x^{(n)}\right)}{g''\left(x^{(n)}\right)} \]
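As a small illustration (using an assumed example function of our own, not one from the lecture), take \(g(x) = 9x - x^3/3\), so that \(g'(x) = 9 - x^2\) and \(g''(x) = -2x\). Newton's method for the optimum of this \(g\) produces the same iterates we saw above when finding the root of \(x^2 - 9\):
g1   <- function(x) 9*x - x^3/3   # assumed example function
dg1  <- function(x) 9 - x^2       # g1'
d2g1 <- function(x) -2*x          # g1''
x <- 4
for(i in 1:5){
  # Newton update for the optimum: x - g'(x)/g''(x)
  x <- x - dg1(x)/d2g1(x)
}
c(x, d2g1(x))  # x is close to 3, and g1''(3) = -6 < 0, so this is a maximum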
So far we have talked about optimizing functions whose input space is a subset of the real numbers. We now extend this to real-valued functions that take a vector as input.
Let \(g\) be a function that takes an input vector \(\mathbf{x} = (x_1, x_2, \dots, x_p)\) in \(\mathbb{R}^{p}\) and gives a real number as output. We wish to find the input vector \(\mathbf{x}^{\ast}\) where \(g\) is maximum (or minimum!). Let \(\nabla{g}(\mathbf{x})_{p\times 1}\) be the gradient vector (the first derivative (\(g'(\mathbf{x})\))) and let \(H_{p\times p}(\mathbf{x})\) be the Hessian matrix (the second derivative \(g''(\mathbf{x})\)). If \(p = 1\), we are at the familiar setting of a real valued input space.
The following steps outline Newton’s method for finding the maxima \(\mathbf{x}^{\ast}\) (optima) of \(g(\mathbf{x})\).
Let \(\mathbf{x}^{(n)}\) be the current guess for the optima \(\mathbf{x}^{\ast}\). Approximate \(g(\mathbf{x}^{\ast})\) by the second-order Taylor expansion \[ g(\mathbf{x}^{\ast}) \approx g(\mathbf{x}^{(n)}) + (\mathbf{x}^{\ast} - \mathbf{x}^{(n)})^{T}\nabla{g}(\mathbf{x}^{(n)})_{p\times 1} + \frac{1}{2}(\mathbf{x}^{\ast} - \mathbf{x}^{(n)})^{T}H_{p\times p}(\mathbf{x}^{(n)})(\mathbf{x}^{\ast} - \mathbf{x}^{(n)}). \]
Take the gradient of this approximation with respect to \(\mathbf{x}^{\ast}\); at the optimum it is approximately \(0\) (why?): \[ \nabla{g}(\mathbf{x}^{(n)})_{p\times 1} + H_{p\times p}(\mathbf{x}^{(n)})(\mathbf{x}^{\ast} - \mathbf{x}^{(n)}) \approx 0 \]
Solve for \(\mathbf{x}^{(n+1)}\) by setting the expression above exactly to \(0\) and replacing \(\mathbf{x}^{\ast}\) by \(\mathbf{x}^{(n+1)}\). This amounts to solving the linear system \[ A_{p\times p}\mathbf{x}^{(n+1)} = b_{p\times 1}, \] where \(A_{p\times p} = H_{p\times p}(\mathbf{x}^{(n)})\) is the Hessian matrix at \(\mathbf{x}^{(n)}\) and \(b_{p\times 1} = H_{p\times p}(\mathbf{x}^{(n)})\mathbf{x}^{(n)} - \nabla{g}(\mathbf{x}^{(n)})_{p\times 1}\).
Thus your updated guess \(\mathbf{x}^{(n + 1)}\) is \[ \mathbf{x}^{(n + 1)} = \mathbf{x}^{(n)} - H_{p\times p}^{-1}(\mathbf{x}^{(n)})\nabla{g}(\mathbf{x}^{(n)})_{p\times 1} \] (Note that in real life we never compute the inverse of a matrix unless \(p\) is very small!)
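As a minimal sketch of a single update step (assuming the numDeriv package, which we also use below, for the gradient and Hessian), we solve a linear system with the Hessian instead of inverting it; this is algebraically the same as solving \(A\mathbf{x}^{(n+1)} = b\) above.
library(numDeriv)
# one Newton step for a vector-input function g at the current guess x
newton_step <- function(g, x){
  H <- hessian(g, x)          # p x p Hessian at x
  # solve H * d = gradient rather than forming the inverse of H
  x - solve(H, grad(g, x))
}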
We illustrate this with the function \[ g(x_1, x_2) = 10 - (x_1^2 + x_2 - 11)^2 - (x_1 + x_2^2 - 7)^2. \] Note that \(g\) is a constant minus a sum of two squares. Squares are always non-negative. This means that an upper bound for \(g\) is given by \(10\). Now \(g\) can attain \(10\) if and only if both square terms are \(0\) at the same vector \(\mathbf{x} = (x_1, x_2)\). Trial and error shows that the two square terms are both \(0\) at \((3,2)\). Hence, without using optimization we found a maximum!
First step: Plot!
Use your functions (contour2d, surface3d) from the Home Assignment in Lecture 1 to plot the contours and the surface.
contour2d(10 - (x^2 + y -11)^2 - (x + y^2 - 7)^2, -5, 5, -5, 5, 101)
surface3d(10 - (x^2 + y -11)^2 - (x + y^2 - 7)^2, -5, 5, -5, 5, 101, phi = 30, theta = 30 )
Now we write our own function to find a maximum. Our function will need the gradient and the Hessian of \(g\) at each guess, and a norm to measure the change between successive guesses. We will use R package functions for these (why reinvent the wheel?).
g <- function(x){
# x is a vector in R2
10 - (x[1]^2 + x[2] -11)^2 - (x[1] + x[2]^2 - 7)^2
}
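A quick sanity check (our own addition): \(g\) should attain its upper bound of \(10\) at \((3, 2)\).
g(c(3, 2))  # should be exactly 10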
library(numDeriv)
# Let us check the gradient and Hessian at (3,2)
a <- c(3,2)
round(grad(g, a), 5)
## [1] 0 0
round(hessian(g, a), 5)
## [,1] [,2]
## [1,] -74 -20
## [2,] -20 -34
Please check these by hand calculation! (Note that we found by investigation that the maximum is at \((3,2)\); thus the gradient is \(0\) there!)
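If you want to compare your hand calculation against R, here is a sketch of the analytic gradient obtained by the chain rule (the helper name grad_g is our own):
grad_g <- function(x){
  u <- x[1]^2 + x[2] - 11
  v <- x[1] + x[2]^2 - 7
  c(-2*u*(2*x[1]) - 2*v,       # dg/dx1
    -2*u - 2*v*(2*x[2]))       # dg/dx2
}
grad_g(c(3, 2))  # should be c(0, 0), matching round(grad(g, a), 5) above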
# Let us check the "two" norm
# We know that the "two" norm of (1,1) is sqrt(2).
norm(c(1,1), type = "2")
## [1] 1.414214
# our optimize function
# by default we will have eps = 1e-5 and norm type is Euclidean
# by default our max iteration number M will be 10000
my_optim_newtons <- function(g, norm.type = "2", eps = 1e-5, x0, max.iter = 1e4){
  library(numDeriv)
  # g is the function to optimize
  # x0 is our starting guess
  # set your iteration counter to 0
  iter.count <- 0
  # set your initial error to a large number
  error <- 100
  # set your current guess to x0
  x <- x0
  while(error > eps){
    # HW: incorporate this check
    # if current iteration is above max.iter, e.g.
    # if(iter.count >= max.iter) break
    # update your guess by solving the Newton linear system A x_new = b
    A <- hessian(g, x)
    b <- A %*% x - grad(g, x)
    x_new <- solve(A, b)
    # get the error (change between successive guesses)
    error <- norm((x_new - x), type = norm.type)
    # update your iteration counter
    iter.count <- iter.count + 1
    # set the current x to the new guess
    x <- x_new
  }
  # HW: how would you modify the return statement if you run out of iterations (max.iter is reached)?
  return(x)
}
# Let us try a starting value x0 = (2,3)
x <- my_optim_newtons(g, x0 = c(2,3))
print(x)
## [,1]
## [1,] 3
## [2,] 2
g(x)
## [1] 10
This is indeed the maximum we discovered earlier.
# what if we try x0 = (-3,2)
x <- my_optim_newtons(g, x0 = c(-3,2))
print(x)
## [,1]
## [1,] -2.805118
## [2,] 3.131313
g(x)
## [1] 10
We found a different maximum!
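Keep in mind that Newton's method only looks for points where the gradient is \(0\), so it is worth confirming that what we found really is a maximum. One quick check (our own addition) is that the Hessian there is negative definite, i.e. all of its eigenvalues are negative:
# eigenvalues of the Hessian at the point we found: all should be negative
eigen(hessian(g, as.vector(x)))$values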