(Instructor : Nishant Panda)
Last class we talked about the maximum likelihood estimate. To find the maxima (points where the maximum occurs) of the log likelihood \(l(\theta)\), we had to solve \(\nabla l(\theta) = \textbf{0}\). This is a common task in statistics and falls under a large class of numerical analysis techniques called root finding.
Let us get back to the case of a real-valued function with input space in \(\mathbb{R}\). Specifically, let \(g(x)\) be a smooth function whose maximum we wish to find by solving the equation \[g'(x) = 0.\]
Note that we haven't specified a domain for \(x\). We are looking for a maximum \(x^{\ast}\) over the whole input space of \(g\) (wherever it is defined). Our running example will be \[g(x) = \frac{\log x}{1 + x},\] which is defined for \(x > 0\).
First step: plot.
library(ggplot2)

x.seq <- seq(from = 0.1, to = 10, length.out = 101)
# evaluate g(x) = log(x)/(1 + x) on the grid
g <- sapply(x.seq, function(x) log(x)/(1 + x))
# put the x and y values in a data frame
plot_data <- data.frame("x" = x.seq, "g" = g)
# create the base layer of ggplot with geom_line
plot_g <- ggplot(plot_data, aes(x = x, y = g)) +
  geom_line(color = "red", size = 1.5) +
  labs(x = "x", y = "g(x)")
# beautify: increase font size in axes
plot_g + theme(axis.text.x = element_text(face = "bold", size = 12),
               axis.text.y = element_text(face = "bold", size = 12),
               axis.line = element_line(colour = "black",
                                        size = 1, linetype = "solid"))
Most of the “action” seems to be around \(1\leq x \leq 4\). Let us zoom in.
plot_g + coord_cartesian(xlim = c(1,4), ylim = c(-0.25, 0.35))
Looks like the maximum \(x^{\ast}\) is between \(3\) and \(4\).
If so, then \(g'(x)\) must be \(0\) somewhere between \(3\) and \(4\). The derivative of \(g\) is given by (the quotient rule) \[ g'(x) = \frac{1}{x(1+x)} - \frac{\log(x)}{(1+x)^2} = \frac{(1+x) - x\log x}{x(1+x)^2}. \]
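As a quick sanity check on the algebra, we can compare this formula against a central-difference approximation (a minimal sketch; the helpers num.deriv, g.fun and dg.exact are ours, just for this check):
# central-difference approximation of a derivative
num.deriv <- function(f, x, h = 1e-6) (f(x + h) - f(x - h))/(2*h)
g.fun <- function(x) log(x)/(1 + x)
dg.exact <- function(x) ((1 + x) - x*log(x))/(x*(1 + x)^2)
# the difference should be essentially 0
num.deriv(g.fun, 2) - dg.exact(2)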
Let us plot \(g'(x)\).
# Let us create our g'(x) function
dg.dx <- function(x){
# returns the derivative of g at x
((1 + x) - x*log(x))/(x*(1+x)^2)
}
plot_data$dg.dx <- sapply(plot_data$x, dg.dx)
# create the base layer of ggplot with geom_line
plot_dg.dx <- ggplot(plot_data, aes(x = x, y = dg.dx)) +
geom_line(color = "blue", size = 0.95) +
labs(x = "x", y = "g'(x)")
plot_dg.dx
Now let us zoom in on \(3\leq x \leq 4\) and try to see where \(g'(x)\) is equal to \(0\).
# zoom in between 3 and 4
plot_dg.dx +
# add a horizontal line at 0
geom_hline(yintercept = 0.00, color = "green", size = 1) +
coord_cartesian(xlim = c(3,4), ylim = c(-0.025, 0.025))
Two key things to observe: \(g'\) changes sign from positive to negative between \(3\) and \(4\), and it appears to cross \(0\) near \(x \approx 3.6\).
We will study three important iterative root finding algorithms. These are
1. Bisection method
2. Newton-Raphson method
3. Secant method
Let us check if indeed \(g'(x) = 0\) at \(x\approx 3.60\). We will use the package “NLRoot”.
library(NLRoot)
BFfzero(dg.dx, 3.25, 3.75, eps = 1e-5)
## [1] 1
## [1] 3.591122
## [1] -1.621162e-08
## [1] "finding root is successful"
(Man! Was I close or what!)
How does the bisection method work? What did we supply to the BFfzero function?
The bisection method is an example of an iterative method. An iterative method starts by guessing (can you believe this!) and then successively refines its guess until it finds the right answer. Unfortunately, there is no guarantee it will find an answer!
Here is how the bisection method works.
First, the function \(f\) needs to be continuous on an interval \(\left[a, b\right]\).
Second, either \(f(a) < 0, f(b) > 0\) or \(f(a) > 0, f(b) < 0\). In other words, the function has different signs at the lower and upper ends. For the sake of simplicity, assume \(f(a) < 0\) and \(f(b) > 0\).
Then it makes its first guess of where \(f(x) = 0\). Can you think of what this guess could be?
It makes a first guess of the root to be \(\frac{a + b}{2}\)! Well, this is intuitive. If the function has different signs at the lower and upper ends, it must be the case (because the function is continuous!) that somewhere in between the function must become \(0\)!
If \(f(\frac{a+b}{2}) = 0\), then TADA! we found the root. If not, only two things can happen: \(f(\frac{a+b}{2}) > 0\) or \(f(\frac{a+b}{2}) < 0\). If it is greater than \(0\), then just update your upper end to be \(\frac{a + b}{2}\), i.e., check for the root in \((a, \frac{a+b}{2})\). If it is less than \(0\), then just update your lower end to \(\frac{a + b}{2}\), i.e., check for the root in \((\frac{a+b}{2}, b)\).
Repeat!
Note that three things get updated every time the algorithm tries to improve its guess: the root \(x\), the lower end \(a\), and the upper end \(b\). To keep track of these updates we index these quantities by \(n\). Hence \(x^{(n)}\) is our \(n^{th}\) guess of the root and \(\left[a_n,b_n\right]\) is the \(n^{th}\) guess at our lower and upper end.
Let us make this a bit rigorous. Let \(f\) be the function whose root we want to find. A math result: if \(f\) is continuous on \(\left[a,b\right]\) and \(f(a)f(b) \leq 0\), then the intermediate value theorem implies that there is at least one point \(x^{\ast} \in \left[a,b\right]\) for which \(f(x^{\ast}) = 0\).
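For our problem, \(f = g'\), and we can check this premise on \(\left[3, 4\right]\) directly:
# check the sign condition for f = g' on [3, 4]
dg.dx(3) * dg.dx(4) <= 0  # TRUE: a root is guaranteed in [3, 4]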
Bisection method algorithm:
Initial guesses: Let \(\left[ a_0, b_0\right]\) be the initial lower and upper bounds, where \(a_0 = a\) and \(b_0 = b\). Let \(x^{(0)}\) be our first guess of the root, where \[ x^{(0)} = \frac{a_0 + b_0}{2}. \]
Our \((n+1)^{th}\) updated guess, starting with \(n=0\), will be \[ \left[a_{n+1}, b_{n+1}\right] = \begin{cases} \left[a_{n}, x^{(n)}\right], & \text{if } f(a_{n})f(x^{(n)}) \leq 0 \\ \left[x^{(n)}, b_{n}\right], & \text{if } f(a_{n})f(x^{(n)}) > 0 \end{cases} \] and our updated guess for the root will be \[ x^{(n+1)} = \frac{a_{n+1} + b_{n+1}}{2}. \] How long do we keep guessing? We need a stopping criterion.
We stop when successive guesses get close enough to each other. Let \(\epsilon\) be our tolerance. We stop at the \((n+1)^{th}\) guess if \[ \lvert x^{(n+1)} - x^{(n)} \rvert < \epsilon. \] If the guesses fail to converge, we give up at some pre-set guess level \(M\). We can have different stopping criteria. A common one is the relative error: we stop if \[ \left\lvert \frac{x^{(n+1)} - x^{(n)}}{x^{(n)}} \right\rvert < \epsilon. \]
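Putting the pieces together, here is a minimal R sketch of this algorithm (the function bisection and its defaults are ours, not part of NLRoot):
# bisection method: find a root of f in [a, b]
bisection <- function(f, a, b, eps = 1e-8, M = 100) {
  stopifnot(f(a) * f(b) <= 0)  # the intermediate value theorem premise
  x.n <- (a + b)/2             # initial guess x^(0)
  for (n in 1:M) {
    # keep the half of the bracket where f changes sign
    if (f(a) * f(x.n) <= 0) b <- x.n else a <- x.n
    x.n1 <- (a + b)/2          # updated guess x^(n+1)
    if (abs(x.n1 - x.n) < eps) return(x.n1)  # stopping criterion
    x.n <- x.n1
  }
  x.n  # give up at the pre-set guess level M
}
bisection(dg.dx, 3, 4)  # approximately 3.591122, matching BFfzero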
Let us visualize the bisection algorithm.
# zoom in between 3.0 and 4.0
plot_dg.dx +
# add a horizontal line at 0
geom_hline(yintercept = 0.00, color = "green", size = 1) +
coord_cartesian(xlim = c(3.0,4.0), ylim = c(-0.025, 0.025))
Lower end is \(3\), upper end is \(4\), our first guess for the root will be \(3.5 = \frac{3 + 4}{2}\).
# zoom in between 3.0 and 4.0
plot_dg.dx +
# add a horizontal line at 0
geom_hline(yintercept = 0.00, color = "green", size = 1) +
geom_vline(xintercept = 3.50) +
coord_cartesian(xlim = c(3.0,4.0), ylim = c(-0.025, 0.025))
What will be the new upper and lower ends? What will be the new guess for the root?
# zoom in between 3.50 and 4.0
plot_dg.dx +
# add a horizontal line at 0
geom_hline(yintercept = 0.00, color = "green", size = 1) +
geom_vline(xintercept = 3.50) +
coord_cartesian(xlim = c(3.50,4.0), ylim = c(-0.025, 0.025))
Lower end is \(3.5\), upper end is \(4\), our second guess for the root will be \(3.75 = \frac{3.5 + 4}{2}\).
# zoom in between 3.50 and 4.0
plot_dg.dx +
# add a horizontal line at 0
geom_hline(yintercept = 0.00, color = "green", size = 1) +
geom_vline(xintercept = 3.75) +
coord_cartesian(xlim = c(3.50,4.0), ylim = c(-0.025, 0.025))
Lower end is \(3.5\), upper end is \(3.75\), our third guess for the root will be \(3.625 = \frac{3.5 + 3.75}{2}\).
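We can reproduce these first three guesses numerically with a short loop (a quick sketch using dg.dx from above):
# trace the first three bisection guesses on [3, 4]
a <- 3; b <- 4
for (n in 1:3) {
  x <- (a + b)/2
  cat("guess", n, ":", x, "\n")
  if (dg.dx(a) * dg.dx(x) <= 0) b <- x else a <- x
}
## guess 1 : 3.5
## guess 2 : 3.75
## guess 3 : 3.625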
# zoom in between 3.50 and 3.75
plot_dg.dx +
# add a horizontal line at 0
geom_hline(yintercept = 0.00, color = "green", size = 1) +
geom_vline(xintercept = 3.625) +
coord_cartesian(xlim = c(3.50,3.75), ylim = c(-0.025, 0.025))
Newton's method (and its variants) takes the cake as far as usage goes in root finding. It is faster than the bisection method and is well suited to root finding in higher dimensions (i.e., when the input space is a vector space).
To find the root of a function \(f\) using Newton's method, we need to know the derivative of \(f\), i.e., \(f'\), and an initial guess that is “close” to the root.
Probably the best illustration of how this method works is in https://en.wikipedia.org/wiki/Newton%27s_method#/media/File:NewtonIteration_Ani.gif
The main idea is to approximate \(f\) by its tangent at a point.
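Concretely, the tangent line at the current guess \(x^{(n)}\) is \[ f(x) \approx f(x^{(n)}) + f'(x^{(n)})\left(x - x^{(n)}\right), \] and setting this approximation to \(0\) gives the update \[ x^{(n+1)} = x^{(n)} - \frac{f(x^{(n)})}{f'(x^{(n)})}. \] Here is a minimal R sketch of this iteration (the function newton is ours; applying it to \(g'\) would additionally require \(g''\), so we illustrate on \(f(x) = x^2 - 2\), whose positive root is \(\sqrt{2}\)):
# Newton-Raphson iteration: repeatedly follow the tangent line to its root
newton <- function(f, df, x0, eps = 1e-8, M = 100) {
  x <- x0
  for (n in 1:M) {
    x.new <- x - f(x)/df(x)   # root of the tangent at x
    if (abs(x.new - x) < eps) return(x.new)
    x <- x.new
  }
  x  # give up after M iterations
}
newton(function(x) x^2 - 2, function(x) 2*x, x0 = 1)  # approximately 1.414214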