2023-02-08

What is Gradient Descent?

Gradient descent is an iterative optimization algorithm used to find a local minimum (or, run in reverse, a local maximum) of a differentiable function.

The goal of this technique is to minimize cost/loss functions in machine learning and deep learning.

Gradient descent is often used for fitting linear regression models, but it can also be used to find the minima and maxima of arbitrary functions – this is what we will explore in this presentation.

How does Gradient Descent Work?

For some function \(f(\mathbf{x})\) that takes a vector input, \(\mathbf{x}\), we start with an initial guess of the minimum, \(\mathbf{x_0}\), and iteratively update our guess as follows.

The formula for gradient descent:

\[ \mathbf{x_{i+1}} = \mathbf{x_i} - \alpha \nabla f(\mathbf{x_i}), \]

where \(\alpha\) is a small positive step size called the “learning rate”, typically with \(\alpha < 1\).

Our stopping criteria is defined as:

\[ \|\mathbf{x_{i+1}} - \mathbf{x_i}\| < \text{tolerance}. \]

In case the function has no minimum or maximum, a maximum number of iterations is also set in order to avoid looping infinitely.
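
For instance, take the one-variable function used in the demo later in these slides, \(f(x) = 0.5x^2 - 4\), whose gradient is \(\nabla f(x) = x\). Starting from \(x_0 = 4.5\) with \(\alpha = 0.2\), the first update is:

\[ x_1 = x_0 - \alpha \nabla f(x_0) = 4.5 - 0.2 \cdot 4.5 = 3.6, \]

which already moves the guess toward the minimum at \(x = 0\).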

Understanding the Gradient Vector

The gradient vector collects the partial derivatives of a function with respect to each of its variables into a single vector.

For example: \[ \nabla f(x, y, \dots) = \begin{bmatrix} \partial f / \partial x \\ \partial f / \partial y \\ \vdots \\ \end{bmatrix} \]

Taking the gradient vector at a point gives the direction of the greatest increase in \(f\) at that point.

This gradient is then multiplied by the learning rate, \(\alpha\), and subtracted from the current point to obtain a new point slightly closer to a local minimum. Adding instead of subtracting moves toward a local maximum instead (gradient ascent).
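
To make the gradient concrete in code, here is a minimal sketch of approximating it numerically with central finite differences (the helper name num_grad and the step size h are our own illustrative choices, not part of the implementation later in these slides):

#Approximate the gradient of f at the point x with central differences
num_grad <- function(f, x, h = 1e-6) {
  sapply(seq_along(x), function(i) {
    step <- rep(0, length(x))
    step[i] <- h
    (f(x + step) - f(x - step)) / (2 * h)
  })
}

#Example: f(x, y) = -x^2 - y^2 has gradient (-2x, -2y)
num_grad(function(v) -v[1]^2 - v[2]^2, c(1, 2))
#Returns approximately c(-2, -4)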

Gradient Descent in R on \(f(x) = 0.5x^2 - 4\)

library(ggplot2)

#Set f to the desired function
f <- function(x) {
  0.5 * x^2 - 4
}

#Choose range of x and put into a data frame
x <- -5:5
df <- data.frame(x)

#Use ggplot2 to graph f(x) over that range
ggplot(df, aes(x)) +
  stat_function(fun = f) + ylim(-5, 5)
#Initial graph of f(x) on next slide

Initial Graph of \(f(x) = 0.5x^2 - 4\)

Gradient Descent Implementation

#Derivative of f; for f(x) = 0.5x^2 - 4, f'(x) = x
f_x <- function(x) {
  x
}

GD <- function(x, alpha, maxIters, tol) {
  #Data frame of visited points, starting at the initial guess
  dframe <- data.frame(xs = x, ys = f(x))
  
  for (i in seq_len(maxIters)) {
    #Determine new x and y values by stepping against the gradient
    delta_x <- alpha * f_x(x)
    new_x <- x - delta_x
    new_y <- f(new_x)
    
    #Add row to data frame with new x and y values
    dframe[nrow(dframe) + 1, ] <- c(new_x, new_y)
    
    #Calculate size of change and check against tolerance
    change <- abs(new_x - x)
    if (change < tol) { break }
    
    #Update x for next iteration
    x <- new_x
  }
  return(dframe)
}

Demo on \(f(x) = 0.5x^2 - 4\)

dframe <- GD(x = 4.5, alpha = 0.2, maxIters = 100, tol = 0.0001)

ggplot(dframe, aes(x = xs, y = ys)) +
  geom_function(fun = f) +
  geom_point(colour = "green") + xlim(-5, 5) + ylim(-5, 5)
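
Because \(\nabla f(x) = x\) here, each update in this demo simplifies to a fixed geometric shrink:

\[ x_{i+1} = x_i - 0.2\,x_i = 0.8\,x_i, \]

so the green points march from \(x_0 = 4.5\) down the curve toward the minimum \(f(0) = -4\).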

Maximum Point on Multivariable Function

Gradient descent also applies to multivariable functions. Running the ascent variant (adding the scaled gradient rather than subtracting it), the red point indicates the maximum of the function \(f(x,y)=-x^2-y^2\), which sits at the origin.
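
To sketch how this works in code, here is a minimal two-variable gradient ascent in the style of the GD function above, with the gradient \(\nabla f(x,y) = (-2x, -2y)\) hard-coded; the names GA, f2, and grad_f2 and the starting point are our own illustrative choices:

#f(x, y) = -x^2 - y^2 and its gradient, with points stored as length-2 vectors
f2 <- function(p) { -p[1]^2 - p[2]^2 }
grad_f2 <- function(p) { c(-2 * p[1], -2 * p[2]) }

#Gradient ascent: add the scaled gradient instead of subtracting it
GA <- function(p, alpha, maxIters, tol) {
  for (i in seq_len(maxIters)) {
    new_p <- p + alpha * grad_f2(p)
    #Stop once the step length falls below the tolerance
    if (sqrt(sum((new_p - p)^2)) < tol) { return(new_p) }
    p <- new_p
  }
  return(p)
}

GA(p = c(3, -2), alpha = 0.2, maxIters = 100, tol = 0.0001)
#Converges to approximately c(0, 0), the red maximum point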