As we learned in DAT 300, Gradient Descent (GD) and ascent algorithms allow us to find local extrema of functions.
They work by evaluating the gradient vector at a particular point on the graph of a function and moving in discrete steps: against the gradient for descent, along it for ascent. Each step is the gradient scaled by alpha (also known as the learning rate), and the algorithm stops once the gradient falls within a user-defined threshold range centered around zero. In predictive models, cost functions calculate error, so finding their local minima allows us to optimize our models.
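To make the update rule concrete, here is a minimal sketch of gradient descent on a function of one variable. The function name `gradient_descent`, its parameters, and the example function f(x) = (x - 3)^2 are our own illustrative assumptions, not something taken from the course material.

```python
def gradient_descent(grad, x0, alpha=0.1, tol=1e-6, max_iters=10_000):
    """Follow the negative gradient from x0 until the gradient is near zero.

    grad      -- function returning the derivative at x (assumed known)
    alpha     -- learning rate, i.e. the scale applied to each step
    tol       -- user-defined threshold around zero for the gradient
    max_iters -- safety cap so the loop always terminates
    """
    x = x0
    for _ in range(max_iters):
        g = grad(x)
        if abs(g) < tol:       # gradient is within the threshold: stop
            break
        x = x - alpha * g      # step against the gradient (descent)
    return x


# Illustrative example: f(x) = (x - 3)^2 has derivative f'(x) = 2(x - 3)
# and a single minimum at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)
```

Flipping the sign of the update (`x + alpha * g`) would give gradient ascent instead. If alpha is too large the steps can overshoot the minimum and diverge, which is why the sketch also caps the number of iterations.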
GD is of particular import to machine learning, as it is used to optimize a function “to minimize the sum of squares of the errors” (Cormen 1035). In other words, it can fit a linear regression model by choosing the coefficients that make the squared error as small as possible.
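As a rough illustration of that idea, the sketch below uses gradient descent to fit a line y = mx + b to a handful of points. The function `fit_line`, the toy data, and the choice of learning rate are all assumptions made for the example. It minimizes the mean of the squared errors, which has the same minimizer as the sum because the two differ only by a constant factor.

```python
def fit_line(xs, ys, alpha=0.05, tol=1e-9, max_iters=100_000):
    """Fit y = m*x + b by gradient descent on the mean squared error."""
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(max_iters):
        # Error of the current line at each data point.
        errs = [m * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error w.r.t. m and b.
        grad_m = (2 / n) * sum(e * x for e, x in zip(errs, xs))
        grad_b = (2 / n) * sum(errs)
        # Stop once both partial derivatives are within the threshold.
        if abs(grad_m) < tol and abs(grad_b) < tol:
            break
        m -= alpha * grad_m
        b -= alpha * grad_b
    return m, b


# Toy data lying exactly on the line y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

On this data the fitted slope and intercept should come out very close to 2 and 1. With noisy data the same loop would settle on the least-squares line rather than an exact fit.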