1. Definition

Gradient descent is a first-order optimization algorithm (it uses only the first derivative) for finding a local minimum of a function.

Goal: by repeatedly stepping in the direction opposite the gradient, gradient descent lets a machine learning algorithm find the parameter values at which the prediction model (your hypothesis function) has the lowest error (cost).
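
Concretely, each iteration moves the current estimate a small step against the derivative:

\[ x_{new} = x_{old} - \alpha \, f'(x_{old}) \]

where \(\alpha\) is the learning rate (step size). The code below implements exactly this update.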

2. Math Procedure of Gradient Descent

Example: find a local minimum of the function \[f(x)=x^4-3x^3-x^2\], which has derivative \[f'(x)=4x^3-9x^2-2x\]

x_old <- 0        # initial value; only needs to differ from x_new so the loop starts
x_new <- 6        # the algorithm starts at x = 6
alpha <- 0.01     # learning rate (step size)

x_set <- c()   # to store the x coordinate of each step
y_set <- c()   # to store f(x) at each step

precision <- 0.00001   # stop when f changes by less than this between steps

# Original function
f <- function(x){
  x^4 - 3*x^3 - x^2
}

# Its derivative
f_derivative <- function(x){
  4*x^3 - 9*x^2 - 2*x
}
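
Before running the loop, it is worth sanity-checking the hand-coded derivative against a central finite difference (a minimal sketch; the test point x0 is arbitrary):

h <- 1e-6
x0 <- 3
(f(x0 + h) - f(x0 - h)) / (2 * h)   # numerical derivative of f at x0
f_derivative(x0)                    # analytic derivative; the two should agree closely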
    
while (abs(f(x_new) - f(x_old)) > precision) {

  x_old <- x_new
  x_set <- c(x_set, x_old)       # record the current position ...
  y_set <- c(y_set, f(x_old))    # ... and its function value

  x_new <- x_old - alpha * f_derivative(x_old)   # gradient descent update
}

x_new
## [1] 2.452928
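
As a sanity check, \(f'(x) = x(4x^2 - 9x - 2)\), whose positive root is \((9 + \sqrt{113})/8 \approx 2.4538\). The loop stops about \(10^{-3}\) short of this value because the stopping rule bounds the change in \(f\) between steps, not the distance to the true minimizer.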

Or, if you are using Python, try:

x_old = 0                # initial value; only needs to differ from x_new
x_new = 6                # start point
gamma = 0.01             # step size (learning rate)
precision = 0.00001

def f(x):
    return x**4 - 3*x**3 - x**2

def f_derivative(x):
    return 4 * x**3 - 9 * x**2 - 2 * x

while abs(f(x_new) - f(x_old)) > precision:
    x_old = x_new
    x_new = x_old - gamma * f_derivative(x_old)

print("Local minimum occurs at", x_new)
## prints the same local minimum as the R run above (about 2.4529)

Visualizing the process:

# Draw the function, then overlay every 5th step of the descent path
curve(expr = f, from = -1, to = 4, col = "#5bf4fe")

for (i in 0:20) {
  points(x_set[5*i + 1], y_set[5*i + 1], col = "black")
  arrows(x_set[5*i + 1], y_set[5*i + 1], x_set[5*i + 6], y_set[5*i + 6],
         length = 0.2, code = 2)
}

3. Gradient Descent for Linear Regression with One Variable

house_data <- read.csv("housingPrice.csv", header = TRUE)
plot(house_data$area, house_data$price/1000, xlab = "Area (square feet)", ylab = "Price ($1000)")

* If you have a house of 2045 sq. feet and want to know its likely market price, you can fit a model (a linear regression line) to the existing data.

\[ h(x) = \theta x \]
\(h(x)\) - the hypothesis function, i.e. the predicted price
\(x\) - the area of the house
\(\theta\) - the slope of the regression line

So \(\theta\) is the parameter we want to work out from the existing data. One way to make the overall error as small as possible is to use gradient descent on the cost function

\[ J(\theta) = \frac{1}{2m} \sum_{i=1}^m (\theta x^{(i)} - y^{(i)})^2 \]

where \(m\) is the number of training examples, or in R, defined as:

J <- function(theta, x, y){
  1/(2*length(x)) * sum((theta * x - y)^2)
}

which has derivative: \[ J'(\theta) = \frac{1}{m} \sum_{i=1}^m x^{(i)} (\theta x^{(i)} - y^{(i)}) \]

J_dev <- function(theta, x, y){
  1/length(x) * sum((theta * x - y) * x)
}
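
Gradient descent then repeats the update

\[ \theta_{new} = \theta_{old} - \alpha \, J'(\theta_{old}) \]

until \(\theta\) stops changing, which is exactly what the loop below implements.
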
theta_old <- 0         # initial value; only needs to differ from theta_new
theta_new <- 6         # the algorithm starts at theta = 6
alpha <- 0.0000001     # learning rate; tiny because the x values (areas) are in the thousands
x <- house_data$area
y <- house_data$price/1000

theta_set <- c()   # to store theta at each step
J_set <- c()       # to store the corresponding cost J(theta)

while (abs(theta_new - theta_old) > precision) {

  theta_old <- theta_new
  theta_set <- c(theta_set, theta_old)     # record the current theta ...
  J_set <- c(J_set, J(theta_old, x, y))    # ... and its cost

  theta_new <- theta_old - alpha * J_dev(theta_old, x, y)   # gradient descent update
}

theta_new  
## [1] 0.1653902
# Regression line:
plot(house_data$area, house_data$price/1000, xlab = "Area (square feet)", ylab = "Price ($1000)")
abline(a = 0, b = theta_new, col = "red")
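
With the fitted slope we can answer the motivating question: since \(h(x) = \theta x\), the predicted price for the 2045 sq. feet house is simply

theta_new * 2045   # predicted price in $1000; roughly 338, i.e. about $338,000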

par(mfrow = c(2, 2))   # 2 x 2 grid: the cost curve plus three snapshots of the fit
plot(x = theta_set, y = J_set, ylab = "Cost J", xlab = "Theta", main = "Cost Function")
for (i in 2:7) {
  arrows(theta_set[i], J_set[i], theta_set[i+1], J_set[i+1],
         length = 0.2, code = 2)
}

plot(house_data$area, house_data$price/1000, xlab = "Area (square feet)", ylab = "Price ($1000)", main = "6th step")
abline(a = 0, b = theta_set[6], col = "red")

plot(house_data$area, house_data$price/1000, xlab = "Area (square feet)", ylab = "Price ($1000)", main = "11th step")
abline(a = 0, b = theta_set[11], col = "red")

plot(house_data$area, house_data$price/1000, xlab = "Area (square feet)", ylab = "Price ($1000)", main = "17th step")
abline(a = 0, b = theta_set[17], col = "red")

Notes: * If the cost increases during the iterations, try reducing the learning rate. If the step size is too big, the updates overshoot the minimum and may never reach it.
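
This is easy to see for the squared-error cost used here: substituting \(J'(\theta)\) into the update rule gives the linear recurrence

\[ \theta_{new} = \theta_{old} \left(1 - \frac{\alpha}{m} \sum_{i=1}^m (x^{(i)})^2\right) + \frac{\alpha}{m} \sum_{i=1}^m x^{(i)} y^{(i)} \]

which converges only when \(\alpha < 2m / \sum_{i=1}^m (x^{(i)})^2\). With areas in the thousands of square feet, \(\sum (x^{(i)})^2 / m\) is in the millions, which is why such a tiny learning rate (1e-07) is needed in this example.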