1. Definition

Gradient descent is a first-order optimization algorithm (it uses only the first derivative) for finding a local minimum of a function.

Goal: by repeatedly stepping in the direction opposite the gradient, gradient descent lets a machine learning algorithm find the parameter values at which the prediction model (your hypothesis function) has the lowest error (cost).
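
Concretely, each iteration moves the current estimate a small step against the derivative:

\[ x_{new} = x_{old} - \alpha \, f'(x_{old}) \]

where \(\alpha\) is the learning rate (step size). The code below implements exactly this update.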

2. Math Procedure of Gradient Descent

Example: find a local minimum of the function \[f(x)=x^4-3x^3-x^2\], which has derivative \[f'(x)=4x^3-9x^2-2x\]

x_old <- 0        # initial value; only needs to differ from x_new so the loop starts
x_new <- 6        # the algorithm starts at x = 6
alpha <- 0.01     # learning rate (step size)

x_set <- c()   # to store the x coordinate of each step
y_set <- c()   # to store f(x) at each step

precision <- 0.00001   # stop when f changes by less than this between steps

# Original function
f <- function(x){
  x^4 - 3*x^3 - x^2
}

# Its derivative
f_derivative <- function(x){
  4*x^3 - 9*x^2 - 2*x
}
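
Before running the loop, it is worth sanity-checking the hand-coded derivative against a central finite difference (a minimal sketch; the test point x0 is arbitrary):

h <- 1e-6
x0 <- 3
(f(x0 + h) - f(x0 - h)) / (2 * h)   # numerical derivative of f at x0
f_derivative(x0)                    # analytic derivative; the two should agree closely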
    
while (abs(f(x_new) - f(x_old)) > precision) {

  x_old <- x_new
  x_set <- c(x_set, x_old)       # record the current position ...
  y_set <- c(y_set, f(x_old))    # ... and its function value

  x_new <- x_old - alpha * f_derivative(x_old)   # gradient descent update
}

x_new
## [1] 2.452928
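
As a sanity check, \(f'(x) = x(4x^2 - 9x - 2)\), whose positive root is \((9 + \sqrt{113})/8 \approx 2.4538\). The loop stops about \(10^{-3}\) short of this value because the stopping rule bounds the change in \(f\) between steps, not the distance to the true minimizer.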

Or, if you are using Python, try:

x_old = 0                # initial value; only needs to differ from x_new
x_new = 6                # start point
gamma = 0.01             # step size (learning rate)
precision = 0.00001

def f(x):
    return x**4 - 3*x**3 - x**2

def f_derivative(x):
    return 4 * x**3 - 9 * x**2 - 2 * x

while abs(f(x_new) - f(x_old)) > precision:
    x_old = x_new
    x_new = x_old - gamma * f_derivative(x_old)

print("Local minimum occurs at", x_new)
## prints the same local minimum as the R run above (about 2.4529)

Visualizing the process:

# Draw the function, then overlay every 5th step of the descent path
curve(expr = f, from = -1, to = 4, col = "#5bf4fe")

for (i in 0:20) {
  points(x_set[5*i + 1], y_set[5*i + 1], col = "black")
  arrows(x_set[5*i + 1], y_set[5*i + 1], x_set[5*i + 6], y_set[5*i + 6],
         length = 0.2, code = 2)
}

3. Gradient Descent for Linear Regression with One Variable

house_data <- read.csv("housingPrice.csv", header = TRUE)
plot(house_data$area, house_data$price/1000, xlab = "Area (square feet)", ylab = "Price ($1000)")

* If you have a house of 2045 sq. feet and want to know its likely market price, you can fit a model (a linear regression line) to the existing data.

\[ h(x) = \theta x \]
\(h(x)\) - the hypothesis function, i.e. the predicted price
\(x\) - the area of the house
\(\theta\) - the slope of the regression line

So \(\theta\) is the parameter we want to work out from the existing data. One way to make the overall error as small as possible is to use gradient descent on the cost function

\[ J(\theta) = \frac{1}{2m} \sum_{i=1}^m (\theta x^{(i)} - y^{(i)})^2 \]

where \(m\) is the number of training examples, or in R, defined as:

J <- function(theta, x, y){
  1/(2*length(x)) * sum((theta * x - y)^2)
}

which has derivative: \[ J'(\theta) = \frac{1}{m} \sum_{i=1}^m x^{(i)} (\theta x^{(i)} - y^{(i)}) \]

J_dev <- function(theta, x, y){
  1/length(x) * sum((theta * x - y) * x)
}
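
Gradient descent then repeats the update

\[ \theta_{new} = \theta_{old} - \alpha \, J'(\theta_{old}) \]

until \(\theta\) stops changing, which is exactly what the loop below implements.
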
theta_old <- 0         # initial value; only needs to differ from theta_new
theta_new <- 6         # the algorithm starts at theta = 6
alpha <- 0.0000001     # learning rate; tiny because the x values (areas) are in the thousands
x <- house_data$area
y <- house_data$price/1000

theta_set <- c()   # to store theta at each step
J_set <- c()       # to store the corresponding cost J(theta)

while (abs(theta_new - theta_old) > precision) {

  theta_old <- theta_new
  theta_set <- c(theta_set, theta_old)     # record the current theta ...
  J_set <- c(J_set, J(theta_old, x, y))    # ... and its cost

  theta_new <- theta_old - alpha * J_dev(theta_old, x, y)   # gradient descent update
}

theta_new  
## [1] 0.1653902
# Regression line:
plot(house_data$area, house_data$price/1000, xlab = "Area (square feet)", ylab = "Price ($1000)")
abline(a = 0, b = theta_new, col = "red")
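
With the fitted slope we can answer the motivating question: since \(h(x) = \theta x\), the predicted price for the 2045 sq. feet house is simply

theta_new * 2045   # predicted price in $1000; roughly 338, i.e. about $338,000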

par(mfrow = c(2, 2))   # 2 x 2 grid: the cost curve plus three snapshots of the fit
plot(x = theta_set, y = J_set, ylab = "Cost J", xlab = "Theta", main = "Cost Function")
for (i in 2:7) {
  arrows(theta_set[i], J_set[i], theta_set[i+1], J_set[i+1],
         length = 0.2, code = 2)
}

plot(house_data$area, house_data$price/1000, xlab = "Area (square feet)", ylab = "Price ($1000)", main = "6th step")
abline(a = 0, b = theta_set[6], col = "red")

plot(house_data$area, house_data$price/1000, xlab = "Area (square feet)", ylab = "Price ($1000)", main = "11th step")
abline(a = 0, b = theta_set[11], col = "red")

plot(house_data$area, house_data$price/1000, xlab = "Area (square feet)", ylab = "Price ($1000)", main = "17th step")
abline(a = 0, b = theta_set[17], col = "red")

Notes: * If the cost increases during the iterations, try reducing the learning rate. If the step size is too big, the updates overshoot the minimum and may never reach it.
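
This is easy to see for the squared-error cost used here: substituting \(J'(\theta)\) into the update rule gives the linear recurrence

\[ \theta_{new} = \theta_{old} \left(1 - \frac{\alpha}{m} \sum_{i=1}^m (x^{(i)})^2\right) + \frac{\alpha}{m} \sum_{i=1}^m x^{(i)} y^{(i)} \]

which converges only when \(\alpha < 2m / \sum_{i=1}^m (x^{(i)})^2\). With areas in the thousands of square feet, \(\sum (x^{(i)})^2 / m\) is in the millions, which is why such a tiny learning rate (1e-07) is needed in this example.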