Gradient descent is a first-order optimization algorithm (it uses only the first derivative) for finding a local minimum of a function.
Goal: gradient descent, which walks downhill in small iterative steps, helps a machine learning algorithm find the parameter values at which the prediction model (the hypothesis function) has the lowest error (cost).
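The core of the method is a single update rule: start from an initial guess, repeatedly step in the direction opposite to the derivative, scaled by a learning rate \(\alpha\), and stop once successive values change by less than a chosen precision. In the one-dimensional form implemented below:
\[ x_{new} = x_{old} - \alpha \, f'(x_{old}) \]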
e.g.: find a local minimum of the function \(f(x) = x^4 - 3x^3 - x^2\), whose derivative is \(f'(x) = 4x^3 - 9x^2 - 2x\):
x_old <- 0        # dummy value so the first loop test passes
x_new <- 6        # the algorithm starts at x = 6
alpha <- 0.01     # learning rate (step size)
x_set <- c()      # to store the x coordinate of each step
y_set <- c()      # to store f(x) at each step
precision <- 0.00001   # stop when successive values of f change by less than this
# Original function
f <- function(x){
  x^4 - 3*x^3 - x^2
}
# Its derivative
f_derivative <- function(x){
  4*x^3 - 9*x^2 - 2*x
}
while (abs(f(x_new) - f(x_old)) > precision) {
  x_set <- c(x_set, x_old)          # record the path for plotting later
  y_set <- c(y_set, f(x_old))
  x_old <- x_new
  x_new <- x_old - alpha * f_derivative(x_old)   # gradient descent update
}
x_new
## [1] 2.452928
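As a quick sanity check, base R's optimize() can search the same range numerically; it should report a minimum close to the value gradient descent converged to:
optimize(f, interval = c(0, 4))   # $minimum should be close to x_new above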
Or, if you are using Python, try the same procedure on the closely related function \(f(x) = x^4 - 3x^3 + 2\), whose derivative is \(f'(x) = 4x^3 - 9x^2\):
x_old = 0
x_new = 6 # Start point
gamma = 0.01 # Step size
precision = 0.00001
def f(x):
    return x**4 - 3*x**3 + 2

def f_derivative(x):
    return 4*x**3 - 9*x**2
while abs(f(x_new) - f(x_old)) > precision:
    x_old = x_new
    x_new = x_old - gamma * f_derivative(x_old)
print("Local minimum occurs at", x_new)
## ('Local minimum occurs at', 2.249998811801337)
The descent process can be visualized by plotting every fifth recorded step along the curve:
curve(expr = f, from = -1, to = 4, col = "#5bf4fe")
for (i in 0:20) {
  points(x_set[5*i+1], y_set[5*i+1], col = "black")
  arrows(x_set[5*i+1], y_set[5*i+1], x_set[5*i+6], y_set[5*i+6],
         length = 0.2, code = 2)
}
Now apply the same idea to real data: housing prices as a function of floor area.
house_data <- read.csv("housingPrice.csv", sep = ",", header = TRUE)
plot(house_data$area, house_data$price/1000, xlab = "Area (square feet)", ylab = "Price ($1000)")
* If you have a house of 2045 sq. feet and want to know its likely market price, you can build a model (a linear regression line through the origin); the prediction itself is computed after fitting \(\theta\) below.
\[ h(x) = \theta x \]
\(h(x)\) - hypothesis function, the predicted price
\(x\) - area of house
\(\theta\) - slope of regression line
So \(\theta\) is the parameter we want to work out from the existing data. One way to make the overall error as low as possible is to use gradient descent on the cost function
\[ J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left(\theta x^{(i)} - y^{(i)}\right)^2 \]
where \(m\) is the number of training examples, or, in R, defined as:
J <- function(theta, x, y){
  1/(2*length(x)) * sum((theta * x - y)^2)
}
which has derivative: \[ J'(\theta) = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} \left(\theta x^{(i)} - y^{(i)}\right) \]
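This follows from applying the chain rule term by term to \(J(\theta)\):
\[ J'(\theta) = \frac{1}{2m} \sum_{i=1}^{m} 2\left(\theta x^{(i)} - y^{(i)}\right) x^{(i)} = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}\left(\theta x^{(i)} - y^{(i)}\right), \]
and in R: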
J_dev <- function(theta, x, y){
  1/length(x) * sum((theta * x - y) * x)
}
theta_old <- 0          # dummy value so the first loop test passes
theta_new <- 6          # initial guess for theta
alpha <- 0.0000001      # learning rate (step size)
x <- house_data$area
y <- house_data$price/1000
theta_set <- c()        # to store theta at each step
J_set <- c()            # to store how the cost changes as theta changes
while (abs(theta_new - theta_old) > precision) {
  theta_set <- c(theta_set, theta_old)
  J_set <- c(J_set, J(theta_old, x, y))
  theta_old <- theta_new
  theta_new <- theta_old - alpha * J_dev(theta_old, x, y)
}
theta_new
## [1] 0.1653902
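With the fitted slope we can answer the question from above: the predicted price of the 2045-square-foot house is \(h(2045) = \theta \cdot 2045\), in thousands of dollars (price was scaled by 1000 when fitting):
theta_new * 2045   # predicted price in $1000: about 0.165 * 2045, i.e. roughly $338,000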
# Regression line:
plot(house_data$area, house_data$price/1000, xlab = "Area (square feet)", ylab = "Price ($1000)")
abline(a = 0, b = theta_new, col = "red")
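As a cross-check, the closed-form least-squares fit with no intercept (via lm()) should give a slope very close to the one gradient descent converged to:
coef(lm(y ~ 0 + x))   # ordinary least-squares slope through the origin; compare with theta_new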
par(mfrow=c(2,2))
plot(x=theta_set, y=J_set, ylab = "Cost J", xlab = "Theta", main = "Cost Function")
for (i in 2:7) {
  arrows(theta_set[i], J_set[i], theta_set[i+1], J_set[i+1],
         length = 0.2, code = 2)
}
plot(house_data$area, house_data$price/1000, xlab = "Area (square feet)", ylab = "Price ($1000)", main = "6th step")
abline(a = 0, b = theta_set[6], col = "red")
plot(house_data$area, house_data$price/1000, xlab = "Area (square feet)", ylab = "Price ($1000)", main = "11th step")
abline(a = 0, b = theta_set[11], col = "red")
plot(house_data$area, house_data$price/1000, xlab = "Area (square feet)", ylab = "Price ($1000)", main = "17th step")
abline(a = 0, b = theta_set[17], col = "red")
Notes: * If the cost increases during iteration, try reducing the learning rate: if the step size is too big, the updates overshoot the minimum and may never reach it, as the sketch below illustrates.
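A minimal sketch of that failure mode, reusing the one-dimensional example from above with a step size five times larger than before (0.05 instead of 0.01, chosen purely for illustration):
x_bad <- 6
for (i in 1:6) {
  x_bad <- x_bad - 0.05 * f_derivative(x_bad)   # oversized step: overshoots the minimum
  cat("step", i, ": x =", x_bad, " f(x) =", f(x_bad), "\n")   # the values grow instead of settling down
}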