library(plotly)
f <- function(x, y) { 10 * x * y * exp(-x^4 - y^4) }
x <- seq(-2, 2, by = 0.05)
y <- x
z <- outer(x, y, f)
plot_ly(x = ~x, y = ~y, z = ~z, type = "surface")
The shape seems to consist of four main peaks: two that are upward-pointing (hills) and two that are downward-pointing (pits). Observing the function, this makes sense as the exponential term is very similar to a bell curve(gaussian distribution), with the 10xy term changing the signs depending on the quadrant.
Given the function:
\[ f(x,y) = 10xye^{-x^4-y^4} \]
To find the gradient, we need to differentiate with respect to both \(x\) and \(y\):
\[ \nabla f(x,y) = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right) \]
# Define the function
f <- function(x, y) {
10 * x * y * exp(-x^4 - y^4)
}
# Partial derivative with respect to x
df_dx <- function(x, y) {
10 * y * exp(-x^4 - y^4) - 40 * x^4 * y * exp(-x^4 - y^4)
}
# Partial derivative with respect to y
df_dy <- function(x, y) {
10 * x * exp(-x^4 - y^4) - 40 * x * y^4 * exp(-x^4 - y^4)
}
gradient_f <- list("df_dx" = df_dx, "df_dy" = df_dy)
gradient_f
## $df_dx
## function(x, y) {
## 10 * y * exp(-x^4 - y^4) - 40 * x^4 * y * exp(-x^4 - y^4)
## }
##
## $df_dy
## function(x, y) {
## 10 * x * exp(-x^4 - y^4) - 40 * x * y^4 * exp(-x^4 - y^4)
## }
The partial derivative with respect to \(x\) is: \[ \frac{\partial f}{\partial x} = 10ye^{-x^4-y^4} - 40x^4ye^{-x^4-y^4} \]
The partial derivative with respect to \(y\) is: \[ \frac{\partial f}{\partial y} = 10xe^{-x^4-y^4} - 40xy^4e^{-x^4-y^4} \]
Therefore, the gradient function for \(f(x,y)\) is: \[ \nabla f(x,y) = \left( 10ye^{-x^4-y^4} - 40x^5ye^{-x^4-y^4}, 10xe^{-x^4-y^4} - 40xy^5e^{-x^4-y^4} \right) \]
The rate of change will be the largest at P(0.5,-1) if the direction is the same as the direction of the gradient at point P. Therefore, evaluating the gradient at point P will give the direction that maximizes the rate of change.
x_val <- 0.5
y_val <- -1
list(df_dx(x_val,y_val), df_dy(x_val,y_val))
## [[1]]
## [1] -2.591931
##
## [[2]]
## [1] -5.183861
Therefore, moving in the direction of \[ \nabla f(x,y) \approx \left \langle-2.591931, -5.183861 \right \rangle \], this will result in the largest rate of change at point P.
library(tidyverse)
library(ggplot2)
library(ggquiver)
library(viridis)
f <- function(x, y) {
10 * x * y * exp(-x^4 - y^4)
}
x_seq <- seq(-2, 2, length.out = 100)
y_seq <- seq(-2, 2, length.out = 100)
grid <- expand.grid(x = x_seq, y = y_seq)
grid$z <- with(grid, f(x, y))
ggplot(grid, aes(x = x, y = y, z = z)) +
geom_contour_filled(aes(fill = stat(level))) + # Updated to use ..level..
scale_fill_viridis_d() # For a nice color scale
Adding gradients to the contour plot,
f <- expression(10 * x * y * exp(-x^4 - y^4))
df_dx <- D(f, "x")
df_dy <- D(f, "y")
grid <- grid %>%
mutate(
dx = map2_dbl(x, y, ~eval(df_dx, list(x = .x, y = .y))),
dy = map2_dbl(x, y, ~eval(df_dy, list(x = .x, y = .y)))
)
ggplot(grid, aes(x = x, y = y, z = z)) +
geom_contour_filled(aes(fill = stat(level))) +
geom_quiver(
aes(u = 0.1 * dx, v = 0.1 * dy),
scale = 2000, # Adjust this scale factor to increase the arrow size
color = "white", # A color that stands out against your fill
arrow = arrow(type = "open", length = unit(5, "inches")) # Customize arrow appearance
) +
scale_fill_viridis_d() +
theme_minimal() +
labs(title = "Contour Map with Gradients of f(x, y)")
To find the critical points of the function, set the derivative of the function with respect to x and y to 0, and solve for the set of points.
\[ \frac{\partial f}{\partial x} = 10ye^{-x^4-y^4} - 40x^4ye^{-x^4-y^4}=0 \] \[ \frac{\partial f}{\partial y} = 10xe^{-x^4-y^4} - 40xy^4e^{-x^4-y^4} =0 \]
For \[ \frac{\partial f}{\partial x}\]: \[10ye^{-x^4-y^4} - 40x^4ye^{-x^4-y^4}=0 \] \[10 \left (ye^{-x^4-y^4} - 4x^4ye^{-x^4-y^4} \right) =0 \] \[ye^{-x^4-y^4} - 4x^4ye^{-x^4-y^4} =0 \] \[ e^{-x^4-y^4} \left ( y - 4x^4y \right ) =0 \] Because \(e^{-x^4-y^4}\) is always non-zero, \[\left ( y - 4x^4y \right ) =0 \] \[\left ( y(1- 4x^4) \right ) =0 \]Which means either \(y=0\) or \(x= \pm \left (\frac{1}{4} \right )^\frac{1}{4}\)
Which means either \(x=0\) or \(y=\pm \left (\frac{1}{4} \right )^\frac{1}{4}\)
\(y(1- 4x^4) =0\) for \(\frac{\partial f}{\partial x}\)
If \(y=0\), \(f_x=0\), therefore \(x=0\). Therefore, there is a critical point
at (0,0). The converse is also true, \(x=0\), \(f_y=0\), therefore \(y=0\)
If \((1- 4x^4) =0\) and \((1- 4y^4)=0\), \(y=\pm \left (\frac{1}{4} \right
)^\frac{1}{4}\) and \(x= \pm \left
(\frac{1}{4} \right )^\frac{1}{4}\). Therefore, there are
critical points at: \[\left( \pm
\sqrt[4]{\frac{1}{4}},\pm\sqrt[4]{\frac{1}{4}} \right) \]
# install.packages('ggplot2')
library('ggplot2')
f <- function(x,y){(10 * x * y * exp(-x^4 - y^4))}
dx <- function(x, y) {
10 * y * exp(-x^4 - y^4) - 40 * x^4 * y * exp(-x^4 - y^4)
}
dy <- function(x, y) {
10 * x * exp(-x^4 - y^4) - 40 * x * y^4 * exp(-x^4 - y^4)
}
grad <- function(x,y) { c( dx(x,y) , dy(x,y) ) }
x <- seq(-3,3,length=50)
y <- seq(-3,3,length=50)
dat <- data.frame(expand.grid(x=x,y=y))
dat$z <- with(dat, f(x,y))
point <- c(x = 0.25, y = -1)
step_size <- 0.2
epsilon <- 0.0001
increment <- 1000 # Any value greater than epsilon
max_iteration <- 100
v <- ggplot(dat, aes(x=x, y=y, z=f(x,y)))
v <- v + geom_point(aes(x,y), dat = data.frame(x=point[1],y=point[2], z=0) )
v <- v + geom_contour() + theme_bw()
counter <- 1
while(increment > epsilon & counter <= 100){
prev_point <- point
point <- point + grad(point[1], point[2]) * step_size
v <- v + geom_line(aes(x,y), data = data.frame(x=c(point[1], prev_point[1]), y = c(point[2], prev_point[2]), z = c(0,0)))
v <- v + geom_point(aes(x,y), dat = data.frame(x=point[1],y=point[2], z=0) )
counter <- counter + 1
increment <- sqrt(sum((point-prev_point)^2))
print(point)
}
## x y
## -0.471439 -1.549668
## x y
## -0.4788455 -1.4876997
## x y
## -0.4954716 -1.3616967
## x y
## -0.5579864 -0.9794777
## x y
## -0.9916229 0.1025142
## x y
## -1.2151691 -0.6512106
## x y
## -0.2658583 -0.7155912
## x y
## -1.3395505 -0.6957011
## x y
## -0.8169780 -0.7010352
## x y
## -0.2654156 -0.7289061
## x y
## -1.3375380 -0.6774715
## x y
## -0.8098003 -0.6913666
## x y
## -0.2943525 -0.7635564
## x y
## -1.3408752 -0.6139719
## x y
## -0.8394379 -0.6535889
## x y
## -0.1857254 -0.8835246
## x y
## -1.1407574 -0.5935678
## x y
## -0.02749867 -0.78013450
## x y
## -1.1047896 -0.7618457
## x y
## 0.1113875 -0.6382619
## x y
## -0.9691015 -0.5748330
## x y
## 0.1095633 -0.9799973
## x y
## -0.6691308 -1.2142681
## x y
## -0.7139103 -0.2557746
## x y
## -0.6985717 -1.3334866
## x y
## -0.7027915 -0.7904332
## x y
## -0.7230696 -0.3719574
## x y
## -0.6712065 -1.3687125
## x y
## -0.6837810 -0.9413619
## x y
## -0.7704105 0.1316532
## x y
## -0.8461271 -0.9500447
## x y
## -0.31688416 0.06365504
## x y
## -0.1959366 -0.5637034
## x y
## -1.2075646 -0.7745567
## x y
## -0.2400122 -0.6861858
## x y
## -1.3213064 -0.7295744
## x y
## -0.7375460 -0.7169841
## x y
## -0.5871604 -0.6689175
## x y
## -1.0972370 -0.8388986
## x y
## 0.05410101 -0.53097047
## x y
## -0.9266577 -0.4628094
## x y
## -0.1021701 -1.1542323
## x y
## -0.4932442 -0.9429909
## x y
## -1.10853695 -0.03094206
## x y
## -1.0396373 -0.5206731
## x y
## 0.06528567 -0.94475948
## x y
## -0.7864742 -1.0734778
## x y
## -0.5806297 0.1525330
## x y
## -0.4322106 -0.8811612
## x y
## -1.2335486 -0.2363698
## x y
## -0.8491638 -0.4761485
## x y
## -0.2684164 -1.2380911
## x y
## -0.4985388 -0.8101908
## x y
## -1.2439689 -0.3694236
## x y
## -0.6765497 -0.5755593
## x y
## -0.8120411 -1.1272307
## x y
## -0.59734357 0.01466485
## x y
## -0.5846715 -1.0371994
## x y
## -0.8936403 0.1496474
## x y
## -1.1388414 -0.7925268
## x y
## -0.0006447416 -0.6274888082
## x y
## -1.0753912 -0.6279083
## x y
## 0.1521594 -0.8107123
## x y
## -0.8976821 -0.9544531
## x y
## -0.203001087 -0.005764464
## x y
## -0.2144323 -0.4110777
## x y
## -1.0050118 -0.7794818
## x y
## 0.1920167 -0.5406863
## x y
## -0.7940371 -0.3089555
## x y
## -0.5512372 -1.3278933
## x y
## -0.6194044 -0.8147282
## x y
## -0.9916518 -0.2900142
## x y
## -0.3635972 -1.0175894
## x y
## -1.0002034 -0.2132295
## x y
## -0.5303918 -0.9409532
## x y
## -1.072993 0.014791
## x y
## -1.1068027 -0.5553194
## x y
## 0.01972965 -0.83341277
## x y
## -1.0091631 -0.8560589
## x y
## 0.1076639 -0.3759515
## x y
## -0.6288723 -0.1817747
## x y
## -0.7451449 -1.2515482
## x y
## -0.7082719 -0.4216844
## x y
## -0.7040742 -1.3538253
## x y
## -0.7053289 -0.8777028
## x y
## -0.71291473 -0.04184215
## x y
## -0.7107649 -1.1430669
## x y
## -0.70406552 0.02124286
## x y
## -0.7034975 -1.0801060
## x y
## -0.7122818 0.1748170
## x y
## -0.7202740 -0.9213169
## x y
## -0.66781501 0.08642782
## x y
## -0.6388549 -1.0079967
## x y
## -0.8416937 0.1976211
## x y
## -1.0824154 -0.8137069
## x y
## 0.1123161 -0.5470109
## x y
## -0.8872156 -0.4151961
## x y
## -0.2458794 -1.2319679
## x y
## -0.4875517 -0.8298892
## x y
## -1.2430660 -0.3153016
v
For \(\eta=0.2\), the algorithm does
not converge. The plotted path zig zags across the plot, crossing itself
numerous times. This means the algorithm is not moving in a clear path
towards a maximum and oscillates with 100 iterations. One can infer that
the step size is too high, causing the algorithm to overshoot, resulting
in the zig zag pattern.
For \(\eta=0.001\), the algorithm is as follows:
# install.packages('ggplot2')
library('ggplot2')
f <- function(x,y){(10 * x * y * exp(-x^4 - y^4))}
dx <- function(x, y) {
10 * y * exp(-x^4 - y^4) - 40 * x^4 * y * exp(-x^4 - y^4)
}
dy <- function(x, y) {
10 * x * exp(-x^4 - y^4) - 40 * x * y^4 * exp(-x^4 - y^4)
}
grad <- function(x,y) { c( dx(x,y) , dy(x,y) ) }
x <- seq(-3,3,length=50)
y <- seq(-3,3,length=50)
dat <- data.frame(expand.grid(x=x,y=y))
dat$z <- with(dat, f(x,y))
point <- c(x = 0.25, y = -1)
step_size <- 0.001
epsilon <- 0.0001
increment <- 1000 # Any value greater than epsilon
max_iteration <- 100
v <- ggplot(dat, aes(x=x, y=y, z=f(x,y)))
v <- v + geom_point(aes(x,y), dat = data.frame(x=point[1],y=point[2], z=0) )
v <- v + geom_contour() + theme_bw()
counter <- 1
while(increment > epsilon & counter <= 100){
prev_point <- point
point <- point + grad(point[1], point[2]) * step_size
v <- v + geom_line(aes(x,y), data = data.frame(x=c(point[1], prev_point[1]), y = c(point[2], prev_point[2]), z = c(0,0)))
v <- v + geom_point(aes(x,y), dat = data.frame(x=point[1],y=point[2], z=0) )
counter <- counter + 1
increment <- sqrt(sum((point-prev_point)^2))
print(point)
}
## x y
## 0.2463928 -1.0027483
## x y
## 0.2428114 -1.0054673
## x y
## 0.2392559 -1.0081563
## x y
## 0.2357262 -1.0108148
## x y
## 0.2322223 -1.0134423
## x y
## 0.2287443 -1.0160383
## x y
## 0.225292 -1.018602
## x y
## 0.2218654 -1.0211341
## x y
## 0.2184644 -1.0236331
## x y
## 0.2150888 -1.0260990
## x y
## 0.2117384 -1.0285317
## x y
## 0.2084132 -1.0309307
## x y
## 0.205113 -1.033296
## x y
## 0.2018374 -1.0356271
## x y
## 0.1985864 -1.0379241
## x y
## 0.1953597 -1.0401868
## x y
## 0.192157 -1.042415
## x y
## 0.1889781 -1.0446086
## x y
## 0.1858228 -1.0467676
## x y
## 0.1826907 -1.0488919
## x y
## 0.1795815 -1.0509816
## x y
## 0.176495 -1.053037
## x y
## 0.1734308 -1.0550568
## x y
## 0.1703887 -1.0570425
## x y
## 0.1673683 -1.0589935
## x y
## 0.1643692 -1.0609100
## x y
## 0.1613913 -1.0627921
## x y
## 0.158434 -1.064640
## x y
## 0.1554972 -1.0664533
## x y
## 0.1525803 -1.0682326
## x y
## 0.1496832 -1.0699780
## x y
## 0.1468054 -1.0716895
## x y
## 0.1439466 -1.0733673
## x y
## 0.1411064 -1.0750115
## x y
## 0.1382846 -1.0766224
## x y
## 0.1354806 -1.0782001
## x y
## 0.1326942 -1.0797447
## x y
## 0.129925 -1.081257
## x y
## 0.1271727 -1.0827357
## x y
## 0.1244368 -1.0841823
## x y
## 0.1217171 -1.0855967
## x y
## 0.1190132 -1.0869791
## x y
## 0.1163247 -1.0883295
## x y
## 0.1136512 -1.0896483
## x y
## 0.1109925 -1.0909356
## x y
## 0.1083481 -1.0921916
## x y
## 0.1057178 -1.0934166
## x y
## 0.1031011 -1.0946107
## x y
## 0.1004978 -1.0957741
## x y
## 0.09790739 -1.09690704
## x y
## 0.09532964 -1.09800973
## x y
## 0.09276418 -1.09908236
## x y
## 0.09021066 -1.10012511
## x y
## 0.08766876 -1.10113819
## x y
## 0.08513814 -1.10212178
## x y
## 0.08261847 -1.10307606
## x y
## 0.0801094 -1.1040012
## x y
## 0.07761063 -1.10489747
## x y
## 0.07512181 -1.10576495
## x y
## 0.07264263 -1.10660385
## x y
## 0.07017276 -1.10741433
## x y
## 0.06771189 -1.10819657
## x y
## 0.06525969 -1.10895072
## x y
## 0.06281585 -1.10967695
## x y
## 0.06038006 -1.11037540
## x y
## 0.057952 -1.111046
## x y
## 0.05553136 -1.11168958
## x y
## 0.05311785 -1.11230559
## x y
## 0.05071115 -1.11289439
## x y
## 0.04831095 -1.11345611
## x y
## 0.04591696 -1.11399088
## x y
## 0.04352888 -1.11449881
## x y
## 0.0411464 -1.1149800
## x y
## 0.03876924 -1.11543462
## x y
## 0.03639708 -1.11586270
## x y
## 0.03402965 -1.11626438
## x y
## 0.03166664 -1.11663973
## x y
## 0.02930777 -1.11698885
## x y
## 0.02695274 -1.11731181
## x y
## 0.02460127 -1.11760870
## x y
## 0.02225307 -1.11787958
## x y
## 0.01990784 -1.11812452
## x y
## 0.01756531 -1.11834357
## x y
## 0.01522519 -1.11853679
## x y
## 0.0128872 -1.1187042
## x y
## 0.01055105 -1.11884591
## x y
## 0.00821645 -1.11896190
## x y
## 0.005883128 -1.119052202
## x y
## 0.003550799 -1.119116855
## x y
## 0.00121918 -1.11915587
## x y
## -0.001112009 -1.119169270
## x y
## -0.003443052 -1.119157051
## x y
## -0.005774229 -1.119119219
## x y
## -0.008105822 -1.119055768
## x y
## -0.01043811 -1.11896669
## x y
## -0.01277138 -1.11885197
## x y
## -0.01510591 -1.11871158
## x y
## -0.01744198 -1.11854549
## x y
## -0.01977988 -1.11835368
## x y
## -0.02211989 -1.11813610
v
For \(\eta=0.001\), the algorithm does
not converge in 100 iterations. This is evident due to the plotted path,
which does not reach the maximum. In fact, looking at the trajectory of
the plotted gradient ascent path, it’s evident that it hasn’t converged
on a value. In order for the algorithm to converge with the constrained
step size, more iterations would be needed for convergence.
To find a \(\eta\) that converges in less than 13 iterations, the step size needs to be orders of magnitude larger than 0.001. Because we are decreasing the number of iterations from 100 to 13, a reasonable initial assumption is to increase the step size by a factor of 10, 0.01 roughly assuming that the gradient ascent plot path is somewhat linear.
# install.packages('ggplot2')
library('ggplot2')
f <- function(x,y){(10 * x * y * exp(-x^4 - y^4))}
dx <- function(x, y) {
10 * y * exp(-x^4 - y^4) - 40 * x^4 * y * exp(-x^4 - y^4)
}
dy <- function(x, y) {
10 * x * exp(-x^4 - y^4) - 40 * x * y^4 * exp(-x^4 - y^4)
}
grad <- function(x,y) { c( dx(x,y) , dy(x,y) ) }
x <- seq(-3,3,length=50)
y <- seq(-3,3,length=50)
dat <- data.frame(expand.grid(x=x,y=y))
dat$z <- with(dat, f(x,y))
point <- c(x = 0.25, y = -1)
step_size <- 0.01
epsilon <- 0.0001
increment <- 1000 # Any value greater than epsilon
max_iteration <- 13
v <- ggplot(dat, aes(x=x, y=y, z=f(x,y)))
v <- v + geom_point(aes(x,y), dat = data.frame(x=point[1],y=point[2], z=0) )
v <- v + geom_contour() + theme_bw()
counter <- 1
while(increment > epsilon & counter <= 100){
prev_point <- point
point <- point + grad(point[1], point[2]) * step_size
v <- v + geom_line(aes(x,y), data = data.frame(x=c(point[1], prev_point[1]), y = c(point[2], prev_point[2]), z = c(0,0)))
v <- v + geom_point(aes(x,y), dat = data.frame(x=point[1],y=point[2], z=0) )
counter <- counter + 1
increment <- sqrt(sum((point-prev_point)^2))
}
v
This is a clear case of convergence. The plotted path takes direct steps
towards the maximum, and eventually, reaches the maximum. In fact,
observing the last iteration, this is approximately equal to the
critical point found earlier, \(\left(
-\sqrt[4]{\frac{1}{4}},-\sqrt[4]{\frac{1}{4}} \right)\), which is
classified as an absolute maximum.
To find a \(\eta\) that converges in less than 12 iterations with the new point P, one can observe the contour plot. \(\acute{P}\) is in the 4th quadrant. In order for convergence, the plotted path will most likely go into the first quadrant, as it is the closest way to reach a maximum. Because a \(\eta=0.001\) was found to be too small for 100 iterations, the \(\eta=0.01\) would still be a reasonable initial assumption due to the fact that the iteration size is decreased by roughly a factor of 10.
# install.packages('ggplot2')
library('ggplot2')
f <- function(x,y){(10 * x * y * exp(-x^4 - y^4))}
dx <- function(x, y) {
10 * y * exp(-x^4 - y^4) - 40 * x^4 * y * exp(-x^4 - y^4)
}
dy <- function(x, y) {
10 * x * exp(-x^4 - y^4) - 40 * x * y^4 * exp(-x^4 - y^4)
}
grad <- function(x,y) { c( dx(x,y) , dy(x,y) ) }
x <- seq(-3,3,length=50)
y <- seq(-3,3,length=50)
dat <- data.frame(expand.grid(x=x,y=y))
dat$z <- with(dat, f(x,y))
point <- c(x = 1, y = -0.25)
step_size <- 0.01
epsilon <- 0.0001
increment <- 1000 # Any value greater than epsilon
max_iteration <- 12
v <- ggplot(dat, aes(x=x, y=y, z=f(x,y)))
v <- v + geom_point(aes(x,y), dat = data.frame(x=point[1],y=point[2], z=0) )
v <- v + geom_contour() + theme_bw()
counter <- 1
while(increment > epsilon & counter <= 100){
prev_point <- point
point <- point + grad(point[1], point[2]) * step_size
v <- v + geom_line(aes(x,y), data = data.frame(x=c(point[1], prev_point[1]), y = c(point[2], prev_point[2]), z = c(0,0)))
v <- v + geom_point(aes(x,y), dat = data.frame(x=point[1],y=point[2], z=0) )
counter <- counter + 1
increment <- sqrt(sum((point-prev_point)^2))
}
v
Convergence has been acheived with a \(\eta=0.01\). The plotted path takes small
steps along the gradient, eventually reaching a maximum. In this case,
observing the final point, this can be approximated to \(\left( \sqrt[4]{\frac{1}{4}},\sqrt[4]{\frac{1}{4}}
\right)\), a critical point which is classified as a maximum.
Due to the nature of the function f(x,y), maximums are observed when both the x and y term have the same sign. This is due to the \(10xy\) term in the function. With differing signs, the result is negative. The gradient ascent algorithm is focused on taking steps towards the fastest rate of increase to the maxima of the function, which depends on the initial point. With \(P=(0.25,-1)\), this initial point is located very close to the boundary between the fourth and third quadrant. There is a maximum in the third quadrant due to both x and y having the same sign. As a result, the gradient ascent plot will traverse into the third quadrant as it is the steepest rate of increase.
For point \(P \acute =(1,-0.25)\), this point is much closer towards the boundary of the fourth and first quadrant. There is a maximum in the first quadrant due to both x and y being positive. As a result, the plotted path will traverse into the first quadrant, following the steepest rate of increase from the initial point. While both points in the algorithm reach the maximum value of f(x,y), they traverse different paths due to the initial points used, and their relative proximity to a maximum. Because both algorithms reach the maxmium value in less than 13 iterations, both initial points are good choices to yield the maximum value of the function.
To convert the gradient ascent algorithm to a gradient descent algorithm, it is simply changing the signage so the algorithm goes in the opposite direction of the steepest rate of change. This is expressed as: \[P(x_{i+1},y_{i+1})= P(x_{i},y_{i}) - \nabla f(x_{i},y_{i}) \times \eta\]
# install.packages('ggplot2')
library('ggplot2')
f <- function(x,y){(10 * x * y * exp(-x^4 - y^4))}
dx <- function(x, y) {
10 * y * exp(-x^4 - y^4) - 40 * x^4 * y * exp(-x^4 - y^4)
}
dy <- function(x, y) {
10 * x * exp(-x^4 - y^4) - 40 * x * y^4 * exp(-x^4 - y^4)
}
grad <- function(x,y) { c( dx(x,y) , dy(x,y) ) }
x <- seq(-3,3,length=50)
y <- seq(-3,3,length=50)
dat <- data.frame(expand.grid(x=x,y=y))
dat$z <- with(dat, f(x,y))
point <- c(x = 0.25, y = -1)
step_size <- 0.2
epsilon <- 0.0001
increment <- 1000 # Any value greater than epsilon
max_iteration <- 100
v <- ggplot(dat, aes(x=x, y=y, z=f(x,y)))
v <- v + geom_point(aes(x,y), dat = data.frame(x=point[1],y=point[2], z=0) )
v <- v + geom_contour() + theme_bw()
counter <- 1
while(increment > epsilon & counter <= 100){
prev_point <- point
point <- point - grad(point[1], point[2]) * step_size # Gradient Descent conversion, change the signage of the gradient
v <- v + geom_line(aes(x,y), data = data.frame(x=c(point[1], prev_point[1]), y = c(point[2], prev_point[2]), z = c(0,0)))
v <- v + geom_point(aes(x,y), dat = data.frame(x=point[1],y=point[2], z=0) )
counter <- counter + 1
increment <- sqrt(sum((point-prev_point)^2))
}
v
To find a \(\eta\) that converges to the local minimum \((0.71, −0.71)\), one can use the results of convergence with 100 iterations with a \(\eta=0.2\) for a gradient ascent, and apply the same rationale for gradient descent.A step rate of 0.2 is far too large, a reasonable assumption for a step rate could be 0.01 for gradient descent. Because this step rate has been successful in determining convergence for gradient ascent, one can reasonably assume it will work for gradient descent.
# install.packages('ggplot2')
library('ggplot2')
f <- function(x,y){(10 * x * y * exp(-x^4 - y^4))}
dx <- function(x, y) {
10 * y * exp(-x^4 - y^4) - 40 * x^4 * y * exp(-x^4 - y^4)
}
dy <- function(x, y) {
10 * x * exp(-x^4 - y^4) - 40 * x * y^4 * exp(-x^4 - y^4)
}
grad <- function(x,y) { c( dx(x,y) , dy(x,y) ) }
x <- seq(-3,3,length=50)
y <- seq(-3,3,length=50)
dat <- data.frame(expand.grid(x=x,y=y))
dat$z <- with(dat, f(x,y))
point <- c(x = 0.5, y = -0.25)
step_size <- 0.01
epsilon <- 0.0001
increment <- 1000 # Any value greater than epsilon
max_iteration <- 100
v <- ggplot(dat, aes(x=x, y=y, z=f(x,y)))
v <- v + geom_point(aes(x,y), dat = data.frame(x=point[1],y=point[2], z=0) )
v <- v + geom_contour() + theme_bw()
counter <- 1
while(increment > epsilon & counter <= 100){
prev_point <- point
point <- point - grad(point[1], point[2]) * step_size # Gradient Descent conversion, change the signage of the gradient
v <- v + geom_line(aes(x,y), data = data.frame(x=c(point[1], prev_point[1]), y = c(point[2], prev_point[2]), z = c(0,0)))
v <- v + geom_point(aes(x,y), dat = data.frame(x=point[1],y=point[2], z=0) )
counter <- counter + 1
increment <- sqrt(sum((point-prev_point)^2))
}
v
Observing the plot, this clearly converges at the local minimum,
(0.71,-0.71). The plotted path takes small steps opposite to the
steepest rate of change, reaching the local minimum.
The local minimum (−0.71, 0.71) is located in the second quadrant. Thus any initial point must be closer to (−0.71, 0.71) than (0.71, -0.71) to converge to the local minimum (−0.71, 0.71). Thus, a good initial point would be (-0.5,0.25). One can note that this is simply the opposite signage of point P, as this Point P is in quadrant 4, and the opposite signage applied would shift it to the second quadrant.
# install.packages('ggplot2')
library('ggplot2')
f <- function(x,y){(10 * x * y * exp(-x^4 - y^4))}
dx <- function(x, y) {
10 * y * exp(-x^4 - y^4) - 40 * x^4 * y * exp(-x^4 - y^4)
}
dy <- function(x, y) {
10 * x * exp(-x^4 - y^4) - 40 * x * y^4 * exp(-x^4 - y^4)
}
grad <- function(x,y) { c( dx(x,y) , dy(x,y) ) }
x <- seq(-3,3,length=50)
y <- seq(-3,3,length=50)
dat <- data.frame(expand.grid(x=x,y=y))
dat$z <- with(dat, f(x,y))
point <- c(x = -0.5, y = 0.25)
step_size <- 0.01
epsilon <- 0.0001
increment <- 1000 # Any value greater than epsilon
max_iteration <- 100
v <- ggplot(dat, aes(x=x, y=y, z=f(x,y)))
v <- v + geom_point(aes(x,y), dat = data.frame(x=point[1],y=point[2], z=0) )
v <- v + geom_contour() + theme_bw()
counter <- 1
while(increment > epsilon & counter <= 100){
prev_point <- point
point <- point - grad(point[1], point[2]) * step_size # Gradient Descent conversion, change the signage of the gradient
v <- v + geom_line(aes(x,y), data = data.frame(x=c(point[1], prev_point[1]), y = c(point[2], prev_point[2]), z = c(0,0)))
v <- v + geom_point(aes(x,y), dat = data.frame(x=point[1],y=point[2], z=0) )
counter <- counter + 1
increment <- sqrt(sum((point-prev_point)^2))
}
v
The plot pattern converges to the local minimum (−0.71, 0.71), taking
gradual steps opposite to the greatest rate of change.
Calculating the 1st directional derivative requires finding the tangent vector at the point \(P(1, \sqrt{5}, 0)\) for the curve defined by \(y^2 + z^2 = 5\) and \(x + z = 1\).
In the direction of decreasing x, x can be paramaterized/expressed as 1-t, where t is any real number. Consequently, z=t given by one of the equations of the curve. Solving for y:
\[y^2+z^2=5\] \[y^2+t^2=5\] \[y^2=5-t^2\] \[y=\sqrt{5-t^2}\] Given that z=t, and at point P z=0, therefore t=0 at point P. Differentiating x,y,z with respect to t:
# Calculate derivatives
t <- 0 # At point P, t will be 0
dx_dt <- -1
dy_dt <- -t / sqrt(5 - t^2)
dz_dt <- 1
# At t = 0
dy_dt_at_P <- eval(dy_dt)
# The tangent vector at P
tangent_vector <- c(dx_dt, dy_dt_at_P, dz_dt)
tangent_vector
## [1] -1 0 1
Therefore the tangent vector at point P is \(\overrightarrow{T}=\left\langle -1,0,1 \right\rangle\) Finding the gradient at f,
\[\frac{\partial f}{\partial x}=-5 cos(5 x - y^2 + 7 z)\] \[\frac{\partial f}{\partial y}=2 y cos(5 x - y^2 + 7 z)\] \[\frac{\partial f}{\partial x}=-7 cos(5 x - y^2 + 7 z)\]
Therefore, \[\nabla f(x,y,z)=\left \langle -5 cos(5 x - y^2 + 7 z), 2 y cos(5 x - y^2 + 7 z), -7 cos(5 x - y^2 + 7 z) \right \rangle \] At point P, \[\nabla f((1, √5, 0)=\left \langle -5 cos(0), 2 √5 cos(0), -7 cos(0) \right \rangle \] \[\nabla f((1, √5, 0)=\left \langle -5, 2 √5, -7 \right \rangle \]
At point P, the directional derivative is: \[D_\overrightarrow{T}f(x,y,z)=\left \langle -5, 2 √5, -7 \right \rangle \cdot \left \langle -1,0,1 \right \rangle \] \[D_\overrightarrow{T}f(x,y,z)=-2 \]
The tangent/direction vector within the directional derivative computation stays the same in higher order directional derivatives. Higher order directional derivatives involve getting the higher order gradient function derivative and performing the dot product with the tangent vector. the first order general gradient is \[\nabla f(x,y,z)=\left \langle -5 cos(5 x - y^2 + 7 z), 2 y cos(5 x - y^2 + 7 z), -7 cos(5 x - y^2 + 7 z) \right \rangle \] Differentiating once to get the second order derivative; \[\frac {\partial\nabla f(x,y,z)}{\partial (x,y,z)}=\left \langle 25 sin(5 x - y^2 + 7 z), 4 y^2 sin(5 x - y^2 + 7 z), 49 sin(5 x - y^2 + 7 z) \right \rangle \]
Thus, the second directional derivative is: \[D^2_\overrightarrow{T}f(x,y,z)=\left \langle 25 sin(5 x - y^2 + 7 z), 4 y^2 sin(5 x - y^2 + 7 z), 49 sin(5 x - y^2 + 7 z) \right \rangle \cdot \left \langle -1,0,1 \right \rangle \] At point P, \[D^2_\overrightarrow{T}f(x,y,z)=\left \langle 25 sin(0), 4*5 sin(0), 49 sin(0) \right \rangle \cdot \left \langle -1,0,1 \right \rangle \] \[D^2_\overrightarrow{T}f(x,y,z)=\left \langle 0, 0, 0 \right \rangle \cdot \left \langle -1,0,1 \right \rangle \] \[D^2_\overrightarrow{T}f(x,y,z)=0\]
To determine the fourth directional derivative, no computations are needed. Continuously differentiating will not change the inner term of either the sine or cosine function; \(5 x - y^2 + 7 z\). At point P, this is 0. \(sin(0)=0\). Due to the periodic nature of the sine and cosine functions, differentiating an even amount of times will always result in the gradients consisting of sin(0)*by some constant after differentiation. The initial function was a sine function, the first differentiation is then a cosine function, and the pattern continues. The second directional derivative has the sine function in its gradient. For every even number directional derivative, the result will be: \[D^n_\overrightarrow{T}f(x,y,z)=\left \langle k_1 sin(0), k_2 sin(0), k_3 sin(0) \right \rangle \cdot \left \langle -1,0,1 \right \rangle \] \[D^n_\overrightarrow{T}f(x,y,z)=\left \langle 0, 0, 0 \right \rangle \cdot \left \langle -1,0,1 \right \rangle \] \[D^n_\overrightarrow{T}f(x,y,z)=0\]
Where n is an even number, \(k_1, k_2, k_3\) are the resultant factors after differentiation.
Therefore, \[D^4_\overrightarrow{T}f(x,y,z)=0\]
The 21st directional derivative is structurally identical to the first directional derivative due to the fact they are both odd. \[D^{21}_\overrightarrow{T}f(x,y,z)=\left \langle k_1 cos(0), k_2 cos(0), k_3 cos(0) \right \rangle \cdot \left \langle -1,0,1 \right \rangle \] Where \(k_1,k_2,k_3\) are factors as a result of differentiation. Observing the first directional derivative,
\[\nabla f(x,y,z)=\left \langle -5 cos(5 x - y^2 + 7 z), 2 y cos(5 x - y^2 + 7 z), -7 cos(5 x - y^2 + 7 z) \right \rangle \] Due to the periodic nature of the sine and cosine function, further differentiation will change whether the function of each component is cosine or sine, and the factor will continuously be multiplied, as the input of the sinusoidal function will continuously be differentiated with respect to x, y and z. The factors for each component will be multiplied by itself a total of 21 times. Therefore, the 21st directional derivative can be expressed as; \[D^{21}_\overrightarrow{T}f(x,y,z)=\left \langle (-5)^{21} cos(0), (2y)^{21} cos(0), (-7)^{21 }cos(0) \right \rangle \cdot \left \langle -1,0,1 \right \rangle \] \[D^{21}_\overrightarrow{T}f(x,y,z)=\left \langle (-5)^{21}, (2y)^{21}, (-7)^{21 } \right \rangle \cdot \left \langle -1,0,1 \right \rangle \] \[D^{21}_\overrightarrow{T}f(x,y,z)= -(-5)^{21}+(-7)^{21} \]