Assignment 1

1.

1.1: Plot \(f(x,y)\) using R

library(plotly)

f <- function(x, y) { 10 * x * y * exp(-x^4 - y^4) }

x <- seq(-2, 2, by = 0.05)
y <- x
z <- outer(x, y, f)

plot_ly(x = ~x, y = ~y, z = ~z, type = "surface")

The shape seems to consist of four main peaks: two that are upward-pointing (hills) and two that are downward-pointing (pits). Observing the function, this makes sense as the exponential term is very similar to a bell curve(Gaussian distribution), with the 10xy term changing the signs depending on the quadrant.

There is one pit at intervals of \(x=[-1.3,-0.1]\), \(y=[0.1,1.3]\). The other pit is at the intervals of \(x=[0.1,1.3]\), \(y=[-1.3,0.1]\). There is one hill at intervals of \(x=[0.1,1.3]\), \(y=[0.1,1.3]\). The other hill at intervals of \(x=[-1.3,-0.1]\), \(y=[-1.3,0.1]\).

1.2 Find the gradient function

Given the function:

\[ f(x,y) = 10xye^{-x^4-y^4} \]

To find the gradient, we need to differentiate with respect to both \(x\) and \(y\):

\[ \nabla f(x,y) = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right) \]

# Define the function
f <- function(x, y) {
  10 * x * y * exp(-x^4 - y^4)
}

# Partial derivative with respect to x
df_dx <- function(x, y) {
  10 * y * exp(-x^4 - y^4) - 40 * x^4 * y * exp(-x^4 - y^4)
}

# Partial derivative with respect to y
df_dy <- function(x, y) {
  10 * x * exp(-x^4 - y^4) - 40 * x * y^4 * exp(-x^4 - y^4)
}

gradient_f <- list("df_dx" = df_dx, "df_dy" = df_dy)
gradient_f

## $df_dx
## function(x, y) {
##   10 * y * exp(-x^4 - y^4) - 40 * x^4 * y * exp(-x^4 - y^4)
## }
## 
## $df_dy
## function(x, y) {
##   10 * x * exp(-x^4 - y^4) - 40 * x * y^4 * exp(-x^4 - y^4)
## }

The partial derivative with respect to \(x\) is: \[ \frac{\partial f}{\partial x} = 10ye^{-x^4-y^4} - 40x^4ye^{-x^4-y^4} \]

The partial derivative with respect to \(y\) is: \[ \frac{\partial f}{\partial y} = 10xe^{-x^4-y^4} - 40xy^4e^{-x^4-y^4} \]

Therefore, the gradient function for \(f(x,y)\) is: \[ \nabla f(x,y) = \left( 10ye^{-x^4-y^4} - 40x^4ye^{-x^4-y^4}, 10xe^{-x^4-y^4} - 40xy^4e^{-x^4-y^4} \right) \]

1.3: Finding the direction for which the rate of change is the largest at P(0.5,-1)

The rate of change will be the largest at P(0.5,-1) if the direction is the same as the direction of the gradient at point P. Therefore, evaluating the gradient at point P will give the direction that maximizes the rate of change.

x_val <- 0.5
y_val <- -1

list(df_dx(x_val,y_val), df_dy(x_val,y_val))

## [[1]]
## [1] -2.591931
## 
## [[2]]
## [1] -5.183861

Therefore, moving in the direction of \[ \nabla f(x,y) \approx \left \langle-2.591931, -5.183861 \right \rangle \], this will result in the largest rate of change at point P.

1.4: Draw the contour plot of f(x,y)

library(tidyverse)
library(ggplot2)
library(ggquiver)
library(viridis)

f <- function(x, y) {
  10 * x * y * exp(-x^4 - y^4)
}

x_seq <- seq(-2, 2, length.out = 100)
y_seq <- seq(-2, 2, length.out = 100)
grid <- expand.grid(x = x_seq, y = y_seq)
grid$z <- with(grid, f(x, y))

ggplot(grid, aes(x = x, y = y, z = z)) +
geom_contour_filled(aes(fill = stat(level))) +   # Updated to use ..level..
scale_fill_viridis_d()  # For a nice color scale

Adding gradients to the contour plot,

f <- expression(10 * x * y * exp(-x^4 - y^4))
df_dx <- D(f, "x")
df_dy <- D(f, "y")

grid <- grid %>%
  mutate(
    dx = map2_dbl(x, y, ~eval(df_dx, list(x = .x, y = .y))),
    dy = map2_dbl(x, y, ~eval(df_dy, list(x = .x, y = .y)))
  )


ggplot(grid, aes(x = x, y = y, z = z)) +
  geom_contour_filled(aes(fill = stat(level))) +
  geom_quiver(
    aes(u = 0.1 * dx, v = 0.1 * dy), 
    scale = 2000,  # Adjust this scale factor to increase the arrow size
    color = "white",  # A color that stands out against your fill
    arrow = arrow(type = "open", length = unit(5, "inches"))  # Customize arrow appearance
  ) +
  scale_fill_viridis_d() +
  theme_minimal() +
  labs(title = "Contour Map with Gradients of f(x, y)")

The values of x, y of \(\left [\pm 1, \pm 1 \right] \rightarrow \left [\pm 0.8, \pm 1 \right ]\) are areas in the contour plot where the rate of change seems larger than any other regions in the plot. This is clear by observing the distance between z-levels at this point starts to decrease, meaning the function is changing at a far faster and greater rate within the plot.

2. Find the relative maxima and minima of f(x,y)

To find the critical points of the function, set the derivative of the function with respect to x and y to 0, and solve for the set of points.

\[ \frac{\partial f}{\partial x} = 10ye^{-x^4-y^4} - 40x^4ye^{-x^4-y^4}=0 \] \[ \frac{\partial f}{\partial y} = 10xe^{-x^4-y^4} - 40xy^4e^{-x^4-y^4} =0 \]

For \[ \frac{\partial f}{\partial x}\]: \[10ye^{-x^4-y^4} - 40x^4ye^{-x^4-y^4}=0 \] \[10 \left (ye^{-x^4-y^4} - 4x^4ye^{-x^4-y^4} \right) =0 \] \[ye^{-x^4-y^4} - 4x^4ye^{-x^4-y^4} =0 \] \[ e^{-x^4-y^4} \left ( y - 4x^4y \right ) =0 \] Because \(e^{-x^4-y^4}\) is always non-zero, \[\left ( y - 4x^4y \right ) =0 \] \[\left ( y(1- 4x^4) \right ) =0 \]

Which means either \(y=0\) or \(x= \pm \left (\frac{1}{4} \right )^\frac{1}{4}\)

For \[ \frac{\partial f}{\partial y}\]: \[10xe^{-x^4-y^4} - 40xy^4e^{-x^4-y^4}=0 \] \[10 \left (xe^{-x^4-y^4} - 4xy^4e^{-x^4-y^4} \right) =0 \] \[xe^{-x^4-y^4} - 4xy^4e^{-x^4-y^4} =0 \] \[ e^{-x^4-y^4} \left ( x - 4xy^4 \right ) =0 \] Because \(e^{-x^4-y^4}\) is always non-zero, \[\left ( x - 4xy^4 \right ) =0 \] \[\left ( x(1- 4y^4) \right ) =0 \]

Which means either \(x=0\) or \(y=\pm \left (\frac{1}{4} \right )^\frac{1}{4}\)

Therefore, \[ \frac{\partial f}{\partial x} = 10ye^{-x^4-y^4} - 40x^4ye^{-x^4-y^4}=0 \] \[ \frac{\partial f}{\partial y} = 10xe^{-x^4-y^4} - 40xy^4e^{-x^4-y^4} =0 \]

\(y(1- 4x^4) =0\) for \(\frac{\partial f}{\partial x}\)

\(x(1- 4y^4)=0\) for \(\frac{\partial f}{\partial y}\)

If \(y=0\), \(f_x=0\), therefore \(x=0\). Therefore, there is a critical point at (0,0). The converse is also true, \(x=0\), \(f_y=0\), therefore \(y=0\)
If \((1- 4x^4) =0\) and \((1- 4y^4)=0\), \(y=\pm \left (\frac{1}{4} \right )^\frac{1}{4}\) and \(x= \pm \left (\frac{1}{4} \right )^\frac{1}{4}\). Therefore, there are critical points at: \[\left( \pm \sqrt[4]{\frac{1}{4}},\pm\sqrt[4]{\frac{1}{4}} \right) \] and at \[\left(0,0\right) \]

Classifying the critical points,

\[ f(x,y) = 10xye^{-x^4-y^4} \] Let A=\(f_{xx}\), B=\(f_{xy}\), and let C=\(f_{yy}\) \[ \frac{\partial f}{\partial x} = 10ye^{-x^4-y^4} - 40x^4ye^{-x^4-y^4}\] \[ \frac{\partial f}{\partial y} = 10xe^{-x^4-y^4} - 40xy^4e^{-x^4-y^4} \]

\[A=f_{xx}=40 e^{-x^4 - y^4} x^3 (-5 + 4 x^4) y \] \[B=f_{xy}=10 e^{-x^4 - y^4} (-1 + 4 x^4) (-1 + 4 y^4) \] \[C=f_{yy}=40 e^{-x^4 - y^4} x y^3 (-5 + 4 y^4) \]

\[\pm \sqrt[4]{\frac{1}{4}} \approx \pm 0.71 \]

library(dplyr)
library(knitr)
library(kableExtra)

# Define the second derivative functions for A, B, and C
A_function <- function(x, y) {
  40*exp(-x^4 - y^4)*x^3*(-5 + 4*x^4)*y
}

B_function <- function(x, y) {
  10*exp(-x^4 - y^4)*(-1 + 4*x^4)*(-1 + 4*y^4)
}


C_function <- function(x, y) {
  40*exp(-x^4 - y^4)*x*y^3*(-5 + 4*y^4)
}

a=A_function(0.71,-0.71)
b=B_function(0.71,-0.71)
c=C_function(0.71,-0.71)

hessian=b^2-a*c
a1=A_function(0.71,0.71)
b1=B_function(0.71,0.71)
c1=C_function(0.71,0.71)
hessian1=b1^2-a1*c1

a2=A_function(-0.71,0.71)
b2=B_function(-0.71,0.71)
c2=C_function(-0.71,0.71)
hessian2=b2^2-a2*c2

a3=A_function(-0.71,-0.71)
b3=B_function(-0.71,-0.71)
c3=C_function(-0.71,-0.71)
hessian3=b3^2-a3*c3

values <- data.frame(
  CP = c("(0.71, -0.71)", "(0.71, 0.71)", "(-0.71, 0.71)", "(-0.71, -0.71)"),
  A = c(a, a1, a2, a3),
  B = c(b, b1, b2, b3),
  C = c(c, c1, c2, c3),
  Hessian = c(hessian, hessian1, hessian2, hessian3)
)

# Assuming the nature of the point is determined by the Hessian test for min/max
values$Nature <- ifelse(values$Hessian < 0 & values$A > 0, "Local Min",
                        ifelse(values$Hessian < 0 & values$A < 0, "Local Max",
                        ifelse(values$Hessian > 0, "Saddle Point", "Inconclusive")))

names(values)[names(values) == "Hessian"] <- "B²-AC"

kable(values, "html", booktabs = TRUE, escape = FALSE, align = 'c') %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE, position = "center") %>%
  add_header_above(c(" " = 1, "Classification of Critical Points" = 3, " " = 2)) %>%
  pack_rows("Critical Points", 1, 4)

	Classification of Critical Points
CP	A	B	C	B²-AC	Nature
Critical Points
(0.71, -0.71)	24.35784	0.0016312	24.35784	-593.3042	Local Min
(0.71, 0.71)	-24.35784	0.0016312	-24.35784	-593.3042	Local Max
(-0.71, 0.71)	24.35784	0.0016312	24.35784	-593.3042	Local Min
(-0.71, -0.71)	-24.35784	0.0016312	-24.35784	-593.3042	Local Max

Therefore, \((0.71,0.71)\) is a local Max, \((-0.71,-0.71)\) is a local max, \((0.71,-0.71)\) is a local minimum, and \((-0.71,0.71)\) is a local minimum, which is in agreement with the plot of f(x,y). As for the point \((0,0)\), the classification “test” of the critical point is inconclusive.

3: Gradient Ascent algorithm to maximize f(x,y)

3.1: Set initial point to P (x0, y0) = (0.25, −1.0)

# install.packages('ggplot2')
library('ggplot2')
f <- function(x,y){(10 * x * y * exp(-x^4 - y^4))}

dx <- function(x, y) {
  10 * y * exp(-x^4 - y^4) - 40 * x^4 * y * exp(-x^4 - y^4)
}
dy <- function(x, y) {
  10 * x * exp(-x^4 - y^4) - 40 * x * y^4 * exp(-x^4 - y^4)
}

grad <- function(x,y) { c( dx(x,y) , dy(x,y)  ) }

x <- seq(-3,3,length=50)
y <- seq(-3,3,length=50)
dat <- data.frame(expand.grid(x=x,y=y))
dat$z <- with(dat, f(x,y))

point <- c(x = 0.25, y = -1)
step_size <- 0.2
epsilon <- 0.0001
increment <- 1000 # Any value greater than epsilon
max_iteration <- 100

v <- ggplot(dat, aes(x=x, y=y, z=f(x,y))) 
v <- v + geom_point(aes(x,y), dat = data.frame(x=point[1],y=point[2], z=0) )
v <- v + geom_contour() + theme_bw() 
counter <- 1
while(increment > epsilon & counter <= 100){
  prev_point <- point
  point <- point + grad(point[1], point[2]) * step_size
  
  v <- v + geom_line(aes(x,y), data = data.frame(x=c(point[1], prev_point[1]), y = c(point[2], prev_point[2]), z = c(0,0)))
  v <- v + geom_point(aes(x,y), dat = data.frame(x=point[1],y=point[2], z=0) )
  
  counter   <- counter + 1
  increment <- sqrt(sum((point-prev_point)^2))
  print(point)
}

##         x         y 
## -0.471439 -1.549668 
##          x          y 
## -0.4788455 -1.4876997 
##          x          y 
## -0.4954716 -1.3616967 
##          x          y 
## -0.5579864 -0.9794777 
##          x          y 
## -0.9916229  0.1025142 
##          x          y 
## -1.2151691 -0.6512106 
##          x          y 
## -0.2658583 -0.7155912 
##          x          y 
## -1.3395505 -0.6957011 
##          x          y 
## -0.8169780 -0.7010352 
##          x          y 
## -0.2654156 -0.7289061 
##          x          y 
## -1.3375380 -0.6774715 
##          x          y 
## -0.8098003 -0.6913666 
##          x          y 
## -0.2943525 -0.7635564 
##          x          y 
## -1.3408752 -0.6139719 
##          x          y 
## -0.8394379 -0.6535889 
##          x          y 
## -0.1857254 -0.8835246 
##          x          y 
## -1.1407574 -0.5935678 
##           x           y 
## -0.02749867 -0.78013450 
##          x          y 
## -1.1047896 -0.7618457 
##          x          y 
##  0.1113875 -0.6382619 
##          x          y 
## -0.9691015 -0.5748330 
##          x          y 
##  0.1095633 -0.9799973 
##          x          y 
## -0.6691308 -1.2142681 
##          x          y 
## -0.7139103 -0.2557746 
##          x          y 
## -0.6985717 -1.3334866 
##          x          y 
## -0.7027915 -0.7904332 
##          x          y 
## -0.7230696 -0.3719574 
##          x          y 
## -0.6712065 -1.3687125 
##          x          y 
## -0.6837810 -0.9413619 
##          x          y 
## -0.7704105  0.1316532 
##          x          y 
## -0.8461271 -0.9500447 
##           x           y 
## -0.31688416  0.06365504 
##          x          y 
## -0.1959366 -0.5637034 
##          x          y 
## -1.2075646 -0.7745567 
##          x          y 
## -0.2400122 -0.6861858 
##          x          y 
## -1.3213064 -0.7295744 
##          x          y 
## -0.7375460 -0.7169841 
##          x          y 
## -0.5871604 -0.6689175 
##          x          y 
## -1.0972370 -0.8388986 
##           x           y 
##  0.05410101 -0.53097047 
##          x          y 
## -0.9266577 -0.4628094 
##          x          y 
## -0.1021701 -1.1542323 
##          x          y 
## -0.4932442 -0.9429909 
##           x           y 
## -1.10853695 -0.03094206 
##          x          y 
## -1.0396373 -0.5206731 
##           x           y 
##  0.06528567 -0.94475948 
##          x          y 
## -0.7864742 -1.0734778 
##          x          y 
## -0.5806297  0.1525330 
##          x          y 
## -0.4322106 -0.8811612 
##          x          y 
## -1.2335486 -0.2363698 
##          x          y 
## -0.8491638 -0.4761485 
##          x          y 
## -0.2684164 -1.2380911 
##          x          y 
## -0.4985388 -0.8101908 
##          x          y 
## -1.2439689 -0.3694236 
##          x          y 
## -0.6765497 -0.5755593 
##          x          y 
## -0.8120411 -1.1272307 
##           x           y 
## -0.59734357  0.01466485 
##          x          y 
## -0.5846715 -1.0371994 
##          x          y 
## -0.8936403  0.1496474 
##          x          y 
## -1.1388414 -0.7925268 
##             x             y 
## -0.0006447416 -0.6274888082 
##          x          y 
## -1.0753912 -0.6279083 
##          x          y 
##  0.1521594 -0.8107123 
##          x          y 
## -0.8976821 -0.9544531 
##            x            y 
## -0.203001087 -0.005764464 
##          x          y 
## -0.2144323 -0.4110777 
##          x          y 
## -1.0050118 -0.7794818 
##          x          y 
##  0.1920167 -0.5406863 
##          x          y 
## -0.7940371 -0.3089555 
##          x          y 
## -0.5512372 -1.3278933 
##          x          y 
## -0.6194044 -0.8147282 
##          x          y 
## -0.9916518 -0.2900142 
##          x          y 
## -0.3635972 -1.0175894 
##          x          y 
## -1.0002034 -0.2132295 
##          x          y 
## -0.5303918 -0.9409532 
##         x         y 
## -1.072993  0.014791 
##          x          y 
## -1.1068027 -0.5553194 
##           x           y 
##  0.01972965 -0.83341277 
##          x          y 
## -1.0091631 -0.8560589 
##          x          y 
##  0.1076639 -0.3759515 
##          x          y 
## -0.6288723 -0.1817747 
##          x          y 
## -0.7451449 -1.2515482 
##          x          y 
## -0.7082719 -0.4216844 
##          x          y 
## -0.7040742 -1.3538253 
##          x          y 
## -0.7053289 -0.8777028 
##           x           y 
## -0.71291473 -0.04184215 
##          x          y 
## -0.7107649 -1.1430669 
##           x           y 
## -0.70406552  0.02124286 
##          x          y 
## -0.7034975 -1.0801060 
##          x          y 
## -0.7122818  0.1748170 
##          x          y 
## -0.7202740 -0.9213169 
##           x           y 
## -0.66781501  0.08642782 
##          x          y 
## -0.6388549 -1.0079967 
##          x          y 
## -0.8416937  0.1976211 
##          x          y 
## -1.0824154 -0.8137069 
##          x          y 
##  0.1123161 -0.5470109 
##          x          y 
## -0.8872156 -0.4151961 
##          x          y 
## -0.2458794 -1.2319679 
##          x          y 
## -0.4875517 -0.8298892 
##          x          y 
## -1.2430660 -0.3153016

For \(\eta=0.2\), the algorithm does not converge. The plotted path zig zags across the plot, crossing itself numerous times. This means the algorithm is not moving in a clear path towards a maximum and oscillates with 100 iterations. One can infer that the step size is too high, causing the algorithm to overshoot, resulting in the zig zag pattern.

For \(\eta=0.001\), the algorithm is as follows:

# install.packages('ggplot2')
library('ggplot2')
f <- function(x,y){(10 * x * y * exp(-x^4 - y^4))}

dx <- function(x, y) {
  10 * y * exp(-x^4 - y^4) - 40 * x^4 * y * exp(-x^4 - y^4)
}
dy <- function(x, y) {
  10 * x * exp(-x^4 - y^4) - 40 * x * y^4 * exp(-x^4 - y^4)
}

grad <- function(x,y) { c( dx(x,y) , dy(x,y)  ) }

x <- seq(-3,3,length=50)
y <- seq(-3,3,length=50)
dat <- data.frame(expand.grid(x=x,y=y))
dat$z <- with(dat, f(x,y))

point <- c(x = 0.25, y = -1)
step_size <- 0.001
epsilon <- 0.0001
increment <- 1000 # Any value greater than epsilon
max_iteration <- 100

v <- ggplot(dat, aes(x=x, y=y, z=f(x,y))) 
v <- v + geom_point(aes(x,y), dat = data.frame(x=point[1],y=point[2], z=0) )
v <- v + geom_contour() + theme_bw() 
counter <- 1
while(increment > epsilon & counter <= 100){
  prev_point <- point
  point <- point + grad(point[1], point[2]) * step_size
  
  v <- v + geom_line(aes(x,y), data = data.frame(x=c(point[1], prev_point[1]), y = c(point[2], prev_point[2]), z = c(0,0)))
  v <- v + geom_point(aes(x,y), dat = data.frame(x=point[1],y=point[2], z=0) )
  
  counter   <- counter + 1
  increment <- sqrt(sum((point-prev_point)^2))
  print(point)
}

##          x          y 
##  0.2463928 -1.0027483 
##          x          y 
##  0.2428114 -1.0054673 
##          x          y 
##  0.2392559 -1.0081563 
##          x          y 
##  0.2357262 -1.0108148 
##          x          y 
##  0.2322223 -1.0134423 
##          x          y 
##  0.2287443 -1.0160383 
##         x         y 
##  0.225292 -1.018602 
##          x          y 
##  0.2218654 -1.0211341 
##          x          y 
##  0.2184644 -1.0236331 
##          x          y 
##  0.2150888 -1.0260990 
##          x          y 
##  0.2117384 -1.0285317 
##          x          y 
##  0.2084132 -1.0309307 
##         x         y 
##  0.205113 -1.033296 
##          x          y 
##  0.2018374 -1.0356271 
##          x          y 
##  0.1985864 -1.0379241 
##          x          y 
##  0.1953597 -1.0401868 
##         x         y 
##  0.192157 -1.042415 
##          x          y 
##  0.1889781 -1.0446086 
##          x          y 
##  0.1858228 -1.0467676 
##          x          y 
##  0.1826907 -1.0488919 
##          x          y 
##  0.1795815 -1.0509816 
##         x         y 
##  0.176495 -1.053037 
##          x          y 
##  0.1734308 -1.0550568 
##          x          y 
##  0.1703887 -1.0570425 
##          x          y 
##  0.1673683 -1.0589935 
##          x          y 
##  0.1643692 -1.0609100 
##          x          y 
##  0.1613913 -1.0627921 
##         x         y 
##  0.158434 -1.064640 
##          x          y 
##  0.1554972 -1.0664533 
##          x          y 
##  0.1525803 -1.0682326 
##          x          y 
##  0.1496832 -1.0699780 
##          x          y 
##  0.1468054 -1.0716895 
##          x          y 
##  0.1439466 -1.0733673 
##          x          y 
##  0.1411064 -1.0750115 
##          x          y 
##  0.1382846 -1.0766224 
##          x          y 
##  0.1354806 -1.0782001 
##          x          y 
##  0.1326942 -1.0797447 
##         x         y 
##  0.129925 -1.081257 
##          x          y 
##  0.1271727 -1.0827357 
##          x          y 
##  0.1244368 -1.0841823 
##          x          y 
##  0.1217171 -1.0855967 
##          x          y 
##  0.1190132 -1.0869791 
##          x          y 
##  0.1163247 -1.0883295 
##          x          y 
##  0.1136512 -1.0896483 
##          x          y 
##  0.1109925 -1.0909356 
##          x          y 
##  0.1083481 -1.0921916 
##          x          y 
##  0.1057178 -1.0934166 
##          x          y 
##  0.1031011 -1.0946107 
##          x          y 
##  0.1004978 -1.0957741 
##           x           y 
##  0.09790739 -1.09690704 
##           x           y 
##  0.09532964 -1.09800973 
##           x           y 
##  0.09276418 -1.09908236 
##           x           y 
##  0.09021066 -1.10012511 
##           x           y 
##  0.08766876 -1.10113819 
##           x           y 
##  0.08513814 -1.10212178 
##           x           y 
##  0.08261847 -1.10307606 
##          x          y 
##  0.0801094 -1.1040012 
##           x           y 
##  0.07761063 -1.10489747 
##           x           y 
##  0.07512181 -1.10576495 
##           x           y 
##  0.07264263 -1.10660385 
##           x           y 
##  0.07017276 -1.10741433 
##           x           y 
##  0.06771189 -1.10819657 
##           x           y 
##  0.06525969 -1.10895072 
##           x           y 
##  0.06281585 -1.10967695 
##           x           y 
##  0.06038006 -1.11037540 
##         x         y 
##  0.057952 -1.111046 
##           x           y 
##  0.05553136 -1.11168958 
##           x           y 
##  0.05311785 -1.11230559 
##           x           y 
##  0.05071115 -1.11289439 
##           x           y 
##  0.04831095 -1.11345611 
##           x           y 
##  0.04591696 -1.11399088 
##           x           y 
##  0.04352888 -1.11449881 
##          x          y 
##  0.0411464 -1.1149800 
##           x           y 
##  0.03876924 -1.11543462 
##           x           y 
##  0.03639708 -1.11586270 
##           x           y 
##  0.03402965 -1.11626438 
##           x           y 
##  0.03166664 -1.11663973 
##           x           y 
##  0.02930777 -1.11698885 
##           x           y 
##  0.02695274 -1.11731181 
##           x           y 
##  0.02460127 -1.11760870 
##           x           y 
##  0.02225307 -1.11787958 
##           x           y 
##  0.01990784 -1.11812452 
##           x           y 
##  0.01756531 -1.11834357 
##           x           y 
##  0.01522519 -1.11853679 
##          x          y 
##  0.0128872 -1.1187042 
##           x           y 
##  0.01055105 -1.11884591 
##           x           y 
##  0.00821645 -1.11896190 
##            x            y 
##  0.005883128 -1.119052202 
##            x            y 
##  0.003550799 -1.119116855 
##           x           y 
##  0.00121918 -1.11915587 
##            x            y 
## -0.001112009 -1.119169270 
##            x            y 
## -0.003443052 -1.119157051 
##            x            y 
## -0.005774229 -1.119119219 
##            x            y 
## -0.008105822 -1.119055768 
##           x           y 
## -0.01043811 -1.11896669 
##           x           y 
## -0.01277138 -1.11885197 
##           x           y 
## -0.01510591 -1.11871158 
##           x           y 
## -0.01744198 -1.11854549 
##           x           y 
## -0.01977988 -1.11835368 
##           x           y 
## -0.02211989 -1.11813610

For \(\eta=0.001\), the algorithm does not converge in 100 iterations. This is evident due to the plotted path, which does not reach the maximum. In fact, looking at the trajectory of the plotted gradient ascent path, it’s evident that it hasn’t converged on a value. In order for the algorithm to converge with the constrained step size, more iterations would be needed for convergence.

To find a \(\eta\) that converges in less than 13 iterations, the step size needs to be orders of magnitude larger than 0.001. Because we are decreasing the number of iterations from 100 to 13, a reasonable initial assumption is to increase the step size by a factor of 10, 0.01 roughly assuming that the gradient ascent plot path is somewhat linear.

# install.packages('ggplot2')
library('ggplot2')
f <- function(x,y){(10 * x * y * exp(-x^4 - y^4))}

dx <- function(x, y) {
  10 * y * exp(-x^4 - y^4) - 40 * x^4 * y * exp(-x^4 - y^4)
}
dy <- function(x, y) {
  10 * x * exp(-x^4 - y^4) - 40 * x * y^4 * exp(-x^4 - y^4)
}

grad <- function(x,y) { c( dx(x,y) , dy(x,y)  ) }

x <- seq(-3,3,length=50)
y <- seq(-3,3,length=50)
dat <- data.frame(expand.grid(x=x,y=y))
dat$z <- with(dat, f(x,y))

point <- c(x = 0.25, y = -1)
step_size <- 0.01
epsilon <- 0.0001
increment <- 1000 # Any value greater than epsilon
max_iteration <- 13

v <- ggplot(dat, aes(x=x, y=y, z=f(x,y))) 
v <- v + geom_point(aes(x,y), dat = data.frame(x=point[1],y=point[2], z=0) )
v <- v + geom_contour() + theme_bw() 
counter <- 1
while(increment > epsilon & counter <= 100){
  prev_point <- point
  point <- point + grad(point[1], point[2]) * step_size
  
  v <- v + geom_line(aes(x,y), data = data.frame(x=c(point[1], prev_point[1]), y = c(point[2], prev_point[2]), z = c(0,0)))
  v <- v + geom_point(aes(x,y), dat = data.frame(x=point[1],y=point[2], z=0) )
  
  counter   <- counter + 1
  increment <- sqrt(sum((point-prev_point)^2))
}

v

This is a clear case of convergence. The plotted path takes direct steps towards the maximum, and eventually, reaches the maximum. In fact, observing the last iteration, this is approximately equal to the critical point found earlier, \(\left( -\sqrt[4]{\frac{1}{4}},-\sqrt[4]{\frac{1}{4}} \right)\), which is classified as a local maximum.

3.2: Set the initial point to P ′ = (x0, y0) = (1, −0.25)

To find a \(\eta\) that converges in less than 12 iterations with the new point P, one can observe the contour plot. \(\acute{P}\) is in the 4th quadrant. In order for convergence, the plotted path will most likely go into the first quadrant, as it is the closest way to reach a maximum. Because a \(\eta=0.001\) was found to be too small for 100 iterations, the \(\eta=0.01\) would still be a reasonable initial assumption due to the fact that the iteration size is decreased by roughly a factor of 10.

# install.packages('ggplot2')
library('ggplot2')
f <- function(x,y){(10 * x * y * exp(-x^4 - y^4))}

dx <- function(x, y) {
  10 * y * exp(-x^4 - y^4) - 40 * x^4 * y * exp(-x^4 - y^4)
}
dy <- function(x, y) {
  10 * x * exp(-x^4 - y^4) - 40 * x * y^4 * exp(-x^4 - y^4)
}

grad <- function(x,y) { c( dx(x,y) , dy(x,y)  ) }

x <- seq(-3,3,length=50)
y <- seq(-3,3,length=50)
dat <- data.frame(expand.grid(x=x,y=y))
dat$z <- with(dat, f(x,y))

point <- c(x = 1, y = -0.25)
step_size <- 0.01
epsilon <- 0.0001
increment <- 1000 # Any value greater than epsilon
max_iteration <- 12

v <- ggplot(dat, aes(x=x, y=y, z=f(x,y))) 
v <- v + geom_point(aes(x,y), dat = data.frame(x=point[1],y=point[2], z=0) )
v <- v + geom_contour() + theme_bw() 
counter <- 1
while(increment > epsilon & counter <= 100){
  prev_point <- point
  point <- point + grad(point[1], point[2]) * step_size
  
  v <- v + geom_line(aes(x,y), data = data.frame(x=c(point[1], prev_point[1]), y = c(point[2], prev_point[2]), z = c(0,0)))
  v <- v + geom_point(aes(x,y), dat = data.frame(x=point[1],y=point[2], z=0) )
  
  counter   <- counter + 1
  increment <- sqrt(sum((point-prev_point)^2))
}

v

Convergence has been acheived with a \(\eta=0.01\). The plotted path takes small steps along the gradient, eventually reaching a maximum. In this case, observing the final point, this can be approximated to \(\left( \sqrt[4]{\frac{1}{4}},\sqrt[4]{\frac{1}{4}} \right)\), a critical point which is classified as a maximum.

3.3: Comparing P and P’

Due to the nature of the function f(x,y), maximums are observed when both the x and y term have the same sign. This is due to the \(10xy\) term in the function. With differing signs, the result is negative. The gradient ascent algorithm is focused on taking steps towards the fastest rate of increase to the maxima of the function, which depends on the initial point. With \(P=(0.25,-1)\), this initial point is located very close to the boundary between the fourth and third quadrant. There is a maximum in the third quadrant due to both x and y having the same sign. As a result, the gradient ascent plot will traverse into the third quadrant as it is the steepest rate of increase.

For point \(P \acute =(1,-0.25)\), this point is much closer towards the boundary of the fourth and first quadrant. There is a maximum in the first quadrant due to both x and y being positive. As a result, the plotted path will traverse into the first quadrant, following the steepest rate of increase from the initial point. While both points in the algorithm reach the maximum value of f(x,y), they traverse different paths due to the initial points used, and their relative proximity to a maximum. Because both algorithms reach the maxmium value in less than 13 iterations, both initial points are good choices to yield the maximum value of the function.

4: Gradient Descent of f(x,y)

4.1

To convert the gradient ascent algorithm to a gradient descent algorithm, it is simply changing the signage so the algorithm goes in the opposite direction of the steepest rate of change. This is expressed as: \[P(x_{i+1},y_{i+1})= P(x_{i},y_{i}) - \nabla f(x_{i},y_{i}) \times \eta\]

# install.packages('ggplot2')
library('ggplot2')
f <- function(x,y){(10 * x * y * exp(-x^4 - y^4))}

dx <- function(x, y) {
  10 * y * exp(-x^4 - y^4) - 40 * x^4 * y * exp(-x^4 - y^4)
}
dy <- function(x, y) {
  10 * x * exp(-x^4 - y^4) - 40 * x * y^4 * exp(-x^4 - y^4)
}

grad <- function(x,y) { c( dx(x,y) , dy(x,y)  ) }

x <- seq(-3,3,length=50)
y <- seq(-3,3,length=50)
dat <- data.frame(expand.grid(x=x,y=y))
dat$z <- with(dat, f(x,y))

point <- c(x = 0.25, y = -1)
step_size <- 0.2
epsilon <- 0.0001
increment <- 1000 # Any value greater than epsilon
max_iteration <- 100

v <- ggplot(dat, aes(x=x, y=y, z=f(x,y))) 
v <- v + geom_point(aes(x,y), dat = data.frame(x=point[1],y=point[2], z=0) )
v <- v + geom_contour() + theme_bw() 
counter <- 1
while(increment > epsilon & counter <= 100){
  prev_point <- point
  point <- point - grad(point[1], point[2]) * step_size # Gradient Descent conversion, change the signage of the gradient
  
  v <- v + geom_line(aes(x,y), data = data.frame(x=c(point[1], prev_point[1]), y = c(point[2], prev_point[2]), z = c(0,0)))
  v <- v + geom_point(aes(x,y), dat = data.frame(x=point[1],y=point[2], z=0) )
  
  counter   <- counter + 1
  increment <- sqrt(sum((point-prev_point)^2))
}

v

4.2: Gradient descent with initial point P(0.5,-0.25)

To find a \(\eta\) that converges to the local minimum \((0.71, −0.71)\), one can use the results of convergence with 100 iterations with a \(\eta=0.2\) for a gradient ascent, and apply the same rationale for gradient descent.A step rate of 0.2 is far too large, a reasonable assumption for a step rate could be 0.01 for gradient descent. Because this step rate has been successful in determining convergence for gradient ascent, one can reasonably assume it will work for gradient descent.

# install.packages('ggplot2')
library('ggplot2')
f <- function(x,y){(10 * x * y * exp(-x^4 - y^4))}

dx <- function(x, y) {
  10 * y * exp(-x^4 - y^4) - 40 * x^4 * y * exp(-x^4 - y^4)
}
dy <- function(x, y) {
  10 * x * exp(-x^4 - y^4) - 40 * x * y^4 * exp(-x^4 - y^4)
}

grad <- function(x,y) { c( dx(x,y) , dy(x,y)  ) }

x <- seq(-3,3,length=50)
y <- seq(-3,3,length=50)
dat <- data.frame(expand.grid(x=x,y=y))
dat$z <- with(dat, f(x,y))

point <- c(x = 0.5, y = -0.25)
step_size <- 0.01
epsilon <- 0.0001
increment <- 1000 # Any value greater than epsilon
max_iteration <- 100

v <- ggplot(dat, aes(x=x, y=y, z=f(x,y))) 
v <- v + geom_point(aes(x,y), dat = data.frame(x=point[1],y=point[2], z=0) )
v <- v + geom_contour() + theme_bw() 
counter <- 1
while(increment > epsilon & counter <= 100){
  prev_point <- point
  point <- point - grad(point[1], point[2]) * step_size # Gradient Descent conversion, change the signage of the gradient
  
  v <- v + geom_line(aes(x,y), data = data.frame(x=c(point[1], prev_point[1]), y = c(point[2], prev_point[2]), z = c(0,0)))
  v <- v + geom_point(aes(x,y), dat = data.frame(x=point[1],y=point[2], z=0) )
  
  counter   <- counter + 1
  increment <- sqrt(sum((point-prev_point)^2))
}

v

Observing the plot, this clearly converges at the local minimum, (0.71,-0.71). The plotted path takes small steps opposite to the steepest rate of change, reaching the local minimum.

4.3:Find another initial point that makes the algorithm to converge to the local minimum, (x, y) =(−0.71, 0.71).

The local minimum (−0.71, 0.71) is located in the second quadrant. Thus any initial point must be closer to (−0.71, 0.71) than (0.71, -0.71) to converge to the local minimum (−0.71, 0.71). Thus, a good initial point would be (-0.5,0.25). One can note that this is simply the opposite signage of point P, as this Point P is in quadrant 4, and the opposite signage applied would shift it to the second quadrant.

# install.packages('ggplot2')
library('ggplot2')
f <- function(x,y){(10 * x * y * exp(-x^4 - y^4))}

dx <- function(x, y) {
  10 * y * exp(-x^4 - y^4) - 40 * x^4 * y * exp(-x^4 - y^4)
}
dy <- function(x, y) {
  10 * x * exp(-x^4 - y^4) - 40 * x * y^4 * exp(-x^4 - y^4)
}

grad <- function(x,y) { c( dx(x,y) , dy(x,y)  ) }

x <- seq(-3,3,length=50)
y <- seq(-3,3,length=50)
dat <- data.frame(expand.grid(x=x,y=y))
dat$z <- with(dat, f(x,y))

point <- c(x = -0.5, y = 0.25)
step_size <- 0.01
epsilon <- 0.0001
increment <- 1000 # Any value greater than epsilon
max_iteration <- 100

v <- ggplot(dat, aes(x=x, y=y, z=f(x,y))) 
v <- v + geom_point(aes(x,y), dat = data.frame(x=point[1],y=point[2], z=0) )
v <- v + geom_contour() + theme_bw() 
counter <- 1
while(increment > epsilon & counter <= 100){
  prev_point <- point
  point <- point - grad(point[1], point[2]) * step_size # Gradient Descent conversion, change the signage of the gradient
  
  v <- v + geom_line(aes(x,y), data = data.frame(x=c(point[1], prev_point[1]), y = c(point[2], prev_point[2]), z = c(0,0)))
  v <- v + geom_point(aes(x,y), dat = data.frame(x=point[1],y=point[2], z=0) )
  
  counter   <- counter + 1
  increment <- sqrt(sum((point-prev_point)^2))
}

v

The plot pattern converges to the local minimum (−0.71, 0.71), taking gradual steps opposite to the greatest rate of change.

5. Directional Derivatives

5.1: Calculate the 1st directional derivative of f in direction of decreasing x on C at P (1, √5, 0)

Calculating the 1st directional derivative requires finding the tangent vector at the point \(P(1, \sqrt{5}, 0)\) for the curve defined by \(y^2 + z^2 = 5\) and \(x + z = 1\).

In the direction of decreasing x, x can be paramaterized/expressed as 1-t, where t is any real number. Consequently, z=t given by one of the equations of the curve. Solving for y:

\[y^2+z^2=5\] \[y^2+t^2=5\] \[y^2=5-t^2\] \[y=\sqrt{5-t^2}\] Given that z=t, and at point P z=0, therefore t=0 at point P. Differentiating x,y,z with respect to t:

# Calculate derivatives
t <- 0  # At point P, t will be 0
dx_dt <- -1
dy_dt <- -t / sqrt(5 - t^2)
dz_dt <- 1

# At t = 0
dy_dt_at_P <- eval(dy_dt)

# The tangent vector at P
tangent_vector <- c(dx_dt, dy_dt_at_P, dz_dt)
tangent_vector

## [1] -1  0  1

Therefore the tangent vector at point P is \(\overrightarrow{T}=\left\langle -1,0,1 \right\rangle\) The unit tangent vector at point P is therefore \(\hat{T}=\left\langle -1,0,1 \right\rangle * \frac{1}{\left| \overrightarrow{T} \right|}=\left\langle -1,0,1 \right\rangle * \frac{1}{\sqrt{(-1)^2 +0^2 +1^2}}=\left\langle \frac{-1}{\sqrt{2}},0,\frac{1}{\sqrt{2}} \right\rangle\)
Finding the gradient of f,

\[\frac{\partial f}{\partial x}=-5 cos(5 x - y^2 + 7 z)\] \[\frac{\partial f}{\partial y}=2 y cos(5 x - y^2 + 7 z)\] \[\frac{\partial f}{\partial z}=-7 cos(5 x - y^2 + 7 z)\]

Therefore, \[\nabla f(x,y,z)=\left \langle -5 cos(5 x - y^2 + 7 z), 2 y cos(5 x - y^2 + 7 z), -7 cos(5 x - y^2 + 7 z) \right \rangle \] At point P, \[\nabla f((1, √5, 0)=\left \langle -5 cos(0), 2 √5 cos(0), -7 cos(0) \right \rangle \] \[\nabla f((1, √5, 0)=\left \langle -5, 2 √5, -7 \right \rangle \]

At point P, the directional derivative is: \[D_\hat{T}f(x,y,z)=\left \langle -5, 2 √5, -7 \right \rangle \cdot \left\langle \frac{-1}{\sqrt{2}},0,\frac{1}{\sqrt{2}} \right\rangle \] \[D_\hat{T}f(x,y,z)=\frac{5}{\sqrt{2}}-\frac{7}{\sqrt{2}} \] \[D_\hat{T}f(x,y,z)=\frac{-2}{\sqrt{2}} \] \[D_\hat{T}f(x,y,z)=-\sqrt{2} \]

5.2: Calculate the second directional derivative of f in direction of decreasing x on C at P (1, √5, 0)

The tangent/direction vector within the directional derivative computation stays the same in higher order directional derivatives. Higher order directional derivatives involve getting the higher order gradient function derivative and performing the dot product with the tangent vector. the first order general gradient is \[\nabla f(x,y,z)=\left \langle -5 cos(5 x - y^2 + 7 z), 2 y cos(5 x - y^2 + 7 z), -7 cos(5 x - y^2 + 7 z) \right \rangle \] Differentiating once to get the second order derivative; \[\frac {\partial\nabla f(x,y,z)}{\partial (x,y,z)}=\left \langle 25 sin(5 x - y^2 + 7 z), 4 y^2 sin(5 x - y^2 + 7 z), 49 sin(5 x - y^2 + 7 z) \right \rangle \]

Thus, the second directional derivative is: \[D^2_\hat{T}f(x,y,z)=\left \langle 25 sin(5 x - y^2 + 7 z), 4 y^2 sin(5 x - y^2 + 7 z), 49 sin(5 x - y^2 + 7 z) \right \rangle \cdot \left\langle \frac{-1}{\sqrt{2}},0,\frac{1}{\sqrt{2}} \right\rangle \] At point P, \[D^2_\hat{T}f(x,y,z)=\left \langle 25 sin(0), 4*5 sin(0), 49 sin(0) \right \rangle \cdot \left\langle \frac{-1}{\sqrt{2}},0,\frac{1}{\sqrt{2}} \right\rangle \] \[D^2_\hat{T}f(x,y,z)=\left \langle 0, 0, 0 \right \rangle \cdot \left\langle \frac{-1}{\sqrt{2}},0,\frac{1}{\sqrt{2}} \right\rangle \] \[D^2_\hat{T}f(x,y,z)=0\]

5.3: Calculate the fourth directional derivative of f in direction of decreasing x on C at P (1, √5, 0)

To determine the fourth directional derivative, no computations are needed. Continuously differentiating will not change the inner term of either the sine or cosine function; \(5 x - y^2 + 7 z\). At point P, this is 0. \(sin(0)=0\). Due to the periodic nature of the derivatives of sine and cosine functions, differentiating an even amount of times will always result in the gradients consisting of sin(0)*by some constant after differentiation. The initial function was a sine function, the first differentiation is then a cosine function, the second is a sine function, the third is a cosine, the fourth is a sine and the pattern continues. This alternation in differentiation means that the eventh directional derivative will always have a sine function in the 3 components of the gradient. The second directional derivative has the sine function in its gradient. For every even number directional derivative, the result will be: \[D^n_\hat{T}f(x,y,z)=\left \langle k_1 sin(0), k_2 sin(0), k_3 sin(0) \right \rangle \cdot \left\langle \frac{-1}{\sqrt{2}},0,\frac{1}{\sqrt{2}} \right\rangle \] \[D^n_\hat{T}f(x,y,z)=\left \langle 0, 0, 0 \right \rangle \cdot \left\langle \frac{-1}{\sqrt{2}},0,\frac{1}{\sqrt{2}} \right\rangle \] \[D^n_\hat{T}f(x,y,z)=0\]

Where n is an even number, \(k_1, k_2, k_3\) are the resultant factors after differentiation.

Therefore, \[D^4_\hat{T}f(x,y,z)=0\]

5.4: Calculate the 21st directional derivative of f in direction of decreasing x on C at P (1, √5, 0)

The 21st directional derivative is structurally identical to the first directional derivative due to the fact they are both odd. \[D^{21}_\hat{T}f(x,y,z)=\left \langle k_1 cos(0), k_2 cos(0), k_3 cos(0) \right \rangle \cdot \left\langle \frac{-1}{\sqrt{2}},0,\frac{1}{\sqrt{2}} \right\rangle \] Where \(k_1,k_2,k_3\) are factors as a result of differentiation. Due to the periodic nature of the sine and cosine function, further differentiation will change whether the function of each component is cosine or sine, and the factor will continuously be multiplied, as the input of the sinusoidal function will continuously be differentiated with respect to x, y and z. The factors for each component will be multiplied by itself a total of 21 times. Observing the first directional derivative:

\[D_\hat{T}f(x,y,z)=\left \langle -5cos(5 x - y^2 + 7 z), 2 √5cos(5 x - y^2 + 7 z), -7cos(5 x - y^2 + 7 z) \right \rangle \cdot \left\langle \frac{-1}{\sqrt{2}},0,\frac{1}{\sqrt{2}} \right\rangle \] \[D_\hat{T}f(x,y,z)= \frac{5cos(5 x - y^2 + 7 z)}{\sqrt{2}} - \frac{7cos(5 x - y^2 + 7 z)}{\sqrt{2}} \] \[D_\hat{T}f(x,y,z)= -\sqrt{2} cos(5 x - y^2 + 7 z)\]

Because 1 and 21 are both odd, \(\sqrt{2}\) will be continuously multiplied a total of 21 times, along with a \(-cos(5 x - y^2 + 7 z)\). The alternating differentiation of the sine and cosine functions means the sign will always alternate from positive to negative. Therefore, the 21st directional derivative can be expressed as; \[D^{21}_\hat{T}f(x,y,z)=-\sqrt{2}^{21} \cdot cos(5 x - y^2 + 7 z)\] At point P, \[D^{21}_\hat{T}f(1, √5, 0)=-(\sqrt{2})^{21} \cdot cos(0)\] \[D^{21}_\hat{T}f(1, √5, 0)=-1024\sqrt{2}\]