Fitting Sigmoid Model with Gradient Descent

A growth pattern in which a value first rises slowly, then accelerates, and finally decelerates until its growth rate reaches zero is called an S-shaped, or sigmoid, growth pattern. Such processes can be described by a sigmoid function, whose derivative is bell-shaped.

I ran into the problem of finding a parametric model for a sigmoid-shaped process. My first idea was to use the logistic function from logistic regression, but for that model you need to know the absolute minimum and maximum in advance in order to normalize your \(\hat{y}\) to the [0,1] range, as when you know that probabilities always lie between 0 and 1.

That is why I decided to fit another model - the algebraic sigmoid function: \[f(x) = \frac{x}{\sqrt{1+x^2}} \]

After adding coefficients to make it adjustable to real-world situations, I came up with this model:

\[ S(x) = \frac{(a+bx)c}{\sqrt{1+(a+bx)^2}} + d\]
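To see what each coefficient does, here is a small sketch in R. The helper name `S` and the parameter values are illustrative assumptions, not fitted values. For b > 0 the curve approaches d - c as x goes to minus infinity and d + c as x goes to plus infinity, so c sets the half-height, d sets the vertical midpoint, and a and b shift and scale x.

```r
# Parametrized algebraic sigmoid from the formula above.
S <- function(x, a, b, c, d) {
  u <- a + b * x
  c * u / sqrt(1 + u^2) + d
}

# Illustrative (made-up) parameter values.
a <- 0.5; b <- 2; c <- 40; d <- 55

# Far from the centre the curve flattens out near d - c and d + c.
S(-1e6, a, b, c, d)    # close to d - c = 15
S( 1e6, a, b, c, d)    # close to d + c = 95

# The midpoint value d is reached where a + b*x = 0, i.e. at x = -a/b.
S(-a / b, a, b, c, d)  # equals d = 55
```

Because the two plateaus are free parameters, the data do not have to be rescaled to a fixed range before fitting.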

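The cost function for fitting the above curve is the sum of squared residuals: \[ \theta(a,b,c,d) = \sum_{i=1}^{m}\left[y_{i} - \left(\frac{(a+bx_{i})c}{\sqrt{1+(a+bx_{i})^2}} + d\right) \right]^{2} \] This cost function is not convex in general: the parameter sets \((a,b,c,d)\) and \((-a,-b,-c,d)\) produce exactly the same curve, so minima come in pairs, and the point halfway between them, \((0,0,0,d)\), gives only a flat line. Simple gradient descent can still be used, but it finds a local minimum that depends on the starting values. Let’s take the partial derivatives.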
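As a sketch of the intermediate step, write \(u_i = a + bx_i\) (a shorthand introduced here, not part of the notation above). The chain rule needs only one non-trivial derivative:

\[ \frac{d}{du}\left(\frac{u}{\sqrt{1+u^2}}\right) = \frac{(1+u^2) - u^2}{(1+u^2)^{3/2}} = \frac{1}{(1+u^2)^{3/2}} \]

so the derivatives of the model with respect to its coefficients are

\[ \frac{\partial S}{\partial a} = \frac{c}{(1+u^2)^{3/2}}, \quad \frac{\partial S}{\partial b} = \frac{cx}{(1+u^2)^{3/2}}, \quad \frac{\partial S}{\partial c} = \frac{u}{\sqrt{1+u^2}}, \quad \frac{\partial S}{\partial d} = 1 \]

and each partial derivative of the cost with respect to a coefficient \(p\) is \(-2\sum_{i=1}^{m}\left(y_i - S(x_i)\right)\frac{\partial S}{\partial p}(x_i)\).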

\[\frac{\partial\theta}{\partial a}=\sum_{i=1}^{m} \frac{-2c\left( (y_{i} - d)\sqrt{(a+bx_{i})^2 +1 }-c(a+bx_{i})\right)}{ ((a+bx_{i})^2 + 1)^2}\]

\[\frac{\partial\theta}{\partial b}=\sum_{i=1}^{m} \frac{-2cx_{i}\left( (y_{i} - d)\sqrt{(a+bx_{i})^2 +1 }-c(a+bx_{i})\right)}{ ((a+bx_{i})^2 + 1)^2} \]

\[\frac{\partial\theta}{\partial c}=\sum_{i=1}^{m} \frac{2(bx_{i}+a)\left((bx_{i}+a)c- (y_{i}-d)\sqrt{(bx_{i}+a)^2+1}\right)}{(bx_{i}+a)^2+1}\]

\[ \frac{\partial\theta}{\partial d}=\sum_{i=1}^{m} -2\left(-d+y_{i} -\frac{c(bx_{i}+a)}{\sqrt{(bx_{i}+a)^2 + 1}} \right)\]

Now we can use gradient descent to minimize the cost function. Below is an example in R:

# Simulating data 
x = seq(-1, 1, 0.01)
y = 100/(1+exp(-(1 + 3*x))) + rnorm(length(x), 0, 10)


#  Defining cost function
cost_f <- function(a, b, c, d){
  sum((   y  - (a+b*x)*c/sqrt(1+(a+b*x)^2) - d)^2)
}

# Defining gradient functions for the equation coefficients
# (each function reads x, y and the other coefficients from the global environment)

gradient_a <- function(a){
  sum(   -2*c*( (y - d)*sqrt((a+b*x)^2 +1 )-c*(a+b*x)) /   ((a+b*x)^2 + 1)^2  )
}

gradient_b <- function(b){
  sum(   -2*c*x*( (y - d)*sqrt((a+b*x)^2 +1 )-c*(a+b*x)) /   ((a+b*x)^2 + 1)^2  )
}

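gradient_c <- function(c){
  # d(cost)/dc = sum( 2*u*(u*c - sqrt(u^2+1)*(y-d)) / (u^2+1) ), with u = b*x+a
  sum(  (2*(b*x+a)*((b*x+a)*c- sqrt((b*x+a)^2+1)*(y-d)))/((b*x+a)^2+1)   )
}

gradient_d <- function(d){
  sum( -2*(-d+y -(c*(b*x+a))/sqrt((b*x+a)^2 + 1)    ) )
}


# Setting initial values for coefficients.
# Starting from all zeros would leave a, b and c stuck, because their
# gradients are exactly zero there, so b and c start at 1:
a = 0
b = 1
c = 1
d = 0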

# Setting learning rate:
alpha = 0.0000001


for( i in 1:1000000){
  new_a = a - alpha*gradient_a(a)
  new_b = b - alpha*gradient_b(b)
  new_c = c - alpha*gradient_c(c)
  new_d = d - alpha*gradient_d(d)
  a = new_a
  b = new_b
  c = new_c
  d = new_d
}


# Defining sigmoid function:
sigmoid_function <- function(x){
 (a+b*x)*c/sqrt(1+(a+b*x)^2) + d
}

# Adding fitted values to the scatter plot
# (sigmoid_function is already vectorized, so mapply is not needed):
plot(x, y); lines(x, sigmoid_function(x), col = "blue")
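Hand-derived gradients are easy to get wrong, so it can help to compare them with finite differences before running a long descent. The sketch below is self-contained and makes a few assumptions of its own: the names `cost_vec`, `grad_vec` and `num_grad`, the fixed seed, the step size `h` and the test point `p0` are all illustrative choices, and the gradient is written directly from the chain rule as -2 times the sum of residual times the derivative of the model.

```r
set.seed(1)  # fixed seed so the check is repeatable (an assumption)
x <- seq(-1, 1, 0.01)
y <- 100 / (1 + exp(-(1 + 3 * x))) + rnorm(length(x), 0, 10)

# Cost as a function of a parameter vector p = c(a, b, c, d).
cost_vec <- function(p) {
  u <- p[1] + p[2] * x
  sum((y - p[3] * u / sqrt(1 + u^2) - p[4])^2)
}

# Analytic gradient: -2 * sum(residual * dS/dparameter).
grad_vec <- function(p) {
  u <- p[1] + p[2] * x
  r <- y - p[3] * u / sqrt(1 + u^2) - p[4]  # residuals
  dS_du <- p[3] / (1 + u^2)^1.5             # derivative of S with respect to u
  c(a = -2 * sum(r * dS_du),
    b = -2 * sum(r * dS_du * x),
    c = -2 * sum(r * u / sqrt(1 + u^2)),
    d = -2 * sum(r))
}

# Central finite differences as an independent reference.
num_grad <- function(f, p, h = 1e-5) {
  sapply(seq_along(p), function(k) {
    e <- replace(numeric(length(p)), k, h)  # step only in coordinate k
    (f(p + e) - f(p - e)) / (2 * h)
  })
}

p0 <- c(0.3, 1.2, 20, 40)  # arbitrary test point
numeric_g <- num_grad(cost_vec, p0)
rel_err <- abs(grad_vec(p0) - numeric_g) / pmax(1, abs(numeric_g))
rel_err  # all four entries should be tiny if the formulas are right
```

A large entry in `rel_err` points at the partial derivative that needs another look.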

Roman Vodonenko

January 26, 2017