1 Model

Here we are going to simulate data from the next model.

\[\begin{align*} Y_i &\sim Bernoulli(p_i), \\ logit(p_i) &= -4 + 2 x_1 , \\ x_1 &\sim U(1, 5). \end{align*}\]

The parameter vector is \(\boldsymbol{\theta}=(-4, 2)^\top\).

To generate the data we can use the next code:

n <- 100
x_data <- runif(n=n, min=1, max=5)
p <- exp(-4+2*x_data)/(1+exp(-4+2*x_data))
y_data <- rbinom(n=n, size=1, prob=p)
plot(x=x_data, y=y_data, las=1)

Now we are going to load the tensorflow package, initialize the values for \(\beta_0\) and \(\beta_1\) with \(0.0\) and define the placeholders for which it is not necessary to define initial values.

library(tensorflow)

beta0 <- tf$Variable(0.0, name="b0")
beta1 <- tf$Variable(0.0, name="b1")
X <- tf$placeholder(dtype=tf$float32, name = "x-data")
Y <- tf$placeholder(dtype=tf$float32, name = "y-data")


The objective is to find the values of \(\beta_0\) and \(\beta_1\) that maximizes the log-likelihood function.

\[\ell \left(\beta_0, \beta_1\right) = \sum_{i=1}^n y_i(\beta_0 + \beta_1 x_i)-\log(1+\exp(\beta_0 -\beta_1 x_i))\]
In this case we will work with the negative log-likelihood function as the loss function and we will find the values of the parameters that minimize it.

loss <- tf$reduce_sum(-Y*(beta0+beta1*X)+tf$log(1+tf$exp(beta0+beta1*X)))

Now we need to define the optimization algorithm we want to use in order to find the values of the parameters. In this case we are going to use the gradient descent algorithm with a learning rate of \(0.001\).

optimizer <- tf$train$GradientDescentOptimizer(0.001)
train <- optimizer$minimize(loss)

# Launch the graph and initialize the variables.
sess = tf$Session()
sess$run(tf$global_variables_initializer())

#Create dictionary to feed the data to optimize the coefficients
fd <- dict(X = x_data, Y = y_data)

# Fit the line
n_iter <- 10000
for (step in 1:n_iter) {
  sess$run(train, feed_dict=fd)
  if (step %% 1000 == 0 | step <=5)
    cat(step, "\t", sess$run(beta0), sess$run(beta1), "\n")
}
## 1     0.018 0.09033136 
## 2     0.02890658 0.1566386 
## 3     0.03488583 0.2063628 
## 4     0.03742615 0.2446092 
## 5     0.03750854 0.2747135 
## 1000      -3.667967 1.740756 
## 2000      -4.743862 2.172863 
## 3000      -5.233989 2.373509 
## 4000      -5.489808 2.478984 
## 5000      -5.631336 2.537535 
## 6000      -5.711951 2.570945 
## 7000      -5.758598 2.590296 
## 8000      -5.785826 2.601598 
## 9000      -5.801799 2.60823 
## 10000     -5.811193 2.612132

To obtain the estimated parameters \(\beta_0\) and \(\beta_1\) we can use:

c(sess$run(beta0), sess$run(beta1))
## [1] -5.811193  2.612132


2 Comparing with glm

mod <- glm(y_data ~ x_data, family=binomial())
coef(mod)
## (Intercept)      x_data 
##   -5.824786    2.617778

The two estimated vectors using TF and glm are close.