Here we are going to simulate data from the following model.
\[\begin{align*} y_i &\sim P(\lambda_i), \\ \lambda_i &= \exp(2 + 0.5 x_i), \\ x_i &\sim U(0, 1). \end{align*}\]
The true parameter vector is \(\boldsymbol{\theta}=(\beta_0, \beta_1)^\top=(2, 0.5)^\top\).
To generate the data we can use the following code:
n <- 100
x_data <- runif(n=n, min=0, max=1)    # covariate values
lambda <- exp(2 + 0.5 * x_data)       # true mean of each observation
y_data <- rpois(n=n, lambda=lambda)   # simulated Poisson responses
plot(x=x_data, y=y_data, las=1)
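Since \(\lambda_i\) ranges from \(\exp(2) \approx 7.39\) to \(\exp(2.5) \approx 12.18\), a quick sanity check on the simulation is possible (a sketch; the exact values vary with the random draw):
# The sample mean of y_data should lie roughly between exp(2) and exp(2.5)
range(lambda)
mean(y_data)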
Now we load the tensorflow package, initialize the values of \(\beta_0\) and \(\beta_1\) at \(0.0\), define the placeholders (for which no initial values are needed), and define the model.
library(tensorflow)
beta0 <- tf$Variable(0.0, name="b0")
beta1 <- tf$Variable(0.0, name="b1")
X <- tf$placeholder(dtype=tf$float32, name = "x-data")
Y <- tf$placeholder(dtype=tf$float32, name = "y-data")
y_pred <- exp(beta0 + beta1 * X)
Note that the object Y is not the same as y_data: Y is a tensor, while y_data is an R vector containing the simulated response values.
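A quick way to see the difference (a sketch) is to inspect the class of each object:
# Y is a TensorFlow tensor, y_data is an ordinary R integer vector
class(Y)
class(y_data)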
The objective is to find the values of \(\beta_0\) and \(\beta_1\) that maximize the log-likelihood function.
\[\ell (\beta_0, \beta_1) = \sum_{i=1}^n \left[ y_i(\beta_0 + \beta_1 x_i) - \exp(\beta_0 + \beta_1 x_i) - \log(y_{i}!) \right]\]
Since the interest is in estimating the vector of regression parameters \(\boldsymbol{\beta}\), we can drop the term \(\log(y_{i}!)\), which does not depend on \(\boldsymbol{\beta}\). We will use the negative log-likelihood function as the loss function and find the values of the parameters that minimize it.
loss <- tf$reduce_sum(-Y * tf$log(y_pred) + y_pred)  # negative log-likelihood without the constant term
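As an independent cross-check (a minimal sketch using base R, not part of the TensorFlow workflow), the same negative log-likelihood can be minimized directly with optim; the function negll below is introduced only for this illustration:
# Sketch: minimize the same negative log-likelihood with base R's optim()
negll <- function(beta) {
  eta <- beta[1] + beta[2] * x_data
  sum(exp(eta) - y_data * eta)  # log(y_i!) omitted, it does not depend on beta
}
optim(par=c(0, 0), fn=negll)$par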
Now we need to define the optimization algorithm used to find the values of the parameters. In this case we use the gradient descent algorithm with a learning rate of \(0.001\).
optimizer <- tf$train$GradientDescentOptimizer(0.001)
train <- optimizer$minimize(loss)
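Gradient descent is not the only option here; TensorFlow 1.x also ships other optimizers, such as Adam, that could be swapped in (a sketch, not run; the learning rate of \(0.01\) is only illustrative):
# Alternative (not run): Adam often converges in fewer iterations
# optimizer <- tf$train$AdamOptimizer(0.01)
# train <- optimizer$minimize(loss)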
# Launch the graph and initialize the variables.
sess <- tf$Session()
sess$run(tf$global_variables_initializer())
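With the variables initialized at \(0.0\), the model tensor should evaluate to \(\exp(0) = 1\) for any input, which we can verify before training (a quick sketch; the input values are arbitrary):
# With beta0 = beta1 = 0 the predicted mean is exp(0) = 1 for any x
sess$run(y_pred, feed_dict=dict(X = c(0, 0.5, 1)))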
# Create a dictionary to feed the data when optimizing the coefficients
fd <- dict(X = x_data, Y = y_data)
# Fit the model
n_iter <- 10000
for (step in 1:n_iter) {
  sess$run(train, feed_dict=fd)
  if (step %% 1000 == 0 || step <= 5)
    cat(step, "\t", sess$run(beta0), sess$run(beta1), "\n")
}
## 1 0.841 0.4648679
## 2 1.486941 0.8190682
## 3 1.741346 0.9359385
## 4 1.734222 0.8903039
## 5 1.758765 0.8675538
## 1000 1.979909 0.503171
## 2000 1.979909 0.503171
## 3000 1.979909 0.503171
## 4000 1.979909 0.503171
## 5000 1.979909 0.503171
## 6000 1.979909 0.503171
## 7000 1.979909 0.503171
## 8000 1.979909 0.503171
## 9000 1.979909 0.503171
## 10000 1.979909 0.503171
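The estimates stabilize long before the last iteration. The value of the loss function at the optimum can be inspected in the same way (the exact value depends on the simulated data):
# Negative log-likelihood (up to the dropped constant) at the optimum
sess$run(loss, feed_dict=fd)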
To obtain the estimated parameters \(\hat{\beta}_0\) and \(\hat{\beta}_1\) we can use:
c(sess$run(beta0), sess$run(beta1))
## [1] 1.979909 0.503171
To validate the results, we can fit the same model with the glm function:
glmmod <- glm(y_data ~ x_data, family=poisson())
coef(glmmod)
## (Intercept)      x_data 
##   1.9799092   0.5031706
The parameter estimates obtained with TF and with glm are virtually identical.
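Finally, once we are done with the graph, we can release the resources held by the session:
# Close the session to free its resources
sess$close()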