1 Model

Here we are going to simulate data from the following model.

\[\begin{align*} y_i &\sim N(\mu_i, \sigma^2), \\ \mu_i &= -2 + 3 x_{1i}, \\ \sigma &= 2.5, \\ x_{1i} &\sim U(-2, 2). \end{align*}\]

The parameter vector is \(\boldsymbol{\theta}=(\beta_0=-2, \beta_1=3, \sigma^2=2.5^2=6.25)^\top\).

To generate the data we can use the following code:

n <- 100                                # sample size
x_data <- runif(n=n, min=-2, max=2)     # covariate simulated from U(-2, 2)
mu <- -2 + 3 * x_data                   # linear predictor
y_data <- rnorm(n=n, mean=mu, sd=2.5)   # response with sigma = 2.5
plot(x=x_data, y=y_data, las=1)         # scatterplot of the simulated data
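Because the data are simulated, each run of the code above produces a different dataset (and therefore slightly different estimates later on). If a reproducible dataset is wanted, a seed can be fixed before the simulation; the seed value below is arbitrary.

set.seed(123)  # arbitrary seed; run before the runif() and rnorm() calls above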

2 Estimating \(\beta_0\) and \(\beta_1\) with tensorflow

The first step is to load the tensorflow package.

library(tensorflow)

Now we are going to initialize the values for \(\beta_0\) and \(\beta_1\) using random values from \(U(-10, 10)\) and define the model.

beta0 <- tf$Variable(tf$random_uniform(shape(1L), -10, 10), name="b0")  # intercept
beta1 <- tf$Variable(tf$random_uniform(shape(1L), -10, 10), name="b1")  # slope
y <- beta0 + beta1 * x_data                                             # fitted values (a tensor)

The last object y is not the same as y_data: y is a tensor that represents the fitted values \(\hat{y}_i = \beta_0 + \beta_1 x_i\), while y_data is the numeric vector with the observed responses.
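A quick way to see the difference is to inspect both objects:

class(y)       # a TensorFlow tensor
class(y_data)  # an ordinary numeric vector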

In the simple linear regression model the objective is to obtain the values of \(\beta_0\) and \(\beta_1\) that minimize the sum of squared errors (SSE):

\[SSE=\sum_{i=1}^{n} \left( y_i-(\hat{\beta}_0 + \hat{\beta}_1 x_i) \right)^2\]
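As an illustration, this quantity can be computed directly in base R for any candidate pair of coefficients (a minimal sketch using the simulated x_data and y_data; the function name sse is only for this example):

# sum of squared errors for candidate values b0 and b1
sse <- function(b0, b1) sum((y_data - (b0 + b1 * x_data))^2)
sse(b0=-2, b1=3)  # SSE evaluated at the true parameter values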

As \(SSE\) is the function we want to minimize, we define it as the loss function as follows:

loss <- tf$reduce_sum((y - y_data) ^ 2)

Note that y and y_data are different: the former contains the estimated values \(\hat{y}\) and the latter the observed data \(y\).

Now we need to define the optimization algorithm we want to use in order to find the values of the parameters. In this case we are going to use the gradient descent algorithm with a learning rate of \(0.001\).

optimizer <- tf$train$GradientDescentOptimizer(learning_rate=0.001)
train <- optimizer$minimize(loss)

# Initialize the variables and open the session
init <- tf$global_variables_initializer()
sess <- tf$Session()
sess$run(init)

# Iterative process to obtain the parameters
n_iter <- 10000
for (step in 1:n_iter) {
  sess$run(train)
  if (step %% 1000 == 0 | step <=5)
    cat(step, "\t", sess$run(beta0), sess$run(beta1), "\n")
}
## 1     -6.987316 1.261725 
## 2     -5.915263 1.909564 
## 3     -5.084353 2.350492 
## 4     -4.437819 2.646415 
## 5     -3.932804 2.841353 
## 1000      -1.989917 3.020473 
## 2000      -1.989917 3.020473 
## 3000      -1.989917 3.020473 
## 4000      -1.989917 3.020473 
## 5000      -1.989917 3.020473 
## 6000      -1.989917 3.020473 
## 7000      -1.989917 3.020473 
## 8000      -1.989917 3.020473 
## 9000      -1.989917 3.020473 
## 10000     -1.989917 3.020473

To obtain the estimated parameters \(\beta_0\) and \(\beta_1\) we can use:

c(sess$run(beta0), sess$run(beta1))
## [1] -1.989917  3.020473
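As a sanity check on what the optimizer is doing at each iteration, the same gradient descent updates can be coded by hand in base R. This is only a sketch of the algorithm (using the gradients of the SSE loss, the same learning rate and the same number of iterations), not part of the TensorFlow workflow.

# hand-coded gradient descent for the SSE loss
lr <- 0.001                         # learning rate
b0 <- runif(1, min=-10, max=10)     # random starting values, as before
b1 <- runif(1, min=-10, max=10)
for (step in 1:10000) {
  resid <- y_data - (b0 + b1 * x_data)
  grad_b0 <- -2 * sum(resid)             # dSSE/dbeta0
  grad_b1 <- -2 * sum(x_data * resid)    # dSSE/dbeta1
  b0 <- b0 - lr * grad_b0
  b1 <- b1 - lr * grad_b1
}
c(b0, b1)  # should be close to the TensorFlow estimates above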

3 Comparing with lm

Now we are going to estimate the parameters with the lm function to use them as a reference.

mod <- lm(y_data ~ x_data)
coef(mod)
## (Intercept)      x_data 
##   -1.989916    3.020473

The two estimated vectors obtained with TF and lm are close.
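This agreement is expected, since for the SSE loss the minimizer has a closed form. As an additional check, the ordinary least squares solution can be computed directly from the normal equations in base R:

X <- cbind(1, x_data)                 # design matrix with an intercept column
solve(t(X) %*% X, t(X) %*% y_data)    # closed-form OLS estimates of beta0 and beta1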

4 Estimating \(\beta_0\), \(\beta_1\) and \(\sigma^2\) with tensorflow

First we need to initialize the values for \(\beta_0\), \(\beta_1\) and \(\sigma^{2}\), using random values from \(U(-10, 10)\) for the first two parameters and random values from \(U(0, 10)\) for \(\sigma^{2}\).

beta0 <- tf$Variable(tf$random_uniform(shape(1L), -10, 10), name="b0")
beta1 <- tf$Variable(tf$random_uniform(shape(1L), -10, 10), name="b1")
sigma2 <- tf$Variable(tf$random_uniform(shape(1L), 0, 10), name="sigma2")
y_pred <- beta0 + beta1 * x_data

The objective is to find the values of \(\beta_0\), \(\beta_1\) and \(\sigma^2\) that maximize the likelihood function:

\[L(\beta_0, \beta_1, \sigma^{2})= \frac{1}{(2\pi)^{n/2}\sigma^n}\exp \left( -\frac{1}{2\sigma^{2}} \sum_{i=1}^{n} \left( y_i - \beta_0 -\beta_1 x_i\right)^{2} \right)\]
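Taking logarithms and changing the sign, the negative log-likelihood is

\[-\ell(\beta_0, \beta_1, \sigma^{2}) = \frac{n}{2}\log(2\pi) + \frac{n}{2}\log(\sigma^{2}) + \frac{1}{2\sigma^{2}} \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^{2},\]

and the term \(\frac{n}{2}\log(2\pi)\) does not depend on the parameters, so it can be dropped from the loss.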

In this case we will work with the negative log-likelihood (without the constant term) as the loss function and find the values of the parameters that minimize it.

loss <- (n/2)*log(sigma2) + (1/(2*sigma2))*tf$reduce_sum((y_data - y_pred)^2)  # negative log-likelihood without the constant
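Before continuing with TensorFlow, the same negative log-likelihood can be minimized with base R's optim as an independent cross-check. The sketch below uses arbitrary starting values and a lower bound on sigma2 to keep it positive.

# negative log-likelihood (without the constant) as a plain R function
negloglik <- function(par) {
  b0 <- par[1]; b1 <- par[2]; sigma2 <- par[3]
  (n/2)*log(sigma2) + (1/(2*sigma2)) * sum((y_data - (b0 + b1*x_data))^2)
}
optim(par=c(0, 0, 1), fn=negloglik, method="L-BFGS-B",
      lower=c(-Inf, -Inf, 1e-6))$par   # estimates of beta0, beta1 and sigma2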

Next, we determine the algorithm and the learning rate we want to use for the optimization process.

optimizer <- tf$train$GradientDescentOptimizer(learning_rate=0.001)
train <- optimizer$minimize(loss)

# Initialize the variables and open the session
init <- tf$global_variables_initializer()
sess <- tf$Session()
sess$run(init)

# Iterative process to obtain the parameters
n_iter <- 10000
for (step in 1:n_iter) {
  sess$run(train)
  if (step %% 1000 == 0 | step <=5)
    cat(step, "\t", sess$run(beta0), sess$run(beta1), sess$run(sigma2), "\n")
}
## 1     3.928763 0.504822 2.490246 
## 2     3.711931 0.5826174 2.814499 
## 3     3.527214 0.6495693 3.049913 
## 4     3.362358 0.7098467 3.237629 
## 5     3.211768 0.7653424 3.39448 
## 1000      -1.989912 3.020467 5.771633 
## 2000      -1.989912 3.020467 5.850492 
## 3000      -1.989912 3.020467 5.868387 
## 4000      -1.989912 3.020467 5.872546 
## 5000      -1.989912 3.020467 5.873504 
## 6000      -1.989912 3.020467 5.873661 
## 7000      -1.989912 3.020467 5.873661 
## 8000      -1.989912 3.020467 5.873661 
## 9000      -1.989912 3.020467 5.873661 
## 10000     -1.989912 3.020467 5.873661

To obtain the estimated parameters \(\beta_0\), \(\beta_1\) and \(\sigma^2\) we use:

c(sess$run(beta0), sess$run(beta1), sess$run(sigma2))
## [1] -1.989912  3.020467  5.873661
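If the estimate of \(\sigma\) rather than \(\sigma^2\) is of interest, it is just the square root of the last value:

sqrt(sess$run(sigma2))  # estimated sigma, close to the true value 2.5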

5 Comparing with lm

Now we again use the lm fit from before as a reference, this time including the estimated variance.

c(coef(mod), sigma2=summary(mod)$sigma^2)
## (Intercept)      x_data      sigma2 
##   -1.989916    3.020473    5.993700

The two estimated vectors obtained with TF and lm are close. The small difference in the variance estimate is expected: maximum likelihood divides the residual sum of squares by \(n\), while summary(mod)$sigma^2 divides by \(n-2\).
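This can be verified directly from the lm fit:

sum(residuals(mod)^2) / n   # ML-type variance estimate, essentially the TensorFlow value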