1 Model

Here we are going to simulate data from the following model.

\[\begin{align*} y_i &\sim N(\mu_i, \sigma^2), \\ \mu_i &= -2 + 3 x_{1i} - 5 x_{2i} + 2 x_{3i}, \\ \sigma &= 3, \\ x_{ji} &\sim U(0, 1) \, \text{for} \, j=1, 2, 3. \end{align*}\]

The parameter vector is \(\boldsymbol{\theta}=(\beta_0, \beta_1, \beta_2, \beta_3, \sigma^2)^\top=(-2, 3, -5, 2, 9)^\top\).

To generate data according to the model we can use the following code.

nobs <- 1000                             # number of observations
x <- matrix(runif(3 * nobs), nrow=nobs)  # three U(0, 1) covariates
y <- matrix(-2 + 3 * x[, 1] - 5 * x[, 2] + 2 * x[, 3] + rnorm(nobs, sd=3))
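
A quick check of the simulated data: the deviations of y from the true mean should have sample variance close to \(\sigma^2 = 9\).

# Variance of the residuals around the true mean; should be close to 9
var(y - (-2 + 3 * x[, 1] - 5 * x[, 2] + 2 * x[, 3]))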

We know the values of n and p in this example, but they can also be obtained from the data as follows:

n <- dim(x)[1]
p <- dim(x)[2]

Estimating \(\beta_k\) and \(\sigma^2\) with tensorflow

The objects X and Y are placeholders for the input data (the covariates \(x\)'s) and the output data (the response variable \(y\)).

library(tensorflow)
X <- tf$placeholder("float", shape=shape(NULL, p), name="x-data")
Y <- tf$placeholder("float", name="y-data")

Note that the shape of X depends on p; the NULL in shape(NULL, p) leaves the number of rows (observations) unspecified.

Define the weights for each column of X. Since we have \(p=3\) predictors in this example, our matrix of coefficients W will have p elements. We use a \(3 \times 1\) matrix here, rather than a vector, to ensure TensorFlow understands how to perform the matrix multiplication of X and W. The object b corresponds to the intercept \(\beta_0\). Initially, the elements of W and b are zeros, but we could initialize them with random values instead (see the sketch after the next block). The object log_sigma2 is unrestricted (\(\in \mathbb{R}\)) and corresponds to \(\log(\sigma^2)\), so \(\sigma^2\) is recovered as exp(log_sigma2).

W <- tf$Variable(tf$zeros(list(p, 1L)))
b <- tf$Variable(tf$zeros(list(1L)))
log_sigma2 <- tf$Variable(tf$zeros(list(1L)), name="log_sigma2")
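
If we preferred a random start over zeros, we could draw the initial values from a standard normal distribution; this is an illustrative variation, not part of the original setup.

# Illustrative random initialization of the coefficients and intercept
W <- tf$Variable(tf$random_normal(list(p, 1L)))
b <- tf$Variable(tf$random_normal(list(1L)))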

Remember that in multiple linear regression we have \(\hat{\mu}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \ldots + \hat{\beta}_p x_{pi}\); for this reason we use tf$matmul to multiply X and W, and then tf$add to add the intercept b.

mu_hat <- tf$add(tf$matmul(X, W), b)

The objective is to find the values of the \(\beta\)'s and \(\sigma^2\) that maximize the log-likelihood function.

\[L(\beta_0, \beta_1, \beta_2, \beta_3, \sigma^{2})= \frac{1}{(2\pi)^{n/2}\sigma^n}\exp \left( -\frac{1}{2\sigma^{2}} \sum_{i=1}^{n} \left( y_i - \beta_0 -\beta_1 x_{1i} -\beta_2 x_{2i} -\beta_3 x_{3i} \right)^{2} \right)\]

In this case we will work with the negative log-likelihood as the loss function and find the parameter values that minimize it.
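
Taking minus the logarithm of the likelihood above and dropping the additive constant \(\frac{n}{2}\log(2\pi)\), which does not depend on the parameters, the quantity to minimize is

\[-\ell(\boldsymbol{\theta})= \frac{n}{2}\log \sigma^{2} + \frac{1}{2\sigma^{2}} \sum_{i=1}^{n} \left( y_i - \mu_i \right)^{2},\]

which is exactly what the next line computes, with \(\log \sigma^2\) stored in log_sigma2.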

# Negative log-likelihood (up to the additive constant), built from the graph nodes
loss <- (n/2) * log_sigma2 + (1/(2 * exp(log_sigma2))) * tf$reduce_sum((mu_hat - Y)^2)
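
As a sanity check of the graph above, the same loss can be written in base R (the function neg_loglik below is ours, introduced only for illustration); at the zero initialization it reduces to sum(y^2)/2.

# Base-R version of the loss, useful for checking the TensorFlow graph
neg_loglik <- function(b, W, log_sigma2, x, y) {
  mu_hat <- b + x %*% W    # same linear predictor as tf$add(tf$matmul(X, W), b)
  (nrow(x)/2) * log_sigma2 + sum((mu_hat - y)^2) / (2 * exp(log_sigma2))
}
neg_loglik(0, rep(0, p), 0, x, y)    # equals sum(y^2)/2 at the zero start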

Next, we determine the algorithm and the learning rate we want to use for the optimization process.

generator <- tf$train$GradientDescentOptimizer(learning_rate=0.001)
optimizer <- generator$minimize(loss)
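
The learning rate is a tuning choice: a value that is too large can make the loss diverge, while one that is too small slows convergence down. Other optimizers from the same tf$train module could be swapped in; for instance (an illustrative alternative, with a learning rate that would need its own tuning):

# Illustrative alternative: the Adam optimizer instead of plain gradient descent
generator_adam <- tf$train$AdamOptimizer(learning_rate=0.05)
optimizer_adam <- generator_adam$minimize(loss)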

# Initialize the variables and open the session
init <- tf$global_variables_initializer()
session <- tf$Session()
session$run(init)
feed_dict <- dict(X=x, Y=y)         # data fed into the placeholders
epsilon <- .Machine$double.eps      # convergence tolerance for the loss
last_loss <- Inf
step <- 0                           # iteration counter

while (TRUE) {
  session$run(optimizer, feed_dict=feed_dict)       # one gradient descent step
  current_loss <- session$run(loss, feed_dict=feed_dict)
  if (last_loss - current_loss < epsilon) break     # stop when the loss no longer decreases
  last_loss <- current_loss
  step <- step + 1
  if (step %% 100 == 0 || step <= 5)
    cat(step, "\t", session$run(b), session$run(W), session$run(log_sigma2), "\n")
}
## 1     -2.08581 -0.7751619 -1.440458 -0.8820306 7.254592 
## 2     -2.084729 -0.7744112 -1.44013 -0.8813359 6.759276 
## 3     -2.082958 -0.7731805 -1.439593 -0.880197 6.266959 
## 4     -2.080065 -0.7711701 -1.438719 -0.8783368 5.779519 
## 5     -2.075372 -0.7679053 -1.437303 -0.8753168 5.299941 
## 100   -1.477789 1.381267 -3.25326 0.7002281 2.159447 
## 200   -1.685129 2.356488 -4.475603 1.352511 2.095276 
## 300   -1.799888 2.754481 -4.92242 1.624814 2.085341 
## 400   -1.862068 2.913887 -5.073452 1.737222 2.0839 
## 500   -1.895947 2.97936 -5.121142 1.785133 2.083679

To obtain the estimated parameters we use:

c(session$run(b), session$run(W), exp(session$run(log_sigma2)))
## [1] -1.905689  2.994788 -5.129359  1.796793  8.033768

2 Comparing with lm

Now we are going to estimate the parameters with the lm function, to use them as a reference.

mod <- lm(y ~ x)
c(coef(mod), sigma2=summary(mod)$sigma^2)
## (Intercept)          x1          x2          x3      sigma2 
##   -1.937535    3.031232   -5.133608    1.826213    8.065842
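
To see the two fits side by side we can bind the estimates into a single table (the object names theta_tf and theta_lm are ours, introduced for convenience):

theta_tf <- c(session$run(b), session$run(W), exp(session$run(log_sigma2)))
theta_lm <- c(coef(mod), summary(mod)$sigma^2)
round(cbind(tf=theta_tf, lm=theta_lm), digits=4)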

The two estimated vectors, obtained with TF and lm, are close.
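
Since the session was opened manually, it is good practice to close it once we are done:

session$close()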