Here we are going to simulate data from the following model.
\[\begin{align*} y_i &\sim N(\mu_i, \sigma^2), \\ \mu_i &= -2 + 3 x_{1i} - 5 x_{2i} + 2 x_{3i}, \\ \sigma &= 3, \\ x_{ji} &\sim U(0, 1) \, \text{for} \, j=1, 2, 3. \end{align*}\]
The parameter vector is \(\boldsymbol{\theta}=(-2, 3, -5, 2, \sigma^2=9)^\top\).
To generate data according to the model we can use the following code.
nobs <- 1000                                    # number of observations
x <- matrix(runif(3 * nobs), nrow=nobs)         # three covariates drawn from U(0, 1)
y <- matrix(-2 + 3 * x[, 1] - 5 * x[, 2] + 2 * x[, 3] + rnorm(nobs, sd=3))  # response with sigma = 3
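If one wants the simulation to be reproducible, a seed can be set before generating the data; a minimal sketch (the seed value 1234 is arbitrary):
set.seed(1234)  # arbitrary seed, used only to make the simulation reproducible
nobs <- 1000
x <- matrix(runif(3 * nobs), nrow=nobs)
y <- matrix(-2 + 3 * x[, 1] - 5 * x[, 2] + 2 * x[, 3] + rnorm(nobs, sd=3))
summary(as.vector(y))  # the mean of y should be close to -2 + (3 - 5 + 2) * 0.5 = -2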
We know the values of n and p in this example, but that information can also be obtained automatically as follows:
n <- dim(x)[1]
p <- dim(x)[2]
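Equivalently, the base R functions nrow() and ncol() give the same values:
n <- nrow(x)
p <- ncol(x)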
Estimating \(\beta_k\) and \(\sigma^2\) with tensorflow
The objects X and Y are placeholders for the input data (the covariates \(x\)) and the output data (the response variable \(y\)).
library(tensorflow)
X <- tf$placeholder("float", shape=shape(NULL, p), name="x-data")
Y <- tf$placeholder("float", name="y-data")
Note that the shape of X depends on p.
Define the weights for each column in X. Since we have \(p=3\) predictors in this example, our W matrix of coefficients will have \(p\) elements. We use a \(3 \times 1\) matrix here, rather than a vector, so that TensorFlow knows how to perform the matrix multiplication of X and W. The object b corresponds to the intercept \(\beta_0\). Initially, the elements of W and b are zeros, but we could also initialize them with random values. The object log_sigma2 is unrestricted (\(\in \Re\)) and will be used to obtain \(\sigma^2\) via the \(\exp()\) function.
W <- tf$Variable(tf$zeros(list(p, 1L)))
b <- tf$Variable(tf$zeros(list(1L)))
log_sigma2 <- tf$Variable(tf$zeros(list(1L)), name="log_sigma2")
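As mentioned above, W and b could instead start at random values; a minimal sketch, assuming the TF 1.x API used throughout this post (the standard deviation 0.1 is an arbitrary choice):
W <- tf$Variable(tf$random_normal(list(p, 1L), stddev=0.1))  # random start instead of zeros
b <- tf$Variable(tf$random_normal(list(1L), stddev=0.1))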
Remember that in multiple linear regression we have \(\hat{\mu}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \ldots + \hat{\beta}_p x_{pi}\); for this reason we use tf$matmul to multiply X and W, and then tf$add to add the intercept b.
mu_hat <- tf$add(tf$matmul(X, W), b)
The objective is to find the values of the \(\beta\)'s and \(\sigma^2\) that maximize the likelihood function.
\[L(\beta_0, \beta_1, \beta_2, \beta_3, \sigma^{2})= \frac{1}{(2\pi)^{n/2}\sigma^n}\exp \left( -\frac{1}{2\sigma^{2}} \sum_{i=1}^{n} \left( y_i - \beta_0 -\beta_1 x_{1i} -\beta_2 x_{2i} -\beta_3 x_{3i} \right)^{2} \right)\]
In this case we will work with the negative log-likelihood function as the loss function and we will find the values of the parameters that minimize it.
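Taking logarithms and changing the sign (a short intermediate step, written out here for clarity), the negative log-likelihood is
\[-\log L(\boldsymbol{\theta}) = \frac{n}{2}\log(2\pi) + \frac{n}{2}\log\sigma^2 + \frac{1}{2\sigma^{2}} \sum_{i=1}^{n} \left( y_i - \mu_i \right)^{2}.\]
The first term does not depend on the parameters, so it can be dropped; the loss below therefore keeps only the last two terms, with \(\sigma^2\) written as exp(log_sigma2).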
loss <- (n/2) * log_sigma2 + (1/(2 * exp(log_sigma2))) * tf$reduce_sum((mu_hat - Y)^2)  # negative log-likelihood without the constant term
Next, we determine the algorithm and the learning rate we want to use for the optimization process.
generator <- tf$train$GradientDescentOptimizer(learning_rate=0.001)
optimizer <- generator$minimize(loss)
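Other optimizers in tf$train can be plugged in the same way; for instance, a sketch using Adam (the learning rate 0.01 is an arbitrary choice, not tuned here):
generator <- tf$train$AdamOptimizer(learning_rate=0.01)
optimizer <- generator$minimize(loss)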
# Initialize the variables and open the session
init <- tf$global_variables_initializer()
session <- tf$Session()
session$run(init)
feed_dict <- dict(X=x, Y=y)      # data to feed into the placeholders
epsilon <- .Machine$double.eps   # tolerance for the change in the loss
last_loss <- Inf
step <- 0                        # iteration counter
while (TRUE) {
session$run(optimizer, feed_dict=feed_dict)
current_loss <- session$run(loss, feed_dict=feed_dict)
if (last_loss - current_loss < epsilon) break
last_loss <- current_loss
step <- step + 1
if (step %% 100 == 0 | step <=5)
cat(step, "\t", session$run(b), session$run(W), session$run(log_sigma2), "\n")
}
## 1 -2.08581 -0.7751619 -1.440458 -0.8820306 7.254592
## 2 -2.084729 -0.7744112 -1.44013 -0.8813359 6.759276
## 3 -2.082958 -0.7731805 -1.439593 -0.880197 6.266959
## 4 -2.080065 -0.7711701 -1.438719 -0.8783368 5.779519
## 5 -2.075372 -0.7679053 -1.437303 -0.8753168 5.299941
## 100 -1.477789 1.381267 -3.25326 0.7002281 2.159447
## 200 -1.685129 2.356488 -4.475603 1.352511 2.095276
## 300 -1.799888 2.754481 -4.92242 1.624814 2.085341
## 400 -1.862068 2.913887 -5.073452 1.737222 2.0839
## 500 -1.895947 2.97936 -5.121142 1.785133 2.083679
To obtain the estimated parameters we use:
c(session$run(b), session$run(W), exp(session$run(log_sigma2)))
## [1] -1.905689 2.994788 -5.129359 1.796793 8.033768
lm
Now we are going to estimate the parameters with the lm function to use them as a reference.
mod <- lm(y ~ x)
c(coef(mod), sigma2=summary(mod)$sigma^2)
## (Intercept) x1 x2 x3 sigma2
## -1.937535 3.031232 -5.133608 1.826213 8.065842
The two estimated parameter vectors, obtained with TensorFlow and lm, are close.
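To see this at a glance, a minimal sketch that places both sets of estimates side by side (it assumes the TensorFlow session opened above is still running):
tf_est <- c(session$run(b), session$run(W), exp(session$run(log_sigma2)))
lm_est <- c(coef(mod), sigma2=summary(mod)$sigma^2)
round(cbind(tensorflow=tf_est, lm=lm_est), 3)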