Sára Mód & Jasper Ginn
26/09/2018
Appendices
Presentation: http://rpubs.com/jhginn/mvsuu
Execute the following in a terminal:
docker pull jhginn/multivariate_statistics_uu
Then:
docker run -e PASSWORD=stats -p 8787:8787 jhginn/multivariate_statistics_uu
Go to http://localhost:8787
Visit: http://maraudingmeerkat.nl/practical/yourfirstname/
\[ \text{Total error} = \text{Bias}^2 + \text{Variance} + \mathrm{Var}(\epsilon) \]
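As a rough illustration of this decomposition, the following simulation sketch (the true function sin(2*pi*x), the sample size of 50, the noise sd of 0.3, the test point, and the degree-9 polynomial are all illustrative assumptions) refits a rigid and a flexible model on many noisy samples and estimates squared bias and variance at a single test point:
## Illustrative simulation of the bias-variance decomposition (assumed setup)
set.seed(123)
f = function(x) sin(2 * pi * x)       # assumed true function
x0 = data.frame(x = 0.25)             # test point at which we measure error
n_sims = 500
preds = matrix(NA, nrow = n_sims, ncol = 2)
for (s in 1:n_sims) {
  x = runif(50)
  y = f(x) + rnorm(50, sd = 0.3)
  fit_rigid    = lm(y ~ x)            # rigid model: higher bias, lower variance
  fit_flexible = lm(y ~ poly(x, 9))   # flexible model: lower bias, higher variance
  preds[s, 1] = predict(fit_rigid, x0)
  preds[s, 2] = predict(fit_flexible, x0)
}
bias_sq  = (colMeans(preds) - f(0.25))^2
variance = apply(preds, 2, var)
round(rbind(bias_sq, variance), 4)    # rigid: large bias^2; flexible: larger variance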
The log-likelihood function is what we optimize over successive iterations of the algorithm:
\[ \mathcal{L}(\hat{y}, y) = y \log(\hat{y}) + (1-y) \log(1-\hat{y}) \]
We minimize the negative log-likelihood (equivalently, maximize the log-likelihood) through successive iterations, much like we would with Newton-Raphson.
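To make this concrete, a quick sketch of the per-observation log-likelihood in R (the probability values are arbitrary examples):
## Per-observation Bernoulli log-likelihood (arbitrary example values)
loglik = function(yhat, y) y * log(yhat) + (1 - y) * log(1 - yhat)
loglik(0.9, 1)  # approx -0.105: confident and correct, near the maximum of 0
loglik(0.9, 0)  # approx -2.303: confident but wrong, strongly penalized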
\[ y = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix}, \: y_i \in \{0, 1\} \\ X = \begin{bmatrix} x_1 & x_2 & \dots & x_n \end{bmatrix}, \: \dim(X) = (m, n) \]
\[ w = \begin{bmatrix} w_1 \\ \vdots \\ w_n \end{bmatrix}, \: b \in \mathbb{R}, \: \hat{y} = \sigma(w^{T} X^{T} + b) \\ \mathcal{L}(\hat{y}, y) = y \log(\hat{y}) + (1-y) \log(1-\hat{y}) \\ \mathcal{J}(w, b) = -\frac{1}{m} \sum_{i=1}^m \mathcal{L}(\hat{y}_i, y_i) + \frac{\lambda}{2m} \|w\|^2_2 \\ \|w\|^2_2 = w^{T} \cdot w, \: \lambda \in \mathbb{R} \]
## Sigmoid (logistic) function
sigmoid = function(z) 1 / (1 + exp(-z))
## Number of training examples
m = nrow(X)
## Set parameters w and b to 0
w = matrix(0, ncol = ncol(X))
b = 0
## Update parameters
for (i in 1:max_iterations) {
  ## Linear combination & sigmoid function
  yhat = sigmoid(w %*% t(X) + b)
  ## Compute regularized cost
  cost = -(1/m) * sum(y * log(yhat) + (1-y) * log(1-yhat)) + (lambda / (2*m)) * sum(w^2)
  ## Compute gradients (dw is an n x 1 column vector)
  dw = (1/m) * (t(X) %*% matrix(yhat - y, ncol = 1)) + t((lambda / m) * w)
  db = (1/m) * sum(yhat - y)
  ## Update parameters
  w = w - t(learning_rate * dw)
  b = b - learning_rate * db
}
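A self-contained usage sketch of the loop above (the wrapper name logistic_gd, the toy data, and the hyperparameter values are assumptions for illustration); with lambda = 0 the estimates should be roughly comparable to R's built-in glm():
## Minimal sketch: gradient-descent logistic regression vs. glm (assumed toy data)
sigmoid = function(z) 1 / (1 + exp(-z))
logistic_gd = function(X, y, lambda = 0, learning_rate = 0.1, max_iterations = 5000) {
  m = nrow(X)
  w = matrix(0, ncol = ncol(X))
  b = 0
  for (i in 1:max_iterations) {
    yhat = sigmoid(w %*% t(X) + b)
    dw = (1/m) * (t(X) %*% matrix(yhat - y, ncol = 1)) + t((lambda / m) * w)
    db = (1/m) * sum(yhat - y)
    w = w - t(learning_rate * dw)
    b = b - learning_rate * db
  }
  list(w = w, b = b)
}
set.seed(1)
X = matrix(rnorm(200), ncol = 2)
y = as.numeric(X[, 1] - X[, 2] + rnorm(100) > 0)
fit = logistic_gd(X, y)
fit$w; fit$b                           # weights and intercept from gradient descent
coef(glm(y ~ X, family = binomial))    # should be roughly comparable when lambda = 0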
The mnlr R package is used for multinomial logistic regression.
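The mnlr API itself is not shown in this appendix, so as a hedged illustration, the same kind of model can be fit with the well-known nnet package instead (a swapped-in alternative, not the authors' method):
## Multinomial logistic regression via nnet::multinom (illustrative alternative)
library(nnet)
fit = multinom(Species ~ Sepal.Length + Petal.Length, data = iris)
summary(fit)                              # one coefficient set per non-reference class
predict(fit, head(iris), type = "probs")  # class probabilities for a few rows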