C. Donovan
06 April 2018
If it's not in the lecture or lab, it's not in the exam
Everybody on the Left
I want to build some software to predict someone's age at death
Everybody on the Right
I want to build some software to decide whether I should give someone a loan
Questions
\[ \mathbf{y} = f(\mathbf{X}) + \text{noise} \] We want to usefully approximate \( f \).
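As a toy illustration of this setup in R (with a made-up \( f \) and Gaussian noise, both just assumptions for the sketch):

```r
# Simulate y = f(X) + noise for a made-up 'true' f
set.seed(42)
x <- runif(200, 0, 10)             # a single covariate, for simplicity
f <- function(x) sin(x) + 0.1 * x  # the 'true' f, unknown in practice
y <- f(x) + rnorm(200, sd = 0.3)   # add Gaussian noise
plot(x, y, pch = 16, col = "grey") # the data we actually observe
curve(f, add = TRUE, lwd = 2)      # the target we hope to approximate
```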
This is a very brief overview of NNs (although the multitude of minor details makes a detailed view difficult). For further information:
Some contentions/comments to start:
The problem is similar to previous ones:
All very familiar - let's begin.
\[ \ln\left(\frac{\hat{p}}{1-\hat{p}}\right) = \hat{\beta}_0 + \hat{\beta}_1z_1 + \hat{\beta}_2z_2 + \hat{\beta}_3z_3 \]
where
\[ \begin{align*} z_1 &= \tanh( \hat{\alpha}_4 + \hat{\alpha}_5x_1 + \hat{\alpha}_6x_2)\\ z_2 &= \tanh( \hat{\alpha}_7 + \hat{\alpha}_8x_1 + \hat{\alpha}_9x_2)\\ z_3 &= \tanh( \hat{\alpha}_{10} + \hat{\alpha}_{11}x_1 + \hat{\alpha}_{12}x_2) \end{align*} \]
The output will be a (fitted) probability-like thing \[ \hat{p} = \frac{1}{1+e^{-\theta}} \]
where \( \theta \) is a linear weighted sum of \( z_i \) terms, with fitted parameters (weights) \( \hat{\beta}_i \)
There is an additional fitted weight \( \hat{\beta}_0 \) that is an intercept or bias term
The \( z_i \) are formed by the \( \tanh \)-transformed linear combinations of the inputs, as shown above
Let's look at this graphically in R (example code is on Moodle)
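The Moodle code isn't reproduced here, but a minimal hand-rolled sketch of the forward pass, with arbitrary made-up weights, might look like this:

```r
# Sketch of the network above: 2 inputs, 3 tanh hidden nodes,
# logistic output. All weights are arbitrary illustrative values.
forward <- function(x1, x2, alpha, beta) {
  z1 <- tanh(alpha[1] + alpha[2] * x1 + alpha[3] * x2)
  z2 <- tanh(alpha[4] + alpha[5] * x1 + alpha[6] * x2)
  z3 <- tanh(alpha[7] + alpha[8] * x1 + alpha[9] * x2)
  theta <- beta[1] + beta[2] * z1 + beta[3] * z2 + beta[4] * z3
  1 / (1 + exp(-theta))             # the fitted probability p-hat
}

set.seed(1)
alpha <- runif(9, -1, 1)  # hidden-layer weights/biases (arbitrary)
beta  <- runif(4, -1, 1)  # output-layer weights/bias (arbitrary)
forward(0.5, -1.2, alpha, beta)
```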
Some terminology, for ease of understanding (particularly for non-mathematicians):
Weights and biases: from a statistical perspective, these weights are simply parameters of a potentially non-linear function, and the biases are the intercept terms for the linear components.
Combination functions: in our example equations above, these are the linear combinations (expressible in matrix form); they combine the input variables or the hidden nodes.
Activation functions: these are the functions wrapping the combination functions, and several variants are commonly used (a small R sketch follows this list):
Identity function - does not alter the value of its argument. The resulting range may be anywhere in \( \mathbb{R} \).
Sigmoid Functions - \( S \)-shaped functions with the logistic or hyperbolic tangent functions being common. The resulting values will be bounded - \( (0,1) \) or \( (-1, 1) \) respectively. The logistic is given by: \[ \phi(\theta)=\frac{1}{1+e^{-\theta}} \] for some argument value \( \theta \).
\( \tanh \) - hyperbolic tangent gives real values within \( (-1,1) \)
Others: Gaussian functions (bell-shaped); functions bounded below by zero but unbounded above, e.g. exponential and reciprocal functions.
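The common activation functions above are one-liners in R; a quick sketch, plotting each over an arbitrary range of \( \theta \) values:

```r
# Common activation functions over an arbitrary range of theta
theta <- seq(-5, 5, length.out = 200)
identity_fn <- function(t) t                  # range: all of R
logistic    <- function(t) 1 / (1 + exp(-t))  # range: (0, 1)
# tanh() is built into R                      # range: (-1, 1)

plot(theta, logistic(theta), type = "l", ylim = c(-1.2, 1.2),
     xlab = expression(theta), ylab = "activation")
lines(theta, tanh(theta), lty = 2)
lines(theta, identity_fn(theta), lty = 3)  # clipped by the y-limits
legend("topleft", c("logistic", "tanh", "identity"), lty = 1:3)
```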
Google has a great set of tools called TensorFlow, which you can also call from Python or R.
Let's look at some building blocks in R
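For instance, a whole layer can be one reusable building block: a combination function (matrix product plus bias) wrapped in an activation function. A sketch, assuming 2 inputs and 3 hidden nodes as in our example:

```r
# A layer as a building block: linear combination plus activation
layer <- function(X, W, b, activation = tanh) {
  activation(sweep(X %*% W, 2, b, "+"))  # add bias b to each row
}

set.seed(1)
X <- matrix(rnorm(10 * 2), ncol = 2)  # 10 observations, 2 inputs
W <- matrix(rnorm(2 * 3), 2, 3)       # input-to-hidden weights
b <- rnorm(3)                         # hidden biases

Z <- layer(X, W, b)                   # hidden values via tanh
p <- layer(Z, matrix(rnorm(3), 3, 1), rnorm(1),
           activation = function(t) 1 / (1 + exp(-t)))  # p-hat
```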
Start with arbitrary weights and biases. Define an error function. Search for update values that reduce the error. Iterate until convergence (hopefully).
This is numerical optimisation
We'll return to this!
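A bare-bones sketch of that iterative loop, using a finite-difference gradient on a made-up error function (real implementations use backpropagation for the gradients; the quadratic err() here is just a stand-in so the example runs):

```r
# err() stands in for the network's error as a function of its
# weight vector w; here a made-up quadratic with minimum at (2, -1)
err <- function(w) sum((w - c(2, -1))^2)

# Numerical gradient by central finite differences
num_grad <- function(f, w, h = 1e-6) {
  sapply(seq_along(w), function(i) {
    e <- replace(numeric(length(w)), i, h)
    (f(w + e) - f(w - e)) / (2 * h)
  })
}

w <- rnorm(2)               # 1. start with arbitrary weights
for (iter in 1:200) {
  g <- num_grad(err, w)     # 2./3. find a direction reducing error
  w <- w - 0.1 * g          # 4. update and iterate
}
w                           # should now be close to c(2, -1)
```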