C. Donovan
09 April 2018
NB: If it's not in the lecture or lab, it's not in the exam
We need to have a common language to discuss these
\[ \hat{y} = \hat{\beta}_0 + \hat{\alpha}_1z_{2,1} + \hat{\alpha}_2z_{2,2} + \hat{\alpha}_3z_{2,3} \]
where
\[ \begin{align*} z_{2,1} &= \tanh( \hat{\alpha}_4 + \hat{\alpha}_5z_{1,1} + \hat{\alpha}_6z_{1,2})\\ z_{2,2} &= \tanh( \hat{\alpha}_7 + \hat{\alpha}_8z_{1,1} + \hat{\alpha}_9z_{1,2})\\ z_{2,3} &= \tanh( \hat{\alpha}_{10} + \hat{\alpha}_{11}z_{1,1} + \hat{\alpha}_{12}z_{1,2}) \end{align*} \]
and
\[ \begin{align*} z_{1,1} &= \tanh( \hat{\alpha}_{13} + \hat{\alpha}_{14}x_1 + \hat{\alpha}_{15}x_2)\\ z_{1,2} &= \tanh( \hat{\alpha}_{16} + \hat{\alpha}_{17}x_1 + \hat{\alpha}_{18}x_2) \end{align*} \]
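To make the nesting of these equations concrete, here is a minimal Python sketch of the forward calculation. The function name `forward`, the dictionary layout for the weights and the weight values themselves are assumptions made purely for illustration; only the structure (two inputs, two tanh units, three tanh units, one linear output) comes from the equations.

```python
import math

def forward(x1, x2, alpha, beta0):
    """Evaluate y-hat for the network written out in the equations.

    alpha maps the index k (1..18) to the weight alpha_k and beta0 is the
    output intercept. This container layout is an illustrative assumption.
    """
    # First hidden layer: two tanh units fed by the inputs x1 and x2.
    z11 = math.tanh(alpha[13] + alpha[14] * x1 + alpha[15] * x2)
    z12 = math.tanh(alpha[16] + alpha[17] * x1 + alpha[18] * x2)
    # Second hidden layer: three tanh units fed by z11 and z12.
    z21 = math.tanh(alpha[4] + alpha[5] * z11 + alpha[6] * z12)
    z22 = math.tanh(alpha[7] + alpha[8] * z11 + alpha[9] * z12)
    z23 = math.tanh(alpha[10] + alpha[11] * z11 + alpha[12] * z12)
    # Output: a linear combination of the second hidden layer.
    return beta0 + alpha[1] * z21 + alpha[2] * z22 + alpha[3] * z23

# Hypothetical weights, chosen only so that the example runs.
alpha = {k: 0.1 * k for k in range(1, 19)}
y_hat = forward(0.5, -1.0, alpha, beta0=0.2)
```

Reading the code bottom-up mirrors reading the equations top-down: the prediction depends on the second layer, which depends on the first layer, which depends on the data.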
Google has a great set of tools called TensorFlow, which you can also call from Python or R.
Let's look again at some building blocks in R
Assume we have an architecture. We have data and lots of parameters - what values should the parameters take?
Start with arbitrary weights and biases. Define an error function. Search for update values that reduce the error. Iterate until convergence (hopefully).
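The four steps can be sketched with about the crudest search possible: propose a random change and keep it only if the error falls. This is a hedged illustration, not a method recommended in these notes; `random_search`, `toy_error` and the target values (3, -1) are all invented for the example, and a fixed number of iterations stands in for a proper convergence check.

```python
import random

def random_search(error, start, n_iter=5000, step=0.1, seed=1):
    """Keep proposing small random changes; accept only improvements."""
    rng = random.Random(seed)
    params = list(start)                  # arbitrary starting values
    best_error = error(params)            # error function supplied by the caller
    for _ in range(n_iter):               # fixed budget instead of a convergence test
        candidate = [p + rng.gauss(0.0, step) for p in params]
        candidate_error = error(candidate)
        if candidate_error < best_error:  # update only if the error is reduced
            params, best_error = candidate, candidate_error
    return params, best_error

def toy_error(p):
    """Hypothetical error surface with its minimum at (3, -1)."""
    return (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2

best, final_error = random_search(toy_error, start=[0.0, 0.0])
```

Nothing here uses the shape of the error surface, which is why such a search becomes slow as the number of parameters grows.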
This is numerical optimisation
We have seen such things before (although we may not have known it)
Broadly, fitting is similar to other models we have seen:
We need sensible ways to do this search through the parameter space
Many models are well-behaved
Nasty ones (like NNs)
Consider a simple set of numeric \( x \) and \( y \) data. We want \( y=f(x)=\beta_0 + \beta_1 x \), but how do we determine \( \beta_0 \) and \( \beta_1 \)?
[drawing ensues]
We can imagine ways to do this without much effort:
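One low-effort possibility, sketched here under the assumption that squared error is the criterion, is simply to try every pair of values on a grid and keep the best. The data, the grid limits and the spacing of 0.25 are made up for illustration.

```python
# Made-up data lying roughly along y = 1 + 2x.
x = [0, 1, 2, 3, 4]
y = [1.1, 2.9, 5.2, 6.8, 9.1]

def sse(b0, b1):
    """Sum of squared errors for the line y = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Candidate values -5.0, -4.75, ..., 5.0 for both parameters.
grid = [i * 0.25 for i in range(-20, 21)]

# Evaluate every (b0, b1) pair on the grid and keep the pair with the smallest error.
best_b0, best_b1 = min(
    ((b0, b1) for b0 in grid for b1 in grid),
    key=lambda b: sse(*b),
)
```

With two parameters this costs 41 x 41 evaluations; the same grid over the 19 parameters of a small network would be hopeless, which motivates a smarter search.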
There are a lot of optimisation approaches for NNs
We look at one (the classic) - the back-propagation (BP) algorithm
Simple in principle:
This is a gradient search, iterating over multiple dimensions (dictated by the number of parameters/weights).
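As a hedged illustration of a gradient search in just two dimensions, the sketch below fits a straight line by repeatedly stepping downhill on the sum of squared errors. The data, the step size `rate` and the iteration count are assumptions chosen so that the example converges; they are not values from the course.

```python
def gradient_descent(x, y, rate=0.01, n_iter=5000):
    """Fit y = b0 + b1 * x by stepping against the gradient of the SSE."""
    b0, b1 = 0.0, 0.0                      # arbitrary starting values
    for _ in range(n_iter):
        residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
        # Partial derivatives of sum(residual^2) with respect to b0 and b1.
        grad_b0 = -2.0 * sum(residuals)
        grad_b1 = -2.0 * sum(r * xi for r, xi in zip(residuals, x))
        # Move each parameter a small step in the downhill direction.
        b0 -= rate * grad_b0
        b1 -= rate * grad_b1
    return b0, b1

# Made-up data lying roughly along y = 1 + 2x.
x_obs = [0, 1, 2, 3, 4]
y_obs = [1.1, 2.9, 5.2, 6.8, 9.1]
b0_hat, b1_hat = gradient_descent(x_obs, y_obs)
```

For these data the result agrees with the ordinary least-squares line. If `rate` is too large the iterations diverge rather than converge, which is one reason convergence is only "hopefully" reached.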
Refer to H, T & F sections 11.3 & 11.4. A simplified version follows.
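For orientation only, here is a small Python sketch of the idea behind back-propagation for a toy network: one input, two tanh hidden units and a linear output, with squared error on a single observation. The architecture, the weight ordering and the names `tiny_net`, `backprop` and `sgd_step` are assumptions for illustration; this is not the notation of the reference above.

```python
import math

def tiny_net(x, w):
    """Forward pass. w = [a1, b1, a2, b2, c0, c1, c2] (assumed ordering)."""
    z1 = math.tanh(w[0] + w[1] * x)
    z2 = math.tanh(w[2] + w[3] * x)
    y_hat = w[4] + w[5] * z1 + w[6] * z2
    return z1, z2, y_hat

def squared_error(x, y, w):
    return (y - tiny_net(x, w)[2]) ** 2

def backprop(x, y, w):
    """Gradient of the squared error with respect to each weight."""
    z1, z2, y_hat = tiny_net(x, w)        # forward pass stores the hidden values
    delta = -2.0 * (y - y_hat)            # error signal at the output
    # Pass the error back through each tanh unit,
    # using d tanh(a)/da = 1 - tanh(a)^2.
    s1 = delta * w[5] * (1.0 - z1 ** 2)
    s2 = delta * w[6] * (1.0 - z2 ** 2)
    return [s1, s1 * x, s2, s2 * x, delta, delta * z1, delta * z2]

def sgd_step(x, y, w, rate=0.01):
    """One gradient-descent update using the back-propagated gradient."""
    grad = backprop(x, y, w)
    return [wi - rate * gi for wi, gi in zip(w, grad)]

# Hypothetical starting weights and a single made-up observation.
w_start = [0.1, -0.3, 0.2, 0.4, 0.0, 0.5, -0.5]
w_next = sgd_step(1.5, 1.0, w_start)
```

The point to take away is the ordering: a forward pass to get the fitted value, then the error is passed backwards via the chain rule, reusing the stored hidden-unit values.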
Note:
You should note:
Source: Bernacki & Wlodarczyk
You do not have to do this by hand for assessment in this course
Coming up: