C. Donovan
20 April 2018
NB: If it's not in the lecture or lab, it's not in the exam
Wrap up
Have clusters - now what?
In keeping with the two over-arching uses of statistical models:
Prediction is conceptually easy
Description is relatively hard
(Covered if time) We can use dimension reduction to explore/help.
The type of response implies a type of output function (and loss function)
For multi-class responses, we'd likely go for softmax:
\[ y_k = \frac{e^{t_k}}{\sum_{l=1}^K e^{t_l}} \]
Note the \(y_k\) sum to 1 - so they act like a probability distribution over the \(K\) classes.
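For concreteness, a minimal sketch of softmax in Python/NumPy (an illustration only - not part of the lecture or lab material):

```python
import numpy as np

def softmax(t):
    """Softmax over a vector of scores t_1..t_K."""
    e = np.exp(t - np.max(t))  # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
y = softmax(scores)
print(y)        # approx [0.659 0.242 0.099]
print(y.sum())  # 1.0 - a probability distribution over the K classes
```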
Recent years have seen a NN renaissance, largely in image processing:
There has been a recent upsurge in NN popularity, mainly due to great performance in image classification problems. The NNs in question can be massively complicated:
Hence the term deep learning - very deep/complex NNs
The distinction between shallow and deep learning is a bit vague - but the extremes are clearly different
These are typically used in the context of image processing. We have looked at a simple/naive approach already.
Recall the MNIST digits NN:
This is clearly naive - the pixels are arranged spatially, which is not reflected in the architecture (see the sketch below).
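To make the point concrete, here is a sketch of the naive approach in Keras (Keras/TensorFlow is an assumption for illustration; the lab tooling may differ). The 28x28 grid is flattened to a 784-vector, so the network has no notion of which pixels are neighbours:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Naive MNIST net: the 28x28 pixel grid is flattened to a 784-vector,
# discarding the spatial arrangement of the pixels.
model = keras.Sequential([
    layers.Input(shape=(28, 28)),
    layers.Flatten(),                        # 784 inputs - spatial structure lost
    layers.Dense(128, activation="relu"),    # one hidden layer
    layers.Dense(10, activation="softmax"),  # softmax over the 10 digit classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```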
The basic idea is simple: keep the image's 2-D layout and learn small filters that are slid across it.
CNNs have many features we're familiar with from other NNs, but also: convolutional layers (small filters whose weights are shared across the image) and pooling layers (downsampling) - see the sketch after this list.
Typically we're talking images - if colour, think 3 values per pixel (RGB).
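A minimal CNN sketch, again in Keras (assumed tooling, for illustration): the input keeps its height x width x channels shape, so the filters can exploit the spatial arrangement the naive net threw away:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Small CNN: the input keeps its 2-D (plus channel) shape.
model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),                      # height x width x channels (3 for RGB)
    layers.Conv2D(32, kernel_size=3, activation="relu"),  # 32 small 3x3 filters, weights shared across the image
    layers.MaxPooling2D(pool_size=2),                     # downsample, keeping the strongest responses
    layers.Conv2D(64, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),               # softmax over the 10 classes, as above
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```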
Read Krizhevsky et al. (2012) - they describe their CNN for the ImageNet problem:
Tricky! Later CNNs made this look small (see previous)
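To give a sense of scale, a rough Keras sketch of an AlexNet-style stack (an assumption for illustration - it follows the published layer shapes but omits details such as local response normalisation and the original two-GPU training split):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Rough AlexNet-style stack (after Krizhevsky et al. 2012), simplified:
# 5 convolutional layers then 3 dense layers, ~60M parameters in the original.
model = keras.Sequential([
    layers.Input(shape=(227, 227, 3)),
    layers.Conv2D(96, 11, strides=4, activation="relu"),
    layers.MaxPooling2D(3, strides=2),
    layers.Conv2D(256, 5, padding="same", activation="relu"),
    layers.MaxPooling2D(3, strides=2),
    layers.Conv2D(384, 3, padding="same", activation="relu"),
    layers.Conv2D(384, 3, padding="same", activation="relu"),
    layers.Conv2D(256, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(3, strides=2),
    layers.Flatten(),
    layers.Dense(4096, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(4096, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1000, activation="softmax"),  # 1000 ImageNet classes
])
```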
Further reading/resources - there is a massive amount of good material out there; a selection follows: