C. Donovan
20 August 2019
Image processing - the poster boy of ML:
https://www.dropbox.com/s/bmmrhtyn88hnnar/CNN_classifier.mp4?dl=0
https://www.youtube.com/watch?v=qrzQ_AB1DZk
Predictive models on satellite data:
The term was coined in the 50s - naively/tellingly: Machine + Learning
Basically algorithms informed by data. This sounds familiar…
Why not ask Wikipedia? (Dr Paxton - are you here?):
Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without using explicit instructions, relying on patterns and inference instead. … Machine learning algorithms build a mathematical model based on sample data, known as “training data”, in order to make predictions or decisions without being explicitly programmed to perform the task.
Sounds very familiar…
There are lots of these - great fun
Machine learning is frequently statistical modelling, with some disregard for inference, focussed on predictive performance.
\[ y = f(x_1, \ldots, x_p; \theta_1, \ldots, \theta_k) + e \]
\( y \) is the target (response), \( x \) are the features (covariates), \( f \) is the signal with parameters \( \theta \), and \( e \) is the noise
In stats:
This is the core of what we're doing, by whatever means necessary to get a good estimate
Usually a supervised problem (we have observations of both \( y \) and a set of \( x \) to estimate \( f \))
This all sounds very familiar…
This all sounds very familiar. Model selection is widespread in statistics, but we might use AIC rather than a hold-out approach (a staple of ML).
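To make the AIC versus hold-out contrast concrete, here is a minimal sketch in Python (NumPy only). Everything in it is an assumption for illustration: the simulated quadratic data, the 150/50 split, the candidate polynomial degrees, and the helper names `fit_poly` and `aic_gaussian`. The AIC is the Gaussian form, up to an additive constant.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (assumed for illustration): quadratic signal plus Gaussian noise.
n = 200
x = rng.uniform(-2, 2, size=n)
y = 1.0 + 0.5 * x - 0.8 * x**2 + rng.normal(scale=0.5, size=n)

# Hold-out split: the first 150 points train the model, the last 50 validate it.
train, test = np.arange(150), np.arange(150, n)


def fit_poly(x, y, degree):
    """Least-squares polynomial fit; coefficients are ordered highest power first."""
    return np.polyfit(x, y, degree)


def aic_gaussian(y, y_hat, n_coef):
    """AIC for a Gaussian linear model, up to an additive constant."""
    m = len(y)
    rss = np.sum((y - y_hat) ** 2)
    k = n_coef + 1  # regression coefficients plus the noise variance
    return m * np.log(rss / m) + 2 * k


aic, holdout_mse = {}, {}
for d in range(1, 7):
    # AIC: fit on all the data and penalise the parameter count.
    coef_all = fit_poly(x, y, d)
    aic[d] = aic_gaussian(y, np.polyval(coef_all, x), d + 1)
    # Hold-out: fit on the training part and score on unseen data.
    coef_tr = fit_poly(x[train], y[train], d)
    holdout_mse[d] = np.mean((y[test] - np.polyval(coef_tr, x[test])) ** 2)

best_aic = min(aic, key=aic.get)
best_holdout = min(holdout_mse, key=holdout_mse.get)
print("degree chosen by AIC:", best_aic)
print("degree chosen by hold-out MSE:", best_holdout)
```

Both criteria answer the same question (how complex should \( f \) be?); AIC uses a penalty on the in-sample fit, while the hold-out approach measures predictive error directly on data the model never saw.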
[See this summary document as an example: https://st4.ning.com/topology/rest/1.0/file/get/2808312886?profile=original]
Building complex models from simple blocks
In short:
\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1z_1 + \hat{\beta}_2z_2 + \hat{\beta}_3z_3 \]
where
\[ \begin{aligned} z_1 &= \tanh( \hat{\alpha}_4 + \hat{\alpha}_5x_1 + \hat{\alpha}_6x_2)\\ z_2 &= \tanh( \hat{\alpha}_7 + \hat{\alpha}_8x_1 + \hat{\alpha}_9x_2)\\ z_3 &= \tanh( \hat{\alpha}_{10} + \hat{\alpha}_{11}x_1 + \hat{\alpha}_{12}x_2) \end{aligned} \]
Estimate parameters as you'd expect
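The small network above (two inputs, three tanh hidden units, a linear output) can be sketched in a few lines of NumPy. This is an illustration rather than a definitive implementation: the simulated data, the learning rate, the number of steps and the function names (`hidden_layer`, `predict`, `gradient_step`) are all assumptions, and plain gradient descent on squared error stands in for "estimate parameters as you'd expect".

```python
import numpy as np


def hidden_layer(X, A):
    """z_j = tanh(a_j0 + a_j1*x1 + a_j2*x2) for each of the three hidden units.

    X is (n, 2); A is (3, 3) with one row (intercept, weight on x1, weight on x2) per unit.
    """
    return np.tanh(A[:, 0] + X @ A[:, 1:].T)


def predict(X, A, beta):
    """y_hat = beta_0 + beta_1*z_1 + beta_2*z_2 + beta_3*z_3."""
    return beta[0] + hidden_layer(X, A) @ beta[1:]


def mse(X, y, A, beta):
    return np.mean((predict(X, A, beta) - y) ** 2)


def gradient_step(X, y, A, beta, lr):
    """One full-batch gradient descent step on the mean squared error."""
    n = len(y)
    Z = hidden_layer(X, A)
    r = beta[0] + Z @ beta[1:] - y                      # residuals, shape (n,)
    grad_beta = np.concatenate(([2 * r.mean()], 2 * Z.T @ r / n))
    # Chain rule through tanh: d tanh(u)/du = 1 - tanh(u)^2.
    d_pre = (2 / n) * np.outer(r, beta[1:]) * (1 - Z**2)  # shape (n, 3)
    grad_A = np.column_stack((d_pre.sum(axis=0), d_pre.T @ X))
    return A - lr * grad_A, beta - lr * grad_beta


# Simulated data (assumed for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sin(2 * X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

A = rng.normal(scale=0.5, size=(3, 3))   # the 9 alpha parameters
beta = rng.normal(scale=0.5, size=4)     # the 4 beta parameters

initial_loss = mse(X, y, A, beta)
for _ in range(3000):
    A, beta = gradient_step(X, y, A, beta, lr=0.05)
final_loss = mse(X, y, A, beta)
print("MSE before and after fitting:", initial_loss, final_loss)
```

The 13 parameters match the 4 \( \hat{\beta} \)s and 9 \( \hat{\alpha} \)s in the equations; in practice a library would compute the gradients automatically.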
If you want to get to grips with NNs as a newbie - this is highly recommended:
https://playground.tensorflow.org/
I'll now play with it….
Recent years have seen a NN renaissance, largely in image processing:
There has been a recent upsurge in NN popularity, mainly due to great performance in image classification problems. The NNs in question can be massively complicated:
Hence the term deep learning - very deep/complex NNs
The distinction between shallow and deep learning is a bit vague - but the extremes are clearly different
Consider a simple handwriting problem (e.g. MNIST data - recognise numbers):
Josef Steppan - https://commons.wikimedia.org/w/index.php?curid=64810040
This is clearly naive - the pixels are arranged spatially, which is not reflected in the architecture.
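One way to see the problem, assuming the naive approach is to flatten each image and feed the pixels into a fully connected layer: shuffle the pixels with any fixed permutation, shuffle the weight columns the same way, and the layer's output is unchanged. The layer has no notion of which pixels are neighbours. The random "image" and weights below are stand-ins for illustration, not MNIST data or a trained model.

```python
import numpy as np

rng = np.random.default_rng(1)

image = rng.random((28, 28))        # stand-in for a 28x28 greyscale digit
x = image.reshape(-1)               # flattened to 784 inputs

W = rng.normal(size=(10, 784))      # one dense layer with 10 outputs (one per digit)
b = rng.normal(size=10)
out = W @ x + b

# Apply the same fixed permutation to the pixels and to the weight columns.
perm = rng.permutation(784)
out_shuffled = W[:, perm] @ x[perm] + b

print("identical outputs after shuffling pixels:", np.allclose(out, out_shuffled))
```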
Basic idea is simple:
CNNs have many features we're familiar with from other NNs, but also:
Typically we're talking images - if colour, think 3 values per pixel (RGB).
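As a rough sketch of the convolution step that gives CNNs their name: a colour image is an array of shape (height, width, 3), and a small kernel with the same number of channels slides over it, producing one number per position. The function name `conv2d_valid`, the toy image and the averaging kernel are assumptions for illustration; real CNN libraries add many kernels per layer, padding, strides and learned weights.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide a (kh, kw, C) kernel over an (H, W, C) image with no padding.

    Strictly this is cross-correlation (no kernel flip), the usual convention in CNNs.
    """
    H, W, C = image.shape
    kh, kw, kc = kernel.shape
    if C != kc:
        raise ValueError("kernel and image must have the same number of channels")
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Weighted sum over a small spatial patch and all colour channels.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw, :] * kernel)
    return out


image = np.ones((6, 5, 3))                   # toy RGB image: 6x5 pixels, 3 values each
kernel = np.full((3, 3, 3), 1 / 27)          # 3x3 averaging kernel across RGB
feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)
```

The same 27 weights are reused at every position, so the spatial arrangement of pixels is built into the architecture and the parameter count stays small.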
See Krizhevsky et al. (2012) - they describe their CNN for the ImageNet problem:
Tricky! Later CNNs made this look small (see previous)
Transfer learning is very popular now.
ML is far from magical - can get poor performance for all the usual reasons models don't work
Examples
“When your only tool is a hammer, every problem looks like a nail”
Corollary (pers. comm. D. O'reilly 2019, MSc student)
“When your only tool is a hammer, it behooves you to treat each problem as a nail”
The excitement around NNs/RNNs/LSTM/CNNs seems to be creating an odd shift:
There is actually (expensive) hardware built specifically for this class of models:
https://www.nvidia.com/en-us/deep-learning-ai/
https://www.nvidia.com/en-us/data-center/dgx-2/
They're great for images and often pretty poor elsewhere, yet they are increasingly used everywhere.
I've talked about ML models - the model is usually only a small part of making something useful:
You might consider the ML “approach” (and/or somewhat non-statistical models) if: