class: center, middle, inverse, title-slide

.title[
# Lab/Lecture .mono[00X]
]
.subtitle[
## Deep Learning and You
]
.author[
### Connor Lennon
]
.date[
### 17 March 2023
]

---
exclude: true

---
class: inverse, middle

# Deep Learning and Neural Nets

---
# Hype Machine
## ChatGPT/GPT1-4

<img src="images/gpt3v4.png" width="70%" style="display: block; margin: auto;" />

---
# Hype Machine
## Stable Diffusion

.center[
<img src="images/stabdiff1.jpeg" width="15%" style="display: block; margin: auto;" />
]

--

<img src="images/brainscN.jpg" width="60%" style="display: block; margin: auto;" />

---
# Why so exciting?

- Aside from having a name that sounds like it came straight out of Neuromancer, why is everyone so excited about Neural Networks?

--

- Imagine trying to map a complex pattern to some outcome. Maybe you're trying to recognize whether an image is a dog or a blueberry muffin.

--

<img src="images/dogmuff.jpeg" width="30%" style="display: block; margin: auto;" />

---
# Why so exciting?

<img src="images/dogmuff.jpeg" width="30%" style="display: block; margin: auto;" />

- How do we find the dog/muffin functional form?

--

What kind of expert dog-ology field can we turn to in order to solve our problem?

--

- However, .hi-orange[you] can recognize which is which, right?

--

Have you taken a course in dog-ology?

--

.hi[Probably not.]

---
# Why so exciting?

You've seen dogs, you've seen muffins, and somehow your brain has found an unknown way to tell the difference. What about letters/numbers?

--

<img src="images/mnist_gen.png" width="50%" style="display: block; margin: auto;" />

--

That's what neural networks are trying to imitate - that process of taking one set of sensory inputs and, through repetition, finding patterns that map to some .hi[latent] set of labels.

--

But, in order to understand neural networks, we should do a little foundational history on them.

---
name: building-block
# The building block: perceptron
## Where did the first 'neural network' come from?

- Neural networks (in their most basic form) are actually one of the oldest (if not THE oldest) machine learning tools still in use today.

--

- Invented by McCulloch and Pitts in 1943.orange[*], the perceptron (the most fundamental element of a neural network) conceptually predates the first electronic computers.

.footnote[
.orange[*] McCulloch, W. S. and Pitts, W. 1943. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics 5:115–133.
]

--

You might ask yourself: why would anyone bother designing an algorithm that takes hours of compute time on a high-end machine .hi[TODAY], when at the time they lacked even a rudimentary computer?

--

The intent was to build a model of a single neuron.

---
name: admin
# The building block: perceptron

In order to understand the intuition for how a perceptron works mathematically, a basic understanding of the .hi[biological mechanics] of a neuron is useful.

--

<img src="images/neuron.png" width="60%" style="display: block; margin: auto;" />

.footnote[
.orange[*] Source: Wikipedia. No, they didn't endorse my usage of this diagram.
]

---
name: admin
# The building block: perceptron

In order to understand the intuition for how a perceptron works mathematically, a basic understanding of the .hi[biological mechanics] of a neuron is useful.

<img src="images/neuron_diagram.png" width="60%" style="display: block; margin: auto;" />

.footnote[
.orange[*] Source: Wikipedia. No, they didn't endorse my usage of this diagram.
.blue[*] Moar JPEG
]

---
name: admin
# The building block: perceptron

Ok, so how do we model this process using only the mathematical tools we have available?

--

- We need a function that takes a .hi[consistent] set of inputs and maps them efficiently to some (potentially unknown) outputs.

--

- Let's start by simplifying the problem, and only modeling 'activated/not activated' once some .hi-orange[threshold] of charge is reached.

--

---
# Perceptron, formally

.hi[Rules]

1. `$$output = y \in \{0,1\}$$`

2. `$$inputs = x_n \in \{0,1\}, \text{ and } |X| = N$$`

3. `$$Threshold\ value = \Theta$$`

4. `$$\sigma(X) = 1 \ \text{if} \ \sum_{k=1}^N x_k > \Theta, \ \text{else} \ 0$$`

---
# Perceptron, formally

.hi-orange[Put another way:]

--

We're going to:

--

- take Xs (inputs), multiply each `\(x_i\)` by a weight `\(w_i\)`.hi-orange[*], and then map the weighted sum to 0 or 1 with a step function that fires when `\(\sum_i w_i x_i > \Theta\)`.

.footnote[
.orange[*] This was a slight modification by Frank Rosenblatt, who built the first computer-based perceptron.
]

--

<img src="images/arti_neuron.png" width="60%" style="display: block; margin: auto;" />

--

Now, we're going to make a .hi[minor] change.

---
# Perceptron, formally

<img src="images/adaline.png" width="60%" style="display: block; margin: auto;" />

--

Two things to notice:

--

- we now have an activation function that replaces `\(\Theta\)`.

--

- .hi[T?] Why is that there? That's an in-built assumption that learning takes .hi[time]. It is also critical to how we go about solving this thing.

---
# Perceptron, how do they work?

- You might have wondered: how in the world do we solve this thing?

--

- The answer:

--

.hi[we guess.] We start with some random set of weights.

--

- Then we update, using information from the size of our errors to infer how well our weights are informing those guesses. Eventually, we'll get something meaningful.

--

- This is why the change from the original step function is so important:

--

.hi[The linear activation function] buys us something that the simple step function could not:

--

.hi-orange[differentiability.]

--

- This means we can define some cost function and find a method to minimize it - using the derivative `\(\frac{\partial{L}}{\partial{w^t_i}}\)` to update the weights from `\(w^t_i\)` to `\(w^{t+1}_i\)`.

---
# Perceptron, how do they work?

Lucky for us, we can define `\(\frac{1}{2}*SSE\)` as our loss function, and then `\(\frac{\partial{L}}{\partial{w^t_i}}\)` is fairly easy. For each training sample, in each iteration:

--

**1.)** `\(z = w^t_0 + w^t_1x_1 + \dots + w^t_nx_n = (W^t)^Tx\)`

`\(w^t_0\ \text{is a scalar (bias). You will sometimes see this as}\ b^t\)`

--

**2.)** `\(\phi(z^{(i)}) = \widehat{y}^{(i)}\)`

--

**3.)** `\(\eta = \text{learning rate}\)`

--

Loss function: `\(J(w_i^t) =\frac{1}{2}\sum_i(y^{(i)}-\phi(z)^{(i)})^2\)`

--

`\(\frac{\partial{J}}{\partial{w^t_j}} = - \sum_i(y^{(i)} - \phi(z)^{(i)})x^{(i)}_j\)`, `\(W^{t+1} = W^t - \eta \nabla J(W^t)\)`

---
# Perceptron, how do they work?

Let's see this in action:

--

<center>
<img src="https://i.redd.it/omhr2r7vb8i41.gif" alt="reddit perceptron", width = "60%">
Source: Reddit
</center>
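---
# Perceptron, how do they work?

To make the update rule concrete, here is a minimal base-R sketch of that loop - a made-up two-feature example with a linear activation, not the code behind the animation above:


```r
set.seed(42)
# Made-up, linearly separable data: two inputs, one 0/1 label
n = 100
x = cbind(1, matrix(rnorm(n * 2), ncol = 2))   # first column carries the bias (w_0)
y = as.numeric(x[, 2] + x[, 3] > 0)

eta = 0.01       # learning rate
w   = rnorm(3)   # start by guessing: random weights

for (epoch in 1:50) {
  z    = x %*% w          # step 1: weighted sum, z = (W^t)'x
  phi  = z                # step 2: linear activation, phi(z) = z
  err  = y - phi          # (y - phi(z))
  grad = -t(x) %*% err    # dJ/dw = -sum over i of (y - phi(z)) * x
  w    = w - eta * grad   # W^{t+1} = W^t - eta * gradient
}

# Classify by thresholding the (linear) output at 0.5
mean(as.numeric(x %*% w > 0.5) == y)   # training accuracy
```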
---
# Perceptron, how do they work?

You'll notice that the linear 'activation function' produces a linear decision boundary.

--

- So what's the .hi-orange[big deal?] We could do that, and more, with SVMs.

--

- Unlike with SVMs, perceptrons .hi-orange[improve over time.]

--

Because a perceptron can hypothetically continue to improve so long as it keeps seeing any loss, the only upper limit to a perceptron's performance is how flexible the activation function is.

--

But there's more than that: because we're using derivatives to update the weights, we can link a bunch of neurons together and use .hi-red[the chain rule] to optimize the model weights `\(W\)` as they .hi[work together] to find the pattern that maps `\(X \to Y\)`.

--

Well, what do you get when you glue a bunch of neurons together?

--

.hi[A brain!].hi-orange[*]

---
# Neural Networks

By breaking down a complex mapping task into a series of steps, we can use large collections of modified perceptrons to universally approximate .hi-orange[any] functional form.

--

- You .hi[already] know how to do this.orange[*] - you can extend the model yourself!

.footnote[
.orange[*] with a liiitle more calculus
]

--

- Imagine making several perceptrons that each learn patterns in parallel while simultaneously trying to minimize your loss function.

--

These are called .hi-blue[layers].

---
# Layer Gif

<img src="images/neuron_to_layer.gif" width="80%" style="display: block; margin: auto;" />

Source: Imperial College

---
# Layer?

<center>
<img src="images/neuron_to_layer.gif" width="30%" style="display: block; margin: auto;" />
</center>

.hi[Layer] seems to imply we could have multiple sets of perceptrons?

--

- .hi-orange[We can!]

--

All we do is make the `\(y_0,\ y_1,\ y_2\)` feed into a .hi[new] set of perceptrons and have those new perceptrons find the .hi-blue[patterns] in the output of our old ones.

--

That makes the output a little more complex, because now `\(\widehat{y}\)` is equal to `\(\sigma_2(\sigma_1(X))\)`,

--

where `\(\sigma_i()\)` is the perceptron weighting and activation function for layer .hi[i].

---
# Visualizing a Neural Network

<img src="images/neural_net_diagram.png" width="80%" style="display: block; margin: auto;" />

---
# Backpropagation

.hi-orange[How] do we update our weights using this new function? The main difference is that we need a way of creating an .hi[error] for EVERY layer. This requires...

--

- .hi-red[the chain rule] (the scariest of rules)

`$$\frac{d}{dx}f(g(x)) = f'(g(x))\,g'(x)$$`

--

For the output layer, we can define our error `\((\delta^L_j)\)` by

`$$\delta^L_j = \frac{\partial{C}}{\partial{a^L_j}}\sigma'(z^L_j)$$`

--

We can compute everything in here.

---
# Backpropagation

`$$\delta^L_j = \frac{\partial{C}}{\partial{a^L_j}}\sigma'(z^L_j)$$`

`\(\frac{\partial{C}}{\partial{a^L_j}} = (a^L_j - y_j)\)` (for .hi[our] cost function)

--

We already have `\(z^L_j\)`, and plugging it into `\(\sigma'\)` is not difficult. The vector form of this is...

`\(\delta^L = (a^L - y) \circ \sigma'(z^L)\)`

Which we can also compute.

---
# Backpropagation

Now all that's left is to find the derivatives for the .hi[other layer errors] - however, those are going to be functions of the output error. Let's look at some arbitrary layer `\(l \in \{1,\dots, L\}\)`, where `\(l\)` is the layer before `\(l + 1\)`:

`$$\delta^l = ((w^{l+1})^T\delta^{l+1}) \circ \sigma'(z^l)$$`

where `\(w^{l+1} \equiv \text{weight matrix for layer }l+1\)`.

--

Now, we can use what we found to calculate `\(\delta^L\)`, which can be plugged in to find `\(\delta^{L-1}\)` and so on, until we reach the first layer.

--

But what we really need is `\(\frac{\partial{C}}{\partial{w^l_j}}\)` to update the weights.
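---
# Backpropagation

Before we get there, here's what those two `\(\delta\)` formulas look like as a tiny base-R sketch - one hidden layer, sigmoid activations, made-up numbers (nothing here comes from a real dataset):


```r
sigmoid       = function(z) 1 / (1 + exp(-z))
sigmoid_prime = function(z) sigmoid(z) * (1 - sigmoid(z))

set.seed(1)
x = c(0.5, -1.2)                  # one training example with 2 inputs
y = 1                             # its label

W1 = matrix(rnorm(3 * 2), 3, 2)   # layer 1: 2 inputs -> 3 hidden units
W2 = matrix(rnorm(1 * 3), 1, 3)   # layer 2 (output): 3 hidden units -> 1 output

# Forward pass (biases dropped to keep the sketch short)
z1 = W1 %*% x;  a1 = sigmoid(z1)
z2 = W2 %*% a1; a2 = sigmoid(z2)

# Output-layer error: delta^L = (a^L - y) o sigma'(z^L)
delta2 = (a2 - y) * sigmoid_prime(z2)

# One layer back: delta^l = ((w^{l+1})' delta^{l+1}) o sigma'(z^l)
delta1 = (t(W2) %*% delta2) * sigmoid_prime(z1)
```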
---
# Backpropagation

In order to update the weights, we really need to know how our costs are changing as a result of our chosen weights.

This is actually super simple to do given what we already know:

- `\(\frac{\partial{C}}{\partial{w^l_{j,k}}} = a^{l-1}_k\delta^l_j\)`

Or,

--

- `\(\frac{\partial{C}}{\partial{w^l}} = a_{input}\delta^l_{output}\)`

where `\(a_{input}\)` is the .hi[input] to the weight for layer `\(l\)` and `\(\delta_{output}\)` is the .hi-orange[error of the output] from the weight `\(w^l\)`.

--

Thus, the MSE for the guess `\(\widehat{y}\)` of `\(y\)` will trickle through the weights and update them as the algorithm learns.

--

The error from the output trickles .hi[back] through the layers, .hi-orange[propagating] changes across all the weights at once.

---
# Backpropagation

Let's see this in action:

<center>
<img src="https://miro.medium.com/proxy/1*mTTmfdMcFlPtyu8__vRHOQ.gif" alt="backprop", width = "70%">
</center>

.footnote[
.orange[*] Source: Medium.com
]

---
# Your errors

You can think of this process as finding the .hi[bottom of a bowl] by rolling a ball (with no momentum) down the side of it. The method we used before is called .hi[gradient descent], but there are others (we'll see those later).

--

As the ball gets close to the bottom of the bowl, it will slow down and eventually stop at the lowest point.

--

- Let's look at how gradient descent finds its way to the optimal point.

<img src="descent_2D_sphere.gif" width="40%" style="display: block; margin: auto;" />

---
# A good job!

Start at the point (-6, -2) and see how we do!

<img src="005-slides-2023_files/figure-html/graddesc1-1.svg" style="display: block; margin: auto;" />

---
# Sensitive!

What if we move that point a little to the right? Now, let's start at (-6, -3).

<img src="005-slides-2023_files/figure-html/graddesc2-1.svg" style="display: block; margin: auto;" />

---
# Learning Rate
## Too fast!!

<img src="005-slides-2023_files/figure-html/graddesc4-1.svg" style="display: block; margin: auto;" />

---
# Other Elements

Unlike other machine learning models we've seen so far, Neural Networks learn over what are called .hi-orange[epochs.]

--

- .hi-orange[dfn] Epoch: one full pass of the model over your training data set (updating the weights as it goes).

--

- They do this because they can still learn from data they have already seen before, so long as the network output has a non-zero error and they haven't reached an optimum for the data they've seen.

Let's watch some neural networks .hi[do their thing.] Go here and play around with an app from Stanford's CS department: <a href = 'https://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html'>playground</a>.

- Check out the 'random data' to see the 'hardest' pattern to fit.

---
# Some pitfalls of neural networks

- They are extremely flexible - and as we know already, that means they are prone to .hi[overfitting.] Even more so than the algorithms you've seen so far.

--

- They also have a ton of components to consider. You have an activation function to choose, you have a number of layers, you have the number of nodes .hi[IN] those layers...

---
# Some pitfalls of neural networks

<img src="images/snoop.jpg" width="50%" style="display: block; margin: auto;" />

---
# Some pitfalls of neural networks

- This makes cross-validation much more difficult. They are much more reliant on either guessing and checking (.hi[bad]) or having experience with them (.hi-orange[better]).

--

- Further, because they climb/fall to an optimum/minimum, there's no guarantee that the solution you find is the .hi[best] one you could have found for your data. It's highly dependent on how you're updating your weights (.hi-orange[optimization method]) and where you started.

--

How we've updated our weights so far has used something called .hi[gradient descent].

--

These two reasons are why the methods often used to minimize a cost function are .hi[super bizarre.]
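---
# Gradient descent, by hand

Before we look at those, here's plain gradient descent on a toy bowl-shaped function in base R - a made-up example, not the code behind the earlier figures - just to show how the learning rate `\(\eta\)` changes the path:


```r
f      = function(p) sum(p^2)   # a 'bowl': f(x, y) = x^2 + y^2
grad_f = function(p) 2 * p      # its gradient

descend = function(eta, start = c(-6, -2), steps = 25) {
  p = start
  for (i in 1:steps) p = p - eta * grad_f(p)   # p_{t+1} = p_t - eta * gradient
  p
}

descend(eta = 0.10)   # small steps: ends up near the minimum at (0, 0)
descend(eta = 1.05)   # too fast: the 'ball' overshoots and flies out of the bowl
```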
---
# Common Optimization Techniques

An example: .hi[Stochastic Gradient Descent].

--

Rather than calculating the error across all points in the sample, calculate the error for a single point at a time - this speeds up the process and helps you avoid overfitting/getting stuck at the same time.

Let's see how some of these weird methods act next to plain gradient descent (the .hi-red[red ball]).

--

<img src="images/optims.gif" width="30%" style="display: block; margin: auto;" />

---
# Programming a Neural Network

It is perfectly possible to code a neural network by hand, but by far the most common tools used to write neural networks for production are .hi[Tensorflow]/.hi-orange[keras] and PyTorch.

--

The main downside of these is that they're both written in C++, and R accesses them through Python, which means it may take some wrangling to get tensorflow for RStudio to work on your computer.

--

Both of these work together and store data in a special multi-dimensional generalization of a matrix called a 'tensor'.

--

I don't have enough time to explain exactly what a tensor is, but luckily this adorable human does a much better job than I ever could: <a href = 'https://www.youtube.com/watch?v=f5liqUk0ZTw'> tensors </a>

--

.hi[Tensorflow] lets us build a model sequentially, layer by layer. This is very similar to how tidymodels lets you build your data process.

---
# Tensorflow & Keras

Like in .hi-orange[tidymodels], .hi[tensorflow] starts with a model-type object.

--

For our purposes, this is `keras_model_sequential()`.

--

This tells tensorflow we're going to build our neural network sequentially.

--

Just like .hi-orange[parsnip], we build an abstraction of a model that will be fed data to train on later.

---
# MNIST dataset

First, let's remind ourselves what our neural network is trying to do:

--

- We want our neural network to read .hi[handwritten numbers] and tell us what number they are .hi[supposed] to represent.

--

Just like with all machine learning, we need to really understand our data to do a good job predicting. Let's take a closer look at one observation in our dataset.

---
# MNIST dataset


```r
library(keras)

mnist <- dataset_fashion_mnist()
mnist$train$x
```

- .hi[WOOOO] that was a bad idea. Here's an illustration of what each `\(x_i\)` looks like:

<img src="images/mnistdat.png" width="50%" style="display: block; margin: auto;" />

---
# MNIST data

You don't trust me, fine. We can look at our stuff too.

<img src="005-slides-2023_files/figure-html/unnamed-chunk-19-1.svg" width="50%" style="display: block; margin: auto;" />

---
# MNIST data

How about the first 25?

<img src="005-slides-2023_files/figure-html/unnamed-chunk-20-1.svg" width="50%" style="display: block; margin: auto;" />

---
# MNIST

Ok, that's great, but remember we need to set up this problem so we can turn a picture into an outcome (the number in the set). How do we do it?

--

Well, we can squish the data so that each of these matrices is just a very long row of Xs. (28*28 = 784 different variables.)

--

<img src="images/flatten.png" width="20%" style="display: block; margin: auto;" />

In practice, we can do this with color images as well, with each RGB value acting as a different pixel score.
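---
# MNIST

Just to see what that squishing step does before keras does it for us - a quick sketch (`x_flat` is a made-up name; the model below will use `layer_flatten()` instead):


```r
dim(mnist$train$x)   # 60000 x 28 x 28: every image is a 28x28 matrix of pixel values

x_flat <- array_reshape(mnist$train$x, c(nrow(mnist$train$x), 28 * 28))

dim(x_flat)          # 60000 x 784: one long row of Xs per image
```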
---
# Tensorflow & Keras

Ok - let's do this in order.


```r
model <- keras_model_sequential()
```

---
# Tensorflow & Keras

Ok - let's do this in order.


```r
model <- keras_model_sequential() %>%
  layer_flatten(input_shape = c(28, 28))
```

--

This layer takes our 28x28 pixel image and flattens it into a vector so the model can read it. Think of this like a recipe step.

---
# Tensorflow & Keras

Ok - let's do this in order.


```r
model <- keras_model_sequential() %>%
  layer_flatten(input_shape = c(28, 28)) %>%
  layer_dense(units = 128, activation = "relu")
```

--

This is the first layer in our neural network! It creates 128 different 'neurons' (or perceptrons) and sets their activation function to be a ReLU.

--

- What's a ReLU? It stands for: .hi[Re]ctified .hi[L]inear .hi[U]nit. It's all the rage these days, but it is super simple:

`$$ReLU(X) = \max(0, X)$$`

<img src="images/relu.png" width="28%" style="display: block; margin: auto;" />

---
# Tensorflow & Keras

Ok - let's do this in order.


```r
model <- keras_model_sequential() %>%
  layer_flatten(input_shape = c(28, 28)) %>%
  layer_dense(units = 128, activation = "relu", name = 'hiddenlayer') %>%
  layer_dense(10, activation = "softmax", name = 'outputlayer')
```

--

This is our output layer, and the activation it's using says simply: "classify this into one of 10 (the number of nodes) classes, and guess the one with the largest probability."

--

And that's a neural network! I gave the layers names because it will be easier to see what they do if I come back later. How do I do that?

---
# Tensorflow & Keras


```r
summary(model)
```

You can see how many parameters they have and check - it should be `\(784\ (\text{28px} \times \text{28px}) \times 128\ (\text{number of hidden neurons}) + 128\ (w_0 \text{ or } b) = 100,480\)`

---
# Tensorflow & Keras

You can see how many parameters they have and check - it should be `\(784\ (\text{28px} \times \text{28px}) \times 128\ (\text{number of hidden neurons}) + 128\ (w_0 \text{ or } b) = 100,480\)`

--

This is one reason why models like .hi-orange[ChatGPT] have billions or trillions of parameters.

---
# Tensorflow & Keras

Now we need to .hi[prepare] the model.

--

This is done with a `compile` command. We need to give that command a loss function to use, an optimizer (we'll use Adam, a fancier variant of gradient descent) and any metrics we're interested in (let's look at accuracy).

--


```r
model %>% compile(
  loss = "sparse_categorical_crossentropy",
  optimizer = "Adam",
  metrics = "accuracy"
)
```

---
# Tensorflow & Keras

Now, we can use keras/tf to fit our model. First, let's rescale our data.


```r
mnist$train$x = mnist$train$x / 255 # rescales the pixel values to [0, 1]
mnist$test$x  = mnist$test$x / 255  # rescales the pixel values to [0, 1]
```

I want you to see what it looks like when it runs, so I'm going to move to a new slide.

---
# Tensorflow & Keras


```r
model %>% fit(
  x = mnist$train$x,
  y = mnist$train$y,
  epochs = 5,
  validation_split = 0.3
)
```

<img src="images/Tensorflow1.png" width="50%" style="display: block; margin: auto;" />

---
# Tensorflow & Keras

<img src="images/tensorflowgraph.png" width="80%" style="display: block; margin: auto;" />

---
# Tensorflow

And that's how you run a neural network! Ours is getting a ~92% accuracy rate classifying pictures into one of .hi[10 different categories]. If you run this for 40 epochs, you'll get ~95% accuracy.

--

It's .hi[super] easy.

--

It's .hi-orange[too] easy.

--

Be careful... remember all of those things you've learned about model selection? They matter 10x more with deep learning.
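---
# Tensorflow & Keras

One way to heed that warning: check performance on the held-out test set. A hedged sketch (`probs` and `pred_class` are my names, not objects from the slides above):


```r
model %>% evaluate(mnist$test$x, mnist$test$y)   # loss + accuracy on unseen data

probs      <- predict(model, mnist$test$x)       # 10000 x 10 matrix of class probabilities
pred_class <- max.col(probs) - 1                 # most likely class for each image (0-9)

mean(pred_class == mnist$test$y)                 # test-set accuracy, by hand
```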
---
# Other types of networks

There are TONS of other cool structures, but they all work exactly like these under the hood - they just add fancy math that allows us to process our inputs in a slightly different way.

--

.hi[CNN:] Convolutional Neural Network. You know that flattening step? That kind of sucks. We're actually losing information on the location of the pixels when we do that. If only we could hold onto that information...

--

<img src="images/convolve.jpeg" width="50%" style="display: block; margin: auto;" />

--

.hi[By using filters] and feeding in data in a special way, we can hold onto the positional information of our numbers.

---
# Other types of networks

What about time series? Wouldn't it be nice to be able to tell a neural network the 'order' the data should occur in?

--

Building this time ordering into the model is, in ML language, an 'inductive bias', and it can be a good or a bad thing.

--

This can be done explicitly by a class of estimators called RNNs: **R**ecurrent **N**eural **N**etworks.

--

Sometimes, however, you don't actually know how to sort a set of data - and this is where .hi-green[transformers] come into play. These powerful structures are what have powered LLMs and recent advancements in computer vision. Let's go through how these work.

---
# Grocery Store Problem

You run a store, and you have two sections: vegetables and fruits. All of the things you sell need to be placed in those two categories. Let's code some items we want to sort in the following way:

## Vegetable

**0**: Not at all a vegetable

**1**: Definitely a vegetable

## Fruit

**0**: Not at all a fruit

**1**: Definitely a fruit

---
# Grocery Store Problem
## Carrot?

<img src="images/AL012-02 carotte_0.jpg" width="60%" style="display: block; margin: auto;" />

---
# Grocery Store Problem
## Orange?

<img src="images/orange.jpg" width="40%" style="display: block; margin: auto;" />

---
# Grocery Store Problem
## V8?

<img src="images/v8.jpeg" width="40%" style="display: block; margin: auto;" />

---
# Grocery Store Problem
## GI Joe Action Figure?

<img src="images/gijoe.jpg" width="40%" style="display: block; margin: auto;" />

---
# Grocery Store Problem
## Tomato?

<img src="images/tomato.jpg" width="40%" style="display: block; margin: auto;" />

---
# Grocery Store Problem
## Neat Numerical Trick

Suppose we want items that are similar to one another to be 'closer' to each other in our two-dimensional space. **see code**

We can **organize** these in fairly arbitrary ways using these categories with a simple trick.

`$$\text{Veggie-Fruit (VF) vector} = \langle \text{fruit score},\ \text{veggie score} \rangle; \quad \text{VF matrix} = [\text{VF vector}_1, \dots, \text{VF vector}_x]$$`

`$$\text{sorting scores} = \text{VF matrix} \times \text{VF matrix}^T \times \text{Values matrix}$$`

---
# Why'd you make us do this?

We, as a group, just devised our own .hi[self-attention] mechanism!

### Imagine

We have some dataset **X**.

--

- we **learn** the matrix from **X**, in a high-dimensional space, using neural network weights. We'll call this matrix **keys**.
- we **learn** the inverted matrix to tell us which items should be near one another. We'll call this second matrix **queries**.
- we **learn** the relative weighting of each item, and we call this third matrix **values**.

--

This is, mechanically, all most modern transformers are.

- When we care about the ordering of the items, we need to give that information directly in **X**.
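---
# Why'd you make us do this?

It's easier to see with numbers. Here's a hedged, toy base-R sketch of single-head self-attention over three grocery items - the item scores and the `Wq`/`Wk`/`Wv` matrices are made up here, where a transformer would learn them:


```r
set.seed(1)
softmax = function(s) exp(s) / sum(exp(s))

X = rbind(carrot = c(0.0, 1.0),   # columns: (fruit score, veggie score)
          orange = c(1.0, 0.1),
          tomato = c(0.7, 0.6))

d  = 2
Wq = matrix(rnorm(d * d), d, d)   # 'queries'
Wk = matrix(rnorm(d * d), d, d)   # 'keys'
Wv = matrix(rnorm(d * d), d, d)   # 'values'

Q = X %*% Wq; K = X %*% Wk; V = X %*% Wv

scores    = Q %*% t(K) / sqrt(d)           # how strongly each item attends to every other item
attention = t(apply(scores, 1, softmax))   # turn each row into weights that sum to 1
attention %*% V                            # each item re-expressed as a weighted mix of the others
```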
---
# Attention

<img src="images/attention.gif" width="80%" style="display: block; margin: auto;" />

---
# Transformer

<img src="images/transformer.png" width="60%" style="display: block; margin: auto;" />

## Multihead?

If we think we want multiple different sorting techniques at once, we can stack multiple matrices on top of one another - each matrix is called a 'head'.

---
# Why so popular?

## 1. They work

--

## 2. They scale

---
# Other types of Networks

What if we literally don't know anything about the data, just how each object relates to one another (think DNA structures)?

--

.hi[GCN:] Graph Convolutional Networks. These read in graph data and, using the techniques you saw for CNNs, can predict new patterns.

<img src="images/gcn.png" width="50%" style="display: block; margin: auto;" />

---
# Causal Discovery in Neural Networks

Maybe you want an **idea** of what the causal structure of your dataset looks like. Under some assumptions, **GOLEM** can find you that:


```r
knitr::include_graphics('/Users/connor/Desktop/golem-results.png')
```

<img src="../../../../../../golem-results.png" width="50%" style="display: block; margin: auto;" />

---
# Other types of Networks

Oh yeah, remember this?

<img src="images/mnist_gen.png" width="50%" style="display: block; margin: auto;" />

--

That was generated by a neural network. Called a .hi[GAN] or Generative Adversarial Network, it is trained by having two neural networks duke it out.

- One of the networks tries to .hi-orange[imitate] the hand-drawn pictures.
- The other network tries to .hi-orange[detect] computer-generated pictures.

This model is used in the often-cited deep-fake videos.

--

GANs are a little out of fashion now for generating content - they've largely been replaced by 'Stable Diffusion' - but that's also a neural network.

---
# the only good use of a deepfake

Imagine Nicolas Cage was in every movie, playing every part.

<center>
<img src="https://emaragkos.gr/wp-content/uploads/2019/10/280px-Deepfake_example.gif" alt="deepfake1", width = "70%">
</center>

---
# Make Bob Ross Nightmare Fuel

<center>
<img src="https://hips.hearstapps.com/pop.h-cdn.co/assets/17/14/1491507280-bob2.gif" alt="backprop", width = "70%">
</center>

The only difference between these is what data the model is trained on/what the objective function is.

--

.hi[But you basically understand how to do this yourself now.]

---
# GAN training process:

<img src="images/GANs.png" width="80%" style="display: block; margin: auto;" />

--

Just by looking at the diagram for a while, and learning how convolutional neural nets work, you could figure this out.

---
# With great power...

These models are .hi[powerful.]

--

However, they aren't interpretable (.hi-orange[yet] - and they're getting MUCH better at this every year).

--

They also use hundreds of thousands of parameters for even very simple models.

--

That means you have to be super careful in how you evaluate them.

---
class: inverse, middle

# extras
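---
# Extra: one GCN layer, by hand

Since GCNs came up earlier, here's a hedged, toy base-R sketch of what a single graph-convolution layer computes - a made-up 4-node graph with random features and random 'learned' weights (the mean-aggregation variant, not the code behind any earlier figure):


```r
set.seed(1)
A = rbind(c(0, 1, 1, 0),            # adjacency matrix: who is connected to whom
          c(1, 0, 1, 0),
          c(1, 1, 0, 1),
          c(0, 0, 1, 0))
A_hat = A + diag(4)                 # add self-loops so each node also 'sees' itself
D_inv = diag(1 / rowSums(A_hat))    # inverse degree matrix (used for averaging)

X = matrix(rnorm(4 * 3), 4, 3)      # 4 nodes, 3 features each
W = matrix(rnorm(3 * 2), 3, 2)      # the layer's weights ('learned' in a real GCN)

H = pmax(D_inv %*% A_hat %*% X %*% W, 0)   # average your neighbours, transform, ReLU
H                                          # new 2-feature representation of each node
```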
-- <img src="images/lstm.png" width="30%" style="display: block; margin: auto;" /> where each cell is now <img src="images/lstmcell.png" width="20%" style="display: block; margin: auto;" /> --- #Basic RNN Visualized  --- #LSTM init  --- #LSTM step 2  --- #LSTM step 3  --- #LSTM final  --- # Let's Code! We can code one of these models now. Data: time series of power load, hourly in a given household. We see some voltage levels, and we want to predict the pull on the whole system. - note: this is a little bit tricky, but I promise it's all stuff you've seen or done before. **recommend:** you code along. --- # Running a LSTM in R
A peek at the raw data (the original slide renders a long scrollable preview; here are the first few rows):

| Date | Time | Global_active_power | Global_reactive_power | Voltage | Global_intensity | Sub_metering_1 | Sub_metering_2 | Sub_metering_3 | datetime |
|------------|----------|------:|------:|----:|-----:|--:|--:|---:|---------------------|
| 16/12/2006 | 17:24:00 | 4.22 | 0.418 | 235 | 18.4 | 0 | 1 | 17 | 2006-12-16 17:24:00 |
| 16/12/2006 | 17:25:00 | 5.36 | 0.436 | 234 | 23   | 0 | 1 | 16 | 2006-12-16 17:25:00 |
| 16/12/2006 | 17:26:00 | 5.37 | 0.498 | 233 | 23   | 0 | 2 | 17 | 2006-12-16 17:26:00 |
| 16/12/2006 | 17:27:00 | 5.39 | 0.502 | 234 | 23   | 0 | 1 | 17 | 2006-12-16 17:27:00 |
| 16/12/2006 | 17:28:00 | 3.67 | 0.528 | 236 | 15.8 | 0 | 1 | 17 | 2006-12-16 17:28:00 |
| 16/12/2006 | 17:29:00 | 3.52 | 0.522 | 235 | 15   | 0 | 2 | 17 | 2006-12-16 17:29:00 |
The slides then show the same preview after rescaling every numeric column to the [0, 1] range (Sub_metering_1 is constant at 0 in this window, so rescaling it gives NaN). The first rows:

| Date | Time | Global_active_power | Global_reactive_power | Voltage | Global_intensity | Sub_metering_1 | Sub_metering_2 | Sub_metering_3 | datetime |
|------------|----------|------:|------:|--------:|------:|----:|-------:|-------:|---------------------|
| 16/12/2006 | 17:24:00 | 0.263 | 0.792 | 0.0162  | 0.277 | NaN | 0.027  | 0.0556 | 2006-12-16 17:24:00 |
| 16/12/2006 | 17:25:00 | 0.412 | 0.826 | 0.0111  | 0.416 | NaN | 0.027  | 0      | 2006-12-16 17:25:00 |
| 16/12/2006 | 17:26:00 | 0.413 | 0.943 | 0.00969 | 0.416 | NaN | 0.0541 | 0.0556 | 2006-12-16 17:26:00 |
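---
# Running an LSTM in R
## Shaping the data (a sketch)

The model on the next slide expects arrays shaped (samples, timesteps, features), but the deck doesn't show how `data_prep_tf`, `data_prep_y_tf`, `train_window`, or `features` get built. Here's one hedged way you *might* build them, using a made-up stand-in for the rescaled data above:


```r
set.seed(1)
dat = data.frame(Global_active_power = runif(200),   # stand-in for the real (rescaled) columns
                 Sub_metering_1 = runif(200),
                 Sub_metering_2 = runif(200),
                 Sub_metering_3 = runif(200))

train_window = 20   # look back 20 minutes
horizon      = 10   # the output layer has 10 units, so predict the next 10 minutes
features     = 3    # here: the three Sub_metering_* columns

n = nrow(dat) - train_window - horizon + 1
x = array(NA, dim = c(n, train_window, features))
y = matrix(NA, n, horizon)

for (i in 1:n) {
  x[i, , ] = as.matrix(dat[i:(i + train_window - 1), 2:4])   # the lagged features
  y[i, ]   = dat$Global_active_power[(i + train_window):(i + train_window + horizon - 1)]
}

dim(x)   # samples x timesteps x features: what layer_conv_1d() / layer_lstm() expect
```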
---
# Running an LSTM in R


```r
## this tells keras that I want it to remember the weights with the lowest loss
## and to wait for 6 epochs before saying 'I'm done'
savebest = keras::callback_early_stopping(restore_best_weights = T, patience = 6)
opt = optimizer_adam()

model <- keras_model_sequential() %>%
  layer_conv_1d(input_shape = c(train_window, features), filters = 21, kernel_size = 1,
                strides = 1, activation = 'relu', name = 'conv-1d-1', padding = 'same') %>%
  #layer_batch_normalization(name = 'batchnorm') %>% ## normally want this
  #layer_activation_relu() %>% ## if we want to add an extra activation
  layer_lstm(21, name = 'lstm_layer', return_sequences = T, stateful = F) %>%
  layer_lstm(10, input_shape = c(train_window, features), name = 'lstm_layer_2', stateful = F) %>%
  layer_dense(10, activation = "linear", name = 'outputlayer')
```

---
# LSTM in Keras


```r
summary(model)
```

---
# Running an LSTM in R


```r
compile(model, loss = 'MSE', optimizer = opt)

# normally we'd want to reset_states() and use a 'stateful' LSTM - then manually
# loop through the splits - but RStudio HATES that.
# Some debugging gets this to work with some effort, as below. We would also
# prefer a layernorm layer.
for (epoch in c(1:20)) { ## 20 epochs is our train time
  print(paste("Beginning epoch #", epoch))
  keras::fit(model,
             x = data_prep_tf, y = data_prep_y_tf,
             epochs = 1, batch_size = 10,
             validation_data = list(x = data_test_tf, y = data_test_y_tf),
             shuffle = F, callbacks = savebest)
  model %>% reset_states()
}

model %>% save_model_hdf5("/Users/connor/Desktop/GithubProjects/Econometrics/524/EC524W20/lab/005-Perceptrons_and_NeuralNets/lstm_model_sf.hdf5")
```

---

<img src="lstm-preds.png" width="1068" style="display: block; margin: auto;" />
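---
# Running an LSTM in R

Finally, to get something like the plot above, you'd predict on held-out windows. A hedged sketch, reusing the made-up `x` and `y` arrays from the data-shaping slide (not the author's actual `data_test_tf` objects):


```r
preds = predict(model, x)   # one row per window, 10 columns = 10 steps ahead

# Compare the 1-step-ahead prediction with the (made-up) actual series
plot(y[, 1], type = "l", xlab = "minute", ylab = "scaled load")
lines(preds[, 1], col = "red")
```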