We have all heard about neural networks and their predictive power. The underlying concepts are not new: a neural network builds on traditional statistical techniques such as linear regression, so it is natural to wonder how the two compare.
In this blog post, I am going to compare a neural network against linear regression on an example dataset, which should give you more information and clarity on neural networks.
#devtools::install_github('rstudio/cloudml')
library(keras)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(cloudml)
## Loading required package: tfruns
#cloudml_train('train.R')
Let's go back to a staple of statistics: linear regression. It is one of the oldest ML models, and everyone has used it or at least heard of it. There are many versions of linear regression with different math behind them, but we are going to use a simple form of multiple linear regression.
In this test, we will be using a dataset about exercise. The dataset was chosen to motivate everyone about exercise and its effects: you learn the technical part and also get to know about the exercises (that's great!).
Our dataset has predictor variables such as age, height, weight, duration of the exercise, heart rate during that period, body temperature, and gender. Our response variable is the calories burned.
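Conceptually, multiple linear regression models the response as a weighted sum of the predictors plus an error term, something like:

Calories = b0 + b1*Age + b2*Height + b3*Weight + b4*Duration + b5*Heart_Rate + b6*Body_Temp + b7*Gender + error

The coefficients b0 through b7 are estimated by minimizing the sum of squared residuals on the training data.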
# Loading data
exercise <- read.csv('./data/excercise.csv', stringsAsFactors = FALSE)
calories <- read.csv('./data/calories.csv', stringsAsFactors = FALSE)
# Merging the two files on the user ID
df = merge(x = exercise, y = calories, by = 'User_ID')
# Dummy variable for gender
df$Gender_bin <- factor(if_else(df$Gender=='female',0,1))
df <- df %>% dplyr::select(-c(User_ID,Gender))
head(df)
As good data science citizens, we first need to split our dataset into training and test sets, and then fit a multiple linear regression model.
set.seed(40)
# Random row indices for a 70/30 train/test split
randomobs <- sample(seq_len(nrow(df)), size = floor(0.7 * nrow(df)))
# Train dataset
train.df <- df[randomobs,]
# Test dataset
test.df <- df[-randomobs,]
model_1_multiple <- lm(Calories ~ ., train.df)
summary(model_1_multiple)
##
## Call:
## lm(formula = Calories ~ ., data = train.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -38.530 -7.080 -1.475 5.400 75.545
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 459.76159 13.20347 34.821 < 2e-16 ***
## Age 0.49862 0.00684 72.903 < 2e-16 ***
## Height -0.20583 0.02910 -7.073 1.61e-12 ***
## Weight 0.32872 0.03160 10.401 < 2e-16 ***
## Duration 6.61213 0.03743 176.636 < 2e-16 ***
## Heart_Rate 2.00525 0.02192 91.459 < 2e-16 ***
## Body_Temp -16.83764 0.32849 -51.258 < 2e-16 ***
## Gender_bin1 -1.46182 0.36998 -3.951 7.83e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.27 on 10492 degrees of freedom
## Multiple R-squared: 0.9672, Adjusted R-squared: 0.9671
## F-statistic: 4.415e+04 on 7 and 10492 DF, p-value: < 2.2e-16
pred_regression <- predict(model_1_multiple, test.df %>% select(-Calories), type = 'response')
print(sqrt(mean((test.df$Calories - pred_regression)^2)))
## [1] 11.42611
The model above shows that all the predictors are significant in predicting the calories burned, and we got an impressive adjusted R-squared of 0.967. We might wonder why we would need a neural network here at all. But let's put these results in our pocket and try one out.
Now, let's bring in a fancier model: the neural network. Neural networks learn their weights in a way that is similar to linear regression, through a series of matrix operations across layers. Let's keep things simple with a small network of just a couple of dense layers.
We will use the keras package for building this NN. It sits on top of the TensorFlow framework, and the package is easy to install in R.
Let's get back to neural networks. Everyone has two basic questions: what is a neuron, and why is it called a network? Neurons are simple computational units that take weighted input signals and produce an output signal using an activation function. A network is a series of layers combined together.
Neuron weights are pretty much like the coefficients used in a regression equation. As in linear regression, each neuron has a bias term that is added to the input. Weights are generally initialized to small random values, for example in the range 0 to 1, and it is desirable to keep them small; that is another reason to scale our input data.
The weighted inputs are summed and passed through an activation (or transfer) function, a simple mapping from the summed weighted input to the output of the neuron. It decides whether the neuron is activated and whether its output is sent to the next layer. Traditionally, nonlinear activation functions are used to model complex data; more recently, the rectifier (ReLU) activation function has been shown to provide better results.
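To make this concrete, here is a minimal sketch of a single neuron in base R; the inputs, weights, and bias below are made-up values for illustration only:

# A single neuron: weighted sum of the inputs plus a bias, passed through ReLU
relu <- function(z) max(0, z)
x <- c(0.5, -1.2, 0.3)   # made-up input signals
w <- c(0.1, 0.4, -0.2)   # made-up weights
b <- 0.05                # bias term
relu(sum(w * x) + b)     # the weighted sum is -0.44, so ReLU clamps it to 0
## [1] 0

With that intuition in place, let's prepare scaled inputs for the network.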
# Convert the gender factor back to a numeric 0/1 column
train.df$Gender_bin <- if_else(train.df$Gender_bin == 0, 0, 1)
test.df$Gender_bin <- if_else(test.df$Gender_bin == 0, 0, 1)
# Predictor matrices, standardized to mean 0 and sd 1
train_x <- train.df %>% select(-Calories)
train_x_s <- scale(train_x)
train_y <- train.df %>% select(Calories) %>% as.matrix()
test_x <- test.df %>% select(-Calories)
test_x_s <- scale(test_x)
test_y <- test.df %>% select(Calories) %>% as.matrix()
Now we have defined our training and test datasets, with the predictors scaled. Scaling reduces the magnitude of the values and puts all inputs on a comparable range, which helps backpropagation (the procedure that updates the weights of the units) converge smoothly.
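One detail worth noting: in the chunk above the test set is scaled with its own mean and standard deviation, for simplicity. A stricter approach, sketched below using the attributes that scale() stores, would reuse the training-set statistics (this would shift the reported test RMSE slightly, so we keep the simpler version for consistency with the results shown here):

# Reuse the training-set centering and scaling values on the test predictors
test_x_s_strict <- scale(test_x,
                         center = attr(train_x_s, 'scaled:center'),
                         scale  = attr(train_x_s, 'scaled:scale'))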
The model we are creating is a sequential model. It needs to know the number of predictors being passed in (we do not need to supply the sample size), along with the number of neurons, or units, to create in the first layer. After that we add the hidden layer, again specifying the number of neurons. Finally, we add a layer with a single neuron to predict the calories burned.
model <- keras_model_sequential()
model %>%
  # First layer: 8 units, expecting 7 predictors per observation
  layer_dense(units = 8, activation = 'relu', input_shape = c(7)) %>%
  # Hidden layer with 64 units
  layer_dense(units = 64, activation = 'relu') %>%
  # Output layer: a single unit predicting Calories
  layer_dense(units = 1)
summary(model)
## ___________________________________________________________________________
## Layer (type) Output Shape Param #
## ===========================================================================
## dense_1 (Dense) (None, 8) 64
## ___________________________________________________________________________
## dense_2 (Dense) (None, 64) 576
## ___________________________________________________________________________
## dense_3 (Dense) (None, 1) 65
## ===========================================================================
## Total params: 705
## Trainable params: 705
## Non-trainable params: 0
## ___________________________________________________________________________
We also need to define the loss function; the model will optimize the weights to minimize it. The optimizer, rmsprop, is a variant of stochastic gradient descent.
model %>% compile(loss='mse',optimizer='rmsprop',metrics='mse')
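For reference, the 'mse' loss we just compiled is simply the mean of the squared differences between predictions and actual values; here is a quick illustration on made-up numbers:

# Mean squared error computed by hand on toy vectors
actual    <- c(100, 150, 200)
predicted <- c(110, 140, 190)
mean((actual - predicted)^2)
## [1] 100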
Finally, we fit the model for 10 epochs (10 full passes over the training data) with a batch size of 8 (the weights are updated after each batch), holding out 20% of the training data for validation.
history = model %>% fit(train_x_s,train_y, epochs=10,batch_size = 8,validation_split = 0.2)
The model has been trained for 10 epochs and has settled on its final weights. This is the model we can use to predict future data. Let's quickly plot the loss and the error.
plot(history)
Finally, we evaluate the model on the test dataset. This gives us the RMSE on unseen data, directly comparable to the linear regression result above.
#RMSE
print(sqrt(evaluate(model, test_x_s, test_y)$mean_squared_error))
## [1] 2.953651
preds <- predict(model, test_x_s)
final <- data.frame(preds_nn=preds,preds_lr =pred_regression, actual=test_y)
knitr::kable(head(final))
|    | preds_nn  | preds_lr  | Calories |
|----|-----------|-----------|----------|
| 6  | 131.42140 | 139.16194 | 130      |
| 7  | 66.44913  | 72.51575  | 65       |
| 8  | 31.51748  | 16.75636  | 30       |
| 10 | 58.30813  | 64.04842  | 55       |
| 13 | 58.63359  | 57.43671  | 55       |
| 14 | 252.56935 | 216.73320 | 264      |
We can see from the results that a simple neural network outperforms the linear regression: an RMSE of about 2.95 versus 11.43. We could improve the linear model in many ways, but that work has to be done manually. With the neural network we performed no major transformations beyond scaling, and it still easily outperformed the linear model. Neural networks shine when the dataset is more complex.