We have all heard about neural networks and their predictive power. The underlying concepts are not new: a neural network builds on traditional statistical techniques such as linear regression, so it is natural to wonder how the two compare.
In this blog post, I am going to compare a neural network against linear regression on an example dataset, which should give you more information and clarity on neural networks.
#devtools::install_github('rstudio/cloudml')
library(keras)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(cloudml)
## Loading required package: tfruns
#cloudml_train('train.R')
Let's go back to a staple of statistics: linear regression. It is one of the oldest ML models, and everyone has used it or at least heard of it. There are many versions of linear regression with different math behind them, but we are going to use a simple form of multiple linear regression.
In this test, we will be using a dataset about exercise. The dataset was chosen to motivate everyone about exercise and its effects: you learn the technical part and also get to know about the exercises (that's great!).
Our dataset has predictor variables such as age, height, weight, duration of the exercise, heart rate during that period, body temperature, and gender. Our response variable is the calories burned.
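Conceptually, multiple linear regression models the response as a weighted sum of the predictors plus an error term, something like:

Calories = b0 + b1*Age + b2*Height + b3*Weight + b4*Duration + b5*Heart_Rate + b6*Body_Temp + b7*Gender + error

The coefficients b0 through b7 are estimated by minimizing the sum of squared residuals on the training data.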
# Loading data
exercise <- read.csv('./data/excercise.csv', stringsAsFactors = FALSE)
calories <- read.csv('./data/calories.csv', stringsAsFactors = FALSE)
# Merging the two files on the user ID
df = merge(x = exercise, y = calories, by = 'User_ID')
# Dummy variable for gender
df$Gender_bin <- factor(if_else(df$Gender=='female',0,1))
df <- df %>% dplyr::select(-c(User_ID,Gender))
head(df)
As good data science citizens, we first need to split our dataset into training and test sets, and then fit a multiple linear regression model.
set.seed(40)
# Random row indices for a 70/30 train/test split
randomobs <- sample(seq_len(nrow(df)), size = floor(0.7 * nrow(df)))
# Train dataset
train.df <- df[randomobs,]
# Test dataset
test.df <- df[-randomobs,]
model_1_multiple <- lm(Calories ~ ., train.df)
summary(model_1_multiple)
##
## Call:
## lm(formula = Calories ~ ., data = train.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -38.530 -7.080 -1.475 5.400 75.545
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 459.76159 13.20347 34.821 < 2e-16 ***
## Age 0.49862 0.00684 72.903 < 2e-16 ***
## Height -0.20583 0.02910 -7.073 1.61e-12 ***
## Weight 0.32872 0.03160 10.401 < 2e-16 ***
## Duration 6.61213 0.03743 176.636 < 2e-16 ***
## Heart_Rate 2.00525 0.02192 91.459 < 2e-16 ***
## Body_Temp -16.83764 0.32849 -51.258 < 2e-16 ***
## Gender_bin1 -1.46182 0.36998 -3.951 7.83e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.27 on 10492 degrees of freedom
## Multiple R-squared: 0.9672, Adjusted R-squared: 0.9671
## F-statistic: 4.415e+04 on 7 and 10492 DF, p-value: < 2.2e-16
pred_regression <- predict(model_1_multiple, test.df %>% select(-Calories), type = 'response')
print(sqrt(mean((test.df$Calories - pred_regression)^2)))
## [1] 11.42611
The model above shows that all the predictors are significant in predicting the calories burned, and we got an impressive adjusted R-squared of 0.967. We might wonder why we would need a neural network here at all. But let's put these results in our pocket and try one out.
Now, let's bring in a fancier model: the neural network. Neural networks learn their weights in a way that is similar to linear regression, through a series of matrix operations across layers. Let's keep things simple with a small network of just a couple of dense layers.
We will use the keras package for building this NN. It sits on top of the TensorFlow framework, and the package is easy to install in R.
Let's get back to neural networks. Everyone has two basic questions: what is a neuron, and why is it called a network? Neurons are simple computational units that take weighted input signals and produce an output signal using an activation function. A network is a series of layers combined together.
Neuron weights are pretty much like the coefficients used in a regression equation. As in linear regression, each neuron has a bias term that is added to the input. Weights are generally initialized to small random values, for example in the range 0 to 1, and it is desirable to keep them small; that is another reason to scale our input data.
The weighted inputs are summed and passed through an activation (or transfer) function, a simple mapping from the summed weighted input to the output of the neuron. It decides whether the neuron is activated and whether its output is sent to the next layer. Traditionally, nonlinear activation functions are used to model complex data; more recently, the rectifier (ReLU) activation function has been shown to provide better results.
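To make this concrete, here is a minimal sketch of a single neuron in base R; the inputs, weights, and bias below are made-up values for illustration only:

# A single neuron: weighted sum of the inputs plus a bias, passed through ReLU
relu <- function(z) max(0, z)
x <- c(0.5, -1.2, 0.3)   # made-up input signals
w <- c(0.1, 0.4, -0.2)   # made-up weights
b <- 0.05                # bias term
relu(sum(w * x) + b)     # the weighted sum is -0.44, so ReLU clamps it to 0
## [1] 0

With that intuition in place, let's prepare scaled inputs for the network.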
# Convert the gender factor back to a numeric 0/1 column
train.df$Gender_bin <- if_else(train.df$Gender_bin == 0, 0, 1)
test.df$Gender_bin <- if_else(test.df$Gender_bin == 0, 0, 1)
# Predictor matrices, standardized to mean 0 and sd 1
train_x <- train.df %>% select(-Calories)
train_x_s <- scale(train_x)
train_y <- train.df %>% select(Calories) %>% as.matrix()
test_x <- test.df %>% select(-Calories)
test_x_s <- scale(test_x)
test_y <- test.df %>% select(Calories) %>% as.matrix()
Now we have defined our training and test datasets, with the predictors scaled. Scaling reduces the magnitude of the values and puts all inputs on a comparable range, which helps backpropagation (the procedure that updates the weights of the units) converge smoothly.
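One detail worth noting: in the chunk above the test set is scaled with its own mean and standard deviation, for simplicity. A stricter approach, sketched below using the attributes that scale() stores, would reuse the training-set statistics (this would shift the reported test RMSE slightly, so we keep the simpler version for consistency with the results shown here):

# Reuse the training-set centering and scaling values on the test predictors
test_x_s_strict <- scale(test_x,
                         center = attr(train_x_s, 'scaled:center'),
                         scale  = attr(train_x_s, 'scaled:scale'))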
The model we are creating is a sequential model. It needs to know the number of predictors being passed in (we do not need to supply the sample size), along with the number of neurons, or units, to create in the first layer. After that we add the hidden layer, again specifying the number of neurons. Finally, we add a layer with a single neuron to predict the calories burned.
model <- keras_model_sequential()
model %>%
  # First layer: 8 units, expecting 7 predictors per observation
  layer_dense(units = 8, activation = 'relu', input_shape = c(7)) %>%
  # Hidden layer with 64 units
  layer_dense(units = 64, activation = 'relu') %>%
  # Output layer: a single unit predicting Calories
  layer_dense(units = 1)
summary(model)
## ___________________________________________________________________________
## Layer (type) Output Shape Param #
## ===========================================================================
## dense_1 (Dense) (None, 8) 64
## ___________________________________________________________________________
## dense_2 (Dense) (None, 64) 576
## ___________________________________________________________________________
## dense_3 (Dense) (None, 1) 65
## ===========================================================================
## Total params: 705
## Trainable params: 705
## Non-trainable params: 0
## ___________________________________________________________________________
We also need to define the loss function; the model will optimize the weights to minimize it. The optimizer, rmsprop, is a variant of stochastic gradient descent.
model %>% compile(loss='mse',optimizer='rmsprop',metrics='mse')
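For reference, the 'mse' loss we just compiled is simply the mean of the squared differences between predictions and actual values; here is a quick illustration on made-up numbers:

# Mean squared error computed by hand on toy vectors
actual    <- c(100, 150, 200)
predicted <- c(110, 140, 190)
mean((actual - predicted)^2)
## [1] 100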
Finally, we fit the model for 10 epochs (10 full passes over the training data) with a batch size of 8 (the weights are updated after each batch), holding out 20% of the training data for validation.
history = model %>% fit(train_x_s,train_y, epochs=10,batch_size = 8,validation_split = 0.2)
The model has been trained for 10 epochs and has settled on its final weights. This is the model we can use to predict future data. Let's quickly plot the loss and the error.
plot(history)
Finally, we evaluate the model on the test dataset. This gives us the RMSE on unseen data, directly comparable to the linear regression result above.
#RMSE
print(sqrt(evaluate(model, test_x_s, test_y)$mean_squared_error))
## [1] 2.953651
preds <- predict(model, test_x_s)
final <- data.frame(preds_nn=preds,preds_lr =pred_regression, actual=test_y)
knitr::kable(head(final))
|    | preds_nn  | preds_lr  | Calories |
|----|-----------|-----------|----------|
| 6  | 131.42140 | 139.16194 | 130      |
| 7  | 66.44913  | 72.51575  | 65       |
| 8  | 31.51748  | 16.75636  | 30       |
| 10 | 58.30813  | 64.04842  | 55       |
| 13 | 58.63359  | 57.43671  | 55       |
| 14 | 252.56935 | 216.73320 | 264      |
We can see from the results that a simple neural network outperforms the linear regression: an RMSE of about 2.95 versus 11.43. We could improve the linear model in many ways, but that work has to be done manually. With the neural network we performed no major transformations beyond scaling, and it still easily outperformed the linear model. Neural networks shine when the dataset is more complex.