set.seed(500)
library(MASS)
data <- Boston
First, we check that no data points are missing:
apply(data,2,function(x) sum(is.na(x)))
## crim zn indus chas nox rm age dis rad tax
## 0 0 0 0 0 0 0 0 0 0
## ptratio black lstat medv
## 0 0 0 0
There is no missing data. We then randomly split the data into a 75/25 train-test split and fit a linear regression model as a baseline:
index <- sample(1:nrow(data),round(0.75*nrow(data)))
train <- data[index,]
test <- data[-index,]
lm.fit <- glm(medv~., data=train)
summary(lm.fit)
##
## Call:
## glm(formula = medv ~ ., data = train)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 31.111702 5.459811 5.698 2.49e-08 ***
## crim -0.111372 0.033256 -3.349 0.000895 ***
## zn 0.042633 0.014307 2.980 0.003077 **
## indus 0.001483 0.067455 0.022 0.982473
## chas 1.756844 0.981087 1.791 0.074166 .
## nox -18.184847 4.471572 -4.067 5.84e-05 ***
## rm 4.760341 0.480472 9.908 < 2e-16 ***
## age -0.013439 0.014101 -0.953 0.341190
## dis -1.553748 0.218929 -7.097 6.65e-12 ***
## rad 0.288181 0.072017 4.002 7.62e-05 ***
## tax -0.013739 0.004060 -3.384 0.000791 ***
## ptratio -0.947549 0.140120 -6.762 5.38e-11 ***
## black 0.009502 0.002901 3.276 0.001154 **
## lstat -0.388902 0.059733 -6.511 2.47e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 20.23806)
##
## Null deviance: 32463.5 on 379 degrees of freedom
## Residual deviance: 7407.1 on 366 degrees of freedom
## AIC: 2237
##
## Number of Fisher Scoring iterations: 2
We now predict on the test set and compute the test MSE:
pr.lm <- predict(lm.fit,test)
MSE.lm <- sum((pr.lm - test$medv)^2)/nrow(test)
Before fitting the neural network we normalize the data, since neural networks usually train better on scaled inputs. Here we use min-max scaling, which maps each column x to (x - min(x))/(max(x) - min(x)), so every variable lies in [0, 1]:
maxs <- apply(data, 2, max)
mins <- apply(data, 2, min)
scaled <- as.data.frame(scale(data, center = mins, scale = maxs - mins))
train_ <- scaled[index,]
test_ <- scaled[-index,]
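As a side note, since we will need to invert this scaling for medv several times below, the inverse transform could be wrapped in a small helper. This is a hypothetical convenience, not part of the original code; the snippets below keep the explicit expressions instead:
unscale.medv <- function(x) {
  # invert the min-max scaling for the response, back to the original medv scale
  x * (max(data$medv) - min(data$medv)) + min(data$medv)
}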
library(neuralnet)
n <- names(train_)
f <- as.formula(paste("medv ~", paste(n[!n %in% "medv"], collapse = " + ")))
nn <- neuralnet(f,data=train_,hidden=c(5,3),linear.output=TRUE)
For some reason the formula y~. is not accepted by the neuralnet() function: you need to write out the formula first and then pass it as an argument to the fitting function. The hidden argument accepts a vector with the number of neurons for each hidden layer, while the linear.output argument specifies whether we want to do regression (linear.output=TRUE) or classification (linear.output=FALSE).
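As a side note, base R's reformulate() builds the same formula a bit more concisely than the paste() approach above:
f2 <- reformulate(setdiff(names(train_), "medv"), response = "medv")  # identical to f above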
The neuralnet package provides a nice tool to plot the model:
plot(nn)
This is the graphical representation of the model with the weights on each connection:
Now we can try to predict the values for the test set and calculate the MSE. Remember that the net will output a normalized prediction, so we need to scale it back in order to make a meaningful comparison (or just a simple prediction).
pr.nn <- compute(nn,test_[,1:13])
pr.nn_ <- pr.nn$net.result*(max(data$medv)-min(data$medv))+min(data$medv)
test.r <- (test_$medv)*(max(data$medv)-min(data$medv))+min(data$medv)
MSE.nn <- sum((test.r - pr.nn_)^2)/nrow(test_)
We then compare the two MSEs:
print(paste(MSE.lm,MSE.nn))
## [1] "31.2630222372616 16.4595537665717"
A first visual approach to comparing the two models is to plot the predictions against the actual values on the test set:
par(mfrow=c(1,2))
plot(test$medv,pr.nn_,col='red',main='Real vs predicted NN',pch=18,cex=0.7)
abline(0,1,lwd=2)
legend('bottomright',legend='NN',pch=18,col='red', bty='n')
plot(test$medv,pr.lm,col='blue',main='Real vs predicted lm',pch=18, cex=0.7)
abline(0,1,lwd=2)
legend('bottomright',legend='LM',pch=18,col='blue', bty='n', cex=.95)
By visually inspecting the plots we can see that the predictions made by the neural network are, in general, more concentrated around the line (a perfect alignment with the line would indicate an MSE of 0 and thus an ideal prediction) than those made by the linear model.
plot(test$medv,pr.nn_,col='red',main='Real vs predicted NN',pch=18,cex=0.7)
points(test$medv,pr.lm,col='blue',pch=18,cex=0.7)
abline(0,1,lwd=2)
legend('bottomright',legend=c('NN','LM'),pch=18,col=c('red','blue'))
A perhaps more useful visual comparison is plotted above, with both models' predictions on the same axes.
# A (fast) cross validation
Cross validation is another very important step in building predictive models. While there are different kinds of cross validation methods, the basic idea is to repeat the following process a number of times:
1. Do the train-test split
2. Fit the model on the train set
3. Test the model on the test set
4. Calculate the prediction error
We are going to implement a fast cross validation using a for loop for the neural network and the cv.glm() function in the boot package for the linear model. As far as I know, there is no built-in function in R to perform cross validation on this kind of neural network; if you do know of such a function, please let me know in the comments. Here is the 10-fold cross-validated MSE for the linear model:
library(boot)
set.seed(200)
lm.fit <- glm(medv~.,data=data)
cv.glm(data,lm.fit,K=10)$delta[1]
## [1] 23.17094
set.seed(450)
cv.error <- NULL
k <- 10
library(plyr)
pbar <- create_progress_bar('text')
pbar$init(k)
for(i in 1:k){
  # random 90/10 train-test split on the scaled data
  index <- sample(1:nrow(data),round(0.9*nrow(data)))
  train.cv <- scaled[index,]
  test.cv <- scaled[-index,]
  # fit the network and predict on the held-out set
  nn <- neuralnet(f,data=train.cv,hidden=c(5,2),linear.output=TRUE)
  pr.nn <- compute(nn,test.cv[,1:13])
  # scale predictions and observed values back to the original medv scale
  pr.nn <- pr.nn$net.result*(max(data$medv)-min(data$medv))+min(data$medv)
  test.cv.r <- (test.cv$medv)*(max(data$medv)-min(data$medv))+min(data$medv)
  cv.error[i] <- sum((test.cv.r - pr.nn)^2)/nrow(test.cv)
  pbar$step()
}
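Strictly speaking, the loop above performs repeated random 90/10 subsampling rather than true k-fold cross validation, since the held-out sets can overlap across iterations. A minimal sketch of proper non-overlapping folds, assuming the same scaled data, formula f, and k as above (cv.error.kfold is an illustrative name, not from the original code):
# assign each row to one of k non-overlapping folds, in random order
folds <- sample(cut(seq_len(nrow(data)), breaks = k, labels = FALSE))
cv.error.kfold <- numeric(k)
for (i in 1:k) {
  test.idx <- which(folds == i)
  nn <- neuralnet(f, data = scaled[-test.idx,], hidden = c(5,2), linear.output = TRUE)
  pr <- compute(nn, scaled[test.idx, 1:13])$net.result
  # scale back to the original medv scale before computing the MSE
  pr <- pr*(max(data$medv)-min(data$medv))+min(data$medv)
  obs <- scaled$medv[test.idx]*(max(data$medv)-min(data$medv))+min(data$medv)
  cv.error.kfold[i] <- mean((obs - pr)^2)
}
mean(cv.error.kfold)
Returning to the random-subsampling loop above, we calculate the average MSE and inspect the individual errors: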
mean(cv.error)
## [1] 7.641292
cv.error
## [1] 13.331937 7.099840 6.580337 5.697609 6.841745 5.771481 10.751406
## [8] 5.384253 9.452109 5.502201
boxplot(cv.error,xlab='MSE CV',col='cyan',
border='blue',names='CV error (MSE)',
main='CV error (MSE) for NN',horizontal=TRUE)
As you can see, the average MSE for the neural network (about 7.64) is lower than that of the linear model (23.17), although there is a certain degree of variation in the MSEs across the cross validation iterations. This may depend on the splitting of the data or the random initialization of the weights in the net. By running the simulation multiple times with different seeds you can get a more precise point estimate for the average MSE.
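For instance, a minimal sketch of averaging the cross-validated estimate over a few seeds (the run.cv wrapper and the seed values are illustrative, not part of the original code):
run.cv <- function(seed, k = 10) {
  set.seed(seed)
  errs <- numeric(k)
  for (i in 1:k) {
    # same repeated random 90/10 subsampling as in the loop above
    idx <- sample(1:nrow(data), round(0.9*nrow(data)))
    nn <- neuralnet(f, data = scaled[idx,], hidden = c(5,2), linear.output = TRUE)
    pr <- compute(nn, scaled[-idx, 1:13])$net.result
    pr <- pr*(max(data$medv)-min(data$medv))+min(data$medv)
    obs <- scaled$medv[-idx]*(max(data$medv)-min(data$medv))+min(data$medv)
    errs[i] <- mean((obs - pr)^2)
  }
  mean(errs)
}
# average the point estimate over several arbitrary seeds
mean(sapply(c(450, 451, 452), run.cv))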
Neural networks resemble black boxes: explaining their outcome is much more difficult than explaining the outcome of a simpler model such as a linear model. Therefore, depending on the kind of application you have, you might want to take this factor into account too. Furthermore, as you have seen above, extra care is needed to fit a neural network, and small changes can lead to different results.
A gist with the full code for this post can be found here.
Thank you for reading this post; leave a comment below if you have any questions.