We will use the popular Boston dataset from the MASS package, which describes housing values and related features for suburbs of Boston (originally compiled by Harrison and Rubinfeld, 1978). The variables are:
| Name    | Description                                                          |
|---------|----------------------------------------------------------------------|
| CRIM    | Per capita crime rate by town                                        |
| ZN      | Proportion of residential land zoned for lots over 25,000 sq. ft.    |
| INDUS   | Proportion of non-retail business acres per town                     |
| CHAS    | Charles River dummy variable (1 if tract bounds river; 0 otherwise)  |
| NOX     | Nitric oxides concentration (parts per 10 million)                   |
| RM      | Average number of rooms per dwelling                                 |
| AGE     | Proportion of owner-occupied units built prior to 1940               |
| DIS     | Weighted distances to five Boston employment centres                 |
| RAD     | Index of accessibility to radial highways                            |
| TAX     | Full-value property-tax rate per $10,000                             |
| PTRATIO | Pupil-teacher ratio by town                                          |
| B       | 1000(Bk - 0.63)^2, where Bk is the proportion of blacks by town      |
| LSTAT   | Percentage of lower status of the population                         |
| MEDV    | Median value of owner-occupied homes in $1000s                       |
Our goal is to predict MEDV, the median value of owner-occupied homes.
library(MASS)   # contains the Boston dataset
set.seed(101)   # for reproducibility of the train/test split below
data <- Boston
str(data)
## 'data.frame': 506 obs. of 14 variables:
## $ crim : num 0.00632 0.02731 0.02729 0.03237 0.06905 ...
## $ zn : num 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
## $ indus : num 2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
## $ chas : int 0 0 0 0 0 0 0 0 0 0 ...
## $ nox : num 0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
## $ rm : num 6.58 6.42 7.18 7 7.15 ...
## $ age : num 65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
## $ dis : num 4.09 4.97 4.97 6.06 6.06 ...
## $ rad : int 1 2 2 3 3 3 5 5 5 5 ...
## $ tax : num 296 242 242 222 222 222 311 311 311 311 ...
## $ ptratio: num 15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
## $ black : num 397 397 393 395 397 ...
## $ lstat : num 4.98 9.14 4.03 2.94 5.33 ...
## $ medv : num 24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
summary(data)
## crim zn indus chas
## Min. : 0.00632 Min. : 0.00 Min. : 0.46 Min. :0.00000
## 1st Qu.: 0.08205 1st Qu.: 0.00 1st Qu.: 5.19 1st Qu.:0.00000
## Median : 0.25651 Median : 0.00 Median : 9.69 Median :0.00000
## Mean : 3.61352 Mean : 11.36 Mean :11.14 Mean :0.06917
## 3rd Qu.: 3.67708 3rd Qu.: 12.50 3rd Qu.:18.10 3rd Qu.:0.00000
## Max. :88.97620 Max. :100.00 Max. :27.74 Max. :1.00000
## nox rm age dis
## Min. :0.3850 Min. :3.561 Min. : 2.90 Min. : 1.130
## 1st Qu.:0.4490 1st Qu.:5.886 1st Qu.: 45.02 1st Qu.: 2.100
## Median :0.5380 Median :6.208 Median : 77.50 Median : 3.207
## Mean :0.5547 Mean :6.285 Mean : 68.57 Mean : 3.795
## 3rd Qu.:0.6240 3rd Qu.:6.623 3rd Qu.: 94.08 3rd Qu.: 5.188
## Max. :0.8710 Max. :8.780 Max. :100.00 Max. :12.127
## rad tax ptratio black
## Min. : 1.000 Min. :187.0 Min. :12.60 Min. : 0.32
## 1st Qu.: 4.000 1st Qu.:279.0 1st Qu.:17.40 1st Qu.:375.38
## Median : 5.000 Median :330.0 Median :19.05 Median :391.44
## Mean : 9.549 Mean :408.2 Mean :18.46 Mean :356.67
## 3rd Qu.:24.000 3rd Qu.:666.0 3rd Qu.:20.20 3rd Qu.:396.23
## Max. :24.000 Max. :711.0 Max. :22.00 Max. :396.90
## lstat medv
## Min. : 1.73 Min. : 5.00
## 1st Qu.: 6.95 1st Qu.:17.02
## Median :11.36 Median :21.20
## Mean :12.65 Mean :22.53
## 3rd Qu.:16.95 3rd Qu.:25.00
## Max. :37.97 Max. :50.00
head(data)
any(is.na(data))
## [1] FALSE
First you'll need to install (if you have not already) and load the neuralnet package:
# install.packages("neuralnet")
library(neuralnet)
As a first step, we address data preprocessing. It is good practice to normalize your data before training a neural network. Depending on your dataset, skipping normalization may lead to useless results or to a very difficult training process (most of the time the algorithm will not converge within the maximum number of iterations allowed). You can choose among different methods to scale the data (z-score standardization, min-max scaling, etc.). Scaling to the interval [0,1] or [-1,1] usually tends to give better results. We therefore scale and split the data before moving on:
maxs <- apply(data, 2, max)
mins <- apply(data, 2, min)
maxs
## crim zn indus chas nox rm age dis
## 88.9762 100.0000 27.7400 1.0000 0.8710 8.7800 100.0000 12.1265
## rad tax ptratio black lstat medv
## 24.0000 711.0000 22.0000 396.9000 37.9700 50.0000
mins
## crim zn indus chas nox rm age dis
## 0.00632 0.00000 0.46000 0.00000 0.38500 3.56100 2.90000 1.12960
## rad tax ptratio black lstat medv
## 1.00000 187.00000 12.60000 0.32000 1.73000 5.00000
scaled <- as.data.frame(scale(data, center = mins, scale = maxs - mins))
head(scaled)
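Passing center = mins and scale = maxs - mins to scale() applies the min-max formula (x - min) / (max - min) column by column, so every variable ends up in [0, 1]. A quick sanity check on one column (optional, minimal sketch):
# scale(x, center = mins, scale = maxs - mins) is the min-max transform (x - min) / (max - min)
all.equal(
  scaled$medv,
  (data$medv - min(data$medv)) / (max(data$medv) - min(data$medv))
)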
Now that the data is scaled to [0, 1], let's split it into training and test sets:
library(caTools)
# Keep roughly 70% of the observations for training
split <- sample.split(scaled$medv, SplitRatio = 0.70)
train <- subset(scaled, split == TRUE)
test  <- subset(scaled, split == FALSE)
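It is worth confirming how many rows ended up in each set (the dimensions in the str() output further down show 139 test rows, leaving 367 for training):
nrow(train)
nrow(test)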
# Fit the network: two hidden layers with 5 and 3 neurons,
# and a linear output since this is a regression problem
library(neuralnet)
nn <- neuralnet(medv ~ crim + zn + indus + chas + nox + rm + age + dis + rad +
                  tax + ptratio + black + lstat,
                data = train, hidden = c(5, 3), linear.output = TRUE)
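Writing out all thirteen predictors is tedious, and older versions of neuralnet do not accept the medv ~ . shorthand. A common workaround, equivalent to the call above, is to build the formula programmatically:
# Build "medv ~ crim + zn + ... + lstat" from the column names
n <- names(train)
f <- as.formula(paste("medv ~", paste(n[n != "medv"], collapse = " + ")))
nn <- neuralnet(f, data = train, hidden = c(5, 3), linear.output = TRUE)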
You can plot the model to get a very neat visualization with the weights on each connection.
The black lines show the connections between layers and the weight on each connection, while the blue lines show the bias term added at each step. The bias can be thought of as the intercept of a linear model. The net is essentially a black box, so we cannot say much about the fitting or the weights themselves. Suffice it to say that the training algorithm has converged and the model is ready to be used.
plot(nn)
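One caveat: plot.nn draws to its own graphics window, so the plot may not show up in a knitted R Markdown report. Passing rep = "best" (which plots the repetition with the lowest error) is a common workaround:
# Plot the best repetition so the figure renders inside a knitted document
plot(nn, rep = "best")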
Now we can try to predict the values for the test set and calculate the MSE. Remember that the net will output a normalized prediction, so we need to scale it back in order to make a meaningful comparison (or just a simple prediction).
# Compute predictions on the test set (all columns except medv)
predicted.nn.values <- compute(nn, test[1:13])
# compute() returns a list: the layer outputs ("neurons") and the predictions ("net.result")
str(predicted.nn.values)
## List of 2
## $ neurons :List of 3
## ..$ : num [1:139, 1:14] 1 1 1 1 1 1 1 1 1 1 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:139] "1" "2" "8" "9" ...
## .. .. ..$ : chr [1:14] "" "crim" "zn" "indus" ...
## ..$ : num [1:139, 1:6] 1 1 1 1 1 1 1 1 1 1 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:139] "1" "2" "8" "9" ...
## .. .. ..$ : NULL
## ..$ : num [1:139, 1:4] 1 1 1 1 1 1 1 1 1 1 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:139] "1" "2" "8" "9" ...
## .. .. ..$ : NULL
## $ net.result: num [1:139, 1] 0.512 0.411 0.324 0.209 0.334 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:139] "1" "2" "8" "9" ...
## .. ..$ : NULL
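In recent versions of the neuralnet package the generic predict() also works on the fitted object and returns the (still scaled) predictions directly as a matrix; if your installed version supports it, this is a slightly tidier alternative to compute():
# Equivalent to predicted.nn.values$net.result in recent neuralnet versions
pred.scaled <- predict(nn, test[1:13])
head(pred.scaled)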
# Convert back to non-scaled predictions
true.predictions <- predicted.nn.values$net.result*(max(data$medv)-min(data$medv))+min(data$medv)
# Convert the test data
test.r <- (test$medv)*(max(data$medv)-min(data$medv))+min(data$medv)
# Check the Mean Squared Error
MSE.nn <- sum((test.r - true.predictions)^2)/nrow(test)
MSE.nn
## [1] 15.1656
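Since MEDV is measured in thousands of dollars, taking the square root of the MSE puts the error back on an interpretable scale: an MSE of about 15.17 corresponds to a typical prediction error of roughly 3.9, i.e. about $3,900.
# RMSE on the original MEDV scale (thousands of dollars)
sqrt(MSE.nn)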
error.df <- data.frame(test.r,true.predictions)
head(error.df)
library(ggplot2)
ggplot(error.df,aes(x=test.r,y=true.predictions)) + geom_point() + stat_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
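The smoother gives a general sense of the trend, but for a predicted-versus-actual plot a y = x reference line is often more informative, since points on that line are perfect predictions. A minimal sketch (the axis labels are just suggestions):
ggplot(error.df, aes(x = test.r, y = true.predictions)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +  # perfect-prediction line
  labs(x = "Actual MEDV ($1000s)", y = "Predicted MEDV ($1000s)")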