11.3 Car Sales. Consider the data on used cars (ToyotaCorolla.csv) with 1436 records and details on 38 attributes, including Price, Age, KM, HP, and other specifications. The goal is to predict the price of a used Toyota Corolla based on its specifications.
Toyotacorolla <- read.csv("~/Desktop/ToyotaCorolla.csv")
• Use predictors Age_08_04, KM, Fuel_Type, HP, Automatic, Doors, Quarterly_Tax, Mfr_Guarantee, Guarantee_Period, Airco, Automatic_airco, CD_Player, Powered_Windows, Sport_Model, and Tow_Bar.
Toyotacorolla1 <- Toyotacorolla[,c(
"Price", "Age_08_04", "KM", "Fuel_Type", "HP", "Automatic", "Doors", "Quarterly_Tax","Mfr_Guarantee", "Guarantee_Period","Airco", "Automatic_airco", "CD_Player", "Powered_Windows", "Sport_Model", "Tow_Bar"
)]
• Remember to first scale the numerical predictor and outcome variables to a 0–1 scale (use function preprocess() with method = “range”—see Chapter 7) and convert categorical predictors to dummies.
#Converting Categorical Predictor to dummy variable using library fast dummies
Toyotacorolla2 <- Toyotacorolla1 %>% dummy_cols(select_columns=c('Fuel_Type'))
#Removing original Fuel_Type and one of the dummy variables from the previous data frame
Toyotacorolla3 <- Toyotacorolla2 %>% select(c(-Fuel_Type, -Fuel_Type_CNG))
#Removing NA values
Toyotacorolla3[is.na(Toyotacorolla3)]<-0
#Let's preprocess the data by scaling the numerical variables to a 0-1 scale using method="range"
data_normalize <- preProcess(Toyotacorolla3, method=c("range"), na.remove=TRUE)
#The processed data is sent to predict() function to get the final normalized data using the min-max scaling method
data_normalize2 <- predict(data_normalize, as.data.frame(Toyotacorolla3))
Record the RMS error for the training data and the validation data. Repeat the process, changing the number of hidden layers and nodes to {single layer with 5 nodes}, {two layers, 5 nodes in each layer}.
#Let's split the data into training (80%) and validation (20%)
ind <- sample(2, nrow(data_normalize2), replace=TRUE, prob=c(0.8, 0.2))
tdata <- data_normalize2[ind==1, ] #ind==1 means the first sample
vdata<- data_normalize2[ind==2, ] #ind==2 means the second sample
A. Fitting a neural network model to the data using a single hidden layer with 2 nodes.
#Plotting the neural network for training data
nn <- neuralnet(data = tdata, Price ~., hidden=2)
plot(nn, rep="best")
#Calculating RMSE on training data
pred <- compute(nn, tdata)$net.result
error <- rmse(tdata[, "Price"], pred) #We use rmse() function from the Metrics package
error
[1] 0.03715224
#Plotting the neural network for validation data
nn <- neuralnet(data = vdata, Price ~., hidden=2)
plot(nn, rep="best")
#Calculating RMSE on the validation data
pred <- compute(nn, vdata)$net.result
error <- rmse(vdata[, "Price"], pred) #We use rmse() function from the Metrics package
error
[1] 0.03389838
B. Fitting a neural network model to the data using single layer with 5 nodes
#Plotting the neural network for training data
nn <- neuralnet(data = tdata, Price ~., hidden=c(5)) #5 nodes, 1 layers
plot(nn, rep="best")
#Calculating RMSE on training data
pred <- compute(nn, tdata)$net.result
error <- rmse(tdata[, "Price"], pred) #We use rmse() function from the Metrics package
error
[1] 0.03476291
#Plotting the neural network for validation data
nn <- neuralnet(data = vdata, Price ~., hidden=c(5)) # 5 nodes, 1 layers
plot(nn,rep="best")
#Calculating RMSE on validation data
pred <- compute(nn, vdata)$net.result
error <- rmse(vdata[, "Price"], pred) #We use rmse() function from the Metrics package
error
[1] 0.03047666
C. Fitting a neural network model to the data using two layers and 5 nodes in each layer
#Plotting the neural network for training data
nn <- neuralnet(data = tdata, Price ~., hidden=c(5,5)) #5 nodes, 2 layers
plot(nn, rep="best")
#Calculating RMSE on training data
pred <- compute(nn, tdata)$net.result
error <- rmse(tdata[, "Price"], pred) #We use rmse() function from the Metrics package
error
[1] 0.03378979
#Plotting the neural network for validation data
nn <- neuralnet(data = vdata, Price ~., hidden=c(5, 5)) # 5 nodes, 2 layers
plot(nn, rep="best")
#Calculating RMSE on training data
pred <- compute(nn, vdata)$net.result
error <- rmse(vdata[, "Price"], pred) #We use rmse() function from the Metrics package
error
[1] 0.03197282
We can see from the above outputs that the root mean square error for the training data decreases as we increase the number of layers and nodes.
We can see from the above outputs that the root mean square error for the validation data increases.
From the above results, we can conclude that 2 layers and 5 nodes in each layer are appropriate for this application.
11.4 Direct Mailing to Airline Customers. East-West Airlines has entered into a partnership with the wireless phone company Telcon to sell the latter’s service via direct mail. The file EastWestAirlinesNN.csv contains a subset of a data sample of who has already received a test offer. About 13% accepted. TABLE 11.8
EastWestAirlinesNN <- read.csv("~/Desktop/EastWestAirlinesNN.csv")
EastWestAirlinesNN = as.data.frame(EastWestAirlinesNN)
#Removing NA values. If we don't do
EastWestAirlinesNN[is.na(EastWestAirlinesNN)]<-0
You are asked to develop a model to classify East–West customers as to whether they purchase a wireless phone service contract (outcome variable Phone_Sale). This model will be used to classify additional customers.
#Let's split the data into training (75%) and validation set (25%).
#we use library(caTools) for data partitioning.
set.seed(123)
n=nrow(EastWestAirlinesNN)
train_index <- sample(nrow(EastWestAirlinesNN), n*0.75)
train <- EastWestAirlinesNN[train_index,]
valid <- EastWestAirlinesNN[-train_index,]
#let's normalize numerical data using preProcess() from the caret package for both training and test data
norm.values = caret::preProcess(train, method="range", na.remove=TRUE) #normalizing to [0,1]
#Using predict() to do normalization
train.norm <- predict(norm.values, train)
valid.norm<- predict(norm.values, valid)
train.norm0 = train.norm
valid.norm0 = valid.norm
train.norm$Phone_sale <- as.factor(train.norm$Phone_sale)
#Fitting a neural network
nn = neuralnet(Phone_sale ~ Topflight + Balance + Qual_miles + Bonus_miles+ Bonus_trans
+ Flight_miles_12mo,
data = train.norm,
hidden=5, linear.output = FALSE)
# Plotting the neural net
plot(nn, rep="best")
#Decile-wise lift chart for training set
#We need to use the "gains" package to compute the deciles
library(gains)
pred<- predict(nn, newdata= train.norm)
gain <- gains(train.norm0$Phone_sale, pred[,2])
barplot(gain$mean.resp/mean(train.norm0$Phone_sale), names.arg=gain$depth, xlab="percentile", ylab="Mean Response", main="Decile-wise lift chart for training set")
EastWestAirlinesNN$Phone_sale = as.factor(EastWestAirlinesNN$Phone_sale)
#Decile-wise lift chart for validation set
pred<- predict(nn, newdata= valid.norm)
gain <- gains(valid.norm$Phone_sale, pred[,2])
barplot(gain$mean.resp/mean(valid.norm$Phone_sale), names.arg=gain$depth, xlab="percentile", ylab="Mean Response", main="Decile-wise lift chart for validation set")
If we look at the decile-wise lift chart for training set, we can see that it has the staircase effect which means that a good staircase decile analysis is the one we can consider moving forward with. On the other hand, decile-wise lift chart for validation set has bars out of order.
We can see that the neural model with hidden nodes as 5 had a lower error compared to the model with hidden nodes as 1. The error shows how well our network is performing on the training set. And the model with hidden nodes as 5 is more efficient than the model with 1 node.
#Second neural network
nn = neuralnet(Phone_sale ~ Topflight + Balance + Qual_miles + Bonus_miles+ Bonus_trans
+ Flight_miles_12mo, data = train.norm,
hidden = 1, linear.output = FALSE)
plot(nn, rep="best")
In the above neural plot, we can see that the outcome variable is Phone_sale which is a categorical variable. We have input variables as Topflight, Balance, Qual miles, Bonus miles, Bonus trans, and flight miles. The raw input is processed and passed to the single hidden layer as a information. We can look at the result.matrix. It generates the error of the neural model along with the weights between the inputs, hidden layers, and outputs.Then we visualize the results by plotting generalized weights. We can see that the variance of generalized weight for covariate Balance is comparitively larger than others. Similarly, the generalized weight of covariate Topflight gather around zero which means that the covariate has no effect on the outcome status.
nn$result.matrix
[,1]
error 4.203926e+02
reached.threshold 9.966353e-03
steps 4.909000e+03
Intercept.to.1layhid1 -7.270141e-01
Topflight.to.1layhid1 1.690829e-02
Balance.to.1layhid1 3.447682e+00
Qual_miles.to.1layhid1 2.420007e-01
Bonus_miles.to.1layhid1 -1.655185e+00
Bonus_trans.to.1layhid1 -1.992288e+00
Flight_miles_12mo.to.1layhid1 3.075930e-01
Intercept.to.0 -5.805500e-01
1layhid1.to.0 8.792514e+00
Intercept.to.1 5.758246e-01
1layhid1.to.1 -8.774067e+00
#Let's visualize the results by plotting generalized weights. gwplot uses the calculated generalized weights provided by nn$generalized.weights
par(mfrow=c(2,2)) #setting graphical parameters
gwplot(nn, selected.covariate="Topflight", min=-2.5, max=5)
gwplot(nn, selected.covariate="Balance", min=-2.5, max=5)
gwplot(nn, selected.covariate="Qual_miles", min=-2.5, max=5)
gwplot(nn, selected.covariate="Bonus_miles", min=-2.5, max=5)
gwplot(nn, selected.covariate="Bonus_trans", min=-2.5, max=5)
gwplot(nn, selected.covariate="Flight_miles_12mo", min=-2.5, max=5)