\(\newcommand{\ci}{\perp\!\!\!\perp}\) \(\newcommand{\prob}[1]{\operatorname{P}\left(#1\right)}\) \(\newcommand{\Var}[1]{\operatorname{Var}\left(#1\right)}\) \(\newcommand{\E}[1]{\operatorname{E}\left(#1\right)}\) \(\newcommand{\Cov}[1]{\operatorname{Cov}\left(#1\right)}\) \(\newcommand{\Cor}[1]{\operatorname{Corr}\left(#1\right)}\) \(\newcommand{\sd}[1]{\operatorname{sd}\left(#1\right)}\) \(\newcommand{\prob}[1]{\operatorname{P}\left(#1\right)}\) \(\DeclareMathOperator*{\minf}{minimize \quad}\) \(\newcommand{\mininlineeq}[3]{\mbox{minimize}_{#1}\quad#2\qquad\mbox{subject to }#3}\) \(\newcommand{\mininline}[2]{\mbox{minimize}_{#1}\quad#2}\)
We will demonstrate several more advanced predictive models using Boston housing dataset. This price dataset contains the factors that contribute to the median value of owner-occupied homes in $1000’s.
By constructing a regression model we can quantify the effects of the following measure factors (excluding MEDV as the response):
variable | description |
---|---|
CRIM | per capita crime rate by town |
ZN | proportion of residential land zoned for lots over 25,000 sq.ft |
INDUS | proportion of non-retail business acres per town |
CHAS | Charles River dummy variable (1 if tract bounds river; else 0) |
NOX | nitric oxides concentration (parts per 10 million) |
RM | average number of rooms per dwelling |
AGE | proportion of owner-occupied units built prior to 1940 |
DIS | weighted distances to five Boston employment centres |
RAD | index of accessibility to radial highways |
TAX | full-value property-tax rate per $10,000 |
PTRATIO | pupil-teacher ratio by town |
B | \(1000(\mbox{Bk} - 0.63)^2\) where Bk is the proportion of blacks by town |
LSTAT | % lower status of the population |
MEDV | Median value of owner-occupied homes in $1000’s |
knitr::kable(Boston[1:5,1:7])
crim | zn | indus | chas | nox | rm | age |
---|---|---|---|---|---|---|
0.00632 | 18 | 2.31 | 0 | 0.538 | 6.575 | 65.2 |
0.02731 | 0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 |
0.02729 | 0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 |
0.03237 | 0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 |
0.06905 | 0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 |
dim(Boston)
## [1] 506 14
First, we split data into training and test subsets
test_index = sample(1:nrow(Boston), size = as.integer(0.2*nrow(Boston)))
testing = Boston[test_index,]
training = Boston[-test_index,]
nrow(training)
## [1] 405
Let’s look at the correlation of each independent variable with the dependent variable and near zero variance
knitr::kable(cor(Boston,Boston$medv))
crim | -0.3883046 |
zn | 0.3604453 |
indus | -0.4837252 |
chas | 0.1752602 |
nox | -0.4273208 |
rm | 0.6953599 |
age | -0.3769546 |
dis | 0.2499287 |
rad | -0.3816262 |
tax | -0.4685359 |
ptratio | -0.5077867 |
black | 0.3334608 |
lstat | -0.7376627 |
medv | 1.0000000 |
knitr::kable(t(summarise_if(Boston,is.numeric, sd)/summarise_if(Boston,is.numeric, mean)))
crim | 2.3803761 |
zn | 2.0523759 |
indus | 0.6160087 |
chas | 3.6720281 |
nox | 0.2089034 |
rm | 0.1117992 |
age | 0.4104834 |
dis | 0.5548581 |
rad | 0.9118115 |
tax | 0.4128412 |
ptratio | 0.1173060 |
black | 0.2559616 |
lstat | 0.5643741 |
medv | 0.4081651 |
par(pch=21)
plot(Boston[1:100,c("medv","lstat","rm","age", "crim")], cex=0.5, bg="light blue")
We first try to regress lstat
on medv
.
fit.lm <- lm(medv~lstat,data = training)
#predict on test set
pred.lm <- predict(fit.lm, newdata = testing)
plot(testing$medv,pred.lm, pch=16); abline(a=0,b=1)
We will use RMSE and MAPE to measure predictive perofrmance of our models
metrics = function(yhat,y) {
n = length(y)
rmse = sqrt(sum((yhat - y)^2)/n)
mape = mean(abs((yhat - y)/y))
r2 = 1 - sum((y-yhat)^2)/sum((y-mean(y))^2)
return(c(rmse=rmse,mape=mape,R2=r2))
}
Let’s calculate those metrics for the linear model
fit.lm.metrics = metrics(pred.lm,testing$medv)
knitr::kable(t(fit.lm.metrics), digits = 2)
rmse | mape | R2 |
---|---|---|
7.08 | 0.23 | 0.51 |
Now, let’s look at the relations when medv
is on log
scale and when both medv
and lstat
are on the
log scale
par(mfrow=c(3,1), mar=c(4,4,0,0), bty='n')
plot(medv~lstat, data=Boston, pch=16)
plot(log(medv)~lstat, data=Boston, pch=16)
plot(log(medv)~log(lstat), data=Boston, pch=16)
Seems like log tranforms might help. Let’s try it in the model
fit.lm.log <- lm(log(medv)~log(lstat),data = training)
#predict on test set
pred.lm <- exp(predict(fit.lm.log, newdata = testing))
par(mfrow=c(1,1))
plot(testing$medv,pred.lm, pch=16); abline(a=0,b=1)
fit.lm.log.metrics = metrics(pred.lm,testing$medv)
knitr::kable(t(fit.lm.log.metrics), digits = 2)
rmse | mape | R2 |
---|---|---|
6.19 | 0.21 | 0.62 |
What if we only pu medv
on the log-scale
#Try linear model using all features
fit.lm.log1 <- lm(log(medv)~lstat,data = training)
#predict on test set
pred.lm <- exp(predict(fit.lm.log1 , newdata = testing))
plot(testing$medv,pred.lm, pch=16); abline(a=0,b=1)
fit.lm.log1.metrics = metrics(pred.lm,testing$medv)
knitr::kable(t(fit.lm.log1.metrics), digits = 2)
rmse | mape | R2 |
---|---|---|
6.77 | 0.2 | 0.55 |
Try simple linear model using selected features
fit.lm2 <- lm(formula = log(medv) ~ crim + chas + nox + rm + dis + ptratio +
rad + black + log(lstat), data = training)
pred.lm2 <- exp(predict(fit.lm2, newdata = testing))
# Plot of predicted price vs actual price
plot(pred.lm2,testing$medv, xlab = "Predicted Price", ylab = "Actual Price",pch=21,bg="grey")
abline(a=0,b=1)
fit.lm2.metrics = metrics(pred.lm2,testing$medv)
knitr::kable(t(fit.lm2.metrics), digits = 2)
rmse | mape | R2 |
---|---|---|
5.34 | 0.17 | 0.72 |
# Diagnostics plots
layout(matrix(c(1,2,3,4),2,2))
plot(fit.lm2)
library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 4.1-8
# Create grid of lambda values
log_lambda_grid <- seq(2.5, -2, length=100) # In reverse
lambda_grid <- 10^log_lambda_grid
# Create 100 ridge models
ridge_models <- glmnet(testing[,-14], log(testing$medv), alpha=0, lambda=lambda_grid)
# Get coefficients of all 100 models
ridge_coef <- coef(ridge_models)
# Display coefficients for 6 models.
# Ridge Models are displayed in decreasing order of lambda.
knitr::kable(round(as.matrix(ridge_coef[, c(1:3, 98:100)], 6)))
s0 | s1 | s2 | s97 | s98 | s99 | |
---|---|---|---|---|---|---|
(Intercept) | 3 | 3 | 3 | 4 | 4 | 4 |
crim | 0 | 0 | 0 | 0 | 0 | 0 |
zn | 0 | 0 | 0 | 0 | 0 | 0 |
indus | 0 | 0 | 0 | 0 | 0 | 0 |
chas | 0 | 0 | 0 | 0 | 0 | 0 |
nox | 0 | 0 | 0 | -1 | -1 | -1 |
rm | 0 | 0 | 0 | 0 | 0 | 0 |
age | 0 | 0 | 0 | 0 | 0 | 0 |
dis | 0 | 0 | 0 | 0 | 0 | 0 |
rad | 0 | 0 | 0 | 0 | 0 | 0 |
tax | 0 | 0 | 0 | 0 | 0 | 0 |
ptratio | 0 | 0 | 0 | 0 | 0 | 0 |
black | 0 | 0 | 0 | 0 | 0 | 0 |
lstat | 0 | 0 | 0 | 0 | 0 | 0 |
Now we plot the relations between the \(\lambda\) and the \(\beta\)s
library(ggplot2)
m = as.matrix(ridge_coef[-1,])
sa <- stack(as.data.frame(t(m)))
sa$log_lambda <- log_lambda_grid
qplot(log_lambda, values, data = sa, group = ind, colour = ind, geom = "line")
We now use cross-valudation to pick the best model, the one that has the lowest out-of-sample RMSE
test_pred_ridge <- predict(ridge_models, as.matrix(testing[,-14]))
RMSE <- colMeans((exp(test_pred_ridge) - testing$medv)^2)
par(mar=c(4,4,0,0),bty='n')
plot(log_lambda_grid, RMSE, xlab="ln(lambda)", ylab="RMSE", type='l', lwd=2)
fit.ridge.metrics = metrics(exp(test_pred_ridge[,which.min(RMSE)]),testing$medv)
log_lambda_grid <- seq(1.5, -2, length=100) # In reverse
lambda_grid <- 10^log_lambda_grid
lasso_models <- glmnet(testing[,-14], log(testing$medv), alpha=1, lambda=lambda_grid)
# Get coefficients of all 100 models
lasso_coef <- coef(lasso_models)
# Display coefficients for 6 models.
# lasso Models are displayed in decreasing order of lambda.
knitr::kable(round(as.matrix(lasso_coef[, c(1:3, 98:100)], 6)))
s0 | s1 | s2 | s97 | s98 | s99 | |
---|---|---|---|---|---|---|
(Intercept) | 3 | 3 | 3 | 4 | 4 | 4 |
crim | 0 | 0 | 0 | 0 | 0 | 0 |
zn | 0 | 0 | 0 | 0 | 0 | 0 |
indus | 0 | 0 | 0 | 0 | 0 | 0 |
chas | 0 | 0 | 0 | 0 | 0 | 0 |
nox | 0 | 0 | 0 | 0 | 0 | 0 |
rm | 0 | 0 | 0 | 0 | 0 | 0 |
age | 0 | 0 | 0 | 0 | 0 | 0 |
dis | 0 | 0 | 0 | 0 | 0 | 0 |
rad | 0 | 0 | 0 | 0 | 0 | 0 |
tax | 0 | 0 | 0 | 0 | 0 | 0 |
ptratio | 0 | 0 | 0 | 0 | 0 | 0 |
black | 0 | 0 | 0 | 0 | 0 | 0 |
lstat | 0 | 0 | 0 | 0 | 0 | 0 |
Now we plot the relations between the \(\lambda\) and the \(\beta\)s
library(ggplot2)
m = as.matrix(lasso_coef[-1,])
sa <- stack(as.data.frame(t(m)))
sa$log_lambda <- log_lambda_grid
qplot(log_lambda, values, data = sa, group = ind, colour = ind, geom = "line")
We now use cross-valudation to pick the best model, the one that has the lowest out-of-sample RMSE
test_pred_lasso <- predict(lasso_models, as.matrix(testing[,-14]))
RMSE <- colMeans((exp(test_pred_lasso) - testing$medv)^2)
par(mar=c(4,4,0,0),bty='n')
plot(log_lambda_grid, RMSE, xlab="ln(lambda)", ylab="RMSE", type='l', lwd=2)
fit.lasso.metrics = metrics(exp(test_pred_lasso[,which.min(RMSE)]),testing$medv)
library(kknn) ## knn library
attach(Boston)
## The following objects are masked from Boston (pos = 6):
##
## age, black, chas, crim, dis, indus, lstat, medv, nox, ptratio, rad,
## rm, tax, zn
ind = order(testing$lstat)
RMSE = NULL
kk = c(2,10,20,50,100,300)
par(mfrow=c(3,2), mar=c(3,3,2,0), bty='n')
for(i in kk){
near = kknn(medv~lstat,training,testing,k=i,kernel = "rectangular")
aux = sqrt(mean((testing$medv-near$fitted)^2))
RMSE = c(RMSE,aux)
plot(testing$lstat,testing$medv,main=paste("k=",i),pch=21,bg="grey",cex=0.8,col="darkgray")
lines(testing$lstat[ind],near$fitted[ind],col=2,lwd=2)
}
Now let’s plot the relation between the number of neighbors and model performance (out-of-sample)
par(mfrow=c(1,1), mar=c(3,3,2,0), bty='n')
plot(log(1/kk),RMSE,type="b",xlab="Complexity(log(1/k))",col="blue",ylab="RMSE",lwd=2,cex.lab=1.2,pch=21,bg="grey")
First big tree size
big.tree = tree(medv~lstat,data=training,mindev=.0001)
print(length(unique(big.tree$where)))
## [1] 60
par(mfrow=c(1,1))
plot(big.tree,type="uniform")
text(big.tree,col="blue",label=c("yval"),cex=.8)
Let’s see how well we did on the out-of-sample data
boston.fit = predict(big.tree,testing) #get training fitted values
tree.metrics = metrics(boston.fit,testing$medv)
knitr::kable(t(tree.metrics), digits = 2)
rmse | mape | R2 |
---|---|---|
6.89 | 0.24 | 0.53 |
Not so bad, comparable to LM. Do we overfit? Let’s see if a smaller tree will do any better.
boston.tree=prune.tree(big.tree,best=7)
cat('pruned tree size: \n')
## pruned tree size:
print(length(unique(boston.tree$where)))
## [1] 7
#plot the tree
plot(boston.tree,type="uniform")
text(boston.tree,col="blue",label=c("yval"),cex=.8)
#plot the tree
plot(boston.tree,type="uniform")
text(boston.tree,col="blue",label=c("yval"),cex=.8)
boston.fit = predict(boston.tree,testing) #get training fitted values
tree.metrics1 = metrics(boston.fit,testing$medv)
knitr::kable(t(tree.metrics1), digits = 2)
rmse | mape | R2 |
---|---|---|
6.32 | 0.22 | 0.61 |
We have a simpler model (less leafes) that predicts better. The larger model did overfit! Let’s plit our predictions
plot(testing$lstat,testing$medv,cex=.5,pch=16) #plot data
oo=order(testing$lstat)
lines(testing$lstat[oo],boston.fit[oo],col='red',lwd=3) #step function fit
cvals=c(9.725,4.65,3.325,5.495,16.085,19.9) #cutpoints from tree
for(i in 1:length(cvals)) abline(v=cvals[i],col='magenta',lty=2) #cutpoints
Let’s Fit a regression tree to mev~dis+lstat
. Now our we
have a two dimensional input and the partitions will be rectangles
(possible open) The tree is plotted as well as the corresponding
partition of the two-dimensional. Again, let’s start with a big tree
big.tree = tree(medv~lstat+dis,training,mindev=.0001)
cat('first big tree size: \n')
## first big tree size:
print(length(unique(big.tree$where)))
## [1] 63
Then prune it down to one with 7 leaves
boston.tree=prune.tree(big.tree,best=7)
cat('pruned tree size: \n')
## pruned tree size:
print(length(unique(boston.tree$where)))
## [1] 7
qplot(lstat,dis,data=Boston, colour = medv) +
scale_color_gradient(low="blue", high="red")
Plot tree and partition in x.
par(mfrow=c(1,2))
plot(boston.tree,type="u")
text(boston.tree,col="blue",label=c("yval"),cex=.8)
partition.tree(boston.tree)
lines(training$lstat,training$dis, type='p', pch=16, cex=0.5, col=as.integer(training$medv))
pv=seq(from=.01,to=.99,by=.05)
x1q = quantile(testing$lstat,probs=pv)
x2q = quantile(testing$dis,probs=pv)
xx = expand.grid(x1q,x2q) #matrix with two columns using all combinations of x1q and x2q
dfpred = data.frame(dis=xx[,2],lstat=xx[,1])
lmedpred = predict(boston.tree,dfpred)
#make perspective plot
par(mfrow=c(1,1))
persp(x1q,x2q,matrix(lmedpred,ncol=length(x2q),byrow=T),
theta=150,xlab='dis',ylab='lstat',zlab='medv',
zlim=c(min(training$medv),1.1*max(training$medv)))
tree.boston=tree(medv~.,training,mindev=.01)
cv.boston=cv.tree(tree.boston, );
plot(cv.boston$size,cv.boston$dev, type='l')
prune.boston=prune.tree(tree.boston,best=6)
#summary(prune.boston)
#cv.tree(tree.boston,,prune.tree)$dev
plot(prune.boston)
text(prune.boston,pretty=0)
boston.fit = predict(prune.boston,testing) #get training fitted values
par(mfrow=c(1,1))
plot(testing$medv,boston.fit, pch=16)
tree1.metrics = metrics(boston.fit,testing$medv)
knitr::kable(t(tree1.metrics), digits = 2)
rmse | mape | R2 |
---|---|---|
6.57 | 0.21 | 0.58 |
The best model thus far!
library(randomForest)
## randomForest 4.7-1.1
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
##
## margin
## The following object is masked from 'package:dplyr':
##
## combine
set.seed(1)
bag.boston=randomForest(medv~.,data=training,mtry=13,importance=TRUE)
bag.boston
##
## Call:
## randomForest(formula = medv ~ ., data = training, mtry = 13, importance = TRUE)
## Type of random forest: regression
## Number of trees: 500
## No. of variables tried at each split: 13
##
## Mean of squared residuals: 11.84956
## % Var explained: 85.19
yhat.bag = predict(bag.boston,newdata=testing)
par(mfrow=c(1,1))
plot(yhat.bag, testing$medv)
abline(0,1)
sqrt(mean((yhat.bag-testing$medv)^2))
## [1] 4.193147
bag.boston=randomForest(medv~.,data=training,mtry=13,ntree=25)
yhat.bag = predict(bag.boston,newdata=testing)
sqrt(mean((yhat.bag-testing$medv)^2))
## [1] 4.400957
rf.boston=randomForest(medv~.,data=training,mtry=6,importance=TRUE)
yhat.rf = predict(rf.boston,newdata=testing)
rf.metrics = metrics(yhat.rf,testing$medv)
knitr::kable(t(rf.metrics), digits = 2)
rmse | mape | R2 |
---|---|---|
4.05 | 0.13 | 0.84 |
importance(rf.boston)
## %IncMSE IncNodePurity
## crim 15.523032 1605.74043
## zn 2.910491 99.40062
## indus 11.228386 1999.87201
## chas 3.390930 120.06602
## nox 15.413441 1434.09818
## rm 40.547099 11570.61633
## age 11.716186 738.76790
## dis 16.588296 1413.14748
## rad 5.199374 133.21436
## tax 12.343244 828.34377
## ptratio 13.203003 1298.15690
## black 10.704904 444.85923
## lstat 33.013070 10102.85634
par(mfrow=c(1,1))
varImpPlot(rf.boston,pch=21,bg="grey")
#-------------------------------
library(gbm)
## Loaded gbm 2.1.8.1
set.seed(1)
boost.boston=gbm(medv~.,data=training,distribution="gaussian",n.trees=5000,interaction.depth=4)
summary(boost.boston)
## var rel.inf
## rm rm 40.9286195
## lstat lstat 28.2010295
## nox nox 7.0059821
## dis dis 6.8971384
## crim crim 4.4402223
## ptratio ptratio 3.1289338
## black black 3.0313266
## age age 2.7663871
## tax tax 1.4889469
## chas chas 0.9745309
## indus indus 0.6928422
## rad rad 0.2995614
## zn zn 0.1444793
par(mfrow=c(1,2))
plot(boost.boston,i="rm")
plot(boost.boston,i="lstat")
yhat.boost=predict(boost.boston,newdata=testing,n.trees=5000)
gbm.metrics = metrics(yhat.boost,testing$medv)
knitr::kable(t(gbm.metrics), digits = 2)
rmse | mape | R2 |
---|---|---|
3.39 | 0.13 | 0.83 |
boost.boston=gbm(medv~.,data=training,distribution="gaussian",n.trees=5000,interaction.depth=4,shrinkage=0.2,verbose=F)
yhat.boost=predict(boost.boston,newdata=testing,n.trees=5000)
gbm1.metrics = metrics(yhat.boost,testing$medv)
knitr::kable(t(gbm1.metrics), digits = 2)
rmse | mape | R2 |
---|---|---|
3.43 | 0.13 | 0.82 |
dat=Boston
dat$medv=1*(dat$medv>25)+0*(dat$medv<=25)
dat$medv=as.factor(dat$medv)
#--------------------------------
library(e1071)
mod_svm=svm(medv~.,dat)
summary(mod_svm)
##
## Call:
## svm(formula = medv ~ ., data = dat)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
##
## Number of Support Vectors: 159
##
## ( 86 73 )
##
##
## Number of Classes: 2
##
## Levels:
## 0 1
par(mfrow=c(1,1))
plot(mod_svm,data=dat,rm~lstat,pch=21,bg="grey")
mod_logit=glm(medv~.,data=dat,family=binomial())
summary(mod_logit)
##
## Call:
## glm(formula = medv ~ ., family = binomial(), data = dat)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.312511 4.876070 1.090 0.275930
## crim -0.011101 0.045322 -0.245 0.806503
## zn 0.010917 0.010834 1.008 0.313626
## indus -0.110452 0.058740 -1.880 0.060060 .
## chas 0.966337 0.808960 1.195 0.232266
## nox -6.844521 4.483514 -1.527 0.126861
## rm 1.886872 0.452692 4.168 3.07e-05 ***
## age 0.003491 0.011133 0.314 0.753853
## dis -0.589016 0.164013 -3.591 0.000329 ***
## rad 0.318042 0.082623 3.849 0.000118 ***
## tax -0.010826 0.004036 -2.682 0.007314 **
## ptratio -0.353017 0.122259 -2.887 0.003884 **
## black -0.002264 0.003826 -0.592 0.554105
## lstat -0.367355 0.073020 -5.031 4.88e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 563.52 on 505 degrees of freedom
## Residual deviance: 209.11 on 492 degrees of freedom
## AIC: 237.11
##
## Number of Fisher Scoring iterations: 7
# diagnostic plot
par(mfrow=c(2,2), mar=c(3,3,0,0), bty='n')
plot(mod_logit,pch=21,bg="grey")
dat2=Boston
dat2$medv=0*(dat2$medv<=17)+1*(dat2$medv<=25 & dat2$medv>17)+2*(dat2$medv>25)
dat2$medv=as.factor(dat2$medv)
library(nnet)
mod_multinom=multinom(medv~.,data=dat2)
## # weights: 45 (28 variable)
## initial value 555.897818
## iter 10 value 338.289401
## iter 20 value 309.716007
## iter 30 value 211.736278
## iter 40 value 203.333103
## iter 50 value 202.890685
## iter 60 value 202.858211
## final value 202.858175
## converged
summary(mod_multinom)
## Call:
## multinom(formula = medv ~ ., data = dat2)
##
## Coefficients:
## (Intercept) crim zn indus chas nox rm
## 1 24.19217 -0.2231462 0.04670231 0.10160200 0.343424 -12.91234 -0.1602033
## 2 29.51905 -0.0643685 0.05704767 -0.01761023 1.311033 -18.77936 1.7631714
## age dis rad tax ptratio black
## 1 -0.03926022 -0.6648017 0.1826890 -0.006564069 -0.3770017 0.005904329
## 2 -0.03695548 -1.2169084 0.4612001 -0.017781470 -0.7175312 0.001405510
## lstat
## 1 -0.2409420
## 2 -0.5899765
##
## Std. Errors:
## (Intercept) crim zn indus chas nox rm
## 1 0.09595305 0.07342447 0.04186604 0.05732693 0.7108412 0.17673798 0.3344770
## 2 0.02845394 0.06391464 0.04316645 0.08044716 1.0467007 0.01608296 0.4633493
## age dis rad tax ptratio black lstat
## 1 0.01544144 0.2200741 0.05386458 0.003231373 0.09660977 0.002003785 0.04387560
## 2 0.01846981 0.2649990 0.08985403 0.005039059 0.13750043 0.004055685 0.08066911
##
## Residual Deviance: 405.7164
## AIC: 461.7164
par(mfrow=c(1,1))
pred=predict(mod_multinom,dat2)
plot(dat2$medv,pred,xlab="true class",ylab="predicted class")
library(keras)
model <- keras_model_sequential() %>%
layer_dense(units = 64, activation = "relu", input_shape = 13) %>%
layer_dense(units = 64, activation = "relu") %>%
layer_dense(units = 32, activation = "relu") %>%
layer_dense(units = 1)
summary(model)
## Model: "sequential"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## dense_3 (Dense) (None, 64) 896
## dense_2 (Dense) (None, 64) 4160
## dense_1 (Dense) (None, 32) 2080
## dense (Dense) (None, 1) 33
## ================================================================================
## Total params: 7,169
## Trainable params: 7,169
## Non-trainable params: 0
## ________________________________________________________________________________
model %>% compile(loss = "mse",optimizer = "RMSprop")
Now let’s train the model
history = model %>% fit(x = as.matrix(scale(training[,1:13])), y = training$medv,verbose = 2, epochs=20, validation_data = list(as.matrix(scale(testing[,1:13])),testing$medv))
yhat = model %>% predict( as.matrix(scale(testing[,1:13])))
plot(history)
plot(testing$medv,yhat, pch=16, cex=0.5)
nn.metrics = metrics(yhat,testing$medv)
knitr::kable(t(nn.metrics), digits = 2)
rmse | mape | R2 |
---|---|---|
5.3 | 0.2 | 0.57 |
library(pls)
##
## Attaching package: 'pls'
## The following object is masked from 'package:stats':
##
## loadings
library(randomForest)
N = nrow(Boston)
# train PLS
plsr_fit <-plsr(medv~., data=Boston, scale=T, validation="CV")
summary(plsr_fit)
## Data: X dimension: 506 13
## Y dimension: 506 1
## Fit method: kernelpls
## Number of components considered: 13
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 9.206 6.534 5.074 4.989 4.922 4.880 4.852
## adjCV 9.206 6.532 5.069 4.981 4.912 4.869 4.844
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 4.843 4.842 4.841 4.841 4.842 4.842 4.842
## adjCV 4.835 4.833 4.832 4.832 4.833 4.833 4.833
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 45.86 56.78 63.91 69.70 75.47 78.72 81.95 84.62
## medv 49.93 70.64 72.30 73.29 73.77 73.92 74.01 74.06
## 9 comps 10 comps 11 comps 12 comps 13 comps
## X 90.16 92.40 96.22 98.16 100.00
## medv 74.06 74.06 74.06 74.06 74.06
# use 7 components
plsr_7 <-plsr(formula = medv ~ ., ncomp = 7, data=Boston, scale = T)
PLS_Score <- scores(plsr_7)
# random forest
rf <- randomForest(x=PLS_Score, y=Boston$medv, importance=TRUE, proximity=TRUE)
# goodness of fit
pls_yhat = predict(plsr_7,Boston,ncomp=7)
rf_yhat = rf$predicted
v = mean((Boston$medv - mean(Boston$medv))^2)
pls.metrics = metrics(pls_yhat,Boston$medv); knitr::kable(t(pls.metrics), digits = 2)
rmse | mape | R2 |
---|---|---|
4.68 | 0.16 | 0.74 |
plsrf.metrics = metrics(rf_yhat,Boston$medv); knitr::kable(t(plsrf.metrics), digits = 2)
rmse | mape | R2 |
---|---|---|
4.05 | 0.14 | 0.81 |
par(mfrow=c(1,2))
plot(Boston$medv, pls_yhat,
xlab='y', ylab = 'yhat',
main=paste("PLS Fit, ", "R-squared =", round(pls.metrics[3],2)),
pch=21,bg="lightblue",
xlim=c(-5,50),ylim=c(-5,50))
abline(0,1,col="red",lwd=2)
plot(Boston$medv, rf_yhat,
xlab='y', ylab = 'yhat',
main=paste("PLS+RF Fit, ", "R-squared =", round(plsrf.metrics[3],2)),
pch=21,bg="lightblue",
xlim=c(-5,50),ylim=c(-5,50))
abline(0,1,col="red",lwd=2)
rmse | mape | R2 | |
---|---|---|---|
simple lm | 7.08 | 0.23 | 0.51 |
simple log-log lm | 6.19 | 0.21 | 0.62 |
simple log lm | 6.77 | 0.20 | 0.55 |
multiple lm | 5.34 | 0.17 | 0.72 |
ridge | 5.60 | 0.16 | 0.69 |
lasso | 5.72 | 0.16 | 0.68 |
tree model | 6.89 | 0.24 | 0.53 |
simpler tree model | 6.32 | 0.22 | 0.61 |
random forest | 4.05 | 0.13 | 0.84 |
Boosted tree | 3.39 | 0.13 | 0.83 |
simplee Boosted tree | 3.43 | 0.13 | 0.82 |
neural net | 5.30 | 0.20 | 0.57 |
PLS | 4.68 | 0.16 | 0.74 |
PLS RF | 4.05 | 0.14 | 0.81 |