The pollution data measured by this station were provided by Prof. Ouarzazi in 2013. They comprise hourly measurements of CO, NO2, wind speed, temperature, PM10, SO2, solar radiation and ozone.
setwd("~/git/ouarzazi")
pkgs <- c('MASS', 'Cubist', 'caret','xtable','Peaks','magic','doMC','gbm',
'segmented','stringr','utils','stats','ztable','doParallel',
'signal','zoo','plyr','compare','mgcv','elasticnet','pbapply',
'e1071','nnet','zoo','kernlab','pls','parallel','foreach','GA',
'devtools','caretEnsemble','mlbench','mclust','analogue','cluster',
'randomForest','rpart','party','fRegression','polspline','VIF',
'iterators','ridge','mboost','earth','car')
lapply(pkgs, require, character.only = T)
## Loading required package: MASS
## Loading required package: Cubist
## Loading required package: lattice
## Loading required package: caret
## Loading required package: ggplot2
## Loading required package: xtable
## Loading required package: Peaks
## Loading required package: magic
## Loading required package: abind
## Loading required package: doMC
## Loading required package: foreach
## Loading required package: iterators
## Loading required package: parallel
## Loading required package: gbm
## Loading required package: survival
##
## Attaching package: 'survival'
##
## The following object is masked from 'package:caret':
##
## cluster
##
## Loading required package: splines
## Loaded gbm 2.1.1
## Loading required package: segmented
## Loading required package: stringr
## Loading required package: ztable
## Welcome to package ztable ver 0.1.5
## Loading required package: doParallel
## Loading required package: signal
##
## Attaching package: 'signal'
##
## The following objects are masked from 'package:stats':
##
## filter, poly
##
## Loading required package: zoo
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Loading required package: plyr
## Loading required package: compare
##
## Attaching package: 'compare'
##
## The following object is masked from 'package:base':
##
## isTRUE
##
## Loading required package: mgcv
## Loading required package: nlme
## This is mgcv 1.8-10. For overview type 'help("mgcv-package")'.
##
## Attaching package: 'mgcv'
##
## The following object is masked from 'package:magic':
##
## magic
##
## Loading required package: elasticnet
## Loading required package: lars
## Loaded lars 1.2
##
## Loading required package: pbapply
## Loading required package: e1071
## Loading required package: nnet
##
## Attaching package: 'nnet'
##
## The following object is masked from 'package:mgcv':
##
## multinom
##
## Loading required package: kernlab
##
## Attaching package: 'kernlab'
##
## The following object is masked from 'package:ggplot2':
##
## alpha
##
## Loading required package: pls
##
## Attaching package: 'pls'
##
## The following object is masked from 'package:caret':
##
## R2
##
## The following object is masked from 'package:stats':
##
## loadings
##
## Loading required package: GA
## Package 'GA' version 2.2
## Type 'citation("GA")' for citing this R package in publications.
## Loading required package: devtools
## Loading required package: caretEnsemble
## Warning: replacing previous import by 'grid::arrow' when loading
## 'caretEnsemble'
## Warning: replacing previous import by 'grid::unit' when loading
## 'caretEnsemble'
## Loading required package: mlbench
## Loading required package: mclust
## Package 'mclust' version 5.1
## Type 'citation("mclust")' for citing this R package in publications.
##
## Attaching package: 'mclust'
##
## The following object is masked from 'package:mgcv':
##
## mvn
##
## Loading required package: analogue
## Loading required package: vegan
## Loading required package: permute
##
## Attaching package: 'permute'
##
## The following object is masked from 'package:devtools':
##
## check
##
## The following object is masked from 'package:kernlab':
##
## how
##
## This is vegan 2.3-2
##
## Attaching package: 'vegan'
##
## The following object is masked from 'package:pls':
##
## scores
##
## The following object is masked from 'package:caret':
##
## tolerance
##
## This is analogue 0.16-3
##
## Attaching package: 'analogue'
##
## The following objects are masked from 'package:pls':
##
## crossval, pcr, RMSEP
##
## The following object is masked from 'package:compare':
##
## compare
##
## The following object is masked from 'package:plyr':
##
## join
##
## Loading required package: cluster
## Loading required package: randomForest
## randomForest 4.6-12
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
##
## The following object is masked from 'package:ggplot2':
##
## margin
##
## Loading required package: rpart
## Loading required package: party
## Loading required package: grid
## Loading required package: mvtnorm
## Loading required package: modeltools
## Loading required package: stats4
##
## Attaching package: 'modeltools'
##
## The following object is masked from 'package:kernlab':
##
## prior
##
## The following object is masked from 'package:plyr':
##
## empty
##
## Loading required package: strucchange
## Loading required package: sandwich
##
## Attaching package: 'strucchange'
##
## The following object is masked from 'package:stringr':
##
## boundary
##
## Loading required package: fRegression
## Loading required package: timeDate
##
## Attaching package: 'timeDate'
##
## The following objects are masked from 'package:e1071':
##
## kurtosis, skewness
##
## The following object is masked from 'package:xtable':
##
## align
##
## Loading required package: timeSeries
##
## Attaching package: 'timeSeries'
##
## The following object is masked from 'package:randomForest':
##
## outlier
##
## The following object is masked from 'package:analogue':
##
## smoothSpline
##
## The following object is masked from 'package:zoo':
##
## time<-
##
## Loading required package: fBasics
##
##
## Rmetrics Package fBasics
## Analysing Markets and calculating Basic Statistics
## Copyright (C) 2005-2014 Rmetrics Association Zurich
## Educational Software for Financial Engineering and Computational Science
## Rmetrics is free software and comes with ABSOLUTELY NO WARRANTY.
## https://www.rmetrics.org --- Mail to: info@rmetrics.org
##
## Attaching package: 'fBasics'
##
## The following object is masked from 'package:modeltools':
##
## getModel
##
## The following object is masked from 'package:signal':
##
## triang
##
## The following object is masked from 'package:ztable':
##
## tr
##
##
##
## Rmetrics Package fRegression
## Regression Based Decision and Prediction
## Copyright (C) 2005-2014 Rmetrics Association Zurich
## Educational Software for Financial Engineering and Computational Science
## Rmetrics is free software and comes with ABSOLUTELY NO WARRANTY.
## https://www.rmetrics.org --- Mail to: info@rmetrics.org
## Loading required package: polspline
## Loading required package: VIF
## Loading required package: ridge
## Loading required package: mboost
## Loading required package: stabs
##
## Attaching package: 'stabs'
##
## The following object is masked from 'package:modeltools':
##
## parameters
##
## This is mboost 2.5-0. See 'package?mboost' and 'news(package = "mboost")'
## for a complete list of changes.
##
##
## Attaching package: 'mboost'
##
## The following object is masked from 'package:ggplot2':
##
## %+%
##
## Loading required package: earth
## Loading required package: plotmo
## Loading required package: plotrix
## Loading required package: TeachingDemos
## Loading required package: car
##
## Attaching package: 'car'
##
## The following object is masked from 'package:VIF':
##
## vif
##
## The following object is masked from 'package:fBasics':
##
## densityPlot
## [[1]]
## [1] TRUE
## ...
## [[46]]
## [1] TRUE
#
registerDoMC(cores=8)
#
# suppressPackageStartupMessages(library(RCurl))
# root.url<-'https://gist.github.com/fawda123'
# raw.fun<-paste(
# root.url,
# '5086859/raw/17fd6d2adec4dbcf5ce750cbd1f3e0f4be9d8b19/nnet_plot_fun.r',
# sep='/'
# )
# script<-getURL(raw.fun, ssl.verifypeer = FALSE)
# eval(parse(text = script))
# rm('script','raw.fun')
#
# Auxiliary functions
#
crossx<-function(data,vpc=4,cross,...){
call <- match.call()
resp <- function(formula, data) {
model.response(model.frame(formula, data))
}
n <- nrow(data)
perm.ind <- sample(n)
train.ind<-tapply(1:n, cut(1:n, breaks = cross),
function(x) perm.ind[-x])
sampling.errors <- c()
models<-list()
for (sample in 1:length(train.ind)) {
models[[sample]] <- polymars(data[train.ind[[
sample]],vpc],data[train.ind[[sample]],-vpc],
knots=7)
pred <- predict(models[[sample]],
data[-train.ind[[sample]],-vpc])
real.y<-data[-train.ind[[sample]], vpc]
repeat.errors <- crossprod(
pred - real.y)/length(pred)
sampling.errors[sample] <- repeat.errors
}
best <- which.min(sampling.errors)
structure(list(best.performance =
sampling.errors[best], best.model =
models[[best]]), class = "tune")
}
#
panel.cor.scale <- function(x, y, digits=2, prefix="", cex.cor){
usr <- par("usr"); on.exit(par(usr))
par(usr = c(0, 1, 0, 1))
r = (cor(x, y,use="pairwise"))
txt <- format(c(r, 0.123456789), digits=digits)[1]
txt <- paste(prefix, txt, sep="")
if(missing(cex.cor)) cex <- 0.8/strwidth(txt)
text(0.5, 0.5, txt, cex = cex * abs(r))
}
#
panel.cor <- function(x, y, digits=2, prefix="", cex.cor){
usr <- par("usr"); on.exit(par(usr))
par(usr = c(0, 1, 0, 1))
r = (cor(x, y,use="pairwise"))
txt <- format(c(r, 0.123456789), digits=digits)[1]
txt <- paste(prefix, txt, sep="")
if(missing(cex.cor)) cex <- 0.8/strwidth(txt)
text(0.5, 0.5, txt, cex = cex )
}
#
panel.hist <- function(x, ...) {
usr <- par("usr"); on.exit(par(usr))
par(usr = c(usr[1:2], 0, 1.5) )
h <- hist(x, plot = FALSE)
breaks <- h$breaks; nB <- length(breaks)
y <- h$counts; y <- y/max(y)
rect(breaks[-nB], 0, breaks[-1], y, col="cyan", ...)
}
#
pairs.panels <- function (x,smooth=TRUE,scale=FALSE){
if (smooth ){
if (scale) {
pairs(x,diag.panel=panel.hist,upper.panel=
panel.cor.scale,lower.panel=panel.smooth)
}else {
pairs(x,diag.panel=panel.hist,upper.panel=
panel.cor,lower.panel=panel.smooth)
}
} else { #smooth is not true
if (scale) {
pairs(x,diag.panel=panel.hist,upper.panel=panel.cor.scale)
} else {
pairs(x,diag.panel=panel.hist,upper.panel=panel.cor)
}
} #end of else (smooth)
} #end of function
#
# Modelling functions
#
m_lin=function(dat,vprd="O3",vexp=".",cv=10){
frm=as.formula(paste(vprd," ~ ",vexp,sep=""))
objM = tune(lm,frm,data=dat, tunecontrol=
tune.control(sampling="cross",cross=cv))
cprd=which(colnames(dat) %in% vprd)
dx=as.numeric(dat[,cprd])
prd = predict(objM$best.model,dat[,-cprd])
dy=as.numeric(prd)
cr.lm=cor(dx,dy)
er=lm(dy~dx+1)
return(list(model=objM,cc=cr.lm,ermodel=er,dx=dx,dy=dy))
}
#
m_svm=function(dat,vprd="O3",vexp=".",cv=10,
rng=list(gamma = 2^(-3:-1), cost = 2^(1:3))){
frm=as.formula(paste(vprd," ~ ",vexp,sep=""))
objM = tune(svm,frm,data=dat,
ranges = rng,
tunecontrol=tune.control(sampling="cross",cross=cv))
cprd=which(colnames(dat) %in% vprd)
dx=as.numeric(dat[,cprd])
prd = predict(objM$best.model,dat[,-cprd])
dy=as.numeric(prd)
cr.svm=cor(dx,dy)
er=lm(dy~dx+1)
return(list(model=objM,cc=cr.svm,ermodel=er,dx=dx,dy=dy))
}
#
m_rf=function(dat,vprd="O3",vexp=".",cv=10,n_jobs=10,
rng=list(mtry = 2:5, ntree = seq(300,900,300))){
frm=as.formula(paste(vprd," ~ ",vexp,sep=""))
objM = tune(randomForest,frm,data=dat,
ranges = rng,n_jobs=n_jobs,importance=TRUE,
tunecontrol=tune.control(sampling="cross",cross=cv))
cprd=which(colnames(dat) %in% vprd)
dx=as.numeric(dat[,cprd])
prd = predict(objM$best.model,dat[,-cprd])
dy=as.numeric(prd)
cr.rf=cor(dx,dy)
er=lm(dy~dx+1)
return(list(model=objM,cc=cr.rf,ermodel=er,dx=dx,dy=dy))
}
#
m_mlp=function(dat,vprd="O3",vexp=".",cv=10,
rng= list(linout=TRUE, size=4:20, maxit=50000,
decay=8^-2, abstol=1.0e-6,reltol=1.0e-7,
trace=FALSE,rang=0.7,skip=TRUE)){
frm=as.formula(paste(vprd," ~ ",vexp,sep=""))
objM = tune(nnet,frm,data=dat,
ranges = rng,
tunecontrol=tune.control(sampling="cross",cross=cv))
cprd=which(colnames(dat) %in% vprd)
dx=as.numeric(dat[,cprd])
prd = predict(objM$best.model,dat[,-cprd])
dy=as.numeric(prd)
cr.mlp=cor(dx,dy)
er=lm(dy~dx+1)
return(list(model=objM,cc=cr.mlp,ermodel=er,dx=dx,dy=dy))
}
#
m_rpt=function(dat,vprd="O3",vexp=".",cv=10,
rng= list(method="anova",
cp=seq(0.01,0.1,0.01))){
frm=as.formula(paste(vprd," ~ ",vexp,sep=""))
objM = tune(rpart,frm,data=dat, ranges=rng,
tunecontrol=tune.control(sampling="cross",cross=cv))
cprd=which(colnames(dat) %in% vprd)
dx=as.numeric(dat[,cprd])
prd = predict(objM$best.model,dat[,-cprd])
dy=as.numeric(prd)
cr.rpt=cor(dx,dy)
er=lm(dy~dx+1)
return(list(model=objM,cc=cr.rpt,ermodel=er,dx=dx,dy=dy))
}
#
#
# Plotting functions
#
#
plot.nnet<-function(mod.in,nid=T,all.out=T,all.in=T,wts.only=F,
rel.rsc=5,circle.cex=5,node.labs=T,
line.stag=NULL,cex.val=1,alpha.val=1,
circle.col='lightgrey',pos.col='black',neg.col='grey',...){
require(scales)
#gets weights for neural network, output is list
#if rescaled argument is true, weights are returned but rescaled based on abs value
nnet.vals<-function(mod.in,nid,rel.rsc){
library(scales)
layers<-mod.in$n
wts<-mod.in$wts
if(nid) wts<-rescale(abs(wts),c(1,rel.rsc))
indices<-matrix(seq(1,layers[1]*layers[2]+layers[2]),ncol=layers[2])
out.ls<-list()
for(i in 1:ncol(indices)){
out.ls[[paste('hidden',i)]]<-wts[indices[,i]]
}
if(layers[3]==1) out.ls[['out 1']]<-wts[(max(indices)+1):length(wts)]
else{
out.indices<-matrix(seq(max(indices)+1,length(wts)),ncol=layers[3])
for(i in 1:ncol(out.indices)){
out.ls[[paste('out',i)]]<-wts[out.indices[,i]]
}
}
out.ls
}
wts<-nnet.vals(mod.in,nid=F)
if(wts.only) return(wts)
#par(mar=numeric(4),oma=numeric(4),family='serif')
library(scales)
struct<-mod.in$n
x.range<-c(0,100)
y.range<-c(0,100)
#these are all proportions from 0-1
if(is.null(line.stag)) line.stag<-0.011*circle.cex/2
layer.x<-seq(0.17,0.9,length=3)
bias.x<-c(mean(layer.x[1:2]),mean(layer.x[2:3]))
bias.y<-0.95
in.col<-bord.col<-circle.col
circle.cex<-circle.cex
#get variable names from nnet object
if(is.null(mod.in$call$formula)){
x.names<-colnames(eval(mod.in$call$x))
y.names<-colnames(eval(mod.in$call$y))
}
else{
forms<-eval(mod.in$call$formula)
dat.names<-model.frame(forms,data=eval(mod.in$call$data))
y.names<-as.character(forms)[2]
x.names<-names(dat.names)[!names(dat.names) %in% y.names]
}
#initiate plot
plot(x.range,y.range,type='n',axes=F,ylab='',xlab='',...)
#function for getting y locations for input, hidden, output layers
#input is integer value from 'struct'
get.ys<-function(lyr){
spacing<-diff(c(0*diff(y.range),0.9*diff(y.range)))/max(struct)
seq(0.5*(diff(y.range)+spacing*(lyr-1)),0.5*(diff(y.range)-spacing*(lyr-1)),
length=lyr)
}
#function for plotting nodes
#'layer' specifies which layer, integer from 'struct'
#'x.loc' indicates x location for layer, integer from 'layer.x'
#'layer.name' is string indicating text to put in node
layer.points<-function(layer,x.loc,layer.name,cex=cex.val){
x<-rep(x.loc*diff(x.range),layer)
y<-get.ys(layer)
points(x,y,pch=21,cex=circle.cex,col=in.col,bg=bord.col)
if(node.labs) text(x,y,paste(layer.name,1:layer,sep=''),cex=cex.val)
if(layer.name=='I' & node.labs){
text(x-line.stag*diff(x.range),y,x.names,pos=2,cex=cex.val)
}
if(layer.name=='O' & node.labs)
text(x+line.stag*diff(x.range),y,y.names,pos=4,cex=cex.val)
}
#function for plotting bias points
#'bias.x' is vector of values for x locations
#'bias.y' is vector for y location
#'layer.name' is string indicating text to put in node
bias.points<-function(bias.x,bias.y,layer.name,cex,...){
for(val in 1:length(bias.x)){
points(
diff(x.range)*bias.x[val],
bias.y*diff(y.range),
pch=21,col=in.col,bg=bord.col,cex=circle.cex
)
if(node.labs)
text(
diff(x.range)*bias.x[val],
bias.y*diff(y.range),
paste(layer.name,val,sep=''),
cex=cex.val
)
}
}
#function creates lines colored by direction and width as proportion of magnitude
#use 'all.in' argument if you want to plot connection lines for only a single input node
layer.lines<-function(mod.in,h.layer,layer1=1,layer2=2,out.layer=F,
nid,rel.rsc,all.in,pos.col,neg.col,...){
x0<-rep(layer.x[layer1]*diff(x.range)+line.stag*diff(x.range),struct[layer1])
x1<-rep(layer.x[layer2]*diff(x.range)-line.stag*diff(x.range),struct[layer1])
if(out.layer==T){
y0<-get.ys(struct[layer1])
y1<-rep(get.ys(struct[layer2])[h.layer],struct[layer1])
src.str<-paste('out',h.layer)
wts<-nnet.vals(mod.in,nid=F,rel.rsc)
wts<-wts[grep(src.str,names(wts))][[1]][-1]
wts.rs<-nnet.vals(mod.in,nid=T,rel.rsc)
wts.rs<-wts.rs[grep(src.str,names(wts.rs))][[1]][-1]
cols<-rep(pos.col,struct[layer1])
cols[wts<0]<-neg.col
if(nid) segments(x0,y0,x1,y1,col=cols,lwd=wts.rs)
else segments(x0,y0,x1,y1)
}
else{
if(is.logical(all.in)) all.in<-h.layer
else all.in<-which(x.names==all.in)
y0<-rep(get.ys(struct[layer1])[all.in],struct[2])
y1<-get.ys(struct[layer2])
src.str<-'hidden'
wts<-nnet.vals(mod.in,nid=F,rel.rsc)
wts<-unlist(lapply(wts[grep(src.str,names(wts))],function(x) x[all.in+1]))
wts.rs<-nnet.vals(mod.in,nid=T,rel.rsc)
wts.rs<-unlist(lapply(wts.rs[grep(src.str,names(wts.rs))],function(x) x[all.in+1]))
cols<-rep(pos.col,struct[layer2])
cols[wts<0]<-neg.col
if(nid) segments(x0,y0,x1,y1,col=cols,lwd=wts.rs)
else segments(x0,y0,x1,y1)
}
}
bias.lines<-function(bias.x,mod.in,nid,rel.rsc,all.out,pos.col,neg.col,...){
if(is.logical(all.out)) all.out<-1:struct[3]
else all.out<-which(y.names==all.out)
for(val in 1:length(bias.x)){
wts<-nnet.vals(mod.in,nid=F,rel.rsc)
wts.rs<-nnet.vals(mod.in,nid=T,rel.rsc)
if(val==1){
wts<-wts[grep('out',names(wts),invert=T)]
wts.rs<-wts.rs[grep('out',names(wts.rs),invert=T)]
}
if(val==2){
wts<-wts[grep('out',names(wts))]
wts.rs<-wts.rs[grep('out',names(wts.rs))]
}
cols<-rep(pos.col,length(wts))
cols[unlist(lapply(wts,function(x) x[1]))<0]<-neg.col
wts.rs<-unlist(lapply(wts.rs,function(x) x[1]))
if(nid==F){
wts.rs<-rep(1,struct[val+1])
cols<-rep('black',struct[val+1])
}
if(val==1){
segments(
rep(diff(x.range)*bias.x[val]+diff(x.range)*line.stag,struct[val+1]),
rep(bias.y*diff(y.range),struct[val+1]),
rep(diff(x.range)*layer.x[val+1]-diff(x.range)*line.stag,struct[val+1]),
get.ys(struct[val+1]),
lwd=wts.rs,
col=cols
)
}
if(val==2){
segments(
rep(diff(x.range)*bias.x[val]+diff(x.range)*line.stag,struct[val+1]),
rep(bias.y*diff(y.range),struct[val+1]),
rep(diff(x.range)*layer.x[val+1]-diff(x.range)*line.stag,struct[val+1]),
get.ys(struct[val+1])[all.out],
lwd=wts.rs[all.out],
col=cols[all.out]
)
}
}
}
#use functions to plot connections between layers
#bias lines
bias.lines(bias.x,mod.in,nid=nid,rel.rsc=rel.rsc,all.out=all.out,
pos.col=alpha(pos.col,alpha.val),neg.col=alpha(neg.col,alpha.val))
#
#layer lines, makes use of arguments to plot all or for individual layers
#starts with input-hidden
#uses 'all.in' argument to plot connection lines for all input nodes or a single node
if(is.logical(all.in)){
mapply(
function(x) layer.lines(mod.in,x,layer1=1,layer2=2,nid=nid,rel.rsc=rel.rsc,all.in=all.in,
pos.col=alpha(pos.col,alpha.val),neg.col=alpha(neg.col,alpha.val)),
1:struct[1]
)
}
else{
node.in<-which(x.names==all.in)
layer.lines(mod.in,node.in,layer1=1,layer2=2,nid=nid,rel.rsc=rel.rsc,all.in=all.in,
pos.col=alpha(pos.col,alpha.val),neg.col=alpha(neg.col,alpha.val))
}
#
if(is.logical(all.out))
mapply(
function(x) layer.lines(mod.in,x,layer1=2,layer2=3,out.layer=T,nid=nid,rel.rsc=rel.rsc,
all.in=all.in,pos.col=alpha(pos.col,alpha.val),neg.col=alpha(neg.col,alpha.val)),
1:struct[3]
)
else{
all.out<-which(y.names==all.out)
layer.lines(mod.in,all.out,layer1=2,layer2=3,out.layer=T,nid=nid,rel.rsc=rel.rsc,
pos.col=pos.col,neg.col=neg.col)
}
#use functions to plot nodes
layer.points(struct[1],layer.x[1],'I')
layer.points(struct[2],layer.x[2],'H')
layer.points(struct[3],layer.x[3],'O')
bias.points(bias.x,bias.y,'B')
}
#
modellization=function(xx,vy,folds=10,repeats=1,nc=12){
registerDoMC(nc)
xx = as.data.frame(xx)
vyp= which(vy == colnames(xx))
X = xx[,-vyp]
rownames(X)<- 1:nrow(X)
X <- data.frame(X)
Y <- xx[,vyp]
train=1:nrow(X)
#
myControl = trainControl(method='cv', number=folds,
repeats=repeats, returnResamp='none',
returnData=FALSE, savePredictions=TRUE,
verboseIter=FALSE, allowParallel=TRUE,
index=createMultiFolds(Y[train],
k=folds, times=repeats))
#Train some models
model_1.1 = "train(X[train,], Y[train], method='gbm',
trControl=myControl,
tuneGrid=expand.grid(.n.trees=500,
.interaction.depth=15, .n.minobsinnode=c(2:5),
.shrinkage = 0.01),verbose=FALSE)"
model_1.2 = "train(X[train,], Y[train], method='blackboost',
trControl=myControl, tuneGrid=
expand.grid(.mstop=10^2:4,
.maxdepth=seq(5,15,5)))"
model_1.3 = "train(X[train,], Y[train], method='rf',
trControl=myControl,
tuneGrid=expand.grid(.mtry=(2:ncol(X))),
proximity=TRUE)"
model_1.4 = "train(X[train,], Y[train], method='ridge',
trControl=myControl, trace=FALSE,
tuneGrid=expand.grid(.lambda=10^(-3:2)))"
model_1.5 = "train(X[train,], Y[train], method='ppr',
trControl=myControl,tuneGrid=
expand.grid(.nterms=c(3:ncol(X))))"
model_1.6 = "train(X[train,], Y[train], method='earth',
trControl=myControl,tuneGrid=
expand.grid(.degree = 3,
.nprune = (1:5) * 2), metric = 'RMSE',
maximize = FALSE)"
model_1.7 = "train(X[train,], Y[train], method='glm',
trControl=myControl)"
model_1.8 = "train(X[train,], Y[train], method='svmRadial',
trControl=myControl,tuneGrid=
expand.grid(.C=10^(-3:3),
.sigma=c(10^(-4:3))))"
model_1.9 = "train(X[train,], Y[train], method='gam',
trControl=myControl,tuneGrid=
expand.grid(.select=TRUE,
.method=c('P-ML','P-REML','ML','REML')))"
model_1.10 = "train(X[train,], Y[train], method='kernelpls',
trControl=myControl,tuneGrid=
expand.grid(ncomp=seq(2,ncol(X))))"
model_1.11 = "train(X[train,], Y[train], method='nnet',
trControl=myControl,tuneGrid=
expand.grid(.size=seq(2,round(2*ncol(X)),3),
.decay=c(1.e-5,1.e-2)),
linout=TRUE, rang = 300, trace=FALSE,
maxit=10000, reltol=1.0e-11, abstol=1.0e-6)"
#
#Make a list of all the models
lmodels=c("model_1.1", "model_1.2", "model_1.3", "model_1.4",
"model_1.5", "model_1.6", "model_1.7", "model_1.8",
"model_1.9", "model_1.10","model_1.11")
all.models_1 =list()
for (i in lmodels) {
if (exists(i)) {
# cat(paste("Processing model:",i,"\n",sep=""))
all.models_1[[length(all.models_1)+1]] =
eval(parse(text=gsub('\n','',get(i))))
}
}
names(all.models_1) = sapply(all.models_1,
function(x) x$method)
#
#Make a greedy ensemble - currently can only use RMSE
greedy_1 <- caretEnsemble(all.models_1, iter=1000L)
linear_1 <- caretStack(all.models_1, method='glm',
trControl=trainControl(method='cv'))
if (exists("greedy_1")){
all.models_1[[length(all.models_1)+1]] = get("greedy_1")
names(all.models_1)[length(all.models_1)]="Greedy"
}
#Make a linear regression ensemble
if (exists("linear_1")){
all.models_1[[length(all.models_1)+1]] = get("linear_1")
names(all.models_1)[length(all.models_1)]="ELM"
}
return(list(models=all.models_1,control=myControl,XY=xx))
}
#
#
plt=function(dat,nc,ylb,fich,model,pfile=TRUE,w=8,h=6) {
par(mgp=c(2.2,0.45,0),tcl=0.4,mar=c(3.6,4.5,2.1,1.1))
plot(model$dx,model$dy,
xlab=expression(O[3] ~ real), ylab=ylb,
cex.lab=2,pch='+',cex.axis=1.5)
vx<-seq(0,200,15)
lines(vx,vx,pch = c(25), type="b",lty=2,bg="white")
val<-data.frame(dx=seq(0,200,20),dy=0)
vy<-predict(model$ermodel,newdata=val)
lines(val$dx,as.numeric(vy),type="b",pch=21,
lty=3,lwd=2.5,cex=1.2,bg="white")
if (model[["model"]]$method=="nnet") {
par(mar=numeric(4),mfrow=c(1,2),family='serif')
plot(model[["model"]]$best.model,nid=F)
plot(model[["model"]]$best.model)
par(mar=numeric(4),mfrow=c(1,1),family='serif')
par(mgp=c(2.2,0.45,0),tcl=0.4,mar=c(3.6,4.5,2.1,1.1))
}
if (pfile) {
pdf(file=fich,width=w,height=h)
par(mgp=c(2.2,0.45,0),tcl=0.4,mar=c(3.6,4.5,2.1,1.1))
plot(model$dx,model$dy,
xlab=expression(O[3] ~ real), ylab=ylb,
cex.lab=2,pch='+',cex.axis=1.5)
vx<-seq(0,200,15)
lines(vx,vx,pch = c(25), type="b",lty=2,bg="white")
val<-data.frame(dx=seq(0,200,20),dy=0)
vy<-predict(model$ermodel,newdata=val)
lines(val$dx,as.numeric(vy),type="b",pch=21,
lty=3,lwd=2.5,cex=1.2,bg="white")
if (model[["model"]]$method=="nnet") {
par(mar=numeric(4),mfrow=c(1,2),family='serif')
plot(model[["model"]]$best.model,nid=F)
plot(model[["model"]]$best.model)
par(mar=numeric(4),mfrow=c(1,1),family='serif')
par(mgp=c(2.2,0.45,0),tcl=0.4,mar=c(3.6,4.5,2.1,1.1))
}
dev.off()
}
}
#
plt_prd=function(dat,nc,ylb,fich,model,pfile=TRUE,w=8,h=6) {
par(mgp=c(2.2,0.45,0),tcl=0.4,mar=c(3.6,4.5,2.1,1.1))
dx=dat[,nc]
mdl=model[["model"]]$best.model
dy=predict(mdl,newdata=dat[,-nc])
plot(dx,dy, xlab=expression(O[3] ~ real), ylab=ylb,
cex.lab=2,pch='+',cex.axis=1.5)
vx<-seq(0,200,15)
lines(vx,vx,pch = c(25), type="b",lty=2,bg="white")
val<-data.frame(dx=seq(0,200,20),dy=0)
vy<-predict(model$ermodel,newdata=val)
lines(val$dx,as.numeric(vy),type="b",pch=21,
lty=3,lwd=2.5,cex=1.2,bg="white")
if (pfile) {
pdf(file=fich,width=w,height=h)
par(mgp=c(2.2,0.45,0),tcl=0.4,mar=c(3.6,4.5,2.1,1.1))
plot(dx,dy, xlab=expression(O[3] ~ real), ylab=ylb,
cex.lab=2,pch='+',cex.axis=1.5)
lines(vx,vx,pch = c(25), type="b",lty=2,bg="white")
lines(val$dx,as.numeric(vy),type="b",pch=21,
lty=3,lwd=2.5,cex=1.2,bg="white")
dev.off()
}
return(cor(dx,dy))
}
#
#
plt_caret=function(dat,ncc,fich,model,pfile=TRUE,w=8,h=6) {
nc=which(colnames(dat)==ncc)
par(mgp=c(2.2,0.45,0),tcl=0.4,mar=c(3.6,4.5,2.1,1.1))
dx=dat[,nc]
cr=rep(NA,length(model[["models"]]))
for ( i in 1:length(model[["models"]])) {
mdl=model[["models"]][[i]]
dy=predict(mdl,newdata=dat[,-nc])
ylb=names(model[["models"]])[i]
plot(dx,dy, xlab=bquote(O[3] ~ real),
ylab=bquote(O[3] ~ by ~ .(ylb)),
cex.lab=2,pch='+',cex.axis=1.5)
cr[i]=cor(dx,dy)
vx<-seq(0,200,15)
lines(vx,vx,pch = c(25), type="b",lty=2,bg="white")
ermodel=lm(dy~dx)
val<-data.frame(dx=seq(0,200,20),dy=0)
vy<-predict(ermodel,newdata=val)
lines(val$dx,as.numeric(vy),type="b",pch=21,
lty=3,lwd=2.5,cex=1.2,bg="white")
}
if (pfile) {
pdf(file=fich,width=w,height=h)
par(mgp=c(2.2,0.45,0),tcl=0.4,mar=c(3.6,4.5,2.1,1.1))
for ( i in 1:length(model[["models"]])) {
mdl=model[["models"]][[i]]
dy=predict(mdl,newdata=dat[,-nc])
ylb=names(model[["models"]])[i]
ylb2=expression(O[3] ~ .(ylb))
plot(dx,dy, xlab=bquote(O[3] ~ real),
ylab=bquote(O[3] ~ by ~ .(ylb)),
cex.lab=2,pch='+',cex.axis=1.5)
vx<-seq(0,200,15)
lines(vx,vx,pch = c(25), type="b",lty=2,bg="white")
ermodel=lm(dy~dx)
val<-data.frame(dx=seq(0,200,20),dy=0)
vy<-predict(ermodel,newdata=val)
lines(val$dx,as.numeric(vy),type="b",pch=21,
lty=3,lwd=2.5,cex=1.2,bg="white")
}
dev.off()
}
return(cr)
}
#
plt_pairs=function(dat,fich,w=8,h=6,pfile=TRUE){
pairs.panels(dat)
if (pfile) {
pdf(file=fich,width=w,height=h)
pairs.panels(dat)
dev.off()
}
}
#
svm_sum=function(x){
res=NULL
if ("svm" %in% class(x)) {
y=summary(x)
res=data.frame(kernel=y$kernel,cost=y$cost,
eps=y$epsilon,gamma=y$gamma,nu=y$nu,
rhp=y$rho,nsv=y$tot.nSV,degree=y$degree,
type=y$type)
}
return(res)
}
#
o3f=function(x,delta=8,ref) {
nd=strptime(x[which(names(ref)=="Date")],"%Y-%m-%d %H:%M:%S")+delta*3600
idx=ref$Date==nd
if ( sum(idx)==1) {
return(ref$O3[idx])
} else {
return(NA)
}
}
#
Let’s load the data from the CSV file.
#
# if (file.exists("MHAMID.RData")) {
# load("MHAMID.RData")
# } else {
MHM<-read.csv(file="Mhamid_data.csv",sep=";",dec=",",
header=TRUE)
DMHM<-MHM[! is.na(as.Date(as.character(MHM[,1]))),
1:ncol(MHM)]
DMHM=DMHM[-as.numeric(which(apply(DMHM,1,function(x){return(sum(is.na(x)))}) > 0 )),]
newc<-paste(as.character(DMHM[,1]),
paste(DMHM[,2],":00:00",sep=""),sep= " ")
newd<-strptime(newc,"%d/%m/%y %H:%M:%S")
antes<-newd - 3600
NMHM<-DMHM[,-2]
NMHM[,1]<-as.data.frame(newd)
colnames(NMHM)=gsub('.Mhamid','',colnames(NMHM))
colnames(NMHM)[9]="WS"
colnames(NMHM)[10]="SR"
pNMHM=NMHM
pNMHM[,11]=as.numeric(format(NMHM$Date,"%H"))
colnames(pNMHM)[11]="Hour"
colnames(pNMHM)[5]="C_O3"
hourf=8
pNMHM[,12]=apply(NMHM,1,o3f,delta=hourf,ref=NMHM)
colnames(pNMHM)[12]="O3"
pNMHM=pNMHM[! is.na(pNMHM$O3),]
save(MHM,DMHM,NMHM,pNMHM,hourf,file="MHAMID.RData")
#}
rm(MHM)
NMHM=pNMHM
#
plt_pairs(NMHM[,-1],fich="./plots/MHM_pairs.pdf",pfile=TRUE)
## png
## 2
# plt_pairs(pNMHM[,-1],fich="./plots/pMHM_pairs.pdf",pfile=TRUE)
| | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| Date | 2009-06-01 01:00:00 | 2009-12-27 19:30:00 | 2010-04-09 13:00:00 | 2010-04-05 07:14:02 | 2010-08-06 15:30:00 | 2010-11-28 15:00:00 |
| CO | 0.0000 | 0.0300 | 0.0500 | 0.1001 | 0.1000 | 4.0100 |
| HR | 8.00 | 39.00 | 57.00 | 56.92 | 75.00 | 99.00 |
| NO2 | 4.00 | 12.00 | 19.00 | 23.89 | 32.00 | 113.00 |
| C_O3 | 0.00 | 25.00 | 37.00 | 43.63 | 54.00 | 270.00 |
| PM10 | 0.00 | 30.00 | 49.00 | 61.22 | 74.00 | 4187.00 |
| SO2 | 0.000 | 6.000 | 9.000 | 9.515 | 11.000 | 46.000 |
| TC | 4.20 | 15.20 | 19.50 | 20.61 | 25.20 | 43.90 |
| WS | 0.100 | 0.700 | 1.100 | 1.228 | 1.600 | 7.600 |
| SR | 0.00 | 0.00 | 2.83 | 199.71 | 381.10 | 1092.00 |
| Hour | 0.0 | 5.0 | 11.0 | 11.4 | 17.0 | 23.0 |
| O3 | 0.00 | 25.00 | 37.00 | 43.92 | 55.00 | 270.00 |
Numerical treatment will be performed using the well-known open-source statistical environment R (http://www.r-project.org).
In order to compare with Prof. Ouarzazi’s results (corr = 0.84) for a local model (O3 ~ remaining variables) over the same period, we will use several technologies.
The basic methodology will be:

* To apply cross-validation during learning, as it is more robust than the fixed 70%/15%/15% approach (see the sketch after this list).
* To apply full validation to the whole dataset, after selecting the best model, as Prof. Ouarzazi did.
* The hourly based model was selected in order to learn what is possible, even though \(O_3\) should be assessed by its daily maximum and/or the dosage over 8-hour periods, depending on the specific regulation.
* Uncertainty about future predictors was removed, as we are not predicting ozone with any lag.
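For concreteness, the two validation schemes referred to above look like this in the code (a sketch only; the actual calls live inside the m_* wrappers defined earlier and in the modelling chunks below):
# k-fold cross-validation, as passed to e1071::tune inside the m_* wrappers
ctrl = tune.control(sampling = "cross", cross = 10)
# fixed hold-out alternative: set aside 15% of the rows as a test set
idx = sample(1:nrow(NMHM), floor(0.15 * nrow(NMHM)), replace = FALSE)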
A linear model is considered as a reference for comparison of results, in order to evaluate the degree of linearity of the problem.
#
print(xtable(as.data.frame(car::vif(lm(O3~.,data=NMHM[,-1])))),type="html")
| | car::vif(lm(O3 ~ ., data = NMHM[, -1])) |
|---|---|
| CO | 1.69 |
| HR | 3.29 |
| NO2 | 1.49 |
| C_O3 | 1.43 |
| PM10 | 1.36 |
| SO2 | 1.14 |
| TC | 3.18 |
| WS | 1.21 |
| SR | 1.50 |
| Hour | 1.33 |
if (file.exists(paste("MHM_lm_",hourf,".RData",sep=""))) {
load(paste("MHM_lm_",hourf,".RData",sep=""))
} else {
rej=which(colnames(NMHM) %in% c("Date","SO2"))
M.lm=m_lin(NMHM[,-rej],vprd="O3",vexp=".",cv=10)
#
idx = sample(1:nrow(NMHM),floor(0.15*nrow(NMHM)),replace=FALSE)
NMHM.trn = NMHM[-idx,]
NMHM.tst = NMHM[idx,]
M.lmp=m_lin(NMHM.trn[,-rej],vprd="O3",vexp=".",cv=10)
c.lmp=plt_prd(NMHM.tst[,-1],11,ylb=expression(O[3] ~ LM ~ predicted),
fich="./plots/O3_LM_pt.pdf",M.lmp,pfile=FALSE)
# For the day
NMHM.trn_d = NMHM.trn[NMHM.trn$SR>0,]
NMHM.tst_d = NMHM.tst[NMHM.tst$SR>0,]
M.lmp_d=m_lin(NMHM.trn_d[,-rej],vprd="O3",vexp=".",cv=10)
c.lmp_d=plt_prd(NMHM.tst_d[,-1],11,ylb=expression(O[3] ~ LM ~ predicted),
fich="./plots/O3_LM_pt_d.pdf",M.lmp_d,pfile=FALSE)
# For the night
NMHM.trn_n = NMHM.trn[NMHM.trn$SR==0,]
NMHM.tst_n = NMHM.tst[NMHM.tst$SR==0,]
rej2=which(colnames(NMHM) %in% c("Date","SO2","SR"))
M.lmp_n=m_lin(NMHM.trn_n[,-rej2],vprd="O3",vexp=".",cv=10)
c.lmp_n=plt_prd(NMHM.tst_n[,-1],11,ylb=expression(O[3] ~ LM ~ predicted),
fich="./plots/O3_LM_pt_n.pdf",M.lmp_n,pfile=FALSE)
#
print(xtable(summary(M.lm[["model"]]$best.model)),type="html")
print(xtable(summary(M.lmp[["model"]]$best.model)),type="html")
print(xtable(summary(M.lmp_d[["model"]]$best.model)),type="html")
print(xtable(summary(M.lmp_n[["model"]]$best.model)),type="html")
#
r2=data.frame(r2=(summary(M.lm[["model"]]$best.model))$r.squared,
r2adj=(summary(M.lm[["model"]]$best.model))$adj.r.squared)
r2=rbind(r2,c((summary(M.lmp[["model"]]$best.model))$r.squared,
r2adj=(summary(M.lmp[["model"]]$best.model))$adj.r.squared))
r2=rbind(r2,c((summary(M.lmp_d[["model"]]$best.model))$r.squared,
r2adj=(summary(M.lmp_d[["model"]]$best.model))$adj.r.squared))
r2=rbind(r2,c((summary(M.lmp_n[["model"]]$best.model))$r.squared,
r2adj=(summary(M.lmp_n[["model"]]$best.model))$adj.r.squared))
rownames(r2)=c("M.lm","M.lmp","M.lmp_d","M.lmp_n")
print(xtable(r2),type="html")
#
cc=data.frame(Model=M.lm[["cc"]],Tst=0)
cc=rbind(cc,c(M.lmp[["cc"]],c.lmp))
cc=rbind(cc,c(M.lmp_d[["cc"]],c.lmp_d))
cc=rbind(cc,c(M.lmp_n[["cc"]],c.lmp_n))
rownames(cc)=c("M.lm","M.lmp","M.lmp_d","M.lmp_n")
print(xtable(cc),type="html")
#
cc.lm=cc
r2.lm=r2
rm(list=c("cc","r2"))
save(M.lm,M.lmp,M.lmp_d,M.lmp_n,NMHM,NMHM.trn,rej,rej2,
NMHM.tst,NMHM.trn_d,NMHM.trn_n,cc.lm,r2.lm,
NMHM.tst_d,NMHM.tst_n,file=paste("MHM_lm_",hourf,".RData",sep=""))
}
tb1=M.lm[["model"]]$performances
table01=xtable(tb1)
print(table01,type="html")
| | dummyparameter | error | dispersion |
|---|---|---|---|
| 1 | 0.00 | 503.50 | 53.85 |
plt(NMHM.trn,11,ylb=expression(O[3] ~ LM ~ predicted),
fich="./plots/O3_LM.pdf",model=M.lm,pfile=TRUE)
png 2
#
tb2=M.lmp[["model"]]$performances
table02=xtable(tb2)
print(table02,type="html")
| | dummyparameter | error | dispersion |
|---|---|---|---|
| 1 | 0.00 | 505.68 | 78.89 |
plt(NMHM.trn,11,ylb=expression(O[3] ~ LM ~ predicted),
fich="./plots/O3_LM_p.pdf",model=M.lmp,pfile=TRUE)
png 2
plt_prd(NMHM.tst[,-1],11,ylb=expression(O[3] ~ LM ~ predicted),
fich="./plots/O3_LM_pt.pdf",M.lmp,pfile=TRUE)
[1] 0.5869973
#
tb3=M.lmp_d[["model"]]$performances
table03=xtable(tb3)
print(table03,type="html")
| | dummyparameter | error | dispersion |
|---|---|---|---|
| 1 | 0.00 | 418.10 | 66.64 |
plt(NMHM.trn_d,11,ylb=expression(O[3] ~ LM ~ predicted),
fich="./plots/O3_LM_p_d.pdf",model=M.lmp_d,pfile=TRUE)
png 2
plt_prd(NMHM.tst_d[,-1],11,ylb=expression(O[3] ~ LM ~ predicted),
fich="./plots/O3_LM_pt_d.pdf",M.lmp_d,pfile=TRUE)
[1] 0.6856557
#
tb4=M.lmp_n[["model"]]$performances
table04=xtable(tb4)
print(table04,type="html")
| | dummyparameter | error | dispersion |
|---|---|---|---|
| 1 | 0.00 | 461.20 | 84.82 |
plt(NMHM.trn_n,11,ylb=expression(O[3] ~ LM ~ predicted),
fich="./plots/O3_LM_p_n.pdf",model=M.lmp_n,pfile=TRUE)
png 2
plt_prd(NMHM.tst_n[,-1],11,ylb=expression(O[3] ~ LM ~ predicted),
fich="./plots/O3_LM_pt_n.pdf",M.lmp_n,pfile=TRUE)
[1] 0.617861
#
print(xtable(cc.lm),type="html")
| | Model | Tst |
|---|---|---|
| M.lm | 0.62 | 0.00 |
| M.lmp | 0.63 | 0.59 |
| M.lmp_d | 0.74 | 0.69 |
| M.lmp_n | 0.61 | 0.62 |
#
The results found account for a correlation of 0.620459; this will be considered as the reference.
A wrapper for SVM-based regressors is applied, searching for the best learning parameters.
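The chunk that fits the SVM models is not echoed here; the sketch below shows what it presumably looks like, mirroring the linear-model chunk above (the cache file name and the use of the default m_svm tuning grid over gamma and cost are assumptions):
if (file.exists(paste("MHM_svm_",hourf,".RData",sep=""))) {
  load(paste("MHM_svm_",hourf,".RData",sep=""))
} else {
  # full model, 85% training model, and day/night training models,
  # each tuned by 10-fold cross-validation
  M.svm    = m_svm(NMHM[,-rej],        vprd="O3", vexp=".", cv=10)
  M.svmp   = m_svm(NMHM.trn[,-rej],    vprd="O3", vexp=".", cv=10)
  M.svmp_d = m_svm(NMHM.trn_d[,-rej],  vprd="O3", vexp=".", cv=10)
  M.svmp_n = m_svm(NMHM.trn_n[,-rej2], vprd="O3", vexp=".", cv=10)
  save(M.svm,M.svmp,M.svmp_d,M.svmp_n,
       file=paste("MHM_svm_",hourf,".RData",sep=""))
}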
tb1=t(summary(M.svm[["model"]]$performances))
table01=xtable(tb1)
print(table01,type="html")
| | V1 | V2 | V3 | V4 | V5 | V6 |
|---|---|---|---|---|---|---|
| ||||||
| ||||||
| ||||||
| dispersion | Min. :29.20 | 1st Qu.:30.86 | Median :32.32 | Mean :32.23 | 3rd Qu.:33.66 | Max. :34.82 |
plt(NMHM,11,ylb=expression(O[3] ~ SVM ~ predicted),
fich="./plots/O3_SVM.pdf",model=M.svm,pfile=TRUE)
png 2
#
tb2=t(summary(M.svmp[["model"]]$performances))
table02=xtable(tb2)
print(table02,type="html")
| | V1 | V2 | V3 | V4 | V5 | V6 |
|---|---|---|---|---|---|---|
| ||||||
| ||||||
| ||||||
| dispersion | Min. :23.41 | 1st Qu.:25.40 | Median :26.27 | Mean :27.68 | 3rd Qu.:28.71 | Max. :35.74 |
plt(NMHM.trn,11,ylb=expression(O[3] ~ SVM ~ predicted),
fich="./plots/O3_SVM_p.pdf",model=M.svmp,pfile=TRUE)
png 2
plt_prd(NMHM.tst[,-1],11,ylb=expression(O[3] ~ SVM ~ predicted),
fich="./plots/O3_SVM_pt.pdf",M.svmp,pfile=TRUE)
[1] 0.8930889
#
tb3=t(summary(M.svmp_d[["model"]]$performances))
table03=xtable(tb3)
print(table03,type="html")
| | V1 | V2 | V3 | V4 | V5 | V6 |
|---|---|---|---|---|---|---|
| ||||||
| ||||||
| ||||||
| dispersion | Min. :23.26 | 1st Qu.:24.76 | Median :27.19 | Mean :27.50 | 3rd Qu.:29.19 | Max. :33.55 |
plt(NMHM.trn_d,11,ylb=expression(O[3] ~ SVM ~ predicted),
fich="./plots/O3_SVM_p_d.pdf",model=M.svmp_d,pfile=TRUE)
png 2
plt_prd(NMHM.tst_d[,-1],11,ylb=expression(O[3] ~ SVM ~ predicted),
fich="./plots/O3_SVM_pt_d.pdf",M.svmp_d,pfile=TRUE)
[1] 0.8983243
#
tb4=t(summary(M.svmp_n[["model"]]$performances))
table04=xtable(tb4)
print(table04,type="html")
| | V1 | V2 | V3 | V4 | V5 | V6 |
|---|---|---|---|---|---|---|
| ||||||
| ||||||
| ||||||
| dispersion | Min. :35.79 | 1st Qu.:44.27 | Median :49.56 | Mean :50.62 | 3rd Qu.:56.83 | Max. :66.92 |
plt(NMHM.trn_n,11,ylb=expression(O[3] ~ SVM ~ predicted),
fich="./plots/O3_SVM_p_n.pdf",model=M.svmp_n,pfile=TRUE)
png 2
plt_prd(NMHM.tst_n[,-1],11,ylb=expression(O[3] ~ SVM ~ predicted),
fich="./plots/O3_SVM_pt_n.pdf",M.svmp_n,pfile=TRUE)
[1] 0.9138307
#
print(xtable(cc.svm),type="html")
| | Model | Tst |
|---|---|---|
| M.svm | 0.96 | 0.00 |
| M.svmp | 0.97 | 0.89 |
| M.svmp_d | 0.96 | 0.90 |
| M.svmp_n | 0.96 | 0.91 |
The results found account for a correlation of 0.9635835, which outperforms the initial proposal carried out by Prof. Ouarzazi.
Let’s test the randomForest technology.
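As before, the fitting chunk is hidden; the sketch below assumes it uses the m_rf wrapper, with the tuning grid read off the tables that follow (mtry from 2 to 6, ntree from 300 to 900):
rng.rf = list(mtry = 2:6, ntree = seq(300, 900, 200))
M.rf    = m_rf(NMHM[,-rej],        vprd="O3", cv=10, rng=rng.rf)
M.rfp   = m_rf(NMHM.trn[,-rej],    vprd="O3", cv=10, rng=rng.rf)
M.rfp_d = m_rf(NMHM.trn_d[,-rej],  vprd="O3", cv=10, rng=rng.rf)
M.rfp_n = m_rf(NMHM.trn_n[,-rej2], vprd="O3", cv=10, rng=rng.rf)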
| | mtry | ntree | error | dispersion |
|---|---|---|---|---|
| 1 | 2 | 300.00 | 161.41 | 32.95 |
| 2 | 3 | 300.00 | 145.10 | 26.35 |
| 3 | 4 | 300.00 | 139.00 | 22.57 |
| 4 | 5 | 300.00 | 136.70 | 21.14 |
| 5 | 6 | 300.00 | 136.44 | 20.13 |
| 6 | 2 | 500.00 | 161.46 | 32.99 |
| 7 | 3 | 500.00 | 144.21 | 25.77 |
| 8 | 4 | 500.00 | 138.54 | 23.64 |
| 9 | 5 | 500.00 | 136.63 | 21.10 |
| 10 | 6 | 500.00 | 136.42 | 19.24 |
| 11 | 2 | 700.00 | 161.54 | 33.90 |
| 12 | 3 | 700.00 | 144.10 | 26.53 |
| 13 | 4 | 700.00 | 139.19 | 23.68 |
| 14 | 5 | 700.00 | 136.78 | 21.66 |
| 15 | 6 | 700.00 | 136.45 | 20.38 |
| 16 | 2 | 900.00 | 160.91 | 32.29 |
| 17 | 3 | 900.00 | 144.03 | 26.58 |
| 18 | 4 | 900.00 | 138.44 | 23.01 |
| 19 | 5 | 900.00 | 136.44 | 21.85 |
| 20 | 6 | 900.00 | 136.56 | 20.17 |
| | mtry | ntree | error | dispersion |
|---|---|---|---|---|
| 1 | 2 | 300.00 | 163.61 | 26.93 |
| 2 | 3 | 300.00 | 146.02 | 19.75 |
| 3 | 4 | 300.00 | 140.07 | 18.30 |
| 4 | 5 | 300.00 | 139.31 | 16.29 |
| 5 | 6 | 300.00 | 138.93 | 16.06 |
| 6 | 2 | 500.00 | 163.16 | 24.70 |
| 7 | 3 | 500.00 | 146.46 | 20.19 |
| 8 | 4 | 500.00 | 140.38 | 16.93 |
| 9 | 5 | 500.00 | 138.64 | 16.43 |
| 10 | 6 | 500.00 | 138.25 | 15.76 |
| 11 | 2 | 700.00 | 163.19 | 25.68 |
| 12 | 3 | 700.00 | 146.35 | 20.41 |
| 13 | 4 | 700.00 | 140.31 | 17.48 |
| 14 | 5 | 700.00 | 137.99 | 16.33 |
| 15 | 6 | 700.00 | 138.14 | 15.25 |
| 16 | 2 | 900.00 | 163.12 | 25.64 |
| 17 | 3 | 900.00 | 146.09 | 20.11 |
| 18 | 4 | 900.00 | 140.78 | 18.25 |
| 19 | 5 | 900.00 | 137.75 | 15.96 |
| 20 | 6 | 900.00 | 138.06 | 15.86 |
| | mtry | ntree | error | dispersion |
|---|---|---|---|---|
| 1 | 2 | 300.00 | 174.91 | 54.48 |
| 2 | 3 | 300.00 | 159.04 | 46.20 |
| 3 | 4 | 300.00 | 154.59 | 43.09 |
| 4 | 5 | 300.00 | 153.50 | 39.68 |
| 5 | 6 | 300.00 | 153.17 | 38.19 |
| 6 | 2 | 500.00 | 175.88 | 56.34 |
| 7 | 3 | 500.00 | 158.44 | 47.15 |
| 8 | 4 | 500.00 | 153.05 | 40.60 |
| 9 | 5 | 500.00 | 153.48 | 40.47 |
| 10 | 6 | 500.00 | 153.20 | 37.46 |
| 11 | 2 | 700.00 | 174.72 | 54.75 |
| 12 | 3 | 700.00 | 157.70 | 45.06 |
| 13 | 4 | 700.00 | 154.40 | 42.47 |
| 14 | 5 | 700.00 | 153.20 | 40.27 |
| 15 | 6 | 700.00 | 153.85 | 40.13 |
| 16 | 2 | 900.00 | 174.41 | 55.99 |
| 17 | 3 | 900.00 | 159.09 | 47.28 |
| 18 | 4 | 900.00 | 154.17 | 42.41 |
| 19 | 5 | 900.00 | 153.33 | 40.81 |
| 20 | 6 | 900.00 | 153.38 | 38.52 |
| | mtry | ntree | error | dispersion |
|---|---|---|---|---|
| 1 | 2 | 300.00 | 156.03 | 45.09 |
| 2 | 3 | 300.00 | 139.08 | 37.44 |
| 3 | 4 | 300.00 | 134.03 | 35.73 |
| 4 | 5 | 300.00 | 132.07 | 34.47 |
| 5 | 6 | 300.00 | 134.26 | 34.22 |
| 6 | 2 | 500.00 | 156.89 | 43.73 |
| 7 | 3 | 500.00 | 140.09 | 38.24 |
| 8 | 4 | 500.00 | 134.47 | 36.77 |
| 9 | 5 | 500.00 | 132.14 | 34.97 |
| 10 | 6 | 500.00 | 133.15 | 34.10 |
| 11 | 2 | 700.00 | 155.14 | 42.23 |
| 12 | 3 | 700.00 | 140.28 | 37.63 |
| 13 | 4 | 700.00 | 133.86 | 35.30 |
| 14 | 5 | 700.00 | 132.94 | 35.56 |
| 15 | 6 | 700.00 | 132.35 | 34.22 |
| 16 | 2 | 900.00 | 154.81 | 42.15 |
| 17 | 3 | 900.00 | 139.20 | 37.38 |
| 18 | 4 | 900.00 | 133.53 | 35.26 |
| 19 | 5 | 900.00 | 132.19 | 34.28 |
| 20 | 6 | 900.00 | 132.73 | 34.43 |
| | Model | Tst |
|---|---|---|
| M.rf | 0.99 | 0.00 |
| M.rfp | 0.99 | 0.90 |
| M.rfp_d | 0.99 | 0.90 |
| M.rfp_n | 0.98 | 0.91 |
The results found account for a correlation of 0.9867766.
Let’s see how backpropagation-trained multilayer perceptron (MLP) neural networks do their work.
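The corresponding fitting chunk is again not echoed; presumably it calls the m_mlp wrapper with its default grid (hidden-layer sizes 4 to 20, decay 8^-2, maxit 50000), which is what the tuning tables below reflect (a hedged sketch, not the author’s literal code):
M.mlp    = m_mlp(NMHM[,-rej],        vprd="O3", cv=10)
M.mlpp   = m_mlp(NMHM.trn[,-rej],    vprd="O3", cv=10)
M.mlpp_d = m_mlp(NMHM.trn_d[,-rej],  vprd="O3", cv=10)
M.mlpp_n = m_mlp(NMHM.trn_n[,-rej2], vprd="O3", cv=10)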
tb1=M.mlp[["model"]]$performances
table01=xtable(tb1)
print(table01,type="html")
| | linout | size | maxit | decay | abstol | reltol | trace | rang | Var9 | skip | error | dispersion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | TRUE | 4 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 259.60 | 82.45 |
| 2 | TRUE | 5 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 242.60 | 54.45 |
| 3 | TRUE | 6 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 264.34 | 116.25 |
| 4 | TRUE | 7 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 224.27 | 38.48 |
| 5 | TRUE | 8 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 260.69 | 83.26 |
| 6 | TRUE | 9 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 255.36 | 72.33 |
| 7 | TRUE | 10 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 261.23 | 71.27 |
| 8 | TRUE | 11 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 256.37 | 78.00 |
| 9 | TRUE | 12 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 246.03 | 52.66 |
| 10 | TRUE | 13 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 246.70 | 68.25 |
| 11 | TRUE | 14 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 252.40 | 81.37 |
| 12 | TRUE | 15 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 241.24 | 64.81 |
| 13 | TRUE | 16 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 250.65 | 86.09 |
| 14 | TRUE | 17 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 244.98 | 64.81 |
| 15 | TRUE | 18 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 234.27 | 38.06 |
| 16 | TRUE | 19 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 247.28 | 39.78 |
| 17 | TRUE | 20 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 242.11 | 68.08 |
plt(NMHM,11,ylb=expression(O[3] ~ MLP ~ predicted),
fich="./plots/O3_MLP.pdf",model=M.mlp,pfile=TRUE)
## Loading required package: scales
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:plotrix':
##
## rescale
##
## The following object is masked from 'package:kernlab':
##
## alpha
png 2
#
tb2=M.mlpp[["model"]]$performances
table02=xtable(tb2)
print(table02,type="html")
| | linout | size | maxit | decay | abstol | reltol | trace | rang | Var9 | skip | error | dispersion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | TRUE | 4 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 329.48 | 122.34 |
| 2 | TRUE | 5 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 291.34 | 88.21 |
| 3 | TRUE | 6 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 250.97 | 59.30 |
| 4 | TRUE | 7 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 310.19 | 91.12 |
| 5 | TRUE | 8 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 271.32 | 59.75 |
| 6 | TRUE | 9 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 251.58 | 74.39 |
| 7 | TRUE | 10 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 272.39 | 53.96 |
| 8 | TRUE | 11 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 284.81 | 106.74 |
| 9 | TRUE | 12 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 267.87 | 69.97 |
| 10 | TRUE | 13 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 247.36 | 87.15 |
| 11 | TRUE | 14 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 231.84 | 62.63 |
| 12 | TRUE | 15 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 219.36 | 44.35 |
| 13 | TRUE | 16 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 230.62 | 48.82 |
| 14 | TRUE | 17 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 221.28 | 38.94 |
| 15 | TRUE | 18 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 282.95 | 120.61 |
| 16 | TRUE | 19 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 243.16 | 57.78 |
| 17 | TRUE | 20 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 264.51 | 88.36 |
plt(NMHM.trn,11,ylb=expression(O[3] ~ MLP ~ predicted),
fich="./plots/O3_MLP_p.pdf",model=M.mlpp,pfile=TRUE)
png 2
plt_prd(NMHM.tst[,-1],11,ylb=expression(O[3] ~ MLP ~ predicted),
fich="./plots/O3_MLP_pt.pdf",M.mlpp,pfile=TRUE)
[,1]
[1,] 0.8703565
#
tb3=M.mlpp_d[["model"]]$performances
table03=xtable(tb3)
print(table03,type="html")
| | linout | size | maxit | decay | abstol | reltol | trace | rang | Var9 | skip | error | dispersion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | TRUE | 4 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 361.07 | 88.74 |
| 2 | TRUE | 5 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 383.94 | 104.40 |
| 3 | TRUE | 6 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 373.75 | 90.80 |
| 4 | TRUE | 7 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 373.96 | 120.73 |
| 5 | TRUE | 8 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 363.49 | 109.76 |
| 6 | TRUE | 9 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 352.02 | 120.13 |
| 7 | TRUE | 10 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 353.96 | 97.03 |
| 8 | TRUE | 11 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 348.52 | 94.01 |
| 9 | TRUE | 12 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 286.14 | 87.94 |
| 10 | TRUE | 13 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 319.39 | 96.06 |
| 11 | TRUE | 14 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 370.72 | 94.28 |
| 12 | TRUE | 15 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 300.65 | 74.76 |
| 13 | TRUE | 16 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 300.78 | 76.66 |
| 14 | TRUE | 17 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 362.18 | 74.90 |
| 15 | TRUE | 18 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 347.29 | 81.34 |
| 16 | TRUE | 19 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 323.19 | 115.32 |
| 17 | TRUE | 20 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 355.91 | 84.74 |
plt(NMHM.trn_d,11,ylb=expression(O[3] ~ MLP ~ predicted),
fich="./plots/O3_MLP_p_d.pdf",model=M.mlpp_d,pfile=TRUE)
png 2
plt_prd(NMHM.tst_d[,-1],11,ylb=expression(O[3] ~ MLP ~ predicted),
fich="./plots/O3_MLP_pt_d.pdf",M.mlpp_d,pfile=TRUE)
[,1]
[1,] 0.7081085
#
tb4=M.mlpp_n[["model"]]$performances
table04=xtable(tb4)
print(table04,type="html")
| | linout | size | maxit | decay | abstol | reltol | trace | rang | Var9 | skip | error | dispersion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | TRUE | 4 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 297.85 | 152.77 |
| 2 | TRUE | 5 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 284.27 | 117.50 |
| 3 | TRUE | 6 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 292.16 | 118.28 |
| 4 | TRUE | 7 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 292.66 | 95.62 |
| 5 | TRUE | 8 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 265.62 | 90.27 |
| 6 | TRUE | 9 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 282.91 | 116.31 |
| 7 | TRUE | 10 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 283.08 | 120.88 |
| 8 | TRUE | 11 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 243.87 | 121.62 |
| 9 | TRUE | 12 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 223.63 | 52.12 |
| 10 | TRUE | 13 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 245.45 | 99.07 |
| 11 | TRUE | 14 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 269.28 | 111.55 |
| 12 | TRUE | 15 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 243.03 | 47.81 |
| 13 | TRUE | 16 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 225.58 | 42.96 |
| 14 | TRUE | 17 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 227.47 | 49.95 |
| 15 | TRUE | 18 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 213.79 | 39.24 |
| 16 | TRUE | 19 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 221.43 | 43.15 |
| 17 | TRUE | 20 | 50000.00 | 0.02 | 0.00 | 0.00 | FALSE | 0.00 | 7.00 | TRUE | 233.99 | 63.65 |
plt(NMHM.trn_n,11,ylb=expression(O[3] ~ MLP ~ predicted),
fich="./plots/O3_MLP_p_n.pdf",model=M.mlpp_n,pfile=TRUE)
png 2
plt_prd(NMHM.tst_n[,-1],11,ylb=expression(O[3] ~ MLP ~ predicted),
fich="./plots/O3_MLP_pt_n.pdf",M.mlpp_n,pfile=TRUE)
[,1]
[1,] 0.8417708
#
print(xtable(cc.mlp),type="html")
| | Model | Tst |
|---|---|---|
| M.mlp | 0.84 | 0.00 |
| M.mlpp | 0.89 | 0.87 |
| M.mlpp_d | 0.75 | 0.71 |
| M.mlpp_n | 0.86 | 0.84 |
#
The results found account for a correlation of 0.8405096.
Now we will use classification and regression trees to have a look at their capabilities for this particular problem.
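Judging from the tuning tables below, the trees are tuned over cp from 0.01 to 0.10 and minsplit from 3 to 7; a hedged sketch of the hidden chunk, assuming the m_rpt wrapper is used with that grid:
rng.rpt = list(method="anova", cp=seq(0.01,0.1,0.01), minsplit=3:7)
M.rpt    = m_rpt(NMHM[,-rej],        vprd="O3", cv=10, rng=rng.rpt)
M.rptp   = m_rpt(NMHM.trn[,-rej],    vprd="O3", cv=10, rng=rng.rpt)
M.rptp_d = m_rpt(NMHM.trn_d[,-rej],  vprd="O3", cv=10, rng=rng.rpt)
M.rptp_n = m_rpt(NMHM.trn_n[,-rej2], vprd="O3", cv=10, rng=rng.rpt)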
tb1=M.rpt[["model"]]$performances
table01=xtable(tb1)
print(table01,type="html")
| | method | cp | minsplit | error | dispersion |
|---|---|---|---|---|---|
| 1 | anova | 0.01 | 3 | 349.44 | 30.44 |
| 2 | anova | 0.02 | 3 | 400.28 | 44.06 |
| 3 | anova | 0.03 | 3 | 414.68 | 39.11 |
| 4 | anova | 0.04 | 3 | 422.87 | 45.60 |
| 5 | anova | 0.05 | 3 | 422.87 | 45.60 |
| 6 | anova | 0.06 | 3 | 467.64 | 56.31 |
| 7 | anova | 0.07 | 3 | 554.15 | 76.08 |
| 8 | anova | 0.08 | 3 | 581.38 | 73.91 |
| 9 | anova | 0.09 | 3 | 653.18 | 81.16 |
| 10 | anova | 0.10 | 3 | 653.18 | 81.16 |
| 11 | anova | 0.01 | 4 | 349.44 | 30.44 |
| 12 | anova | 0.02 | 4 | 400.28 | 44.06 |
| 13 | anova | 0.03 | 4 | 414.68 | 39.11 |
| 14 | anova | 0.04 | 4 | 422.87 | 45.60 |
| 15 | anova | 0.05 | 4 | 422.87 | 45.60 |
| 16 | anova | 0.06 | 4 | 467.64 | 56.31 |
| 17 | anova | 0.07 | 4 | 554.15 | 76.08 |
| 18 | anova | 0.08 | 4 | 581.38 | 73.91 |
| 19 | anova | 0.09 | 4 | 653.18 | 81.16 |
| 20 | anova | 0.10 | 4 | 653.18 | 81.16 |
| 21 | anova | 0.01 | 5 | 349.44 | 30.44 |
| 22 | anova | 0.02 | 5 | 400.28 | 44.06 |
| 23 | anova | 0.03 | 5 | 414.68 | 39.11 |
| 24 | anova | 0.04 | 5 | 422.87 | 45.60 |
| 25 | anova | 0.05 | 5 | 422.87 | 45.60 |
| 26 | anova | 0.06 | 5 | 467.64 | 56.31 |
| 27 | anova | 0.07 | 5 | 554.15 | 76.08 |
| 28 | anova | 0.08 | 5 | 581.38 | 73.91 |
| 29 | anova | 0.09 | 5 | 653.18 | 81.16 |
| 30 | anova | 0.10 | 5 | 653.18 | 81.16 |
| 31 | anova | 0.01 | 6 | 349.44 | 30.44 |
| 32 | anova | 0.02 | 6 | 400.28 | 44.06 |
| 33 | anova | 0.03 | 6 | 414.68 | 39.11 |
| 34 | anova | 0.04 | 6 | 422.87 | 45.60 |
| 35 | anova | 0.05 | 6 | 422.87 | 45.60 |
| 36 | anova | 0.06 | 6 | 467.64 | 56.31 |
| 37 | anova | 0.07 | 6 | 554.15 | 76.08 |
| 38 | anova | 0.08 | 6 | 581.38 | 73.91 |
| 39 | anova | 0.09 | 6 | 653.18 | 81.16 |
| 40 | anova | 0.10 | 6 | 653.18 | 81.16 |
| 41 | anova | 0.01 | 7 | 349.44 | 30.44 |
| 42 | anova | 0.02 | 7 | 400.28 | 44.06 |
| 43 | anova | 0.03 | 7 | 414.68 | 39.11 |
| 44 | anova | 0.04 | 7 | 422.87 | 45.60 |
| 45 | anova | 0.05 | 7 | 422.87 | 45.60 |
| 46 | anova | 0.06 | 7 | 467.64 | 56.31 |
| 47 | anova | 0.07 | 7 | 554.15 | 76.08 |
| 48 | anova | 0.08 | 7 | 581.38 | 73.91 |
| 49 | anova | 0.09 | 7 | 653.18 | 81.16 |
| 50 | anova | 0.10 | 7 | 653.18 | 81.16 |
plt(NMHM,11,ylb=expression(O[3] ~ CART ~ predicted),
fich="./plots/O3_CRT.pdf",model=M.rpt,pfile=TRUE)
png 2
#
tb2=M.rptp[["model"]]$performances
table02=xtable(tb2)
print(table02,type="html")
| | method | cp | minsplit | error | dispersion |
|---|---|---|---|---|---|
| 1 | anova | 0.01 | 3 | 343.29 | 39.58 |
| 2 | anova | 0.02 | 3 | 400.65 | 44.92 |
| 3 | anova | 0.03 | 3 | 412.83 | 52.80 |
| 4 | anova | 0.04 | 3 | 423.83 | 54.04 |
| 5 | anova | 0.05 | 3 | 423.83 | 54.04 |
| 6 | anova | 0.06 | 3 | 465.38 | 47.29 |
| 7 | anova | 0.07 | 3 | 582.55 | 57.16 |
| 8 | anova | 0.08 | 3 | 582.55 | 57.16 |
| 9 | anova | 0.09 | 3 | 650.63 | 75.96 |
| 10 | anova | 0.10 | 3 | 655.60 | 67.33 |
| 11 | anova | 0.01 | 4 | 343.29 | 39.58 |
| 12 | anova | 0.02 | 4 | 400.65 | 44.92 |
| 13 | anova | 0.03 | 4 | 412.83 | 52.80 |
| 14 | anova | 0.04 | 4 | 423.83 | 54.04 |
| 15 | anova | 0.05 | 4 | 423.83 | 54.04 |
| 16 | anova | 0.06 | 4 | 465.38 | 47.29 |
| 17 | anova | 0.07 | 4 | 582.55 | 57.16 |
| 18 | anova | 0.08 | 4 | 582.55 | 57.16 |
| 19 | anova | 0.09 | 4 | 650.63 | 75.96 |
| 20 | anova | 0.10 | 4 | 655.60 | 67.33 |
| 21 | anova | 0.01 | 5 | 343.29 | 39.58 |
| 22 | anova | 0.02 | 5 | 400.65 | 44.92 |
| 23 | anova | 0.03 | 5 | 412.83 | 52.80 |
| 24 | anova | 0.04 | 5 | 423.83 | 54.04 |
| 25 | anova | 0.05 | 5 | 423.83 | 54.04 |
| 26 | anova | 0.06 | 5 | 465.38 | 47.29 |
| 27 | anova | 0.07 | 5 | 582.55 | 57.16 |
| 28 | anova | 0.08 | 5 | 582.55 | 57.16 |
| 29 | anova | 0.09 | 5 | 650.63 | 75.96 |
| 30 | anova | 0.10 | 5 | 655.60 | 67.33 |
| 31 | anova | 0.01 | 6 | 343.29 | 39.58 |
| 32 | anova | 0.02 | 6 | 400.65 | 44.92 |
| 33 | anova | 0.03 | 6 | 412.83 | 52.80 |
| 34 | anova | 0.04 | 6 | 423.83 | 54.04 |
| 35 | anova | 0.05 | 6 | 423.83 | 54.04 |
| 36 | anova | 0.06 | 6 | 465.38 | 47.29 |
| 37 | anova | 0.07 | 6 | 582.55 | 57.16 |
| 38 | anova | 0.08 | 6 | 582.55 | 57.16 |
| 39 | anova | 0.09 | 6 | 650.63 | 75.96 |
| 40 | anova | 0.10 | 6 | 655.60 | 67.33 |
| 41 | anova | 0.01 | 7 | 343.29 | 39.58 |
| 42 | anova | 0.02 | 7 | 400.65 | 44.92 |
| 43 | anova | 0.03 | 7 | 412.83 | 52.80 |
| 44 | anova | 0.04 | 7 | 423.83 | 54.04 |
| 45 | anova | 0.05 | 7 | 423.83 | 54.04 |
| 46 | anova | 0.06 | 7 | 465.38 | 47.29 |
| 47 | anova | 0.07 | 7 | 582.55 | 57.16 |
| 48 | anova | 0.08 | 7 | 582.55 | 57.16 |
| 49 | anova | 0.09 | 7 | 650.63 | 75.96 |
| 50 | anova | 0.10 | 7 | 655.60 | 67.33 |
plt(NMHM.trn,11,ylb=expression(O[3] ~ CART ~ predicted),
fich="./plots/O3_CRT_p.pdf",model=M.rptp,pfile=TRUE)
png 2
plt_prd(NMHM.tst[,-1],11,ylb=expression(O[3] ~ CRT ~ predicted),
fich="./plots/O3_CRT_pt.pdf",M.rptp,pfile=TRUE)
[1] 0.7219471
#
tb3=M.rptp_d[["model"]]$performances
table03=xtable(tb3)
print(table03,type="html")
| | method | cp | minsplit | error | dispersion |
|---|---|---|---|---|---|
| 1 | anova | 0.01 | 3 | 362.88 | 96.39 |
| 2 | anova | 0.02 | 3 | 426.60 | 97.80 |
| 3 | anova | 0.03 | 3 | 432.92 | 92.01 |
| 4 | anova | 0.04 | 3 | 432.92 | 92.01 |
| 5 | anova | 0.05 | 3 | 452.56 | 91.79 |
| 6 | anova | 0.06 | 3 | 520.20 | 94.67 |
| 7 | anova | 0.07 | 3 | 526.24 | 90.38 |
| 8 | anova | 0.08 | 3 | 545.32 | 99.33 |
| 9 | anova | 0.09 | 3 | 567.23 | 95.87 |
| 10 | anova | 0.10 | 3 | 587.18 | 78.98 |
| 11 | anova | 0.01 | 4 | 362.88 | 96.39 |
| 12 | anova | 0.02 | 4 | 426.60 | 97.80 |
| 13 | anova | 0.03 | 4 | 432.92 | 92.01 |
| 14 | anova | 0.04 | 4 | 432.92 | 92.01 |
| 15 | anova | 0.05 | 4 | 452.56 | 91.79 |
| 16 | anova | 0.06 | 4 | 520.20 | 94.67 |
| 17 | anova | 0.07 | 4 | 526.24 | 90.38 |
| 18 | anova | 0.08 | 4 | 545.32 | 99.33 |
| 19 | anova | 0.09 | 4 | 567.23 | 95.87 |
| 20 | anova | 0.10 | 4 | 587.18 | 78.98 |
| 21 | anova | 0.01 | 5 | 362.88 | 96.39 |
| 22 | anova | 0.02 | 5 | 426.60 | 97.80 |
| 23 | anova | 0.03 | 5 | 432.92 | 92.01 |
| 24 | anova | 0.04 | 5 | 432.92 | 92.01 |
| 25 | anova | 0.05 | 5 | 452.56 | 91.79 |
| 26 | anova | 0.06 | 5 | 520.20 | 94.67 |
| 27 | anova | 0.07 | 5 | 526.24 | 90.38 |
| 28 | anova | 0.08 | 5 | 545.32 | 99.33 |
| 29 | anova | 0.09 | 5 | 567.23 | 95.87 |
| 30 | anova | 0.10 | 5 | 587.18 | 78.98 |
| 31 | anova | 0.01 | 6 | 362.88 | 96.39 |
| 32 | anova | 0.02 | 6 | 426.60 | 97.80 |
| 33 | anova | 0.03 | 6 | 432.92 | 92.01 |
| 34 | anova | 0.04 | 6 | 432.92 | 92.01 |
| 35 | anova | 0.05 | 6 | 452.56 | 91.79 |
| 36 | anova | 0.06 | 6 | 520.20 | 94.67 |
| 37 | anova | 0.07 | 6 | 526.24 | 90.38 |
| 38 | anova | 0.08 | 6 | 545.32 | 99.33 |
| 39 | anova | 0.09 | 6 | 567.23 | 95.87 |
| 40 | anova | 0.10 | 6 | 587.18 | 78.98 |
| 41 | anova | 0.01 | 7 | 362.88 | 96.39 |
| 42 | anova | 0.02 | 7 | 426.60 | 97.80 |
| 43 | anova | 0.03 | 7 | 432.92 | 92.01 |
| 44 | anova | 0.04 | 7 | 432.92 | 92.01 |
| 45 | anova | 0.05 | 7 | 452.56 | 91.79 |
| 46 | anova | 0.06 | 7 | 520.20 | 94.67 |
| 47 | anova | 0.07 | 7 | 526.24 | 90.38 |
| 48 | anova | 0.08 | 7 | 545.32 | 99.33 |
| 49 | anova | 0.09 | 7 | 567.23 | 95.87 |
| 50 | anova | 0.10 | 7 | 587.18 | 78.98 |
plt(NMHM.trn_d,11,ylb=expression(O[3] ~ CART ~ predicted),
fich="./plots/O3_CRT_p_d.pdf",model=M.rptp_d,pfile=TRUE)
png 2
plt_prd(NMHM.tst_d[,-1],11,ylb=expression(O[3] ~ CART ~ predicted),
fich="./plots/O3_CRT_pt_d.pdf",M.rptp_d,pfile=TRUE)
[1] 0.7061534
#
tb4=M.rptp_n[["model"]]$performances
table04=xtable(tb4)
print(table04,type="html")
| | method | cp | minsplit | error | dispersion |
|---|---|---|---|---|---|
| 1 | anova | 0.01 | 3 | 298.92 | 61.61 |
| 2 | anova | 0.02 | 3 | 327.20 | 68.58 |
| 3 | anova | 0.03 | 3 | 327.20 | 68.58 |
| 4 | anova | 0.04 | 3 | 354.39 | 71.92 |
| 5 | anova | 0.05 | 3 | 352.24 | 69.73 |
| 6 | anova | 0.06 | 3 | 382.90 | 91.63 |
| 7 | anova | 0.07 | 3 | 490.43 | 100.80 |
| 8 | anova | 0.08 | 3 | 546.27 | 98.70 |
| 9 | anova | 0.09 | 3 | 546.27 | 98.70 |
| 10 | anova | 0.10 | 3 | 546.27 | 98.70 |
| 11 | anova | 0.01 | 4 | 298.92 | 61.61 |
| 12 | anova | 0.02 | 4 | 327.20 | 68.58 |
| 13 | anova | 0.03 | 4 | 327.20 | 68.58 |
| 14 | anova | 0.04 | 4 | 354.39 | 71.92 |
| 15 | anova | 0.05 | 4 | 352.24 | 69.73 |
| 16 | anova | 0.06 | 4 | 382.90 | 91.63 |
| 17 | anova | 0.07 | 4 | 490.43 | 100.80 |
| 18 | anova | 0.08 | 4 | 546.27 | 98.70 |
| 19 | anova | 0.09 | 4 | 546.27 | 98.70 |
| 20 | anova | 0.10 | 4 | 546.27 | 98.70 |
| 21 | anova | 0.01 | 5 | 298.92 | 61.61 |
| 22 | anova | 0.02 | 5 | 327.20 | 68.58 |
| 23 | anova | 0.03 | 5 | 327.20 | 68.58 |
| 24 | anova | 0.04 | 5 | 354.39 | 71.92 |
| 25 | anova | 0.05 | 5 | 352.24 | 69.73 |
| 26 | anova | 0.06 | 5 | 382.90 | 91.63 |
| 27 | anova | 0.07 | 5 | 490.43 | 100.80 |
| 28 | anova | 0.08 | 5 | 546.27 | 98.70 |
| 29 | anova | 0.09 | 5 | 546.27 | 98.70 |
| 30 | anova | 0.10 | 5 | 546.27 | 98.70 |
| 31 | anova | 0.01 | 6 | 298.92 | 61.61 |
| 32 | anova | 0.02 | 6 | 327.20 | 68.58 |
| 33 | anova | 0.03 | 6 | 327.20 | 68.58 |
| 34 | anova | 0.04 | 6 | 354.39 | 71.92 |
| 35 | anova | 0.05 | 6 | 352.24 | 69.73 |
| 36 | anova | 0.06 | 6 | 382.90 | 91.63 |
| 37 | anova | 0.07 | 6 | 490.43 | 100.80 |
| 38 | anova | 0.08 | 6 | 546.27 | 98.70 |
| 39 | anova | 0.09 | 6 | 546.27 | 98.70 |
| 40 | anova | 0.10 | 6 | 546.27 | 98.70 |
| 41 | anova | 0.01 | 7 | 302.95 | 62.23 |
| 42 | anova | 0.02 | 7 | 327.20 | 68.58 |
| 43 | anova | 0.03 | 7 | 327.20 | 68.58 |
| 44 | anova | 0.04 | 7 | 354.39 | 71.92 |
| 45 | anova | 0.05 | 7 | 352.24 | 69.73 |
| 46 | anova | 0.06 | 7 | 382.90 | 91.63 |
| 47 | anova | 0.07 | 7 | 490.43 | 100.80 |
| 48 | anova | 0.08 | 7 | 546.27 | 98.70 |
| 49 | anova | 0.09 | 7 | 546.27 | 98.70 |
| 50 | anova | 0.10 | 7 | 546.27 | 98.70 |
plt(NMHM.trn_n,11,ylb=expression(O[3] ~ CART ~ predicted),
fich="./plots/O3_CRT_p_n.pdf",model=M.rptp_n,pfile=TRUE)
png 2
plt_prd(NMHM.tst_n[,-1],11,ylb=expression(O[3] ~ CART ~ predicted),
fich="./plots/O3_CRT_pt_n.pdf",M.rptp_n,pfile=TRUE)
[1] 0.7903194
#
print(xtable(cc.rpt),type="html")
| | Model | Tst |
|---|---|---|
| M.rpt | 0.77 | 0.00 |
| M.rptp | 0.78 | 0.72 |
| M.rptp_d | 0.82 | 0.71 |
| M.rptp_n | 0.81 | 0.79 |
#
The results found account for a correlation of 0.7698914.
After this short analysis we can conclude that:
Correlation of predictions with observed O3 on the fitting data:

| | LM | SVM | RF | MLP | CART |
|---|---|---|---|---|---|
| Full_Model | 0.62 | 0.96 | 0.99 | 0.84 | 0.77 |
| Partial_Model | 0.63 | 0.97 | 0.99 | 0.89 | 0.78 |
| Daily_P_Model | 0.74 | 0.96 | 0.99 | 0.75 | 0.82 |
| Nightly_P_Model | 0.61 | 0.96 | 0.98 | 0.86 | 0.81 |
Correlation of predictions with observed O3 on the 15% test set:

| | LM | SVM | RF | MLP | CART |
|---|---|---|---|---|---|
| Partial_Model | 0.59 | 0.89 | 0.90 | 0.87 | 0.72 |
| Daily_P_Model | 0.69 | 0.90 | 0.90 | 0.71 | 0.71 |
| Nightly_P_Model | 0.62 | 0.91 | 0.91 | 0.84 | 0.79 |
From the figures, it is clear that RF tends to underestimate the higher values, probably because the data set is density-imbalanced. In this particular respect, the SVM technology exhibits a very good performance.
In a global view, we can conclude that the best fit was scored by the 0.6261245 method, with a correlation factor of 0.9664414.
Let’s see how the ensemble method performs.
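A sketch of the call expected to open that step, using the modellization() wrapper defined above on the same data (the argument values and the plot file name are illustrative, not the author’s literal code):
ens = modellization(NMHM[,-1], vy="O3", folds=10, repeats=1, nc=8)
# test-set correlation of every base learner plus the greedy and linear ensembles
cr.ens = plt_caret(NMHM.tst[,-1], "O3", fich="./plots/O3_ENS.pdf", model=ens)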