Processing data report for MHAMID station

Introduction

Data about pollution measured by this station has been provided by Prof Ouarzazi in 2013. It accounts for Co, NO2, Wind Speed, Temperature, PM10, SO2, Solar Radiation and Ozone hourly based.

This will try to forecast 48h ahead.

## Loading required package: MASS
## Loading required package: Cubist
## Loading required package: lattice
## Loading required package: caret
## Loading required package: ggplot2
## Loading required package: xtable
## Loading required package: Peaks
## Loading required package: magic
## Loading required package: abind
## Loading required package: doMC
## Loading required package: foreach
## Loading required package: iterators
## Loading required package: parallel
## Loading required package: gbm
## Loading required package: survival
## 
## Attaching package: 'survival'
## 
## The following object is masked from 'package:caret':
## 
##     cluster
## 
## Loading required package: splines
## Loaded gbm 2.1.1
## Loading required package: segmented
## Loading required package: stringr
## Loading required package: ztable
## Welcome to package ztable ver 0.1.5
## Loading required package: doParallel
## Loading required package: signal
## 
## Attaching package: 'signal'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, poly
## 
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## Loading required package: plyr
## Loading required package: compare
## 
## Attaching package: 'compare'
## 
## The following object is masked from 'package:base':
## 
##     isTRUE
## 
## Loading required package: mgcv
## Loading required package: nlme
## This is mgcv 1.8-10. For overview type 'help("mgcv-package")'.
## 
## Attaching package: 'mgcv'
## 
## The following object is masked from 'package:magic':
## 
##     magic
## 
## Loading required package: elasticnet
## Loading required package: lars
## Loaded lars 1.2
## 
## Loading required package: pbapply
## Loading required package: e1071
## Loading required package: nnet
## 
## Attaching package: 'nnet'
## 
## The following object is masked from 'package:mgcv':
## 
##     multinom
## 
## Loading required package: kernlab
## 
## Attaching package: 'kernlab'
## 
## The following object is masked from 'package:ggplot2':
## 
##     alpha
## 
## Loading required package: pls
## 
## Attaching package: 'pls'
## 
## The following object is masked from 'package:caret':
## 
##     R2
## 
## The following object is masked from 'package:stats':
## 
##     loadings
## 
## Loading required package: GA
## Package 'GA' version 2.2
## Type 'citation("GA")' for citing this R package in publications.
## Loading required package: devtools
## Loading required package: caretEnsemble
## Warning: replacing previous import by 'grid::arrow' when loading
## 'caretEnsemble'
## Warning: replacing previous import by 'grid::unit' when loading
## 'caretEnsemble'
## Loading required package: mlbench
## Loading required package: mclust
## Package 'mclust' version 5.1
## Type 'citation("mclust")' for citing this R package in publications.
## 
## Attaching package: 'mclust'
## 
## The following object is masked from 'package:mgcv':
## 
##     mvn
## 
## Loading required package: analogue
## Loading required package: vegan
## Loading required package: permute
## 
## Attaching package: 'permute'
## 
## The following object is masked from 'package:devtools':
## 
##     check
## 
## The following object is masked from 'package:kernlab':
## 
##     how
## 
## This is vegan 2.3-2
## 
## Attaching package: 'vegan'
## 
## The following object is masked from 'package:pls':
## 
##     scores
## 
## The following object is masked from 'package:caret':
## 
##     tolerance
## 
## This is analogue 0.16-3
## 
## Attaching package: 'analogue'
## 
## The following objects are masked from 'package:pls':
## 
##     crossval, pcr, RMSEP
## 
## The following object is masked from 'package:compare':
## 
##     compare
## 
## The following object is masked from 'package:plyr':
## 
##     join
## 
## Loading required package: cluster
## Loading required package: randomForest
## randomForest 4.6-12
## Type rfNews() to see new features/changes/bug fixes.
## 
## Attaching package: 'randomForest'
## 
## The following object is masked from 'package:ggplot2':
## 
##     margin
## 
## Loading required package: rpart
## Loading required package: party
## Loading required package: grid
## Loading required package: mvtnorm
## Loading required package: modeltools
## Loading required package: stats4
## 
## Attaching package: 'modeltools'
## 
## The following object is masked from 'package:kernlab':
## 
##     prior
## 
## The following object is masked from 'package:plyr':
## 
##     empty
## 
## Loading required package: strucchange
## Loading required package: sandwich
## 
## Attaching package: 'strucchange'
## 
## The following object is masked from 'package:stringr':
## 
##     boundary
## 
## Loading required package: fRegression
## Loading required package: timeDate
## 
## Attaching package: 'timeDate'
## 
## The following objects are masked from 'package:e1071':
## 
##     kurtosis, skewness
## 
## The following object is masked from 'package:xtable':
## 
##     align
## 
## Loading required package: timeSeries
## 
## Attaching package: 'timeSeries'
## 
## The following object is masked from 'package:randomForest':
## 
##     outlier
## 
## The following object is masked from 'package:analogue':
## 
##     smoothSpline
## 
## The following object is masked from 'package:zoo':
## 
##     time<-
## 
## Loading required package: fBasics
## 
## 
## Rmetrics Package fBasics
## Analysing Markets and calculating Basic Statistics
## Copyright (C) 2005-2014 Rmetrics Association Zurich
## Educational Software for Financial Engineering and Computational Science
## Rmetrics is free software and comes with ABSOLUTELY NO WARRANTY.
## https://www.rmetrics.org --- Mail to: info@rmetrics.org
## 
## Attaching package: 'fBasics'
## 
## The following object is masked from 'package:modeltools':
## 
##     getModel
## 
## The following object is masked from 'package:signal':
## 
##     triang
## 
## The following object is masked from 'package:ztable':
## 
##     tr
## 
## 
## 
## Rmetrics Package fRegression
## Regression Based Decision and Prediction
## Copyright (C) 2005-2014 Rmetrics Association Zurich
## Educational Software for Financial Engineering and Computational Science
## Rmetrics is free software and comes with ABSOLUTELY NO WARRANTY.
## https://www.rmetrics.org --- Mail to: info@rmetrics.org
## Loading required package: polspline
## Loading required package: VIF
## Loading required package: ridge
## Loading required package: mboost
## Loading required package: stabs
## 
## Attaching package: 'stabs'
## 
## The following object is masked from 'package:modeltools':
## 
##     parameters
## 
## This is mboost 2.5-0. See 'package?mboost' and 'news(package  = "mboost")'
## for a complete list of changes.
## 
## 
## Attaching package: 'mboost'
## 
## The following object is masked from 'package:ggplot2':
## 
##     %+%
## 
## Loading required package: earth
## Loading required package: plotmo
## Loading required package: plotrix
## Loading required package: TeachingDemos
## Loading required package: car
## 
## Attaching package: 'car'
## 
## The following object is masked from 'package:VIF':
## 
##     vif
## 
## The following object is masked from 'package:fBasics':
## 
##     densityPlot

Let’s load the data from the csv files

#
  hourf=48
if (file.exists(paste("MHAMID_",hourf,".RData",sep=""))) {
   load(paste("MHAMID_",hourf,".RData",sep=""))
} else {
  MHM<-read.csv(file="Mhamid_data.csv",sep=";",dec=",",
              header=TRUE)
  DMHM<-MHM[! is.na(as.Date(as.character(MHM[,1]))),
          1:ncol(MHM)]
  DMHM=DMHM[-as.numeric(which(apply(DMHM,1,function(x){return(sum(is.na(x)))}) > 0 )),]
  newc<-paste(as.character(DMHM[,1]),
      paste(DMHM[,2],":00:00",sep=""),sep= " ")
  newd<-strptime(newc,"%d/%m/%y %H:%M:%S")
  antes<-newd - 3600
  NMHM<-DMHM[,-2]
  NMHM[,1]<-as.data.frame(newd)
  colnames(NMHM)=gsub('.Mhamid','',colnames(NMHM))
  colnames(NMHM)[9]="WS"
  colnames(NMHM)[10]="SR"
  pNMHM=NMHM
  pNMHM[,11]=as.numeric(format(NMHM$Date,"%H"))
  colnames(pNMHM)[11]="Hour"
  colnames(pNMHM)[5]="C_O3"
  pNMHM[,12]=apply(NMHM,1,o3f,delta=hourf,ref=NMHM)
  colnames(pNMHM)[12]="O3"
  pNMHM=pNMHM[! is.na(pNMHM$O3),]
  save(MHM,DMHM,NMHM,pNMHM,hourf,file=paste("MHAMID_",hourf,".RData",sep=""))
}
rm(MHM)
NMHM=pNMHM
#
plt_pairs(NMHM[,-1],fich="./plots/MHM_pairs.pdf",pfile=TRUE)

## png 
##   2
# plt_pairs(pNMHM[,-1],fich="./plots/pMHM_pairs.pdf",pfile=TRUE)
      [,1]                            [,2]                           
 Date "Min.   :2009-06-01 01:00:00  " "1st Qu.:2009-12-26 00:00:00  "
  CO  "Min.   :0.000  "               "1st Qu.:0.030  "              
  HR  "Min.   : 8.00  "               "1st Qu.:40.00  "              
 NO2  "Min.   : 4.00  "               "1st Qu.:12.00  "              
 C_O3 "Min.   :  0.00  "              "1st Qu.: 25.00  "             
 PM10 "Min.   :   0.00  "             "1st Qu.:  30.00  "            
 SO2  "Min.   : 0.000  "              "1st Qu.: 6.000  "             
  TC  "Min.   : 4.20  "               "1st Qu.:15.20  "              
  WS  "Min.   :0.100  "               "1st Qu.:0.700  "              
  SR  "Min.   :   0.00  "             "1st Qu.:   0.00  "            
 Hour "Min.   : 0.00  "               "1st Qu.: 5.00  "              
  O3  "Min.   :  0.00  "              "1st Qu.: 25.00  "             
      [,3]                            [,4]                           
 Date "Median :2010-03-24 23:00:00  " "Mean   :2010-04-03 05:38:13  "
  CO  "Median :0.050  "               "Mean   :0.102  "              
  HR  "Median :58.00  "               "Mean   :57.41  "              
 NO2  "Median :19.00  "               "Mean   :23.52  "              
 C_O3 "Median : 36.00  "              "Mean   : 43.32  "             
 PM10 "Median :  49.00  "             "Mean   :  60.59  "            
 SO2  "Median : 9.000  "              "Mean   : 9.565  "             
  TC  "Median :19.30  "               "Mean   :20.42  "              
  WS  "Median :1.100  "               "Mean   :1.231  "              
  SR  "Median :   0.65  "             "Mean   : 195.67  "            
 Hour "Median :11.00  "               "Mean   :11.42  "              
  O3  "Median : 36.00  "              "Mean   : 43.29  "             
      [,5]                            [,6]                           
 Date "3rd Qu.:2010-08-07 01:00:00  " "Max.   :2010-11-26 23:00:00  "
  CO  "3rd Qu.:0.100  "               "Max.   :4.010  "              
  HR  "3rd Qu.:75.00  "               "Max.   :99.00  "              
 NO2  "3rd Qu.:30.00  "               "Max.   :98.00  "              
 C_O3 "3rd Qu.: 53.00  "              "Max.   :236.00  "             
 PM10 "3rd Qu.:  73.00  "             "Max.   :1599.00  "            
 SO2  "3rd Qu.:11.000  "              "Max.   :46.000  "             
  TC  "3rd Qu.:24.90  "               "Max.   :43.90  "              
  WS  "3rd Qu.:1.600  "               "Max.   :7.600  "              
  SR  "3rd Qu.: 369.10  "             "Max.   :1092.00  "            
 Hour "3rd Qu.:17.00  "               "Max.   :23.00  "              
  O3  "3rd Qu.: 54.00  "              "Max.   :270.00  "             

Numerical treatment will be performed by using the well known open source statistical environment R (http://www.r-project.org).

Processing

In order to compare with Prof Ouarzazi’s results (corr = 0.84) for a local based model O3 ~ remaining variables at the same period, we will use several technologies.

Basic methodology will be: * To apply cross correlation learning validation as it becomes more robust that the fixed approach 70%,15%,15% * To apply full validation to all dataset, after selecting the best model, as Prof Ouarzazi did. * The hourly based moted was selected as for learning what it is possible to do, even when \(O_3\) should be accounted by its maximum per day and/or the dosage by 8h periods, depending on the specific regulation. * Uncertainty about future predictors was removed as we were no predicting Ozone with any lag.

Linear approach as reference

A linear model is considered as reference, for comparison of results in order to evaluate the degree of linearity

#
print(xtable(as.data.frame(car::vif(lm(O3~.,data=NMHM[,-1])))),type="html")
car::vif(lm(O3 ~ ., data = NMHM[, -1]))
CO 1.88
HR 3.31
NO2 1.51
C_O3 1.44
PM10 1.63
SO2 1.15
TC 3.20
WS 1.21
SR 1.51
Hour 1.33
if (file.exists(paste("MHM_lm_",hourf,".RData",sep=""))) {
  load(paste("MHM_lm_",hourf,".RData",sep=""))
} else {
  rej=which(colnames(NMHM) %in% c("Date","SO2"))
  M.lm=m_lin(NMHM[,-rej],vprd="O3",vexp=".",cv=10)
#
  idx = sample(1:nrow(NMHM),floor(0.15*nrow(NMHM)),replace=FALSE)
  NMHM.trn = NMHM[-idx,]
  NMHM.tst = NMHM[idx,]
  M.lmp=m_lin(NMHM.trn[,-rej],vprd="O3",vexp=".",cv=10)
  c.lmp=plt_prd(NMHM.tst[,-1],11,ylb=expression(O[3] ~ LM ~ predicted),
        fich="./plots/O3_LM_pt.pdf",M.lmp,pfile=FALSE)
# For the day
  NMHM.trn_d = NMHM.trn[NMHM.trn$SR>0,]
  NMHM.tst_d = NMHM.tst[NMHM.tst$SR>0,]
  M.lmp_d=m_lin(NMHM.trn_d[,-rej],vprd="O3",vexp=".",cv=10)
  c.lmp_d=plt_prd(NMHM.tst_d[,-1],11,ylb=expression(O[3] ~ LM ~ predicted),
        fich="./plots/O3_LM_pt_d.pdf",M.lmp_d,pfile=FALSE)
# For the night
  NMHM.trn_n = NMHM.trn[NMHM.trn$SR==0,]
  NMHM.tst_n = NMHM.tst[NMHM.tst$SR==0,]
  rej2=which(colnames(NMHM) %in% c("Date","SO2","SR"))
  M.lmp_n=m_lin(NMHM.trn_n[,-rej2],vprd="O3",vexp=".",cv=10)
  c.lmp_n=plt_prd(NMHM.tst_n[,-1],11,ylb=expression(O[3] ~ LM ~ predicted),
        fich="./plots/O3_LM_pt_n.pdf",M.lmp_n,pfile=FALSE)
#
  print(xtable(summary(M.lm[["model"]]$best.model)),type="html")
  print(xtable(summary(M.lmp[["model"]]$best.model)),type="html")
  print(xtable(summary(M.lmp_d[["model"]]$best.model)),type="html")
  print(xtable(summary(M.lmp_n[["model"]]$best.model)),type="html")
#
  r2=data.frame(r2=(summary(M.lm[["model"]]$best.model))$r.squared,
          r2adj=(summary(M.lm[["model"]]$best.model))$adj.r.squared)
  r2=rbind(r2,c((summary(M.lmp[["model"]]$best.model))$r.squared,
          r2adj=(summary(M.lmp[["model"]]$best.model))$adj.r.squared))
  r2=rbind(r2,c((summary(M.lmp_d[["model"]]$best.model))$r.squared,
          r2adj=(summary(M.lmp_d[["model"]]$best.model))$adj.r.squared))
  r2=rbind(r2,c((summary(M.lmp_n[["model"]]$best.model))$r.squared,
          r2adj=(summary(M.lmp_n[["model"]]$best.model))$adj.r.squared))
  rownames(r2)=c("M.lm","M.lmp","M.lmp_d","M.lmp_n")
  print(xtable(r2),type="html")
#
  cc=data.frame(Model=M.lm[["cc"]],Tst=0)
  cc=rbind(cc,c(M.lmp[["cc"]],c.lmp))  
  cc=rbind(cc,c(M.lmp_d[["cc"]],c.lmp_d))
  cc=rbind(cc,c(M.lmp_n[["cc"]],c.lmp_n))
  rownames(cc)=c("M.lm","M.lmp","M.lmp_d","M.lmp_n")
  print(xtable(cc),type="html")
#
  cc.lm=cc
  r2.lm=r2
  rm(list=c("cc","r2"))
  save(M.lm,M.lmp,M.lmp_d,M.lmp_n,NMHM,NMHM.trn,rej,rej2,
       NMHM.tst,NMHM.trn_d,NMHM.trn_n,cc.lm,r2.lm,
       NMHM.tst_d,NMHM.tst_n,file=paste("MHM_lm_",hourf,".RData",sep=""))
}
Estimate Std. Error t value Pr(>|t|)
(Intercept) -7.6075 1.6330 -4.66 0.0000
CO 3.2618 1.2288 2.65 0.0080
HR 0.0965 0.0134 7.19 0.0000
NO2 0.0771 0.0131 5.87 0.0000
C_O3 0.8508 0.0070 121.42 0.0000
PM10 0.0201 0.0036 5.61 0.0000
TC 0.2957 0.0418 7.08 0.0000
WS -1.8830 0.2606 -7.23 0.0000
SR 0.0058 0.0007 7.92 0.0000
Hour 0.0250 0.0273 0.92 0.3596
Estimate Std. Error t value Pr(>|t|)
(Intercept) -8.4280 1.7721 -4.76 0.0000
CO 5.1560 1.3675 3.77 0.0002
HR 0.0973 0.0145 6.69 0.0000
NO2 0.0687 0.0143 4.82 0.0000
C_O3 0.8663 0.0075 114.89 0.0000
PM10 0.0196 0.0038 5.09 0.0000
TC 0.3127 0.0452 6.92 0.0000
WS -1.8873 0.2809 -6.72 0.0000
SR 0.0058 0.0008 7.30 0.0000
Hour 0.0107 0.0296 0.36 0.7165
Estimate Std. Error t value Pr(>|t|)
(Intercept) -19.7894 3.1894 -6.20 0.0000
CO 3.0262 2.9799 1.02 0.3099
HR 0.1871 0.0241 7.78 0.0000
NO2 0.0223 0.0205 1.09 0.2779
C_O3 0.9192 0.0097 94.68 0.0000
PM10 0.0179 0.0047 3.80 0.0001
TC 0.2375 0.0664 3.58 0.0004
WS -1.4867 0.3914 -3.80 0.0001
SR 0.0059 0.0011 5.11 0.0000
Hour 0.5376 0.0935 5.75 0.0000
Estimate Std. Error t value Pr(>|t|)
(Intercept) -6.5155 2.2363 -2.91 0.0036
CO 1.7822 1.6339 1.09 0.2755
HR 0.0866 0.0187 4.63 0.0000
NO2 0.1550 0.0202 7.67 0.0000
C_O3 0.7745 0.0123 62.93 0.0000
PM10 0.0274 0.0069 3.94 0.0001
TC 0.4258 0.0617 6.90 0.0000
WS -3.0934 0.4273 -7.24 0.0000
Hour -0.0757 0.0302 -2.50 0.0123
r2 r2adj
M.lm 0.73 0.73
M.lmp 0.74 0.74
M.lmp_d 0.79 0.79
M.lmp_n 0.58 0.58
Model Tst
M.lm 0.86 0.00
M.lmp 0.86 0.82
M.lmp_d 0.89 0.86
M.lmp_n 0.76 0.71
tb1=M.lm[["model"]]$performances
table01=xtable(tb1)
print(table01,type="html")
dummyparameter error dispersion
1 0.00 219.25 16.38
plt(NMHM.trn,11,ylb=expression(O[3] ~ LM ~ predicted),
     fich="./plots/O3_LM.pdf",model=M.lm,pfile=TRUE)

png 2

#
tb2=M.lmp[["model"]]$performances
table02=xtable(tb2)
print(table02,type="html")
dummyparameter error dispersion
1 0.00 217.93 20.47
  plt(NMHM.trn,11,ylb=expression(O[3] ~ LM ~ predicted),
     fich="./plots/O3_LM_p.pdf",model=M.lmp,pfile=TRUE)

png 2

  plt_prd(NMHM.tst[,-1],11,ylb=expression(O[3] ~ LM ~ predicted),
        fich="./plots/O3_LM_pt.pdf",M.lmp,pfile=TRUE)

[1] 0.8242181

#
tb3=M.lmp_d[["model"]]$performances
table03=xtable(tb3)
print(table03,type="html")
dummyparameter error dispersion
1 0.00 231.61 31.26
  plt(NMHM.trn_d,11,ylb=expression(O[3] ~ LM ~ predicted),
     fich="./plots/O3_LM_p_d.pdf",model=M.lmp_d,pfile=TRUE)

png 2

  plt_prd(NMHM.tst_d[,-1],11,ylb=expression(O[3] ~ LM ~ predicted),
        fich="./plots/O3_LM_pt_d.pdf",M.lmp_d,pfile=TRUE)

[1] 0.8625648

#
tb4=M.lmp_n[["model"]]$performances
table04=xtable(tb4)
print(table04,type="html")
dummyparameter error dispersion
1 0.00 191.84 42.00
  plt(NMHM.trn_n,11,ylb=expression(O[3] ~ LM ~ predicted),
     fich="./plots/O3_LM_p_n.pdf",model=M.lmp_n,pfile=TRUE)

png 2

  plt_prd(NMHM.tst_n[,-1],11,ylb=expression(O[3] ~ LM ~ predicted),
        fich="./plots/O3_LM_pt_n.pdf",M.lmp_n,pfile=TRUE)

[1] 0.7119393

#
print(xtable(cc.lm),type="html")
Model Tst
M.lm 0.86 0.00
M.lmp 0.86 0.82
M.lmp_d 0.89 0.86
M.lmp_n 0.76 0.71
#

The results found account for a correlation of 0.8572399. It will considered as a reference.

SVM approach

A wrapper for SVM based regressors is applied looking for best parameters of learning.

## <!-- html table generated in R 3.2.2 by xtable 1.8-0 package -->
## <!-- Mon Dec 28 07:34:39 2015 -->
## <table border=1>
## <tr> <th>  </th> <th> kernel </th> <th> cost </th> <th> eps </th> <th> gamma </th> <th> nu </th> <th> rhp </th> <th> nsv </th> <th> degree </th> <th> type </th>  </tr>
##   <tr> <td align="right"> 1 </td> <td align="right"> 2.00 </td> <td align="right"> 8.00 </td> <td align="right"> 0.10 </td> <td align="right"> 0.25 </td> <td align="right"> 0.50 </td> <td align="right"> -0.54 </td> <td align="right"> 5882 </td> <td align="right"> 3.00 </td> <td align="right"> 3.00 </td> </tr>
##    </table>
## <!-- html table generated in R 3.2.2 by xtable 1.8-0 package -->
## <!-- Mon Dec 28 07:34:39 2015 -->
## <table border=1>
## <tr> <th>  </th> <th> kernel </th> <th> cost </th> <th> eps </th> <th> gamma </th> <th> nu </th> <th> rhp </th> <th> nsv </th> <th> degree </th> <th> type </th>  </tr>
##   <tr> <td align="right"> 1 </td> <td align="right"> 2.00 </td> <td align="right"> 4.00 </td> <td align="right"> 0.10 </td> <td align="right"> 0.25 </td> <td align="right"> 0.50 </td> <td align="right"> -0.50 </td> <td align="right"> 5038 </td> <td align="right"> 3.00 </td> <td align="right"> 3.00 </td> </tr>
##    </table>
## <!-- html table generated in R 3.2.2 by xtable 1.8-0 package -->
## <!-- Mon Dec 28 07:34:39 2015 -->
## <table border=1>
## <tr> <th>  </th> <th> kernel </th> <th> cost </th> <th> eps </th> <th> gamma </th> <th> nu </th> <th> rhp </th> <th> nsv </th> <th> degree </th> <th> type </th>  </tr>
##   <tr> <td align="right"> 1 </td> <td align="right"> 2.00 </td> <td align="right"> 4.00 </td> <td align="right"> 0.10 </td> <td align="right"> 0.25 </td> <td align="right"> 0.50 </td> <td align="right"> -0.34 </td> <td align="right"> 2413 </td> <td align="right"> 3.00 </td> <td align="right"> 3.00 </td> </tr>
##    </table>
## <!-- html table generated in R 3.2.2 by xtable 1.8-0 package -->
## <!-- Mon Dec 28 07:34:39 2015 -->
## <table border=1>
## <tr> <th>  </th> <th> kernel </th> <th> cost </th> <th> eps </th> <th> gamma </th> <th> nu </th> <th> rhp </th> <th> nsv </th> <th> degree </th> <th> type </th>  </tr>
##   <tr> <td align="right"> 1 </td> <td align="right"> 2.00 </td> <td align="right"> 8.00 </td> <td align="right"> 0.10 </td> <td align="right"> 0.50 </td> <td align="right"> 0.50 </td> <td align="right"> -0.44 </td> <td align="right"> 2689 </td> <td align="right"> 3.00 </td> <td align="right"> 3.00 </td> </tr>
##    </table>
## <!-- html table generated in R 3.2.2 by xtable 1.8-0 package -->
## <!-- Mon Dec 28 07:34:39 2015 -->
## <table border=1>
## <tr> <th>  </th> <th> r2 </th> <th> r2adj </th>  </tr>
##   <tr> <td align="right"> M.svm </td> <td align="right"> 0.89 </td> <td align="right"> 0.89 </td> </tr>
##   <tr> <td align="right"> M.svmp </td> <td align="right"> 0.88 </td> <td align="right"> 0.88 </td> </tr>
##   <tr> <td align="right"> M.svmp_d </td> <td align="right"> 0.92 </td> <td align="right"> 0.92 </td> </tr>
##   <tr> <td align="right"> M.svmp_n </td> <td align="right"> 0.91 </td> <td align="right"> 0.91 </td> </tr>
##    </table>
## <!-- html table generated in R 3.2.2 by xtable 1.8-0 package -->
## <!-- Mon Dec 28 07:34:39 2015 -->
## <table border=1>
## <tr> <th>  </th> <th> Model </th> <th> Tst </th>  </tr>
##   <tr> <td align="right"> M.svm </td> <td align="right"> 0.94 </td> <td align="right"> 0.00 </td> </tr>
##   <tr> <td align="right"> M.svmp </td> <td align="right"> 0.94 </td> <td align="right"> 0.87 </td> </tr>
##   <tr> <td align="right"> M.svmp_d </td> <td align="right"> 0.96 </td> <td align="right"> 0.90 </td> </tr>
##   <tr> <td align="right"> M.svmp_n </td> <td align="right"> 0.96 </td> <td align="right"> 0.81 </td> </tr>
##    </table>
tb1=t(summary(M.svm[["model"]]$performances))
table01=xtable(tb1)
print(table01,type="html")
V1 V2 V3 V4 V5 V6
 gamma </td> <td> Min.   :0.1250   </td> <td> 1st Qu.:0.1250   </td> <td> Median :0.2500   </td> <td> Mean   :0.2917   </td> <td> 3rd Qu.:0.5000   </td> <td> Max.   :0.5000   </td> </tr>
  cost </td> <td> Min.   :2.000   </td> <td> 1st Qu.:2.000   </td> <td> Median :4.000   </td> <td> Mean   :4.667   </td> <td> 3rd Qu.:8.000   </td> <td> Max.   :8.000   </td> </tr>
 error </td> <td> Min.   :159.8   </td> <td> 1st Qu.:163.0   </td> <td> Median :165.3   </td> <td> Mean   :165.4   </td> <td> 3rd Qu.:166.3   </td> <td> Max.   :173.5   </td> </tr>
dispersion Min. :13.16 1st Qu.:13.57 Median :14.91 Mean :14.93 3rd Qu.:15.50 Max. :17.47
  plt(NMHM,11,ylb=expression(O[3] ~ SVM ~ predicted),
     fich="./plots/O3_SVM.pdf",model=M.svm,pfile=TRUE)

png 2

#
tb2=t(summary(M.svmp[["model"]]$performances))
table02=xtable(tb2)
print(table02,type="html")
V1 V2 V3 V4 V5 V6
 gamma </td> <td> Min.   :0.1250   </td> <td> 1st Qu.:0.1250   </td> <td> Median :0.2500   </td> <td> Mean   :0.2917   </td> <td> 3rd Qu.:0.5000   </td> <td> Max.   :0.5000   </td> </tr>
  cost </td> <td> Min.   :2.000   </td> <td> 1st Qu.:2.000   </td> <td> Median :4.000   </td> <td> Mean   :4.667   </td> <td> 3rd Qu.:8.000   </td> <td> Max.   :8.000   </td> </tr>
 error </td> <td> Min.   :162.7   </td> <td> 1st Qu.:166.3   </td> <td> Median :169.5   </td> <td> Mean   :169.4   </td> <td> 3rd Qu.:173.1   </td> <td> Max.   :175.9   </td> </tr>
dispersion Min. :11.64 1st Qu.:13.56 Median :15.13 Mean :15.49 3rd Qu.:16.43 Max. :20.38
  plt(NMHM.trn,11,ylb=expression(O[3] ~ SVM ~ predicted),
     fich="./plots/O3_SVM_p.pdf",model=M.svmp,pfile=TRUE)

png 2

  plt_prd(NMHM.tst[,-1],11,ylb=expression(O[3] ~ SVM ~ predicted),
        fich="./plots/O3_SVM_pt.pdf",M.svmp,pfile=TRUE)

[1] 0.8719688

#
tb3=t(summary(M.svmp_d[["model"]]$performances))
table03=xtable(tb3)
print(table03,type="html")
V1 V2 V3 V4 V5 V6
 gamma </td> <td> Min.   :0.1250   </td> <td> 1st Qu.:0.1250   </td> <td> Median :0.2500   </td> <td> Mean   :0.2917   </td> <td> 3rd Qu.:0.5000   </td> <td> Max.   :0.5000   </td> </tr>
  cost </td> <td> Min.   :2.000   </td> <td> 1st Qu.:2.000   </td> <td> Median :4.000   </td> <td> Mean   :4.667   </td> <td> 3rd Qu.:8.000   </td> <td> Max.   :8.000   </td> </tr>
 error </td> <td> Min.   :175.9   </td> <td> 1st Qu.:179.3   </td> <td> Median :179.7   </td> <td> Mean   :185.8   </td> <td> 3rd Qu.:195.5   </td> <td> Max.   :205.5   </td> </tr>
dispersion Min. :17.69 1st Qu.:19.33 Median :19.97 Mean :21.93 3rd Qu.:26.05 Max. :27.91
  plt(NMHM.trn_d,11,ylb=expression(O[3] ~ SVM ~ predicted),
     fich="./plots/O3_SVM_p_d.pdf",model=M.svmp_d,pfile=TRUE)

png 2

  plt_prd(NMHM.tst_d[,-1],11,ylb=expression(O[3] ~ SVM ~ predicted),
        fich="./plots/O3_SVM_pt_d.pdf",M.svmp_d,pfile=TRUE)

[1] 0.9028872

#
tb4=t(summary(M.svmp_n[["model"]]$performances))
table04=xtable(tb4)
print(table04,type="html")
V1 V2 V3 V4 V5 V6
 gamma </td> <td> Min.   :0.1250   </td> <td> 1st Qu.:0.1250   </td> <td> Median :0.2500   </td> <td> Mean   :0.2917   </td> <td> 3rd Qu.:0.5000   </td> <td> Max.   :0.5000   </td> </tr>
  cost </td> <td> Min.   :2.000   </td> <td> 1st Qu.:2.000   </td> <td> Median :4.000   </td> <td> Mean   :4.667   </td> <td> 3rd Qu.:8.000   </td> <td> Max.   :8.000   </td> </tr>
 error </td> <td> Min.   :144.2   </td> <td> 1st Qu.:147.6   </td> <td> Median :151.0   </td> <td> Mean   :152.6   </td> <td> 3rd Qu.:158.9   </td> <td> Max.   :162.8   </td> </tr>
dispersion Min. :29.42 1st Qu.:29.86 Median :30.64 Mean :30.82 3rd Qu.:32.01 Max. :32.62
  plt(NMHM.trn_n,11,ylb=expression(O[3] ~ SVM ~ predicted),
     fich="./plots/O3_SVM_p_n.pdf",model=M.svmp_n,pfile=TRUE)

png 2

  plt_prd(NMHM.tst_n[,-1],11,ylb=expression(O[3] ~ SVM ~ predicted),
        fich="./plots/O3_SVM_pt_n.pdf",M.svmp_n,pfile=TRUE)

[1] 0.8099317

#
print(xtable(cc.svm),type="html")
Model Tst
M.svm 0.94 0.00
M.svmp 0.94 0.87
M.svmp_d 0.96 0.90
M.svmp_n 0.96 0.81

The results found account for a correlation of 0.9449091 which outperforms the initial proposal carried out by Prof. Ouarzazi.

RandomForest

Let’s test the randomForest technology.

## <!-- html table generated in R 3.2.2 by xtable 1.8-0 package -->
## <!-- Mon Dec 28 12:59:25 2015 -->
## <table border=1>
## <tr> <th>  </th> <th> %IncMSE </th> <th> IncNodePurity </th>  </tr>
##   <tr> <td align="right"> CO </td> <td align="right"> 57.03 </td> <td align="right"> 229172.28 </td> </tr>
##   <tr> <td align="right"> HR </td> <td align="right"> 133.74 </td> <td align="right"> 271488.63 </td> </tr>
##   <tr> <td align="right"> NO2 </td> <td align="right"> 94.59 </td> <td align="right"> 367136.01 </td> </tr>
##   <tr> <td align="right"> C_O3 </td> <td align="right"> 1021.33 </td> <td align="right"> 4090432.62 </td> </tr>
##   <tr> <td align="right"> PM10 </td> <td align="right"> 37.58 </td> <td align="right"> 224834.50 </td> </tr>
##   <tr> <td align="right"> TC </td> <td align="right"> 131.32 </td> <td align="right"> 378995.90 </td> </tr>
##   <tr> <td align="right"> WS </td> <td align="right"> 20.67 </td> <td align="right"> 157950.16 </td> </tr>
##   <tr> <td align="right"> SR </td> <td align="right"> 103.38 </td> <td align="right"> 523152.17 </td> </tr>
##   <tr> <td align="right"> Hour </td> <td align="right"> 117.01 </td> <td align="right"> 334947.28 </td> </tr>
##    </table>
## <!-- html table generated in R 3.2.2 by xtable 1.8-0 package -->
## <!-- Mon Dec 28 12:59:25 2015 -->
## <table border=1>
## <tr> <th>  </th> <th> %IncMSE </th> <th> IncNodePurity </th>  </tr>
##   <tr> <td align="right"> CO </td> <td align="right"> 56.62 </td> <td align="right"> 193342.88 </td> </tr>
##   <tr> <td align="right"> HR </td> <td align="right"> 125.76 </td> <td align="right"> 243566.46 </td> </tr>
##   <tr> <td align="right"> NO2 </td> <td align="right"> 95.58 </td> <td align="right"> 334385.66 </td> </tr>
##   <tr> <td align="right"> C_O3 </td> <td align="right"> 1045.73 </td> <td align="right"> 3536722.84 </td> </tr>
##   <tr> <td align="right"> PM10 </td> <td align="right"> 35.86 </td> <td align="right"> 185906.68 </td> </tr>
##   <tr> <td align="right"> TC </td> <td align="right"> 128.10 </td> <td align="right"> 335533.71 </td> </tr>
##   <tr> <td align="right"> WS </td> <td align="right"> 18.67 </td> <td align="right"> 133251.26 </td> </tr>
##   <tr> <td align="right"> SR </td> <td align="right"> 100.75 </td> <td align="right"> 479583.70 </td> </tr>
##   <tr> <td align="right"> Hour </td> <td align="right"> 116.83 </td> <td align="right"> 304133.73 </td> </tr>
##    </table>
## <!-- html table generated in R 3.2.2 by xtable 1.8-0 package -->
## <!-- Mon Dec 28 12:59:25 2015 -->
## <table border=1>
## <tr> <th>  </th> <th> %IncMSE </th> <th> IncNodePurity </th>  </tr>
##   <tr> <td align="right"> CO </td> <td align="right"> 43.97 </td> <td align="right"> 111895.95 </td> </tr>
##   <tr> <td align="right"> HR </td> <td align="right"> 139.01 </td> <td align="right"> 151654.95 </td> </tr>
##   <tr> <td align="right"> NO2 </td> <td align="right"> 121.97 </td> <td align="right"> 241964.20 </td> </tr>
##   <tr> <td align="right"> C_O3 </td> <td align="right"> 1476.92 </td> <td align="right"> 2367957.40 </td> </tr>
##   <tr> <td align="right"> PM10 </td> <td align="right"> 33.74 </td> <td align="right"> 97727.43 </td> </tr>
##   <tr> <td align="right"> TC </td> <td align="right"> 109.62 </td> <td align="right"> 179191.81 </td> </tr>
##   <tr> <td align="right"> WS </td> <td align="right"> 21.40 </td> <td align="right"> 72493.98 </td> </tr>
##   <tr> <td align="right"> SR </td> <td align="right"> 96.46 </td> <td align="right"> 223458.28 </td> </tr>
##   <tr> <td align="right"> Hour </td> <td align="right"> 117.91 </td> <td align="right"> 241243.18 </td> </tr>
##    </table>
## <!-- html table generated in R 3.2.2 by xtable 1.8-0 package -->
## <!-- Mon Dec 28 12:59:25 2015 -->
## <table border=1>
## <tr> <th>  </th> <th> %IncMSE </th> <th> IncNodePurity </th>  </tr>
##   <tr> <td align="right"> CO </td> <td align="right"> 50.39 </td> <td align="right"> 81616.85 </td> </tr>
##   <tr> <td align="right"> HR </td> <td align="right"> 60.20 </td> <td align="right"> 88190.99 </td> </tr>
##   <tr> <td align="right"> NO2 </td> <td align="right"> 90.87 </td> <td align="right"> 140871.96 </td> </tr>
##   <tr> <td align="right"> C_O3 </td> <td align="right"> 527.45 </td> <td align="right"> 876222.74 </td> </tr>
##   <tr> <td align="right"> PM10 </td> <td align="right"> 40.11 </td> <td align="right"> 91241.72 </td> </tr>
##   <tr> <td align="right"> TC </td> <td align="right"> 91.88 </td> <td align="right"> 116270.38 </td> </tr>
##   <tr> <td align="right"> WS </td> <td align="right"> 19.25 </td> <td align="right"> 62142.00 </td> </tr>
##   <tr> <td align="right"> Hour </td> <td align="right"> 46.72 </td> <td align="right"> 65045.42 </td> </tr>
##    </table>
## <!-- html table generated in R 3.2.2 by xtable 1.8-0 package -->
## <!-- Mon Dec 28 12:59:25 2015 -->
## <table border=1>
## <tr> <th>  </th> <th> r2 </th> <th> r2adj </th>  </tr>
##   <tr> <td align="right"> M.rf </td> <td align="right"> 0.97 </td> <td align="right"> 0.97 </td> </tr>
##   <tr> <td align="right"> M.rfp </td> <td align="right"> 0.97 </td> <td align="right"> 0.97 </td> </tr>
##   <tr> <td align="right"> M.rfp_d </td> <td align="right"> 0.97 </td> <td align="right"> 0.97 </td> </tr>
##   <tr> <td align="right"> M.rfp_n </td> <td align="right"> 0.95 </td> <td align="right"> 0.95 </td> </tr>
##    </table>
## <!-- html table generated in R 3.2.2 by xtable 1.8-0 package -->
## <!-- Mon Dec 28 12:59:25 2015 -->
## <table border=1>
## <tr> <th>  </th> <th> Model </th> <th> Tst </th>  </tr>
##   <tr> <td align="right"> M.rf </td> <td align="right"> 0.98 </td> <td align="right"> 0.00 </td> </tr>
##   <tr> <td align="right"> M.rfp </td> <td align="right"> 0.98 </td> <td align="right"> 0.89 </td> </tr>
##   <tr> <td align="right"> M.rfp_d </td> <td align="right"> 0.99 </td> <td align="right"> 0.91 </td> </tr>
##   <tr> <td align="right"> M.rfp_n </td> <td align="right"> 0.98 </td> <td align="right"> 0.82 </td> </tr>
##    </table>
mtry ntree error dispersion
1 2 300.00 150.36 10.03
2 3 300.00 145.17 9.31
3 4 300.00 144.58 9.20
4 5 300.00 146.13 10.18
5 6 300.00 146.80 10.57
6 2 500.00 149.60 9.57
7 3 500.00 145.07 9.59
8 4 500.00 144.70 9.30
9 5 500.00 145.35 10.29
10 6 500.00 146.95 10.58
11 2 700.00 149.93 9.67
12 3 700.00 144.77 9.29
13 4 700.00 144.53 9.56
14 5 700.00 145.30 9.73
15 6 700.00 146.79 10.32
16 2 900.00 149.57 9.40
17 3 900.00 145.01 9.42
18 4 900.00 144.62 9.72
19 5 900.00 145.42 9.96
20 6 900.00 146.66 10.31
png 2
mtry ntree error dispersion
1 2 300.00 155.14 14.79
2 3 300.00 149.70 15.88
3 4 300.00 149.19 15.96
4 5 300.00 151.06 16.77
5 6 300.00 152.57 17.99
6 2 500.00 154.63 15.14
7 3 500.00 149.52 15.63
8 4 500.00 149.41 15.80
9 5 500.00 150.29 16.56
10 6 500.00 152.39 17.29
11 2 700.00 154.79 15.78
12 3 700.00 148.99 14.94
13 4 700.00 149.31 15.89
14 5 700.00 150.32 16.73
15 6 700.00 151.85 17.26
16 2 900.00 154.61 15.35
17 3 900.00 149.05 15.41
18 4 900.00 148.81 15.46
19 5 900.00 150.47 16.90
20 6 900.00 152.30 17.57
png 2 [1] 0.8881983
mtry ntree error dispersion
1 2 300.00 175.33 28.35
2 3 300.00 166.25 27.49
3 4 300.00 164.33 29.89
4 5 300.00 165.82 29.35
5 6 300.00 167.31 31.04
6 2 500.00 175.50 29.69
7 3 500.00 165.84 28.74
8 4 500.00 164.58 28.82
9 5 500.00 165.74 28.81
10 6 500.00 167.81 31.08
11 2 700.00 174.97 28.63
12 3 700.00 165.66 26.88
13 4 700.00 164.74 28.95
14 5 700.00 165.36 29.74
15 6 700.00 167.20 30.03
16 2 900.00 174.95 28.71
17 3 900.00 166.53 28.30
18 4 900.00 164.09 27.86
19 5 900.00 165.39 30.27
20 6 900.00 166.80 30.51
png 2 [1] 0.9115427
mtry ntree error dispersion
1 2 300.00 140.02 13.53
2 3 300.00 138.08 13.68
3 4 300.00 136.01 13.45
4 5 300.00 137.97 14.32
5 6 300.00 137.97 13.23
6 2 500.00 139.92 12.92
7 3 500.00 136.49 13.20
8 4 500.00 137.30 14.10
9 5 500.00 137.87 13.57
10 6 500.00 138.23 13.63
11 2 700.00 139.26 13.60
12 3 700.00 136.68 12.93
13 4 700.00 136.69 13.54
14 5 700.00 137.43 13.65
15 6 700.00 137.41 13.72
16 2 900.00 139.27 12.76
17 3 900.00 136.69 13.21
18 4 900.00 137.21 13.52
19 5 900.00 137.11 13.62
20 6 900.00 138.10 13.87
png 2 [1] 0.8158353
Model Tst
M.rf 0.98 0.00
M.rfp 0.98 0.89
M.rfp_d 0.99 0.91
M.rfp_n 0.98 0.82

The results found account for a correlation of 0.9849308.

FFNN: MLP

Let’s test backpropagation trained multilayer perceptron type neural network do their work.

## <!-- html table generated in R 3.2.2 by xtable 1.8-0 package -->
## <!-- Mon Dec 28 13:55:39 2015 -->
## <table border=1>
## <tr> <th>  </th> <th> M.mlp[["model"]]$best.model$n </th>  </tr>
##   <tr> <td align="right"> 1 </td> <td align="right">   9 </td> </tr>
##   <tr> <td align="right"> 2 </td> <td align="right">  20 </td> </tr>
##   <tr> <td align="right"> 3 </td> <td align="right">   1 </td> </tr>
##    </table>
## <!-- html table generated in R 3.2.2 by xtable 1.8-0 package -->
## <!-- Mon Dec 28 13:55:39 2015 -->
## <table border=1>
## <tr> <th>  </th> <th> M.mlpp[["model"]]$best.model$n </th>  </tr>
##   <tr> <td align="right"> 1 </td> <td align="right">   9 </td> </tr>
##   <tr> <td align="right"> 2 </td> <td align="right">  20 </td> </tr>
##   <tr> <td align="right"> 3 </td> <td align="right">   1 </td> </tr>
##    </table>
## <!-- html table generated in R 3.2.2 by xtable 1.8-0 package -->
## <!-- Mon Dec 28 13:55:39 2015 -->
## <table border=1>
## <tr> <th>  </th> <th> M.mlpp_d[["model"]]$best.model$n </th>  </tr>
##   <tr> <td align="right"> 1 </td> <td align="right">   9 </td> </tr>
##   <tr> <td align="right"> 2 </td> <td align="right">   7 </td> </tr>
##   <tr> <td align="right"> 3 </td> <td align="right">   1 </td> </tr>
##    </table>
## <!-- html table generated in R 3.2.2 by xtable 1.8-0 package -->
## <!-- Mon Dec 28 13:55:39 2015 -->
## <table border=1>
## <tr> <th>  </th> <th> M.mlpp_n[["model"]]$best.model$n </th>  </tr>
##   <tr> <td align="right"> 1 </td> <td align="right">   8 </td> </tr>
##   <tr> <td align="right"> 2 </td> <td align="right">   6 </td> </tr>
##   <tr> <td align="right"> 3 </td> <td align="right">   1 </td> </tr>
##    </table>
## <!-- html table generated in R 3.2.2 by xtable 1.8-0 package -->
## <!-- Mon Dec 28 13:55:39 2015 -->
## <table border=1>
## <tr> <th>  </th> <th> r2 </th> <th> r2adj </th>  </tr>
##   <tr> <td align="right"> M.mlp </td> <td align="right"> 0.76 </td> <td align="right"> 0.76 </td> </tr>
##   <tr> <td align="right"> M.mlpp </td> <td align="right"> 0.76 </td> <td align="right"> 0.76 </td> </tr>
##   <tr> <td align="right"> M.mlpp_d </td> <td align="right"> 0.81 </td> <td align="right"> 0.81 </td> </tr>
##   <tr> <td align="right"> M.mlpp_n </td> <td align="right"> 0.61 </td> <td align="right"> 0.61 </td> </tr>
##    </table>
## <!-- html table generated in R 3.2.2 by xtable 1.8-0 package -->
## <!-- Mon Dec 28 13:55:39 2015 -->
## <table border=1>
## <tr> <th>  </th> <th> Model </th> <th> Tst </th>  </tr>
##   <tr> <td align="right"> M.mlp </td> <td align="right"> 0.87 </td> <td align="right"> 0.00 </td> </tr>
##   <tr> <td align="right"> M.mlpp </td> <td align="right"> 0.87 </td> <td align="right"> 0.83 </td> </tr>
##   <tr> <td align="right"> M.mlpp_d </td> <td align="right"> 0.90 </td> <td align="right"> 0.87 </td> </tr>
##   <tr> <td align="right"> M.mlpp_n </td> <td align="right"> 0.78 </td> <td align="right"> 0.71 </td> </tr>
##    </table>
tb1=M.mlp[["model"]]$performances
table01=xtable(tb1)
print(table01,type="html")
linout size maxit decay abstol reltol trace rang Var9 skip error dispersion
1 TRUE 4 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 216.57 11.33
2 TRUE 5 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 218.00 13.91
3 TRUE 6 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 217.75 14.70
4 TRUE 7 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 217.20 11.62
5 TRUE 8 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 214.80 13.95
6 TRUE 9 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 216.86 14.99
7 TRUE 10 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 214.67 11.83
8 TRUE 11 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 214.12 13.02
9 TRUE 12 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 212.56 11.02
10 TRUE 13 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 216.38 16.40
11 TRUE 14 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 214.21 16.30
12 TRUE 15 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 211.84 14.72
13 TRUE 16 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 214.56 13.80
14 TRUE 17 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 212.16 13.63
15 TRUE 18 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 214.29 12.25
16 TRUE 19 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 213.80 12.58
17 TRUE 20 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 209.43 15.63
  plt(NMHM,11,ylb=expression(O[3] ~ MLP ~ predicted),
     fich="./plots/O3_MLP.pdf",model=M.mlp,pfile=TRUE)
## Loading required package: scales
## 
## Attaching package: 'scales'
## 
## The following object is masked from 'package:plotrix':
## 
##     rescale
## 
## The following object is masked from 'package:kernlab':
## 
##     alpha

png 2

#
tb2=M.mlpp[["model"]]$performances
table02=xtable(tb2)
print(table02,type="html")
linout size maxit decay abstol reltol trace rang Var9 skip error dispersion
1 TRUE 4 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 214.30 20.31
2 TRUE 5 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 213.79 19.43
3 TRUE 6 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 217.09 18.04
4 TRUE 7 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 216.47 21.43
5 TRUE 8 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 214.68 17.53
6 TRUE 9 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 215.47 22.63
7 TRUE 10 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 214.56 20.16
8 TRUE 11 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 213.87 17.59
9 TRUE 12 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 216.00 19.40
10 TRUE 13 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 213.23 18.96
11 TRUE 14 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 212.64 18.24
12 TRUE 15 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 214.81 19.55
13 TRUE 16 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 213.85 19.96
14 TRUE 17 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 215.60 21.36
15 TRUE 18 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 212.57 16.52
16 TRUE 19 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 214.74 19.82
17 TRUE 20 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 210.40 22.89
  plt(NMHM.trn,11,ylb=expression(O[3] ~ MLP ~ predicted),
     fich="./plots/O3_MLP_p.pdf",model=M.mlpp,pfile=TRUE)

png 2

  plt_prd(NMHM.tst[,-1],11,ylb=expression(O[3] ~ MLP ~ predicted),
        fich="./plots/O3_MLP_pt.pdf",M.mlpp,pfile=TRUE)

[,1][1,] 0.8315941

#
tb3=M.mlpp_d[["model"]]$performances
table03=xtable(tb3)
print(table03,type="html")
linout size maxit decay abstol reltol trace rang Var9 skip error dispersion
1 TRUE 4 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 226.34 28.45
2 TRUE 5 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 231.26 29.15
3 TRUE 6 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 231.00 33.93
4 TRUE 7 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 224.57 24.86
5 TRUE 8 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 225.00 30.45
6 TRUE 9 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 230.51 29.18
7 TRUE 10 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 228.48 30.54
8 TRUE 11 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 229.91 27.61
9 TRUE 12 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 224.84 27.82
10 TRUE 13 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 229.69 26.32
11 TRUE 14 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 229.44 22.64
12 TRUE 15 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 235.05 33.53
13 TRUE 16 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 229.95 31.85
14 TRUE 17 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 229.49 31.32
15 TRUE 18 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 226.67 31.52
16 TRUE 19 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 225.61 28.68
17 TRUE 20 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 228.80 29.88
  plt(NMHM.trn_d,11,ylb=expression(O[3] ~ MLP ~ predicted),
     fich="./plots/O3_MLP_p_d.pdf",model=M.mlpp_d,pfile=TRUE)

png 2

  plt_prd(NMHM.tst_d[,-1],11,ylb=expression(O[3] ~ MLP ~ predicted),
        fich="./plots/O3_MLP_pt_d.pdf",M.mlpp_d,pfile=TRUE)

[,1][1,] 0.8666309

#
tb4=M.mlpp_n[["model"]]$performances
table04=xtable(tb4)
print(table04,type="html")
linout size maxit decay abstol reltol trace rang Var9 skip error dispersion
1 TRUE 4 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 191.46 18.48
2 TRUE 5 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 190.72 20.75
3 TRUE 6 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 188.26 19.42
4 TRUE 7 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 192.25 20.39
5 TRUE 8 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 188.82 19.06
6 TRUE 9 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 193.21 21.39
7 TRUE 10 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 193.35 21.90
8 TRUE 11 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 193.26 25.22
9 TRUE 12 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 190.42 26.21
10 TRUE 13 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 195.78 16.96
11 TRUE 14 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 192.55 24.31
12 TRUE 15 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 191.25 27.31
13 TRUE 16 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 188.98 24.64
14 TRUE 17 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 192.78 24.03
15 TRUE 18 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 195.06 23.77
16 TRUE 19 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 190.44 21.26
17 TRUE 20 50000.00 0.02 0.00 0.00 FALSE 0.00 7.00 TRUE 193.56 26.52
  plt(NMHM.trn_n,11,ylb=expression(O[3] ~ MLP ~ predicted),
     fich="./plots/O3_MLP_p_n.pdf",model=M.mlpp_n,pfile=TRUE)

png 2

  plt_prd(NMHM.tst_n[,-1],11,ylb=expression(O[3] ~ MLP ~ predicted),
        fich="./plots/O3_MLP_pt_n.pdf",M.mlpp_n,pfile=TRUE)

[,1][1,] 0.7132626

#
print(xtable(cc.mlp),type="html")
Model Tst
M.mlp 0.87 0.00
M.mlpp 0.87 0.83
M.mlpp_d 0.90 0.87
M.mlpp_n 0.78 0.71
#

The results found account for a correlation of 0.8703202.

CART solution

Now we will use classification and regression trees to have a look at their capabilities for this particular problem.

## <!-- html table generated in R 3.2.2 by xtable 1.8-0 package -->
## <!-- Mon Dec 28 13:57:10 2015 -->
## <table border=1>
## <tr> <th>  </th> <th> CP </th> <th> nsplit </th> <th> rel error </th> <th> xerror </th> <th> xstd </th>  </tr>
##   <tr> <td align="right"> 1 </td> <td align="right"> 0.49 </td> <td align="right"> 0.00 </td> <td align="right"> 1.00 </td> <td align="right"> 1.00 </td> <td align="right"> 0.03 </td> </tr>
##   <tr> <td align="right"> 2 </td> <td align="right"> 0.12 </td> <td align="right"> 1.00 </td> <td align="right"> 0.51 </td> <td align="right"> 0.52 </td> <td align="right"> 0.02 </td> </tr>
##   <tr> <td align="right"> 3 </td> <td align="right"> 0.06 </td> <td align="right"> 2.00 </td> <td align="right"> 0.40 </td> <td align="right"> 0.41 </td> <td align="right"> 0.01 </td> </tr>
##   <tr> <td align="right"> 4 </td> <td align="right"> 0.04 </td> <td align="right"> 3.00 </td> <td align="right"> 0.34 </td> <td align="right"> 0.35 </td> <td align="right"> 0.01 </td> </tr>
##   <tr> <td align="right"> 5 </td> <td align="right"> 0.02 </td> <td align="right"> 4.00 </td> <td align="right"> 0.30 </td> <td align="right"> 0.31 </td> <td align="right"> 0.01 </td> </tr>
##    </table>
## <!-- html table generated in R 3.2.2 by xtable 1.8-0 package -->
## <!-- Mon Dec 28 13:57:10 2015 -->
## <table border=1>
## <tr> <th>  </th> <th> CP </th> <th> nsplit </th> <th> rel error </th> <th> xerror </th> <th> xstd </th>  </tr>
##   <tr> <td align="right"> 1 </td> <td align="right"> 0.49 </td> <td align="right"> 0.00 </td> <td align="right"> 1.00 </td> <td align="right"> 1.00 </td> <td align="right"> 0.03 </td> </tr>
##   <tr> <td align="right"> 2 </td> <td align="right"> 0.12 </td> <td align="right"> 1.00 </td> <td align="right"> 0.51 </td> <td align="right"> 0.52 </td> <td align="right"> 0.02 </td> </tr>
##   <tr> <td align="right"> 3 </td> <td align="right"> 0.06 </td> <td align="right"> 2.00 </td> <td align="right"> 0.39 </td> <td align="right"> 0.41 </td> <td align="right"> 0.01 </td> </tr>
##   <tr> <td align="right"> 4 </td> <td align="right"> 0.02 </td> <td align="right"> 3.00 </td> <td align="right"> 0.33 </td> <td align="right"> 0.34 </td> <td align="right"> 0.01 </td> </tr>
##   <tr> <td align="right"> 5 </td> <td align="right"> 0.02 </td> <td align="right"> 4.00 </td> <td align="right"> 0.31 </td> <td align="right"> 0.31 </td> <td align="right"> 0.01 </td> </tr>
##   <tr> <td align="right"> 6 </td> <td align="right"> 0.01 </td> <td align="right"> 5.00 </td> <td align="right"> 0.29 </td> <td align="right"> 0.29 </td> <td align="right"> 0.01 </td> </tr>
##    </table>
## <!-- html table generated in R 3.2.2 by xtable 1.8-0 package -->
## <!-- Mon Dec 28 13:57:10 2015 -->
## <table border=1>
## <tr> <th>  </th> <th> CP </th> <th> nsplit </th> <th> rel error </th> <th> xerror </th> <th> xstd </th>  </tr>
##   <tr> <td align="right"> 1 </td> <td align="right"> 0.53 </td> <td align="right"> 0.00 </td> <td align="right"> 1.00 </td> <td align="right"> 1.00 </td> <td align="right"> 0.04 </td> </tr>
##   <tr> <td align="right"> 2 </td> <td align="right"> 0.11 </td> <td align="right"> 1.00 </td> <td align="right"> 0.47 </td> <td align="right"> 0.47 </td> <td align="right"> 0.02 </td> </tr>
##   <tr> <td align="right"> 3 </td> <td align="right"> 0.08 </td> <td align="right"> 2.00 </td> <td align="right"> 0.36 </td> <td align="right"> 0.37 </td> <td align="right"> 0.01 </td> </tr>
##   <tr> <td align="right"> 4 </td> <td align="right"> 0.02 </td> <td align="right"> 3.00 </td> <td align="right"> 0.28 </td> <td align="right"> 0.29 </td> <td align="right"> 0.01 </td> </tr>
##   <tr> <td align="right"> 5 </td> <td align="right"> 0.01 </td> <td align="right"> 4.00 </td> <td align="right"> 0.26 </td> <td align="right"> 0.27 </td> <td align="right"> 0.01 </td> </tr>
##   <tr> <td align="right"> 6 </td> <td align="right"> 0.01 </td> <td align="right"> 5.00 </td> <td align="right"> 0.25 </td> <td align="right"> 0.26 </td> <td align="right"> 0.01 </td> </tr>
##   <tr> <td align="right"> 7 </td> <td align="right"> 0.01 </td> <td align="right"> 6.00 </td> <td align="right"> 0.23 </td> <td align="right"> 0.24 </td> <td align="right"> 0.01 </td> </tr>
##    </table>
## <!-- html table generated in R 3.2.2 by xtable 1.8-0 package -->
## <!-- Mon Dec 28 13:57:10 2015 -->
## <table border=1>
## <tr> <th>  </th> <th> CP </th> <th> nsplit </th> <th> rel error </th> <th> xerror </th> <th> xstd </th>  </tr>
##   <tr> <td align="right"> 1 </td> <td align="right"> 0.44 </td> <td align="right"> 0.00 </td> <td align="right"> 1.00 </td> <td align="right"> 1.00 </td> <td align="right"> 0.04 </td> </tr>
##   <tr> <td align="right"> 2 </td> <td align="right"> 0.05 </td> <td align="right"> 1.00 </td> <td align="right"> 0.56 </td> <td align="right"> 0.56 </td> <td align="right"> 0.03 </td> </tr>
##   <tr> <td align="right"> 3 </td> <td align="right"> 0.05 </td> <td align="right"> 2.00 </td> <td align="right"> 0.50 </td> <td align="right"> 0.52 </td> <td align="right"> 0.02 </td> </tr>
##   <tr> <td align="right"> 4 </td> <td align="right"> 0.01 </td> <td align="right"> 3.00 </td> <td align="right"> 0.45 </td> <td align="right"> 0.46 </td> <td align="right"> 0.02 </td> </tr>
##   <tr> <td align="right"> 5 </td> <td align="right"> 0.01 </td> <td align="right"> 4.00 </td> <td align="right"> 0.44 </td> <td align="right"> 0.46 </td> <td align="right"> 0.02 </td> </tr>
##    </table>
## <!-- html table generated in R 3.2.2 by xtable 1.8-0 package -->
## <!-- Mon Dec 28 13:57:10 2015 -->
## <table border=1>
## <tr> <th>  </th> <th> Model </th> <th> Tst </th>  </tr>
##   <tr> <td align="right"> M.rpt </td> <td align="right"> 0.84 </td> <td align="right"> 0.00 </td> </tr>
##   <tr> <td align="right"> M.rptp </td> <td align="right"> 0.84 </td> <td align="right"> 0.81 </td> </tr>
##   <tr> <td align="right"> M.rptp_d </td> <td align="right"> 0.88 </td> <td align="right"> 0.86 </td> </tr>
##   <tr> <td align="right"> M.rptp_n </td> <td align="right"> 0.75 </td> <td align="right"> 0.72 </td> </tr>
##    </table>
tb1=M.rpt[["model"]]$performances
table01=xtable(tb1)
print(table01,type="html")
method cp minsplit error dispersion
1 anova 0.01 3 259.20 19.90
2 anova 0.02 3 258.75 18.79
3 anova 0.03 3 261.82 19.86
4 anova 0.04 3 285.48 22.94
5 anova 0.05 3 288.85 19.16
6 anova 0.06 3 295.95 22.55
7 anova 0.07 3 338.75 23.29
8 anova 0.08 3 338.75 23.29
9 anova 0.09 3 338.75 23.29
10 anova 0.10 3 338.75 23.29
11 anova 0.01 4 259.20 19.90
12 anova 0.02 4 258.75 18.79
13 anova 0.03 4 261.82 19.86
14 anova 0.04 4 285.48 22.94
15 anova 0.05 4 288.85 19.16
16 anova 0.06 4 295.95 22.55
17 anova 0.07 4 338.75 23.29
18 anova 0.08 4 338.75 23.29
19 anova 0.09 4 338.75 23.29
20 anova 0.10 4 338.75 23.29
21 anova 0.01 5 259.20 19.90
22 anova 0.02 5 258.75 18.79
23 anova 0.03 5 261.82 19.86
24 anova 0.04 5 285.48 22.94
25 anova 0.05 5 288.85 19.16
26 anova 0.06 5 295.95 22.55
27 anova 0.07 5 338.75 23.29
28 anova 0.08 5 338.75 23.29
29 anova 0.09 5 338.75 23.29
30 anova 0.10 5 338.75 23.29
31 anova 0.01 6 259.20 19.90
32 anova 0.02 6 258.75 18.79
33 anova 0.03 6 261.82 19.86
34 anova 0.04 6 285.48 22.94
35 anova 0.05 6 288.85 19.16
36 anova 0.06 6 295.95 22.55
37 anova 0.07 6 338.75 23.29
38 anova 0.08 6 338.75 23.29
39 anova 0.09 6 338.75 23.29
40 anova 0.10 6 338.75 23.29
41 anova 0.01 7 259.20 19.90
42 anova 0.02 7 258.75 18.79
43 anova 0.03 7 261.82 19.86
44 anova 0.04 7 285.48 22.94
45 anova 0.05 7 288.85 19.16
46 anova 0.06 7 295.95 22.55
47 anova 0.07 7 338.75 23.29
48 anova 0.08 7 338.75 23.29
49 anova 0.09 7 338.75 23.29
50 anova 0.10 7 338.75 23.29
  plt(NMHM,11,ylb=expression(O[3] ~ CART ~ predicted),
     fich="./plots/O3_CRT.pdf",model=M.rpt,pfile=TRUE)

png 2

#
tb2=M.rptp[["model"]]$performances
table02=xtable(tb2)
print(table02,type="html")
method cp minsplit error dispersion
1 anova 0.01 3 252.10 25.90
2 anova 0.02 3 265.57 28.53
3 anova 0.03 3 281.56 33.19
4 anova 0.04 3 290.55 28.73
5 anova 0.05 3 290.55 28.73
6 anova 0.06 3 309.88 29.80
7 anova 0.07 3 319.93 38.48
8 anova 0.08 3 344.57 28.70
9 anova 0.09 3 344.57 28.70
10 anova 0.10 3 360.58 58.11
11 anova 0.01 4 252.10 25.90
12 anova 0.02 4 265.57 28.53
13 anova 0.03 4 281.56 33.19
14 anova 0.04 4 290.55 28.73
15 anova 0.05 4 290.55 28.73
16 anova 0.06 4 309.88 29.80
17 anova 0.07 4 319.93 38.48
18 anova 0.08 4 344.57 28.70
19 anova 0.09 4 344.57 28.70
20 anova 0.10 4 360.58 58.11
21 anova 0.01 5 252.10 25.90
22 anova 0.02 5 265.57 28.53
23 anova 0.03 5 281.56 33.19
24 anova 0.04 5 290.55 28.73
25 anova 0.05 5 290.55 28.73
26 anova 0.06 5 309.88 29.80
27 anova 0.07 5 319.93 38.48
28 anova 0.08 5 344.57 28.70
29 anova 0.09 5 344.57 28.70
30 anova 0.10 5 360.58 58.11
31 anova 0.01 6 252.10 25.90
32 anova 0.02 6 265.57 28.53
33 anova 0.03 6 281.56 33.19
34 anova 0.04 6 290.55 28.73
35 anova 0.05 6 290.55 28.73
36 anova 0.06 6 309.88 29.80
37 anova 0.07 6 319.93 38.48
38 anova 0.08 6 344.57 28.70
39 anova 0.09 6 344.57 28.70
40 anova 0.10 6 360.58 58.11
41 anova 0.01 7 252.10 25.90
42 anova 0.02 7 265.57 28.53
43 anova 0.03 7 281.56 33.19
44 anova 0.04 7 290.55 28.73
45 anova 0.05 7 290.55 28.73
46 anova 0.06 7 309.88 29.80
47 anova 0.07 7 319.93 38.48
48 anova 0.08 7 344.57 28.70
49 anova 0.09 7 344.57 28.70
50 anova 0.10 7 360.58 58.11
  plt(NMHM.trn,11,ylb=expression(O[3] ~ CART ~ predicted),
     fich="./plots/O3_CRT_p.pdf",model=M.rptp,pfile=TRUE)

png 2

  plt_prd(NMHM.tst[,-1],11,ylb=expression(O[3] ~ CRT ~ predicted),
        fich="./plots/O3_CRT_pt.pdf",M.rptp,pfile=TRUE)

[1] 0.8055746

#
tb3=M.rptp_d[["model"]]$performances
table03=xtable(tb3)
print(table03,type="html")
method cp minsplit error dispersion
1 anova 0.01 3 264.15 22.73
2 anova 0.02 3 308.57 21.44
3 anova 0.03 3 309.47 23.79
4 anova 0.04 3 309.47 23.79
5 anova 0.05 3 309.47 23.79
6 anova 0.06 3 309.47 23.79
7 anova 0.07 3 309.47 23.79
8 anova 0.08 3 354.42 44.00
9 anova 0.09 3 395.93 26.39
10 anova 0.10 3 395.93 26.39
11 anova 0.01 4 264.15 22.73
12 anova 0.02 4 308.57 21.44
13 anova 0.03 4 309.47 23.79
14 anova 0.04 4 309.47 23.79
15 anova 0.05 4 309.47 23.79
16 anova 0.06 4 309.47 23.79
17 anova 0.07 4 309.47 23.79
18 anova 0.08 4 354.42 44.00
19 anova 0.09 4 395.93 26.39
20 anova 0.10 4 395.93 26.39
21 anova 0.01 5 264.15 22.73
22 anova 0.02 5 308.57 21.44
23 anova 0.03 5 309.47 23.79
24 anova 0.04 5 309.47 23.79
25 anova 0.05 5 309.47 23.79
26 anova 0.06 5 309.47 23.79
27 anova 0.07 5 309.47 23.79
28 anova 0.08 5 354.42 44.00
29 anova 0.09 5 395.93 26.39
30 anova 0.10 5 395.93 26.39
31 anova 0.01 6 264.15 22.73
32 anova 0.02 6 308.57 21.44
33 anova 0.03 6 309.47 23.79
34 anova 0.04 6 309.47 23.79
35 anova 0.05 6 309.47 23.79
36 anova 0.06 6 309.47 23.79
37 anova 0.07 6 309.47 23.79
38 anova 0.08 6 354.42 44.00
39 anova 0.09 6 395.93 26.39
40 anova 0.10 6 395.93 26.39
41 anova 0.01 7 264.15 22.73
42 anova 0.02 7 308.57 21.44
43 anova 0.03 7 309.47 23.79
44 anova 0.04 7 309.47 23.79
45 anova 0.05 7 309.47 23.79
46 anova 0.06 7 309.47 23.79
47 anova 0.07 7 309.47 23.79
48 anova 0.08 7 354.42 44.00
49 anova 0.09 7 395.93 26.39
50 anova 0.10 7 395.93 26.39
  plt(NMHM.trn_d,11,ylb=expression(O[3] ~ CART ~ predicted),
     fich="./plots/O3_CRT_p_d.pdf",model=M.rptp_d,pfile=TRUE)

png 2

  plt_prd(NMHM.tst_d[,-1],11,ylb=expression(O[3] ~ CART ~ predicted),
        fich="./plots/O3_CRT_pt_d.pdf",M.rptp_d,pfile=TRUE)

[1] 0.8573526

#
tb4=M.rptp_n[["model"]]$performances
table04=xtable(tb4)
print(table04,type="html")
method cp minsplit error dispersion
1 anova 0.01 3 209.52 26.88
2 anova 0.02 3 213.64 30.28
3 anova 0.03 3 213.64 30.28
4 anova 0.04 3 213.64 30.28
5 anova 0.05 3 241.65 52.19
6 anova 0.06 3 257.95 47.13
7 anova 0.07 3 257.95 47.13
8 anova 0.08 3 257.95 47.13
9 anova 0.09 3 257.95 47.13
10 anova 0.10 3 257.95 47.13
11 anova 0.01 4 209.52 26.88
12 anova 0.02 4 213.64 30.28
13 anova 0.03 4 213.64 30.28
14 anova 0.04 4 213.64 30.28
15 anova 0.05 4 241.65 52.19
16 anova 0.06 4 257.95 47.13
17 anova 0.07 4 257.95 47.13
18 anova 0.08 4 257.95 47.13
19 anova 0.09 4 257.95 47.13
20 anova 0.10 4 257.95 47.13
21 anova 0.01 5 209.52 26.88
22 anova 0.02 5 213.64 30.28
23 anova 0.03 5 213.64 30.28
24 anova 0.04 5 213.64 30.28
25 anova 0.05 5 241.65 52.19
26 anova 0.06 5 257.95 47.13
27 anova 0.07 5 257.95 47.13
28 anova 0.08 5 257.95 47.13
29 anova 0.09 5 257.95 47.13
30 anova 0.10 5 257.95 47.13
31 anova 0.01 6 209.52 26.88
32 anova 0.02 6 213.64 30.28
33 anova 0.03 6 213.64 30.28
34 anova 0.04 6 213.64 30.28
35 anova 0.05 6 241.65 52.19
36 anova 0.06 6 257.95 47.13
37 anova 0.07 6 257.95 47.13
38 anova 0.08 6 257.95 47.13
39 anova 0.09 6 257.95 47.13
40 anova 0.10 6 257.95 47.13
41 anova 0.01 7 209.52 26.88
42 anova 0.02 7 213.64 30.28
43 anova 0.03 7 213.64 30.28
44 anova 0.04 7 213.64 30.28
45 anova 0.05 7 241.65 52.19
46 anova 0.06 7 257.95 47.13
47 anova 0.07 7 257.95 47.13
48 anova 0.08 7 257.95 47.13
49 anova 0.09 7 257.95 47.13
50 anova 0.10 7 257.95 47.13
  plt(NMHM.trn_n,11,ylb=expression(O[3] ~ CART ~ predicted),
     fich="./plots/O3_CRT_p_n.pdf",model=M.rptp_n,pfile=TRUE)

png 2

  plt_prd(NMHM.tst_n[,-1],11,ylb=expression(O[3] ~ CART ~ predicted),
        fich="./plots/O3_CRT_pt_n.pdf",M.rptp_n,pfile=TRUE)

[1] 0.7151531

#
print(xtable(cc.rpt),type="html")
Model Tst
M.rpt 0.84 0.00
M.rptp 0.84 0.81
M.rptp_d 0.88 0.86
M.rptp_n 0.75 0.72
#

The results found account for a correlation of 0.8373584.

Conclusions

After this short analysis we can conclude that:

LM SVM RF MLP CART
Full_Model 0.86 0.94 0.98 0.87 0.84
Partial_Model 0.86 0.94 0.98 0.87 0.84
Daily_P_Model 0.89 0.96 0.99 0.90 0.88
Nightly_P_Model 0.76 0.96 0.98 0.78 0.75
LM SVM RF MLP CART
Partial_Model 0.82 0.87 0.89 0.83 0.81
Daily_P_Model 0.86 0.90 0.91 0.87 0.86
Nightly_P_Model 0.71 0.81 0.82 0.71 0.72

From the figures, it is clear that RF produces some kind of understimation of higher values, probably because the data set is density imbalanced. Regarding this particular factor it exhibits a pretty nice performance the SVM technology.

In a global view we can conclude that the best fit was scored for 0.887013 method with a corrlation factor of 0.961723

Ensembles

Let’s see how it becomes the emsemble method