Data processing report for the MHAMID station

Introduction

Pollution data measured at this station were provided by Prof Ouarzazi in 2013. The hourly records cover CO, NO2, wind speed, temperature, PM10, SO2, solar radiation and ozone.

This report tries to forecast ozone 24 h ahead.

The analysis loads the following R packages (versions as reported at load time): MASS, Cubist, lattice, caret, ggplot2, xtable, Peaks, magic, abind, doMC, foreach, iterators, parallel, gbm 2.1.1, survival, splines, segmented, stringr, ztable 0.1.5, doParallel, signal, zoo, plyr, compare, mgcv 1.8-10, nlme, elasticnet, lars 1.2, pbapply, e1071, nnet, kernlab, pls, GA 2.2, devtools, caretEnsemble, mlbench, mclust 5.1, analogue 0.16-3, vegan 2.3-2, permute, cluster, randomForest 4.6-12, rpart, party, grid, mvtnorm, modeltools, stats4, strucchange, sandwich, fRegression, timeDate, timeSeries, fBasics, polspline, VIF, ridge, mboost 2.5-0, stabs, earth, plotmo, plotrix, TeachingDemos, car.

Let’s load the data from the CSV file.

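#
# Cached-load branch, left commented out so the data are always rebuilt:
# if (file.exists("MHAMID.RData")) {
#   load("MHAMID.RData")
# } else {
  # The CSV uses ';' as field separator and ',' as decimal mark
  MHM<-read.csv(file="Mhamid_data.csv",sep=";",dec=",",
              header=TRUE)
  # Keep only the rows whose first column parses as a date
  DMHM<-MHM[! is.na(as.Date(as.character(MHM[,1]))),
          1:ncol(MHM)]
  # Drop rows with any missing value; unlike negative indexing with which(),
  # complete.cases() also works when no row contains an NA
  DMHM<-DMHM[complete.cases(DMHM),]
  # Build a timestamp from the date (column 1) and the hour (column 2)
  newc<-paste(as.character(DMHM[,1]),
      paste(DMHM[,2],":00:00",sep=""),sep= " ")
  newd<-strptime(newc,"%d/%m/%y %H:%M:%S")
  antes<-newd - 3600   # timestamps one hour earlier
  NMHM<-DMHM[,-2]
  NMHM[,1]<-as.data.frame(newd)
  # Strip the station suffix and shorten the wind-speed and solar-radiation names
  colnames(NMHM)<-gsub('.Mhamid','',colnames(NMHM))
  colnames(NMHM)[9]<-"WS"
  colnames(NMHM)[10]<-"SR"
  pNMHM<-NMHM
  pNMHM[,11]<-as.numeric(format(NMHM$Date,"%H"))   # hour of day as a predictor
  colnames(pNMHM)[11]<-"Hour"
  colnames(pNMHM)[5]<-"C_O3"                       # current ozone
  # Target: ozone hourf hours ahead; o3f() is a helper that is not shown in this report
  hourf<-24
  pNMHM[,12]<-apply(NMHM,1,o3f,delta=hourf,ref=NMHM)
  colnames(pNMHM)[12]<-"O3"
  pNMHM<-pNMHM[! is.na(pNMHM$O3),]
  save(MHM,DMHM,NMHM,pNMHM,hourf,file="MHAMID.RData")
#}
rm(MHM)
NMHM<-pNMHM
# Pairs plot of all variables; plt_pairs() is another helper not shown here
plt_pairs(NMHM[,-1],fich="./plots/MHM_pairs.pdf",pfile=TRUE)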

# plt_pairs(pNMHM[,-1],fich="./plots/pMHM_pairs.pdf",pfile=TRUE)
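
The helper o3f() used to build the 24 h ahead target is not listed in this report. As a rough sketch of the idea, and assuming it returns the ozone value measured delta hours after each row's timestamp (NA when that timestamp is absent), a vectorised stand-in could look like the following. The name o3_ahead and the toy series are hypothetical and only serve the illustration.

```r
# Hypothetical stand-in for o3f(): ozone measured `delta` hours after each row.
o3_ahead <- function(dates, o3, delta = 24) {
  target <- dates + delta * 3600                  # timestamp delta hours later
  o3[match(as.numeric(target), as.numeric(dates))]  # NA when no such timestamp
}

# Toy hourly series: 48 consecutive hours, ozone value = row index
dates <- as.POSIXct("2009-06-01 00:00:00", tz = "UTC") + 3600 * (0:47)
o3 <- seq_along(dates)

ahead <- o3_ahead(dates, o3, delta = 24)
print(head(ahead, 3))
```

Rows whose target timestamp falls outside the series get NA, which matches the later filtering step that drops rows with a missing O3 value.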

	Min.	1st Qu.	Median	Mean	3rd Qu.	Max.
Date	2009-06-01 01:00:00	2009-12-26 13:00:00	2010-04-05 20:00:00	2010-04-04 04:25:23	2010-08-06 09:00:00	2010-11-27 23:00:00
CO	0.0000	0.0300	0.0500	0.1012	0.1000	4.0100
HR	8.00	39.00	57.00	57.11	75.00	99.00
NO2	4.00	12.00	19.00	23.69	31.00	98.00
C_O3	0.0	25.0	36.0	43.5	54.0	236.0
PM10	0.00	30.00	49.00	60.96	74.00	4187.00
SO2	0.000	6.000	9.000	9.503	11.000	46.000
TC	4.20	15.20	19.40	20.53	25.10	42.80
WS	0.100	0.700	1.100	1.229	1.600	7.600
SR	0.00	0.00	1.14	197.05	371.40	1092.00
Hour	0.00	5.00	11.00	11.43	17.00	23.00
O3	0.00	25.00	36.00	43.49	54.00	270.00

The numerical analysis is carried out in R, the well-known open-source statistical environment (http://www.r-project.org).

Processing

To compare with Prof Ouarzazi’s results (corr = 0.84) for a local model of O3 against the remaining variables over the same period, we will use several modelling techniques.

The basic methodology is:

* Apply cross-validation during learning, as it is more robust than a fixed 70%/15%/15% split.
* Validate the selected best model on the whole data set, as Prof Ouarzazi did.
* Use an hourly model to learn what is achievable, even though \(O_3\) should be assessed through its daily maximum and/or its 8 h dose, depending on the specific regulation.
* Uncertainty about future predictors was removed, as we were not predicting ozone with any lag.
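
To make the cross-validation step concrete, here is a minimal base-R sketch of 10-fold cross-validation for a linear model. It runs on synthetic data, not on the station records, and all names (toy, fold, cv_error, cv_dispersion) are illustrative assumptions; the report's own wrappers are not shown and may differ.

```r
set.seed(1)   # arbitrary seed, only so the toy example is reproducible
toy <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
toy$y <- 2 * toy$x1 - toy$x2 + rnorm(200, sd = 0.5)

k <- 10
# Assign each row to one of k folds at random
fold <- sample(rep(seq_len(k), length.out = nrow(toy)))

# Fit on k-1 folds, measure the mean squared error on the held-out fold
mse <- sapply(seq_len(k), function(i) {
  fit <- lm(y ~ ., data = toy[fold != i, ])
  mean((toy$y[fold == i] - predict(fit, toy[fold == i, ]))^2)
})

cv_error <- mean(mse)       # average error over the folds
cv_dispersion <- sd(mse)    # spread of the error over the folds
print(c(error = cv_error, dispersion = cv_dispersion))
```

Every row is used for testing exactly once, which is why the estimate depends less on one lucky or unlucky split than a single fixed partition does.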

Linear approach as reference

A linear model is taken as the reference: comparing the other models against it shows how far the problem is from linear.

#
print(xtable(as.data.frame(car::vif(lm(O3~.,data=NMHM[,-1])))),type="html")

	VIF
CO	1.72
HR	3.31
NO2	1.49
C_O3	1.43
PM10	1.37
SO2	1.14
TC	3.20
WS	1.21
SR	1.51
Hour	1.33
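
The values above are variance inflation factors: for predictor j, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. The sketch below recomputes that quantity by hand in base R on the built-in mtcars data; the function name vif_manual is a hypothetical helper, not part of the car package.

```r
# Manual VIF: regress each predictor on the others and apply 1 / (1 - R^2).
vif_manual <- function(df, response) {
  preds <- setdiff(colnames(df), response)
  sapply(preds, function(p) {
    others <- setdiff(preds, p)
    r2 <- summary(lm(reformulate(others, response = p), data = df))$r.squared
    1 / (1 - r2)
  })
}

toy <- mtcars[, c("mpg", "disp", "hp", "wt")]   # mpg plays the role of the response
vifs <- vif_manual(toy, "mpg")
print(round(vifs, 2))
```

A VIF of 1 means a predictor is uncorrelated with the others; larger values indicate growing collinearity.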

if (file.exists(paste("MHM_lm_",hourf,".RData",sep=""))) {
  load(paste("MHM_lm_",hourf,".RData",sep=""))
} else {
  rej=which(colnames(NMHM) %in% c("Date","SO2"))
  M.lm=m_lin(NMHM[,-rej],vprd="O3",vexp=".",cv=10)
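# Hold out a random 15% of the rows as a test set (no seed is set, so the split differs between runs)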
  idx = sample(1:nrow(NMHM),floor(0.15*nrow(NMHM)),replace=FALSE)
  NMHM.trn = NMHM[-idx,]
  NMHM.tst = NMHM[idx,]
  M.lmp=m_lin(NMHM.trn[,-rej],vprd="O3",vexp=".",cv=10)
  c.lmp=plt_prd(NMHM.tst[,-1],11,ylb=expression(O[3] ~ LM ~ predicted),
        fich="./plots/O3_LM_pt.pdf",M.lmp,pfile=FALSE)
# For the day
  NMHM.trn_d = NMHM.trn[NMHM.trn$SR>0,]
  NMHM.tst_d = NMHM.tst[NMHM.tst$SR>0,]
  M.lmp_d=m_lin(NMHM.trn_d[,-rej],vprd="O3",vexp=".",cv=10)
  c.lmp_d=plt_prd(NMHM.tst_d[,-1],11,ylb=expression(O[3] ~ LM ~ predicted),
        fich="./plots/O3_LM_pt_d.pdf",M.lmp_d,pfile=FALSE)
# For the night
  NMHM.trn_n = NMHM.trn[NMHM.trn$SR==0,]
  NMHM.tst_n = NMHM.tst[NMHM.tst$SR==0,]
  rej2=which(colnames(NMHM) %in% c("Date","SO2","SR"))
  M.lmp_n=m_lin(NMHM.trn_n[,-rej2],vprd="O3",vexp=".",cv=10)
  c.lmp_n=plt_prd(NMHM.tst_n[,-1],11,ylb=expression(O[3] ~ LM ~ predicted),
        fich="./plots/O3_LM_pt_n.pdf",M.lmp_n,pfile=FALSE)
#
  print(xtable(summary(M.lm[["model"]]$best.model)),type="html")
  print(xtable(summary(M.lmp[["model"]]$best.model)),type="html")
  print(xtable(summary(M.lmp_d[["model"]]$best.model)),type="html")
  print(xtable(summary(M.lmp_n[["model"]]$best.model)),type="html")
#
  r2=data.frame(r2=(summary(M.lm[["model"]]$best.model))$r.squared,
          r2adj=(summary(M.lm[["model"]]$best.model))$adj.r.squared)
  r2=rbind(r2,c((summary(M.lmp[["model"]]$best.model))$r.squared,
          r2adj=(summary(M.lmp[["model"]]$best.model))$adj.r.squared))
  r2=rbind(r2,c((summary(M.lmp_d[["model"]]$best.model))$r.squared,
          r2adj=(summary(M.lmp_d[["model"]]$best.model))$adj.r.squared))
  r2=rbind(r2,c((summary(M.lmp_n[["model"]]$best.model))$r.squared,
          r2adj=(summary(M.lmp_n[["model"]]$best.model))$adj.r.squared))
  rownames(r2)=c("M.lm","M.lmp","M.lmp_d","M.lmp_n")
  print(xtable(r2),type="html")
#
  cc=data.frame(Model=M.lm[["cc"]],Tst=0)
  cc=rbind(cc,c(M.lmp[["cc"]],c.lmp))  
  cc=rbind(cc,c(M.lmp_d[["cc"]],c.lmp_d))
  cc=rbind(cc,c(M.lmp_n[["cc"]],c.lmp_n))
  rownames(cc)=c("M.lm","M.lmp","M.lmp_d","M.lmp_n")
  print(xtable(cc),type="html")
#
  cc.lm=cc
  r2.lm=r2
  rm(list=c("cc","r2"))
  save(M.lm,M.lmp,M.lmp_d,M.lmp_n,NMHM,NMHM.trn,rej,rej2,
       NMHM.tst,NMHM.trn_d,NMHM.trn_n,cc.lm,r2.lm,
       NMHM.tst_d,NMHM.tst_n,file=paste("MHM_lm_",hourf,".RData",sep=""))
}

tb1=M.lm[["model"]]$performances
table01=xtable(tb1)
print(table01,type="html")

	dummyparameter	error	dispersion
1	0.00	180.27	19.35

plt(NMHM.trn,11,ylb=expression(O[3] ~ LM ~ predicted),
     fich="./plots/O3_LM.pdf",model=M.lm,pfile=TRUE)

#
tb2=M.lmp[["model"]]$performances
table02=xtable(tb2)
print(table02,type="html")

	dummyparameter	error	dispersion
1	0.00	176.53	16.05

  plt(NMHM.trn,11,ylb=expression(O[3] ~ LM ~ predicted),
     fich="./plots/O3_LM_p.pdf",model=M.lmp,pfile=TRUE)

  plt_prd(NMHM.tst[,-1],11,ylb=expression(O[3] ~ LM ~ predicted),
        fich="./plots/O3_LM_pt.pdf",M.lmp,pfile=TRUE)

[1] 0.8721071

#
tb3=M.lmp_d[["model"]]$performances
table03=xtable(tb3)
print(table03,type="html")

	dummyparameter	error	dispersion
1	0.00	180.21	22.37

  plt(NMHM.trn_d,11,ylb=expression(O[3] ~ LM ~ predicted),
     fich="./plots/O3_LM_p_d.pdf",model=M.lmp_d,pfile=TRUE)

  plt_prd(NMHM.tst_d[,-1],11,ylb=expression(O[3] ~ LM ~ predicted),
        fich="./plots/O3_LM_pt_d.pdf",M.lmp_d,pfile=TRUE)

[1] 0.8864268

#
tb4=M.lmp_n[["model"]]$performances
table04=xtable(tb4)
print(table04,type="html")

	dummyparameter	error	dispersion
1	0.00	164.48	23.92

  plt(NMHM.trn_n,11,ylb=expression(O[3] ~ LM ~ predicted),
     fich="./plots/O3_LM_p_n.pdf",model=M.lmp_n,pfile=TRUE)

  plt_prd(NMHM.tst_n[,-1],11,ylb=expression(O[3] ~ LM ~ predicted),
        fich="./plots/O3_LM_pt_n.pdf",M.lmp_n,pfile=TRUE)

[1] 0.8141526

#
print(xtable(cc.lm),type="html")

	Model	Tst
M.lm	0.88	0.00
M.lmp	0.89	0.87
M.lmp_d	0.91	0.89
M.lmp_n	0.79	0.81

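The linear model fitted on the whole data set reaches a correlation of 0.8843296 (M.lm; its Tst entry is 0 only because no hold-out set is used for that model). It will be considered as the reference.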
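
The Model and Tst columns appear to be correlations between observed and predicted ozone on the fitting rows and on the 15% hold-out rows. The following base-R sketch shows that computation on synthetic data; the variable names and coefficients are invented for the illustration and do not come from the station data.

```r
set.seed(2)   # arbitrary seed for a reproducible toy example
n <- 300
toy <- data.frame(SR = pmax(0, rnorm(n, 200, 300)),   # synthetic "solar radiation"
                  TC = rnorm(n, 20, 6))               # synthetic "temperature"
toy$O3 <- 30 + 0.03 * toy$SR + 0.8 * toy$TC + rnorm(n, sd = 5)

# Random 15% hold-out, as in the report's split
idx <- sample(seq_len(n), floor(0.15 * n))
trn <- toy[-idx, ]
tst <- toy[idx, ]

fit <- lm(O3 ~ ., data = trn)
cc_model <- cor(trn$O3, fitted(fit))                  # fit on the training rows
cc_test  <- cor(tst$O3, predict(fit, newdata = tst))  # fit on unseen rows
print(c(Model = cc_model, Tst = cc_test))
```

Comparing the two numbers gives a quick check for overfitting: a model that only memorises the training rows shows a large gap between them.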

SVM approach

A wrapper around SVM-based regressors is applied to search for the best learning parameters.
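
The wrapper itself is not shown. The gamma, cost, error and dispersion columns in the tables of this section have the shape of the output of e1071::tune(), so a plausible sketch, under that assumption, is a grid search with 10-fold cross-validation. The grid below (gamma in 2^-3..2^-1, cost in 2^1..2^3) is consistent with the ranges reported in the tables; the data are synthetic.

```r
library(e1071)

set.seed(3)   # arbitrary seed for a reproducible toy example
toy <- data.frame(x1 = runif(120, -2, 2), x2 = runif(120, -2, 2))
toy$y <- sin(toy$x1) + 0.5 * toy$x2^2 + rnorm(120, sd = 0.1)

# Grid search over gamma and cost, each combination scored by 10-fold CV
tuned <- tune(svm, y ~ ., data = toy,
              ranges = list(gamma = 2^(-3:-1), cost = 2^(1:3)),
              tunecontrol = tune.control(sampling = "cross", cross = 10))

print(tuned$performances)      # one row per (gamma, cost): CV error and its dispersion
best <- tuned$best.model       # model refitted with the best parameters
cc <- cor(toy$y, predict(best, toy))
print(cc)
```

For a numeric response, the error reported by tune() is a mean squared error, so its square root is on the scale of the response.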

tb1=t(summary(M.svm[["model"]]$performances))
table01=xtable(tb1)
print(table01,type="html")

	Min.	1st Qu.	Median	Mean	3rd Qu.	Max.
gamma	0.1250	0.1250	0.2500	0.2917	0.5000	0.5000
cost	2.000	2.000	4.000	4.667	8.000	8.000
error	141.0	143.8	145.7	146.2	148.3	151.3
dispersion	16.51	17.71	18.12	18.91	20.08	22.89

  plt(NMHM,11,ylb=expression(O[3] ~ SVM ~ predicted),
     fich="./plots/O3_SVM.pdf",model=M.svm,pfile=TRUE)

#
tb2=t(summary(M.svmp[["model"]]$performances))
table02=xtable(tb2)
print(table02,type="html")

	Min.	1st Qu.	Median	Mean	3rd Qu.	Max.
gamma	0.1250	0.1250	0.2500	0.2917	0.5000	0.5000
cost	2.000	2.000	4.000	4.667	8.000	8.000
error	142.6	143.6	146.6	147.4	149.8	153.6
dispersion	14.57	14.99	16.13	16.64	17.96	19.92

  plt(NMHM.trn,11,ylb=expression(O[3] ~ SVM ~ predicted),
     fich="./plots/O3_SVM_p.pdf",model=M.svmp,pfile=TRUE)

  plt_prd(NMHM.tst[,-1],11,ylb=expression(O[3] ~ SVM ~ predicted),
        fich="./plots/O3_SVM_pt.pdf",M.svmp,pfile=TRUE)

[1] 0.8989905

#
tb3=t(summary(M.svmp_d[["model"]]$performances))
table03=xtable(tb3)
print(table03,type="html")

	Min.	1st Qu.	Median	Mean	3rd Qu.	Max.
gamma	0.1250	0.1250	0.2500	0.2917	0.5000	0.5000
cost	2.000	2.000	4.000	4.667	8.000	8.000
error	142.0	144.4	147.6	152.1	162.8	166.9
dispersion	13.70	14.21	14.55	15.37	16.17	18.96

  plt(NMHM.trn_d,11,ylb=expression(O[3] ~ SVM ~ predicted),
     fich="./plots/O3_SVM_p_d.pdf",model=M.svmp_d,pfile=TRUE)

  plt_prd(NMHM.tst_d[,-1],11,ylb=expression(O[3] ~ SVM ~ predicted),
        fich="./plots/O3_SVM_pt_d.pdf",M.svmp_d,pfile=TRUE)

[1] 0.9140482

#
tb4=t(summary(M.svmp_n[["model"]]$performances))
table04=xtable(tb4)
print(table04,type="html")

	Min.	1st Qu.	Median	Mean	3rd Qu.	Max.
gamma	0.1250	0.1250	0.2500	0.2917	0.5000	0.5000
cost	2.000	2.000	4.000	4.667	8.000	8.000
error	138.1	140.2	140.7	140.8	141.5	143.7
dispersion	16.25	18.08	19.95	19.82	22.01	23.42

  plt(NMHM.trn_n,11,ylb=expression(O[3] ~ SVM ~ predicted),
     fich="./plots/O3_SVM_p_n.pdf",model=M.svmp_n,pfile=TRUE)

  plt_prd(NMHM.tst_n[,-1],11,ylb=expression(O[3] ~ SVM ~ predicted),
        fich="./plots/O3_SVM_pt_n.pdf",M.svmp_n,pfile=TRUE)

[1] 0.8461495

#
print(xtable(cc.svm),type="html")

	Model	Tst
M.svm	0.95	0.00
M.svmp	0.95	0.90
M.svmp_d	0.96	0.91
M.svmp_n	0.94	0.85

The SVM model reaches a correlation of 0.9490801 on the data used for fitting (M.svm) and 0.90 on the hold-out set (M.svmp), both above the 0.84 reported by Prof. Ouarzazi.

RandomForest

Let’s test the random forest approach.
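
The tables in this section vary two random forest settings: mtry (number of predictors tried at each split) and ntree (number of trees). The sketch below is a simplified illustration of how mtry can be compared; it uses the out-of-bag (OOB) error of the randomForest package on synthetic data, which is a different estimate from the cross-validated error and dispersion listed in the tables.

```r
library(randomForest)

set.seed(4)   # arbitrary seed for a reproducible toy example
toy <- data.frame(x1 = runif(200), x2 = runif(200), x3 = runif(200))
toy$y <- 10 * toy$x1 + 5 * toy$x2^2 + rnorm(200, sd = 0.5)

# Compare mtry values at a fixed number of trees using the OOB mean squared error
mtry_grid <- 1:3
oob_mse <- sapply(mtry_grid, function(m) {
  rf <- randomForest(y ~ ., data = toy, mtry = m, ntree = 300)
  tail(rf$mse, 1)            # OOB MSE once all trees are grown
})
names(oob_mse) <- paste0("mtry=", mtry_grid)
print(round(oob_mse, 3))
```

Each tree is tested on the rows left out of its bootstrap sample, so the OOB error gives a validation estimate without a separate cross-validation loop.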

	mtry	ntree	error	dispersion
1	2	300.00	137.73	12.31
2	3	300.00	133.33	10.73
3	4	300.00	133.55	9.78
4	5	300.00	134.57	9.18
5	6	300.00	136.09	8.78
6	2	500.00	136.70	12.35
7	3	500.00	133.37	10.98
8	4	500.00	133.68	9.78
9	5	500.00	134.91	9.79
10	6	500.00	136.27	9.07
11	2	700.00	136.80	12.51
12	3	700.00	133.31	11.04
13	4	700.00	133.50	9.95
14	5	700.00	134.28	9.51
15	6	700.00	136.11	9.15
16	2	900.00	136.91	12.67
17	3	900.00	133.19	10.59
18	4	900.00	133.37	9.95
19	5	900.00	134.19	9.53
20	6	900.00	135.70	8.79

	mtry	ntree	error	dispersion
1	2	300.00	138.82	12.38
2	3	300.00	134.82	11.41
3	4	300.00	135.27	11.03
4	5	300.00	136.52	11.53
5	6	300.00	137.37	10.76
6	2	500.00	138.98	12.23
7	3	500.00	134.39	11.79
8	4	500.00	134.72	11.64
9	5	500.00	136.04	10.99
10	6	500.00	137.83	11.23
11	2	700.00	138.76	12.79
12	3	700.00	135.36	11.77
13	4	700.00	135.09	10.94
14	5	700.00	135.87	11.15
15	6	700.00	137.46	11.24
16	2	900.00	138.90	12.75
17	3	900.00	134.62	11.42
18	4	900.00	134.84	10.91
19	5	900.00	135.62	11.07
20	6	900.00	136.94	11.20

[1] 0.9116535

	mtry	ntree	error	dispersion
1	2	300.00	154.00	25.90
2	3	300.00	147.06	23.68
3	4	300.00	145.65	22.00
4	5	300.00	147.36	21.90
5	6	300.00	148.80	21.64
6	2	500.00	152.77	23.74
7	3	500.00	145.45	22.31
8	4	500.00	145.67	22.18
9	5	500.00	146.92	21.97
10	6	500.00	149.29	22.42
11	2	700.00	152.57	24.14
12	3	700.00	145.57	23.34
13	4	700.00	145.55	22.51
14	5	700.00	146.50	21.90
15	6	700.00	148.65	21.29
16	2	900.00	152.67	24.69
17	3	900.00	145.95	23.57
18	4	900.00	145.30	22.63
19	5	900.00	146.82	21.65
20	6	900.00	149.06	22.00

[1] 0.9233124

	mtry	ntree	error	dispersion
1	2	300.00	130.69	19.73
2	3	300.00	128.96	18.70
3	4	300.00	128.80	18.33
4	5	300.00	130.05	18.03
5	6	300.00	129.96	18.18
6	2	500.00	129.76	19.21
7	3	500.00	127.92	18.71
8	4	500.00	128.19	18.04
9	5	500.00	129.75	17.93
10	6	500.00	130.71	17.87
11	2	700.00	130.01	19.62
12	3	700.00	128.07	18.24
13	4	700.00	128.01	18.20
14	5	700.00	129.28	17.81
15	6	700.00	130.89	18.16
16	2	900.00	129.83	19.72
17	3	900.00	127.96	18.45
18	4	900.00	128.15	18.14
19	5	900.00	129.69	18.34
20	6	900.00	130.60	18.03

[1] 0.8656699

	Model	Tst
M.rf	0.99	0.00
M.rfp	0.98	0.91
M.rfp_d	0.99	0.92
M.rfp_n	0.97	0.87

The random forest model reaches a correlation of 0.9851756 on the data used for fitting (M.rf) and 0.91 on the hold-out set (M.rfp).

FFNN: MLP

Let’s see how multilayer perceptron (MLP) neural networks trained by backpropagation perform.
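
The performance tables in this section list the settings linout, size, maxit, decay and skip, which are arguments of nnet::nnet(). As a small sketch on synthetic data, assuming the report's wrapper ends up calling nnet() with such arguments, a single-hidden-layer network for regression can be fitted as follows (size = 6 and maxit = 500 are arbitrary choices for the toy problem).

```r
library(nnet)

set.seed(5)   # arbitrary seed; nnet starts from random weights
toy <- data.frame(x1 = runif(200, -2, 2), x2 = runif(200, -2, 2))
toy$y <- sin(toy$x1) + 0.5 * toy$x2^2 + rnorm(200, sd = 0.1)

fit <- nnet(y ~ ., data = toy,
            size = 6,        # hidden units
            linout = TRUE,   # linear output unit, needed for regression
            decay = 0.02,    # weight decay (regularisation)
            skip = TRUE,     # direct input-to-output connections
            maxit = 500, trace = FALSE)

pred <- predict(fit, toy)    # predict() returns a one-column matrix
cc <- cor(toy$y, pred)       # hence the correlation is a 1 x 1 matrix
print(cc)
```

Because predict() returns a matrix here, the correlation prints in matrix form rather than as a plain number, unlike the other models.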

tb1=M.mlp[["model"]]$performances
table01=xtable(tb1)
print(table01,type="html")

	linout	size	maxit	decay	trace	Var9	skip	error	dispersion
1	TRUE	4	50000.00	0.02	FALSE	7.00	TRUE	178.60	15.85
2	TRUE	5	50000.00	0.02	FALSE	7.00	TRUE	178.27	16.04
3	TRUE	6	50000.00	0.02	FALSE	7.00	TRUE	179.90	14.54
4	TRUE	7	50000.00	0.02	FALSE	7.00	TRUE	177.49	16.32
5	TRUE	8	50000.00	0.02	FALSE	7.00	TRUE	178.47	15.71
6	TRUE	9	50000.00	0.02	FALSE	7.00	TRUE	178.82	17.90
7	TRUE	10	50000.00	0.02	FALSE	7.00	TRUE	177.28	16.25
8	TRUE	11	50000.00	0.02	FALSE	7.00	TRUE	177.02	15.10
9	TRUE	12	50000.00	0.02	FALSE	7.00	TRUE	177.16	15.65
10	TRUE	13	50000.00	0.02	FALSE	7.00	TRUE	177.33	16.68
11	TRUE	14	50000.00	0.02	FALSE	7.00	TRUE	176.42	15.52
12	TRUE	15	50000.00	0.02	FALSE	7.00	TRUE	177.08	16.89
13	TRUE	16	50000.00	0.02	FALSE	7.00	TRUE	174.85	14.84
14	TRUE	17	50000.00	0.02	FALSE	7.00	TRUE	177.50	14.82
15	TRUE	18	50000.00	0.02	FALSE	7.00	TRUE	174.97	13.07
16	TRUE	19	50000.00	0.02	FALSE	7.00	TRUE	177.39	15.40
17	TRUE	20	50000.00	0.02	FALSE	7.00	TRUE	176.30	14.74

  plt(NMHM,11,ylb=expression(O[3] ~ MLP ~ predicted),
     fich="./plots/O3_MLP.pdf",model=M.mlp,pfile=TRUE)

#
tb2=M.mlpp[["model"]]$performances
table02=xtable(tb2)
print(table02,type="html")

	linout	size	maxit	decay	trace	Var9	skip	error	dispersion
1	TRUE	4	50000.00	0.02	FALSE	7.00	TRUE	174.58	14.87
2	TRUE	5	50000.00	0.02	FALSE	7.00	TRUE	175.84	15.17
3	TRUE	6	50000.00	0.02	FALSE	7.00	TRUE	176.16	14.51
4	TRUE	7	50000.00	0.02	FALSE	7.00	TRUE	174.30	14.42
5	TRUE	8	50000.00	0.02	FALSE	7.00	TRUE	174.43	14.94
6	TRUE	9	50000.00	0.02	FALSE	7.00	TRUE	173.59	14.56
7	TRUE	10	50000.00	0.02	FALSE	7.00	TRUE	173.34	15.31
8	TRUE	11	50000.00	0.02	FALSE	7.00	TRUE	174.77	15.62
9	TRUE	12	50000.00	0.02	FALSE	7.00	TRUE	173.18	13.83
10	TRUE	13	50000.00	0.02	FALSE	7.00	TRUE	174.71	14.88
11	TRUE	14	50000.00	0.02	FALSE	7.00	TRUE	174.17	13.34
12	TRUE	15	50000.00	0.02	FALSE	7.00	TRUE	172.58	14.18
13	TRUE	16	50000.00	0.02	FALSE	7.00	TRUE	173.29	15.52
14	TRUE	17	50000.00	0.02	FALSE	7.00	TRUE	173.51	12.74
15	TRUE	18	50000.00	0.02	FALSE	7.00	TRUE	172.54	14.10
16	TRUE	19	50000.00	0.02	FALSE	7.00	TRUE	175.30	15.24
17	TRUE	20	50000.00	0.02	FALSE	7.00	TRUE	173.92	15.88

  plt(NMHM.trn,11,ylb=expression(O[3] ~ MLP ~ predicted),
     fich="./plots/O3_MLP_p.pdf",model=M.mlpp,pfile=TRUE)

  plt_prd(NMHM.tst[,-1],11,ylb=expression(O[3] ~ MLP ~ predicted),
        fich="./plots/O3_MLP_pt.pdf",M.mlpp,pfile=TRUE)

          [,1]
[1,] 0.8750303

#
tb3=M.mlpp_d[["model"]]$performances
table03=xtable(tb3)
print(table03,type="html")

	linout	size	maxit	decay	trace	Var9	skip	error	dispersion
1	TRUE	4	50000.00	0.02	FALSE	7.00	TRUE	177.35	21.03
2	TRUE	5	50000.00	0.02	FALSE	7.00	TRUE	178.17	23.83
3	TRUE	6	50000.00	0.02	FALSE	7.00	TRUE	177.31	20.31
4	TRUE	7	50000.00	0.02	FALSE	7.00	TRUE	178.89	20.24
5	TRUE	8	50000.00	0.02	FALSE	7.00	TRUE	179.37	19.76
6	TRUE	9	50000.00	0.02	FALSE	7.00	TRUE	176.03	20.23
7	TRUE	10	50000.00	0.02	FALSE	7.00	TRUE	180.69	21.82
8	TRUE	11	50000.00	0.02	FALSE	7.00	TRUE	180.27	20.02
9	TRUE	12	50000.00	0.02	FALSE	7.00	TRUE	180.59	19.87
10	TRUE	13	50000.00	0.02	FALSE	7.00	TRUE	178.46	19.68
11	TRUE	14	50000.00	0.02	FALSE	7.00	TRUE	179.87	16.80
12	TRUE	15	50000.00	0.02	FALSE	7.00	TRUE	178.64	22.18
13	TRUE	16	50000.00	0.02	FALSE	7.00	TRUE	176.26	21.60
14	TRUE	17	50000.00	0.02	FALSE	7.00	TRUE	182.02	20.85
15	TRUE	18	50000.00	0.02	FALSE	7.00	TRUE	180.15	24.89
16	TRUE	19	50000.00	0.02	FALSE	7.00	TRUE	180.04	16.83
17	TRUE	20	50000.00	0.02	FALSE	7.00	TRUE	181.40	21.49

  plt(NMHM.trn_d,11,ylb=expression(O[3] ~ MLP ~ predicted),
     fich="./plots/O3_MLP_p_d.pdf",model=M.mlpp_d,pfile=TRUE)

  plt_prd(NMHM.tst_d[,-1],11,ylb=expression(O[3] ~ MLP ~ predicted),
        fich="./plots/O3_MLP_pt_d.pdf",M.mlpp_d,pfile=TRUE)

          [,1]
[1,] 0.8887564

#
tb4=M.mlpp_n[["model"]]$performances
table04=xtable(tb4)
print(table04,type="html")

	linout	size	maxit	decay	trace	Var9	skip	error	dispersion
1	TRUE	4	50000.00	0.02	FALSE	7.00	TRUE	164.17	19.77
2	TRUE	5	50000.00	0.02	FALSE	7.00	TRUE	163.73	19.57
3	TRUE	6	50000.00	0.02	FALSE	7.00	TRUE	164.93	17.92
4	TRUE	7	50000.00	0.02	FALSE	7.00	TRUE	163.95	18.40
5	TRUE	8	50000.00	0.02	FALSE	7.00	TRUE	163.65	21.63
6	TRUE	9	50000.00	0.02	FALSE	7.00	TRUE	164.82	16.77
7	TRUE	10	50000.00	0.02	FALSE	7.00	TRUE	164.99	19.95
8	TRUE	11	50000.00	0.02	FALSE	7.00	TRUE	164.25	20.74
9	TRUE	12	50000.00	0.02	FALSE	7.00	TRUE	160.89	15.95
10	TRUE	13	50000.00	0.02	FALSE	7.00	TRUE	161.98	17.00
11	TRUE	14	50000.00	0.02	FALSE	7.00	TRUE	162.29	19.32
12	TRUE	15	50000.00	0.02	FALSE	7.00	TRUE	164.34	24.00
13	TRUE	16	50000.00	0.02	FALSE	7.00	TRUE	161.72	18.97
14	TRUE	17	50000.00	0.02	FALSE	7.00	TRUE	169.01	25.53
15	TRUE	18	50000.00	0.02	FALSE	7.00	TRUE	165.33	23.34
16	TRUE	19	50000.00	0.02	FALSE	7.00	TRUE	167.62	20.64
17	TRUE	20	50000.00	0.02	FALSE	7.00	TRUE	167.11	24.30

  plt(NMHM.trn_n,11,ylb=expression(O[3] ~ MLP ~ predicted),
     fich="./plots/O3_MLP_p_n.pdf",model=M.mlpp_n,pfile=TRUE)

  plt_prd(NMHM.tst_n[,-1],11,ylb=expression(O[3] ~ MLP ~ predicted),
        fich="./plots/O3_MLP_pt_n.pdf",M.mlpp_n,pfile=TRUE)

          [,1]
[1,] 0.8072019

#
print(xtable(cc.mlp),type="html")

	Model	Tst
M.mlp	0.89	0.00
M.mlpp	0.90	0.88
M.mlpp_d	0.91	0.89
M.mlpp_n	0.80	0.81

The MLP model reaches a correlation of 0.8864128 on the data used for fitting (M.mlp), close to the linear reference (0.8843296).

CART solution

Now we will use classification and regression trees to have a look at their capabilities for this particular problem.
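
The tables in this section vary two rpart settings: cp (complexity parameter, the minimum relative improvement a split must bring) and minsplit (minimum node size for attempting a split). The sketch below, on synthetic data, counts the leaves of the resulting tree for a few settings. One plausible reading of identical errors across several settings is that they produce the same pruned tree, but that is an interpretation, not something the report states.

```r
library(rpart)

set.seed(6)   # arbitrary seed for a reproducible toy example
toy <- data.frame(x1 = runif(300), x2 = runif(300))
toy$y <- ifelse(toy$x1 > 0.5, 10, 0) + 3 * toy$x2 + rnorm(300, sd = 0.5)

# Number of terminal nodes of a regression tree for given cp and minsplit
n_leaves <- function(cp, minsplit) {
  fit <- rpart(y ~ ., data = toy, method = "anova",
               control = rpart.control(cp = cp, minsplit = minsplit))
  sum(fit$frame$var == "<leaf>")
}

grid <- expand.grid(cp = c(0.01, 0.05, 0.2), minsplit = c(3, 7))
grid$leaves <- mapply(n_leaves, grid$cp, grid$minsplit)
print(grid)
```

Raising cp can only prune the tree further, so the leaf count never increases as cp grows.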

tb1=M.rpt[["model"]]$performances
table01=xtable(tb1)
print(table01,type="html")

	method	cp	minsplit	error	dispersion
1	anova	0.01	3	200.65	18.85
2	anova	0.02	3	221.98	18.63
3	anova	0.03	3	245.00	18.23
4	anova	0.04	3	245.00	18.23
5	anova	0.05	3	245.00	18.23
6	anova	0.06	3	245.00	18.23
7	anova	0.07	3	245.00	18.23
8	anova	0.08	3	245.00	18.23
9	anova	0.09	3	302.03	29.84
10	anova	0.10	3	314.51	14.55
11	anova	0.01	4	200.65	18.85
12	anova	0.02	4	221.98	18.63
13	anova	0.03	4	245.00	18.23
14	anova	0.04	4	245.00	18.23
15	anova	0.05	4	245.00	18.23
16	anova	0.06	4	245.00	18.23
17	anova	0.07	4	245.00	18.23
18	anova	0.08	4	245.00	18.23
19	anova	0.09	4	302.03	29.84
20	anova	0.10	4	314.51	14.55
21	anova	0.01	5	200.65	18.85
22	anova	0.02	5	221.98	18.63
23	anova	0.03	5	245.00	18.23
24	anova	0.04	5	245.00	18.23
25	anova	0.05	5	245.00	18.23
26	anova	0.06	5	245.00	18.23
27	anova	0.07	5	245.00	18.23
28	anova	0.08	5	245.00	18.23
29	anova	0.09	5	302.03	29.84
30	anova	0.10	5	314.51	14.55
31	anova	0.01	6	200.65	18.85
32	anova	0.02	6	221.98	18.63
33	anova	0.03	6	245.00	18.23
34	anova	0.04	6	245.00	18.23
35	anova	0.05	6	245.00	18.23
36	anova	0.06	6	245.00	18.23
37	anova	0.07	6	245.00	18.23
38	anova	0.08	6	245.00	18.23
39	anova	0.09	6	302.03	29.84
40	anova	0.10	6	314.51	14.55
41	anova	0.01	7	200.65	18.85
42	anova	0.02	7	221.98	18.63
43	anova	0.03	7	245.00	18.23
44	anova	0.04	7	245.00	18.23
45	anova	0.05	7	245.00	18.23
46	anova	0.06	7	245.00	18.23
47	anova	0.07	7	245.00	18.23
48	anova	0.08	7	245.00	18.23
49	anova	0.09	7	302.03	29.84
50	anova	0.10	7	314.51	14.55

  plt(NMHM,11,ylb=expression(O[3] ~ CART ~ predicted),
     fich="./plots/O3_CRT.pdf",model=M.rpt,pfile=TRUE)

#
tb2=M.rptp[["model"]]$performances
table02=xtable(tb2)
print(table02,type="html")

	method	cp	minsplit	error	dispersion
1	anova	0.01	3	202.85	23.14
2	anova	0.02	3	222.31	23.32
3	anova	0.03	3	241.92	25.16
4	anova	0.04	3	241.92	25.16
5	anova	0.05	3	241.92	25.16
6	anova	0.06	3	241.92	25.16
7	anova	0.07	3	241.92	25.16
8	anova	0.08	3	241.92	25.16
9	anova	0.09	3	302.26	37.82
10	anova	0.10	3	314.30	28.57
11	anova	0.01	4	202.85	23.14
12	anova	0.02	4	222.31	23.32
13	anova	0.03	4	241.92	25.16
14	anova	0.04	4	241.92	25.16
15	anova	0.05	4	241.92	25.16
16	anova	0.06	4	241.92	25.16
17	anova	0.07	4	241.92	25.16
18	anova	0.08	4	241.92	25.16
19	anova	0.09	4	302.26	37.82
20	anova	0.10	4	314.30	28.57
21	anova	0.01	5	202.85	23.14
22	anova	0.02	5	222.31	23.32
23	anova	0.03	5	241.92	25.16
24	anova	0.04	5	241.92	25.16
25	anova	0.05	5	241.92	25.16
26	anova	0.06	5	241.92	25.16
27	anova	0.07	5	241.92	25.16
28	anova	0.08	5	241.92	25.16
29	anova	0.09	5	302.26	37.82
30	anova	0.10	5	314.30	28.57
31	anova	0.01	6	202.85	23.14
32	anova	0.02	6	222.31	23.32
33	anova	0.03	6	241.92	25.16
34	anova	0.04	6	241.92	25.16
35	anova	0.05	6	241.92	25.16
36	anova	0.06	6	241.92	25.16
37	anova	0.07	6	241.92	25.16
38	anova	0.08	6	241.92	25.16
39	anova	0.09	6	302.26	37.82
40	anova	0.10	6	314.30	28.57
41	anova	0.01	7	202.85	23.14
42	anova	0.02	7	222.31	23.32
43	anova	0.03	7	241.92	25.16
44	anova	0.04	7	241.92	25.16
45	anova	0.05	7	241.92	25.16
46	anova	0.06	7	241.92	25.16
47	anova	0.07	7	241.92	25.16
48	anova	0.08	7	241.92	25.16
49	anova	0.09	7	302.26	37.82
50	anova	0.10	7	314.30	28.57
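
The grid above can also be read programmatically. The following is a minimal sketch rather than the code used for this report: the data frame `perf` is a hand-copied subset of the table and stands in for `M.rptp[["model"]]$performances`, whose exact class is not shown here. It picks the row with the lowest `error` and checks whether `minsplit` changes the error for a given `cp`.

```r
# Hand-copied subset of the table above (cp = 0.01, 0.02, 0.09; minsplit = 3 and 7).
# `perf` is a hypothetical stand-in for M.rptp[["model"]]$performances.
perf <- data.frame(
  cp         = c(0.01, 0.02, 0.09, 0.01, 0.02, 0.09),
  minsplit   = c(3, 3, 3, 7, 7, 7),
  error      = c(202.85, 222.31, 302.26, 202.85, 222.31, 302.26),
  dispersion = c(23.14, 23.32, 37.82, 23.14, 23.32, 37.82)
)

# Row with the smallest error (the first one in case of ties).
best <- perf[which.min(perf$error), ]
print(best)

# Number of distinct error values per cp across the minsplit settings;
# a value of 1 means minsplit made no difference for that cp.
n_distinct <- tapply(perf$error, perf$cp, function(e) length(unique(e)))
print(n_distinct)
```

In this table the smallest error is reached at `cp = 0.01`, and the rows repeat identically for every `minsplit` value.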

  plt(NMHM.trn,11,ylb=expression(O[3] ~ CART ~ predicted),
     fich="./plots/O3_CRT_p.pdf",model=M.rptp,pfile=TRUE)

png 2

  plt_prd(NMHM.tst[,-1],11,ylb=expression(O[3] ~ CART ~ predicted),
        fich="./plots/O3_CRT_pt.pdf",M.rptp,pfile=TRUE)

[1] 0.8591158
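
The value printed above agrees with the test-set correlation reported for `M.rptp` in the summary table further down. The helper `plt_prd` is not shown in this section, so the sketch below only illustrates how such a figure could be obtained. The function `test_cor`, the synthetic data and the use of `lm` are assumptions made for this example; any fitted object with a `predict()` method (for instance a CART fit) could be passed in the same way.

```r
# Correlation between predictions and observations on held-out data.
# `model` can be any fitted object with a predict() method;
# `target` is the name of the response column in `test`.
test_cor <- function(model, test, target) {
  pred <- predict(model, newdata = test)
  cor(as.numeric(pred), test[[target]])
}

# Synthetic data, for illustration only (not the MHAMID measurements).
set.seed(1)
n   <- 200
dat <- data.frame(temp = runif(n, 5, 40), rad = runif(n, 0, 900))
dat$o3 <- 20 + 1.5 * dat$temp + 0.03 * dat$rad + rnorm(n, sd = 5)

idx   <- sample(n, size = 150)      # 75% of the rows for training
train <- dat[idx, ]
test  <- dat[-idx, ]

fit <- lm(o3 ~ temp + rad, data = train)
r   <- test_cor(fit, test, "o3")
print(r)
```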

#
tb3=M.rptp_d[["model"]]$performances
table03=xtable(tb3)
print(table03,type="html")

	method	cp	minsplit	error	dispersion
1	anova	0.01	3	205.78	32.30
2	anova	0.02	3	249.56	32.69
3	anova	0.03	3	261.96	25.69
4	anova	0.04	3	261.96	25.69
5	anova	0.05	3	261.96	25.69
6	anova	0.06	3	261.96	25.69
7	anova	0.07	3	261.96	25.69
8	anova	0.08	3	261.96	25.69
9	anova	0.09	3	347.53	32.54
10	anova	0.10	3	347.53	32.54
11	anova	0.01	4	205.78	32.30
12	anova	0.02	4	249.56	32.69
13	anova	0.03	4	261.96	25.69
14	anova	0.04	4	261.96	25.69
15	anova	0.05	4	261.96	25.69
16	anova	0.06	4	261.96	25.69
17	anova	0.07	4	261.96	25.69
18	anova	0.08	4	261.96	25.69
19	anova	0.09	4	347.53	32.54
20	anova	0.10	4	347.53	32.54
21	anova	0.01	5	205.78	32.30
22	anova	0.02	5	249.56	32.69
23	anova	0.03	5	261.96	25.69
24	anova	0.04	5	261.96	25.69
25	anova	0.05	5	261.96	25.69
26	anova	0.06	5	261.96	25.69
27	anova	0.07	5	261.96	25.69
28	anova	0.08	5	261.96	25.69
29	anova	0.09	5	347.53	32.54
30	anova	0.10	5	347.53	32.54
31	anova	0.01	6	205.78	32.30
32	anova	0.02	6	249.56	32.69
33	anova	0.03	6	261.96	25.69
34	anova	0.04	6	261.96	25.69
35	anova	0.05	6	261.96	25.69
36	anova	0.06	6	261.96	25.69
37	anova	0.07	6	261.96	25.69
38	anova	0.08	6	261.96	25.69
39	anova	0.09	6	347.53	32.54
40	anova	0.10	6	347.53	32.54
41	anova	0.01	7	205.78	32.30
42	anova	0.02	7	249.56	32.69
43	anova	0.03	7	261.96	25.69
44	anova	0.04	7	261.96	25.69
45	anova	0.05	7	261.96	25.69
46	anova	0.06	7	261.96	25.69
47	anova	0.07	7	261.96	25.69
48	anova	0.08	7	261.96	25.69
49	anova	0.09	7	347.53	32.54
50	anova	0.10	7	347.53	32.54

  plt(NMHM.trn_d,11,ylb=expression(O[3] ~ CART ~ predicted),
     fich="./plots/O3_CRT_p_d.pdf",model=M.rptp_d,pfile=TRUE)

png 2

  plt_prd(NMHM.tst_d[,-1],11,ylb=expression(O[3] ~ CART ~ predicted),
        fich="./plots/O3_CRT_pt_d.pdf",M.rptp_d,pfile=TRUE)

[1] 0.8814218

#
tb4=M.rptp_n[["model"]]$performances
table04=xtable(tb4)
print(table04,type="html")

	method	cp	minsplit	error	dispersion
1	anova	0.01	3	178.53	19.90
2	anova	0.02	3	185.87	20.95
3	anova	0.03	3	185.87	20.95
4	anova	0.04	3	185.87	20.95
5	anova	0.05	3	197.76	38.15
6	anova	0.06	3	207.53	34.97
7	anova	0.07	3	233.92	33.70
8	anova	0.08	3	233.92	33.70
9	anova	0.09	3	233.92	33.70
10	anova	0.10	3	233.92	33.70
11	anova	0.01	4	178.53	19.90
12	anova	0.02	4	185.87	20.95
13	anova	0.03	4	185.87	20.95
14	anova	0.04	4	185.87	20.95
15	anova	0.05	4	197.76	38.15
16	anova	0.06	4	207.53	34.97
17	anova	0.07	4	233.92	33.70
18	anova	0.08	4	233.92	33.70
19	anova	0.09	4	233.92	33.70
20	anova	0.10	4	233.92	33.70
21	anova	0.01	5	178.53	19.90
22	anova	0.02	5	185.87	20.95
23	anova	0.03	5	185.87	20.95
24	anova	0.04	5	185.87	20.95
25	anova	0.05	5	197.76	38.15
26	anova	0.06	5	207.53	34.97
27	anova	0.07	5	233.92	33.70
28	anova	0.08	5	233.92	33.70
29	anova	0.09	5	233.92	33.70
30	anova	0.10	5	233.92	33.70
31	anova	0.01	6	178.53	19.90
32	anova	0.02	6	185.87	20.95
33	anova	0.03	6	185.87	20.95
34	anova	0.04	6	185.87	20.95
35	anova	0.05	6	197.76	38.15
36	anova	0.06	6	207.53	34.97
37	anova	0.07	6	233.92	33.70
38	anova	0.08	6	233.92	33.70
39	anova	0.09	6	233.92	33.70
40	anova	0.10	6	233.92	33.70
41	anova	0.01	7	178.53	19.90
42	anova	0.02	7	185.87	20.95
43	anova	0.03	7	185.87	20.95
44	anova	0.04	7	185.87	20.95
45	anova	0.05	7	197.76	38.15
46	anova	0.06	7	207.53	34.97
47	anova	0.07	7	233.92	33.70
48	anova	0.08	7	233.92	33.70
49	anova	0.09	7	233.92	33.70
50	anova	0.10	7	233.92	33.70

  plt(NMHM.trn_n,11,ylb=expression(O[3] ~ CART ~ predicted),
     fich="./plots/O3_CRT_p_n.pdf",model=M.rptp_n,pfile=TRUE)

png 2

  plt_prd(NMHM.tst_n[,-1],11,ylb=expression(O[3] ~ CART ~ predicted),
        fich="./plots/O3_CRT_pt_n.pdf",M.rptp_n,pfile=TRUE)

[1] 0.7870983

#
print(xtable(cc.rpt),type="html")

	Model	Tst
M.rpt	0.87	0.00
M.rptp	0.87	0.86
M.rptp_d	0.90	0.88
M.rptp_n	0.77	0.79

The full CART model (M.rpt) reaches a correlation of 0.8709754 between predicted and observed values.

Conclusions

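After this short analysis we can compare the correlation between predicted and observed O3 for each technique. The first table below corresponds to the data used to fit the models and the second to the held-out test sets: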

	LM	SVM	RF	MLP	CART
Full_Model	0.88	0.95	0.99	0.89	0.87
Partial_Model	0.89	0.95	0.98	0.90	0.87
Daily_P_Model	0.91	0.96	0.99	0.91	0.90
Nightly_P_Model	0.79	0.94	0.97	0.80	0.77

	LM	SVM	RF	MLP	CART
Partial_Model	0.87	0.90	0.91	0.88	0.86
Daily_P_Model	0.89	0.91	0.92	0.89	0.88
Nightly_P_Model	0.81	0.85	0.87	0.81	0.79

From the figures, it is clear that RF tends to underestimate the higher values, probably because the data set is density imbalanced, with high values sparsely represented. In this particular respect, SVM performs quite well.
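
One way to put a number on this kind of underestimation is to look at the mean bias (predicted minus observed) restricted to the upper tail of the observations. The sketch below is an illustration under stated assumptions: the function `upper_tail_bias`, the 0.9 quantile threshold and the synthetic data are not part of the original analysis, and the shrinking predictor only mimics the behaviour described for RF.

```r
# Mean bias (predicted - observed) for observations at or above a given quantile.
# A clearly negative value indicates underestimation of high values.
upper_tail_bias <- function(observed, predicted, prob = 0.9) {
  thr  <- quantile(observed, probs = prob, names = FALSE)
  high <- observed >= thr
  mean(predicted[high] - observed[high])
}

# Synthetic, right-skewed data for illustration only.
set.seed(42)
obs <- rgamma(500, shape = 2, scale = 20)

# A predictor that shrinks values toward the mean (an assumption of this
# sketch, used to mimic the underestimation of high values).
pred <- mean(obs) + 0.7 * (obs - mean(obs))

bias_high <- upper_tail_bias(obs, pred, prob = 0.9)
bias_all  <- mean(pred - obs)
print(c(upper_tail = bias_high, overall = bias_all))
```

A predictor can therefore be unbiased on average and still be clearly negative in the upper tail, which an overall correlation figure alone does not reveal.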

Taking a global view, we can conclude that the best fit was scored for 0.9097719 method with a correlation factor of 0.9556527.

Ensembles

Let’s see how the ensemble method performs.