Training Models from GA-Selected Features for Metallicity Prediction

In previous steps, features were selected with a genetic algorithm (GA) that assessed the suitability of each candidate feature.

Different GA strategies are available for fitness evaluation; we tested the nearcenter and randomforest methods.

Feature selection was prepared according to the document paso2.

suppressPackageStartupMessages(library(googleVis))
suppressPackageStartupMessages(library(xtable))
suppressPackageStartupMessages(library(Peaks))
suppressPackageStartupMessages(library(magic))
suppressPackageStartupMessages(library(segmented))
suppressPackageStartupMessages(library(fftw))
suppressPackageStartupMessages(library(FITSio))
suppressPackageStartupMessages(library(stringr))
suppressPackageStartupMessages(library(utils))
suppressPackageStartupMessages(library(e1071))
suppressPackageStartupMessages(library(quantmod))
suppressPackageStartupMessages(library(JADE))
suppressPackageStartupMessages(library(zoo))
suppressPackageStartupMessages(library(plyr))
suppressPackageStartupMessages(library(doMC))
suppressPackageStartupMessages(library(multicore))
suppressPackageStartupMessages(library(parallel))
suppressPackageStartupMessages(library(foreach))
suppressPackageStartupMessages(library(compiler))
suppressPackageStartupMessages(library(galgo))

Feature extraction as defined by the GA

For feature selection, the BT-Settl 2012 library from France Allard was selected, and its wavelength range was reduced to make it compatible with the data coming from IPAC.

# Processing of the genetic algorithm results for temperature
setwd("~/git/M_sel")
# load('nearcenter_ALL_m_5_900.RData') plot(bb.nc.3,type='fitness')
# plot(bb.nc.3,type='fitness', filter='nosolutions')
# plot(bb.nc.3,type='confusion') cpm<-classPredictionMatrix(bb.nc.3)
# cm<-confusionMatrix(bb.nc.3,cpm) sec<-sensitivityClass(bb.nc.3,cm)
# spc<-specificityClass(bb.nc.3,cm)
# plot(bb.nc.3,type='confusion',set=c(1,0), splits=1)
# plot(bb.nc.3,type='confusion',set=c(1,0),
# splits=1,chromosomes=list(bb.nc.3$bestChromosomes[[1]]))
# plot(bb.nc.3,type='generankstability')
# rchr<-lapply(bb.nc.3$bestChromosomes[1:300],robustGeneBackwardElimination,bb.nc.3,result='shortest')
# fsm<-forwardSelectionModels(bb.nc.3, plot=FALSE) fsm$models
# rownames(ALL)[fsm$models[[3]]]
load("m_m-settl_5_svm.RData")
datos <- `m_m-settl_5_svm`
features <- list()
features$M <- c("44:53_79:86", "44:51_52:63", "46:53_67:83", "53:62_121:137", 
    "53:62_127:143", "56:63_79:95", "59:70_49:56", "67:81_46:53", "88:95_103:116", 
    "101:117_82:96", "107:116_55:62", "110:117_136:143", "134:143_109:116", 
    "144:151_106:113")
output <- 15
#

We now load the original data, bp_clean and bq_clean, and compute the features according to the feature list.
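Each entry of the feature list packs two index ranges into one string: the part before the underscore selects the signal rows and the part after it selects the continuum rows. The following is a minimal sketch of how such a string can be split without `eval(parse())`; the helper name `parse_feature` is hypothetical and is not part of the original code.

```r
# Hypothetical helper: split one "signal_continuum" string into two index vectors
parse_feature <- function(s) {
    parts <- strsplit(s, "_", fixed = TRUE)[[1]]
    to_idx <- function(p) {
        b <- as.integer(strsplit(p, ":", fixed = TRUE)[[1]])
        b[1]:b[2]
    }
    list(signal = to_idx(parts[1]), continuum = to_idx(parts[2]))
}

# First entry of features$M
f <- parse_feature("44:53_79:86")
f$signal     # row indices of the signal band
f$continuum  # row indices of the continuum band
```

The index vectors can then be used directly to subset the rows of a spectrum.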

To generate the features we use the formula \( \int_{\lambda_{1}}^{\lambda_{2}} \left(1-\frac{F_{line}}{F_{cont}}\right) d\lambda \), as given in formula (1) of the above reference.

The band wavelengths are defined in the previous table, but \( F_{cont} \) is not defined. For that reason, a genetic algorithm tests different candidate continuum regions and looks for the ones that best explain the spectral parameters through the previous feature function.
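As an illustration of the feature formula, the sketch below evaluates it on a synthetic spectrum with the trapezoidal rule. It assumes that \( F_{cont} \) is taken as the mean flux over a continuum band; the spectrum, the band limits, and the helper `trapz` are all made up for the example and do not come from the BT-Settl or IPAC data.

```r
# Synthetic spectrum: flat continuum of 2 with a Gaussian absorption line at 8660
lambda <- seq(8600, 8800, by = 0.5)
flux <- 2 * (1 - 0.5 * exp(-((lambda - 8660)^2) / (2 * 3^2)))

# Trapezoidal rule over the samples (x, y)
trapz <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

sig <- lambda >= 8640 & lambda <= 8680   # assumed signal band
con <- lambda >= 8740 & lambda <= 8780   # assumed continuum band

# Mean continuum flux: integral of the flux divided by the band width
F_cont <- trapz(lambda[con], flux[con]) / diff(range(lambda[con]))

# Integral of (1 - F_line / F_cont) over the signal band
feature <- trapz(lambda[sig], 1 - flux[sig] / F_cont)
feature   # close to the analytic area 0.5 * 3 * sqrt(2 * pi) of the line
```

Because the continuum is flat here, the result is just the area of the absorption line relative to the continuum; with real spectra the choice of continuum band changes \( F_{cont} \) and therefore the feature value.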

Once the signal and continuum regions were identified, models were set up and assessed by cross-validation. Ideas for the implementation were taken from http://moderntoolmaking.blogspot.com.es/2013/03/caretensemble-classification-example.html

# Load bp_clean (BT_SETTL) & bq_clean (IPAC)
load("~/git/M_prep/M_prep_cleanip_BT-Settl.RData")
rm(xtmp)
# Look up the features to extract them from the training set for T
signal <- unlist(lapply(features$M, function(x) {
    a <- strsplit(x, "_")
    return(a[[1]][1])
}))
noise <- unlist(lapply(features$M, function(x) {
    a <- strsplit(x, "_")
    return(a[[1]][2])
}))
sn <- cbind(signal, noise)
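# Trapezoidal integral of a spectrum over the rows selected by idx (e.g. "44:53").
# With norm > 0 the flux is replaced by 1, so the result is the band width.
int_spec <- function(x, idx, norm = 0) {
    y <- x$data[[1]][eval(parse(text = idx)), ]
    xz <- diff(as.numeric(y[, 1]), 1)
    yz <- as.numeric(y[, 2])
    if (norm > 0) {
        # One value per sample, so rollmean() below matches the length of xz
        yz <- rep(1, length(yz))
    }
    z <- sum(xz * rollmean(yz, 2))
    return(z)
}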
#
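feature_extr <- function(sn, bp) {
    sig <- sn[1]
    noi <- sn[2]
    # Mean continuum flux: integrated flux divided by the continuum band width
    Fcont <- unlist(lapply(bp, int_spec, noi, 0))/unlist(lapply(bp, int_spec, 
        noi, 1))
    # Signal band width minus the continuum-normalized integrated flux
    fea <- unlist(lapply(bp, int_spec, sig, 1)) - unlist(lapply(bp, int_spec, 
        sig, 0))/Fcont
    return(fea)
}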
xx <- apply(sn, 1, feature_extr, bp_clean)
colnames(xx) <- as.character(sn[, 1])
xx <- cbind(xx, unlist(lapply(bp_clean, function(x) {
    return(x$stellarp[3])
})))
colnames(xx)[output] <- "G"

# For information only: wavelength ranges of the signal and continuum bands
newfea <- cbind(t(apply(sn, 1, function(x) {
    return(range(bp_clean[[1]]$data[[1]][eval(parse(text = x[1])), 1]))
})), t(apply(sn, 1, function(x) {
    return(range(bp_clean[[1]]$data[[1]][eval(parse(text = x[2])), 1]))
})))
colnames(newfea) <- c("Signal_from", "Signal_To", "Cont_From", "Cont_To")

print(xtable(newfea), type = "html")
   Signal_from Signal_To Cont_From Cont_To
 1     8615.80   8648.20   8741.80 8767.00
 2     8615.80   8641.00   8644.60 8684.20
 3     8623.00   8648.20   8698.60 8756.20
 4     8648.20   8680.60   8893.00 8950.60
 5     8648.20   8680.60   8914.60 8972.20
 6     8659.00   8684.20   8741.80 8799.40
 7     8669.80   8709.40   8633.80 8659.00
 8     8698.60   8749.00   8623.00 8648.20
 9     8774.20   8799.40   8828.20 8875.00
10     8821.00   8878.60   8752.60 8803.00
11     8842.60   8875.00   8655.40 8680.60
12     8853.40   8878.60   8947.00 8972.20
13     8939.80   8972.20   8849.80 8875.00
14     8975.80   9001.00   8839.00 8864.20
#

Regression and modeling

Let's build several models and compare their performance. We randomly split the data into a training set and a held-out test set, and fit each model on the training set with ten-fold cross-validation. We then test the models against the unseen data, and we also test an ensemble of them.
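The procedure described above can be sketched in base R without caret. This is only an illustration under stated assumptions: the data are synthetic, `lm` stands in for the models trained in this document, and the names `is_train`, `fold`, and `rmse` are made up for the example.

```r
set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 2 * d$x1 - d$x2 + rnorm(n, sd = 0.3)

# Hold out roughly one third of the rows as unseen test data
is_train <- runif(n) <= 0.66
train_set <- d[is_train, ]
test_set <- d[!is_train, ]

# Assign each training row to one of k folds
k <- 10
fold <- sample(rep(seq_len(k), length.out = nrow(train_set)))

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Fit on k - 1 folds, score on the remaining fold
cv_rmse <- sapply(seq_len(k), function(i) {
    fit <- lm(y ~ x1 + x2, data = train_set[fold != i, ])
    rmse(train_set$y[fold == i], predict(fit, train_set[fold == i, ]))
})

# Refit on the whole training set and score on the held-out rows
final_fit <- lm(y ~ x1 + x2, data = train_set)
test_rmse <- rmse(test_set$y, predict(final_fit, test_set))

mean(cv_rmse)   # cross-validated error estimate
test_rmse       # error on data never used for fitting
```

The cross-validated error is used to compare models, while the held-out error gives an independent check that the comparison was not overly optimistic.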

# Setup
gc(reset = TRUE)
##             used   (Mb) gc trigger (Mb)  max used   (Mb)
## Ncells    794635   42.5    1590760   85    794635   42.5
## Vcells 520693182 3972.6  752686138 5743 520693182 3972.6
set.seed(42)  #From random.org

# Libraries
library(caret)
## Loading required package: cluster
## Loading required package: lattice
## Attaching package: 'lattice'
## The following object is masked from 'package:multicore':
## 
## parallel
## Loading required package: reshape2
## Attaching package: 'caret'
## The following object is masked from 'package:galgo':
## 
## best, confusionMatrix
library(devtools)
## Attaching package: 'devtools'
## The following object is masked from 'package:R.oo':
## 
## check, unload
# Run only once: install_github('caretEnsemble', 'zachmayer') # Install
# zach's caretEnsemble package. Code gathered from the author's post.
library(caretEnsemble)

# Data
library(mlbench)
xx <- as.data.frame(xx)
X <- xx[, -output]
rownames(X) <- 1:nrow(X)
X <- data.frame(X)
Y <- xx[, output]

# Split train/test
train <- runif(nrow(X)) <= 0.66

# Set up the CV folds; returnData = FALSE saves some space
folds = 10
repeats = 1
myControl <- trainControl(method = "cv", number = folds, repeats = repeats, 
    returnResamp = "none", returnData = FALSE, savePredictions = TRUE, verboseIter = FALSE, 
    allowParallel = TRUE, index = createMultiFolds(Y[train], k = folds, times = repeats))
# Train some models
model1 <- train(X[train, ], Y[train], method = "gbm", trControl = myControl, 
    tuneGrid = expand.grid(.n.trees = 500, .interaction.depth = 15, .shrinkage = 0.01))
## Loading required package: survival
## Loading required package: splines
## Loaded gbm 2.1
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        0.7033            -nan     0.0100    0.0129
##      2        0.6905            -nan     0.0100    0.0117
##      3        0.6776            -nan     0.0100    0.0116
##      4        0.6650            -nan     0.0100    0.0128
##      5        0.6530            -nan     0.0100    0.0114
##      6        0.6408            -nan     0.0100    0.0124
##      7        0.6291            -nan     0.0100    0.0116
##      8        0.6177            -nan     0.0100    0.0114
##      9        0.6064            -nan     0.0100    0.0109
##     10        0.5953            -nan     0.0100    0.0107
##     20        0.4945            -nan     0.0100    0.0089
##     40        0.3436            -nan     0.0100    0.0061
##     60        0.2401            -nan     0.0100    0.0042
##     80        0.1690            -nan     0.0100    0.0027
##    100        0.1203            -nan     0.0100    0.0020
##    120        0.0868            -nan     0.0100    0.0013
##    140        0.0633            -nan     0.0100    0.0009
##    160        0.0471            -nan     0.0100    0.0006
##    180        0.0358            -nan     0.0100    0.0004
##    200        0.0277            -nan     0.0100    0.0002
##    220        0.0220            -nan     0.0100    0.0002
##    240        0.0180            -nan     0.0100    0.0001
##    260        0.0152            -nan     0.0100    0.0001
##    280        0.0131            -nan     0.0100    0.0001
##    300        0.0114            -nan     0.0100    0.0000
##    320        0.0103            -nan     0.0100    0.0000
##    340        0.0094            -nan     0.0100    0.0000
##    360        0.0087            -nan     0.0100   -0.0000
##    380        0.0081            -nan     0.0100   -0.0000
##    400        0.0077            -nan     0.0100   -0.0000
##    420        0.0072            -nan     0.0100   -0.0000
##    440        0.0069            -nan     0.0100    0.0000
##    460        0.0066            -nan     0.0100   -0.0000
##    480        0.0064            -nan     0.0100   -0.0000
##    500        0.0061            -nan     0.0100   -0.0000
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        0.7055            -nan     0.0100    0.0129
##      2        0.6926            -nan     0.0100    0.0123
##      3        0.6800            -nan     0.0100    0.0139
##      4        0.6674            -nan     0.0100    0.0133
##      5        0.6547            -nan     0.0100    0.0130
##      6        0.6425            -nan     0.0100    0.0094
##      7        0.6305            -nan     0.0100    0.0120
##      8        0.6189            -nan     0.0100    0.0137
##      9        0.6076            -nan     0.0100    0.0124
##     10        0.5962            -nan     0.0100    0.0123
##     20        0.4953            -nan     0.0100    0.0084
##     40        0.3429            -nan     0.0100    0.0059
##     60        0.2398            -nan     0.0100    0.0049
##     80        0.1678            -nan     0.0100    0.0026
##    100        0.1194            -nan     0.0100    0.0018
##    120        0.0859            -nan     0.0100    0.0015
##    140        0.0628            -nan     0.0100    0.0010
##    160        0.0466            -nan     0.0100    0.0006
##    180        0.0354            -nan     0.0100    0.0004
##    200        0.0275            -nan     0.0100    0.0003
##    220        0.0217            -nan     0.0100    0.0003
##    240        0.0178            -nan     0.0100    0.0001
##    260        0.0149            -nan     0.0100    0.0001
##    280        0.0129            -nan     0.0100    0.0001
##    300        0.0113            -nan     0.0100    0.0000
##    320        0.0101            -nan     0.0100    0.0000
##    340        0.0092            -nan     0.0100    0.0000
##    360        0.0085            -nan     0.0100   -0.0000
##    380        0.0079            -nan     0.0100    0.0000
##    400        0.0075            -nan     0.0100   -0.0000
##    420        0.0071            -nan     0.0100   -0.0000
##    440        0.0068            -nan     0.0100   -0.0000
##    460        0.0065            -nan     0.0100   -0.0000
##    480        0.0062            -nan     0.0100   -0.0000
##    500        0.0060            -nan     0.0100   -0.0000
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        0.6884            -nan     0.0100    0.0131
##      2        0.6761            -nan     0.0100    0.0136
##      3        0.6634            -nan     0.0100    0.0139
##      4        0.6512            -nan     0.0100    0.0118
##      5        0.6391            -nan     0.0100    0.0107
##      6        0.6278            -nan     0.0100    0.0123
##      7        0.6159            -nan     0.0100    0.0111
##      8        0.6048            -nan     0.0100    0.0108
##      9        0.5939            -nan     0.0100    0.0114
##     10        0.5828            -nan     0.0100    0.0107
##     20        0.4847            -nan     0.0100    0.0095
##     40        0.3363            -nan     0.0100    0.0058
##     60        0.2352            -nan     0.0100    0.0047
##     80        0.1660            -nan     0.0100    0.0027
##    100        0.1178            -nan     0.0100    0.0019
##    120        0.0852            -nan     0.0100    0.0014
##    140        0.0625            -nan     0.0100    0.0009
##    160        0.0468            -nan     0.0100    0.0005
##    180        0.0355            -nan     0.0100    0.0004
##    200        0.0277            -nan     0.0100    0.0003
##    220        0.0220            -nan     0.0100    0.0002
##    240        0.0179            -nan     0.0100    0.0001
##    260        0.0151            -nan     0.0100    0.0001
##    280        0.0129            -nan     0.0100    0.0001
##    300        0.0115            -nan     0.0100    0.0000
##    320        0.0103            -nan     0.0100    0.0000
##    340        0.0094            -nan     0.0100    0.0000
##    360        0.0086            -nan     0.0100    0.0000
##    380        0.0081            -nan     0.0100   -0.0000
##    400        0.0076            -nan     0.0100   -0.0000
##    420        0.0072            -nan     0.0100    0.0000
##    440        0.0068            -nan     0.0100   -0.0000
##    460        0.0065            -nan     0.0100   -0.0000
##    480        0.0063            -nan     0.0100    0.0000
##    500        0.0060            -nan     0.0100   -0.0000
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        0.6986            -nan     0.0100    0.0101
##      2        0.6854            -nan     0.0100    0.0123
##      3        0.6726            -nan     0.0100    0.0118
##      4        0.6599            -nan     0.0100    0.0134
##      5        0.6474            -nan     0.0100    0.0135
##      6        0.6350            -nan     0.0100    0.0141
##      7        0.6239            -nan     0.0100    0.0116
##      8        0.6122            -nan     0.0100    0.0112
##      9        0.6012            -nan     0.0100    0.0113
##     10        0.5903            -nan     0.0100    0.0106
##     20        0.4906            -nan     0.0100    0.0095
##     40        0.3413            -nan     0.0100    0.0070
##     60        0.2377            -nan     0.0100    0.0041
##     80        0.1675            -nan     0.0100    0.0028
##    100        0.1188            -nan     0.0100    0.0018
##    120        0.0850            -nan     0.0100    0.0015
##    140        0.0621            -nan     0.0100    0.0009
##    160        0.0462            -nan     0.0100    0.0006
##    180        0.0348            -nan     0.0100    0.0004
##    200        0.0268            -nan     0.0100    0.0003
##    220        0.0212            -nan     0.0100    0.0002
##    240        0.0173            -nan     0.0100    0.0002
##    260        0.0144            -nan     0.0100    0.0001
##    280        0.0123            -nan     0.0100    0.0000
##    300        0.0108            -nan     0.0100    0.0001
##    320        0.0097            -nan     0.0100    0.0000
##    340        0.0088            -nan     0.0100    0.0000
##    360        0.0081            -nan     0.0100    0.0000
##    380        0.0076            -nan     0.0100    0.0000
##    400        0.0071            -nan     0.0100    0.0000
##    420        0.0067            -nan     0.0100   -0.0000
##    440        0.0063            -nan     0.0100   -0.0000
##    460        0.0061            -nan     0.0100   -0.0000
##    480        0.0058            -nan     0.0100   -0.0000
##    500        0.0056            -nan     0.0100   -0.0000
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        0.6674            -nan     0.0100    0.0137
##      2        0.6550            -nan     0.0100    0.0127
##      3        0.6436            -nan     0.0100    0.0121
##      4        0.6317            -nan     0.0100    0.0119
##      5        0.6199            -nan     0.0100    0.0111
##      6        0.6086            -nan     0.0100    0.0121
##      7        0.5972            -nan     0.0100    0.0099
##      8        0.5859            -nan     0.0100    0.0108
##      9        0.5748            -nan     0.0100    0.0115
##     10        0.5643            -nan     0.0100    0.0106
##     20        0.4707            -nan     0.0100    0.0073
##     40        0.3276            -nan     0.0100    0.0048
##     60        0.2300            -nan     0.0100    0.0038
##     80        0.1628            -nan     0.0100    0.0027
##    100        0.1166            -nan     0.0100    0.0018
##    120        0.0845            -nan     0.0100    0.0012
##    140        0.0614            -nan     0.0100    0.0009
##    160        0.0460            -nan     0.0100    0.0004
##    180        0.0352            -nan     0.0100    0.0004
##    200        0.0276            -nan     0.0100    0.0002
##    220        0.0222            -nan     0.0100    0.0002
##    240        0.0182            -nan     0.0100    0.0001
##    260        0.0154            -nan     0.0100    0.0001
##    280        0.0132            -nan     0.0100    0.0001
##    300        0.0116            -nan     0.0100    0.0000
##    320        0.0105            -nan     0.0100    0.0000
##    340        0.0095            -nan     0.0100   -0.0000
##    360        0.0087            -nan     0.0100   -0.0000
##    380        0.0081            -nan     0.0100    0.0000
##    400        0.0076            -nan     0.0100   -0.0000
##    420        0.0072            -nan     0.0100    0.0000
##    440        0.0068            -nan     0.0100   -0.0000
##    460        0.0065            -nan     0.0100   -0.0000
##    480        0.0062            -nan     0.0100   -0.0000
##    500        0.0060            -nan     0.0100   -0.0000
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        0.6981            -nan     0.0100    0.0139
##      2        0.6853            -nan     0.0100    0.0102
##      3        0.6726            -nan     0.0100    0.0141
##      4        0.6601            -nan     0.0100    0.0119
##      5        0.6477            -nan     0.0100    0.0106
##      6        0.6355            -nan     0.0100    0.0120
##      7        0.6245            -nan     0.0100    0.0121
##      8        0.6128            -nan     0.0100    0.0111
##      9        0.6016            -nan     0.0100    0.0109
##     10        0.5904            -nan     0.0100    0.0100
##     20        0.4900            -nan     0.0100    0.0093
##     40        0.3390            -nan     0.0100    0.0064
##     60        0.2370            -nan     0.0100    0.0047
##     80        0.1672            -nan     0.0100    0.0032
##    100        0.1192            -nan     0.0100    0.0019
##    120        0.0864            -nan     0.0100    0.0014
##    140        0.0634            -nan     0.0100    0.0009
##    160        0.0467            -nan     0.0100    0.0006
##    180        0.0351            -nan     0.0100    0.0003
##    200        0.0273            -nan     0.0100    0.0003
##    220        0.0217            -nan     0.0100    0.0002
##    240        0.0177            -nan     0.0100    0.0002
##    260        0.0148            -nan     0.0100    0.0001
##    280        0.0126            -nan     0.0100    0.0001
##    300        0.0110            -nan     0.0100    0.0000
##    320        0.0098            -nan     0.0100    0.0000
##    340        0.0088            -nan     0.0100    0.0000
##    360        0.0081            -nan     0.0100    0.0000
##    380        0.0075            -nan     0.0100    0.0000
##    400        0.0070            -nan     0.0100   -0.0000
##    420        0.0066            -nan     0.0100   -0.0000
##    440        0.0063            -nan     0.0100   -0.0000
##    460        0.0060            -nan     0.0100   -0.0000
##    480        0.0057            -nan     0.0100   -0.0000
##    500        0.0055            -nan     0.0100   -0.0000
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        0.7174            -nan     0.0100    0.0131
##      2        0.7046            -nan     0.0100    0.0130
##      3        0.6921            -nan     0.0100    0.0131
##      4        0.6795            -nan     0.0100    0.0130
##      5        0.6668            -nan     0.0100    0.0135
##      6        0.6545            -nan     0.0100    0.0124
##      7        0.6428            -nan     0.0100    0.0105
##      8        0.6314            -nan     0.0100    0.0125
##      9        0.6199            -nan     0.0100    0.0114
##     10        0.6098            -nan     0.0100    0.0118
##     20        0.5064            -nan     0.0100    0.0104
##     40        0.3511            -nan     0.0100    0.0065
##     60        0.2453            -nan     0.0100    0.0042
##     80        0.1718            -nan     0.0100    0.0026
##    100        0.1220            -nan     0.0100    0.0020
##    120        0.0875            -nan     0.0100    0.0011
##    140        0.0642            -nan     0.0100    0.0008
##    160        0.0475            -nan     0.0100    0.0006
##    180        0.0359            -nan     0.0100    0.0004
##    200        0.0278            -nan     0.0100    0.0003
##    220        0.0221            -nan     0.0100    0.0002
##    240        0.0181            -nan     0.0100    0.0002
##    260        0.0152            -nan     0.0100    0.0001
##    280        0.0132            -nan     0.0100    0.0001
##    300        0.0117            -nan     0.0100    0.0000
##    320        0.0105            -nan     0.0100    0.0000
##    340        0.0095            -nan     0.0100    0.0000
##    360        0.0088            -nan     0.0100   -0.0000
##    380        0.0082            -nan     0.0100    0.0000
##    400        0.0077            -nan     0.0100    0.0000
##    420        0.0074            -nan     0.0100   -0.0000
##    440        0.0070            -nan     0.0100   -0.0000
##    460        0.0067            -nan     0.0100   -0.0000
##    480        0.0064            -nan     0.0100   -0.0000
##    500        0.0062            -nan     0.0100   -0.0000
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        0.6932            -nan     0.0100    0.0109
##      2        0.6805            -nan     0.0100    0.0128
##      3        0.6682            -nan     0.0100    0.0116
##      4        0.6557            -nan     0.0100    0.0114
##      5        0.6436            -nan     0.0100    0.0124
##      6        0.6321            -nan     0.0100    0.0114
##      7        0.6203            -nan     0.0100    0.0112
##      8        0.6089            -nan     0.0100    0.0101
##      9        0.5974            -nan     0.0100    0.0084
##     10        0.5866            -nan     0.0100    0.0122
##     20        0.4886            -nan     0.0100    0.0089
##     40        0.3395            -nan     0.0100    0.0063
##     60        0.2380            -nan     0.0100    0.0038
##     80        0.1678            -nan     0.0100    0.0028
##    100        0.1190            -nan     0.0100    0.0021
##    120        0.0865            -nan     0.0100    0.0012
##    140        0.0631            -nan     0.0100    0.0009
##    160        0.0470            -nan     0.0100    0.0007
##    180        0.0355            -nan     0.0100    0.0004
##    200        0.0276            -nan     0.0100    0.0003
##    220        0.0221            -nan     0.0100    0.0001
##    240        0.0181            -nan     0.0100    0.0001
##    260        0.0152            -nan     0.0100    0.0001
##    280        0.0130            -nan     0.0100    0.0001
##    300        0.0113            -nan     0.0100    0.0000
##    320        0.0102            -nan     0.0100    0.0000
##    340        0.0092            -nan     0.0100    0.0000
##    360        0.0085            -nan     0.0100   -0.0000
##    380        0.0080            -nan     0.0100    0.0000
##    400        0.0075            -nan     0.0100   -0.0000
##    420        0.0072            -nan     0.0100    0.0000
##    440        0.0068            -nan     0.0100    0.0000
##    460        0.0065            -nan     0.0100   -0.0000
##    480        0.0062            -nan     0.0100   -0.0000
##    500        0.0059            -nan     0.0100   -0.0000
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        0.7039            -nan     0.0100    0.0121
##      2        0.6911            -nan     0.0100    0.0137
##      3        0.6788            -nan     0.0100    0.0136
##      4        0.6661            -nan     0.0100    0.0130
##      5        0.6540            -nan     0.0100    0.0131
##      6        0.6425            -nan     0.0100    0.0120
##      7        0.6301            -nan     0.0100    0.0122
##      8        0.6183            -nan     0.0100    0.0119
##      9        0.6072            -nan     0.0100    0.0097
##     10        0.5961            -nan     0.0100    0.0105
##     20        0.4943            -nan     0.0100    0.0093
##     40        0.3415            -nan     0.0100    0.0060
##     60        0.2372            -nan     0.0100    0.0043
##     80        0.1659            -nan     0.0100    0.0026
##    100        0.1180            -nan     0.0100    0.0016
##    120        0.0853            -nan     0.0100    0.0015
##    140        0.0619            -nan     0.0100    0.0008
##    160        0.0455            -nan     0.0100    0.0006
##    180        0.0342            -nan     0.0100    0.0005
##    200        0.0266            -nan     0.0100    0.0003
##    220        0.0211            -nan     0.0100    0.0002
##    240        0.0172            -nan     0.0100    0.0002
##    260        0.0144            -nan     0.0100    0.0001
##    280        0.0124            -nan     0.0100    0.0000
##    300        0.0108            -nan     0.0100    0.0000
##    320        0.0096            -nan     0.0100    0.0000
##    340        0.0088            -nan     0.0100    0.0000
##    360        0.0082            -nan     0.0100    0.0000
##    380        0.0076            -nan     0.0100    0.0000
##    400        0.0071            -nan     0.0100    0.0000
##    420        0.0068            -nan     0.0100    0.0000
##    440        0.0064            -nan     0.0100   -0.0000
##    460        0.0062            -nan     0.0100    0.0000
##    480        0.0059            -nan     0.0100   -0.0000
##    500        0.0056            -nan     0.0100   -0.0000
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        0.7014            -nan     0.0100    0.0136
##      2        0.6886            -nan     0.0100    0.0134
##      3        0.6761            -nan     0.0100    0.0121
##      4        0.6634            -nan     0.0100    0.0114
##      5        0.6509            -nan     0.0100    0.0126
##      6        0.6391            -nan     0.0100    0.0131
##      7        0.6269            -nan     0.0100    0.0122
##      8        0.6153            -nan     0.0100    0.0115
##      9        0.6037            -nan     0.0100    0.0109
##     10        0.5923            -nan     0.0100    0.0094
##     20        0.4920            -nan     0.0100    0.0098
##     40        0.3409            -nan     0.0100    0.0057
##     60        0.2378            -nan     0.0100    0.0042
##     80        0.1673            -nan     0.0100    0.0031
##    100        0.1189            -nan     0.0100    0.0021
##    120        0.0858            -nan     0.0100    0.0014
##    140        0.0629            -nan     0.0100    0.0008
##    160        0.0464            -nan     0.0100    0.0007
##    180        0.0351            -nan     0.0100    0.0004
##    200        0.0271            -nan     0.0100    0.0003
##    220        0.0212            -nan     0.0100    0.0002
##    240        0.0172            -nan     0.0100    0.0001
##    260        0.0143            -nan     0.0100    0.0001
##    280        0.0123            -nan     0.0100    0.0001
##    300        0.0108            -nan     0.0100    0.0000
##    320        0.0097            -nan     0.0100    0.0000
##    340        0.0088            -nan     0.0100    0.0000
##    360        0.0081            -nan     0.0100    0.0000
##    380        0.0076            -nan     0.0100    0.0000
##    400        0.0072            -nan     0.0100   -0.0000
##    420        0.0068            -nan     0.0100   -0.0000
##    440        0.0065            -nan     0.0100   -0.0000
##    460        0.0062            -nan     0.0100   -0.0000
##    480        0.0060            -nan     0.0100   -0.0000
##    500        0.0058            -nan     0.0100   -0.0000
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        0.6978            -nan     0.0100    0.0138
##      2        0.6846            -nan     0.0100    0.0113
##      3        0.6719            -nan     0.0100    0.0115
##      4        0.6590            -nan     0.0100    0.0122
##      5        0.6468            -nan     0.0100    0.0118
##      6        0.6346            -nan     0.0100    0.0108
##      7        0.6231            -nan     0.0100    0.0117
##      8        0.6110            -nan     0.0100    0.0128
##      9        0.5999            -nan     0.0100    0.0118
##     10        0.5889            -nan     0.0100    0.0107
##     20        0.4880            -nan     0.0100    0.0103
##     40        0.3372            -nan     0.0100    0.0061
##     60        0.2351            -nan     0.0100    0.0043
##     80        0.1652            -nan     0.0100    0.0028
##    100        0.1162            -nan     0.0100    0.0020
##    120        0.0833            -nan     0.0100    0.0015
##    140        0.0607            -nan     0.0100    0.0009
##    160        0.0445            -nan     0.0100    0.0006
##    180        0.0335            -nan     0.0100    0.0004
##    200        0.0258            -nan     0.0100    0.0003
##    220        0.0203            -nan     0.0100    0.0002
##    240        0.0164            -nan     0.0100    0.0001
##    260        0.0136            -nan     0.0100    0.0001
##    280        0.0117            -nan     0.0100    0.0000
##    300        0.0103            -nan     0.0100    0.0000
##    320        0.0092            -nan     0.0100    0.0000
##    340        0.0084            -nan     0.0100    0.0000
##    360        0.0078            -nan     0.0100    0.0000
##    380        0.0072            -nan     0.0100   -0.0000
##    400        0.0068            -nan     0.0100   -0.0000
##    420        0.0064            -nan     0.0100   -0.0000
##    440        0.0061            -nan     0.0100   -0.0000
##    460        0.0058            -nan     0.0100   -0.0000
##    480        0.0056            -nan     0.0100   -0.0000
##    500        0.0054            -nan     0.0100   -0.0000
model2 <- train(X[train, ], Y[train], method = "blackboost", trControl = myControl)
## This is mboost 2.2-2. See 'package?mboost' and the NEWS file for a
## complete list of changes. Note: The default for the computation of the
## degrees of freedom has changed.  For details see section 'Global Options'
## of '?bols'.
## Loading required package: grid
## Loading required package: modeltools
## Loading required package: stats4
## Attaching package: 'modeltools'
## The following object is masked from 'package:R.oo':
## 
## clone, dimension
## The following object is masked from 'package:plyr':
## 
## empty
## Loading required package: coin
## Loading required package: mvtnorm
## Loading required package: sandwich
## Loading required package: strucchange
## Loading required package: vcd
## Loading required package: MASS
## Loading required package: colorspace
model3 <- train(X[train, ], Y[train], method = "parRF", trControl = myControl)
## randomForest 4.6-7
## Type rfNews() to see new features/changes/bug fixes.
model4 <- train(X[train, ], Y[train], method = "mlpWeightDecay", trControl = myControl, 
    trace = FALSE)
## Loading required package: Rcpp
## Attaching package: 'RSNNS'
## The following object is masked from 'package:caret':
## 
## confusionMatrix, train
## The following object is masked from 'package:galgo':
## 
## confusionMatrix
model5 <- train(X[train, ], Y[train], method = "ppr", trControl = myControl)
model6 <- train(X[train, ], Y[train], method = "earth", trControl = myControl)
## Loading required package: plotmo
## Loading required package: plotrix
model7 <- train(X[train, ], Y[train], method = "glm", trControl = myControl)
model8 <- train(X[train, ], Y[train], method = "svmRadial", trControl = myControl)
## Attaching package: 'kernlab'
## The following object is masked from 'package:modeltools':
## 
## prior
## The following object is masked from 'package:galgo':
## 
## scaling
model9 <- train(X[train, ], Y[train], method = "gam", trControl = myControl)
## This is mgcv 1.7-22. For overview type 'help("mgcv-package")'.
## Attaching package: 'mgcv'
## The following object is masked from 'package:magic':
## 
## magic
model10 <- train(X[train, ], Y[train], method = "glmnet", trControl = myControl)
## Loading required package: Matrix
## Loaded glmnet 1.9-3

# Make a list of all the models
all.models <- list(model1, model2, model3, model4, model5, model6, model7, model8, 
    model9, model10)
names(all.models) <- sapply(all.models, function(x) x$method)
sort(sapply(all.models, function(x) min(x$results$RMSE)))
##          parRF            gbm          earth     blackboost      svmRadial 
##         0.1338         0.1385         0.1586         0.1832         0.2050 
##            ppr mlpWeightDecay            glm         glmnet            gam 
##         0.2117         0.2959         0.4951         0.6587         1.5920

# Make a greedy ensemble - currently can only use RMSE
greedy <- caretEnsemble(all.models, iter = 1000L)
## Loading required package: pbapply
sort(greedy$weights, decreasing = TRUE)
##          parRF          earth            gbm            ppr mlpWeightDecay 
##          0.440          0.276          0.139          0.096          0.027 
##      svmRadial 
##          0.022
greedy$error
##   RMSE 
## 0.1287
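
Assuming the greedy ensemble combines the selected models as a weighted average, the weights printed above can be applied by hand. The sketch below only illustrates that arithmetic: the prediction matrix `p` is synthetic and hypothetical, and only the weights are taken from the output above.

```r
# Weights reported by sort(greedy$weights, decreasing = TRUE) above
w <- c(parRF = 0.440, earth = 0.276, gbm = 0.139, ppr = 0.096,
       mlpWeightDecay = 0.027, svmRadial = 0.022)

# Hypothetical predictions: 5 observations (rows) x 6 models (columns)
set.seed(1)
p <- matrix(rnorm(5 * length(w)), nrow = 5, dimnames = list(NULL, names(w)))

# Weighted average per observation; dividing by sum(w) guards against
# rounding in the printed weights
ens <- as.vector(p %*% w) / sum(w)
ens
```

Models with zero weight (here blackboost, glm, glmnet and gam) simply do not contribute to the ensemble prediction.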

# Make a linear regression ensemble
linear <- caretStack(all.models, method = "glm", trControl = trainControl(method = "cv"))
summary(linear$ens_model$finalModel)
## 
## Call:
## NULL
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.8912  -0.0518   0.0037   0.0503   0.4768  
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     0.01764    0.01654    1.07  0.28732    
## gbm             0.05273    0.15107    0.35  0.72738    
## blackboost      0.03662    0.08151    0.45  0.65365    
## parRF           0.49006    0.12280    3.99  9.0e-05 ***
## mlpWeightDecay -0.03350    0.03486   -0.96  0.33763    
## ppr             0.16677    0.03915    4.26  3.0e-05 ***
## earth           0.29905    0.06839    4.37  1.9e-05 ***
## glm            -0.08775    0.02396   -3.66  0.00031 ***
## svmRadial       0.07955    0.06083    1.31  0.19233    
## gam            -0.00771    0.00426   -1.81  0.07179 .  
## glmnet          0.02879    0.02955    0.97  0.33099    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.0157)
## 
##     Null deviance: 164.2338  on 230  degrees of freedom
## Residual deviance:   3.4546  on 220  degrees of freedom
## AIC: -291.3
## 
## Number of Fisher Scoring iterations: 2
linear$error
##   parameter   RMSE Rsquared  RMSESD RsquaredSD
## 1      none 0.1347   0.9731 0.03507    0.01274
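
The linear ensemble is a second-level GLM whose inputs are the base models' predictions. As a rough sketch of that idea, the example below fits a gaussian `glm` on synthetic base predictions; `y`, `base_preds` and the column names `m1` to `m3` are hypothetical and do not come from the analysis above.

```r
set.seed(42)
n <- 200
y <- rnorm(n)                                  # hypothetical target
# Hypothetical base-model predictions: the target plus model-specific noise
base_preds <- data.frame(m1 = y + rnorm(n, sd = 0.2),
                         m2 = y + rnorm(n, sd = 0.4),
                         m3 = y + rnorm(n, sd = 0.8))

# Second-level model: gaussian GLM on the base predictions
stack <- glm(y ~ ., data = cbind(y = y, base_preds), family = gaussian())
coef(stack)

# In-sample RMSE of the stack versus each base model
stack_rmse <- sqrt(mean((y - fitted(stack))^2))
base_rmse  <- sapply(base_preds, function(p) sqrt(mean((y - p)^2)))
```

The in-sample RMSE of such a fit is optimistic, which is why the stacked model above is evaluated with cross-validation (`trainControl(method = "cv")`).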

# Predict for test set:
preds <- data.frame(sapply(all.models, predict, newdata = X[!train, ]))
preds$ENS_greedy <- predict(greedy, newdata = X[!train, ])
preds$ENS_linear <- predict(linear, newdata = X[!train, ])
sort(sqrt(colMeans((preds - Y[!train])^2)))
##     ENS_greedy     ENS_linear            gbm          parRF          earth 
##         0.1583         0.1614         0.1695         0.1737         0.2034 
##            ppr     blackboost mlpWeightDecay      svmRadial            glm 
##         0.2368         0.2476         0.2728         0.2779         0.4881 
##         glmnet            gam 
##         0.6942         2.0974

IPAC validation

With the individual models and their ensembles built, the next step is to predict for the IPAC dataset. We start by preparing the data:

# Load bq_clean (IPAC)
yy <- apply(sn, 1, feature_extr, bq_clean)
yy <- as.data.frame(yy)
colnames(yy) <- as.character(sn[, 1])
colnames(yy) <- str_replace(paste("X", colnames(yy), sep = ""), ":", ".")
# Predict for new dataset:
predf <- data.frame(sapply(all.models, predict, newdata = yy))
## Error: object 'X53.62.1' not found
predf$ENS_greedy <- predict(greedy, newdata = yy)
## Error: variables in the training data missing in newdata
predf$ENS_linear <- predict(linear, newdata = yy)
## Error: object 'X53.62.1' not found
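
The errors above indicate that the predictor names in `yy` do not match those used for training. One possible cause, assuming the identifiers in `sn[, 1]` contain more than one colon, is that `str_replace` only replaces the first match in each string, whereas `str_replace_all` replaces every match. The sketch below reproduces the effect with the base R equivalents `sub` and `gsub`; the identifiers and `train_names` are hypothetical.

```r
ids <- c("53:62:1", "10:20:3")   # hypothetical feature identifiers

# sub() replaces only the first match, like stringr::str_replace()
first_only <- sub(":", ".", paste("X", ids, sep = ""), fixed = TRUE)
# gsub() replaces every match, like stringr::str_replace_all()
all_matches <- gsub(":", ".", paste("X", ids, sep = ""), fixed = TRUE)

first_only    # "X53.62:1" "X10.20:3"
all_matches   # "X53.62.1" "X10.20.3"

# make.names() applies the sanitising rules data.frame() uses by default
identical(all_matches, make.names(ids))

# List the training predictors that are absent from the new data
train_names <- c("X53.62.1", "X10.20.3")   # hypothetical training columns
setdiff(train_names, first_only)
```

Running `setdiff(colnames(X), colnames(yy))` before calling `predict` would show directly which columns are missing.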

Neighbourhood Analysis

To see how close or isolated the two sets (BT-Settl and IPAC) are, a principal component analysis (PCA) is performed:

# Combine the BT-Settl (X) and IPAC (yy) features and run a PCA
zz <- rbind(X, yy)
pcaz <- prcomp(zz)
plot(pcaz$x[, 1], pcaz$x[, 2], pch = ".")
points(pcaz$x[(nrow(X) + 1):nrow(zz), 1], pcaz$x[(nrow(X) + 1):nrow(zz), 2], 
    pch = "x", col = 3)
points(pcaz$x[1:nrow(X), 1], pcaz$x[1:nrow(X), 2], pch = "+", col = 2)

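plot of chunk lee04: first two principal components of the combined set, with IPAC points marked "x" (green) and BT-Settl points marked "+" (red)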

rownames(yy)[896 - nrow(X)]
## [1] "LP_799-3.7512.txt"
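
Beyond visual inspection, isolation can also be quantified. The sketch below is not part of the original analysis: it computes, in the PC1-PC2 plane, the distance from each observed point to its nearest reference point. The matrices `ref` and `obs` are synthetic, hypothetical stand-ins for the model grid and the observed spectra.

```r
set.seed(7)
# Hypothetical stand-ins: 'ref' for the model grid, 'obs' for observed spectra
ref <- matrix(rnorm(100 * 4), ncol = 4)
obs <- matrix(rnorm(20 * 4, mean = 0.5), ncol = 4)

pca <- prcomp(rbind(ref, obs))
scores <- pca$x[, 1:2]
ref_sc <- scores[seq_len(nrow(ref)), , drop = FALSE]
obs_sc <- scores[nrow(ref) + seq_len(nrow(obs)), , drop = FALSE]

# Distance from each observed point to its nearest reference point
nn_dist <- apply(obs_sc, 1, function(p) {
  min(sqrt(rowSums((ref_sc - matrix(p, nrow(ref_sc), 2, byrow = TRUE))^2)))
})

# Index of the most isolated observed point
which.max(nn_dist)
```

Observed points with a large nearest-neighbour distance lie outside the region covered by the reference set, so predictions for them are extrapolations.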

The predictions obtained are now compared against those produced by projecting the full spectra with ICA/JADE. That work was carried out by Miss Prendes Gero at http://innova.uned.es, so we take her prediction datasets and compare against them directly.

# Load the ICA/JADE prediction results for comparison
load("~/git/M_sel/belen_resul_T.RData")
idx <- unlist(lapply(bq_clean, function(x) {
    return(x$name)
})) %in% rownames(dd)
plot(dd[, 2], predf$ENS_greedy[idx])
## Error: error in evaluating the argument 'y' in selecting a method for
## function 'plot': Error: object 'predf' not found
rownames(dd)[124]
## [1] "LP_799-3.7512.txt"
plot(dd[, 2], predf$ENS_greedy[idx], xlim = c(1500, 3500), ylim = c(1500, 3500))
## Error: error in evaluating the argument 'y' in selecting a method for
## function 'plot': Error: object 'predf' not found
lines(c(1500, 3500), c(1500, 3500), col = 2)
## Error: plot.new has not been called yet
hist((dd[, 3] - predf$ENS_greedy[idx])/sd(dd[, 3] - predf$ENS_greedy[idx]), 
    breaks = 20)
## Error: object 'predf' not found
mean(dd[, 3] - predf$ENS_greedy[idx])
## Error: error in evaluating the argument 'x' in selecting a method for
## function 'mean': Error: object 'predf' not found
sd(dd[, 3] - predf$ENS_greedy[idx])
## Error: object 'predf' not found
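
The comparison attempted above reduces to summarising the residuals between two prediction vectors. A minimal sketch on synthetic data follows; `reference` and `predicted` are hypothetical, with a value range taken from the plot limits used above.

```r
set.seed(3)
reference <- runif(50, 1500, 3500)             # hypothetical reference values
predicted <- reference + rnorm(50, sd = 100)   # hypothetical ensemble predictions

res <- reference - predicted
c(bias = mean(res), sd = sd(res), rmse = sqrt(mean(res^2)))

# Standardised residuals: the same residual in numerator and denominator
z <- res / sd(res)
```

A histogram of `z` (for example `hist(z, breaks = 20)`) then shows whether the disagreement between the two methods is roughly symmetric around zero.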