Used oil viscosity and
Total Acid Number (TAN) by FT-IR.
#Spectra Segmentation
- Introduction
- Data Segmentation

Used oil viscosity and

Total Acid Number (TAN) by FT-IR.

#Spectra Segmentation

Introduction

The starting point is a set of FT-IR spectra files already preprocessed as a whole in order to normalize them. The whole set was stored at BL2012_tmp.RData database.

The object name for normallized spectra was acusns and the one for the physial properties was resul. This last object includes the name of the espectra, its TAN according to ASTM D-664, its viscosity by 100ºC according to the ASTM D-445 and its viscosity by 40ºC.

Data Segmentation

The first step is to get data properly loaded into the system.

suppressPackageStartupMessages(library(googleVis))
suppressPackageStartupMessages(library(SVGAnnotation))
suppressPackageStartupMessages(library(JADE))
suppressPackageStartupMessages(library(cvTools))
suppressPackageStartupMessages(library(robustbase))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(pls))
suppressPackageStartupMessages(library(e1071))
suppressPackageStartupMessages(library(xtable))
suppressPackageStartupMessages(library(graphics))
#
panel.hist <- function(x, ...) {
    usr <- par("usr")
    on.exit(par(usr))
    par(usr = c(usr[1:2], 0, 1.5))
    h <- hist(x, plot = FALSE)
    breaks <- h$breaks
    nB <- length(breaks)
    y <- h$counts
    y <- y/max(y)
    rect(breaks[-nB], 0, breaks[-1], y, col = "cyan", ...)
}
## put (absolute) correlations on the upper panels, with size proportional
## to the correlations.
panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor, ...) {
    usr <- par("usr")
    on.exit(par(usr))
    par(usr = c(0, 1, 0, 1))
    r <- abs(cor(x, y))
    txt <- format(c(r, 0.123456789), digits = digits)[1]
    txt <- paste(prefix, txt, sep = "")
    if (missing(cex.cor)) 
        cex.cor <- 0.8/strwidth(txt)
    text(0.5, 0.5, txt, cex = cex.cor * r)
}
#
if (!file_test("-f", "BL2012_tmp.RData")) {
    ## Error message coming ...
    stop("No valid database was found.")
} else {
    load(file = "BL2012_tmp.RData")
}
# ===========================================
data <- as.data.frame(t(acusns))
colnames(data) <- acus[, 1]
# Si se parte de la BL2012.Rdata se debe borrar porque son 2.1Gb
# 
# rm(datz,ddp,dp,rr_tot,rs,rs_tot,rsu,rt_tot,ru_tot,trans);
# rm(Line01,Mot,call,call0,call2,call3,call4,cvExpr,folds);
# rm(desp,fact,cn,form,formula,gas1,i,ica,icc,iddat,iddx,idx);
# rm(ip,itrn,namesf,ncc,ndica, ndx,numf,pca,ref,rfiles,sets,ss,sss);
# rm(str_tb_remov,xref,y,ymax,ymin,yref,ddat,vres);
# rm(amplifica,calcula_fac,calcula_off,cvMod,dataSubset,desplaza,doCV);
# rm(doCall,fact2,factu,rdos,getSubsetList);
vres <- resul[resul[, 1] %in% colnames(acusns), ]
data <- data[rownames(data) %in% vres[, 1], ]
# Now data are the spectra and vres stores physical properties
dd <- data[sort.list(rownames(data)), ]
data <- dd
dd <- vres[sort.list(vres[, 1]), ]
vres <- dd
rm(dd)

Then, it is time to produce different features from the spectra’s data. The oil spectrum has 3451 points sampled by 1 \(cm^{-1}\). There are in total 145 patterns (spectra samples).

Next action will be to define potential spectra based features to be connected to relevant physical features. In order to accomplish this task, literature review was used to show up the relevant parts of the spectrum already highlighted before.

Thus the following table summarizes the identified regions

w_lengths <- data.frame(From = c(3600, 3540, 3050, 1800, 1750, 1630, 1300, 800), 
    To = c(3200, 3380, 2750, 1500, 1550, 1540, 600, 650))
table01 <- xtable(w_lengths, caption = "Wavelengths of interest in $cm^{-1}$")
print(table01, type = "html")

Wavelengths of interest in \(cm^{-1}\)
	From	To
1	3600.00	3200.00
2	3540.00	3380.00
3	3050.00	2750.00
4	1800.00	1500.00
5	1750.00	1550.00
6	1630.00	1540.00
7	1300.00	600.00
8	800.00	650.00

Now, after defining the spectra windows, we face its segmentation.

segmentar <- function(x, datos) {
    org <- min(which(as.numeric(colnames(datos)) <= x[1]))
    dst <- max(which(as.numeric(colnames(datos)) >= x[2]))
    if (dst > org) {
        return(datos[, org:dst])
    }
    return(NA)
}
#
brutef <- apply(w_lengths, 1, segmentar, data)

After producing the segmentation, some features are identified as with potential interest, * Area - averaged bordered rectangular area * mean * median * Standard deviation * sd/mean

features <- function(datos) {
    ini <- 1
    lst <- dim(datos)[2]
    area <- lst * (datos[1] + datos[lst])/2
    f1 <- apply(datos, 1, sum) - area
    f2 <- apply(datos, 1, mean)
    f3 <- apply(datos, 1, median)
    f4 <- apply(datos, 1, sd)
    f5 <- f4/f2
    final <- cbind(f1, f2, f3, f4, f5)
    colnames(final) <- c("Area", "Mean", "Median", "SD", "CV")
    return(final)
}
#
feat <- lapply(brutef, features)

After producing the set of features, it is time to check their linear correlation against the physical parameters.

for (j in 1:dim(w_lengths)[1]) {
    pairs(cbind(feat[[j]], vres[, 2:4]), lower.panel = panel.smooth, upper.panel = panel.cor, 
        diag.panel = panel.hist, main = paste("Wavelengths:", w_lengths[j, 1], 
            " - ", w_lengths[j, 2], sep = ""))
}

plot of chunk pairs