This tutorial was created under the supervision of Dr. José Alexandre Melo Dematte as part of the remote sensing for soil assessment discipline.
The AlradSpectra software was developed to perform preprocessing, multivariate modeling and prediction using spectral data.
Developed by researchers at:
Federal University of Santa Catarina and Federal University of Santa Maria, Brazil
Authors and contributors:
Andre Dotto; Luiz Ruiz; Ricardo Dalmolin; Alexandre ten Caten; Diego Gris.
How to cite:
DOTTO, A.C.; DALMOLIN, R.S.D.; TEN CATEN, A.; GRIS, D.J.; RUIZ, L.F.C. Alrad Spectra:
a graphical user interface in R to perform preprocessing, multivariate modeling and
prediction using spectroscopic data. Submitted.
You can also find the Alrad Spectra source code at: GitHub AlradSpectra.
If you do not have R:
R is available for both 32-bit and 64-bit architectures. On Windows, a single installer covers both; download it from the following link:
Download R 3.4.2 for Windows (75 megabytes, 32/64 bit)
https://cran.r-project.org/bin/windows/base/R-3.4.2-win.exe
If you want to use R through a more user-friendly interface, we recommend installing RStudio:
https://www.rstudio.com/products/rstudio/download/
Figure 1. R terminal and RStudio interface.
R vs. RStudio terminal.
First step: install the devtools package. This is needed only the first time.
install.packages("devtools")
Now, install Alrad Spectra from GitHub.
devtools::install_github("AlradSpectra/AlradSpectra")
Load and initialize AlradSpectra (Figure 2).
AlradSpectra::AlradSpectra()
AlradSpectra interface.
To run an AlradSpectra example you will need to download the following data files:
Now, let’s read our first data file. Click the Browse button and locate the file on your computer, e.g. H:/Banco_VIS_NIR_MIR.csv (Figure 3).
Example of browsing a file path.
After selecting the file, fill in the parameter fields according to its format. For our example, enter the following values: Separator (;), Decimal separator (.), Header (TRUE), Spectral data starts at column (19), Spectral data ends at column (2169), Spectrum starts at wavelength (350), Spectrum ends at wavelength (2500), Y variable is at column (8) and Y variable name (Argila (%) or Clay (%)).
Then, click the Import data button (Figure 4) and check the file by clicking the View data button (Figure 5).
Importing data from file path.
Viewing data.
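The import parameters above map onto an ordinary delimited-file read. The following is a minimal sketch in base R, not the code AlradSpectra runs internally: the miniature file, its column names and the column positions are hypothetical stand-ins for the example data set. The last lines check that the tutorial's values are mutually consistent, assuming the spectra are sampled at a step of 1 between 350 and 2500.

```r
# Hypothetical miniature of the example file: 2 metadata columns followed by
# 5 spectral columns (the real file has spectra in columns 19 to 2169).
csv_text <- "id;clay;X350;X351;X352;X353;X354
1;32.5;0.101;0.103;0.106;0.108;0.111
2;18.0;0.152;0.155;0.157;0.160;0.163
3;45.2;0.087;0.089;0.090;0.093;0.095"

# Same kind of settings as the import form: separator, decimal mark, header
dat <- read.table(text = csv_text, sep = ";", dec = ".", header = TRUE)

first_col <- 3; last_col <- 7     # spectral data starts/ends at column
first_wl <- 350; last_wl <- 354   # spectrum starts/ends at wavelength
y_col <- 2                        # Y variable is at column

spectra <- as.matrix(dat[, first_col:last_col])
wavelengths <- seq(first_wl, last_wl, length.out = ncol(spectra))
colnames(spectra) <- wavelengths
y <- dat[[y_col]]

# Consistency check for the tutorial's values (assumes a step of 1):
# columns 19..2169 and wavelengths 350..2500 should give the same count.
n_cols <- 2169 - 19 + 1
n_wl <- length(seq(350, 2500, by = 1))
```

If the number of spectral columns and the number of wavelengths disagree, the column range or the wavelength range entered in the form is probably wrong.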
To view the imported spectra, click the following button (Figure 6):
Viewing imported spectra.
To view the descriptive statistics of Y, click the following button (Figure 7):
Viewing Y descriptive statistics.
Finally, to view the histogram of Y, click its button (Figure 8).
Viewing Y histogram.
After importing and reading the VIS_NIR_MIR file, start the spectral preprocessing by clicking the Spectral Preprocessing option at the top of the page, as shown in Figure 9.
Spectral processing.
At this stage you can choose the preprocessing method most suitable for your analysis, such as:
Smoothing: A simple moving average of spectral data using a convolution function. Package: prospectr.
Binning: Compute average values of a signal in pre-determined bins. Package: prospectr.
Absorbance: Transforms reflectance to absorbance values (log10(1/R)).
Detrend: Normalizes each row by applying a Standard Normal Variate transformation followed by fitting a second order linear model and returning the fitted residuals. Package: prospectr.
Continuum Removal: The continuum removal technique was introduced by Clark and Roush (1984). The algorithm finds the points lying on the convex hull of a spectrum, connects them by linear interpolation and normalizes the spectrum by dividing the input data by the interpolated line. Package: prospectr. Data type: Reflectance; Interpolation method: Linear; Normalization method: Division.
Savitzky-Golay Derivate - SGD: The Savitzky-Golay algorithm fits a local polynomial regression on the signal. It requires evenly spaced data points. Package: prospectr.
Standard Normal Variate - SNV: Normalizes each row by subtracting its mean and dividing by its standard deviation. Package: prospectr.
Multiplicative Scatter/Signal Correction - MSC: Performs multiplicative scatter/signal correction on spectral data. Package: pls.
Normalization: Different types of data normalization. Package: clusterSim.
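To make some of these transformations concrete, here is a minimal base R sketch of absorbance, smoothing, binning, SNV and detrend on a small synthetic reflectance matrix. AlradSpectra relies on the packages named above; the helper functions below (`moving_average`, `bin_average`, `snv`, `detrend_rows`) are hypothetical stand-ins written for illustration, and their edge handling may differ from the package implementations.

```r
# Synthetic reflectance matrix: 3 samples (rows) x 11 bands (columns)
set.seed(1)
wav <- seq(350, 360)
refl <- matrix(runif(33, min = 0.1, max = 0.6), nrow = 3,
               dimnames = list(NULL, wav))

# Absorbance: log10(1/R)
absorbance <- log10(1 / refl)

# Smoothing: centred moving average with an odd window w; edge bands dropped
moving_average <- function(X, w = 3) {
  t(apply(X, 1, function(r) {
    s <- stats::filter(r, rep(1 / w, w), sides = 2)
    as.numeric(s[!is.na(s)])
  }))
}
smoothed <- moving_average(refl, w = 3)

# Binning: average over consecutive bins of bin_size bands
bin_average <- function(X, bin_size = 2) {
  groups <- ceiling(seq_len(ncol(X)) / bin_size)
  t(apply(X, 1, function(r) tapply(r, groups, mean)))
}
binned <- bin_average(refl, bin_size = 2)

# SNV: centre each row on its mean and scale by its standard deviation
snv <- function(X) (X - rowMeans(X)) / apply(X, 1, sd)
snv_spectra <- snv(refl)

# Detrend: SNV, then residuals of a second-order polynomial in wavelength
detrend_rows <- function(X, wav) {
  t(apply(snv(X), 1, function(r) residuals(lm(r ~ poly(wav, 2)))))
}
detrended <- detrend_rows(refl, wav)
```

Smoothing and binning both reduce the number of bands (here from 11 to 9 and 6, respectively), which matters later because prediction spectra must have the same length as the spectra used for modeling.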
After choosing the most suitable spectral preprocessing, first run the function (Run button); then you can view the spectra plot (View spectra) and, if you want, save the preprocessed spectra, as shown previously.
Figure 10 shows examples of spectra plots from Smoothing, Absorbance, Continuum Removal and SGD, respectively.
Spectra from different processing.
The next step is to start the modeling process. First, select the input data for modeling by clicking the Modeling tab at the top of the page, as shown below. In our example you can choose Original, Absorbance, Continuum Removal or Savitzky-Golay Derivate - SGD (Figure 11).
Modeling data.
As an example, let’s select the Original data for modeling and set the validation set size to 30% of the samples. Then, you can Split the data (A) to see the number of training samples and the number of validation samples. You can also perform Levene’s Test for Homogeneity of Variance (B). In addition, you can view descriptive statistics of the groups (C) and view box plots (D), as shown in Figures 12 and 13.
Sequence of analysis.
Results from the data processing.
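The split and the homogeneity check can be sketched in base R as follows. This is an illustration under stated assumptions, not AlradSpectra's own code: the source does not say how samples are assigned to each set, so a simple random draw is assumed here, the clay values are synthetic, and Levene's test is written in its median-centred form as a one-way ANOVA on absolute deviations from the group medians.

```r
set.seed(42)
# Synthetic clay contents (%) for 100 samples; values are illustrative only
clay <- round(runif(100, min = 5, max = 70), 1)

# Random split with 30% of the samples held out for validation (assumption)
n <- length(clay)
val_idx <- sample(n, size = round(0.30 * n))
group <- factor(ifelse(seq_len(n) %in% val_idx, "validation", "training"))
group_sizes <- table(group)

# Levene's test (median-centred variant): one-way ANOVA on the absolute
# deviations of each value from its group median
abs_dev <- abs(clay - ave(clay, group, FUN = median))
levene <- anova(lm(abs_dev ~ group))
levene_p <- levene[["Pr(>F)"]][1]

# Descriptive statistics per group
group_means <- tapply(clay, group, mean)
group_sds <- tapply(clay, group, sd)
```

A large p-value in `levene_p` gives no evidence that the training and validation sets differ in variance. Calling `boxplot(clay ~ group)` would draw the corresponding box plots.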
Once you have gone through these steps, you can choose the statistical model most suitable for your data set. The following statistical models are available:
Multiple Linear Regression - MLR: is a statistical method that uses several explanatory variables to predict the outcome of a response variable in a simple linear model. Package: caret.
Partial Least Squares Regression - PLSR: is considered the most common regression method applied in chemometrics and can deal with complex modeling problems. Packages: pls/caret.
Support Vector Machine - SVM: SVM models are efficient in modeling linear or nonlinear relationships and handling large databases. Packages: e1071/caret.
Random Forest - RF: RF models are a black-box approach and are very hard to interpret. Packages: randomForest/caret.
Artificial Neural Network - ANN: calculates the output from the hidden layer based on the activation function. Packages: elmNN/caret.
Gaussian Process Regression: Gaussian process applies a kernel function for training and predicting. Packages: kernlab/caret.
After choosing the statistical model, you can set some parameters to tune your model, as shown below (Figure 14).
Statistical model and tuning parameters.
The next step is to run your statistical model. In this example, let’s choose a multiple linear regression model. To perform the analysis, click the Run MLR model button, as shown in the previous image.
Then, you can view the variable importance (A), the MLR prediction statistics (B) and the measured vs. predicted results (C) (Figure 15).
Results from the statistical analysis.
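As a rough illustration of what fitting and validating an MLR model involves, the sketch below uses base R's `lm` on synthetic data. AlradSpectra fits its models through caret, so this is only an analogue. The data, the band names `b1` to `b4` and the `pred_stats` helper are hypothetical, and the statistics shown (R2, RMSE, RPD) are common choices in soil spectroscopy that may not match exactly what the interface reports.

```r
set.seed(7)
# Synthetic data: 60 samples, 4 predictor bands, clay as a noisy linear response
X <- matrix(rnorm(60 * 4), ncol = 4, dimnames = list(NULL, paste0("b", 1:4)))
clay <- 30 + X %*% c(4, -2, 1, 0.5) + rnorm(60, sd = 1)
dat <- data.frame(clay = as.numeric(clay), X)

# Hold out 30% of the samples for validation
val_idx <- sample(nrow(dat), size = round(0.30 * nrow(dat)))
train <- dat[-val_idx, ]
valid <- dat[val_idx, ]

fit <- lm(clay ~ ., data = train)       # multiple linear regression
pred <- predict(fit, newdata = valid)   # predictions for the validation set

# Common validation statistics (hypothetical helper)
pred_stats <- function(obs, pred) {
  rmse <- sqrt(mean((obs - pred)^2))
  c(R2 = cor(obs, pred)^2,   # squared correlation, measured vs. predicted
    RMSE = rmse,             # root mean squared error
    RPD = sd(obs) / rmse)    # ratio of performance to deviation
}
stats_val <- pred_stats(valid$clay, pred)
```

Plotting `valid$clay` against `pred` gives the measured vs. predicted view; points close to the 1:1 line indicate good agreement.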
The final step is Prediction, for which you have to import spectral data. The file must meet the following conditions:
File must contain only spectral data;
Spectral data for Prediction and Modeling must be the same length;
Spectral data used here must have the same preprocessing used to build the model;
The file must be in .csv or .txt format.
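The first two conditions above can be checked before importing. The sketch below is a hypothetical helper in base R, not part of AlradSpectra; the miniature file and the function name `check_prediction_spectra` are assumptions made for illustration.

```r
# Hypothetical check of a prediction file against the modeling spectra:
# returns a character vector of problems (empty if none are found)
check_prediction_spectra <- function(new_spectra, n_model_bands) {
  new_spectra <- as.matrix(new_spectra)
  problems <- character(0)
  if (!is.numeric(new_spectra)) {
    problems <- c(problems, "file contains non-numeric (non-spectral) columns")
  }
  if (ncol(new_spectra) != n_model_bands) {
    problems <- c(problems, sprintf("expected %d bands, found %d",
                                    n_model_bands, ncol(new_spectra)))
  }
  problems
}

# Miniature prediction file with spectral columns only
csv_text <- "X350;X351;X352
0.101;0.103;0.106
0.152;0.155;0.157"
new_spectra <- read.table(text = csv_text, sep = ";", dec = ".", header = TRUE)

ok <- check_prediction_spectra(new_spectra, n_model_bands = 3)
bad <- check_prediction_spectra(new_spectra, n_model_bands = 2151)
```

Here `ok` is empty because the file matches a three-band model, while `bad` reports the band-count mismatch against the 2151 bands of the tutorial's original spectra. The check cannot tell whether the same preprocessing was applied; that remains the user's responsibility.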
Before predicting, browse to the file and enter the following parameters: Separator (leave blank for tab) “;”, Decimal separator “.”, Header “TRUE”, Spectrum starts at wavelength “350” and Spectrum ends at wavelength “2500” (Figure 16).
Importing data for prediction.
The next steps consist of:
Import the data; select the model for prediction (in our example, Multiple Linear Regression - MLR); and finally, view the clay predictions (Figure 17).
Clay prediction
After that, you can save the predictions and then start a new project or close the AlradSpectra interface.
If you have any questions, do not hesitate to contact us: