The modeling is based on a dataset that contains 47 variables and hundreds of observations for each carbon brush of each engine. Each observation represents the parameters measured or calculated from the beginning of a carbon brush’s service until the date the brush was inspected by staff (the 0/1 variable “isReplaced” indicates whether the carbon brush was replaced during the revision).

All of these variables were included in the dataset as requested and are based on measurements taken every minute.

Now I need to reduce the number of variables as much as possible. At the first stage, I test the information gain of each variable against the “isReplaced” parameter for each engine separately; in effect, this divides the huge dataset into smaller datasets, one per engine. Variables with zero information gain will be removed.
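A minimal sketch of this step, assuming the full data sits in a data frame `df` with an `engine` id column (the column names here are illustrative) and using the FSelector package:

```r
library(FSelector)

# Target as factor so information gain treats it as a class label
df$isReplaced <- as.factor(df$isReplaced)

# Split the full dataset into one data frame per engine
engine_dfs <- split(df, df$engine)

# Information gain of every predictor against isReplaced, per engine
ig_per_engine <- lapply(engine_dfs, function(d) {
  information.gain(isReplaced ~ ., d[, setdiff(names(d), "engine")])
})

# Variables with zero information gain, per engine, are candidates for removal
zero_ig <- lapply(ig_per_engine, function(ig)
  rownames(ig)[ig$attr_importance == 0])
```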

In the table below, we can see the information gain of each variable with respect to “isReplaced”, for each engine.

According to the table, the most prominent information gain comes from the “C” variable, and this holds for all engines. “C” gives the horizontal position of the carbon brush, which means that each horizontal position has a significantly different number of carbon brush replacements over the period covered by the dataset. Just out of curiosity, let’s find the percentage of replaced brushes for each “C” position.
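A quick way to compute this, assuming “isReplaced” is coded 0/1 (a sketch):

```r
# Share of replaced brushes per horizontal position
replaced_by_C <- aggregate(isReplaced ~ C, data = df, FUN = mean)
replaced_by_C$percentReplaced <- round(100 * replaced_by_C$isReplaced, 1)
replaced_by_C
```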

As expected, the carbon brush lifespan depends significantly on its horizontal position.

Cement Mill 1

In the next step, I’ll separate the data relevant to the Cement Mill 1 engine and remove the variables that bring zero information gain.
Let’s see the correlation table of the 26 variables left in the dataset.
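One way to build and visualize it, assuming the Cement Mill 1 subset with the 26 remaining numeric variables is in `cm1` (a sketch using the corrplot package):

```r
library(corrplot)

corr_matrix <- cor(cm1, use = "pairwise.complete.obs")

# Clustering the columns makes the correlated groups visually apparent
corrplot(corr_matrix, method = "color", type = "upper",
         order = "hclust", tl.cex = 0.6)
```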

According to the correlation table, besides the relatively independent variables, the dataset contains three groups of variables with strong correlations within each group:
a) variables connected with the temperature of the coils;
b) variables connected with engine power;
c) variables connected with engine work hours.

I’ll try to replace each of these three groups with a single principal component.
The material-type variables represent the production time of each material while the mill is running, so they naturally correlate with the engine running time. These material-type variables will be omitted.

Temperature Of Coils

The group consists of the average coil temperature variables for the R, S, and T coils, with two sensors on each coil, giving six variables: R1, R2, S1, S2, T1, T2. There is also a correlation between the average coil temperature and the average temperature of the rear bearing. All seven of these variables will be represented by the principal component “tempCoilsAVG.PC”, which correlates strongly with every variable in the group.
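A minimal sketch of extracting this component with `prcomp`; the six sensor names come from the text, while the rear-bearing column name is an assumption:

```r
# Seven correlated temperature variables (rear-bearing name is illustrative)
temp_vars <- c("R1", "R2", "S1", "S2", "T1", "T2", "avgTempRearBearing")
pca_temp  <- prcomp(cm1[, temp_vars], center = TRUE, scale. = TRUE)

cm1$tempCoilsAVG.PC <- pca_temp$x[, 1]       # first principal component
cor(cm1$tempCoilsAVG.PC, cm1[, temp_vars])   # check: strong for all seven
```

The “powerAVG.PC” component for the engine-power group below is built the same way from its five variables.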


Engine Power

The group consists of four correlated variables: average current, average power, the standard deviation of current, and the standard deviation of power. There is also a correlation between the current-power variables and the standard deviation of the front bearing vibration. All five of these variables will be represented by the principal component “powerAVG.PC”, which correlates strongly with every variable in the group.


Engine Work Hours

The group consists of four correlated variables: work hours, the total duration of the start phase, the total duration of the “long start” phase, and the sum of unstable engine runs and runs with a long start phase (“longStart_unstableRun_Sum”). All of these variables naturally correlate with engine work hours, since they increase over time. Moreover, the three variables other than work hours are products of a classification algorithm, which makes them difficult to obtain in a real-time model implementation. For these reasons, I decided to omit the three derived variables and keep only the work-hours variable.

Finally, we have reduced the number of variables in the dataset from 47 to 9, with no strong correlations among them.

Modeling

All variables should be scaled to the range between 0 and 1, for example with the min-max transformation sketched below.
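A base-R sketch of the scaling (`cm1_reduced` is an illustrative name for the nine-variable dataset):

```r
# Column-wise min-max scaling to [0, 1]
range01    <- function(x) (x - min(x)) / (max(x) - min(x))
cm1_scaled <- as.data.frame(lapply(cm1_reduced, range01))
```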
Finally, the dataset for training the model looks like this:

## 'data.frame':    1565 obs. of  9 variables:
##  $ C                       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ R                       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ stdVibrationRearBearing : num  0.745 0.79 0.842 0.859 0.86 ...
##  $ avgVibrationFrontBearing: num  0.586 0.653 0.733 0.762 0.764 ...
##  $ workHours               : num  0.0872 0.1158 0.1524 0.1774 0.2023 ...
##  $ avgPM10stack            : num  0 0.0269 0.0585 0.0756 0.0916 ...
##  $ stdTempFrontBearing     : num  0.0864 0.1015 0.1694 0.1919 0.2009 ...
##  $ tempCoilsAVG.PC         : num  0.456 0.528 0.588 0.622 0.649 ...
##  $ powerAVG.PC             : num  0.834 0.837 0.844 0.848 0.851 ...

In order to compare the different models, I chose to evaluate their accuracy and sensitivity; the ROC curves will also be compared. Due to the relatively small number of observations, I’ll use cross-validation.
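One possible way to set this up is with the caret package (a sketch only - the original tooling is not specified, and the ten-fold setting is an assumption):

```r
library(caret)

# caret needs factor levels that are valid R names; the positive class goes first
y <- factor(ifelse(cm1_scaled$isReplaced == 1, "replaced", "kept"),
            levels = c("replaced", "kept"))

ctrl <- trainControl(method = "cv", number = 10,
                     classProbs = TRUE,
                     summaryFunction = twoClassSummary)   # ROC, Sens, Spec

# Example for one model; "rpart", "nb", "glm", "knn", "svmRadial", "nnet", ...
# plug into the same control object
rf_cv <- train(x = cm1_scaled[, names(cm1_scaled) != "isReplaced"],
               y = y, method = "rf", metric = "ROC", trControl = ctrl)
```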



No Model

Let’s define a kind of baseline: carbon brushes were replaced in 8.75% of cases. I’ll take a random 8.75% of the observations, label them “1” (“isReplaced”), and then calculate the sensitivity and accuracy of this random choice.
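The baseline itself, as a sketch (the seed is illustrative; `y01` stands for the 0/1 target vector):

```r
set.seed(1)
n        <- length(y01)
no_model <- integer(n)                       # all zeros ("not replaced")
no_model[sample(n, round(0.0875 * n))] <- 1  # random 8.75% flagged as replaced

table(actual = y01, NoModel = no_model)
```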
Confusion matrix of random choice (“NoModel”):

##           NoModel
##            predicted 0 predicted 1
##   actual 0        1304         124
##   actual 1         118          19

Decision Tree model:

A probability cutoff of 0.06 gives the best combination of accuracy and sensitivity for this model. Predictions with probability 0.06 or more will be converted to “1”, and predictions below 0.06 to “0”.
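The thresholding step, assuming `dt_prob` holds the cross-validated probabilities P(isReplaced = 1) from the decision tree (a sketch; the same cutoff logic is reused for every model below, only the threshold value changes):

```r
dt_prediction <- ifelse(dt_prob >= 0.06, 1, 0)   # cutoff chosen above
table(actual = y01, dt_prediction)
```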
Confusion matrix of the Decision Tree model:

##           dt_prediction
##            predicted 0 predicted 1
##   actual 0         956         472
##   actual 1          37         100

Random Forest model:

A probability cutoff of 0.04 gives the best combination of accuracy and sensitivity for this model. Predictions with probability 0.04 or more will be converted to “1”, and predictions below 0.04 to “0”.
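A sketch of fitting the forest and extracting out-of-bag probabilities with the randomForest package (the tree count matches the 1,500 trees mentioned later):

```r
library(randomForest)

rf_fit  <- randomForest(factor(isReplaced) ~ ., data = cm1_scaled, ntree = 1500)
rf_prob <- predict(rf_fit, type = "prob")[, "1"]   # out-of-bag P(replaced)
RF_prediction <- ifelse(rf_prob >= 0.04, 1, 0)
```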
Confusion matrix of the Random Forest model:

##           RF_prediction
##            predicted 0 predicted 1
##   actual 0         993         435
##   actual 1          26         111

Naive Bayes model:

A probability cutoff of 0.08 gives the best combination of accuracy and sensitivity for this model. Predictions with probability 0.08 or more will be converted to “1”, and predictions below 0.08 to “0”.
Confusion matrix of Naive Bayes model:

##           nb_prediction
##            predicted 0 predicted 1
##   actual 0         925         503
##   actual 1          43          94

KNN model:

A probability cutoff of 0.9 gives the best combination of accuracy and sensitivity for this model. Predictions with probability 0.9 or more will be converted to “1”, and predictions below 0.9 to “0”.
Confusion matrix of KNN model:

##           knn_prediction
##            predicted 0 predicted 1
##   actual 0         410        1018
##   actual 1          51          86

Logistic Regression model:

A probability cutoff of 0.08 gives the best combination of accuracy and sensitivity for this model. Predictions with probability 0.08 or more will be converted to “1”, and predictions below 0.08 to “0”.
Confusion matrix of Logistic Regression model:

##           glm_pred
##            predicted 0 predicted 1
##   actual 0         902         526
##   actual 1          45          92

Mixture Discriminant Analysis model:

A probability cutoff of 0.06 gives the best combination of accuracy and sensitivity for this model. Predictions with probability 0.06 or more will be converted to “1”, and predictions below 0.06 to “0”.
Confusion matrix of Mixture Discriminant Analysis model:

##           mda_pred
##            predicted 0 predicted 1
##   actual 0         841         587
##   actual 1          41          96

Support Vector Machines model:

A probability cutoff of 0.02 gives the best combination of accuracy and sensitivity for this model. Predictions with probability 0.02 or more will be converted to “1”, and predictions below 0.02 to “0”.
Confusion matrix of SVM:

##           svm_prediction
##            predicted 0 predicted 1
##   actual 0         310        1118
##   actual 1           6         131

Artificial Neural Network model:

A probability cutoff of 0.08 gives the best combination of accuracy and sensitivity for this model. Predictions with probability 0.08 or more will be converted to “1”, and predictions below 0.08 to “0”.
Confusion matrix of Artificial Neural Network model:

##           ann_pred
##            predicted 0 predicted 1
##   actual 0         935         493
##   actual 1          40          97

Comparison of the applied models:

Sensitivity and accuracy:

##                                 model sensitivity  accuracy
## 1                  Random Forest (RF)   0.8102190 0.7054313
## 2                  Decision Tree (DT)   0.7299270 0.6747604
## 3     Artificial Neural Network (ANN)   0.7080292 0.6594249
## 4                    Naive Bayes (NB)   0.6861314 0.6511182
## 5           Logistic Regression (GLM)   0.6715328 0.6351438
## 6 Mixture Discriminant Analysis (MDA)   0.7007299 0.5987220
## 7       Support Vector Machines (SVM)   0.9562044 0.2817891
## 8            K-Nearest Neighbor (KNN)   0.6277372 0.3169329
## 9                            No Model   0.1386861 0.8453674

It looks like the best result is provided by the Random Forest model (RF): about 81% of the replaced carbon brushes are predicted correctly (sensitivity), with an overall accuracy above 70%.

For a final decision, let’s compare the ROC (Receiver Operating Characteristic) curves of the models.

ROC:
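A sketch of the comparison with the pROC package, reusing the per-model probability vectors (only two curves shown; the rest are added the same way):

```r
library(pROC)

roc_rf <- roc(y01, rf_prob)
roc_dt <- roc(y01, dt_prob)

plot(roc_rf, col = "darkgreen", legacy.axes = TRUE)
lines(roc_dt, col = "steelblue")
legend("bottomright",
       legend = c(paste("RF AUC =", round(auc(roc_rf), 3)),
                  paste("DT AUC =", round(auc(roc_dt), 3))),
       col = c("darkgreen", "steelblue"), lwd = 2)
```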

Clearly, according to the chosen evaluation criteria - sensitivity, accuracy, and the ROC curve - the algorithm that provides the best solution is Random Forest.

To illustrate the work of the Random Forest model, let’s try to predict the replacement of carbon brushes for one of the revision dates - say, December 13, 2011.

For that purpose, I’ll prepare the data in exactly the same way as in the previous stage and move the data connected to the revision of December 13, 2011 into a separate dataset.
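The split itself, as a sketch (the `revisionDate` column name is an assumption):

```r
test_set  <- subset(cm1_all, revisionDate == as.Date("2011-12-13"))
train_set <- subset(cm1_all, revisionDate != as.Date("2011-12-13"))
```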

Finally, the dataset for training the model looks like this:

## 'data.frame':    1536 obs. of  9 variables:
##  $ C                       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ R                       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ stdVibrationRearBearing : num  0.766 0.807 0.854 0.871 0.872 ...
##  $ avgVibrationFrontBearing: num  0.586 0.653 0.733 0.762 0.764 ...
##  $ workHours               : num  0.0702 0.0932 0.1226 0.1427 0.1628 ...
##  $ avgPM10stack            : num  0 0.0269 0.0585 0.0756 0.0916 ...
##  $ stdTempFrontBearing     : num  0.0864 0.1015 0.1694 0.1919 0.2009 ...
##  $ tempCoilsAVG.PC         : num  0.474 0.543 0.601 0.634 0.661 ...
##  $ powerAVG.PC             : num  0.835 0.838 0.845 0.849 0.852 ...

Random forest is an ensemble learning method that operates by constructing a multitude of decision trees at training time; in this case, I assemble 1,500 trees. For illustration and better understanding, let’s first build and draw a single Decision Tree model. (Note that the chart shows scaled values.)
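The illustrative single tree could be built and drawn like this (a sketch with the rpart and rpart.plot packages):

```r
library(rpart)
library(rpart.plot)

dt_fit <- rpart(factor(isReplaced) ~ ., data = train_set, method = "class")
rpart.plot(dt_fit, type = 2, extra = 104)   # note: splits show scaled values
```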

As expected, the most powerful classifier is the number of engine work hours.

Now let’s train the Random Forest model and try to predict which carbon brushes are likely to be deteriorated by December 13, 2011.

The predicted probability of carbon brush replacement (a percentage between 0 and 100) is given in the right table after the first double star. In the left table, the probability of replacement is encoded by the size of the symbol.
The results of the carbon brush revision are shown in the right table before the first double star (0 - the brush was not replaced, 1 - the brush was replaced).
The correspondence between the prediction and the revision result is shown in the left table: “V” - the prediction was correct, “X” - the prediction does not match the revision result; color encodes the same information (green - correct prediction, purple - wrong prediction).
The lifespan of the carbon brush is shown in the right table in brackets, after the second double star. According to the previous research, the mean lifespan of the Cement Mill 1 carbon brushes is 412 days. The lifespan in days is given for reference only - the models use work hours as the lifespan parameter.


Confusion matrix of the prediction:

##           randomForest.Prediction
##            predicted 0 predicted 1
##   actual 0          12           8
##   actual 1           2           8
## [1] "prediction sensitivity is 80%"
## [1] "prediction accuracy is 67%"

Before analyzing the prediction results, let’s remember that the main goal of this research is to provide an indication of carbon brush deterioration based on common external indicators. As was shown earlier, there are cases where local factors, such as the brush spring, affect brush deterioration significantly more strongly than common external factors like engine work hours, temperature, and vibration. In those cases, a “wrong” prediction could point to a problem with that particular carbon brush.
For instance, take the neighboring carbon brushes C1R03 and C1R04. Both had been in use for 358 days. In both cases the prediction was to replace the brush; in fact, C1R03 was replaced (a “correct” prediction), while C1R04 stayed in use for 790 days - more than a year and two months longer than its neighbor. Evidently, the “wrong” prediction points to some problem with C1R04.
Exactly the same applies to the neighboring brushes C2R09 and C2R10. Both had been in use for 677 days on the day of prediction - significantly longer than average. In both cases the prediction was to replace the brush; in fact, C2R09 was replaced (a “correct” prediction), while C2R10 stayed in use more than seven months longer than its neighbor. Evidently, the “wrong” prediction points to some problem with C2R10.
C3R02, C3R03, and C3R09 had all been in use significantly longer than average, and all were predicted for replacement. Evidently, these “wrong” predictions point to problems with those carbon brushes.
Let’s also check the case of brushes C1R08 and C1R09. Both had been in use for 274 days on the day of prediction. In both cases the prediction was to replace the brush; in fact, C1R08 was replaced (a “correct” prediction), while C1R09 stayed in use for another 107 days.

In fact, only the cases C1R01, C1R02, C1R05, and C2R04 can be considered truly “wrong” predictions. Hopefully, further training of the model will reduce the number of wrong predictions.