The purpose of the project:

To identify the factors that affect the lifespan of engine carbon brushes and to find an algorithm that can predict when carbon brushes need to be replaced (predictive maintenance).

Main stages of the project:

  1. Data gathering;
  2. Exploratory Data Analysis;
  3. Modeling;

Additional information:

  1. From research to production (recommendations).
  2. Preliminary conclusions and facts.

___________________________________________

1. Data gathering:

 a) retrieve and organize the raw data;
 b) build datasets convenient for further work from the collected raw data;
 c) complement the datasets with data from the plant PI system.

  1. The provided raw CSV files contain carbon brush inspection data for each engine separately, starting from mid-2016. Earlier data, from 2004 to mid-2016, was obtained from an Access database and combined with the CSV data separately for each engine. If a carbon brush was replaced due to deterioration, the event is marked “1”, otherwise “0”.



  1. In this step, the raw data was transformed into datasets convenient for further work. A preliminary data analysis was carried out to identify and handle outliers before retrieving additional data from the plant PI system in the next step.
Raw data was collected, processed and analyzed separately for each engine:

Cement Mill 1 Engine
Cement Mill 2 Engine
Cement Mill 3 Engine
Cement Mill 4 Engine
Cement Mill 10 Engine
Cement Mill 11 Engine
Cement Mill 12 Engine

Raw Material Mill A Engine
Raw Material Mill B Engine
Raw Material Mill C Engine
  1. The following data had to be retrieved from PI and tested in addition to the carbon brush inspection data:
    1. abnormally long starting phases of the engine work period (based on the applied current);

      To find “abnormal” starts, a method based on the Nelson rules for detecting “out-of-control” conditions was proposed. For this purpose, all 43961 engine starts of the 10 mills from 2010 to 2018 were analyzed.

    2. types of engine work periods and starting phases;

      While trying to cluster the start phases by quality, I concluded that it is important to cluster whole working periods, because the start phase is part of the working period and the differences between start phases are determined by the differences between work periods. It is therefore better to cluster the whole work period, using the start phase as one of the classifiers. CM1, CM2, CM3, CM4, CM10, CM11, CM12, RMA, RMB, RMC.

    3. the number of starts and the duration of engine work periods over the carbon brush lifespan;
    4. the temperature of the engine coils during operation;
    5. the average current during operation;
    6. engine power;
    7. engine vibration;
    8. the temperature of the engine bearings (front and rear);
    9. dust (particle) concentration in the mill stack;
    10. which cement types the mills produced and for how long each type was produced.
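
The Nelson-rule check mentioned for item 1 can be sketched as follows. This is a minimal Python illustration of only two of the Nelson rules (rule 1 and rule 2), not the method actually used in the project. The function name, the sample durations, and the assumption that the center line and sigma come from a separate baseline period are all hypothetical.

```python
def nelson_flags(values, center, sigma, run_length=9):
    """Flag points that violate Nelson rule 1 or rule 2.

    Rule 1: a point lies more than 3 sigma away from the center line.
    Rule 2: `run_length` or more consecutive points lie on the same
            side of the center line.
    `center` and `sigma` are assumed to come from a baseline period.
    """
    flags = [False] * len(values)
    run_start, run_side = 0, 0
    for i, x in enumerate(values):
        # Rule 1: single point outside the 3-sigma control limits.
        if abs(x - center) > 3 * sigma:
            flags[i] = True
        # Rule 2: track the current run on one side of the center line.
        side = (x > center) - (x < center)  # +1 above, -1 below, 0 on the line
        if side != run_side:
            run_start, run_side = i, side
        if side != 0 and i - run_start + 1 >= run_length:
            for j in range(run_start, i + 1):
                flags[j] = True
    return flags


# Hypothetical start-phase durations in seconds (illustrative values only).
start_durations = [10.5, 9.5, 14.0, 9.8] + [10.4] * 9
abnormal = nelson_flags(start_durations, center=10.0, sigma=1.0)
```

Taking the limits from a baseline rather than from the series being tested keeps a single extreme start from inflating sigma and hiding itself.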

2. Exploratory Data Analysis:
Exploratory Data Analysis was performed for each mill separately and consists of three main stages:

  1. Reducing the number of variables (using principal component analysis);
  2. Finding cross-correlations between all dataset variables;
  3. In-depth study of pairs of variables (so far, this step is only partially implemented).
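
As a rough illustration of the second stage, the sketch below computes the Pearson correlation for every pair of variables and keeps the strongly related pairs as candidates for the in-depth study. It assumes that "cross-correlation" here means pairwise correlation between columns. The column names and values are invented for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strongest_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for pairs with |r| >= threshold, strongest first."""
    pairs = []
    for (name_a, xs), (name_b, ys) in combinations(columns.items(), 2):
        r = pearson(xs, ys)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, r))
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


# Hypothetical columns; real ones would come from the per-mill datasets.
columns = {
    "avg_current": [100, 110, 120, 130, 140],
    "coil_temp": [60, 64, 69, 73, 78],
    "vibration": [2.0, 1.0, 2.5, 1.2, 2.1],
}
candidates = strongest_pairs(columns)
```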


3. Modeling:
So far, a prediction model for carbon brush replacement has been built for the Cement Mill 1 engine. Models for the remaining mills are planned.

The modeling is based on a dataset that contains 47 variables and hundreds of observations for each carbon brush of each engine. Each observation represents the parameters measured or calculated from the start of a carbon brush's use until the date of an inspection of that brush by staff (the 0/1 variable isReplased indicates whether the carbon brush was replaced during the inspection).
The variables total duration of the start phase, total duration of the “long start” phase, and the sum of unstable engine runs and runs with a long start phase (“longStart_unstableRun_Sum”) are produced by a classification algorithm (description here).

Finally, the number of variables was reduced from 47 to 9 by using the “information gain” criterion and by replacing sets of variables with their principal component.
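
To make the “information gain” criterion concrete, here is a small Python sketch that scores a numeric variable by the entropy reduction it gives for the 0/1 replacement label. Splitting each variable at its median is an assumption made for the example; the discretization used in the project is not described here. The column names and values are hypothetical.

```python
from collections import Counter
from math import log2
from statistics import median


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting a numeric variable at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # the split separates nothing
    n = len(labels)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - weighted


def rank_by_gain(columns, labels):
    """Rank variables by information gain at a median split (an assumption)."""
    gains = {name: information_gain(vals, labels, median(vals))
             for name, vals in columns.items()}
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical variables and replacement labels, for illustration only.
columns = {
    "starts_count": [3, 5, 12, 15, 4, 14],
    "dust": [0.2, 0.9, 0.3, 0.8, 0.7, 0.1],
}
labels = [0, 0, 1, 1, 0, 1]
ranking = rank_by_gain(columns, labels)
```

Variables at the top of such a ranking would be kept, and the low-gain ones dropped or merged.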

To compare different models, I chose to evaluate their accuracy and sensitivity. ROC curves are also compared. Because the number of observations is relatively small, I use cross-validation.
Seven algorithms were compared with a “red line” (noModel): a random decision on carbon brush replacement.

##                                 model sensitivity  accuracy
## 1                  Random Forest (RF)   0.8102190 0.7054313
## 2                  Decision Tree (DT)   0.7299270 0.6747604
## 3     Artificial Neural Network (ANN)   0.7080292 0.6594249
## 4                    Naive Bayes (NB)   0.6861314 0.6511182
## 5           Logistic Regression (GLM)   0.6715328 0.6351438
## 6 Mixture Discriminant Analysis (MDA)   0.7007299 0.5987220
## 7       Support Vector Machines (SVM)   0.9562044 0.2817891
## 8            K-Nearest Neighbor (KNN)   0.6277372 0.3169329
## 9                            No Model   0.1386861 0.8453674
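
The comparison above rests on cross-validated sensitivity and accuracy. The sketch below shows one way such a score could be computed in Python: k-fold splitting, pooling the out-of-fold predictions, and deriving both metrics from the pooled counts. The threshold "model" is only a stand-in to make the example runnable and is not one of the algorithms in the table. All names and data are hypothetical.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, fit, predict, k=5, seed=0):
    """Pool out-of-fold predictions and return (sensitivity, accuracy)."""
    tp = tn = fp = fn = 0
    for fold in k_fold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            pred = predict(model, X[i])
            if y[i] == 1 and pred == 1:
                tp += 1
            elif y[i] == 1:
                fn += 1
            elif pred == 1:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    accuracy = (tp + tn) / len(y)
    return sensitivity, accuracy


# Stand-in model: threshold halfway between the two class means.
def fit_threshold(xs, ys):
    pos = [x for x, label in zip(xs, ys) if label == 1]
    neg = [x for x, label in zip(xs, ys) if label == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


# Hypothetical single-feature data with a 0/1 replacement label.
X = [1, 2, 3, 4, 5, 11, 12, 13, 14, 15]
y = [0] * 5 + [1] * 5
scores = cross_validate(X, y, fit_threshold, predict_threshold, k=5)
```

Sensitivity is reported next to accuracy because replacements are the minority class: a model that rarely predicts a replacement can still reach high accuracy, as the No Model row shows.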

Once it was clear that the Decision Tree gave the best performance, an ensemble method for DT, Random Forest, was tried. In the end, the RF model was used for prediction.
To illustrate the Random Forest predictions, the results of the carbon brush inspection on December 13, 2011 were compared with the model's prediction. The confusion matrix of the prediction:

##           randomForest.Prediction
##            predicted 0 predicted 1
##   actual 0          12           8
##   actual 1           2           8
## [1] "prediction sensitivity is 80%"
## [1] "prediction accuracy is 67%"
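
Both percentages follow directly from the confusion matrix. Reading "actual 1" as a replaced brush, the counts are TP = 8, FN = 2, TN = 12 and FP = 8, so:

```latex
\begin{aligned}
\text{sensitivity} &= \frac{TP}{TP + FN} = \frac{8}{8 + 2} = 0.80 \\
\text{accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} = \frac{8 + 12}{30} \approx 0.67
\end{aligned}
```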

Comments about prediction are here.


4. From research to production (recommendations).


To implement the prediction model for a trial period with minimal investment, using the existing factory software infrastructure, I would propose the following scheme:

5. Preliminary conclusions and facts (for future consideration).

During the project, some intermediate conclusions were drawn and some facts were collected. These conclusions and facts need further research. I think it is useful to keep them in a separate section for future consideration.