1 Data to insight to decision


1.1 1.1 Business understanding

Here we take an example of a business that uses many machines to build their final products. As the supply chain is stopped every-time a machine break, the manager asked a consulting firm to build a predictive model that finds which machine is going to break next and why.




1.2 1.2 Analytical Solution

The solution will predict which machine will break next and propose some reasons about why these machines have different lifetimes. A survival analysis is a good choice as we can visualise the machine’s lifetime very easily.




1.3 1.3 Analytical Solution

The business has data on each machine for the last pasts years and they are able to provide some more information such as the team that used it and its provider. These information seems sufficient to start an analysis.




1.4 1.3 Analytics base table

# Load our dataset
maintenance = read.table('maintenance_data.csv',sep=';',header=TRUE)

# Set the variable's types
maintenance$lifetime <- as.numeric(maintenance$lifetime)
headmaintenance <- head(maintenance, n=20)
library(DT)
## Warning: package 'DT' was built under R version 3.5.2
datatable(headmaintenance, options = list(pageLength = 5))



2 2 Data exploration

2.1 2.1 Data quality report

As we summarise the data, we can see that the business have used 1000 machines. Machine have an average lifetime of 55 weeks, with some brand new machines and others that are running since almost two years. In our dataset almost 40 % of the machines have being broken in the past two years.

# Librairy
library(ggplot2)
library(tidyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggvis)
## Warning: package 'ggvis' was built under R version 3.5.2
## 
## Attaching package: 'ggvis'
## The following object is masked from 'package:ggplot2':
## 
##     resolution
library(corrplot)
## Warning: package 'corrplot' was built under R version 3.5.2
## corrplot 0.84 loaded
# display the summary
summary(maintenance)
##     lifetime        broken       pressureInd      moistureInd    
##  Min.   : 1.0   Min.   :0.000   Min.   : 33.48   Min.   : 58.55  
##  1st Qu.:34.0   1st Qu.:0.000   1st Qu.: 85.56   1st Qu.: 92.77  
##  Median :60.0   Median :0.000   Median : 97.22   Median : 99.43  
##  Mean   :55.2   Mean   :0.397   Mean   : 98.60   Mean   : 99.38  
##  3rd Qu.:80.0   3rd Qu.:1.000   3rd Qu.:112.25   3rd Qu.:106.12  
##  Max.   :93.0   Max.   :1.000   Max.   :173.28   Max.   :128.60  
##  temperatureInd      team          provider  
##  Min.   : 42.28   TeamA:336   Provider1:254  
##  1st Qu.: 87.68   TeamB:356   Provider2:266  
##  Median :100.59   TeamC:308   Provider3:242  
##  Mean   :100.63               Provider4:238  
##  3rd Qu.:113.66                              
##  Max.   :172.54



2.2 2.2 First visualisations

The first graph show the statistical distribution of a machine’s lifetime. We can see that on average machines break after 80 weeks, but they can break from within 60 to 100 weeks. The second and third graphs show that for team C or provider 3 their machines tend to break several weeks before the others. Below we present the correlation between all the others variables that aren’t factors.

## When do machine breack on average?
maintenance_broken <- maintenance %>% filter(broken == 1)
par(mfrow=c(1,3))
boxplot(lifetime~broken,data=maintenance_broken, main="Borken machines", xlab="", ylab="Lifetime",col="#357EC7")
boxplot(lifetime~team,data=maintenance_broken, main="Per team", xlab="", ylab="",col="#357EC7")
boxplot(lifetime~provider,data=maintenance_broken, main="Per provider", xlab="", ylab="",col="#357EC7")

# Correlation matrix 
cor(maintenance_cor <- maintenance %>% select(lifetime:temperatureInd))
##                    lifetime      broken   pressureInd  moistureInd
## lifetime       1.0000000000  0.70265621  0.0009427588  0.001196246
## broken         0.7026562129  1.00000000 -0.0289424521 -0.019520365
## pressureInd    0.0009427588 -0.02894245  1.0000000000  0.020542548
## moistureInd    0.0011962464 -0.01952037  0.0205425477  1.000000000
## temperatureInd 0.0017444443  0.01536360  0.0036412524 -0.009842375
##                temperatureInd
## lifetime          0.001744444
## broken            0.015363604
## pressureInd       0.003641252
## moistureInd      -0.009842375
## temperatureInd    1.000000000



2.2.1 3 Survival analysis

As we have seen above, all variables tell us something about when a machine will break. We decide to use them all into our survival analysis model. We create a model using the gaussian method and use directly the maintenance dataset presented above. You can find here the coefficients for all our variables and just below see the survival plot and the summary of our survival model.

library(survival)
## Warning: package 'survival' was built under R version 3.5.2
# Choose the dependant variables to be used in the survival regression model.
dependantvars = Surv(maintenance$lifetime, maintenance$broken)

# Create model (use the gaussian method)
survreg = survreg(dependantvars~pressureInd+moistureInd+temperatureInd+team+provider, dist="gaussian",data=maintenance)
print(survreg)
## Call:
## survreg(formula = dependantvars ~ pressureInd + moistureInd + 
##     temperatureInd + team + provider, data = maintenance, dist = "gaussian")
## 
## Coefficients:
##       (Intercept)       pressureInd       moistureInd    temperatureInd 
##      8.035095e+01     -7.139623e-04      6.012531e-03     -1.042198e-02 
##         teamTeamB         teamTeamC providerProvider2 providerProvider3 
##     -5.669818e-02     -6.217791e+00      1.249442e+01     -1.438417e+01 
## providerProvider4 
##      7.919537e+00 
## 
## Scale= 0.4755612 
## 
## Loglik(model)= -270.1   Loglik(intercept only)= -1557
##  Chisq= 2573.75 on 8 degrees of freedom, p= <2e-16 
## n= 1000


3 Survival Graphs


3.1 3.1. Survival Plot

library(GGally)
## Warning: package 'GGally' was built under R version 3.5.2
## 
## Attaching package: 'GGally'
## The following object is masked from 'package:dplyr':
## 
##     nasa
maintenance_graph <- survfit(Surv(lifetime,broken) ~ 1, data = maintenance)
ggsurv(maintenance_graph)
## Warning: Removed 1 rows containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_path).




3.2 3.2 Per Team

maintenance_graph2 <- survfit(Surv(lifetime,broken) ~ team, data = maintenance)
ggsurv(maintenance_graph2)



3.3 3.3 Per Providers

maintenance_graph3 <- survfit(Surv(lifetime,broken) ~ provider, data = maintenance)
ggsurv(maintenance_graph3)




4 3.4Summary

summary(survreg)
## 
## Call:
## survreg(formula = dependantvars ~ pressureInd + moistureInd + 
##     temperatureInd + team + provider, data = maintenance, dist = "gaussian")
##                       Value Std. Error       z      p
## (Intercept)        8.04e+01   2.94e-01  273.57 <2e-16
## pressureInd       -7.14e-04   1.22e-03   -0.59  0.557
## moistureInd        6.01e-03   2.40e-03    2.51  0.012
## temperatureInd    -1.04e-02   1.21e-03   -8.59 <2e-16
## teamTeamB         -5.67e-02   5.88e-02   -0.96  0.335
## teamTeamC         -6.22e+00   6.13e-02 -101.39 <2e-16
## providerProvider2  1.25e+01   6.67e-02  187.46 <2e-16
## providerProvider3 -1.44e+01   6.27e-02 -229.24 <2e-16
## providerProvider4  7.92e+00   7.06e-02  112.23 <2e-16
## Log(scale)        -7.43e-01   3.54e-02  -21.00 <2e-16
## 
## Scale= 0.476 
## 
## Gaussian distribution
## Loglik(model)= -270.1   Loglik(intercept only)= -1557
##  Chisq= 2573.75 on 8 degrees of freedom, p= 0 
## Number of Newton-Raphson Iterations: 12 
## n= 1000



4.1 4 Predictions and call to action

4.2 4.1 Predictions

Using our survival analysis, we can now predict which machine will break next and therefore prioritise the maintenance on these machines to avoid the supply chain to stop. Here we display the 20 machines that should be changed this month and we order them by their remaining lifetime.

# Predict
# p = percentile = 0,5 = expected median
Ebreak=predict(survreg, newdata=maintenance, type="quantile", p=.5)

# Make forecast
Forecast=data.frame(Ebreak)
Forecast$lifetime=maintenance$lifetime
Forecast$broken=maintenance$broken

# Computed Expected Remaining Lifetime (remainingLT)
Forecast$RemainingLT=Forecast$Ebreak-maintenance$lifetime

# Order the elements by Expected Remaining Lifetime
Forecast=Forecast[order(Forecast$RemainingLT),]

# Keep only those who are not broken yet
ActionsPriority=Forecast[Forecast$broken==0,]
ActionsPriorityDT <- head(ActionsPriority, n=20)
datatable(ActionsPriorityDT)

4.3 4.2 Call to Action

4.3.1 4.2.1 Automatization

The first action should be to change the machines with a remaining lifetime inferior to one week, and re-run our program every few days to change the next machine that will break. Here we have a data table that will be automatically emailed to the manager every week. Also, we attach the ID of the machine that has to be changed.

ActionsPriority$class <- cut(ActionsPriority$RemainingLT, c(-10,1,4,1000))
levels(ActionsPriority$class) <- c('Urgent', 'Medium', 'good')
summary(ActionsPriority)
##      Ebreak         lifetime         broken   RemainingLT     
##  Min.   :58.87   Min.   : 1.00   Min.   :0   Min.   :-0.1807  
##  1st Qu.:73.68   1st Qu.:21.00   1st Qu.:0   1st Qu.:21.8005  
##  Median :81.67   Median :40.00   Median :0   Median :39.4834  
##  Mean   :80.62   Mean   :40.11   Mean   :0   Mean   :40.5067  
##  3rd Qu.:87.84   3rd Qu.:57.50   3rd Qu.:0   3rd Qu.:58.2953  
##  Max.   :93.05   Max.   :90.00   Max.   :0   Max.   :91.2483  
##     class    
##  Urgent:  7  
##  Medium: 21  
##  good  :575  
##              
##              
## 

4.3.2 4.2.2 Management

The second action is to show these figures to team C and provider 3 and monitor the improvement. Now the management could focus on the enhancement of the machines’ lifetimes and see what are the best methods. Of course we should interview the workers as we could have new insights on how we could push the lifetime of each machine further.