Purpose

Build a model to predict if an incident will be resolved within the SLA time period.

Training set

The training set includes closed tickets from 2017 and 2018.

Data exploration

Visualize and explore the data to highlight its main characteristics (distribution, variation, and relationships) and to pinpoint data quality issues.
An incident represents an IT service task, and “ticket resolution time task” is the net time from ticket creation until resolution
(excluding weekends, pending time, after-hours time, etc.).

For modeling purposes, we use “total_resolution_time”, the gross time from ticket creation until resolution
(including weekends, pending time, after-hours time, etc.).
We do not use the net time because we do not know how it is calculated and, more importantly, the data used for that calculation is not available.

The “meets SLA” output is calculated using a 24-hour threshold.
In general this is a sound approach, because our purpose is to identify the underlying factors that determine the variability in the resolution time.
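
As a minimal sketch of how this label could be derived, assume each ticket carries a creation and a resolution timestamp. The names `opened_at` and `resolved_at` are illustrative and do not come from the data set, and treating exactly 24 hours as meeting the SLA is also an assumption.

```python
from datetime import datetime

SLA_THRESHOLD_HOURS = 24  # threshold stated in the text


def total_resolution_hours(opened_at: datetime, resolved_at: datetime) -> float:
    """Gross elapsed time in hours, including weekends and off-hours."""
    return (resolved_at - opened_at).total_seconds() / 3600.0


def met_sla(opened_at: datetime, resolved_at: datetime,
            threshold_hours: float = SLA_THRESHOLD_HOURS) -> bool:
    """True when the gross resolution time is within the threshold.

    Assumption: a ticket resolved in exactly 24 hours counts as meeting the SLA.
    """
    return total_resolution_hours(opened_at, resolved_at) <= threshold_hours
```

Because the gross time is used, a ticket opened late on a Friday and resolved on Monday morning counts as a miss even if very little working time was spent on it.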

Resolution time distribution

Plot #1 - Shows the complete distribution. What looks unusual is the long tail, made up of tickets that took more than 5 days to resolve.

Plot #2 - Shows the distribution of tickets that took less than 5 days.
The distribution appears to be made up of two or more individual distributions (i.e. the distribution has multiple peaks or modes).
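
The two views described above can be sketched as follows, assuming resolution times are available as a list of hours. The 6-hour bin width and the handling of tickets at exactly 5 days are assumptions made for illustration only.

```python
from collections import Counter

LONG_TAIL_HOURS = 5 * 24  # 5 days, the cut-off used in the text


def split_long_tail(hours):
    """Separate tickets resolved in under 5 days from the long tail.

    Assumption: a ticket at exactly 5 days is placed in the long tail.
    """
    body = [h for h in hours if h < LONG_TAIL_HOURS]
    tail = [h for h in hours if h >= LONG_TAIL_HOURS]
    return body, tail


def histogram(hours, bin_width_hours=6):
    """Count tickets per bin; keys are the lower edge of each bin in hours."""
    counts = Counter(int(h // bin_width_hours) * bin_width_hours for h in hours)
    return dict(sorted(counts.items()))
```

Several local maxima in the binned counts of the under-5-days group would correspond to the multiple modes noted above.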

Analyzing the multimodal distribution

Analyzing the long tail

We see that in 2018 there were more long-tail incidents on a monthly basis, except for Nov-18.

Looking at the distribution across the top 20 subcategories reveals that the Application and software category dominates the long tail.

SLA rate

64% of the incidents were resolved within 24 hours.

Analysis approach

The major factors influencing the variability of the resolution time can be grouped as follows.

Complexity

We infer incident complexity using text analytics. The short description was processed, and two features were engineered in addition to the number of words in the description. The first tries to assess the topic of the incident, that is, what it is about. The second tries to assess how difficult it is.

The plot shows the most frequent words across all incident descriptions.
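
A word-frequency count of this kind can be sketched as below. The tokenization rule and the optional stop-word set are assumptions, since the original preprocessing steps are not described.

```python
import re
from collections import Counter


def top_words(descriptions, n=10, stop_words=frozenset()):
    """Return the n most frequent lowercase tokens across all descriptions."""
    counts = Counter()
    for text in descriptions:
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return counts.most_common(n)
```

For example, `top_words(["password reset", "password expired", "vpn down"], n=1)` returns `[("password", 2)]`.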

The second plot shows the sentiment analysis, which tags positive and negative words.

Difficulty

Based on negative sentiment tagging, we can infer how difficult the incident at hand is.
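
One simple way to turn negative sentiment tagging into a numeric difficulty feature is to count negative words in the short description. The sketch below uses a tiny hypothetical word list purely for illustration. The lexicon actually used in this analysis is not stated, and a real sentiment lexicon would be far larger.

```python
import re

# Hypothetical mini-lexicon for illustration only; not the lexicon used in the analysis.
NEGATIVE_WORDS = frozenset(
    {"error", "fail", "failed", "unable", "cannot", "crash", "broken"}
)


def negative_word_count(short_description: str) -> int:
    """Number of tokens in the description that appear in the negative lexicon."""
    tokens = re.findall(r"[a-z']+", short_description.lower())
    return sum(1 for t in tokens if t in NEGATIVE_WORDS)
```

Under this sketch, "Unable to login, error 500" scores 2 and "Request new laptop" scores 0.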

Case type

Case types with 1K incidents or more don’t vary much; Bug cases are the exception. This means that case_type is not going to be a strong predictor.

Short Description - Number of words

Assumption: a longer description indicates a more complex problem, which takes more time to resolve.

The assumption is supported by the data.

We can clearly see a pattern: the median resolution time grows incrementally as the number of words increases, but there are many outliers.
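
The relationship described above can be checked with a sketch like the following, which groups incidents by the number of words in the short description and takes the median resolution time per group. The input shape, pairs of description and resolution hours, is an assumption.

```python
from collections import defaultdict
from statistics import median


def word_count(short_description: str) -> int:
    """Number of whitespace-separated words in the description."""
    return len(short_description.split())


def median_hours_by_word_count(incidents):
    """incidents: iterable of (short_description, resolution_hours) pairs."""
    groups = defaultdict(list)
    for description, hours in incidents:
        groups[word_count(description)].append(hours)
    return {n: median(values) for n, values in sorted(groups.items())}
```

The median is used rather than the mean because, as noted, the data contains many outliers.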

Subcategory

Subcategory with confidence interval

For each subcategory, the SLA rate is calculated and a 95% confidence interval is shown to indicate the noise around this value. We can clearly see which subcategories deviate from randomness, and the width of the error bars helps the reader understand how much each number should be trusted.
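
A minimal sketch of such an interval, assuming the normal (Wald) approximation for a binomial proportion, is shown below. The method used for the plot is not stated; other intervals, such as Wilson, behave better for small subcategories or rates near 0 or 1.

```python
from math import sqrt


def sla_rate_ci(met: int, total: int, z: float = 1.96):
    """Return (rate, lower, upper) for the share of incidents that met the SLA.

    Assumption: normal approximation, with z = 1.96 for a 95% interval.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    rate = met / total
    half_width = z * sqrt(rate * (1 - rate) / total)
    return rate, max(0.0, rate - half_width), min(1.0, rate + half_width)
```

The half-width shrinks with the square root of the number of incidents, which is why subcategories with few tickets show wide error bars.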

Group type

Service desk teams have the highest SLA rate, around 76%; next are the support teams with an SLA rate of 64%, and last are the application teams with 55%. The right side of the plot shows the median resolution time per group type; the results are in line with expectations.

Originator Group

The originator group is the first group that handled the ticket and, in many cases, the one that resolved it.

Region of assignment group

Urgency

Contact type

Temporal

A ticket opened on Friday or Saturday has a lower chance of being resolved on time.
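
A temporal feature capturing this effect could be derived from the ticket creation timestamp, as in the sketch below. The feature names are illustrative and are not taken from the original feature list.

```python
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def opened_day_name(opened_at: datetime) -> str:
    """Day of week on which the ticket was opened (weekday() gives Monday=0)."""
    return DAY_NAMES[opened_at.weekday()]


def opened_fri_or_sat(opened_at: datetime) -> bool:
    """Flag tickets opened on Friday or Saturday."""
    return opened_at.weekday() in (4, 5)
```

With a gross 24-hour threshold, a ticket opened on these days has its SLA window fall largely on the weekend, which is consistent with the lower rate observed.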

Data Splitting

The data is split at random 90/10: 90% is used for training and 10% is held out for testing.
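
A random hold-out split of this kind can be sketched as follows. The fixed seed is an assumption added for reproducibility, and the split is not stratified by the SLA label.

```python
import random


def train_test_split(rows, test_fraction=0.10, seed=42):
    """Shuffle the rows and hold out `test_fraction` of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train_rows, test_rows = train_test_split(range(1000))
```

Here `train_rows` holds 900 items and `test_rows` holds 100, and no item appears in both.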

Feature Engineering

The features that go into the model:

Model Training - Decision Tree

Test Performance

Accuracy is 74.84%, compared with 64.17% for the majority-vote model, a relative improvement of about 16.6% (10.67 percentage points). The model has a hard time correctly predicting incidents that will not meet the SLA. In an attempt to improve the learning process, let’s use a random forest, which is a bagging (bootstrap aggregation) ensemble of decision trees.
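
The comparison against the majority-vote baseline can be reproduced with a short sketch. The function names are illustrative; the figures are the ones reported above.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most common class."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def relative_improvement(model_accuracy, baseline_accuracy):
    """Gain over the baseline, expressed as a fraction of the baseline."""
    return (model_accuracy - baseline_accuracy) / baseline_accuracy


gain = relative_improvement(0.7484, 0.6417)  # roughly 0.166
```

The absolute gain is 74.84 - 64.17 = 10.67 percentage points; dividing by the 64.17% baseline gives the relative figure.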

Feature Importance

Decision Tree Plot

Model Training - Random Forest

Confusion Matrix and Statistics

          Reference
Prediction   No  Yes
       No   790  342
       Yes  409 1805
                                         
               Accuracy : 0.7756         
                 95% CI : (0.761, 0.7896)
    No Information Rate : 0.6417         
    P-Value [Acc > NIR] : < 2e-16        
                                         
                  Kappa : 0.5058         
 Mcnemar's Test P-Value : 0.01602        
                                         
            Sensitivity : 0.8407         
            Specificity : 0.6589         
         Pos Pred Value : 0.8153         
         Neg Pred Value : 0.6979         
             Prevalence : 0.6417         
         Detection Rate : 0.5395         
   Detection Prevalence : 0.6617         
      Balanced Accuracy : 0.7498         
                                         
       'Positive' Class : Yes            
                                         

Using the random forest model, we were able to increase the accuracy on the test set to 77.56%, compared with 74.84% for a single decision tree.

More importantly, the RF model does a better job of predicting the incidents that will not meet the SLA.
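
The statistics reported with the confusion matrix can be recomputed from its four cells. The sketch below uses the standard definitions with "Yes" (met SLA) as the positive class; the function name is illustrative.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # share of "Yes" incidents predicted "Yes"
    specificity = tn / (tn + fp)   # share of "No" incidents predicted "No"
    # Agreement expected by chance, used by Cohen's kappa.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / total ** 2
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "pos_pred_value": tp / (tp + fp),
        "neg_pred_value": tn / (tn + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "kappa": (accuracy - expected) / (1 - expected),
    }


# Cells from the matrix: rows are predictions, columns are reference labels.
rf = confusion_metrics(tp=1805, fn=342, fp=409, tn=790)
```

Specificity is the figure most relevant to the missed-SLA class: 790 of the 1,199 incidents that did not meet the SLA were predicted correctly, or 65.89%.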

