Mobile Network Traffic Prediction Use Case 1

14 May, 2020


Network Development Analytics (NDA) Team

Network Intelligence and Performance Management (NIPM)

Context

  • NIPM is the Telekom Malaysia (TM) department that provides analysis to management and technical teams, and Webe is the Mobility Center of Excellence of the TM Group.
  • Webe technicians and engineers have requested mobile network traffic prediction to support their technical operations, locate congested cells for improvement, and evaluate the factors affecting mobile traffic utilization.
  • NIPM requires an analytics model built from sample data. The analytics task is to predict the throughput (download/upload) of Webe network traffic.
  • The project is expected to support a constructive approach to network management and planning, and to enable Smart Capex for Webe.
  • All project results will be reported to TM management and the Webe team for further review and verification.

Needs

  • How to determine the most appropriate way of assessing network traffic from raw data that consists of:
    • Timestamp info
    • Node Name
    • Cell Name
    • Average throughput download/upload per cell (Mbps)
    • Average traffic volume download/upload per cell (GB)
    • Other additional info
  • Mapping method: each Node Name can have one or more Cell Names.
  • Development will be in the R language.
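
The raw-data layout and the node-to-cell mapping listed above can be sketched as follows. This is a minimal illustration in Python (the project itself targets R). The field names and sample rows are hypothetical and are not taken from the Webe dataset.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CellRecord:
    # Field names are illustrative assumptions, not the real column names.
    timestamp: datetime
    node_name: str
    cell_name: str
    dl_throughput_mbps: float
    ul_throughput_mbps: float
    dl_volume_gb: float
    ul_volume_gb: float


def map_nodes_to_cells(records):
    """Build the one-to-many mapping: node name -> sorted list of its cell names."""
    mapping = defaultdict(set)
    for rec in records:
        mapping[rec.node_name].add(rec.cell_name)
    return {node: sorted(cells) for node, cells in mapping.items()}


# Made-up sample rows, for demonstration only.
records = [
    CellRecord(datetime(2020, 5, 1, 0), "NODE_A", "NODE_A_011", 12.4, 2.1, 3.2, 0.6),
    CellRecord(datetime(2020, 5, 1, 0), "NODE_A", "NODE_A_012", 8.9, 1.7, 2.5, 0.4),
    CellRecord(datetime(2020, 5, 1, 0), "NODE_B", "NODE_B_011", 15.0, 3.0, 4.1, 0.9),
]
node_cells = map_nodes_to_cells(records)
```

Here `node_cells` maps each node to the cells observed under it, which is the grouping needed before any per-node aggregation.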

Vision

  • All project results will be reported to TM management and the Webe team for further review and verification.
  • For each node/cell, we plan to predict the future trend in network throughput (upload and download).
  • The model should be improved regularly, and an acceptable accuracy threshold should be set.
  • If the accuracy falls below the acceptable threshold, the model should be revalidated (i.e. accuracy should be more than 75%).
  • AutoML (using H2O) is proposed for the solution, so that multiple models are built and the process is more automated.
  • The classifier must support both stationary and non-stationary data sets for prediction (stationarity is checked using the Dickey-Fuller test).
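
To make the stationarity check concrete, here is a minimal Python sketch of the basic (non-augmented) Dickey-Fuller regression with a constant term. The source names only the test, so the regression form, the asymptotic 5% critical value of -2.86, and the synthetic series are assumptions made for illustration. The project itself targets R and would normally rely on a library routine.

```python
import math

# Asymptotic 5% critical value for the Dickey-Fuller t-statistic when the
# regression includes a constant and no trend; small samples need a
# slightly more negative value.
DF_CRITICAL_5PCT = -2.86


def dickey_fuller_stat(series):
    """t-statistic of rho in: diff(y)[t] = alpha + rho * y[t-1] + error."""
    x = series[:-1]                                   # lagged levels y[t-1]
    dy = [b - a for a, b in zip(series, series[1:])]  # first differences
    n = len(x)
    mean_x = sum(x) / n
    mean_dy = sum(dy) / n
    sxx = sum((v - mean_x) ** 2 for v in x)
    sxy = sum((v - mean_x) * (d - mean_dy) for v, d in zip(x, dy))
    rho = sxy / sxx
    alpha = mean_dy - rho * mean_x
    rss = sum((d - alpha - rho * v) ** 2 for v, d in zip(x, dy))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return rho / std_err


def looks_stationary(series):
    """Reject the unit-root hypothesis when the statistic is below the critical value."""
    return dickey_fuller_stat(series) < DF_CRITICAL_5PCT


# Synthetic series, for demonstration only.
oscillating = [math.sin(t) for t in range(200)]       # mean-reverting
trending = [t + 0.1 * (-1) ** t for t in range(200)]  # keeps growing
```

A series flagged as non-stationary would typically be differenced, or modelled with lag features, before prediction.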

Methodology on Prediction and Classification Use Case (Manual Classifier Selection)

    1. Data processing in FME Studio to filter out invalid records and perform other data manipulations.

    2. Testing the data for stationarity; ARIMA or Distributed Random Forest can both be used. (Prediction Use Case)

    3. Clustering of the dataset into 3 separate groups (small, medium and large) by Node Name group (Total). (Classification and Prediction Use Case)
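
The source does not say how the three size groups in step 3 are formed. As one possible reading, the Python sketch below sums a per-cell measure for each node and then cuts the ranked totals into three equal-sized groups. This is simple quantile binning used as a stand-in for the actual clustering method; the function names and sample rows are hypothetical.

```python
from collections import defaultdict


def node_totals(rows):
    """Sum a per-cell measure (e.g. traffic volume in GB) for each node."""
    totals = defaultdict(float)
    for node_name, value in rows:
        totals[node_name] += value
    return dict(totals)


def size_groups(totals):
    """Rank nodes by total and cut the ranking into three equal-sized groups."""
    ranked = sorted(totals, key=totals.get)
    n = len(ranked)
    labels = {}
    for rank, node in enumerate(ranked):
        if rank < n / 3:
            labels[node] = "small"
        elif rank < 2 * n / 3:
            labels[node] = "medium"
        else:
            labels[node] = "large"
    return labels


# Made-up (node, volume) rows, for demonstration only.
rows = [("N1", 1.0), ("N1", 2.0), ("N2", 40.0), ("N3", 9.0),
        ("N4", 0.5), ("N5", 70.0), ("N6", 12.0)]
groups = size_groups(node_totals(rows))
```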

Methodology (Continued)

    4. Introducing lag function for dataset (Prediction Use Case)

    5. Testing the dataset for stationarity; ARIMA or Distributed Random Forest (DRF) can both be used.

    6. 80:20 (training/testing) split, using 5-fold Cross-Validation. (Classification Use Case)

    7. Classifier Selection (Classification and Prediction Use Case)

    8. Extracting the Variable Importance (Classification Use Case)

    9. Introducing Grid Search for optimization process (Classification and Prediction Use Case)

    10. Executing Prediction - Selected classifier or AutoML (Classification and Prediction Use Case)
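
Steps 4 and 6 above (lag features, the 80:20 split and 5-fold cross-validation) can be sketched in Python as follows. The number of lags, the chronological (rather than random) split and the contiguous folds are assumptions; the source does not specify them, and the project itself targets R.

```python
def add_lags(series, n_lags):
    """Turn a series into (features, target) rows: the previous n_lags values predict the next."""
    return [(series[i - n_lags:i], series[i]) for i in range(n_lags, len(series))]


def chronological_split(rows, train_fraction=0.8):
    """Keep time order: the first 80% trains the model, the last 20% tests it."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def k_fold_indices(n_rows, k=5):
    """Yield (train_idx, valid_idx) pairs over k contiguous folds."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = [i for i in range(n_rows) if i < start or i >= start + size]
        yield train, valid
        start += size


throughput = [float(v) for v in range(1, 24)]  # 23 made-up hourly values
rows = add_lags(throughput, n_lags=3)          # 20 supervised rows
train_rows, test_rows = chronological_split(rows)
folds = list(k_fold_indices(len(train_rows)))
```

Each fold serves once as the validation set while the remaining folds train the model; the held-out 20% is only used for the final check.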

Methodology (Continued)

    11. Performance Metrics Calculation (Classification and Prediction Use Case)

    12. Confusion Matrix Generation (Classification Use Case)

    13. Accuracy Calculation (Classification and Prediction Use Case)

    14. Introducing Automation - scheduled building the model iteratively (Classification and Prediction Use Case)

    15. Revalidation Analytics Model (Classification and Prediction Use Case)
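
For the classification use case, steps 12, 13 and 15 can be illustrated with the short Python sketch below. The class labels and sample predictions are made up, and the revalidation rule simply applies the 75% accuracy figure quoted in the Vision section; how revalidation is actually triggered is not described in the source.

```python
from collections import Counter

ACCURACY_THRESHOLD = 0.75  # the 75% figure quoted in the Vision section


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Share of cases on the diagonal, i.e. predicted class equals actual class."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def needs_revalidation(acc, threshold=ACCURACY_THRESHOLD):
    """Flag the model for revalidation when accuracy falls below the threshold."""
    return acc < threshold


# Made-up labels and predictions, for demonstration only.
labels = ["small", "medium", "large"]
actual = ["small", "small", "medium", "medium", "large", "large", "large", "small"]
predicted = ["small", "medium", "medium", "medium", "large", "large", "small", "small"]
matrix = confusion_matrix(actual, predicted, labels)
acc = accuracy(matrix)
```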

Methodology on Prediction and Classification Use Case (AutoML Classifier)

    1. Executing AutoML for selected dataset. (Classification and Prediction Use Case)

    2. Save the model for each Cross-Validation. (Classification and Prediction Use Case)

    3. Select the best model from AutoML leaderboard (Classification and Prediction Use Case)

    4. Apply the best model (Classification and Prediction Use Case)

    5. Executing Prediction using the best model (Classification and Prediction Use Case)

    6. Performance Metrics Calculation (Classification and Prediction Use Case)

Methodology (Continued)

    7. Confusion Matrix Generation (Classification Use Case)

    8. Accuracy Calculation (Classification and Prediction Use Case)

    9. Introducing Automation - scheduled building the model iteratively (Classification and Prediction Use Case)

    10. Revalidation Analytics Model (Classification and Prediction Use Case)

Performance Indicators

  • The analytics model will be validated using several metrics, including:
    • Error rate = (average_dl_pdcp_layer_throughput_mbps - predicted)
    • Error percentage = (error / average_dl_pdcp_layer_throughput_mbps * 100)
    • Variance achieved
    • Standard Deviation
    • Mean Absolute Error (MAE)
    • Root Mean Square Error (RMSE)
    • Mean Absolute Percentage Error (MAPE): prediction accuracy of a forecasting method
    • Mean Percentage Error (MPE): how far a model's forecasts differ from actual values
    • Skewness rate: to the left or to the right
    • Kurtosis rate: a measure of tailedness

n    mean         var        std     mae      rmse       mape       mpe         skew        kurtosis
88   -0.01906283  0.3401222  0.5832  0.37808  0.5801901  0.3778038  -0.2381319  -0.1065638  2.867807
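
The indicators listed above can be computed as in the following Python sketch, where the error is actual minus predicted, as defined in the list. Several details are assumptions rather than the source's definitions: variance uses the n - 1 denominator, skewness and kurtosis are the plain moment ratios (kurtosis is not excess kurtosis), and MAPE and MPE are expressed in percent. The sample values are made up.

```python
import math


def error_metrics(actual, predicted):
    """Summary statistics of the errors (actual minus predicted)."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mean = sum(errors) / n
    # Central moments of order 2, 3 and 4 (population form).
    m2, m3, m4 = (sum((e - mean) ** k for e in errors) / n for k in (2, 3, 4))
    var = m2 * n / (n - 1)  # sample variance
    pct = [100.0 * e / a for e, a in zip(errors, actual)]  # actual values must be non-zero
    return {
        "n": n,
        "mean": mean,
        "var": var,
        "std": math.sqrt(var),
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "mape": sum(abs(v) for v in pct) / n,
        "mpe": sum(pct) / n,
        "skew": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }


# Made-up throughput values in Mbps, for demonstration only.
actual = [10.0, 20.0, 40.0, 50.0]
predicted = [9.0, 22.0, 38.0, 51.0]
metrics = error_metrics(actual, predicted)
```

The dictionary keys mirror the column names of the reported results so that the two can be compared side by side.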

Outcome

  • The outcome would help to increase stakeholder understanding of the situation surrounding network usage and predictive analytics.
  • It is possible to take constructive steps to enhance customer service by predicting network usage.
  • The current model is built using Distributed Random Forest (DRF), with multithreaded processing to boost prediction efficiency.
  • For the current model, no Grid Search was used for optimization (hyperparameter tuning); only K-Fold Cross-Validation (K = 5) has been applied.

Example: Predicted Download Throughput - CB0192_TM BUKIT ASA _011

Example: Predicted Upload Throughput - CB0192_TM BUKIT ASA _011

Example: Raw Dataset

Future Works

  • K-means (Hartigan-Wong, Lloyd, Forgy, MacQueen) is proposed for clustering in the next use case (Cell Node Classification).
  • The clustering of each Cell Node will be based on its GPS location (lat, long) as well as its aggregate throughput/volume.
  • The analytics model should be automated and not tied to a single classifier (AutoML is proposed).
  • Hyperparameter tuning will be embedded to optimize the model and improve prediction accuracy.
  • Next use case 2 - Webe Network Traffic (Utilization) Prediction.
  • Next use case 3 - Webe Network Traffic Cell Node Classification.
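
As a rough picture of the proposed K-means clustering, the Python sketch below implements Lloyd's algorithm, one of the variants named above, on (lat, long, throughput) points. The sample coordinates, the choice of three clusters and the fixed starting centroids are hypothetical. A real run would also need the features scaled to comparable ranges, and the project itself targets R.

```python
def lloyd_kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = list(centroids)  # work on a copy of the starting centroids
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment, centroids


# Made-up (lat, long, aggregate throughput) points, for demonstration only.
points = [(3.0, 101.0, 5.0), (3.1, 101.1, 6.0),
          (5.0, 100.0, 30.0), (5.1, 100.1, 31.0),
          (1.5, 103.5, 60.0), (1.6, 103.6, 62.0)]
assignment, centroids = lloyd_kmeans(points, [points[0], points[2], points[4]])
```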
