7-Step Predictive Modeling Process

Why a Standard Process?

Organizations today are increasing their use of advanced analytics and predictive modeling. The process of generating predictive models involves data preparation, data quality checks, data reduction, modeling, prediction, and analysis of results. Generating high-quality predictive models is time-consuming because finding the optimum model parameters requires tuning, and because models often have to be redeveloped, tweaked, or reused in the future. Thus it is important to follow a standard process.

For Whom?

Key Stakeholders

Step 1: Business Objective(s)

Step 1.1: Business Objectives - Asking the Right Questions!

It is very important to clearly define the goals based on the business objective.

Businesses want to find answers to all such important questions and make decisions based on data…

Step 1.2: Business Objective(s) - Target Modeling Opportunities

Industry Response Risk Mitigation Attrition Cross-Sell/Upsell Net Present Value Life Time Value
Retail X X X X X
Banking X X X X X X
Insurance X X X X X X
Telecom X X X X X X
Utilities X X X X X X
Hospitality X X X X X
Catalog X X X X
Publishing X X X X X

Step 2: Define Goals - Translate Business Objective into Analytics Goal

Based on the business questions we want to answer, translate the business objective into analytic terms.

Step 3: Selecting Data for Modeling

Selecting the best data for target modeling requires a thorough understanding of the market, the business, and the objective. The model is only as good and relevant as the underlying data.

Data Types

| Data Type     | Predictive Power | Stability | Cost |
|---------------|------------------|-----------|------|
| Demographic   | Medium           | High      | Low  |
| Behavioural   | High             | Low       | High |
| Psychographic | Medium           | Medium    | High |

Sources of Data

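| Internal Sources                | External Sources                                          |
|---------------------------------|-----------------------------------------------------------|
| Customer Data, Transaction Data | Survey Data, Research Data, Suppliers, Ratings            |
| Other History                   | Credit Bureau Data, Third Party Data, Sellers, Compilers  |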

Step 3.1: A Case Study - Target Marketing

Typical data required for Target Marketing

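| Demographic Data     | Behaviour Data | External Data          |
|----------------------|----------------|------------------------|
| Customer Demographic | Transaction    | Customer Survey        |
| Income               | Loyalty        | Market Research        |
| Purchasing Power     | -              | Macro Economic Factors |
| -                    | -              | Competitions           |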

Step 4: Prepare Data

  1. Put the data into the right format for the analysis and for the tool you plan to use.

  2. Do the initial clean-up.

  3. Define variables and create a data dictionary.

  4. Join/append multiple datasets.

  5. Validate for correctness.

  6. Produce basic summary reports.
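
The preparation steps above can be sketched in a few lines of Python. This is only an illustration using the standard library: the `customers` and `transactions` records, the column names, the plausible-age range, and every helper function are hypothetical and not part of the original process description.

```python
from statistics import mean

# Hypothetical raw inputs; in practice these come from files or a database.
customers = [
    {"customer_id": "1", "age": " 34", "income": "52000"},
    {"customer_id": "2", "age": "41 ", "income": ""},   # missing income
    {"customer_id": "2", "age": "41 ", "income": ""},   # duplicate record
    {"customer_id": "3", "age": "29", "income": "61000"},
]
transactions = [
    {"customer_id": "1", "amount": "120.50"},
    {"customer_id": "1", "amount": "80.00"},
    {"customer_id": "3", "amount": "45.25"},
]

# Data dictionary: one short description per variable in the prepared data.
DATA_DICTIONARY = {
    "customer_id": "Unique customer key (integer)",
    "age": "Age in years (float, may be missing)",
    "income": "Annual income (float, may be missing)",
    "total_spend": "Sum of transaction amounts (float)",
}

def to_number(text):
    """Convert a raw string to float; blank strings become None (missing)."""
    text = text.strip()
    return float(text) if text else None

def clean_customers(rows):
    """Initial clean-up: cast types and keep the first record per customer_id."""
    seen, cleaned = set(), []
    for row in rows:
        cid = int(row["customer_id"])
        if cid in seen:
            continue
        seen.add(cid)
        cleaned.append({"customer_id": cid,
                        "age": to_number(row["age"]),
                        "income": to_number(row["income"])})
    return cleaned

def join_spend(cust_rows, txn_rows):
    """Join: attach each customer's total spend (0.0 if no transactions)."""
    totals = {}
    for txn in txn_rows:
        cid = int(txn["customer_id"])
        totals[cid] = totals.get(cid, 0.0) + float(txn["amount"])
    return [{**c, "total_spend": totals.get(c["customer_id"], 0.0)}
            for c in cust_rows]

def validate(rows):
    """Validate: ids are unique and ages fall in an assumed plausible range."""
    ids = [r["customer_id"] for r in rows]
    assert len(ids) == len(set(ids)), "duplicate customer_id"
    assert all(r["age"] is None or 0 < r["age"] < 120 for r in rows), "implausible age"

def summarize(rows, column):
    """Basic summary report for one numeric column."""
    values = [r[column] for r in rows if r[column] is not None]
    return {"n": len(rows), "missing": len(rows) - len(values),
            "min": min(values), "max": max(values), "mean": mean(values)}

prepared = join_spend(clean_customers(customers), transactions)
validate(prepared)
income_summary = summarize(prepared, "income")
```

The helpers are kept separate so that each maps to one step in the list and can be re-run on its own when the source data changes.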

Step 5: Analyze and Transform Variables

Once the data is in the right shape, analyze the variables.

Depending on the type of model you are going to use, you may need to transform the variables using one of the following approaches:

  1. Binning approach: create distinct groups

  2. Transformation:

  1. Extreme value (outlier) treatments

  2. Missing Value Treatment

  3. Dimension Reduction - Information Value (IV) and Weight of Evidence (WoE), Variable Clustering, PCA, Factor Analysis, etc.
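
As a rough illustration of the treatments listed above, the sketch below applies median imputation, outlier capping, and binning to a single variable, then computes Weight of Evidence and Information Value for the bins. The data, the cut points, the capping bounds, and the function names are all made up for the example. The WoE sign convention (non-events over events) and the 0.5 substitution for empty cells are assumptions; other conventions are in use.

```python
import math
from bisect import bisect_right
from statistics import median

def impute_median(values):
    """Missing value treatment: replace None with the median of observed values."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

def cap_outliers(values, low, high):
    """Extreme value treatment: clip every value into [low, high]."""
    return [min(max(v, low), high) for v in values]

def assign_bins(values, edges):
    """Binning: bin 0 is v < edges[0], bin i is edges[i-1] <= v < edges[i]."""
    return [bisect_right(edges, v) for v in values]

def woe_iv(bins, targets):
    """Return ({bin: WoE}, IV) for a binary target (1 = event, 0 = non-event).

    Assumed convention: WoE = ln(share of non-events / share of events).
    Requires at least one event and one non-event overall.
    """
    events, non_events = {}, {}
    for b, t in zip(bins, targets):
        events[b] = events.get(b, 0) + t
        non_events[b] = non_events.get(b, 0) + (1 - t)
    total_events = sum(events.values())
    total_non_events = sum(non_events.values())
    woe, iv = {}, 0.0
    for b in sorted(events):
        # Assumption: use 0.5 in place of a zero count so the log is defined.
        share_event = max(events[b], 0.5) / total_events
        share_non_event = max(non_events[b], 0.5) / total_non_events
        woe[b] = math.log(share_non_event / share_event)
        iv += (share_non_event - share_event) * woe[b]
    return woe, iv

# Hypothetical variable (age) and binary target (responded to an offer).
ages = [22, 25, None, 31, 38, 44, 47, 52, 58, 95]
responded = [1, 0, 0, 1, 0, 0, 1, 0, 1, 0]

ages_filled = impute_median(ages)                 # None -> median of observed ages
ages_capped = cap_outliers(ages_filled, 18, 70)   # 95 is pulled down to 70
age_bins = assign_bins(ages_capped, [30, 50])     # bins: <30, 30-49, >=50
woe, iv = woe_iv(age_bins, responded)
```

A variable with a higher IV separates events from non-events more strongly, which is why IV is often used to screen candidate variables before modeling.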

Step 5.1: Random Sampling (Train and Test)
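
A minimal sketch of a random train/test split, assuming a simple shuffle with a fixed seed. The function name `train_test_split`, the 30% test share, and the seed value are illustrative choices, not something the original process prescribes.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of rows and return (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                      # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)      # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset: 100 record ids.
records = list(range(100))
train, test = train_test_split(records)
```

Fixing the seed makes the split repeatable, so a model can be rebuilt later on exactly the same training records.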

Step 6.1: Model Selection

Based on the defined goal(s) (supervised or unsupervised), we have to select one modeling technique or a combination of techniques, such as

There is a wide variety of choices available outside this list.

Step 6.2: Build/Develop/Train Models

Step 7: Validate/Test Models
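
One possible way to check a binary model on the held-out test sample is sketched below. It assumes the model outputs a score between 0 and 1 per record. The metrics chosen (confusion counts, accuracy, AUC), the 0.5 threshold, and the sample values are illustrative assumptions rather than the validation method of the original process.

```python
def confusion_counts(actual, scores, threshold=0.5):
    """Count true/false positives and negatives at a score threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, s in zip(actual, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1:
            counts["tp" if y == 1 else "fp"] += 1
        else:
            counts["fn" if y == 1 else "tn"] += 1
    return counts

def accuracy(counts):
    """Share of records classified correctly."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())

def auc(actual, scores):
    """Chance that a random event outscores a random non-event (ties count 0.5)."""
    pos = [s for y, s in zip(actual, scores) if y == 1]
    neg = [s for y, s in zip(actual, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical test-set outcomes and model scores.
test_actual = [1, 0, 1, 1, 0, 0, 1, 0]
test_scores = [0.9, 0.2, 0.7, 0.4, 0.5, 0.1, 0.8, 0.3]

counts = confusion_counts(test_actual, test_scores)
test_accuracy = accuracy(counts)
test_auc = auc(test_actual, test_scores)
```

Comparing these figures between the training and test samples is a common way to spot overfitting: a large drop on the test sample suggests the model does not generalize.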

Document Methodology and Models

It is good practice to document the entire process and findings for audit requirements and future work. Ideally we should document the following:

Implement Models

Once the model is developed and validated, we need to implement it within a system or run it off-line on real business data. After implementation, we need to perform a User Acceptance Test (UAT) to ensure that the model is implemented correctly and that its outcomes are in sync with the desired outcomes. Typical systems for model implementation include

Model Management: Monitoring and Performance Tracking
