Organizations today are increasing their use of advanced analytics and predictive modeling. Generating a predictive model involves data preparation, data quality checks, data reduction, modeling, prediction, and analysis of results. Generating high-quality predictive models is time-consuming because finding optimal model parameters requires tuning, and the models often need to be redeveloped, tweaked, or reused in the future. Thus it is important to
It is very important to clearly define the goals based on the business objective.
Businesses want to find answers to all such important questions and make decisions based on data…
Industry | Response | Risk Mitigation | Attrition | Cross-Sell/Upsell | Net Present Value | Life Time Value |
---|---|---|---|---|---|---|
Retail | X | X | X | X | X | |
Banking | X | X | X | X | X | X |
Insurance | X | X | X | X | X | X |
Telecom | X | X | X | X | X | X |
Utilities | X | X | X | X | X | X |
Hospitality | X | X | X | X | X | |
Catalog | X | X | X | X | | |
Publishing | X | X | X | X | X | |
Based on the business questions we want to answer, translate the business objective into analytic terms.
Selecting the best data for target modeling requires a thorough understanding of the market, the business, and the objective. The model is only as good and relevant as the underlying data:
Data Types
Data Type | Predictive Power | Stability | Cost |
---|---|---|---|
Demographic | Medium | High | Low |
Behavioural | High | Low | High |
Psychographic | Medium | Medium | High |
Sources of Data
Internal Sources | External Sources |
---|---|
Customer Data, Transaction Data | Survey Data, Research Data, Suppliers, Ratings |
Other History | Credit Bureau Data, Third Party data, Sellers, Compilers |
Typical data required for Target Marketing
Demographic Data | Behaviour Data | External Data |
---|---|---|
Customer Demographic | Transaction | Customer Survey |
Income | Loyalty | Market Research |
Purchasing Power | - | Macro Economic Factors |
- | - | Competition |
In this step we need to prepare the data in the right format for the analysis and for the tool you want to use.
Do initial clean-up
Define Variables and Create Data Dictionary
Join/Append multiple datasets
Validate for correctness
Produce Basic Summary Reports
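The preparation steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the records, the field names (`customer_id`, `age`, `income`, `amount`), the age range check, and the helper functions are all hypothetical, and a real project would normally use a dedicated data tool.

```python
from collections import defaultdict

# Hypothetical data dictionary: variable name -> description.
DATA_DICTIONARY = {
    "customer_id": "unique customer key",
    "age": "age in years",
    "income": "annual income",
    "total_spend": "sum of transaction amounts",
}

# Hypothetical raw inputs; values arrive as untidy text.
customers = [
    {"customer_id": "C1", "age": "34", "income": "52000"},
    {"customer_id": "C2", "age": " 41 ", "income": ""},
    {"customer_id": "C3", "age": "29", "income": "61000"},
]
transactions = [
    {"customer_id": "C1", "amount": 120.0},
    {"customer_id": "C1", "amount": 80.0},
    {"customer_id": "C3", "amount": 45.5},
]

def clean_customer(row):
    """Initial clean-up: strip whitespace, convert numbers, blank -> None."""
    def to_number(text):
        text = text.strip()
        return float(text) if text else None
    return {
        "customer_id": row["customer_id"].strip(),
        "age": to_number(row["age"]),
        "income": to_number(row["income"]),
    }

def join_spend(customers, transactions):
    """Left-join total spend per customer onto the cleaned customer rows."""
    spend = defaultdict(float)
    for t in transactions:
        spend[t["customer_id"]] += t["amount"]
    return [
        dict(clean_customer(c), total_spend=spend.get(c["customer_id"], 0.0))
        for c in customers
    ]

def validate(rows):
    """Return a list of problems: duplicate keys or implausible ages."""
    problems, seen = [], set()
    for r in rows:
        if r["customer_id"] in seen:
            problems.append(f"duplicate id {r['customer_id']}")
        seen.add(r["customer_id"])
        if r["age"] is not None and not 0 < r["age"] < 120:
            problems.append(f"implausible age for {r['customer_id']}")
    return problems

def summary(rows, field):
    """Basic summary report for one numeric field."""
    values = [r[field] for r in rows if r[field] is not None]
    return {
        "count": len(rows),
        "missing": len(rows) - len(values),
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }

prepared = join_spend(customers, transactions)
```

Calling `validate(prepared)` and `summary(prepared, "income")` then gives a quick correctness check and a basic report before any modeling starts.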
Once the data is in the right shape, perform:
univariate analysis: to check the distribution of each of the variables and features
multivariate analysis: to check relationships with other variables and with the dependent variable
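As a minimal sketch of both kinds of analysis, the snippet below summarizes one variable and measures the linear relationship between two variables with a Pearson correlation. The variable names and values are made up for illustration, and correlation is only one of many possible relationship checks.

```python
import math
import statistics

def univariate(values):
    """Distribution summary for a single numeric variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }

def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical independent variable and dependent variable.
income = [30, 45, 52, 61, 75, 88]
spend = [2.0, 3.1, 3.0, 4.2, 5.1, 5.9]

income_summary = univariate(income)
income_vs_spend = pearson(income, spend)
```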
Based on the type of model you are going to use, you may need to transform the variables using one of the following approaches:
Binning approach: create distinct groups
Transformation:
Logarithmic, Polynomial
Square Root, Inverse, Square, Box-Cox
Extreme value (outlier) treatment
Missing Value Treatment
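The binning, transformation, outlier, and missing-value treatments listed above could look like the following sketch. The bin edges, capping limits, and the choice of median imputation are assumptions for illustration; the right choices depend on the data and the model.

```python
import math
import statistics

def bin_value(x, edges):
    """Return the index of the bin x falls into, given ascending upper edges."""
    for i, edge in enumerate(edges):
        if x <= edge:
            return i
    return len(edges)  # above the last edge

def log_transform(values):
    """Logarithmic transform using log(1 + x); assumes non-negative inputs."""
    return [math.log1p(v) for v in values]

def cap_outliers(values, lower, upper):
    """Extreme value treatment: cap every value to the range [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]

def impute_median(values):
    """Missing value treatment: replace None with the median of observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

# Hypothetical ages binned into <=30, 31-50, >50.
age_bins = [bin_value(a, [30, 50]) for a in [25, 30, 45, 80]]
```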
Dimension Reduction - Information Value (IV) and Weight of Evidence (WoE), Variable Clustering, PCA, Factor Analysis, etc.
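As an illustration of WoE and IV for a single binned variable, the sketch below uses one common convention, assumed here: for each bin, WoE is the natural log of the share of non-events divided by the share of events, and IV is the sum over bins of (share of non-events minus share of events) times WoE. The function name and the example counts are hypothetical, and the code assumes every bin has at least one event and one non-event.

```python
import math

def woe_iv(bins):
    """bins: list of (non_event_count, event_count) per bin.

    Returns (list of WoE values per bin, total Information Value).
    Assumes no count is zero, otherwise the log is undefined.
    """
    total_non = sum(non for non, _ in bins)
    total_evt = sum(evt for _, evt in bins)
    woes, iv = [], 0.0
    for non, evt in bins:
        dist_non = non / total_non
        dist_evt = evt / total_evt
        woe = math.log(dist_non / dist_evt)
        woes.append(woe)
        iv += (dist_non - dist_evt) * woe
    return woes, iv

# Hypothetical variable with two bins that separate the classes well.
woes, iv = woe_iv([(80, 20), (20, 80)])
```

Variables with a higher IV separate events from non-events more strongly, which is why IV is often used to shortlist candidate predictors.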
Training Sample: The model will be developed on this sample. Typically 50%, 60%, 70% or 80% of the data goes here.
Test Sample: Model performance will be validated on this sample. Typically the remaining 50%, 40%, 30% or 20% of the data goes here.
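A simple random split into training and test samples might be sketched as follows. The function name, the 70/30 default, and the fixed seed are assumptions for illustration; a stratified or time-based split may suit some problems better.

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical dataset of 100 record ids.
train, test = train_test_split(range(100), train_fraction=0.7)
```

Fixing the seed keeps the split reproducible, which helps when the results need to be audited or rerun later.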
Based on the defined goal(s) (supervised or unsupervised), we have to select one or a combination of modeling techniques, such as
There is a wide variety of choices available outside this list.
It is good practice to document the entire process and findings for audit requirements and future work. Ideally we should document the following:
Business Objectives and Goals
Data Sources and Data used
Type of Analysis performed: what, why, findings
Exclusions
Variable transformations
Business inputs
Methodology used, background, benefits
Alternative methodologies tried, if any
Model performance and Validation Results
Cut-Off Analysis
Recommendations
Pseudocode for implementation
Key issues/challenges
Future Plans
Once the model is developed and validated, we need to implement it within a system or run it off-line on real business data. After implementation, we need to perform a User Acceptance Test (UAT) to ensure the model is implemented correctly and its outcomes are in sync with the desired outcomes. Typical systems for model implementation include
Monitor performance of the models on a periodic basis
Frequency: Monthly, Quarterly, Half-Yearly etc.
Check for Population Stability
Check for Characteristic Stability
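One widely used way to check population stability is the Population Stability Index (PSI), which compares the score distribution at development time with the current distribution, bin by bin. The sketch below assumes the scores are already binned into counts and that no bin is empty; the function name and the example counts are hypothetical. Applying the same calculation to the bins of an individual input variable gives a characteristic stability check.

```python
import math

def psi(expected_counts, actual_counts):
    """Population Stability Index over matching bins.

    expected_counts: counts per bin in the development sample.
    actual_counts:   counts per bin in the current population.
    Assumes both lists use the same bins and contain no zero counts.
    """
    total_expected = sum(expected_counts)
    total_actual = sum(actual_counts)
    value = 0.0
    for e, a in zip(expected_counts, actual_counts):
        share_expected = e / total_expected
        share_actual = a / total_actual
        value += (share_actual - share_expected) * math.log(share_actual / share_expected)
    return value

# Hypothetical score bins: development sample vs. latest month.
stable = psi([10, 20, 30], [10, 20, 30])
shifted = psi([50, 50], [40, 60])
```

A value of zero means the two distributions match exactly, and larger values indicate a bigger shift. The threshold that triggers a model review is a business decision and is not specified here.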