COD Risk-Control Model

Data scope

Features cover two dimensions: (1) order information and (2) user information.

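  • Logistics status data: logistics_workflow_union
  • All order data: sale_order_union
  • User information: wp_cid_collect_union
  • Derived fields: the user's past purchase behavior, e.g. the historical sign rate (share of orders accepted on delivery)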

Time range

  1. User features are built from data before 1 October 2018.
  2. Orders placed from 1 October 2018 to 31 October 2018 are used as training data.
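The time split above can be sketched in base R. This is only an illustration on a toy table: the column names `user_id`, `create_time` and `delivered` are assumptions for the example, not the actual schema of sale_order_union. History before the cutoff feeds the user features (here a COD sign rate), and only October orders become training rows.

```r
# Toy COD order table; column names are hypothetical.
orders <- data.frame(
  user_id     = c(1, 1, 1, 2, 2, 3),
  create_time = as.Date(c("2018-08-03", "2018-09-10", "2018-10-05",
                          "2018-09-20", "2018-10-12", "2018-10-20")),
  delivered   = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
)

cutoff    <- as.Date("2018-10-01")
train_end <- as.Date("2018-10-31")

# History window: strictly before the cutoff, used only to build user features.
history <- orders[orders$create_time < cutoff, ]

# Per-user counts and sign rate (delivered orders / placed orders).
cod_orders    <- tapply(history$delivered, history$user_id, length)
cod_delivered <- tapply(history$delivered, history$user_id, sum)
user_feat <- data.frame(
  user_id       = as.numeric(names(cod_orders)),
  cod_orders    = as.vector(cod_orders),
  cod_delivered = as.vector(cod_delivered)
)
user_feat$sig_rate <- user_feat$cod_delivered / user_feat$cod_orders

# Training window: October orders, left-joined with the history features.
train_orders <- orders[orders$create_time >= cutoff &
                       orders$create_time <= train_end, ]
train <- merge(train_orders, user_feat, by = "user_id", all.x = TRUE)
train
```

Users with no orders before the cutoff (user 3 in the toy data) end up with `NA` history features, which is one natural way to tell new customers from returning ones.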

Data

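The fields are:

  • Order name
  • Order creation time
  • Order update time
  • Logistics status
  • User ID
  • User's state
  • Order amount
  • Client version
  • Client model
  • Mobile device
  • Gender
  • Number of orders placed
  • Number of successfully paid orders
  • Number of successfully paid COD orders
  • Number of COD orders signed for on delivery
  • COD sign rate
  • Total amount of all orders placed
  • Total amount of successfully paid COD orders
  • Total amount of delivered COD orders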

There are about 350,000 records in total, and the bad-sample rate is roughly 24%.

Data analysis

Analysis results for a selection of features with relatively strong discriminative power:

  1. State
 count  count_distr    good    bad    badprob         woe     bin_iv   total_iv
 56558    0.1563767   47417   9141  0.1616217  -0.4893475  0.0326334  0.1031834
 74501    0.2059871   59091  15410  0.2068429  -0.1871984  0.0068634  0.1031834
133546    0.3692400  102691  30855  0.2310440  -0.0455621  0.0007574  0.1031834
 75636    0.2091252   53330  22306  0.2949125   0.2852202  0.0182460  0.1031834
 21437    0.0592710   12622   8815  0.4112049   0.7978770  0.0446833  0.1031834
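The woe and bin_iv columns in the table appear consistent with the usual definitions: WOE is the log of a bin's share of bad samples over its share of good samples, and a bin's IV is the difference of those two shares times the WOE. The short check below recomputes them from the good and bad counts in the table; it is a verification sketch, not the code that produced the report.

```r
# good / bad counts per state bin, copied from the table above
good <- c(47417, 59091, 102691, 53330, 12622)
bad  <- c(9141, 15410, 30855, 22306, 8815)

good_distr <- good / sum(good)   # share of all good samples in each bin
bad_distr  <- bad / sum(bad)     # share of all bad samples in each bin

woe      <- log(bad_distr / good_distr)
bin_iv   <- (bad_distr - good_distr) * woe
total_iv <- sum(bin_iv)

round(data.frame(badprob = bad / (good + bad), woe, bin_iv), 4)
round(total_iv, 4)
```

With this sign convention a positive WOE marks a bin whose bad rate is above the overall rate, so the last two bins are the risky groups of states.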

States with a relatively low bad-sample rate:

## [1] "punjab,up,telangana,ponducherry,jalandhar,maharastra,haryana,firozpur,Puducherry,Daman and Diu,Goa,Meghalaya,Manipur,Mizoram,Chandigarh,Kerala,Delhi"
## [1] "17 states in total"

States with a relatively high bad-sample rate:

## [1] "Sikkim,Assam,Tripura,rajsthan,dadra and nagar haveli"
## [1] "5 states in total"

Model building and evaluation

## 
## Call:
## roc.default(response = tmp[, 2], predictor = tmp[, 1], plot = TRUE,     print.thres = TRUE, print.auc = TRUE)
## 
## Data: tmp[, 1] in 55030 controls (tmp[, 2] 0) < 17305 cases (tmp[, 2] 1).
## Area under the curve: 0.6645
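To make the reported AUC concrete: it equals the probability that a randomly chosen bad order (case) receives a higher score than a randomly chosen good order (control). The sketch below computes it from ranks in base R on made-up scores; `auc_rank` is a hypothetical helper for illustration, not a function from pROC.

```r
# AUC via the rank-sum (Mann-Whitney) identity.
auc_rank <- function(score, label) {
  r  <- rank(score)            # average ranks, so ties count as 1/2
  n1 <- sum(label == 1)        # number of cases
  n0 <- sum(label == 0)        # number of controls
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy data: 3 cases and 3 controls
score <- c(0.10, 0.35, 0.40, 0.80, 0.65, 0.20)
label <- c(0,    0,    1,    1,    0,    1)
auc_rank(score, label)
```

On this scale 0.5 is random ranking and 1 is perfect separation, so a value around 0.66 indicates a modest ability to rank bad orders above good ones.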

Fitting a decision tree directly to extract rules

## 
## Call:
## roc.default(response = (testtree %>% filter(gender != "U"))$label,     predictor = treepre[, 2], plot = TRUE, print.thres = TRUE,     print.auc = TRUE)
## 
## Data: treepre[, 2] in 55030 controls ((testtree %>% filter(gender != "U"))$label 0) < 17305 cases ((testtree %>% filter(gender != "U"))$label 1).
## Area under the curve: 0.6149
## 
## Call:
## C5.0.formula(formula = label ~ ., data = traintree[, -5], rules =
##  T, control = tc)
## 
## 
## C5.0 [Release 2.07 GPL Edition]      Tue Jan 15 14:31:26 2019
## -------------------------------
## 
## Class specified by attribute `outcome'
## 
## Read 289343 cases (15 attributes) from undefined.data
## 
## Rules:
## 
## Rule 1: (1508/127, lift 1.2)
##  shipping_state = Goa
##  gender = F
##  ->  class 0  [0.915]
## 
## Rule 2: (1821/214, lift 1.2)
##  shipping_state = Manipur
##  gender = F
##  ->  class 0  [0.882]
## 
## Rule 3: (3552/506, lift 1.1)
##  client_version = 5.0.1
##  order_paysuccess > 2
##  ->  class 0  [0.857]
## 
## Rule 4: (27871/4941, lift 1.1)
##  client_os = iOS
##  ->  class 0  [0.823]
## 
## Rule 5: (62045/11085, lift 1.1)
##  client_version = 5.0.3
##  ->  class 0  [0.821]
## 
## Rule 6: (141329/25424, lift 1.1)
##  gender = F
##  ->  class 0  [0.820]
## 
## Rule 7: (70297/12667, lift 1.1)
##  sig_rate > 0.4635762
##  ->  class 0  [0.820]
## 
## Rule 8: (6615/1340, lift 1.0)
##  client_version = 4.8.6
##  ->  class 0  [0.797]
## 
## Rule 9: (31857/6517, lift 1.0)
##  client_version = 4.9.2
##  ->  class 0  [0.795]
## 
## Rule 10: (91266/19576, lift 1.0)
##  amount_total <= 4.43
##  ->  class 0  [0.785]
## 
## Rule 11: (11066/2501, lift 1.0)
##  client_version = 4.8.5
##  ->  class 0  [0.774]
## 
## Rule 12: (207727/51301, lift 1.0)
##  cod_paysuccess <= 0
##  ->  class 0  [0.753]
## 
## Rule 13: (9/1, lift 3.4)
##  client_version = 4.2.2
##  cod_paysuccess > 0
##  sig_rate <= 0.4635762
##  ->  class 1  [0.818]
## 
## Rule 14: (7/1, lift 3.3)
##  client_version = 3.6.2
##  gender = M
##  cod_paysuccess > 0
##  sig_rate <= 0.6832298
##  ->  class 1  [0.778]
## 
## Rule 15: (25/6, lift 3.1)
##  shipping_state = Delhi
##  gender = F
##  cod_paysuccess > 1
##  cod_paysuccess <= 7
##  sig_rate <= 0.4635762
##  cod_paysuccess_amount > 35.4
##  ->  class 1  [0.741]
## 
## Rule 16: (31/9, lift 2.9)
##  shipping_state = Odisha
##  gender = F
##  cod_paysuccess > 1
##  cod_paysuccess <= 7
##  sig_rate <= 0.4635762
##  ->  class 1  [0.697]
## 
## Rule 17: (11/3, lift 2.9)
##  shipping_state = Chhattisgarh
##  gender = F
##  cod_paysuccess > 1
##  sig_rate <= 0.4635762
##  ->  class 1  [0.692]
## 
## Rule 18: (108/35, lift 2.8)
##  gender = F
##  cod_paysuccess > 7
##  sig_rate <= 0.4635762
##  ->  class 1  [0.673]
## 
## Rule 19: (168/58, lift 2.7)
##  shipping_state = Karnataka
##  cod_paysuccess > 1
##  cod_delivered <= 0
##  ->  class 1  [0.653]
## 
## Rule 20: (101/35, lift 2.7)
##  gender = M
##  cod_paysuccess > 9
##  sig_rate <= 0.6832298
##  ->  class 1  [0.650]
## 
## Rule 21: (79/28, lift 2.7)
##  shipping_state = Punjab
##  cod_paysuccess > 1
##  cod_paysuccess <= 7
##  sig_rate <= 0.4635762
##  ->  class 1  [0.642]
## 
## Rule 22: (64/25, lift 2.5)
##  shipping_state = Assam
##  gender = F
##  cod_paysuccess > 1
##  sig_rate <= 0.4635762
##  ->  class 1  [0.606]
## 
## Rule 23: (1121/445, lift 2.5)
##  amount_total > 4
##  client_version = 4.7.7
##  gender = M
##  cod_paysuccess > 0
##  sig_rate <= 0.4635762
##  ->  class 1  [0.603]
## 
## Rule 24: (25/10, lift 2.5)
##  client_version = 4.3.3
##  gender = M
##  cod_paysuccess > 0
##  sig_rate <= 0.4635762
##  ->  class 1  [0.593]
## 
## Rule 25: (169/70, lift 2.4)
##  client_os = android
##  client_version = 5.0.1
##  gender = M
##  order_paysuccess <= 2
##  cod_paysuccess > 0
##  sig_rate <= 0.6832298
##  ->  class 1  [0.585]
## 
## Rule 26: (599/253, lift 2.4)
##  client_version = 4.6.7
##  gender = M
##  cod_paysuccess > 0
##  sig_rate <= 0.4635762
##  ->  class 1  [0.577]
## 
## Rule 27: (2524/1134, lift 2.3)
##  cod_paysuccess > 1
##  sig_rate <= 0.4635762
##  ->  class 1  [0.551]
## 
## Rule 28: (85/39, lift 2.3)
##  client_version = 4.7.6
##  gender = M
##  cod_paysuccess > 0
##  sig_rate <= 0.4635762
##  ->  class 1  [0.540]
## 
## Rule 29: (10396/5519, lift 2.0)
##  client_os = android
##  cod_paysuccess > 0
##  sig_rate <= 0.4635762
##  ->  class 1  [0.469]
## 
## Rule 30: (147702/103984, lift 1.2)
##  gender = M
##  ->  class 1  [0.296]
## 
## Default class: 0
## 
## 
## Evaluation on training data (289343 cases):
## 
##          Rules     
##    ----------------
##      No      Errors
## 
##      30 68478(23.7%)   <<
## 
## 
##      (a)    (b)    <-classified as
##    -----  -----
##   218513   1608    (a): class 0
##    66870   2352    (b): class 1
## 
## 
##  Attribute usage:
## 
##   99.89% gender
##   75.51% cod_paysuccess
##   40.49% client_version
##   31.90% amount_total
##   27.97% sig_rate
##   13.24% client_os
##    1.29% order_paysuccess
##    1.28% shipping_state
##    0.06% cod_delivered
##    0.01% cod_paysuccess_amount
## 
## 
## Time: 7.3 secs
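The numbers attached to each rule above appear to follow C5.0's usual conventions: for a rule covering n training cases of which m are misclassified, the bracketed confidence is the Laplace ratio (n - m + 1) / (n + 2), and the lift is that confidence divided by the relative frequency of the predicted class in the training data. The check below recomputes two rules from the printed counts; `rule_conf` is a hypothetical helper name.

```r
# Laplace-corrected rule confidence from coverage n and errors m
rule_conf <- function(n, m) (n - m + 1) / (n + 2)

# Class totals taken from the confusion matrix above
n_class0 <- 218513 + 1608
n_class1 <- 66870 + 2352
prior0 <- n_class0 / (n_class0 + n_class1)
prior1 <- n_class1 / (n_class0 + n_class1)

conf1  <- rule_conf(1508, 127)   # Rule 1  -> class 0
conf13 <- rule_conf(9, 1)        # Rule 13 -> class 1

round(c(confidence = conf1,  lift = conf1 / prior0), 3)
round(c(confidence = conf13, lift = conf13 / prior1), 3)
```

Read this way, the class 1 rules with the highest lift (for example Rule 13 with 9 cases) cover very few orders, while the broad rules have a lift close to 1.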

Modeling whether returning customers will refuse delivery

## 
## Call:
## roc.default(response = tmp[, 2], predictor = tmp[, 1], plot = TRUE,     print.thres = TRUE, print.auc = TRUE)
## 
## Data: tmp[, 1] in 21092 controls (tmp[, 2] 0) < 5986 cases (tmp[, 2] 1).
## Area under the curve: 0.6954

Building the decision tree

## 
## Call:
## roc.default(response = (testtree %>% filter(gender != "U"))$label,     predictor = treepre[, 2], plot = TRUE, print.thres = TRUE,     print.auc = TRUE)
## 
## Data: treepre[, 2] in 21092 controls ((testtree %>% filter(gender != "U"))$label 0) < 5986 cases ((testtree %>% filter(gender != "U"))$label 1).
## Area under the curve: 0.6041
## 
## Call:
## C5.0.formula(formula = label ~ ., data = traintree[, -5], rules =
##  T, control = tc)
## 
## 
## C5.0 [Release 2.07 GPL Edition]      Tue Jan 15 14:31:49 2019
## -------------------------------
## 
## Class specified by attribute `outcome'
## 
## Read 108317 cases (15 attributes) from undefined.data
## 
## Rules:
## 
## Rule 1: (37288/7596, lift 1.0)
##  cod_paysuccess > 1
##  ->  class 0  [0.796]
## 
## Rule 2: (71029/16349, lift 1.0)
##  cod_paysuccess <= 1
##  ->  class 0  [0.770]
## 
## Rule 3: (4, lift 3.8)
##  client_version = 4.0.3
##  gender = F
##  cod_paysuccess > 1
##  sig_rate <= 0.4635762
##  ->  class 1  [0.833]
## 
## Rule 4: (4, lift 3.8)
##  client_version = 4.2.2
##  gender = M
##  cod_paysuccess > 0
##  sig_rate <= 0.7017544
##  ->  class 1  [0.833]
## 
## Rule 5: (8/1, lift 3.6)
##  client_version = 4.1.1
##  gender = M
##  cod_paysuccess > 0
##  sig_rate <= 0.4635762
##  ->  class 1  [0.800]
## 
## Rule 6: (3, lift 3.6)
##  client_version = 4.4.0
##  cod_paysuccess > 0
##  sig_rate <= 0.4635762
##  ->  class 1  [0.800]
## 
## Rule 7: (23/5, lift 3.4)
##  client_version = 4.6.5
##  gender = M
##  cod_paysuccess > 0
##  cod_paysuccess <= 1
##  sig_rate <= 0.7017544
##  ->  class 1  [0.760]
## 
## Rule 8: (6/1, lift 3.4)
##  client_version = 4.8.2
##  gender = F
##  cod_paysuccess > 1
##  sig_rate <= 0.4635762
##  ->  class 1  [0.750]
## 
## Rule 9: (13/3, lift 3.3)
##  client_version = 4.3.4
##  gender = M
##  cod_paysuccess > 0
##  sig_rate <= 0.7017544
##  ->  class 1  [0.733]
## 
## Rule 10: (9/2, lift 3.3)
##  client_version = 4.6.1
##  cod_paysuccess > 0
##  sig_rate <= 0.7017544
##  ->  class 1  [0.727]
## 
## Rule 11: (45/12, lift 3.3)
##  amount_total > 9.63
##  client_version = 4.5.4
##  gender = M
##  cod_paysuccess > 0
##  sig_rate <= 0.4635762
##  ->  class 1  [0.723]
## 
## Rule 12: (8/2, lift 3.2)
##  client_version = 3.6.2
##  cod_paysuccess > 0
##  sig_rate <= 0.7017544
##  ->  class 1  [0.700]
## 
## Rule 13: (21/6, lift 3.1)
##  client_version = 4.3.3
##  cod_paysuccess > 1
##  sig_rate <= 0.4635762
##  ->  class 1  [0.696]
## 
## Rule 14: (4/1, lift 3.0)
##  client_version = 4.7.4
##  sig_rate <= 0.4635762
##  ->  class 1  [0.667]
## 
## Rule 15: (220/77, lift 2.9)
##  client_version = 4.6.7
##  cod_paysuccess > 1
##  sig_rate <= 0.4635762
##  ->  class 1  [0.649]
## 
## Rule 16: (1007/356, lift 2.9)
##  amount_total > 4.43
##  gender = M
##  cod_paysuccess > 1
##  sig_rate <= 0.4635762
##  ->  class 1  [0.646]
## 
## Rule 17: (81/29, lift 2.9)
##  client_version = 4.4.5
##  gender = M
##  cod_paysuccess > 0
##  sig_rate <= 0.4635762
##  ->  class 1  [0.639]
## 
## Rule 18: (119/45, lift 2.8)
##  client_version = 4.3.5
##  gender = M
##  cod_paysuccess > 0
##  sig_rate <= 0.4635762
##  ->  class 1  [0.620]
## 
## Rule 19: (471/190, lift 2.7)
##  client_version = 4.7.7
##  cod_paysuccess > 1
##  sig_rate <= 0.4635762
##  ->  class 1  [0.596]
## 
## Rule 20: (1126/459, lift 2.7)
##  amount_total > 4
##  client_version = 4.7.7
##  gender = M
##  cod_paysuccess > 0
##  sig_rate <= 0.4635762
##  ->  class 1  [0.592]
## 
## Rule 21: (596/251, lift 2.6)
##  client_version = 4.6.7
##  gender = M
##  cod_paysuccess > 0
##  sig_rate <= 0.4635762
##  ->  class 1  [0.579]
## 
## Rule 22: (304/128, lift 2.6)
##  client_version = 4.9.2
##  cod_paysuccess > 1
##  sig_rate <= 0.4635762
##  ->  class 1  [0.578]
## 
## Rule 23: (72/31, lift 2.6)
##  client_version = 4.7.6
##  gender = M
##  cod_paysuccess > 0
##  cod_paysuccess <= 1
##  sig_rate <= 0.7017544
##  ->  class 1  [0.568]
## 
## Rule 24: (108/47, lift 2.5)
##  client_os = android
##  client_version = 5.0.1
##  gender = M
##  cod_paysuccess > 0
##  cod_paysuccess <= 1
##  sig_rate <= 0.7017544
##  ->  class 1  [0.564]
## 
## Rule 25: (14/6, lift 2.5)
##  client_version = missing
##  gender = M
##  sig_rate <= 0.7017544
##  ->  class 1  [0.563]
## 
## Rule 26: (57/25, lift 2.5)
##  client_version = 4.7.3
##  gender = M
##  cod_paysuccess > 0
##  cod_paysuccess <= 1
##  sig_rate <= 0.7017544
##  ->  class 1  [0.559]
## 
## Rule 27: (45262/31677, lift 1.4)
##  sig_rate <= 0.7017544
##  ->  class 1  [0.300]
## 
## Default class: 0
## 
## 
## Evaluation on training data (108317 cases):
## 
##          Rules     
##    ----------------
##      No      Errors
## 
##      27 23263(21.5%)   <<
## 
## 
##     (a)   (b)    <-classified as
##    ----  ----
##   82919  1453    (a): class 0
##   21810  2135    (b): class 1
## 
## 
##  Attribute usage:
## 
##  100.00% cod_paysuccess
##   41.79% sig_rate
##    2.78% client_version
##    2.75% gender
##    1.84% amount_total
##    0.10% client_os
## 
## 
## Time: 2.2 secs
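The training confusion matrix above can be turned into per-class metrics, which are easier to act on than the overall error rate. The sketch below uses the four counts printed above (rows are the actual class, columns the predicted class); the variable names are illustrative.

```r
# Confusion matrix from the C5.0 summary: rows = actual, columns = predicted
cm <- matrix(c(82919, 1453,
               21810, 2135),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

error_rate  <- (cm["0", "1"] + cm["1", "0"]) / sum(cm)
recall_1    <- cm["1", "1"] / sum(cm["1", ])   # share of refused orders caught
precision_1 <- cm["1", "1"] / sum(cm[, "1"])   # share of flagged orders truly refused

round(c(error_rate = error_rate, recall_1 = recall_1,
        precision_1 = precision_1), 3)
```

The error rate reproduces the 21.5% printed by C5.0, but recall for class 1 is under 10%: at the default decision point the rule set flags only a small share of the orders that were actually refused.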

Modeling whether new customers will refuse delivery

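library(dplyr)   # %>%, select(), filter()
library(pROC)    # roc()

# Drop identifier and timestamp columns, then keep users with no order count.
# is.na(order_num) presumably selects users without prior orders (new customers);
# the `old` suffix in the object names is kept as in the source.
Raw_data20181001old <- sample_dataset1 %>%
  select(-order_name, -create_time, -update_time, -user_id,
         -rn, -urn, -uid, -state) %>%
  filter(is.na(order_num))

# Encode status as 0/1 (assumes status has exactly two levels)
Labelold <- as.numeric(as.factor(Raw_data20181001old$status)) - 1

# Label
# 0      1 
# 105464  29931  bad rate 0.2210643 

# The WOE encoding was computed once with the project helper IV_WOE() and cached:
# woeold <- IV_WOE(Raw_data = Raw_data20181001old[,-1],Label = Labelold,Woe_t = T)

# No backslash before the space: "\ " is not a valid escape in an R string
load("/Users/milin/COD 建模/woenew.RData")

# Drop the purchase-history features, which are unavailable for users without prior orders
woet <- woeold[[1]] %>%
  select(-order_paysuccess, -cod_paysuccess, -cod_delivered,
         -sig_rate, -allorder_amount,
         -cod_paysuccess_amount, -cod_delivered_amount)

woet$label <- as.factor(woet$label)

# 80/20 train/test split with the project helper SplitSample()
tmp <- SplitSample(Raw_data = woet[,-1], Label = woet$label, rate = 0.8)
train <- tmp[[1]]
test <- tmp[[2]]

# Logistic regression on the WOE-encoded features
l <- glm(label ~ ., data = train, family = "binomial")

# Predicted probability of class 1 on the held-out set
pre <- predict(l, newdata = test, type = "response")
tmp <- data.frame(pre, test$label)

# ROC curve and AUC on the test set
roc(response = tmp[,2], predictor = tmp[,1], plot = TRUE, print.thres = TRUE, print.auc = TRUE)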

## 
## Call:
## roc.default(response = tmp[, 2], predictor = tmp[, 1], plot = TRUE,     print.thres = TRUE, print.auc = TRUE)
## 
## Data: tmp[, 1] in 33937 controls (tmp[, 2] 0) < 11319 cases (tmp[, 2] 1).
## Area under the curve: 0.6521
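The `SplitSample()` helper used above is not shown in this report. As a rough idea of what such a helper might do, the following is a hypothetical stratified split in base R; the function name `split_sample_sketch`, its arguments and its return shape are assumptions, not the project's actual implementation.

```r
# Hypothetical stand-in for a train/test split helper (not the report's SplitSample).
split_sample_sketch <- function(data, label, rate = 0.8, seed = 1) {
  set.seed(seed)
  # Sample `rate` of the row indices within each class to preserve the bad rate
  train_idx <- unlist(lapply(split(seq_along(label), label), function(idx) {
    idx[sample.int(length(idx), floor(rate * length(idx)))]
  }))
  data$label <- label
  list(train = data[train_idx, , drop = FALSE],
       test  = data[-train_idx, , drop = FALSE])
}

# Toy usage: 80 good and 20 bad samples
toy   <- data.frame(x = rnorm(100))
lab   <- factor(rep(c(0, 1), times = c(80, 20)))
parts <- split_sample_sketch(toy, lab, rate = 0.8)
table(parts$train$label)
table(parts$test$label)
```

Stratifying by label keeps the bad rate the same in both parts, which matters when only about a quarter of the samples are bad.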

Building a decision tree for new customers

## [1] "Label ratio in the sample: 2.99821542158456"
## [1] "Label ratio after balancing: 1"
## 
## Call:
## C5.0.formula(formula = label ~ ., data = traintree[, -5], rules =
##  T, control = tc)
## 
## Rule-Based Model
## Number of samples: 90554 
## Number of predictors: 14 
## 
## Number of Rules: 80 
## 
## Non-standard options: minimum number of cases: 20
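The two messages above report a label ratio of about 3:1 before balancing and 1:1 after, and the ROC output that follows shows equal numbers of controls and cases. The balancing code itself is not shown; one common way to get this result is to downsample the majority class, sketched here in base R on toy data with illustrative names.

```r
set.seed(42)

# Toy data with a 3:1 ratio of good (0) to bad (1) labels
df <- data.frame(id = 1:40, label = c(rep(0, 30), rep(1, 10)))
ratio_before <- sum(df$label == 0) / sum(df$label == 1)

# Keep every minority row and an equally sized random subset of the majority
idx0  <- which(df$label == 0)
idx1  <- which(df$label == 1)
keep0 <- sample(idx0, length(idx1))
balanced <- df[c(keep0, idx1), ]

ratio_after <- sum(balanced$label == 0) / sum(balanced$label == 1)
c(before = ratio_before, after = ratio_after)
```

Because balancing changes the class prior, error rates and rule confidences measured on balanced data are not directly comparable with those from the unbalanced models earlier in the report.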

## 
## Call:
## roc.default(response = (testtree %>% filter(gender != "U"))$label,     predictor = treepre[, 2], plot = TRUE, print.thres = TRUE,     print.auc = TRUE)
## 
## Data: treepre[, 2] in 11319 controls ((testtree %>% filter(gender != "U"))$label 0) < 11319 cases ((testtree %>% filter(gender != "U"))$label 1).
## Area under the curve: 0.6117
## 
## Call:
## C5.0.formula(formula = label ~ ., data = traintree[, -5], rules =
##  T, control = tc)
## 
## 
## C5.0 [Release 2.07 GPL Edition]      Tue Jan 15 14:32:15 2019
## -------------------------------
## 
## Class specified by attribute `outcome'
## 
## Read 90554 cases (15 attributes) from undefined.data
## 
## Rules:
## 
## Rule 1: (8, lift 1.8)
##  client_version = 4.6.1
##  ->  class 0  [0.900]
## 
## Rule 2: (69/8, lift 1.7)
##  shipping_state = Goa
##  client_version = 5.0.3
##  ->  class 0  [0.873]
## 
## Rule 3: (4, lift 1.7)
##  client_version = 4.5.1
##  ->  class 0  [0.833]
## 
## Rule 4: (59/11, lift 1.6)
##  shipping_state = Puducherry
##  client_os = android
##  gender = F
##  ->  class 0  [0.803]
## 
## Rule 5: (45/10, lift 1.5)
##  shipping_state = Rajasthan
##  client_version = 4.6.7
##  gender = F
##  ->  class 0  [0.766]
## 
## Rule 6: (44/10, lift 1.5)
##  client_version = 5.0.3
##  gender = M
##  ipnum > 6
##  ->  class 0  [0.761]
## 
## Rule 7: (313/75, lift 1.5)
##  shipping_state = Goa
##  gender = F
##  ->  class 0  [0.759]
## 
## Rule 8: (2, lift 1.5)
##  client_version = 4.6.4
##  ->  class 0  [0.750]
## 
## Rule 9: (2, lift 1.5)
##  shipping_state = punjab
##  ->  class 0  [0.750]
## 
## Rule 10: (26/6, lift 1.5)
##  shipping_state = Andhra Pradesh
##  client_os = android
##  client_version = 5.0.1
##  ->  class 0  [0.750]
## 
## Rule 11: (106/27, lift 1.5)
##  shipping_state = Uttarakhand
##  client_version = 5.0.3
##  ->  class 0  [0.741]
## 
## Rule 12: (741/198, lift 1.5)
##  shipping_state = Delhi
##  client_version = 5.0.1
##  ->  class 0  [0.732]
## 
## Rule 13: (143/40, lift 1.4)
##  client_version = 4.9.1
##  gender = F
##  ->  class 0  [0.717]
## 
## Rule 14: (1854/535, lift 1.4)
##  client_version = 5.0.1
##  gender = F
##  ->  class 0  [0.711]
## 
## Rule 15: (153/45, lift 1.4)
##  shipping_state = Kerala
##  client_version = 5.0.3
##  ->  class 0  [0.703]
## 
## Rule 16: (3018/898, lift 1.4)
##  shipping_state = Delhi
##  gender = F
##  ->  class 0  [0.702]
## 
## Rule 17: (437/130, lift 1.4)
##  shipping_state = Meghalaya
##  client_os = android
##  gender = F
##  ->  class 0  [0.702]
## 
## Rule 18: (266/79, lift 1.4)
##  shipping_state = West Bengal
##  amount_total <= 8.88
##  client_version = 5.0.3
##  ipnum <= 6
##  ->  class 0  [0.701]
## 
## Rule 19: (61/18, lift 1.4)
##  shipping_state = Karnataka
##  client_os = android
##  client_version = 4.8.3
##  gender = F
##  ipnum > 2
##  ->  class 0  [0.698]
## 
## Rule 20: (50/15, lift 1.4)
##  client_version = 5.0.0
##  gender = F
##  ->  class 0  [0.692]
## 
## Rule 21: (1755/541, lift 1.4)
##  shipping_state = Delhi
##  client_version = 5.0.3
##  ->  class 0  [0.692]
## 
## Rule 22: (6182/1958, lift 1.4)
##  client_version = 5.0.3
##  gender = F
##  ->  class 0  [0.683]
## 
## Rule 23: (958/312, lift 1.3)
##  amount_total <= 1.97
##  ->  class 0  [0.674]
## 
## Rule 24: (280/92, lift 1.3)
##  shipping_state = Maharashtra
##  client_version = 5.0.1
##  ipnum > 1
##  ->  class 0  [0.670]
## 
## Rule 25: (316/104, lift 1.3)
##  shipping_state = Andhra Pradesh
##  client_version = 5.0.3
##  ->  class 0  [0.670]
## 
## Rule 26: (934/312, lift 1.3)
##  shipping_state = Delhi
##  client_version = 4.9.2
##  ->  class 0  [0.666]
## 
## Rule 27: (66/22, lift 1.3)
##  shipping_state = Haryana
##  client_version = 4.9.2
##  ipnum > 1
##  ->  class 0  [0.662]
## 
## Rule 28: (406/139, lift 1.3)
##  shipping_state = Manipur
##  ->  class 0  [0.657]
## 
## Rule 29: (123/42, lift 1.3)
##  shipping_state = Chhattisgarh
##  amount_total <= 9.29
##  client_version = 4.7.7
##  gender = F
##  ->  class 0  [0.656]
## 
## Rule 30: (113/39, lift 1.3)
##  shipping_state = West Bengal
##  client_version = 4.8.3
##  gender = F
##  ipnum > 1
##  ->  class 0  [0.652]
## 
## Rule 31: (216/75, lift 1.3)
##  shipping_state = Punjab
##  amount_total <= 4.48
##  client_version = 5.0.3
##  ->  class 0  [0.651]
## 
## Rule 32: (4472/1572, lift 1.3)
##  shipping_state = Maharashtra
##  gender = F
##  ->  class 0  [0.648]
## 
## Rule 33: (1157/409, lift 1.3)
##  shipping_state = Kerala
##  gender = F
##  ->  class 0  [0.646]
## 
## Rule 34: (823/291, lift 1.3)
##  client_version = 4.8.6
##  gender = F
##  ->  class 0  [0.646]
## 
## Rule 35: (705/252, lift 1.3)
##  shipping_state = Tamil Nadu
##  client_version = 5.0.3
##  ipnum <= 3
##  ->  class 0  [0.642]
## 
## Rule 36: (273/98, lift 1.3)
##  shipping_state = Chandigarh
##  client_os = android
##  gender = F
##  ->  class 0  [0.640]
## 
## Rule 37: (625/226, lift 1.3)
##  shipping_state = Uttarakhand
##  gender = F
##  ->  class 0  [0.638]
## 
## Rule 38: (184/67, lift 1.3)
##  shipping_state = Andhra Pradesh
##  client_version = 4.9.2
##  ->  class 0  [0.634]
## 
## Rule 39: (3528/1302, lift 1.3)
##  client_version = 4.9.2
##  gender = F
##  ->  class 0  [0.631]
## 
## Rule 40: (2070/774, lift 1.3)
##  shipping_state = Tamil Nadu
##  gender = F
##  ->  class 0  [0.626]
## 
## Rule 41: (288/108, lift 1.2)
##  shipping_state = Mizoram
##  ->  class 0  [0.624]
## 
## Rule 42: (551/207, lift 1.2)
##  shipping_state = Tamil Nadu
##  client_version = 4.9.2
##  ->  class 0  [0.624]
## 
## Rule 43: (6616/2499, lift 1.2)
##  amount_total <= 5.58
##  gender = F
##  ipnum > 1
##  ->  class 0  [0.622]
## 
## Rule 44: (416/157, lift 1.2)
##  shipping_state = Delhi
##  client_version = 4.8.5
##  ->  class 0  [0.622]
## 
## Rule 45: (1323/501, lift 1.2)
##  client_version = 4.8.5
##  gender = F
##  ->  class 0  [0.621]
## 
## Rule 46: (10063/3840, lift 1.2)
##  amount_total <= 3.96
##  gender = F
##  ipnum <= 4
##  ->  class 0  [0.618]
## 
## Rule 47: (50/19, lift 1.2)
##  client_version = 4.0.3
##  ->  class 0  [0.615]
## 
## Rule 48: (1389/536, lift 1.2)
##  shipping_state = Punjab
##  client_os = android
##  gender = F
##  ->  class 0  [0.614]
## 
## Rule 49: (243/94, lift 1.2)
##  shipping_state = Tamil Nadu
##  client_version = 4.8.5
##  ->  class 0  [0.612]
## 
## Rule 50: (2396/930, lift 1.2)
##  shipping_state = Maharashtra
##  client_version = 5.0.3
##  ipnum <= 3
##  ->  class 0  [0.612]
## 
## Rule 51: (121/47, lift 1.2)
##  shipping_state = Rajasthan
##  client_version = 5.0.3
##  ipnum <= 1
##  ->  class 0  [0.610]
## 
## Rule 52: (7125/2831, lift 1.2)
##  client_os = iOS
##  ->  class 0  [0.603]
## 
## Rule 53: (737/293, lift 1.2)
##  client_version = 4.8.6
##  ipnum > 1
##  ->  class 0  [0.602]
## 
## Rule 54: (646/259, lift 1.2)
##  amount_total <= 17.46
##  client_version = 4.6.7
##  gender = F
##  ipnum <= 1
##  ->  class 0  [0.599]
## 
## Rule 55: (179/74, lift 1.2)
##  shipping_state = Punjab
##  client_version = 4.8.5
##  ->  class 0  [0.586]
## 
## Rule 56: (525/223, lift 1.1)
##  shipping_state = Himachal Pradesh
##  ->  class 0  [0.575]
## 
## Rule 57: (716/320, lift 1.1)
##  shipping_state = Madhya Pradesh
##  amount_total <= 11.19
##  ipnum <= 1
##  ->  class 0  [0.553]
## 
## Rule 58: (7439/3398, lift 1.1)
##  shipping_state = Gujarat
##  ->  class 0  [0.543]
## 
## Rule 59: (129/26, lift 1.6)
##  shipping_state = Tripura
##  client_version = 4.7.7
##  ->  class 1  [0.794]
## 
## Rule 60: (222/52, lift 1.5)
##  shipping_state = Tripura
##  amount_total > 3.96
##  ->  class 1  [0.763]
## 
## Rule 61: (2, lift 1.5)
##  client_version = 3.3.1
##  ->  class 1  [0.750]
## 
## Rule 62: (416/118, lift 1.4)
##  shipping_state = Sikkim
##  client_os = android
##  ipnum <= 6
##  ->  class 1  [0.715]
## 
## Rule 63: (418/120, lift 1.4)
##  shipping_state = Sikkim
##  client_os = android
##  ->  class 1  [0.712]
## 
## Rule 64: (56/17, lift 1.4)
##  shipping_state = Arunachal Pradesh
##  client_version = 4.9.2
##  ->  class 1  [0.690]
## 
## Rule 65: (54/17, lift 1.4)
##  shipping_state = Uttar Pradesh
##  client_version = 5.0.1
##  gender = M
##  ->  class 1  [0.679]
## 
## Rule 66: (482/155, lift 1.4)
##  shipping_state = Bihar
##  gender = M
##  ipnum <= 6
##  ->  class 1  [0.678]
## 
## Rule 67: (93/31, lift 1.3)
##  client_os = missing
##  ->  class 1  [0.663]
## 
## Rule 68: (21/7, lift 1.3)
##  client_version = 4.2.2
##  ->  class 1  [0.652]
## 
## Rule 69: (80/29, lift 1.3)
##  gender = missing
##  ->  class 1  [0.634]
## 
## Rule 70: (289/106, lift 1.3)
##  shipping_state = Punjab
##  amount_total > 4.48
##  gender = M
##  ipnum > 2
##  ->  class 1  [0.632]
## 
## Rule 71: (20388/7519, lift 1.3)
##  client_version = 4.7.7
##  gender = M
##  ->  class 1  [0.631]
## 
## Rule 72: (200/76, lift 1.2)
##  client_version = 4.8.1
##  gender = M
##  ->  class 1  [0.619]
## 
## Rule 73: (8291/3238, lift 1.2)
##  amount_total > 1.97
##  client_os = android
##  client_version = 4.8.3
##  gender = M
##  ->  class 1  [0.609]
## 
## Rule 74: (1245/490, lift 1.2)
##  client_version = 4.6.7
##  gender = M
##  ->  class 1  [0.606]
## 
## Rule 75: (630/253, lift 1.2)
##  shipping_state = Arunachal Pradesh
##  ->  class 1  [0.598]
## 
## Rule 76: (157/63, lift 1.2)
##  shipping_state = Nagaland
##  client_version = 4.7.7
##  ->  class 1  [0.597]
## 
## Rule 77: (861/352, lift 1.2)
##  client_version = 4.8.2
##  gender = M
##  ->  class 1  [0.591]
## 
## Rule 78: (387/160, lift 1.2)
##  client_version = 4.8.3
##  ipnum > 4
##  ->  class 1  [0.586]
## 
## Rule 79: (147/63, lift 1.1)
##  client_version = 4.6.5
##  ->  class 1  [0.570]
## 
## Rule 80: (83336/40952, lift 1.0)
##  client_os = android
##  ->  class 1  [0.509]
## 
## Default class: 0
## 
## 
## Evaluation on training data (90554 cases):
## 
##          Rules     
##    ----------------
##      No      Errors
## 
##      80 35744(39.5%)   <<
## 
## 
##     (a)   (b)    <-classified as
##    ----  ----
##   25242 20035    (a): class 0
##   15709 29568    (b): class 1
## 
## 
##  Attribute usage:
## 
##  100.00% client_os
##   65.10% gender
##   58.20% client_version
##   35.53% shipping_state
##   25.78% amount_total
##   21.21% ipnum
## 
## 
## Time: 2.2 secs

Predicting whether a user will cancel

## 
## Call:
## roc.default(response = tmp[, 2], predictor = tmp[, 1], plot = TRUE,     print.thres = TRUE, print.auc = TRUE)
## 
## Data: tmp[, 1] in 1823 controls (tmp[, 2] 0) < 685 cases (tmp[, 2] 1).
## Area under the curve: 0.6744

Building a decision tree to try to extract rules

## [1] "Label ratio in the sample: 2.66005252407353"
## [1] "Label ratio after balancing: 1"
## 
## Call:
## C5.0.formula(formula = label ~ ., data = traintree[, -5], rules =
##  T, control = tc)
## 
## Rule-Based Model
## Number of samples: 5484 
## Number of predictors: 14 
## 
## Number of Rules: 9 
## 
## Non-standard options: minimum number of cases: 20

## 
## Call:
## roc.default(response = (testtree %>% filter(gender != "U"))$label,     predictor = treepre[, 2], plot = TRUE, print.thres = TRUE,     print.auc = TRUE)
## 
## Data: treepre[, 2] in 685 controls ((testtree %>% filter(gender != "U"))$label 0) < 685 cases ((testtree %>% filter(gender != "U"))$label 1).
## Area under the curve: 0.5522
## 
## Call:
## C5.0.formula(formula = label ~ ., data = traintree[, -5], rules =
##  T, control = tc)
## 
## 
## C5.0 [Release 2.07 GPL Edition]      Tue Jan 15 14:32:20 2019
## -------------------------------
## 
## Class specified by attribute `outcome'
## 
## Read 5484 cases (15 attributes) from undefined.data
## 
## Rules:
## 
## Rule 1: (28/5, lift 1.6)
##  shipping_state = Goa
##  ->  class 0  [0.800]
## 
## Rule 2: (1189/430, lift 1.3)
##  allorder_amount > 5.29
##  ->  class 0  [0.638]
## 
## Rule 3: (77/29, lift 1.2)
##  shipping_state = Delhi
##  client_version = 4.8.3
##  ->  class 0  [0.620]
## 
## Rule 4: (273/112, lift 1.2)
##  client_version = 5.0.1
##  ->  class 0  [0.589]
## 
## Rule 5: (1038/441, lift 1.1)
##  client_version = 5.0.3
##  ->  class 0  [0.575]
## 
## Rule 6: (606/273, lift 1.1)
##  client_version = 4.9.2
##  ->  class 0  [0.549]
## 
## Rule 7: (283/105, lift 1.3)
##  shipping_state = Telangana
##  allorder_amount <= 5.29
##  ->  class 1  [0.628]
## 
## Rule 8: (390/161, lift 1.2)
##  shipping_state = Gujarat
##  allorder_amount <= 5.29
##  ->  class 1  [0.587]
## 
## Rule 9: (4295/1983, lift 1.1)
##  allorder_amount <= 5.29
##  ->  class 1  [0.538]
## 
## Default class: 0
## 
## 
## Evaluation on training data (5484 cases):
## 
##          Rules     
##    ----------------
##      No      Errors
## 
##       9 2284(41.6%)   <<
## 
## 
##     (a)   (b)    <-classified as
##    ----  ----
##    1414  1328    (a): class 0
##     956  1786    (b): class 1
## 
## 
##  Attribute usage:
## 
##  100.00% allorder_amount
##   36.36% client_version
##   14.19% shipping_state
## 
## 
## Time: 0.0 secs