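COD (Cash-on-Delivery) Risk-Control Model

Data Scope

The features cover two dimensions: (1) order information and (2) user information.

  • Logistics status data: logistics_workflow_union
  • All order data: sale_order_union
  • User information: wp_cid_collect_union
  • Derived fields: the user's past purchase behavior, such as the historical sign-for rate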

Time Range

  1. User information is built from data before October 1, 2018.
  2. Orders placed from October 1, 2018 through October 31, 2018 are used as training data.
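
The two time windows above can be sketched in base R as follows. This is a minimal illustration on made-up data, not the report's actual pipeline: the toy `orders` table, its column names, and the definition of the sign-for rate as delivered COD orders divided by past COD orders are all assumptions.

```r
# Toy order history; column names and values are illustrative assumptions.
orders <- data.frame(
  user_id     = c(1, 1, 1, 2, 2, 3),
  create_time = as.Date(c("2018-08-03", "2018-09-12", "2018-10-05",
                          "2018-09-20", "2018-10-11", "2018-10-02")),
  delivered   = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
)

cutoff    <- as.Date("2018-10-01")
train_end <- as.Date("2018-10-31")

# History window: strictly before the cutoff, used only to build user features.
history <- orders[orders$create_time < cutoff, ]

# Training window: orders placed in October 2018.
train_orders <- orders[orders$create_time >= cutoff &
                       orders$create_time <= train_end, ]

# Per-user counts of past COD orders and of those that were signed for.
n_cod       <- tapply(history$delivered, history$user_id, length)
n_delivered <- tapply(history$delivered, history$user_id, sum)
user_feat <- data.frame(
  user_id       = as.numeric(names(n_cod)),
  cod_orders    = as.vector(n_cod),
  cod_delivered = as.vector(n_delivered)
)
user_feat$sig_rate <- user_feat$cod_delivered / user_feat$cod_orders

# Left join: users without history keep NA features.
train <- merge(train_orders, user_feat, by = "user_id", all.x = TRUE)
print(train)
```

Building the user features only from orders before the cutoff keeps the October labels out of the features. In this toy data user 3 has no history, so its features stay `NA`.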

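Data

The fields are as follows:

  • Order name
  • Order creation time
  • Order update time
  • Logistics status
  • User ID
  • User's state
  • Order total amount
  • Client version
  • Client model
  • Mobile device
  • Gender
  • Number of orders placed
  • Number of successfully paid orders
  • Number of successfully paid COD orders
  • Number of COD orders signed for
  • Sign-for rate of COD orders
  • Total amount of all orders
  • Total amount of successfully paid COD orders
  • Total amount of signed-for COD orders

There are about 350,000 records in total, and the bad-sample rate is roughly 24%.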

Data Analysis

Below are the analysis results for some of the more discriminative features:

  1. By state
 count  count_distr    good    bad    badprob         woe     bin_iv   total_iv
 56558    0.1563767   47417   9141  0.1616217  -0.4893475  0.0326334  0.1031834
 74501    0.2059871   59091  15410  0.2068429  -0.1871984  0.0068634  0.1031834
133546    0.3692400  102691  30855  0.2310440  -0.0455621  0.0007574  0.1031834
 75636    0.2091252   53330  22306  0.2949125   0.2852202  0.0182460  0.1031834
 21437    0.0592710   12622   8815  0.4112049   0.7978770  0.0446833  0.1031834
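
The `woe`, `bin_iv`, and `total_iv` columns can be recomputed from the `good` and `bad` counts alone. The sketch below assumes the convention WOE = ln(share of bad / share of good), which appears to match the table (some tools use the opposite sign); the variable names are illustrative.

```r
# Counts copied from the state-bin table above, one entry per bin.
good <- c(47417, 59091, 102691, 53330, 12622)
bad  <- c(9141, 15410, 30855, 22306, 8815)

# Share of all good / all bad samples that falls into each bin.
good_dist <- good / sum(good)
bad_dist  <- bad / sum(bad)

# WOE per bin: positive values mean the bin is riskier than average.
woe <- log(bad_dist / good_dist)

# Each bin's contribution to the information value, and the feature total.
bin_iv   <- (bad_dist - good_dist) * woe
total_iv <- sum(bin_iv)

print(round(data.frame(badprob = bad / (good + bad), woe, bin_iv), 4))
print(round(total_iv, 4))
```

With a total IV of about 0.10, the state grouping carries some signal but is not a strong predictor on its own, which is in line with the modest AUC values reported later.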

States with a relatively low bad-sample rate include:

## [1] "punjab,up,telangana,ponducherry,jalandhar,maharastra,haryana,firozpur,Puducherry,Daman and Diu,Goa,Meghalaya,Manipur,Mizoram,Chandigarh,Kerala,Delhi"
## [1] "17 states in total"

States with a relatively high bad-sample rate include:

## [1] "Sikkim,Assam,Tripura,rajsthan,dadra and nagar haveli"
## [1] "5 states in total"

Model Building and Evaluation

## 
## Call:
## roc.default(response = tmp[, 2], predictor = tmp[, 1], plot = TRUE,     print.thres = TRUE, print.auc = TRUE)
## 
## Data: tmp[, 1] in 55030 controls (tmp[, 2] 0) < 17305 cases (tmp[, 2] 1).
## Area under the curve: 0.6685
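
To make the AUC figure concrete: it is the probability that a randomly chosen case (label 1) receives a higher score than a randomly chosen control (label 0), with ties counted as one half. The sketch below computes it from ranks on made-up scores; `auc_rank` is a hypothetical helper, not part of pROC.

```r
# Toy scores and labels; the values are made up for illustration.
score <- c(0.10, 0.35, 0.40, 0.80, 0.20, 0.65, 0.90, 0.55)
label <- c(0,    0,    1,    1,    0,    0,    1,    1)

# Rank-based (Mann-Whitney) AUC.
auc_rank <- function(score, label) {
  r  <- rank(score)          # average ranks, so ties count one half
  n1 <- sum(label == 1)      # number of cases
  n0 <- sum(label == 0)      # number of controls
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

auc <- auc_rank(score, label)
print(auc)  # 0.875 for this toy data
```

Read this way, an AUC of 0.6685 means the model ranks a bad order above a good one roughly two times out of three.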

Building a decision tree directly to extract rules

## 
## Call:
## roc.default(response = (testtree %>% filter(gender != "U"))$label,     predictor = treepre[, 2], plot = TRUE, print.thres = TRUE,     print.auc = TRUE)
## 
## Data: treepre[, 2] in 55030 controls ((testtree %>% filter(gender != "U"))$label 0) < 17305 cases ((testtree %>% filter(gender != "U"))$label 1).
## Area under the curve: 0.5309
## 
## Call:
## C5.0.formula(formula = label ~ ., data = traintree[, -5], rules =
##  T, control = tc)
## 
## 
## C5.0 [Release 2.07 GPL Edition]      Tue Jan 15 14:38:31 2019
## -------------------------------
## 
## Class specified by attribute `outcome'
## 
## Read 289343 cases (15 attributes) from undefined.data
## 
## Rules:
## 
## Rule 1: (20190/3222, lift 1.1)
##  client_version = 5.0.1
##  ->  class 0  [0.840]
## 
## Rule 2: (62026/11067, lift 1.1)
##  client_version = 5.0.3
##  ->  class 0  [0.822]
## 
## Rule 3: (1486/266, lift 1.1)
##  client_version = 4.9.1
##  ->  class 0  [0.821]
## 
## Rule 4: (141352/25494, lift 1.1)
##  gender = F
##  ->  class 0  [0.820]
## 
## Rule 5: (70553/12790, lift 1.1)
##  sig_rate > 0.4255319
##  ->  class 0  [0.819]
## 
## Rule 6: (6670/1331, lift 1.1)
##  client_version = 4.8.6
##  ->  class 0  [0.800]
## 
## Rule 7: (31673/6521, lift 1.0)
##  client_version = 4.9.2
##  ->  class 0  [0.794]
## 
## Rule 8: (90112/19260, lift 1.0)
##  amount_total <= 4.39
##  ->  class 0  [0.786]
## 
## Rule 9: (207541/51204, lift 1.0)
##  cod_paysuccess <= 0
##  ->  class 0  [0.753]
## 
## Rule 10: (19/4, lift 3.2)
##  amount_total > 4.39
##  client_version = 4.2.3
##  cod_paysuccess > 0
##  cod_paysuccess <= 10
##  sig_rate <= 0.5976096
##  ->  class 1  [0.762]
## 
## Rule 11: (69/18, lift 3.1)
##  gender = M
##  cod_paysuccess > 10
##  sig_rate <= 0.5976096
##  ->  class 1  [0.732]
## 
## Rule 12: (25/9, lift 2.6)
##  client_version = 4.3.3
##  gender = M
##  cod_paysuccess > 0
##  sig_rate <= 0.4255319
##  ->  class 1  [0.630]
## 
## Rule 13: (11/4, lift 2.6)
##  client_version = 4.0.3
##  cod_paysuccess > 0
##  sig_rate <= 0.4255319
##  ->  class 1  [0.615]
## 
## Rule 14: (102/40, lift 2.5)
##  client_version = 4.4.5
##  gender = M
##  cod_paysuccess > 0
##  cod_paysuccess <= 10
##  sig_rate <= 0.5976096
##  ->  class 1  [0.606]
## 
## Rule 15: (16243/9271, lift 1.8)
##  cod_paysuccess > 0
##  sig_rate <= 0.5976096
##  ->  class 1  [0.429]
## 
## Default class: 0
## 
## 
## Evaluation on training data (289343 cases):
## 
##          Rules     
##    ----------------
##      No      Errors
## 
##      15 68681(23.7%)   <<
## 
## 
##      (a)    (b)    <-classified as
##    -----  -----
##   218996   1125    (a): class 0
##    67556   1666    (b): class 1
## 
## 
##  Attribute usage:
## 
##   77.34% cod_paysuccess
##   48.92% gender
##   42.23% client_version
##   31.15% amount_total
##   28.27% sig_rate
## 
## 
## Time: 4.9 secs
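
Each rule header in the summary above has the form `(n/m, lift x)` followed by a confidence in brackets. As described in the C5.0 documentation, n is the number of training cases the rule covers, m is how many of them it misclassifies, the confidence is the Laplace ratio (n - m + 1) / (n + 2), and lift is the confidence divided by the relative frequency of the predicted class. The sketch below checks this against Rule 1 and Rule 10; the function and variable names are illustrative.

```r
# Laplace-corrected confidence for a rule covering n cases, m of them wrong.
rule_conf <- function(n, m) (n - m + 1) / (n + 2)

# Class frequencies in the training data, from the confusion matrix rows.
n_class0 <- 218996 + 1125
n_class1 <- 67556 + 1666
prior1   <- n_class1 / (n_class0 + n_class1)
prior0   <- 1 - prior1

conf_rule1  <- rule_conf(20190, 3222)   # Rule 1 predicts class 0
conf_rule10 <- rule_conf(19, 4)         # Rule 10 predicts class 1

lift_rule1  <- conf_rule1 / prior0
lift_rule10 <- conf_rule10 / prior1

print(round(c(conf_rule1, lift_rule1, conf_rule10, lift_rule10), 3))
```

Rule 10 has the highest lift but covers only 19 training cases, so it says little about the population. Rule 15 is the only class 1 rule with substantial coverage (16,243 cases).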

Modeling whether returning customers will refuse delivery

## 
## Call:
## roc.default(response = tmp[, 2], predictor = tmp[, 1], plot = TRUE,     print.thres = TRUE, print.auc = TRUE)
## 
## Data: tmp[, 1] in 21092 controls (tmp[, 2] 0) < 5986 cases (tmp[, 2] 1).
## Area under the curve: 0.693

Building a decision tree

## 
## Call:
## roc.default(response = (testtree %>% filter(gender != "U"))$label,     predictor = treepre[, 2], plot = TRUE, print.thres = TRUE,     print.auc = TRUE)
## 
## Data: treepre[, 2] in 21092 controls ((testtree %>% filter(gender != "U"))$label 0) < 5986 cases ((testtree %>% filter(gender != "U"))$label 1).
## Area under the curve: 0.5767
## 
## Call:
## C5.0.formula(formula = label ~ ., data = traintree[, -5], rules =
##  T, control = tc)
## 
## 
## C5.0 [Release 2.07 GPL Edition]      Tue Jan 15 14:38:50 2019
## -------------------------------
## 
## Class specified by attribute `outcome'
## 
## Read 108317 cases (15 attributes) from undefined.data
## 
## Rules:
## 
## Rule 1: (13570/2189, lift 1.1)
##  order_paysuccess > 1
##  cod_paysuccess <= 1
##  ->  class 0  [0.839]
## 
## Rule 2: (9950/1622, lift 1.1)
##  client_version = 5.0.1
##  ->  class 0  [0.837]
## 
## Rule 3: (11133/1995, lift 1.1)
##  amount_total <= 4.43
##  cod_paysuccess > 1
##  ->  class 0  [0.821]
## 
## Rule 4: (70325/12710, lift 1.1)
##  sig_rate > 0.4635762
##  ->  class 0  [0.819]
## 
## Rule 5: (71071/16357, lift 1.0)
##  cod_paysuccess <= 1
##  ->  class 0  [0.770]
## 
## Rule 6: (7, lift 4.0)
##  client_version = 3.6.2
##  gender = M
##  cod_paysuccess > 0
##  sig_rate <= 0.7017544
##  ->  class 1  [0.889]
## 
## Rule 7: (22/4, lift 3.6)
##  client_version = 4.6.5
##  gender = M
##  order_paysuccess <= 1
##  cod_paysuccess > 0
##  sig_rate <= 0.7017544
##  ->  class 1  [0.792]
## 
## Rule 8: (17/3, lift 3.6)
##  client_version = 4.3.4
##  gender = M
##  cod_paysuccess > 0
##  sig_rate <= 0.7017544
##  ->  class 1  [0.789]
## 
## Rule 9: (6/1, lift 3.4)
##  client_version = 4.5.2
##  gender = M
##  order_paysuccess <= 1
##  cod_paysuccess > 0
##  sig_rate <= 0.7017544
##  ->  class 1  [0.750]
## 
## Rule 10: (43/11, lift 3.3)
##  amount_total > 5.72
##  client_version = 4.3.5
##  gender = M
##  order_paysuccess <= 1
##  cod_paysuccess > 0
##  sig_rate <= 0.7017544
##  ->  class 1  [0.733]
## 
## Rule 11: (20/7, lift 2.9)
##  client_version = 4.6.2
##  order_paysuccess <= 1
##  cod_paysuccess > 0
##  sig_rate <= 0.7017544
##  ->  class 1  [0.636]
## 
## Rule 12: (1058/389, lift 2.9)
##  amount_total > 4.43
##  gender = M
##  cod_paysuccess > 1
##  sig_rate <= 0.4635762
##  ->  class 1  [0.632]
## 
## Rule 13: (481/188, lift 2.8)
##  client_version = 4.7.7
##  cod_paysuccess > 1
##  sig_rate <= 0.4635762
##  ->  class 1  [0.609]
## 
## Rule 14: (80/32, lift 2.7)
##  client_version = 4.5.4
##  gender = M
##  order_paysuccess <= 1
##  cod_paysuccess > 0
##  sig_rate <= 0.7017544
##  ->  class 1  [0.598]
## 
## Rule 15: (612/261, lift 2.6)
##  client_version = 4.6.7
##  gender = M
##  cod_paysuccess > 0
##  sig_rate <= 0.4635762
##  ->  class 1  [0.573]
## 
## Rule 16: (1110/479, lift 2.6)
##  client_version = 4.7.7
##  gender = M
##  order_paysuccess <= 1
##  cod_paysuccess > 0
##  sig_rate <= 0.7017544
##  ->  class 1  [0.568]
## 
## Rule 17: (32/14, lift 2.5)
##  client_version = 4.6.6
##  gender = M
##  cod_paysuccess > 0
##  cod_paysuccess <= 1
##  sig_rate <= 0.7017544
##  ->  class 1  [0.559]
## 
## Rule 18: (103/47, lift 2.5)
##  client_version = 4.4.5
##  gender = M
##  cod_paysuccess > 0
##  sig_rate <= 0.7017544
##  ->  class 1  [0.543]
## 
## Rule 19: (651/309, lift 2.4)
##  client_version = 4.8.3
##  gender = M
##  cod_paysuccess > 0
##  sig_rate <= 0.4635762
##  ->  class 1  [0.525]
## 
## Rule 20: (11341/6075, lift 2.1)
##  cod_paysuccess > 0
##  sig_rate <= 0.4635762
##  ->  class 1  [0.464]
## 
## Default class: 0
## 
## 
## Evaluation on training data (108317 cases):
## 
##          Rules     
##    ----------------
##      No      Errors
## 
##      20 23270(21.5%)   <<
## 
## 
##     (a)   (b)    <-classified as
##    ----  ----
##   82574  1798    (a): class 0
##   21472  2473    (b): class 1
## 
## 
##  Attribute usage:
## 
##   77.58% cod_paysuccess
##   75.40% sig_rate
##   13.71% order_paysuccess
##   12.13% client_version
##   11.29% amount_total
##    3.27% gender
## 
## 
## Time: 2.3 secs
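
The 21.5% training error above hides how rarely the rule set predicts class 1. The sketch below derives recall and precision for class 1 from the confusion matrix printed in the summary, and compares the error with the trivial strategy of always predicting class 0. Variable names are illustrative.

```r
# Rows are the true class, columns the predicted class (from the summary above).
cm <- matrix(c(82574, 1798,
               21472, 2473),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

error_rate <- (cm["0", "1"] + cm["1", "0"]) / sum(cm)
recall1    <- cm["1", "1"] / sum(cm["1", ])    # share of bad orders caught
precision1 <- cm["1", "1"] / sum(cm[, "1"])    # share of flagged orders that are bad

# Error of always predicting class 0, i.e. the bad rate of the training data.
baseline_error <- sum(cm["1", ]) / sum(cm)

print(round(c(error = error_rate, recall = recall1,
              precision = precision1, baseline = baseline_error), 4))
```

On these numbers the rule set catches about 10% of the bad orders, and its error rate is only slightly below the always-class-0 baseline of about 22.1%. This is consistent with the low AUC of the tree and is one reason to balance the classes before fitting.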

Modeling whether new customers will refuse delivery

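library(dplyr)   # provides %>%, select(), filter()
library(pROC)    # provides roc()

# Keep orders whose user has no order history (order_num is NA), presumably
# the new customers, and drop identifiers that are not model features.
# The "old" suffix in the object names is kept from the source script.
Raw_data20181001old <- sample_dataset1 %>%
  select(-order_name, -create_time, -update_time, -user_id,
         -rn, -urn, -uid, -state) %>%
  filter(is.na(order_num))

# Encode status as 0/1 (the first factor level becomes 0).
Labelold <- as.numeric(as.factor(Raw_data20181001old$status)) - 1

# Label
#      0      1
# 105464  29931   bad rate 0.2210643
# These counts (135,395 rows) appear to match the returning-customer sample,
# whose 80% split gives the 108,317 training cases reported in that section.

# IV_WOE() and SplitSample() are helper functions not shown in this report.
# woeold <- IV_WOE(Raw_data = Raw_data20181001old[, -1], Label = Labelold, Woe_t = T)

# Load the precomputed WOE-encoded data instead of recomputing it.
load("/Users/milin/COD 建模/woenew.RData")

# Drop the history-based features, which are not used for new customers.
woet <- woeold[[1]] %>%
  select(-order_paysuccess, -cod_paysuccess, -cod_delivered,
         -sig_rate, -allorder_amount,
         -cod_paysuccess_amount, -cod_delivered_amount)

woet$label <- as.factor(woet$label)

# Train/test split with rate = 0.8.
tmp <- SplitSample(Raw_data = woet[, -1], Label = woet$label, rate = 0.8)
train <- tmp[[1]]
test  <- tmp[[2]]

# Logistic regression on the WOE-encoded features.
l <- glm(label ~ ., data = train, family = "binomial")

# Predicted probability of label 1 on the test set.
pre <- predict(l, newdata = test, type = "response")
tmp <- data.frame(pre, test$label)

roc(response = tmp[, 2], predictor = tmp[, 1], plot = TRUE, print.thres = TRUE, print.auc = TRUE)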

## 
## Call:
## roc.default(response = tmp[, 2], predictor = tmp[, 1], plot = TRUE,     print.thres = TRUE, print.auc = TRUE)
## 
## Data: tmp[, 1] in 33937 controls (tmp[, 2] 0) < 11319 cases (tmp[, 2] 1).
## Area under the curve: 0.6556

Building a decision tree for new customers

## [1] "Label ratio in the sample: 2.99821542158456"
## [1] "Label ratio after balancing: 1"
## 
## Call:
## C5.0.formula(formula = label ~ ., data = traintree[, -5], rules =
##  T, control = tc)
## 
## Rule-Based Model
## Number of samples: 90554 
## Number of predictors: 14 
## 
## Number of Rules: 79 
## 
## Non-standard options: minimum number of cases: 20
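
The report does not show how the labels were balanced. The printed ratio going from about 3 to 1, together with the equal class sizes in the outputs that follow, is consistent with randomly undersampling the majority class. The sketch below illustrates that technique on toy data; `balance_down` is a hypothetical helper and may differ from what was actually run.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# Toy data with a 3:1 label ratio, similar to the ratio printed above.
toy <- data.frame(x = rnorm(400), label = rep(c(0, 1), times = c(300, 100)))

# Randomly keep as many rows from every class as the smallest class has.
balance_down <- function(df, label_col = "label") {
  counts  <- table(df[[label_col]])
  n_minor <- min(counts)
  keep <- unlist(lapply(names(counts), function(cl) {
    idx <- which(df[[label_col]] == cl)
    idx[sample.int(length(idx), n_minor)]
  }))
  df[sort(keep), ]
}

balanced <- balance_down(toy)
print(table(toy$label))
print(table(balanced$label))
```

Undersampling discards majority-class rows, so the confidences of rules fitted on balanced data refer to a 50% bad rate, not the original one of roughly 25%. They need recalibrating before being read as real-world rejection probabilities.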

## 
## Call:
## roc.default(response = (testtree %>% filter(gender != "U"))$label,     predictor = treepre[, 2], plot = TRUE, print.thres = TRUE,     print.auc = TRUE)
## 
## Data: treepre[, 2] in 11319 controls ((testtree %>% filter(gender != "U"))$label 0) < 11319 cases ((testtree %>% filter(gender != "U"))$label 1).
## Area under the curve: 0.6186
## 
## Call:
## C5.0.formula(formula = label ~ ., data = traintree[, -5], rules =
##  T, control = tc)
## 
## 
## C5.0 [Release 2.07 GPL Edition]      Tue Jan 15 14:39:11 2019
## -------------------------------
## 
## Class specified by attribute `outcome'
## 
## Read 90554 cases (15 attributes) from undefined.data
## 
## Rules:
## 
## Rule 1: (312/71, lift 1.5)
##  shipping_state = Goa
##  gender = F
##  ->  class 0  [0.771]
## 
## Rule 2: (24/5, lift 1.5)
##  shipping_state = Himachal Pradesh
##  client_version = 4.6.7
##  ->  class 0  [0.769]
## 
## Rule 3: (52/12, lift 1.5)
##  shipping_state = Puducherry
##  client_os = android
##  gender = F
##  ->  class 0  [0.759]
## 
## Rule 4: (111/28, lift 1.5)
##  shipping_state = Uttarakhand
##  client_version = 5.0.3
##  ->  class 0  [0.743]
## 
## Rule 5: (60/15, lift 1.5)
##  shipping_state = Manipur
##  client_version = 5.0.3
##  ->  class 0  [0.742]
## 
## Rule 6: (83/22, lift 1.5)
##  shipping_state = Delhi
##  amount_total <= 8.02
##  client_version = 4.6.7
##  ->  class 0  [0.729]
## 
## Rule 7: (61/17, lift 1.4)
##  shipping_state = Rajasthan
##  client_version = 4.6.7
##  ->  class 0  [0.714]
## 
## Rule 8: (3067/895, lift 1.4)
##  shipping_state = Delhi
##  gender = F
##  ->  class 0  [0.708]
## 
## Rule 9: (56/16, lift 1.4)
##  shipping_state = Kerala
##  client_version = 4.6.7
##  ->  class 0  [0.707]
## 
## Rule 10: (37/11, lift 1.4)
##  shipping_state = Gujarat
##  client_version = 4.6.7
##  ipnum > 2
##  ->  class 0  [0.692]
## 
## Rule 11: (501/155, lift 1.4)
##  shipping_state = Meghalaya
##  ->  class 0  [0.690]
## 
## Rule 12: (6142/1942, lift 1.4)
##  client_version = 5.0.3
##  gender = F
##  ->  class 0  [0.684]
## 
## Rule 13: (1735/553, lift 1.4)
##  shipping_state = Delhi
##  client_version = 5.0.3
##  ->  class 0  [0.681]
## 
## Rule 14: (120/39, lift 1.3)
##  shipping_state = Himachal Pradesh
##  ipnum > 1
##  ipnum <= 2
##  ->  class 0  [0.672]
## 
## Rule 15: (1140/383, lift 1.3)
##  shipping_state = Kerala
##  client_os = android
##  gender = F
##  ->  class 0  [0.664]
## 
## Rule 16: (295/99, lift 1.3)
##  shipping_state = Manipur
##  client_os = android
##  gender = F
##  ->  class 0  [0.663]
## 
## Rule 17: (4629/1596, lift 1.3)
##  client_version = 5.0.1
##  ->  class 0  [0.655]
## 
## Rule 18: (44/15, lift 1.3)
##  client_version = 4.7.2
##  gender = F
##  ->  class 0  [0.652]
## 
## Rule 19: (4413/1554, lift 1.3)
##  shipping_state = Maharashtra
##  gender = F
##  ->  class 0  [0.648]
## 
## Rule 20: (359/128, lift 1.3)
##  shipping_state = Chandigarh
##  gender = F
##  ->  class 0  [0.643]
## 
## Rule 21: (296/107, lift 1.3)
##  shipping_state = Andhra Pradesh
##  client_version = 5.0.3
##  ->  class 0  [0.638]
## 
## Rule 22: (2009/750, lift 1.3)
##  shipping_state = Tamil Nadu
##  gender = F
##  ->  class 0  [0.627]
## 
## Rule 23: (730/273, lift 1.3)
##  shipping_state = Tamil Nadu
##  client_version = 5.0.3
##  ->  class 0  [0.626]
## 
## Rule 24: (622/233, lift 1.2)
##  shipping_state = Haryana
##  amount_total <= 8.21
##  gender = F
##  ->  class 0  [0.625]
## 
## Rule 25: (1665/624, lift 1.3)
##  shipping_state = Punjab
##  gender = F
##  ->  class 0  [0.625]
## 
## Rule 26: (229/87, lift 1.2)
##  shipping_state = Mizoram
##  gender = F
##  ->  class 0  [0.619]
## 
## Rule 27: (2954/1136, lift 1.2)
##  amount_total <= 2.29
##  ->  class 0  [0.615]
## 
## Rule 28: (15064/5819, lift 1.2)
##  amount_total <= 5.05
##  gender = F
##  ->  class 0  [0.614]
## 
## Rule 29: (8981/3472, lift 1.2)
##  amount_total <= 11.17
##  client_version = 5.0.3
##  ->  class 0  [0.613]
## 
## Rule 30: (224/87, lift 1.2)
##  amount_total <= 21.12
##  client_version = 4.7.6
##  gender = F
##  ->  class 0  [0.611]
## 
## Rule 31: (414/161, lift 1.2)
##  shipping_state = Delhi
##  client_version = 4.8.5
##  ->  class 0  [0.611]
## 
## Rule 32: (1295/508, lift 1.2)
##  client_version = 4.8.5
##  gender = F
##  ->  class 0  [0.608]
## 
## Rule 33: (102/40, lift 1.2)
##  client_version = 4.6.2
##  ->  class 0  [0.606]
## 
## Rule 34: (2578/1015, lift 1.2)
##  shipping_state = Maharashtra
##  client_version = 5.0.3
##  ->  class 0  [0.606]
## 
## Rule 35: (7223/2846, lift 1.2)
##  client_os = iOS
##  ->  class 0  [0.606]
## 
## Rule 36: (573/227, lift 1.2)
##  shipping_state = Uttarakhand
##  client_os = android
##  gender = F
##  ->  class 0  [0.603]
## 
## Rule 37: (1127/451, lift 1.2)
##  client_version = 4.9.2
##  ipnum > 2
##  ->  class 0  [0.600]
## 
## Rule 38: (102/41, lift 1.2)
##  shipping_state = Himachal Pradesh
##  gender = M
##  ->  class 0  [0.596]
## 
## Rule 39: (335/135, lift 1.2)
##  client_version = 4.9.0
##  ->  class 0  [0.596]
## 
## Rule 40: (147/61, lift 1.2)
##  client_version = 5.0.0
##  ->  class 0  [0.584]
## 
## Rule 41: (197/82, lift 1.2)
##  shipping_state = Punjab
##  client_version = 4.8.5
##  ->  class 0  [0.583]
## 
## Rule 42: (483/204, lift 1.2)
##  shipping_state = Uttar Pradesh
##  client_version = 4.9.2
##  ->  class 0  [0.577]
## 
## Rule 43: (719/311, lift 1.1)
##  shipping_state = Chhattisgarh
##  ->  class 0  [0.567]
## 
## Rule 44: (2024/883, lift 1.1)
##  client_version = 4.8.6
##  ->  class 0  [0.564]
## 
## Rule 45: (9072/3967, lift 1.1)
##  client_version = 4.9.2
##  ->  class 0  [0.563]
## 
## Rule 46: (7420/3374, lift 1.1)
##  shipping_state = Gujarat
##  ->  class 0  [0.545]
## 
## Rule 47: (4, lift 1.7)
##  client_version = 4.2.0
##  gender = M
##  ->  class 1  [0.833]
## 
## Rule 48: (16/3, lift 1.6)
##  client_version = 4.1.1
##  gender = M
##  ->  class 1  [0.778]
## 
## Rule 49: (24/5, lift 1.5)
##  amount_total > 21.12
##  client_version = 4.7.6
##  gender = F
##  ->  class 1  [0.769]
## 
## Rule 50: (2603/643, lift 1.5)
##  shipping_state = Assam
##  client_version = 4.7.7
##  ->  class 1  [0.753]
## 
## Rule 51: (2, lift 1.5)
##  client_version = 3.3.1
##  ->  class 1  [0.750]
## 
## Rule 52: (25/6, lift 1.5)
##  client_version = 4.2.3
##  gender = M
##  ->  class 1  [0.741]
## 
## Rule 53: (283/77, lift 1.5)
##  shipping_state = Tripura
##  client_os = android
##  ->  class 1  [0.726]
## 
## Rule 54: (27/7, lift 1.4)
##  client_version = 3.9.1
##  ->  class 1  [0.724]
## 
## Rule 55: (5265/1529, lift 1.4)
##  shipping_state = Assam
##  gender = M
##  ->  class 1  [0.710]
## 
## Rule 56: (65/20, lift 1.4)
##  client_version = 4.3.3
##  gender = M
##  ->  class 1  [0.687]
## 
## Rule 57: (416/142, lift 1.3)
##  shipping_state = Sikkim
##  client_os = android
##  ->  class 1  [0.658]
## 
## Rule 58: (81/28, lift 1.3)
##  client_os = missing
##  ->  class 1  [0.651]
## 
## Rule 59: (4375/1536, lift 1.3)
##  shipping_state = Karnataka
##  client_os = android
##  gender = M
##  ipnum <= 1
##  ->  class 1  [0.649]
## 
## Rule 60: (100/35, lift 1.3)
##  client_version = 4.6.6
##  gender = M
##  ->  class 1  [0.647]
## 
## Rule 61: (1032/367, lift 1.3)
##  shipping_state = Karnataka
##  client_os = android
##  client_version = 4.8.3
##  ipnum <= 1
##  ->  class 1  [0.644]
## 
## Rule 62: (448/159, lift 1.3)
##  shipping_state = Sikkim
##  ->  class 1  [0.644]
## 
## Rule 63: (4330/1558, lift 1.3)
##  shipping_state = Karnataka
##  client_version = 4.7.7
##  ->  class 1  [0.640]
## 
## Rule 64: (158/58, lift 1.3)
##  shipping_state = Nagaland
##  client_version = 4.7.7
##  ->  class 1  [0.631]
## 
## Rule 65: (20379/7562, lift 1.3)
##  client_version = 4.7.7
##  gender = M
##  ->  class 1  [0.629]
## 
## Rule 66: (212/79, lift 1.3)
##  client_version = 4.5.4
##  gender = M
##  ->  class 1  [0.626]
## 
## Rule 67: (126/47, lift 1.2)
##  shipping_state = Telangana
##  client_version = 4.6.7
##  ->  class 1  [0.625]
## 
## Rule 68: (674/256, lift 1.2)
##  amount_total > 8.02
##  client_version = 4.6.7
##  gender = M
##  ->  class 1  [0.620]
## 
## Rule 69: (8147/3104, lift 1.2)
##  amount_total > 2.29
##  client_os = android
##  client_version = 4.8.3
##  gender = M
##  ->  class 1  [0.619]
## 
## Rule 70: (618/235, lift 1.2)
##  shipping_state = Bihar
##  ->  class 1  [0.619]
## 
## Rule 71: (544/208, lift 1.2)
##  shipping_state = Arunachal Pradesh
##  client_os = android
##  gender = F
##  ->  class 1  [0.617]
## 
## Rule 72: (261/101, lift 1.2)
##  shipping_state = Jharkhand
##  gender = M
##  ->  class 1  [0.612]
## 
## Rule 73: (4328/1685, lift 1.2)
##  shipping_state = Telangana
##  client_os = android
##  gender = M
##  ->  class 1  [0.611]
## 
## Rule 74: (609/238, lift 1.2)
##  shipping_state = Arunachal Pradesh
##  ->  class 1  [0.609]
## 
## Rule 75: (290/123, lift 1.2)
##  shipping_state = Karnataka
##  client_version = 4.6.7
##  ->  class 1  [0.575]
## 
## Rule 76: (314/134, lift 1.1)
##  shipping_state = Odisha
##  client_os = android
##  gender = M
##  ->  class 1  [0.573]
## 
## Rule 77: (325/141, lift 1.1)
##  shipping_state = Odisha
##  gender = M
##  ->  class 1  [0.566]
## 
## Rule 78: (1051/456, lift 1.1)
##  shipping_state = Uttar Pradesh
##  gender = M
##  ->  class 1  [0.566]
## 
## Rule 79: (83250/40872, lift 1.0)
##  client_os = android
##  ->  class 1  [0.509]
## 
## Default class: 0
## 
## 
## Evaluation on training data (90554 cases):
## 
##          Rules     
##    ----------------
##      No      Errors
## 
##      79 35622(39.3%)   <<
## 
## 
##     (a)   (b)    <-classified as
##    ----  ----
##   27083 18194    (a): class 0
##   17428 27849    (b): class 1
## 
## 
##  Attribute usage:
## 
##  100.00% client_os
##   68.97% client_version
##   68.29% gender
##   53.65% shipping_state
##   35.09% amount_total
##    6.53% ipnum
## 
## 
## Time: 2.1 secs

Predicting whether a user will cancel

## 
## Call:
## roc.default(response = tmp[, 2], predictor = tmp[, 1], plot = TRUE,     print.thres = TRUE, print.auc = TRUE)
## 
## Data: tmp[, 1] in 1823 controls (tmp[, 2] 0) < 685 cases (tmp[, 2] 1).
## Area under the curve: 0.6969

Building a decision tree to try to extract rules

## [1] "Label ratio in the sample: 2.66005252407353"
## [1] "Label ratio after balancing: 1"
## 
## Call:
## C5.0.formula(formula = label ~ ., data = traintree[, -5], rules =
##  T, control = tc)
## 
## Rule-Based Model
## Number of samples: 5484 
## Number of predictors: 14 
## 
## Number of Rules: 9 
## 
## Non-standard options: minimum number of cases: 20

## 
## Call:
## roc.default(response = (testtree %>% filter(gender != "U"))$label,     predictor = treepre[, 2], plot = TRUE, print.thres = TRUE,     print.auc = TRUE)
## 
## Data: treepre[, 2] in 685 controls ((testtree %>% filter(gender != "U"))$label 0) < 685 cases ((testtree %>% filter(gender != "U"))$label 1).
## Area under the curve: 0.5704
## 
## Call:
## C5.0.formula(formula = label ~ ., data = traintree[, -5], rules =
##  T, control = tc)
## 
## 
## C5.0 [Release 2.07 GPL Edition]      Tue Jan 15 14:39:15 2019
## -------------------------------
## 
## Class specified by attribute `outcome'
## 
## Read 5484 cases (15 attributes) from undefined.data
## 
## Rules:
## 
## Rule 1: (24/5, lift 1.5)
##  shipping_state = Goa
##  ->  class 0  [0.769]
## 
## Rule 2: (52/13, lift 1.5)
##  shipping_state = Delhi
##  client_version = 5.0.1
##  ->  class 0  [0.741]
## 
## Rule 3: (806/266, lift 1.3)
##  cod_delivered_amount > 5.47
##  ->  class 0  [0.670]
## 
## Rule 4: (22/7, lift 1.3)
##  client_version = 4.9.1
##  ->  class 0  [0.667]
## 
## Rule 5: (35/12, lift 1.3)
##  shipping_state = Bihar
##  ->  class 0  [0.649]
## 
## Rule 6: (1021/438, lift 1.1)
##  client_version = 5.0.3
##  ->  class 0  [0.571]
## 
## Rule 7: (586/264, lift 1.1)
##  client_version = 4.9.2
##  ->  class 0  [0.549]
## 
## Rule 8: (18/4, lift 1.5)
##  client_version = 4.6.5
##  ->  class 1  [0.750]
## 
## Rule 9: (4678/2202, lift 1.1)
##  cod_delivered_amount <= 5.47
##  ->  class 1  [0.529]
## 
## Default class: 0
## 
## 
## Evaluation on training data (5484 cases):
## 
##          Rules     
##    ----------------
##      No      Errors
## 
##       9 2335(42.6%)   <<
## 
## 
##     (a)   (b)    <-classified as
##    ----  ----
##    1281  1461    (a): class 0
##     874  1868    (b): class 1
## 
## 
##  Attribute usage:
## 
##  100.00% cod_delivered_amount
##   30.98% client_version
##    2.02% shipping_state
## 
## 
## Time: 0.0 secs
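
To show how a rule set like the one above turns into a prediction, the sketch below transcribes the nine cancellation rules and applies the voting scheme described in the C5.0 documentation: every rule that matches votes for its class with its confidence, the class with the larger total wins, and the default class is used when nothing matches. This is a simplified illustration. `predict_rules` and the two example orders are made up, and missing values and ties are not handled the way C5.0 handles them.

```r
# The nine rules of the cancellation model, transcribed from the summary above.
rules <- list(
  list(class = 0, conf = 0.769, test = function(o) o$shipping_state == "Goa"),
  list(class = 0, conf = 0.741, test = function(o)
    o$shipping_state == "Delhi" && o$client_version == "5.0.1"),
  list(class = 0, conf = 0.670, test = function(o) o$cod_delivered_amount > 5.47),
  list(class = 0, conf = 0.667, test = function(o) o$client_version == "4.9.1"),
  list(class = 0, conf = 0.649, test = function(o) o$shipping_state == "Bihar"),
  list(class = 0, conf = 0.571, test = function(o) o$client_version == "5.0.3"),
  list(class = 0, conf = 0.549, test = function(o) o$client_version == "4.9.2"),
  list(class = 1, conf = 0.750, test = function(o) o$client_version == "4.6.5"),
  list(class = 1, conf = 0.529, test = function(o) o$cod_delivered_amount <= 5.47)
)

# Sum the confidences of all matching rules per class and pick the larger total.
predict_rules <- function(order, rules, default_class = 0) {
  votes <- c("0" = 0, "1" = 0)
  for (r in rules) {
    if (isTRUE(r$test(order))) {
      key <- as.character(r$class)
      votes[key] <- votes[key] + r$conf
    }
  }
  if (sum(votes) == 0) return(default_class)   # no rule matched
  as.numeric(names(votes)[which.max(votes)])
}

# Hypothetical orders; thresholds are used on the same scale as in the rules.
order_a <- list(shipping_state = "Goa",   client_version = "5.0.3",
                cod_delivered_amount = 2.0)
order_b <- list(shipping_state = "Assam", client_version = "4.7.7",
                cod_delivered_amount = 1.0)

pred_a <- predict_rules(order_a, rules)   # rules 1, 6 and 9 match
pred_b <- predict_rules(order_b, rules)   # only rule 9 matches
print(c(pred_a, pred_b))
```

For `order_a` the class 0 votes (0.769 + 0.571) outweigh the single class 1 vote (0.529), so it is predicted as class 0. `order_b` matches only Rule 9 and is predicted as class 1. Because Rule 9 covers most cases with a confidence barely above 0.5, most class 1 predictions rest on a weak rule, in line with the 42.6% training error.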