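The features cover two dimensions: (1) order information and (2) user information.
The fields have the following meanings:
The dataset contains about 350,000 records in total, and the bad-sample rate is roughly 24%.
Below are the analysis results for a few features selected for their relatively strong discriminative power: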
count | count_distr | good | bad | badprob | woe | bin_iv | total_iv |
---|---|---|---|---|---|---|---|
56558 | 0.1563767 | 47417 | 9141 | 0.1616217 | -0.4893475 | 0.0326334 | 0.1031834 |
74501 | 0.2059871 | 59091 | 15410 | 0.2068429 | -0.1871984 | 0.0068634 | 0.1031834 |
133546 | 0.3692400 | 102691 | 30855 | 0.2310440 | -0.0455621 | 0.0007574 | 0.1031834 |
75636 | 0.2091252 | 53330 | 22306 | 0.2949125 | 0.2852202 | 0.0182460 | 0.1031834 |
21437 | 0.0592710 | 12622 | 8815 | 0.4112049 | 0.7978770 | 0.0446833 | 0.1031834 |
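The `woe`, `bin_iv` and `total_iv` columns can be reproduced from the `good` and `bad` counts alone. The sketch below assumes the common convention WOE = ln(share of bad / share of good) per bin, which agrees with the numbers in the table; the variable names are illustrative and not taken from the original analysis code.

```r
# Good/bad counts per bin, copied from the table above.
good <- c(47417, 59091, 102691, 53330, 12622)
bad  <- c(9141, 15410, 30855, 22306, 8815)

# Share of all good (bad) samples that falls into each bin.
good_distr <- good / sum(good)
bad_distr  <- bad / sum(bad)

# WOE > 0 means the bin holds more than its share of bad samples.
woe      <- log(bad_distr / good_distr)
bin_iv   <- (bad_distr - good_distr) * woe
total_iv <- sum(bin_iv)

print(round(data.frame(badprob = bad / (good + bad), woe, bin_iv), 7))
print(round(total_iv, 7))
```

Under this convention, the first bin (bad rate about 16%) gets a negative WOE and the last bin (bad rate about 41%) a positive one, and the per-bin IV values sum to the `total_iv` shown in every row.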
States with a relatively low bad-sample rate include:
## [1] "punjab,up,telangana,ponducherry,jalandhar,maharastra,haryana,firozpur,Puducherry,Daman and Diu,Goa,Meghalaya,Manipur,Mizoram,Chandigarh,Kerala,Delhi"
## [1] "17 states in total"
States with a relatively high bad-sample rate include:
## [1] "Sikkim,Assam,Tripura,rajsthan,dadra and nagar haveli"
## [1] "5 states in total"
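One way such low-risk and high-risk state lists could be derived is to compute the bad rate per state and compare it with two cut-offs. This is only a sketch: the toy data, the column names `shipping_state` and `label`, and the thresholds `low_cut` and `high_cut` are assumptions for illustration, since the report does not state how the lists were built.

```r
# Toy per-order data (made-up values; only the column layout matters).
orders <- data.frame(
  shipping_state = c(rep("Goa", 4), rep("Assam", 4), rep("Bihar", 4)),
  label          = c(0, 0, 0, 0,   1, 1, 1, 0,   0, 1, 0, 0)
)

# Bad rate per state = mean of the 0/1 label within the state.
state_rate <- aggregate(label ~ shipping_state, data = orders, FUN = mean)
names(state_rate)[2] <- "bad_rate"

# Hypothetical cut-offs; the report does not give the ones actually used.
low_cut  <- 0.15
high_cut <- 0.40

low_states  <- as.character(state_rate$shipping_state[state_rate$bad_rate < low_cut])
high_states <- as.character(state_rate$shipping_state[state_rate$bad_rate > high_cut])

print(paste(low_states, collapse = ","))
print(paste(high_states, collapse = ","))
```

The lists above also contain several spellings that appear to refer to the same place (for example `ponducherry` and `Puducherry`), so normalizing state names before grouping would likely be worthwhile.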
##
## Call:
## roc.default(response = tmp[, 2], predictor = tmp[, 1], plot = TRUE, print.thres = TRUE, print.auc = TRUE)
##
## Data: tmp[, 1] in 55030 controls (tmp[, 2] 0) < 17305 cases (tmp[, 2] 1).
## Area under the curve: 0.6685
##
## Call:
## roc.default(response = (testtree %>% filter(gender != "U"))$label, predictor = treepre[, 2], plot = TRUE, print.thres = TRUE, print.auc = TRUE)
##
## Data: treepre[, 2] in 55030 controls ((testtree %>% filter(gender != "U"))$label 0) < 17305 cases ((testtree %>% filter(gender != "U"))$label 1).
## Area under the curve: 0.5309
##
## Call:
## C5.0.formula(formula = label ~ ., data = traintree[, -5], rules =
## T, control = tc)
##
##
## C5.0 [Release 2.07 GPL Edition] Tue Jan 15 14:38:31 2019
## -------------------------------
##
## Class specified by attribute `outcome'
##
## Read 289343 cases (15 attributes) from undefined.data
##
## Rules:
##
## Rule 1: (20190/3222, lift 1.1)
## client_version = 5.0.1
## -> class 0 [0.840]
##
## Rule 2: (62026/11067, lift 1.1)
## client_version = 5.0.3
## -> class 0 [0.822]
##
## Rule 3: (1486/266, lift 1.1)
## client_version = 4.9.1
## -> class 0 [0.821]
##
## Rule 4: (141352/25494, lift 1.1)
## gender = F
## -> class 0 [0.820]
##
## Rule 5: (70553/12790, lift 1.1)
## sig_rate > 0.4255319
## -> class 0 [0.819]
##
## Rule 6: (6670/1331, lift 1.1)
## client_version = 4.8.6
## -> class 0 [0.800]
##
## Rule 7: (31673/6521, lift 1.0)
## client_version = 4.9.2
## -> class 0 [0.794]
##
## Rule 8: (90112/19260, lift 1.0)
## amount_total <= 4.39
## -> class 0 [0.786]
##
## Rule 9: (207541/51204, lift 1.0)
## cod_paysuccess <= 0
## -> class 0 [0.753]
##
## Rule 10: (19/4, lift 3.2)
## amount_total > 4.39
## client_version = 4.2.3
## cod_paysuccess > 0
## cod_paysuccess <= 10
## sig_rate <= 0.5976096
## -> class 1 [0.762]
##
## Rule 11: (69/18, lift 3.1)
## gender = M
## cod_paysuccess > 10
## sig_rate <= 0.5976096
## -> class 1 [0.732]
##
## Rule 12: (25/9, lift 2.6)
## client_version = 4.3.3
## gender = M
## cod_paysuccess > 0
## sig_rate <= 0.4255319
## -> class 1 [0.630]
##
## Rule 13: (11/4, lift 2.6)
## client_version = 4.0.3
## cod_paysuccess > 0
## sig_rate <= 0.4255319
## -> class 1 [0.615]
##
## Rule 14: (102/40, lift 2.5)
## client_version = 4.4.5
## gender = M
## cod_paysuccess > 0
## cod_paysuccess <= 10
## sig_rate <= 0.5976096
## -> class 1 [0.606]
##
## Rule 15: (16243/9271, lift 1.8)
## cod_paysuccess > 0
## sig_rate <= 0.5976096
## -> class 1 [0.429]
##
## Default class: 0
##
##
## Evaluation on training data (289343 cases):
##
## Rules
## ----------------
## No Errors
##
## 15 68681(23.7%) <<
##
##
## (a) (b) <-classified as
## ----- -----
## 218996 1125 (a): class 0
## 67556 1666 (b): class 1
##
##
## Attribute usage:
##
## 77.34% cod_paysuccess
## 48.92% gender
## 42.23% client_version
## 31.15% amount_total
## 28.27% sig_rate
##
##
## Time: 4.9 secs
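To read the rule listing: in `(n/m, lift x)`, `n` is the number of training cases the rule covers and `m` the number of those that do not belong to the predicted class; the value in square brackets is the rule's confidence. The figures in the output are consistent with the Laplace ratio (n - m + 1) / (n + 2) for the confidence, and with lift being the confidence divided by the training-set frequency of the predicted class. The sketch below checks this for Rule 1 and Rule 10; the function name `rule_confidence` is illustrative.

```r
# n = training cases covered by the rule, m = covered cases outside the predicted class.
rule_confidence <- function(n, m) (n - m + 1) / (n + 2)

# Class totals taken from the training confusion matrix of this model.
n_class0 <- 218996 + 1125
n_class1 <- 67556 + 1666
prior1   <- n_class1 / (n_class0 + n_class1)   # share of class 1 in the training data

conf1  <- rule_confidence(20190, 3222)  # Rule 1, predicts class 0
conf10 <- rule_confidence(19, 4)        # Rule 10, predicts class 1

# Lift = confidence / prior frequency of the predicted class.
lift1  <- conf1 / (1 - prior1)
lift10 <- conf10 / prior1

print(round(c(conf1, conf10), 3))
print(round(c(lift1, lift10), 1))
```

Note that the high-lift class 1 rules cover very few cases (19 for Rule 10), so their confidence estimates are much less stable than those of the large class 0 rules.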
##
## Call:
## roc.default(response = tmp[, 2], predictor = tmp[, 1], plot = TRUE, print.thres = TRUE, print.auc = TRUE)
##
## Data: tmp[, 1] in 21092 controls (tmp[, 2] 0) < 5986 cases (tmp[, 2] 1).
## Area under the curve: 0.693
##
## Call:
## roc.default(response = (testtree %>% filter(gender != "U"))$label, predictor = treepre[, 2], plot = TRUE, print.thres = TRUE, print.auc = TRUE)
##
## Data: treepre[, 2] in 21092 controls ((testtree %>% filter(gender != "U"))$label 0) < 5986 cases ((testtree %>% filter(gender != "U"))$label 1).
## Area under the curve: 0.5767
##
## Call:
## C5.0.formula(formula = label ~ ., data = traintree[, -5], rules =
## T, control = tc)
##
##
## C5.0 [Release 2.07 GPL Edition] Tue Jan 15 14:38:50 2019
## -------------------------------
##
## Class specified by attribute `outcome'
##
## Read 108317 cases (15 attributes) from undefined.data
##
## Rules:
##
## Rule 1: (13570/2189, lift 1.1)
## order_paysuccess > 1
## cod_paysuccess <= 1
## -> class 0 [0.839]
##
## Rule 2: (9950/1622, lift 1.1)
## client_version = 5.0.1
## -> class 0 [0.837]
##
## Rule 3: (11133/1995, lift 1.1)
## amount_total <= 4.43
## cod_paysuccess > 1
## -> class 0 [0.821]
##
## Rule 4: (70325/12710, lift 1.1)
## sig_rate > 0.4635762
## -> class 0 [0.819]
##
## Rule 5: (71071/16357, lift 1.0)
## cod_paysuccess <= 1
## -> class 0 [0.770]
##
## Rule 6: (7, lift 4.0)
## client_version = 3.6.2
## gender = M
## cod_paysuccess > 0
## sig_rate <= 0.7017544
## -> class 1 [0.889]
##
## Rule 7: (22/4, lift 3.6)
## client_version = 4.6.5
## gender = M
## order_paysuccess <= 1
## cod_paysuccess > 0
## sig_rate <= 0.7017544
## -> class 1 [0.792]
##
## Rule 8: (17/3, lift 3.6)
## client_version = 4.3.4
## gender = M
## cod_paysuccess > 0
## sig_rate <= 0.7017544
## -> class 1 [0.789]
##
## Rule 9: (6/1, lift 3.4)
## client_version = 4.5.2
## gender = M
## order_paysuccess <= 1
## cod_paysuccess > 0
## sig_rate <= 0.7017544
## -> class 1 [0.750]
##
## Rule 10: (43/11, lift 3.3)
## amount_total > 5.72
## client_version = 4.3.5
## gender = M
## order_paysuccess <= 1
## cod_paysuccess > 0
## sig_rate <= 0.7017544
## -> class 1 [0.733]
##
## Rule 11: (20/7, lift 2.9)
## client_version = 4.6.2
## order_paysuccess <= 1
## cod_paysuccess > 0
## sig_rate <= 0.7017544
## -> class 1 [0.636]
##
## Rule 12: (1058/389, lift 2.9)
## amount_total > 4.43
## gender = M
## cod_paysuccess > 1
## sig_rate <= 0.4635762
## -> class 1 [0.632]
##
## Rule 13: (481/188, lift 2.8)
## client_version = 4.7.7
## cod_paysuccess > 1
## sig_rate <= 0.4635762
## -> class 1 [0.609]
##
## Rule 14: (80/32, lift 2.7)
## client_version = 4.5.4
## gender = M
## order_paysuccess <= 1
## cod_paysuccess > 0
## sig_rate <= 0.7017544
## -> class 1 [0.598]
##
## Rule 15: (612/261, lift 2.6)
## client_version = 4.6.7
## gender = M
## cod_paysuccess > 0
## sig_rate <= 0.4635762
## -> class 1 [0.573]
##
## Rule 16: (1110/479, lift 2.6)
## client_version = 4.7.7
## gender = M
## order_paysuccess <= 1
## cod_paysuccess > 0
## sig_rate <= 0.7017544
## -> class 1 [0.568]
##
## Rule 17: (32/14, lift 2.5)
## client_version = 4.6.6
## gender = M
## cod_paysuccess > 0
## cod_paysuccess <= 1
## sig_rate <= 0.7017544
## -> class 1 [0.559]
##
## Rule 18: (103/47, lift 2.5)
## client_version = 4.4.5
## gender = M
## cod_paysuccess > 0
## sig_rate <= 0.7017544
## -> class 1 [0.543]
##
## Rule 19: (651/309, lift 2.4)
## client_version = 4.8.3
## gender = M
## cod_paysuccess > 0
## sig_rate <= 0.4635762
## -> class 1 [0.525]
##
## Rule 20: (11341/6075, lift 2.1)
## cod_paysuccess > 0
## sig_rate <= 0.4635762
## -> class 1 [0.464]
##
## Default class: 0
##
##
## Evaluation on training data (108317 cases):
##
## Rules
## ----------------
## No Errors
##
## 20 23270(21.5%) <<
##
##
## (a) (b) <-classified as
## ---- ----
## 82574 1798 (a): class 0
## 21472 2473 (b): class 1
##
##
## Attribute usage:
##
## 77.58% cod_paysuccess
## 75.40% sig_rate
## 13.71% order_paysuccess
## 12.13% client_version
## 11.29% amount_total
## 3.27% gender
##
##
## Time: 2.3 secs
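The overall error rate in the summary hides how the model treats each class. The sketch below recomputes the error rate from the confusion matrix above and adds recall and precision for class 1 (the bad samples); the variable names are illustrative.

```r
# Rows are the true class, columns the predicted class (as in the summary above).
cm <- matrix(c(82574, 1798,
               21472, 2473),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

error_rate    <- 1 - sum(diag(cm)) / sum(cm)      # share of misclassified cases
recall_bad    <- cm["1", "1"] / sum(cm["1", ])    # bad samples that were caught
precision_bad <- cm["1", "1"] / sum(cm[, "1"])    # predicted-bad cases that are truly bad

print(round(c(error = error_rate, recall = recall_bad, precision = precision_bad), 4))
```

The recomputed error rate matches the reported 21.5%, while recall for class 1 is only about 10%: on the unbalanced training data the rule set labels almost everything as class 0.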
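library(dplyr)  # provides %>%, select() and filter()
library(pROC)   # provides roc()

# Drop identifier, timestamp and state columns; keep only rows where order_num is NA.
Raw_data20181001old <- sample_dataset1 %>%
  select(-order_name, -create_time, -update_time, -user_id,
         -rn, -urn, -uid, -state) %>%
  filter(is.na(order_num))
# Encode status as a 0/1 label (first factor level -> 0, second -> 1).
Labelold <- as.numeric(as.factor(Raw_data20181001old$status)) - 1
# Label
#      0     1
# 105464 29931   bad rate 0.2210643
# IV_WOE() is a project helper that is not shown in this report.
# woeold <- IV_WOE(Raw_data = Raw_data20181001old[, -1], Label = Labelold, Woe_t = TRUE)
# Load the cached result instead (assumed to provide `woeold`).
load("/Users/milin/COD 建模/woenew.RData")
# Drop the order-history features from the first element of woeold.
woet <- woeold[[1]] %>%
  select(-order_paysuccess, -cod_paysuccess, -cod_delivered,
         -sig_rate, -allorder_amount,
         -cod_paysuccess_amount, -cod_delivered_amount)
woet$label <- as.factor(woet$label)
# SplitSample() is a project helper that is not shown here; rate = 0.8 is the split ratio passed to it.
tmp <- SplitSample(Raw_data = woet[, -1], Label = woet$label, rate = 0.8)
train <- tmp[[1]]
test <- tmp[[2]]
# Fit a logistic regression on the training part and score the held-out part.
l <- glm(label ~ ., data = train, family = "binomial")
pre <- predict(l, newdata = test, type = "response")
tmp <- data.frame(pre, test$label)
roc(response = tmp[, 2], predictor = tmp[, 1], plot = TRUE, print.thres = TRUE, print.auc = TRUE)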
##
## Call:
## roc.default(response = tmp[, 2], predictor = tmp[, 1], plot = TRUE, print.thres = TRUE, print.auc = TRUE)
##
## Data: tmp[, 1] in 33937 controls (tmp[, 2] 0) < 11319 cases (tmp[, 2] 1).
## Area under the curve: 0.6556
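For intuition about the AUC values reported by `roc()`: the AUC equals the probability that a randomly chosen case (label 1) receives a higher score than a randomly chosen control (label 0), with ties counted as one half. A minimal base-R sketch of this rank-based calculation follows; `auc_rank` and the toy scores are illustrative and not part of the original analysis.

```r
# Rank-based AUC (Mann-Whitney U statistic divided by n_pos * n_neg).
auc_rank <- function(score, label) {
  r     <- rank(score)            # average ranks, so ties count as 1/2
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy scores: 8 of the 9 (case, control) pairs are ordered correctly.
score <- c(0.10, 0.35, 0.40, 0.80, 0.65, 0.20)
label <- c(0, 1, 0, 1, 1, 0)

auc <- auc_rank(score, label)
print(auc)
```

An AUC of 0.5 corresponds to random ordering, so the values between 0.53 and 0.70 in this report indicate weak to moderate ranking power.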
## [1] "Label ratio of the sample: 2.99821542158456"
## [1] "Label ratio after balancing: 1"
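The report does not show how the labels were balanced. The equal control and case counts in the ROC output that follows would be consistent with downsampling the majority class, so the sketch below illustrates that approach on toy data; treat it as an assumption rather than the method actually used. All names in it are illustrative.

```r
set.seed(42)  # make the random downsampling reproducible

# Toy data with a 3:1 ratio of good (0) to bad (1) samples.
df <- data.frame(x = 1:40, label = c(rep(0, 30), rep(1, 10)))
ratio_before <- sum(df$label == 0) / sum(df$label == 1)

# Keep every minority-class row and an equally sized random subset of the majority class.
idx0  <- which(df$label == 0)
idx1  <- which(df$label == 1)
keep0 <- idx0[sample.int(length(idx0), length(idx1))]
balanced <- df[c(keep0, idx1), ]
ratio_after <- sum(balanced$label == 0) / sum(balanced$label == 1)

print(paste("Label ratio of the sample:", ratio_before))
print(paste("Label ratio after balancing:", ratio_after))
```

Balancing changes the class prior, so scores from a model trained on balanced data are no longer calibrated to the original bad rate of roughly 22%.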
##
## Call:
## C5.0.formula(formula = label ~ ., data = traintree[, -5], rules =
## T, control = tc)
##
## Rule-Based Model
## Number of samples: 90554
## Number of predictors: 14
##
## Number of Rules: 79
##
## Non-standard options: minimum number of cases: 20
##
## Call:
## roc.default(response = (testtree %>% filter(gender != "U"))$label, predictor = treepre[, 2], plot = TRUE, print.thres = TRUE, print.auc = TRUE)
##
## Data: treepre[, 2] in 11319 controls ((testtree %>% filter(gender != "U"))$label 0) < 11319 cases ((testtree %>% filter(gender != "U"))$label 1).
## Area under the curve: 0.6186
##
## Call:
## C5.0.formula(formula = label ~ ., data = traintree[, -5], rules =
## T, control = tc)
##
##
## C5.0 [Release 2.07 GPL Edition] Tue Jan 15 14:39:11 2019
## -------------------------------
##
## Class specified by attribute `outcome'
##
## Read 90554 cases (15 attributes) from undefined.data
##
## Rules:
##
## Rule 1: (312/71, lift 1.5)
## shipping_state = Goa
## gender = F
## -> class 0 [0.771]
##
## Rule 2: (24/5, lift 1.5)
## shipping_state = Himachal Pradesh
## client_version = 4.6.7
## -> class 0 [0.769]
##
## Rule 3: (52/12, lift 1.5)
## shipping_state = Puducherry
## client_os = android
## gender = F
## -> class 0 [0.759]
##
## Rule 4: (111/28, lift 1.5)
## shipping_state = Uttarakhand
## client_version = 5.0.3
## -> class 0 [0.743]
##
## Rule 5: (60/15, lift 1.5)
## shipping_state = Manipur
## client_version = 5.0.3
## -> class 0 [0.742]
##
## Rule 6: (83/22, lift 1.5)
## shipping_state = Delhi
## amount_total <= 8.02
## client_version = 4.6.7
## -> class 0 [0.729]
##
## Rule 7: (61/17, lift 1.4)
## shipping_state = Rajasthan
## client_version = 4.6.7
## -> class 0 [0.714]
##
## Rule 8: (3067/895, lift 1.4)
## shipping_state = Delhi
## gender = F
## -> class 0 [0.708]
##
## Rule 9: (56/16, lift 1.4)
## shipping_state = Kerala
## client_version = 4.6.7
## -> class 0 [0.707]
##
## Rule 10: (37/11, lift 1.4)
## shipping_state = Gujarat
## client_version = 4.6.7
## ipnum > 2
## -> class 0 [0.692]
##
## Rule 11: (501/155, lift 1.4)
## shipping_state = Meghalaya
## -> class 0 [0.690]
##
## Rule 12: (6142/1942, lift 1.4)
## client_version = 5.0.3
## gender = F
## -> class 0 [0.684]
##
## Rule 13: (1735/553, lift 1.4)
## shipping_state = Delhi
## client_version = 5.0.3
## -> class 0 [0.681]
##
## Rule 14: (120/39, lift 1.3)
## shipping_state = Himachal Pradesh
## ipnum > 1
## ipnum <= 2
## -> class 0 [0.672]
##
## Rule 15: (1140/383, lift 1.3)
## shipping_state = Kerala
## client_os = android
## gender = F
## -> class 0 [0.664]
##
## Rule 16: (295/99, lift 1.3)
## shipping_state = Manipur
## client_os = android
## gender = F
## -> class 0 [0.663]
##
## Rule 17: (4629/1596, lift 1.3)
## client_version = 5.0.1
## -> class 0 [0.655]
##
## Rule 18: (44/15, lift 1.3)
## client_version = 4.7.2
## gender = F
## -> class 0 [0.652]
##
## Rule 19: (4413/1554, lift 1.3)
## shipping_state = Maharashtra
## gender = F
## -> class 0 [0.648]
##
## Rule 20: (359/128, lift 1.3)
## shipping_state = Chandigarh
## gender = F
## -> class 0 [0.643]
##
## Rule 21: (296/107, lift 1.3)
## shipping_state = Andhra Pradesh
## client_version = 5.0.3
## -> class 0 [0.638]
##
## Rule 22: (2009/750, lift 1.3)
## shipping_state = Tamil Nadu
## gender = F
## -> class 0 [0.627]
##
## Rule 23: (730/273, lift 1.3)
## shipping_state = Tamil Nadu
## client_version = 5.0.3
## -> class 0 [0.626]
##
## Rule 24: (622/233, lift 1.2)
## shipping_state = Haryana
## amount_total <= 8.21
## gender = F
## -> class 0 [0.625]
##
## Rule 25: (1665/624, lift 1.3)
## shipping_state = Punjab
## gender = F
## -> class 0 [0.625]
##
## Rule 26: (229/87, lift 1.2)
## shipping_state = Mizoram
## gender = F
## -> class 0 [0.619]
##
## Rule 27: (2954/1136, lift 1.2)
## amount_total <= 2.29
## -> class 0 [0.615]
##
## Rule 28: (15064/5819, lift 1.2)
## amount_total <= 5.05
## gender = F
## -> class 0 [0.614]
##
## Rule 29: (8981/3472, lift 1.2)
## amount_total <= 11.17
## client_version = 5.0.3
## -> class 0 [0.613]
##
## Rule 30: (224/87, lift 1.2)
## amount_total <= 21.12
## client_version = 4.7.6
## gender = F
## -> class 0 [0.611]
##
## Rule 31: (414/161, lift 1.2)
## shipping_state = Delhi
## client_version = 4.8.5
## -> class 0 [0.611]
##
## Rule 32: (1295/508, lift 1.2)
## client_version = 4.8.5
## gender = F
## -> class 0 [0.608]
##
## Rule 33: (102/40, lift 1.2)
## client_version = 4.6.2
## -> class 0 [0.606]
##
## Rule 34: (2578/1015, lift 1.2)
## shipping_state = Maharashtra
## client_version = 5.0.3
## -> class 0 [0.606]
##
## Rule 35: (7223/2846, lift 1.2)
## client_os = iOS
## -> class 0 [0.606]
##
## Rule 36: (573/227, lift 1.2)
## shipping_state = Uttarakhand
## client_os = android
## gender = F
## -> class 0 [0.603]
##
## Rule 37: (1127/451, lift 1.2)
## client_version = 4.9.2
## ipnum > 2
## -> class 0 [0.600]
##
## Rule 38: (102/41, lift 1.2)
## shipping_state = Himachal Pradesh
## gender = M
## -> class 0 [0.596]
##
## Rule 39: (335/135, lift 1.2)
## client_version = 4.9.0
## -> class 0 [0.596]
##
## Rule 40: (147/61, lift 1.2)
## client_version = 5.0.0
## -> class 0 [0.584]
##
## Rule 41: (197/82, lift 1.2)
## shipping_state = Punjab
## client_version = 4.8.5
## -> class 0 [0.583]
##
## Rule 42: (483/204, lift 1.2)
## shipping_state = Uttar Pradesh
## client_version = 4.9.2
## -> class 0 [0.577]
##
## Rule 43: (719/311, lift 1.1)
## shipping_state = Chhattisgarh
## -> class 0 [0.567]
##
## Rule 44: (2024/883, lift 1.1)
## client_version = 4.8.6
## -> class 0 [0.564]
##
## Rule 45: (9072/3967, lift 1.1)
## client_version = 4.9.2
## -> class 0 [0.563]
##
## Rule 46: (7420/3374, lift 1.1)
## shipping_state = Gujarat
## -> class 0 [0.545]
##
## Rule 47: (4, lift 1.7)
## client_version = 4.2.0
## gender = M
## -> class 1 [0.833]
##
## Rule 48: (16/3, lift 1.6)
## client_version = 4.1.1
## gender = M
## -> class 1 [0.778]
##
## Rule 49: (24/5, lift 1.5)
## amount_total > 21.12
## client_version = 4.7.6
## gender = F
## -> class 1 [0.769]
##
## Rule 50: (2603/643, lift 1.5)
## shipping_state = Assam
## client_version = 4.7.7
## -> class 1 [0.753]
##
## Rule 51: (2, lift 1.5)
## client_version = 3.3.1
## -> class 1 [0.750]
##
## Rule 52: (25/6, lift 1.5)
## client_version = 4.2.3
## gender = M
## -> class 1 [0.741]
##
## Rule 53: (283/77, lift 1.5)
## shipping_state = Tripura
## client_os = android
## -> class 1 [0.726]
##
## Rule 54: (27/7, lift 1.4)
## client_version = 3.9.1
## -> class 1 [0.724]
##
## Rule 55: (5265/1529, lift 1.4)
## shipping_state = Assam
## gender = M
## -> class 1 [0.710]
##
## Rule 56: (65/20, lift 1.4)
## client_version = 4.3.3
## gender = M
## -> class 1 [0.687]
##
## Rule 57: (416/142, lift 1.3)
## shipping_state = Sikkim
## client_os = android
## -> class 1 [0.658]
##
## Rule 58: (81/28, lift 1.3)
## client_os = missing
## -> class 1 [0.651]
##
## Rule 59: (4375/1536, lift 1.3)
## shipping_state = Karnataka
## client_os = android
## gender = M
## ipnum <= 1
## -> class 1 [0.649]
##
## Rule 60: (100/35, lift 1.3)
## client_version = 4.6.6
## gender = M
## -> class 1 [0.647]
##
## Rule 61: (1032/367, lift 1.3)
## shipping_state = Karnataka
## client_os = android
## client_version = 4.8.3
## ipnum <= 1
## -> class 1 [0.644]
##
## Rule 62: (448/159, lift 1.3)
## shipping_state = Sikkim
## -> class 1 [0.644]
##
## Rule 63: (4330/1558, lift 1.3)
## shipping_state = Karnataka
## client_version = 4.7.7
## -> class 1 [0.640]
##
## Rule 64: (158/58, lift 1.3)
## shipping_state = Nagaland
## client_version = 4.7.7
## -> class 1 [0.631]
##
## Rule 65: (20379/7562, lift 1.3)
## client_version = 4.7.7
## gender = M
## -> class 1 [0.629]
##
## Rule 66: (212/79, lift 1.3)
## client_version = 4.5.4
## gender = M
## -> class 1 [0.626]
##
## Rule 67: (126/47, lift 1.2)
## shipping_state = Telangana
## client_version = 4.6.7
## -> class 1 [0.625]
##
## Rule 68: (674/256, lift 1.2)
## amount_total > 8.02
## client_version = 4.6.7
## gender = M
## -> class 1 [0.620]
##
## Rule 69: (8147/3104, lift 1.2)
## amount_total > 2.29
## client_os = android
## client_version = 4.8.3
## gender = M
## -> class 1 [0.619]
##
## Rule 70: (618/235, lift 1.2)
## shipping_state = Bihar
## -> class 1 [0.619]
##
## Rule 71: (544/208, lift 1.2)
## shipping_state = Arunachal Pradesh
## client_os = android
## gender = F
## -> class 1 [0.617]
##
## Rule 72: (261/101, lift 1.2)
## shipping_state = Jharkhand
## gender = M
## -> class 1 [0.612]
##
## Rule 73: (4328/1685, lift 1.2)
## shipping_state = Telangana
## client_os = android
## gender = M
## -> class 1 [0.611]
##
## Rule 74: (609/238, lift 1.2)
## shipping_state = Arunachal Pradesh
## -> class 1 [0.609]
##
## Rule 75: (290/123, lift 1.2)
## shipping_state = Karnataka
## client_version = 4.6.7
## -> class 1 [0.575]
##
## Rule 76: (314/134, lift 1.1)
## shipping_state = Odisha
## client_os = android
## gender = M
## -> class 1 [0.573]
##
## Rule 77: (325/141, lift 1.1)
## shipping_state = Odisha
## gender = M
## -> class 1 [0.566]
##
## Rule 78: (1051/456, lift 1.1)
## shipping_state = Uttar Pradesh
## gender = M
## -> class 1 [0.566]
##
## Rule 79: (83250/40872, lift 1.0)
## client_os = android
## -> class 1 [0.509]
##
## Default class: 0
##
##
## Evaluation on training data (90554 cases):
##
## Rules
## ----------------
## No Errors
##
## 79 35622(39.3%) <<
##
##
## (a) (b) <-classified as
## ---- ----
## 27083 18194 (a): class 0
## 17428 27849 (b): class 1
##
##
## Attribute usage:
##
## 100.00% client_os
## 68.97% client_version
## 68.29% gender
## 53.65% shipping_state
## 35.09% amount_total
## 6.53% ipnum
##
##
## Time: 2.1 secs
##
## Call:
## roc.default(response = tmp[, 2], predictor = tmp[, 1], plot = TRUE, print.thres = TRUE, print.auc = TRUE)
##
## Data: tmp[, 1] in 1823 controls (tmp[, 2] 0) < 685 cases (tmp[, 2] 1).
## Area under the curve: 0.6969
## [1] "Label ratio of the sample: 2.66005252407353"
## [1] "Label ratio after balancing: 1"
##
## Call:
## C5.0.formula(formula = label ~ ., data = traintree[, -5], rules =
## T, control = tc)
##
## Rule-Based Model
## Number of samples: 5484
## Number of predictors: 14
##
## Number of Rules: 9
##
## Non-standard options: minimum number of cases: 20
##
## Call:
## roc.default(response = (testtree %>% filter(gender != "U"))$label, predictor = treepre[, 2], plot = TRUE, print.thres = TRUE, print.auc = TRUE)
##
## Data: treepre[, 2] in 685 controls ((testtree %>% filter(gender != "U"))$label 0) < 685 cases ((testtree %>% filter(gender != "U"))$label 1).
## Area under the curve: 0.5704
##
## Call:
## C5.0.formula(formula = label ~ ., data = traintree[, -5], rules =
## T, control = tc)
##
##
## C5.0 [Release 2.07 GPL Edition] Tue Jan 15 14:39:15 2019
## -------------------------------
##
## Class specified by attribute `outcome'
##
## Read 5484 cases (15 attributes) from undefined.data
##
## Rules:
##
## Rule 1: (24/5, lift 1.5)
## shipping_state = Goa
## -> class 0 [0.769]
##
## Rule 2: (52/13, lift 1.5)
## shipping_state = Delhi
## client_version = 5.0.1
## -> class 0 [0.741]
##
## Rule 3: (806/266, lift 1.3)
## cod_delivered_amount > 5.47
## -> class 0 [0.670]
##
## Rule 4: (22/7, lift 1.3)
## client_version = 4.9.1
## -> class 0 [0.667]
##
## Rule 5: (35/12, lift 1.3)
## shipping_state = Bihar
## -> class 0 [0.649]
##
## Rule 6: (1021/438, lift 1.1)
## client_version = 5.0.3
## -> class 0 [0.571]
##
## Rule 7: (586/264, lift 1.1)
## client_version = 4.9.2
## -> class 0 [0.549]
##
## Rule 8: (18/4, lift 1.5)
## client_version = 4.6.5
## -> class 1 [0.750]
##
## Rule 9: (4678/2202, lift 1.1)
## cod_delivered_amount <= 5.47
## -> class 1 [0.529]
##
## Default class: 0
##
##
## Evaluation on training data (5484 cases):
##
## Rules
## ----------------
## No Errors
##
## 9 2335(42.6%) <<
##
##
## (a) (b) <-classified as
## ---- ----
## 1281 1461 (a): class 0
## 874 1868 (b): class 1
##
##
## Attribute usage:
##
## 100.00% cod_delivered_amount
## 30.98% client_version
## 2.02% shipping_state
##
##
## Time: 0.0 secs