Predicting the outcome of a Dota match before the ban/pick (BP) phase (a single match can be used as a worked example)

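  1. List the key factors the model uses (explain how they are extracted and why)

  2. Explain how the model is built and how its weights are determined

  3. Explain how the model is trained and validated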

Data source

Source: https://www.kaggle.com/devinanzelmo/dota-2-matches/downloads/dota-2-matches.zip/3

The dataset is split into several CSV files, all loaded below.

First, load the data.

library(readr)

# all CSV files live in one directory; adjust data_dir to your local copy
data_dir <- "/Users/milin/Downloads/dota-2-matches"
read_dota <- function(name) {
  read_csv(file.path(data_dir, paste0(name, ".csv")), progress = FALSE)
}

ability_ids        <- read_dota("ability_ids")
ability_upgrades   <- read_dota("ability_upgrades")
chat               <- read_dota("chat")
cluster_regions    <- read_dota("cluster_regions")
hero_names         <- read_dota("hero_names")
item_ids           <- read_dota("item_ids")
match_outcomes     <- read_dota("match_outcomes")
match              <- read_dota("match")
objectives         <- read_dota("objectives")
patch_dates        <- read_dota("patch_dates")
player_ratings     <- read_dota("player_ratings")
player_time        <- read_dota("player_time")
players            <- read_dota("players")
purchase_log       <- read_dota("purchase_log")
teamfights_players <- read_dota("teamfights_players")
teamfights         <- read_dota("teamfights")
test_labels        <- read_dota("test_labels")
test_player        <- read_dota("test_player")

The test_labels dataset records, for each match, whether Radiant won (1 for a Radiant win, 0 otherwise), covering 100,000 games in total.

The relevant fields are explained below:

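  1. account_id: account ID
  2. player_slot: encodes the player's team and position within the team
  3. hero_id: unique ID of the Dota hero
  4. item_0: ID of the item in the top-left inventory slot
  5. item_1: ID of the item in the top-middle slot
  6. item_2: ID of the item in the top-right slot
  7. item_3: ID of the item in the bottom-left slot
  8. item_4: ID of the item in the bottom-middle slot
  9. item_5: ID of the item in the bottom-right slot
  10. kills: number of kills
  11. deaths: number of deaths
  12. assists: number of assists
  13. leaver_status: whether the player left the match; 0 means the player finished it
  14. gold: gold left at the end of the match
  15. last_hits: number of last hits
  16. denies: number of denies
  17. gold_per_min: gold earned per minute
  18. xp_per_min: experience gained per minute
  19. gold_spent: gold spent during the match
  20. hero_damage: damage dealt to enemy heroes
  21. tower_damage: damage dealt to buildings
  22. hero_healing: healing done
  23. level: hero level at the end of the match
  24. ability_upgrades: the player's list of ability upgrades (skill build)
  25. additional_units: other units controlled by the player
  26. season: season
  27. radiant_win: whether Radiant won
  28. duration: match length, in seconds
  29. start_time: start time
  30. tower_status_radiant: status of Radiant's towers (outer, middle, inner and base towers)
  31. cluster: server cluster
  32. first_blood_time: time of first blood
  33. game_mode: game type (Invalid, Public matchmaking, Practice, ...)
  34. match_id: unique match ID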
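The player_slot field packs the team and the position into one number: the top bit (value 128) is set for Dire, and the low three bits give the position 0-4, which is why the code later treats slots 0-4 as Radiant and 128-132 as Dire. A minimal base-R sketch of decoding it (decode_slot is a hypothetical helper, not part of the dataset):

```r
# Sketch: decode player_slot into side and position.
# Bit 7 (value 128) marks Dire; bits 0-2 hold the position 0-4.
decode_slot <- function(player_slot) {
  slot <- as.integer(player_slot)
  data.frame(
    player_slot = slot,
    side        = ifelse(bitwAnd(slot, 128L) > 0, "Dire", "Radiant"),
    position    = bitwAnd(slot, 7L),
    stringsAsFactors = FALSE
  )
}

decode_slot(c(0, 4, 128, 132))
```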
  34. match_id :比赛唯一id

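match holds one row per match; players holds one row per player per match; player_time holds each player's stats minute by minute; teamfights records team fights; and player_ratings holds each player's historical record and rating.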

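The outcome of a match is driven mainly by three factors:

  1. the players' skill
  2. the players' in-game performance
  3. the team lineup (hero composition)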

First, get a basic picture of the historical match data by looking at its distributions:

Distribution of match duration

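library(ggplot2)
ggplot(match, aes(x = duration)) +
    geom_histogram(aes(y = after_stat(density)),  # plot density instead of counts so the bars match the density curve
                   binwidth = 60,                 # one-minute bins (duration is in seconds)
                   colour = "black", fill = "white") +
    geom_density(alpha = .2, fill = "#FF6666")    # semi-transparent density overlay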

The histogram is roughly bell-shaped, so match duration is approximately normally distributed. Next, some summary statistics of match duration:

summary(match$duration)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      59    2029    2415    2476    2872   16037

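The mean duration is 2476 seconds, about 41 minutes, slightly above the median of 2415 seconds; the maximum of 16,037 seconds shows a long right tail.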

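A key question when predicting the outcome is when the prediction is made. There are generally two options:

  1. before the match starts
  2. while the match is in progress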

Pre-match model

The data available to the model depends on when the prediction is made. Before the match, relatively little is available, mainly the players' historical records.

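We first build a model that predicts the win probability before the match starts, using player_ratings and match. For each player, player_ratings records the total number of wins, the total number of matches played, and the TrueSkill rating: trueskill_mu (the estimated skill) and trueskill_sigma (the uncertainty of that estimate). To build the training data, match is joined to players on match_id to get each match's account_id values, which are then joined to player_ratings on account_id: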

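library(tidyverse)
# player_slot: the top bit is the team (false if Radiant, true if Dire), the low three bits the position
id <- players %>% select(match_id, account_id, player_slot) %>% distinct()
# match has no account_id column, so go through players: one row per player per match
match1 <- match %>% left_join(id, by = "match_id")

match1 <- match1 %>% dplyr::mutate(side = case_when(player_slot %in% c(0, 1, 2, 3, 4) ~ "Radiant",
                                                    player_slot %in% c(128, 129, 130, 131, 132) ~ "Dire"))

# attach each player's history (wins, matches, TrueSkill rating)
pre_traindata <- match1 %>% left_join(player_ratings, by = "account_id")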
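Because match has no account_id column, the ratings reach each match through players. The following base-R sketch mirrors that join chain on toy tables (all values hypothetical). It additionally assumes that account_id 0 is the shared placeholder for anonymous players, an assumption about this dataset, and blanks it out so those players get no rating instead of one pooled across many anonymous accounts:

```r
# Toy tables (hypothetical values) shaped like match, players and player_ratings
match_tbl <- data.frame(match_id = c(0, 1), radiant_win = c(TRUE, FALSE))
players_tbl <- data.frame(
  match_id    = c(0, 0, 0, 0, 1, 1, 1, 1),
  account_id  = c(11, 0, 13, 14, 15, 16, 0, 18),
  player_slot = c(0, 1, 128, 129, 0, 1, 128, 129)
)
ratings_tbl <- data.frame(
  account_id    = c(0, 11, 13, 14, 15, 16, 18),
  total_wins    = c(1e6, 30, 40, 5, 60, 22, 9),
  total_matches = c(2e6, 50, 70, 12, 100, 40, 20)
)

# Assumption: account_id 0 is the shared ID of anonymous players,
# so it carries no individual history; treat it as unknown
players_tbl$account_id[players_tbl$account_id == 0] <- NA

# match -> players: one row per player per match
joined <- merge(match_tbl, players_tbl, by = "match_id")
joined$side <- ifelse(joined$player_slot >= 128, "Dire", "Radiant")

# players -> player_ratings; all.x = TRUE keeps unrated players with NA ratings
joined <- merge(joined, ratings_tbl, by = "account_id", all.x = TRUE)
joined <- joined[order(joined$match_id, joined$player_slot), ]
joined
```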

Each side of a match has five players, so their statistics are aggregated to one row per match and side:

pre_traindata_g <- pre_traindata %>%
  group_by(match_id, side, radiant_win) %>%
  summarise(win_rate      = mean(total_wins) / mean(total_matches),  # team win rate (ratio of means)
            kill          = mean(trueskill_mu),                       # mean TrueSkill mu of the team, kept under the name "kill"
            sigma_win     = sd(total_wins),
            sigma_matches = sd(total_matches),
            mu_win        = mean(total_wins),
            mu_matches    = mean(total_matches))

head(pre_traindata_g,3)
## # A tibble: 3 x 9
## # Groups:   match_id, side [3]
##   match_id side  radiant_win win_rate  kill sigma_win sigma_matches mu_win
##      <dbl> <chr> <lgl>          <dbl> <dbl>     <dbl>         <dbl>  <dbl>
## 1        0 Dire  TRUE           0.485  28.7   880939.      1815713. 6.43e5
## 2        0 Radi… TRUE           0.485  24.8   880951.      1815730. 6.43e5
## 3        1 Dire  FALSE          0.485  26.9   880935.      1815708. 9.65e5
## # … with 1 more variable: mu_matches <dbl>
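# split the team-level rows by side
Ra <- pre_traindata_g %>% filter(side == "Radiant")
Di <- pre_traindata_g %>% filter(side == "Dire")

# suffix the six feature columns with the side name
names(Ra)[4:9] <- paste(names(Ra)[4:9], "Radiant", sep = ".")
names(Di)[4:9] <- paste(names(Di)[4:9], "Dire", sep = ".")

# one row per match; cbind() assumes Ra and Di list the same matches in the same order
fdata <- cbind(Ra, Di) %>% select(-match_id1, -side1, -radiant_win1, -side)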

fdata$radiant_win <- as.factor(as.numeric(fdata$radiant_win))

head(fdata,3)
## # A tibble: 3 x 15
## # Groups:   match_id, side [3]
##   side  match_id radiant_win win_rate.Radiant kill.Radiant sigma_win.Radia…
##   <chr>    <dbl> <fct>                  <dbl>        <dbl>            <dbl>
## 1 Radi…        0 1                      0.485         24.8          880951.
## 2 Radi…        1 0                      0.485         24.9          880948.
## 3 Radi…        2 0                     NA             NA                NA 
## # … with 9 more variables: sigma_matches.Radiant <dbl>,
## #   mu_win.Radiant <dbl>, mu_matches.Radiant <dbl>, win_rate.Dire <dbl>,
## #   kill.Dire <dbl>, sigma_win.Dire <dbl>, sigma_matches.Dire <dbl>,
## #   mu_win.Dire <dbl>, mu_matches.Dire <dbl>

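This completes the training data for the pre-match model. For each side it contains the players' historical win rate and number of wins, aggregated over the five players (means, plus standard deviations). More history could be added, such as the players' past kills or gold. Note that many teams show a win_rate of about 0.4852 together with mu_win in the hundreds of thousands: these values come from a single account_id with roughly 1.6 million wins in 3.3 million matches (in some rows the standard deviation is exactly 0 and mu_win is 1,608,398, i.e. all five players share that one account). This is most likely the shared ID for anonymous players, and it would be better treated as missing. The task itself is a binary classification problem: did Radiant win?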
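The cbind() step above relies on Ra and Di listing the same matches in the same order. Joining the two sides on match_id removes that dependency. Below is a base-R sketch on toy player rows (hypothetical values) that aggregates per match and side and then merges the sides. Unlike the dplyr code above, it pools wins and matches with na.rm = TRUE, so a player without a rating is skipped instead of turning the whole team's features into NA; which behavior is preferable is a modeling choice.

```r
# Toy player-level rows after the join (hypothetical values)
joined <- data.frame(
  match_id      = c(0, 0, 0, 0, 1, 1, 1, 1),
  side          = rep(c("Radiant", "Radiant", "Dire", "Dire"), 2),
  radiant_win   = rep(c(TRUE, FALSE), each = 4),
  total_wins    = c(30, NA, 40, 5, 60, 22, NA, 9),
  total_matches = c(50, NA, 70, 12, 100, 40, NA, 20),
  stringsAsFactors = FALSE
)

# Team-level features: pooled win rate and mean number of wins per side
team_stats <- function(d) {
  data.frame(
    match_id    = d$match_id[1],
    side        = d$side[1],
    radiant_win = d$radiant_win[1],
    win_rate    = sum(d$total_wins, na.rm = TRUE) / sum(d$total_matches, na.rm = TRUE),
    mu_win      = mean(d$total_wins, na.rm = TRUE),
    stringsAsFactors = FALSE
  )
}
groups <- split(joined, list(joined$match_id, joined$side), drop = TRUE)
team <- do.call(rbind, lapply(groups, team_stats))
rownames(team) <- NULL

# One row per match: join the two sides on match_id, so row order cannot misalign them
ra <- team[team$side == "Radiant", c("match_id", "radiant_win", "win_rate", "mu_win")]
di <- team[team$side == "Dire", c("match_id", "win_rate", "mu_win")]
wide <- merge(ra, di, by = "match_id", suffixes = c(".Radiant", ".Dire"))
wide
```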

library(scorecard)


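dt_f = var_filter(fdata[, -c(1, 2)], y = "radiant_win", iv_limit = 0.1) # drop side and match_id, then remove variables with information value (IV) below 0.1
## [INFO] filtering variables ...
names(dt_f) # 8 features pass the filter, plus the target radiant_win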
## [1] "win_rate.Radiant"   "sigma_win.Radiant"  "mu_win.Radiant"    
## [4] "mu_matches.Radiant" "win_rate.Dire"      "sigma_win.Dire"    
## [7] "mu_win.Dire"        "mu_matches.Dire"    "radiant_win"
# split into training and test sets
dt_list = split_df(dt_f, y = "radiant_win", ratio = 0.6, seed = 30)
label_list = lapply(dt_list, function(x) x$radiant_win)
head(dt_list) # prints both splits; the training set holds 60% of the rows
## $train
##        win_rate.Radiant sigma_win.Radiant mu_win.Radiant
##     1:        0.4851776       880951.4843       643364.0
##     2:        0.4851777       880948.3805       643367.4
##     3:        0.4851794       719293.0923       321687.4
##     4:        0.4851818       880935.4181       643381.6
##     5:        0.6271186           28.0749           22.2
##    ---                                                  
## 29970:               NA                NA             NA
## 29971:               NA                NA             NA
## 29972:               NA                NA             NA
## 29973:        0.4851768       719293.7631       321686.2
## 29974:        0.4851799       880935.6007       965053.6
##        mu_matches.Radiant win_rate.Dire sigma_win.Dire mu_win.Dire
##     1:          1326038.0     0.4851814       880938.5    643378.2
##     2:          1326045.0     0.4851796       880935.3    965053.8
##     3:           663027.8     0.4851834       719288.5    321695.6
##     4:          1326062.8     0.4851782       880950.9    965042.4
##     5:               35.4     0.4851770       719293.1    321687.4
##    ---                                                            
## 29970:                 NA            NA             NA          NA
## 29971:                 NA            NA             NA          NA
## 29972:                 NA     0.4851771       880951.2    965042.2
## 29973:           663028.8     0.4851781       880945.5    643370.6
## 29974:          1989063.4            NA             NA          NA
##        mu_matches.Dire radiant_win
##     1:       1326057.0           1
##     2:       1989065.0           0
##     3:        663039.2           1
##     4:       1989047.2           1
##     5:        663031.0           1
##    ---                            
## 29970:              NA           0
## 29971:              NA           1
## 29972:       1989051.4           1
## 29973:       1326050.4           1
## 29974:              NA           0
## 
## $test
##        win_rate.Radiant sigma_win.Radiant mu_win.Radiant
##     1:               NA                NA             NA
##     2:               NA                NA             NA
##     3:        0.4851775      880955.04446       965039.4
##     4:        0.4851776      719296.55815      1286718.8
##     5:        0.5205993          47.48473           83.4
##    ---                                                  
## 20022:        0.4851777      880953.85773       643361.4
## 20023:        0.4851777      880951.11913       643364.4
## 20024:        0.4851762      719296.55815       321681.2
## 20025:        0.4851775           0.00000      1608398.0
## 20026:               NA                NA             NA
##        mu_matches.Radiant win_rate.Dire sigma_win.Dire mu_win.Dire
##     1:                 NA     0.4851775        0.00000   1608398.0
##     2:                 NA            NA             NA          NA
##     3:          1989044.0            NA             NA          NA
##     4:          2652057.2     0.4851763   719289.06741    321694.6
##     5:              160.2     0.4851811   880913.32756    643405.8
##    ---                                                            
## 20022:          1326032.6     0.4851782   880947.92408    965044.6
## 20023:          1326038.8     0.4851759   719291.97428    321689.4
## 20024:           663019.4     0.4548872       25.57733        24.2
## 20025:          3315071.0            NA             NA          NA
## 20026:                 NA            NA             NA          NA
##        mu_matches.Dire radiant_win
##     1:       3315071.0           0
##     2:              NA           0
##     3:              NA           0
##     4:        663046.8           0
##     5:       1326114.6           0
##    ---                            
## 20022:       1989051.8           0
## 20023:        663036.6           1
## 20024:            53.2           0
## 20025:              NA           0
## 20026:              NA           1
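var_filter(..., iv_limit = 0.1) removes predictors whose information value (IV) is below 0.1. IV = Σ_i (p_i - q_i) · ln(p_i / q_i), where p_i and q_i are the shares of Radiant wins and Radiant losses that fall into bin i. The sketch below computes it with equal-frequency bins and a separate bin for missing values. scorecard bins differently, so its IVs will not match exactly, and iv_of is a hypothetical helper:

```r
# Sketch: information value of one predictor against a 0/1 target.
# Equal-frequency bins plus a separate "missing" bin; +0.5 smoothing avoids log(0).
iv_of <- function(x, y, n_bins = 5) {
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = n_bins + 1), na.rm = TRUE))
  bin <- as.character(cut(x, breaks = breaks, include.lowest = TRUE))
  bin[is.na(bin)] <- "missing"
  pos <- tapply(y == 1, bin, sum) + 0.5   # Radiant wins per bin
  neg <- tapply(y == 0, bin, sum) + 0.5   # Radiant losses per bin
  p <- pos / sum(pos)                      # share of all wins in each bin
  q <- neg / sum(neg)                      # share of all losses in each bin
  sum((p - q) * log(p / q))
}

set.seed(30)
n <- 2000
signal <- rnorm(n)
noise  <- rnorm(n)
y <- rbinom(n, 1, plogis(2 * signal))    # toy outcome driven by `signal` only
signal[sample(n, 200)] <- NA             # some missing values, as in the real features

ivs  <- c(signal = iv_of(signal, y), noise = iv_of(noise, y))
kept <- names(ivs)[ivs >= 0.1]           # same cut-off as iv_limit = 0.1
ivs
kept
```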

WOE binning

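bins = woebin(dt_list$train, y = "radiant_win")  # learn the bins from the training split only, so test rows do not leak into the WOE values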
## [INFO] creating woe binning ...

Convert the data to WOE values

dt_woe_list = lapply(dt_list, function(x) woebin_ply(x, bins))
## [INFO] converting into woe values ... 
## [INFO] converting into woe values ...
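WOE binning maps every value of a variable to the weight of evidence of its bin, ln(share of positives / share of negatives). The bins and their WOE values should be learned on the training rows only and reused unchanged on the test rows; otherwise test information leaks into the features. Here is a base-R sketch of that fit/apply split with equal-frequency bins. fit_woe and apply_woe are hypothetical helpers, not scorecard functions, and scorecard's own binning and sign convention may differ:

```r
# Sketch: learn WOE bins on training rows, then reuse them on new rows.
fit_woe <- function(x, y, n_bins = 4) {
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = n_bins + 1), na.rm = TRUE))
  breaks[1] <- -Inf                     # open-ended outer bins so any later value
  breaks[length(breaks)] <- Inf         # still falls into one of the learned bins
  bin <- as.character(cut(x, breaks = breaks))
  bin[is.na(bin)] <- "missing"
  pos <- tapply(y == 1, bin, sum) + 0.5 # +0.5 smoothing avoids log(0)
  neg <- tapply(y == 0, bin, sum) + 0.5
  woe <- log((pos / sum(pos)) / (neg / sum(neg)))
  list(breaks = breaks, woe = setNames(as.vector(woe), names(woe)))
}

apply_woe <- function(fit, x) {
  bin <- as.character(cut(x, breaks = fit$breaks))
  bin[is.na(bin)] <- "missing"
  as.vector(fit$woe[bin])               # a bin never seen in training gives NA
}

set.seed(1)
x <- rnorm(1000)
y <- rbinom(1000, 1, plogis(1.5 * x))   # toy data, hypothetical
train <- 1:600
test  <- 601:1000

fit      <- fit_woe(x[train], y[train])
test_woe <- apply_woe(fit, x[test])
round(fit$woe, 3)
```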

Train the model

dt_woe_list$train$radiant_win <- as.factor(dt_woe_list$train$radiant_win)

dt_woe_list$test$radiant_win <- as.factor(dt_woe_list$test$radiant_win)


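m1 = glm(radiant_win ~ ., family = binomial(), data = dt_woe_list$train)  # logistic regression on all WOE variables
m_step = step(m1, direction = "both", trace = FALSE)                     # stepwise selection by AIC
m2 = eval(m_step$call)                                                    # refit the selected formula as a standalone model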
summary(m_step)
## 
## Call:
## glm(formula = radiant_win ~ win_rate.Radiant_woe + mu_win.Radiant_woe + 
##     win_rate.Dire_woe + mu_win.Dire_woe, family = binomial(), 
##     data = dt_woe_list$train)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.369  -1.198   1.031   1.141   1.265  
## 
## Coefficients:
##                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)           0.07533    0.01159   6.501 7.96e-11 ***
## win_rate.Radiant_woe  1.00204    0.16679   6.008 1.88e-09 ***
## mu_win.Radiant_woe    0.35619    0.22625   1.574 0.115412    
## win_rate.Dire_woe     0.69011    0.20575   3.354 0.000796 ***
## mu_win.Dire_woe       0.68874    0.20686   3.329 0.000870 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 41510  on 29973  degrees of freedom
## Residual deviance: 41371  on 29969  degrees of freedom
## AIC: 41381
## 
## Number of Fisher Scoring iterations: 3
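The fitted coefficients are the model's weights: step() keeps the variables that lower the AIC, and glm() estimates each weight by maximum likelihood. A prediction is the logistic function of the intercept plus the weighted sum of the WOE values. A sketch using the coefficients printed above and made-up WOE values for one match:

```r
# Coefficients copied from summary(m_step) above (pre-match model)
beta <- c(intercept        = 0.07533,
          win_rate.Radiant = 1.00204,
          mu_win.Radiant   = 0.35619,
          win_rate.Dire    = 0.69011,
          mu_win.Dire      = 0.68874)

# Hypothetical WOE values for one match (illustration only)
woe <- c(win_rate.Radiant = 0.10, mu_win.Radiant = 0.05,
         win_rate.Dire = -0.08, mu_win.Dire = 0.02)

# Linear predictor: intercept + sum of weight * WOE, then the logistic link
eta <- beta[["intercept"]] + sum(beta[names(woe)] * woe)
p_radiant <- plogis(eta)   # about 0.538
p_radiant
```

Since every variable is WOE-encoded against radiant_win, the direction of each effect is already built into the WOE values, which is why even the Dire features get positive weights. With all WOE values at 0, the prediction is plogis(0.07533), about 0.519.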

Evaluate the model

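# predicted probability of a Radiant win for each split
pred_list = lapply(dt_woe_list, function(x) predict(m2, x, type = 'response'))
## performance

# as.character() first: on a factor, as.numeric() alone returns the level codes (1/2) instead of 0/1
label_list = lapply(label_list, function(x) as.numeric(as.character(x)))
pred_list = lapply(pred_list, as.numeric)

perf = scorecard::perf_eva(pred = pred_list$train, label = label_list$train,
                           show_plot = c('ks', 'lift', 'gain', 'roc', 'lz', 'pr', 'f1', 'density'),
                           confusion_matrix = FALSE)
# the same metrics on the held-out test split, which is what validates the model
perf_test = scorecard::perf_eva(pred = pred_list$test, label = label_list$test,
                                show_plot = c('ks', 'roc'), confusion_matrix = FALSE)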
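perf_eva() reports KS and AUC among other metrics. For reference, here is a base-R sketch of how these two numbers are computed from predicted probabilities and 0/1 labels: AUC through the rank (Mann-Whitney) formula, and KS as the largest gap between the cumulative score distributions of wins and losses. auc_ks is a hypothetical helper; applied as auc_ks(pred_list$test, label_list$test), it would give the held-out figures, which say more about real predictive power than training-set numbers.

```r
# Sketch: AUC (rank / Mann-Whitney form) and KS for 0/1 labels.
auc_ks <- function(pred, label) {
  label <- as.numeric(as.character(label))   # works for 0/1 numbers or a "0"/"1" factor
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  r  <- rank(pred)                             # average ranks handle tied scores
  auc <- (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
  thresholds <- sort(unique(pred))
  cdf1 <- ecdf(pred[label == 1])(thresholds)   # share of wins scored <= threshold
  cdf0 <- ecdf(pred[label == 0])(thresholds)   # share of losses scored <= threshold
  c(auc = auc, ks = max(abs(cdf1 - cdf0)))
}

auc_ks(pred = c(0.1, 0.4, 0.35, 0.8), label = c(0, 0, 1, 1))
```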

In-game model

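An in-game model can also use live match data, such as gold, last hits and kills over the first 5, 10 or 30 minutes. Next, we build a model that predicts the outcome from the state of the game at the 30-minute mark: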

player30 <- player_time %>% filter(times == 30*60)

player30 %>% head(3)
## # A tibble: 3 x 32
##   match_id times gold_t_0 lh_t_0 xp_t_0 gold_t_1 lh_t_1 xp_t_1 gold_t_2
##      <dbl> <dbl>    <dbl>  <dbl>  <dbl>    <dbl>  <dbl>  <dbl>    <dbl>
## 1        0  1800     8124     20   7747    12590     83  17244     9064
## 2        1  1800     7519     35   7593    17176    227  24824     9017
## 3        2  1800     7286     49   8222     9374    104  11227    10976
## # … with 23 more variables: lh_t_2 <dbl>, xp_t_2 <dbl>, gold_t_3 <dbl>,
## #   lh_t_3 <dbl>, xp_t_3 <dbl>, gold_t_4 <dbl>, lh_t_4 <dbl>,
## #   xp_t_4 <dbl>, gold_t_128 <dbl>, lh_t_128 <dbl>, xp_t_128 <dbl>,
## #   gold_t_129 <dbl>, lh_t_129 <dbl>, xp_t_129 <dbl>, gold_t_130 <dbl>,
## #   lh_t_130 <dbl>, xp_t_130 <dbl>, gold_t_131 <dbl>, lh_t_131 <dbl>,
## #   xp_t_131 <dbl>, gold_t_132 <dbl>, lh_t_132 <dbl>, xp_t_132 <dbl>

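This table holds each player slot's gold (gold_t), last hits (lh_t) and experience (xp_t) at the 30-minute mark; the number after the prefix is the player_slot (0-4 Radiant, 128-132 Dire). Matches that ended before 30 minutes have no row at times == 1800, so after the left join below their in-game features are NA.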
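The model below uses the ten gold and ten experience columns separately. One possible alternative, not what this post does, is to collapse them into Radiant-minus-Dire team differences. A base-R sketch (team_diff is a hypothetical helper), checked on the 30-minute values of match_id 0 copied from the printed output, a match Radiant went on to win:

```r
# Sketch: Radiant-minus-Dire totals from per-slot columns named <stat>_t_<slot>.
team_diff <- function(df, stat) {
  radiant <- paste0(stat, "_t_", 0:4)
  dire    <- paste0(stat, "_t_", 128:132)
  unname(rowSums(df[, radiant, drop = FALSE]) - rowSums(df[, dire, drop = FALSE]))
}

# 30-minute values of match_id 0, copied from the printed output
slots <- c(0:4, 128:132)
gold  <- c(8124, 12590, 9064, 14535, 15833, 11090, 9210, 13087, 5495, 14136)
xp    <- c(7747, 17244, 11478, 16479, 15888, 13848, 10990, 15187, 6514, 9975)
snap  <- as.data.frame(as.list(setNames(c(gold, xp),
                                        c(paste0("gold_t_", slots), paste0("xp_t_", slots)))))

c(gold_diff = team_diff(snap, "gold"), xp_diff = team_diff(snap, "xp"))
# gold_diff = 7128, xp_diff = 12322
```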

fdata1 <- fdata %>% left_join(player30,by = "match_id") %>% select(-times)
head(fdata1,3)
## # A tibble: 3 x 45
## # Groups:   match_id, side [3]
##   side  match_id radiant_win win_rate.Radiant kill.Radiant sigma_win.Radia…
##   <chr>    <dbl> <fct>                  <dbl>        <dbl>            <dbl>
## 1 Radi…        0 1                      0.485         24.8          880951.
## 2 Radi…        1 0                      0.485         24.9          880948.
## 3 Radi…        2 0                     NA             NA                NA 
## # … with 39 more variables: sigma_matches.Radiant <dbl>,
## #   mu_win.Radiant <dbl>, mu_matches.Radiant <dbl>, win_rate.Dire <dbl>,
## #   kill.Dire <dbl>, sigma_win.Dire <dbl>, sigma_matches.Dire <dbl>,
## #   mu_win.Dire <dbl>, mu_matches.Dire <dbl>, gold_t_0 <dbl>,
## #   lh_t_0 <dbl>, xp_t_0 <dbl>, gold_t_1 <dbl>, lh_t_1 <dbl>,
## #   xp_t_1 <dbl>, gold_t_2 <dbl>, lh_t_2 <dbl>, xp_t_2 <dbl>,
## #   gold_t_3 <dbl>, lh_t_3 <dbl>, xp_t_3 <dbl>, gold_t_4 <dbl>,
## #   lh_t_4 <dbl>, xp_t_4 <dbl>, gold_t_128 <dbl>, lh_t_128 <dbl>,
## #   xp_t_128 <dbl>, gold_t_129 <dbl>, lh_t_129 <dbl>, xp_t_129 <dbl>,
## #   gold_t_130 <dbl>, lh_t_130 <dbl>, xp_t_130 <dbl>, gold_t_131 <dbl>,
## #   lh_t_131 <dbl>, xp_t_131 <dbl>, gold_t_132 <dbl>, lh_t_132 <dbl>,
## #   xp_t_132 <dbl>

Build the model

library(scorecard)


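dt_f = var_filter(fdata1[, -c(1, 2)], y = "radiant_win", iv_limit = 0.1) # drop side and match_id, then remove variables with information value (IV) below 0.1
## [INFO] filtering variables ...
names(dt_f) # 28 features pass the filter, plus the target radiant_win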
##  [1] "win_rate.Radiant"   "sigma_win.Radiant"  "mu_win.Radiant"    
##  [4] "mu_matches.Radiant" "win_rate.Dire"      "sigma_win.Dire"    
##  [7] "mu_win.Dire"        "mu_matches.Dire"    "gold_t_0"          
## [10] "xp_t_0"             "gold_t_1"           "xp_t_1"            
## [13] "gold_t_2"           "xp_t_2"             "gold_t_3"          
## [16] "xp_t_3"             "gold_t_4"           "xp_t_4"            
## [19] "gold_t_128"         "xp_t_128"           "gold_t_129"        
## [22] "xp_t_129"           "gold_t_130"         "xp_t_130"          
## [25] "gold_t_131"         "xp_t_131"           "gold_t_132"        
## [28] "xp_t_132"           "radiant_win"
# split into training and test sets
dt_list = split_df(dt_f, y = "radiant_win", ratio = 0.6, seed = 30)
label_list = lapply(dt_list, function(x) x$radiant_win)
head(dt_list) # prints both splits; the training set holds 60% of the rows
## $train
##        win_rate.Radiant sigma_win.Radiant mu_win.Radiant
##     1:        0.4851776       880951.4843       643364.0
##     2:        0.4851777       880948.3805       643367.4
##     3:        0.4851794       719293.0923       321687.4
##     4:        0.4851818       880935.4181       643381.6
##     5:        0.6271186           28.0749           22.2
##    ---                                                  
## 29970:               NA                NA             NA
## 29971:               NA                NA             NA
## 29972:               NA                NA             NA
## 29973:        0.4851768       719293.7631       321686.2
## 29974:        0.4851799       880935.6007       965053.6
##        mu_matches.Radiant win_rate.Dire sigma_win.Dire mu_win.Dire
##     1:          1326038.0     0.4851814       880938.5    643378.2
##     2:          1326045.0     0.4851796       880935.3    965053.8
##     3:           663027.8     0.4851834       719288.5    321695.6
##     4:          1326062.8     0.4851782       880950.9    965042.4
##     5:               35.4     0.4851770       719293.1    321687.4
##    ---                                                            
## 29970:                 NA            NA             NA          NA
## 29971:                 NA            NA             NA          NA
## 29972:                 NA     0.4851771       880951.2    965042.2
## 29973:           663028.8     0.4851781       880945.5    643370.6
## 29974:          1989063.4            NA             NA          NA
##        mu_matches.Dire gold_t_0 xp_t_0 gold_t_1 xp_t_1 gold_t_2 xp_t_2
##     1:       1326057.0     8124   7747    12590  17244     9064  11478
##     2:       1989065.0     7519   7593    17176  24824     9017   8129
##     3:        663039.2    20551  20131    18565  19751    11222   9651
##     4:       1989047.2       NA     NA       NA     NA       NA     NA
##     5:        663031.0     7858   8399    19397  19934    13342  15405
##    ---                                                                
## 29970:              NA    12488  16070     8806   9627     8263   9550
## 29971:              NA    12125  10374     7474   6284    10093  11001
## 29972:       1989051.4       NA     NA       NA     NA       NA     NA
## 29973:       1326050.4    13364  14018     6564   8236    11859  11699
## 29974:              NA    11245  11605    10767  14275     6790   6944
##        gold_t_3 xp_t_3 gold_t_4 xp_t_4 gold_t_128 xp_t_128 gold_t_129
##     1:    14535  16479    15833  15888      11090    13848       9210
##     2:    12850  14219    11918  13471      14989    13547      11976
##     3:     9055   8581    17426  15411       8330     8077       7185
##     4:       NA     NA       NA     NA         NA       NA         NA
##     5:    15839  14346    10794  10530       7735     8184      13901
##    ---                                                               
## 29970:     9835  14612     8908   9961       9376    12471      11598
## 29971:     7326   7471     9797  13692      14743    20257       8075
## 29972:       NA     NA       NA     NA         NA       NA         NA
## 29973:    13900  15067     7425   7580       8366    10618       7753
## 29974:     8515  10581    10526  13073      12117    16075       8815
##        xp_t_129 gold_t_130 xp_t_130 gold_t_131 xp_t_131 gold_t_132
##     1:    10990      13087    15187       5495     6514      14136
##     2:    14782      10707    13468      12820    17403      12299
##     3:     6471       8407     9685      11963    12976       8957
##     4:       NA         NA       NA         NA       NA         NA
##     5:    14316       8872    10510       7883     8208      15383
##    ---                                                            
## 29970:    16819       8882    12561      11924    13702       9429
## 29971:     9082      13374    13533       8986    10879       6684
## 29972:       NA         NA       NA         NA       NA         NA
## 29973:     8933       9409    12390       9442     9115       7684
## 29974:    10582       9059    10227      18643    17514      12139
##        xp_t_132 radiant_win
##     1:     9975           1
##     2:    15694           0
##     3:     8549           1
##     4:       NA           1
##     5:    15275           1
##    ---                     
## 29970:    11602           0
## 29971:     6746           1
## 29972:       NA           1
## 29973:     9342           1
## 29974:    13786           0
## 
## $test
##        win_rate.Radiant sigma_win.Radiant mu_win.Radiant
##     1:               NA                NA             NA
##     2:               NA                NA             NA
##     3:        0.4851775      880955.04446       965039.4
##     4:        0.4851776      719296.55815      1286718.8
##     5:        0.5205993          47.48473           83.4
##    ---                                                  
## 20022:        0.4851777      880953.85773       643361.4
## 20023:        0.4851777      880951.11913       643364.4
## 20024:        0.4851762      719296.55815       321681.2
## 20025:        0.4851775           0.00000      1608398.0
## 20026:               NA                NA             NA
##        mu_matches.Radiant win_rate.Dire sigma_win.Dire mu_win.Dire
##     1:                 NA     0.4851775        0.00000   1608398.0
##     2:                 NA            NA             NA          NA
##     3:          1989044.0            NA             NA          NA
##     4:          2652057.2     0.4851763   719289.06741    321694.6
##     5:              160.2     0.4851811   880913.32756    643405.8
##    ---                                                            
## 20022:          1326032.6     0.4851782   880947.92408    965044.6
## 20023:          1326038.8     0.4851759   719291.97428    321689.4
## 20024:           663019.4     0.4548872       25.57733        24.2
## 20025:          3315071.0            NA             NA          NA
## 20026:                 NA            NA             NA          NA
##        mu_matches.Dire gold_t_0 xp_t_0 gold_t_1 xp_t_1 gold_t_2 xp_t_2
##     1:       3315071.0     7286   8222     9374  11227    10976  13636
##     2:              NA     6931   8209    11483  11666    11242  14862
##     3:              NA    12679  11277    10537  12191     7902   9006
##     4:        663046.8    14239  16473     7246   8144     4963   6175
##     5:       1326114.6    13168  15182    17086  17673     7032   7564
##    ---                                                                
## 20022:       1989051.8    13465  10184    11088  12908     9390  10882
## 20023:        663036.6    15118  14335    13482  16385    13179  14259
## 20024:            53.2     7175   7566    11746  14862     8678   8135
## 20025:              NA    14300  17215     9079  12120     6382   7930
## 20026:              NA    11200  13284     8385   8216     9780  10511
##        gold_t_3 xp_t_3 gold_t_4 xp_t_4 gold_t_128 xp_t_128 gold_t_129
##     1:     8974   9184     5446   6756      12231    14241       8098
##     2:     6827   8220    11178  12975      10588    12852      11219
##     3:     9718   9415     9712  12841      12773    14062      16759
##     4:    12953  15429     7112   6398      10998    12428      14088
##     5:     7683   8181    11233  10838       8056     6693      13003
##    ---                                                               
## 20022:     8378   9727     7350   8444       6476     7621       5905
## 20023:     8890   9960    20868  22568       7637     8179       9035
## 20024:    13091  14267    11021  13690       8603    10708      14448
## 20025:    13624  16368    11043  13145      14264    13629       8782
## 20026:    12627  14918    12044  14374      10135    10899      11814
##        xp_t_129 gold_t_130 xp_t_130 gold_t_131 xp_t_131 gold_t_132
##     1:     9854      10258    12905       8637     9214       6435
##     2:    14497       7531     8880       6620     7089      12975
##     3:    19796      16551    20137      16094    16415      11569
##     4:    15303      12901    15901       9454    11388      10211
##     5:    13724      16206    21295       8683     7994       7253
##    ---                                                            
## 20022:     9206      10440    12521      11113    12231       6324
## 20023:     9259       6168     4835      10228     9673      10149
## 20024:    14723      10187    12028      11365     9187      19378
## 20025:    11315       7719     9194       7437     8844      16087
## 20026:    15385      10222     9986      10919    12756       9690
##        xp_t_132 radiant_win
##     1:     7410           0
##     2:    15194           0
##     3:    11799           0
##     4:    10461           0
##     5:     9097           0
##    ---                     
## 20022:     8229           0
## 20023:    11321           1
## 20024:    21755           0
## 20025:    18117           0
## 20026:    12439           1

WOE binning

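bins = woebin(dt_list$train, y = "radiant_win")  # learn the bins from the training split only, so test rows do not leak into the WOE values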
## [INFO] creating woe binning ...

Convert the data to WOE values

dt_woe_list = lapply(dt_list, function(x) woebin_ply(x, bins))
## [INFO] converting into woe values ... 
## [INFO] converting into woe values ...

Train the model

dt_woe_list$train$radiant_win <- as.factor(dt_woe_list$train$radiant_win)

dt_woe_list$test$radiant_win <- as.factor(dt_woe_list$test$radiant_win)


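m1 = glm(radiant_win ~ ., family = binomial(), data = dt_woe_list$train)  # logistic regression on all WOE variables
m_step = step(m1, direction = "both", trace = FALSE)                     # stepwise selection by AIC
m2 = eval(m_step$call)                                                    # refit the selected formula as a standalone model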
summary(m_step)
## 
## Call:
## glm(formula = radiant_win ~ win_rate.Radiant_woe + mu_win.Radiant_woe + 
##     win_rate.Dire_woe + mu_win.Dire_woe + gold_t_0_woe + xp_t_0_woe + 
##     gold_t_1_woe + xp_t_1_woe + gold_t_2_woe + xp_t_2_woe + gold_t_3_woe + 
##     xp_t_3_woe + gold_t_4_woe + xp_t_4_woe + gold_t_128_woe + 
##     xp_t_128_woe + gold_t_129_woe + xp_t_129_woe + gold_t_130_woe + 
##     xp_t_130_woe + gold_t_131_woe + xp_t_131_woe + gold_t_132_woe + 
##     xp_t_132_woe, family = binomial(), data = dt_woe_list$train)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.6187  -0.8256   0.2682   0.8655   2.7140  
## 
## Coefficients:
##                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)           0.13287    0.01425   9.327  < 2e-16 ***
## win_rate.Radiant_woe  0.52700    0.20305   2.595 0.009446 ** 
## mu_win.Radiant_woe   -0.45375    0.27323  -1.661 0.096777 .  
## win_rate.Dire_woe     0.98673    0.24986   3.949 7.85e-05 ***
## mu_win.Dire_woe       0.58683    0.25015   2.346 0.018982 *  
## gold_t_0_woe          0.51334    0.04706  10.908  < 2e-16 ***
## xp_t_0_woe            0.19572    0.05296   3.696 0.000219 ***
## gold_t_1_woe          0.55667    0.04899  11.364  < 2e-16 ***
## xp_t_1_woe            0.19131    0.05529   3.460 0.000540 ***
## gold_t_2_woe          0.58219    0.05052  11.525  < 2e-16 ***
## xp_t_2_woe            0.14386    0.05674   2.535 0.011236 *  
## gold_t_3_woe          0.45118    0.04630   9.744  < 2e-16 ***
## xp_t_3_woe            0.24236    0.05215   4.647 3.36e-06 ***
## gold_t_4_woe          0.64393    0.04766  13.512  < 2e-16 ***
## xp_t_4_woe            0.09584    0.05439   1.762 0.078037 .  
## gold_t_128_woe        0.57828    0.04617  12.526  < 2e-16 ***
## xp_t_128_woe          0.16309    0.05060   3.223 0.001268 ** 
## gold_t_129_woe        0.51772    0.04984  10.387  < 2e-16 ***
## xp_t_129_woe          0.23303    0.05293   4.403 1.07e-05 ***
## gold_t_130_woe        0.45566    0.04905   9.289  < 2e-16 ***
## xp_t_130_woe          0.30801    0.05168   5.960 2.52e-09 ***
## gold_t_131_woe        0.45024    0.04701   9.577  < 2e-16 ***
## xp_t_131_woe          0.25150    0.05073   4.958 7.14e-07 ***
## gold_t_132_woe        0.57351    0.04877  11.759  < 2e-16 ***
## xp_t_132_woe          0.10997    0.05335   2.061 0.039286 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 41510  on 29973  degrees of freedom
## Residual deviance: 30464  on 29949  degrees of freedom
## AIC: 30514
## 
## Number of Fisher Scoring iterations: 5

Evaluate the model

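# predicted probability of a Radiant win for each split
pred_list = lapply(dt_woe_list, function(x) predict(m2, x, type = 'response'))
## performance

# as.character() first: on a factor, as.numeric() alone returns the level codes (1/2) instead of 0/1
label_list = lapply(label_list, function(x) as.numeric(as.character(x)))
pred_list = lapply(pred_list, as.numeric)

# the same metrics on the held-out test split (perf_test) validate the model
perf_test = scorecard::perf_eva(pred = pred_list$test, label = label_list$test,
                                show_plot = c('ks', 'roc'), confusion_matrix = FALSE)
perf = scorecard::perf_eva(pred = pred_list$train, label = label_list$train,
                           show_plot = c('ks', 'lift', 'gain', 'roc', 'lz', 'pr', 'f1', 'density'),
                           confusion_matrix = TRUE)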
## [INFO] The threshold of confusion matrix is 0.3917.

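The in-game model is clearly more accurate than the pre-match model: on the training set the KS statistic reaches 0.55 and the AUC 0.84. So by the 30-minute mark, the match data already predict the outcome fairly well. Next, we move the prediction point earlier and use the state of the game at 10 minutes: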
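Both summaries were fitted on the same 29,974 training matches (the null deviance is 41510 in each), so their residual deviances can be compared directly. McFadden's pseudo-R², 1 - residual deviance / null deviance, makes the gap concrete (for 0/1 outcomes the deviance equals -2 times the log-likelihood, so this is the usual log-likelihood ratio form):

```r
# Deviances copied from the two summary() outputs (same 29,974 training matches)
null_dev  <- 41510
resid_pre <- 41371   # pre-match model
resid_30  <- 30464   # 30-minute model

# McFadden pseudo-R^2: share of the null deviance explained by the predictors
mcfadden <- function(resid, null) 1 - resid / null

round(c(pre_match = mcfadden(resid_pre, null_dev),
        minute_30 = mcfadden(resid_30, null_dev)), 3)
# pre_match = 0.003, minute_30 = 0.266
```

The pre-match features explain only about 0.3% of the deviance, against about 27% for the 30-minute model.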

10-minute model

player10 <- player_time %>% filter(times == 10*60)

player10 %>% head(3)
## # A tibble: 3 x 32
##   match_id times gold_t_0 lh_t_0 xp_t_0 gold_t_1 lh_t_1 xp_t_1 gold_t_2
##      <dbl> <dbl>    <dbl>  <dbl>  <dbl>    <dbl>  <dbl>  <dbl>    <dbl>
## 1        0   600     2211      3   1532     3379     39   3903     1650
## 2        1   600     1560      4   1393     3749     57   4065     2453
## 3        2   600     2561     17   2460     2380     23   3033     2869
## # … with 23 more variables: lh_t_2 <dbl>, xp_t_2 <dbl>, gold_t_3 <dbl>,
## #   lh_t_3 <dbl>, xp_t_3 <dbl>, gold_t_4 <dbl>, lh_t_4 <dbl>,
## #   xp_t_4 <dbl>, gold_t_128 <dbl>, lh_t_128 <dbl>, xp_t_128 <dbl>,
## #   gold_t_129 <dbl>, lh_t_129 <dbl>, xp_t_129 <dbl>, gold_t_130 <dbl>,
## #   lh_t_130 <dbl>, xp_t_130 <dbl>, gold_t_131 <dbl>, lh_t_131 <dbl>,
## #   xp_t_131 <dbl>, gold_t_132 <dbl>, lh_t_132 <dbl>, xp_t_132 <dbl>

This table holds each player slot's gold, last hits and experience at the 10-minute mark.
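The 10-minute and 30-minute pipelines differ only in the snapshot time, so the filter can be wrapped in a function. Below is a base-R sketch; features_at is a hypothetical helper that assumes a times column in seconds, as in player_time. The left join keeps matches that have no snapshot at that minute, with NA features:

```r
# Sketch: per-minute snapshot from player_time-style data (times in seconds).
features_at <- function(player_time, minute) {
  snap <- player_time[player_time$times == minute * 60, , drop = FALSE]
  snap$times <- NULL                    # the snapshot time is now implicit
  snap
}

# Toy data (hypothetical except gold_t_0 = 2211, match 0 at 10 minutes);
# match 1 has no row at 600 seconds
pt <- data.frame(match_id = c(0, 0, 1),
                 times    = c(0, 600, 0),
                 gold_t_0 = c(0, 2211, 0))

f10 <- features_at(pt, 10)
merged <- merge(data.frame(match_id = c(0, 1)), f10, by = "match_id", all.x = TRUE)
merged
```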

fdata1 <- fdata %>% left_join(player10,by = "match_id") %>% select(-times)
head(fdata1,3)
## # A tibble: 3 x 45
## # Groups:   match_id, side [3]
##   side  match_id radiant_win win_rate.Radiant kill.Radiant sigma_win.Radia…
##   <chr>    <dbl> <fct>                  <dbl>        <dbl>            <dbl>
## 1 Radi…        0 1                      0.485         24.8          880951.
## 2 Radi…        1 0                      0.485         24.9          880948.
## 3 Radi…        2 0                     NA             NA                NA 
## # … with 39 more variables: sigma_matches.Radiant <dbl>,
## #   mu_win.Radiant <dbl>, mu_matches.Radiant <dbl>, win_rate.Dire <dbl>,
## #   kill.Dire <dbl>, sigma_win.Dire <dbl>, sigma_matches.Dire <dbl>,
## #   mu_win.Dire <dbl>, mu_matches.Dire <dbl>, gold_t_0 <dbl>,
## #   lh_t_0 <dbl>, xp_t_0 <dbl>, gold_t_1 <dbl>, lh_t_1 <dbl>,
## #   xp_t_1 <dbl>, gold_t_2 <dbl>, lh_t_2 <dbl>, xp_t_2 <dbl>,
## #   gold_t_3 <dbl>, lh_t_3 <dbl>, xp_t_3 <dbl>, gold_t_4 <dbl>,
## #   lh_t_4 <dbl>, xp_t_4 <dbl>, gold_t_128 <dbl>, lh_t_128 <dbl>,
## #   xp_t_128 <dbl>, gold_t_129 <dbl>, lh_t_129 <dbl>, xp_t_129 <dbl>,
## #   gold_t_130 <dbl>, lh_t_130 <dbl>, xp_t_130 <dbl>, gold_t_131 <dbl>,
## #   lh_t_131 <dbl>, xp_t_131 <dbl>, gold_t_132 <dbl>, lh_t_132 <dbl>,
## #   xp_t_132 <dbl>

Build the model

library(scorecard)


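dt_f = var_filter(fdata1[, -c(1, 2)], y = "radiant_win", iv_limit = 0.1) # drop side and match_id, then remove variables with information value (IV) below 0.1
## [INFO] filtering variables ...
names(dt_f) # 28 features pass the filter, plus the target radiant_win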
##  [1] "win_rate.Radiant"   "sigma_win.Radiant"  "mu_win.Radiant"    
##  [4] "mu_matches.Radiant" "win_rate.Dire"      "sigma_win.Dire"    
##  [7] "mu_win.Dire"        "mu_matches.Dire"    "gold_t_0"          
## [10] "xp_t_0"             "gold_t_1"           "xp_t_1"            
## [13] "gold_t_2"           "xp_t_2"             "gold_t_3"          
## [16] "xp_t_3"             "gold_t_4"           "xp_t_4"            
## [19] "gold_t_128"         "xp_t_128"           "gold_t_129"        
## [22] "xp_t_129"           "gold_t_130"         "xp_t_130"          
## [25] "gold_t_131"         "xp_t_131"           "gold_t_132"        
## [28] "xp_t_132"           "radiant_win"
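To show what the IV threshold measures, here is a base-R sketch that computes WOE and IV for a variable that has already been cut into bins. It assumes one common convention, WOE_i = ln(p_i / n_i), where p_i and n_i are the shares of Radiant wins and Radiant losses falling in bin i, and IV = sum over bins of (p_i - n_i) * WOE_i. The helper iv_table and the toy data are hypothetical; scorecard's own implementation may differ in details such as how empty bins are handled.

```r
# Sketch: WOE and information value (IV) for an already-binned variable.
# Hypothetical helper; label 1 (Radiant win) is treated as the "positive" class.
iv_table <- function(bin, label) {
  pos <- tapply(label == 1, bin, sum)   # Radiant wins per bin
  neg <- tapply(label == 0, bin, sum)   # Radiant losses per bin
  pos_share <- pos / sum(pos)
  neg_share <- neg / sum(neg)
  woe <- log(pos_share / neg_share)     # infinite if a bin has no wins or no losses
  data.frame(
    bin     = names(pos),
    pos     = as.vector(pos),
    neg     = as.vector(neg),
    woe     = as.vector(woe),
    iv_part = as.vector((pos_share - neg_share) * woe)
  )
}

# Toy example: one feature cut into "low" and "high" bins
bin   <- c("low", "low", "low", "high", "high", "high", "low", "high")
label <- c(0, 0, 1, 1, 1, 0, 0, 1)
tab <- iv_table(bin, label)
tab
sum(tab$iv_part)  # total IV = log(3), about 1.099
```

Each bin contributes a non-negative amount to IV, so a variable whose bins all have roughly the same win rate ends up with an IV near zero and would be dropped by a threshold such as 0.1.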
# split the data into training and test sets
dt_list = split_df(dt_f, y="radiant_win", ratio = 0.6, seed = 30)
label_list = lapply(dt_list, function(x) x$radiant_win)
head(dt_list) # the training set is 60% of the data
## $train
##        win_rate.Radiant sigma_win.Radiant mu_win.Radiant
##     1:        0.4851776       880951.4843       643364.0
##     2:        0.4851777       880948.3805       643367.4
##     3:        0.4851794       719293.0923       321687.4
##     4:        0.4851818       880935.4181       643381.6
##     5:        0.6271186           28.0749           22.2
##    ---                                                  
## 29970:               NA                NA             NA
## 29971:               NA                NA             NA
## 29972:               NA                NA             NA
## 29973:        0.4851768       719293.7631       321686.2
## 29974:        0.4851799       880935.6007       965053.6
##        mu_matches.Radiant win_rate.Dire sigma_win.Dire mu_win.Dire
##     1:          1326038.0     0.4851814       880938.5    643378.2
##     2:          1326045.0     0.4851796       880935.3    965053.8
##     3:           663027.8     0.4851834       719288.5    321695.6
##     4:          1326062.8     0.4851782       880950.9    965042.4
##     5:               35.4     0.4851770       719293.1    321687.4
##    ---                                                            
## 29970:                 NA            NA             NA          NA
## 29971:                 NA            NA             NA          NA
## 29972:                 NA     0.4851771       880951.2    965042.2
## 29973:           663028.8     0.4851781       880945.5    643370.6
## 29974:          1989063.4            NA             NA          NA
##        mu_matches.Dire gold_t_0 xp_t_0 gold_t_1 xp_t_1 gold_t_2 xp_t_2
##     1:       1326057.0     2211   1532     3379   3903     1650   1450
##     2:       1989065.0     1560   1393     3749   4065     2453   1774
##     3:        663039.2     4108   3802     4735   4778     2339   2365
##     4:       1989047.2     2590   2954     1328   1202     3517   3128
##     5:        663031.0     1590   1529     3932   3520     2243   2780
##    ---                                                                
## 29970:              NA     2688   3308     1445   1604     2411   2646
## 29971:              NA     2568   1722     1878   1321     2548   3211
## 29972:       1989051.4     3448   3955     4082   4771     3079   2916
## 29973:       1326050.4     3285   4123     1197   1799     2901   2262
## 29974:              NA     2788   2828     1994   2341     1590   1980
##        gold_t_3 xp_t_3 gold_t_4 xp_t_4 gold_t_128 xp_t_128 gold_t_129
##     1:     2859   4017     3745   3464       2623     3395       2573
##     2:     2811   2207     3748   4364       5015     4095       3286
##     3:     1525   1723     2871   2534       1410     1685       1826
##     4:     4084   4095     1930   1698       3024     3016       1696
##     5:     4298   4148     2277   2035       2025     1525       3531
##    ---                                                               
## 29970:     2189   2507     3041   3662       2004     2889       1995
## 29971:     1651   1611     2790   3609       3465     5118       1676
## 29972:     2522   2047     2806   2476       4079     4017       3718
## 29973:     2986   3175     1446   1755       2411     4103       1468
## 29974:     2649   3952     2976   3330       3375     3748       2206
##        xp_t_129 gold_t_130 xp_t_130 gold_t_131 xp_t_131 gold_t_132
##     1:     3295       3853     4396       1058      315       4164
##     2:     3182       1741     2035       2869     3085       2991
##     3:     1752       2990     2778       3428     3108       1781
##     4:     1718       2440     2247       1401     1461       3398
##     5:     3440       2654     2578       2900     2325       3900
##    ---                                                            
## 29970:     2580       2163     2098       2918     3697       1616
## 29971:     1580       3320     2547       2003     3170       1183
## 29972:     2828       1511     1447       3074     3318       2207
## 29973:     1513       2177     3310       2408     2145       1331
## 29974:     2444       2722     3020       4609     4428       2669
##        xp_t_132 radiant_win
##     1:     2124           1
##     2:     3320           0
##     3:     1783           1
##     4:     3935           1
##     5:     3068           1
##    ---                     
## 29970:     2372           0
## 29971:     1324           1
## 29972:     1765           1
## 29973:     1801           1
## 29974:     2740           0
## 
## $test
##        win_rate.Radiant sigma_win.Radiant mu_win.Radiant
##     1:               NA                NA             NA
##     2:               NA                NA             NA
##     3:        0.4851775      880955.04446       965039.4
##     4:        0.4851776      719296.55815      1286718.8
##     5:        0.5205993          47.48473           83.4
##    ---                                                  
## 20022:        0.4851777      880953.85773       643361.4
## 20023:        0.4851777      880951.11913       643364.4
## 20024:        0.4851762      719296.55815       321681.2
## 20025:        0.4851775           0.00000      1608398.0
## 20026:               NA                NA             NA
##        mu_matches.Radiant win_rate.Dire sigma_win.Dire mu_win.Dire
##     1:                 NA     0.4851775        0.00000   1608398.0
##     2:                 NA            NA             NA          NA
##     3:          1989044.0            NA             NA          NA
##     4:          2652057.2     0.4851763   719289.06741    321694.6
##     5:              160.2     0.4851811   880913.32756    643405.8
##    ---                                                            
## 20022:          1326032.6     0.4851782   880947.92408    965044.6
## 20023:          1326038.8     0.4851759   719291.97428    321689.4
## 20024:           663019.4     0.4548872       25.57733        24.2
## 20025:          3315071.0            NA             NA          NA
## 20026:                 NA            NA             NA          NA
##        mu_matches.Dire gold_t_0 xp_t_0 gold_t_1 xp_t_1 gold_t_2 xp_t_2
##     1:       3315071.0     2561   2460     2380   3033     2869   3230
##     2:              NA     1745   2100     2780   1935     1741   2781
##     3:              NA     3511   2872     2518   2231     1898   2097
##     4:        663046.8     3164   3540     1306   1343     1164   1222
##     5:       1326114.6     3427   4228     4708   4695     1735   1777
##    ---                                                                
## 20022:       1989051.8     2625   1977     2713   2789     2139   2955
## 20023:        663036.6     2850   2882     2986   4203     3133   3196
## 20024:            53.2     1448   1353     2548   3354     1768   1299
## 20025:              NA     3768   3699     1994   1722     1824   2227
## 20026:              NA     3777   3684     2272   2702     2848   2985
##        gold_t_3 xp_t_3 gold_t_4 xp_t_4 gold_t_128 xp_t_128 gold_t_129
##     1:     2033   2172     1044   1560       3448     3088       1992
##     2:     1839   1848     2689   3721       1619     1539       2820
##     3:     2940   3690     1260   1870       1683     1706       2286
##     4:     2680   3186     2468   2119       2975     3559       3245
##     5:     1331   1412     3757   2321       2190     1688       3454
##    ---                                                               
## 20022:     2011   2552     1625   2011       1924     2033       1111
## 20023:     1350   1469     3616   4510       2657     2264       2841
## 20024:     2949   2812     2715   3532       1530     1884       3637
## 20025:     2466   3298     2259   2578       3195     4067       2358
## 20026:     3714   3958     2957   2884       2228     2099       2624
##        xp_t_129 gold_t_130 xp_t_130 gold_t_131 xp_t_131 gold_t_132
##     1:     2529       3559     4642       1974     1786       1120
##     2:     3590       1136     1947       1178     1367       4242
##     3:     2605       3383     4211       2287     2069       1462
##     4:     2556       3412     4338       2043     3242       2622
##     5:     3653       2617     3410       1299     1135       1734
##    ---                                                            
## 20022:     1633       1752     2315       2998     4043       1267
## 20023:     3323       2310     1461       3133     2653       3184
## 20024:     4117       2432     2888       2252     2204       3664
## 20025:     2898       2034     1769       1935     1854       3502
## 20026:     3309       2693     2371       2134     2082       2396
##        xp_t_132 radiant_win
##     1:     1524           0
##     2:     3676           0
##     3:     1975           0
##     4:     2425           0
##     5:     2216           0
##    ---                     
## 20022:     1950           0
## 20023:     3699           1
## 20024:     3572           0
## 20025:     3658           0
## 20026:     3335           1
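split_df(ratio = 0.6) keeps 60% of the rows for training and holds out the rest. A base-R sketch of a comparable split, assuming a stratified random split by label so both parts keep a similar Radiant win rate; the helper split_60_40 is hypothetical and is not necessarily the sampling scheme split_df uses internally.

```r
# Sketch: stratified 60/40 train/test split by label (hypothetical helper).
split_60_40 <- function(df, y, ratio = 0.6, seed = 30) {
  set.seed(seed)  # note: resets the global RNG state
  # Sample within each label group so the win rate is similar in both parts
  idx_by_class <- split(seq_len(nrow(df)), df[[y]])
  train_idx <- unlist(lapply(idx_by_class, function(ix) {
    ix[sample.int(length(ix), size = round(ratio * length(ix)))]
  }))
  list(train = df[sort(train_idx), , drop = FALSE],
       test  = df[-train_idx, , drop = FALSE])
}

toy <- data.frame(x = 1:100, radiant_win = rep(c(0, 1), times = c(48, 52)))
parts <- split_60_40(toy, "radiant_win")
sapply(parts, nrow)                             # rows in each part
sapply(parts, function(d) mean(d$radiant_win))  # Radiant win rate in each part
```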

WOE binning

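# learn the bins from the training split only, so the test rows do not leak into the WOE values
bins = woebin(dt_list$train, y="radiant_win")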
## [INFO] creating woe binning ...

Convert the data to WOE values

dt_woe_list = lapply(dt_list, function(x) woebin_ply(x, bins))
## [INFO] converting into woe values ... 
## [INFO] converting into woe values ...
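woebin cuts each variable into bins, and woebin_ply replaces every raw value with the WOE of its bin. A simplified base-R sketch of those two steps, under stated assumptions: bins are equal-frequency quantile cuts (woebin's default is a tree-based method), the cut points and WOE values are learned on training data and then reused unchanged on new data, missing values get their own bin, and 0.5 is added to each count so empty bins stay finite. The helpers fit_woe and apply_woe are hypothetical.

```r
# Sketch: learn quantile bins + WOE on training data, reuse them on test data.
# Hypothetical helpers; equal-frequency bins stand in for woebin's tree-based default.
fit_woe <- function(x, label, n_bins = 4) {
  # Interior cut points from training quantiles; -Inf/Inf catch unseen extremes
  probs  <- seq(0, 1, length.out = n_bins + 1)[-c(1, n_bins + 1)]
  breaks <- unique(c(-Inf, quantile(x, probs, na.rm = TRUE, names = FALSE), Inf))
  bin <- addNA(cut(x, breaks), ifany = FALSE)   # extra level for missing values
  pos <- tapply(label == 1, bin, sum, default = 0)
  neg <- tapply(label == 0, bin, sum, default = 0)
  # +0.5 smoothing (an assumption of this sketch) keeps empty bins finite
  woe <- log(((pos + 0.5) / sum(pos + 0.5)) / ((neg + 0.5) / sum(neg + 0.5)))
  list(breaks = breaks, woe = as.vector(woe))
}

apply_woe <- function(x, fit) {
  bin <- addNA(cut(x, fit$breaks), ifany = FALSE)
  fit$woe[as.integer(bin)]   # look up each value's bin WOE by level position
}

set.seed(1)
train_gold <- c(rnorm(200, mean = 2500, sd = 500), NA)
train_win  <- c(rbinom(200, 1, plogis((train_gold[1:200] - 2500) / 500)), 1)
fit <- fit_woe(train_gold, train_win)
test_gold <- c(1800, 2500, 3400, NA)
apply_woe(test_gold, fit)   # higher gold maps to a higher WOE here
```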

Training the model

dt_woe_list$train$radiant_win <- as.factor(dt_woe_list$train$radiant_win)

dt_woe_list$test$radiant_win <- as.factor(dt_woe_list$test$radiant_win)


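# full logistic regression on all WOE variables
m1 = glm(radiant_win~ ., family = binomial(), data = dt_woe_list$train)
# stepwise selection in both directions, minimising AIC
m_step = step(m1, direction="both", trace = FALSE)
# refit the selected formula as a standalone model
m2 = eval(m_step$call)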
summary(m_step)
## 
## Call:
## glm(formula = radiant_win ~ win_rate.Radiant_woe + win_rate.Dire_woe + 
##     gold_t_0_woe + xp_t_0_woe + gold_t_1_woe + xp_t_1_woe + gold_t_2_woe + 
##     xp_t_2_woe + gold_t_3_woe + xp_t_3_woe + gold_t_4_woe + xp_t_4_woe + 
##     gold_t_128_woe + xp_t_128_woe + gold_t_129_woe + xp_t_129_woe + 
##     gold_t_130_woe + xp_t_130_woe + gold_t_131_woe + xp_t_131_woe + 
##     gold_t_132_woe + xp_t_132_woe, family = binomial(), data = dt_woe_list$train)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.5634  -1.0815   0.5589   1.0456   2.5391  
## 
## Coefficients:
##                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)           0.07416    0.01239   5.986 2.16e-09 ***
## win_rate.Radiant_woe  0.92490    0.15003   6.165 7.06e-10 ***
## win_rate.Dire_woe     0.80654    0.15385   5.242 1.59e-07 ***
## gold_t_0_woe          1.05940    0.07883  13.439  < 2e-16 ***
## xp_t_0_woe            0.53116    0.10441   5.087 3.63e-07 ***
## gold_t_1_woe          1.13579    0.08346  13.609  < 2e-16 ***
## xp_t_1_woe            0.55609    0.11313   4.915 8.86e-07 ***
## gold_t_2_woe          1.03457    0.08585  12.051  < 2e-16 ***
## xp_t_2_woe            0.67513    0.11087   6.089 1.13e-09 ***
## gold_t_3_woe          0.92653    0.08309  11.151  < 2e-16 ***
## xp_t_3_woe            0.64337    0.10425   6.172 6.76e-10 ***
## gold_t_4_woe          1.07672    0.08414  12.796  < 2e-16 ***
## xp_t_4_woe            0.50889    0.10364   4.910 9.10e-07 ***
## gold_t_128_woe        0.98453    0.07465  13.188  < 2e-16 ***
## xp_t_128_woe          0.65225    0.09427   6.919 4.56e-12 ***
## gold_t_129_woe        1.13322    0.08119  13.957  < 2e-16 ***
## xp_t_129_woe          0.44481    0.11304   3.935 8.32e-05 ***
## gold_t_130_woe        0.93795    0.08254  11.363  < 2e-16 ***
## xp_t_130_woe          0.72167    0.09541   7.564 3.91e-14 ***
## gold_t_131_woe        0.91588    0.08632  10.610  < 2e-16 ***
## xp_t_131_woe          0.68305    0.10549   6.475 9.49e-11 ***
## gold_t_132_woe        1.00633    0.08399  11.982  < 2e-16 ***
## xp_t_132_woe          0.61879    0.10536   5.873 4.27e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 41510  on 29973  degrees of freedom
## Residual deviance: 37430  on 29951  degrees of freedom
## AIC: 37476
## 
## Number of Fisher Scoring iterations: 4
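In this setup the model weights are the logistic-regression coefficients on the WOE variables, and step() decides which variables stay. A self-contained base-R sketch on simulated data (the variable names and effect sizes are made up) shows the same fit-then-step pattern and how to read the weights off the final model.

```r
# Sketch: logistic regression with stepwise AIC selection on simulated features.
# Simulated data: x_gold and x_xp carry signal, x_noise does not.
set.seed(42)
n <- 3000
sim <- data.frame(
  x_gold  = rnorm(n),
  x_xp    = rnorm(n),
  x_noise = rnorm(n)
)
lin <- 0.1 + 1.0 * sim$x_gold + 0.5 * sim$x_xp   # true log-odds
sim$radiant_win <- rbinom(n, 1, plogis(lin))

full <- glm(radiant_win ~ ., family = binomial(), data = sim)
# step() adds/drops terms to minimise AIC, as with step(m1, direction = "both")
final <- step(full, direction = "both", trace = FALSE)

coef(final)        # fitted weights: change in log-odds per unit of each feature
exp(coef(final))   # the same weights expressed as odds ratios
```

Because the WOE inputs in the article share a common scale, their coefficients can be compared directly as relative weights; with raw, unscaled inputs that comparison would not be meaningful.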

Model evaluation

pred_list = lapply(dt_woe_list, function(x) predict(m2, x, type='response'))
## performance

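# go through character so that factor labels "0"/"1" become 0/1 rather than level codes 1/2
label_list$train <- as.numeric(as.character(label_list$train))
label_list$test <- as.numeric(as.character(label_list$test))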
pred_list$test<- as.numeric(pred_list$test)

# performance on the training split
perf = scorecard::perf_eva(pred = pred_list$train, label = label_list$train,show_plot =  c('ks', 'lift', 'gain', 'roc', 'lz', 'pr', 'f1', 'density'),confusion_matrix = T)
## [INFO] The threshold of confusion matrix is 0.3441.

The in-game model built on the first 10 minutes predicts somewhat worse: on the training set its KS is 0.30 and its AUC is 0.7044.
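For reference, the two numbers reported above can be computed directly. A base-R sketch: KS is the largest gap between the cumulative share of wins and the cumulative share of losses as the score threshold moves, and AUC is the probability that a randomly chosen Radiant win scores higher than a randomly chosen loss (computed here with the rank formula). The helpers ks_stat and auc_rank are hypothetical; scorecard::perf_eva may differ slightly in tie handling.

```r
# Sketch: KS and AUC from predicted probabilities and 0/1 labels.
ks_stat <- function(pred, label) {
  ord <- order(pred, decreasing = TRUE)
  p <- pred[ord]
  y <- label[ord]
  tpr <- cumsum(y == 1) / sum(y == 1)   # share of wins scored at or above each cut
  fpr <- cumsum(y == 0) / sum(y == 0)   # share of losses scored at or above each cut
  last_of_tie <- c(p[-1] != p[-length(p)], TRUE)  # evaluate once per distinct score
  max(abs(tpr - fpr)[last_of_tie])
}

auc_rank <- function(pred, label) {
  r  <- rank(pred)                      # average ranks, so ties count as 1/2
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

pred  <- c(0.10, 0.40, 0.35, 0.80)
label <- c(0, 0, 1, 1)
ks_stat(pred, label)   # 0.5
auc_rank(pred, label)  # 0.75
```

Running the same two functions on pred_list$test and label_list$test gives the held-out figures to compare with the training ones.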

Summary

  1. List the model's key feature library (explain the extraction method and rationale)

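There are two kinds of prediction model. (1) Prediction before the match: only historical data is available at this point, such as each player's historical win rate and historical average kills, gold earned, deaths, and assists. (2) Prediction during the match: this requires choosing a point in time. Based on the distribution of match durations (the average game lasts 41 minutes), the 10-minute mark is a reasonable choice, and the match state at 10 minutes, such as each position's gold, experience, kills, deaths, and assists, is used as the features for modeling.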
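For the pre-game model, each player's history has to be rolled up to the team level before it can be used as a feature. A minimal base-R sketch, assuming a hypothetical per-player history table (account_id, matches, wins, avg_kills) and a hypothetical lineup table listing each match's Radiant and Dire players; these column names are illustrative, not the Kaggle schema. Players without history are skipped when averaging, which is one possible choice among several.

```r
# Sketch: roll hypothetical per-player history up to team-level pre-game features.
history <- data.frame(
  account_id = c(101, 102, 103, 201, 202),
  matches    = c(50, 20, 10, 40, 30),
  wins       = c(30, 8, 6, 18, 12),
  avg_kills  = c(7.2, 4.1, 5.5, 6.0, 3.9)
)
lineup <- data.frame(
  match_id   = rep(1, 6),
  side       = rep(c("Radiant", "Dire"), each = 3),
  account_id = c(101, 102, 103, 201, 202, 999)   # 999 has no history
)

history$win_rate <- history$wins / history$matches
joined <- merge(lineup, history, by = "account_id", all.x = TRUE)

# Team mean of each historical stat, ignoring players without history
team <- aggregate(cbind(win_rate, avg_kills) ~ match_id + side,
                  data = joined, FUN = mean, na.action = na.omit)

# One row per match, with columns such as win_rate.Radiant and win_rate.Dire
wide <- reshape(team, idvar = "match_id", timevar = "side", direction = "wide")
wide$win_rate_diff <- wide$win_rate.Radiant - wide$win_rate.Dire
wide
```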

  2. How to build the model and determine the weights (explain the method)

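Once the data is ready, the next step is to determine how each feature relates to the outcome: some features have predictive power and some do not. Gold and experience clearly do, which is easy to understand, since the side with more gold and experience has the advantage; kill count is another. There are several ways to select features. This article uses information value (IV), though other metrics exist, and machine-learning methods can also be used for feature selection. The weights are then the coefficients of a logistic regression fitted on the WOE-transformed features, with stepwise selection by AIC deciding which features remain in the model.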
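Written out, a scorecard-style model of the kind used here has the form below, where WOE_j(x_j) is the WOE of the bin that feature x_j falls into and the coefficients β_j are the weights estimated by logistic regression. The second line is the IV criterion used to screen features, with p_{ij} and n_{ij} the shares of Radiant wins and Radiant losses that fall into bin i of feature j (the same convention the WOE is computed with).

$$
\log\frac{P(\text{radiant\_win}=1)}{1-P(\text{radiant\_win}=1)} = \beta_0 + \sum_{j}\beta_j\,\mathrm{WOE}_j(x_j),
\qquad
\mathrm{WOE}_{ij} = \ln\frac{p_{ij}}{n_{ij}}
$$

$$
\mathrm{IV}_j = \sum_{i}\left(p_{ij} - n_{ij}\right)\ln\frac{p_{ij}}{n_{ij}}
$$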

  3. How to train and validate (explain the approach)

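Split the historical match data into two parts, one for building the model and one for testing it (here 60% for training and 40% for testing).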
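A minimal sketch of that holdout check in base R, on simulated data (the feature names gold_diff and xp_diff and their effect sizes are made up): fit on the 60% training part, then compare AUC on the training and test parts. A test AUC close to the training AUC suggests the model is not badly overfitting, while a large drop suggests it is. The auc_rank helper is a hypothetical rank-based AUC, not a scorecard function.

```r
# Sketch: holdout validation, fit on train, compare train vs test AUC.
auc_rank <- function(pred, label) {
  r  <- rank(pred)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

set.seed(7)
n <- 5000
sim <- data.frame(gold_diff = rnorm(n), xp_diff = rnorm(n))
sim$radiant_win <- rbinom(n, 1, plogis(0.8 * sim$gold_diff + 0.4 * sim$xp_diff))

train_idx <- sample.int(n, size = round(0.6 * n))   # 60% train, 40% test
train <- sim[train_idx, ]
test  <- sim[-train_idx, ]

fit <- glm(radiant_win ~ gold_diff + xp_diff, family = binomial(), data = train)
auc_train <- auc_rank(predict(fit, train, type = "response"), train$radiant_win)
auc_test  <- auc_rank(predict(fit, test,  type = "response"), test$radiant_win)
c(train = auc_train, test = auc_test)
```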

Key points:

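  1. The raw data cannot be modeled directly. It first has to be converted into a form suitable for modeling, which involves a great deal of data processing, transformation, and feature construction.
  2. Analyze the importance of each feature and keep only the important ones for modeling.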