データクレンジング
> jra <- read.csv("jra_race_result.csv")
> str(jra)
'data.frame': 198401 obs. of 30 variables:
$ 開催日 : Factor w/ 426 levels "2013-01-05","2013-01-06",..: 1 1 1 1 1 1 1 1 1 1 ...
$ 競馬場 : Factor w/ 10 levels "京都","阪神",..: 7 7 7 7 7 7 7 7 7 7 ...
$ レース番号 : int 1 1 1 1 1 1 1 1 1 1 ...
$ レース名 : Factor w/ 1130 levels "1932-1950sダービーメモリーズトキノミノルカップ",..: 152 152 152 152 152 152 152 152 152 152 ...
$ コース : Factor w/ 3 levels "ダート","芝",..: 1 1 1 1 1 1 1 1 1 1 ...
$ 周回 : Factor w/ 8 levels "右","右2周","左",..: 1 1 1 1 1 1 1 1 1 1 ...
$ 距離 : int 1200 1200 1200 1200 1200 1200 1200 1200 1200 1200 ...
$ 馬場状態 : Factor w/ 4 levels "重","不良","良",..: 3 3 3 3 3 3 3 3 3 3 ...
$ 賞金 : num 500 200 125 75 50 0 0 0 0 0 ...
$ 頭数 : int 16 16 16 16 16 16 16 16 16 16 ...
$ 着順 : int 1 2 3 4 5 6 7 8 9 10 ...
$ 枠番 : int 6 2 7 1 5 5 3 2 4 3 ...
$ 馬番 : int 12 4 13 1 10 9 6 3 7 5 ...
$ 馬名 : Factor w/ 24373 levels "アーカイブ","アークアーセナル",..: 22934 350 2327 14213 5678 1554 18783 14065 6986 9814 ...
$ 性別 : Factor w/ 3 levels "セ","牡","牝": 2 3 3 2 2 2 2 2 3 3 ...
$ 年齢 : int 3 3 3 3 3 3 3 3 3 3 ...
$ 騎手 : Factor w/ 257 levels "A.アッゼニ","A.シュタルケ",..: 75 168 74 61 140 135 240 88 136 90 ...
$ タイム : num 73.6 73.6 73.7 74 74.3 74.3 74.3 74.4 74.6 74.8 ...
$ 着差 : Factor w/ 22 levels "","1 1/2馬身",..: 1 20 20 4 4 20 19 11 3 3 ...
$ 通過順 : Factor w/ 8698 levels "","01-01","01-01-01",..: 2 233 2279 1201 6226 4370 233 7299 4182 6226 ...
$ 上り3F : num 39.4 39.3 39.1 39.5 38.9 39.4 40 38.8 39.7 39.4 ...
$ 斤量 : num 56 54 54 56 56 56 53 56 54 54 ...
$ 馬体重 : int 484 454 396 484 462 474 506 448 448 476 ...
$ 増減 : int 2 -2 0 6 2 8 -4 8 -4 -2 ...
$ 人気 : int 1 7 5 2 3 6 10 14 11 4 ...
$ オッズ : num 1.9 44.1 11.8 5.3 6.1 ...
$ ブリンカー : Factor w/ 2 levels "","B": 1 1 1 1 1 1 1 1 1 1 ...
$ 調教師 : Factor w/ 321 levels "C.チャン","C.ファウンズ",..: 120 204 57 27 133 64 256 77 221 282 ...
$ 調教コメント: Factor w/ 181 levels "nan","いま一息",..: 46 102 46 74 41 9 57 180 95 13 ...
$ 調教評価 : Factor w/ 5 levels "A","B","C","D",..: 2 3 2 2 2 3 3 3 3 3 ...
> # install.packages("lubridate")
> library(lubridate)
> jra$date <- ymd(jra$開催日)
> # install.packages("BH")
> # install.packages("dplyr")
> library(dplyr)
> n.t.2500 <- filter(jra,競馬場=="中山") %>% filter(コース=="芝") %>% filter(距離=="2500")
>
> n.t.2500$isIn2nd <- 0
> n.t.2500$isIn2nd[which(n.t.2500$着順 %in% 1:2)] <- 1
>
> n <- nrow(n.t.2500)
>
> UmaNyusyoRatio <- c()
> KisyuNyusyoRatio <- c()
> for(i in 1:n){
+ uma <- n.t.2500$馬名[i]
+ kishu <- n.t.2500$騎手[i]
+ time <- n.t.2500$date[i]
+ d1 <- filter(jra,馬名==uma)
+ p <- nrow(d1)
+ if(length(which((time - d1$date)>0))>0){
+ d2 <- d1[which((time - d1$date)>0),]
+ UmaNyusyoRatio[i] <- sum(d2$着順 %in% 1:2)/p
+ }
+ d3 <- filter(jra,騎手==kishu)
+ p <- nrow(d3)
+ if(length(which((time - d3$date)>0))>0){
+ d4 <- d3[which((time - d3$date)>0),]
+ KisyuNyusyoRatio[i] <- sum(d4$着順 %in% 1:2)/p
+ }
+ }
ロジスティック回帰
> dat <- data.frame(isIn2nd=n.t.2500$isIn2nd, UmaNyusyoRatio,KisyuNyusyoRatio)
> glm(isIn2nd ~ UmaNyusyoRatio + KisyuNyusyoRatio, family = binomial,data=dat)
Call: glm(formula = isIn2nd ~ UmaNyusyoRatio + KisyuNyusyoRatio, family = binomial,
data = dat)
Coefficients:
(Intercept) UmaNyusyoRatio KisyuNyusyoRatio
-2.421 2.054 3.999
Degrees of Freedom: 408 Total (i.e. Null); 406 Residual
(20 observations deleted due to missingness)
Null Deviance: 344.6
Residual Deviance: 334.5 AIC: 340.5
有馬記念で