Compiling interleague batting statistics

We scrape the interleague (交流戦) batting statistics from データで楽しむプロ野球 (http://baseballdata.jp/mm/ctop.html) and compile them into a single table.

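The scraping itself uses the XML package. dplyr and data.table are loaded as well, for the %>% pipe and for setnames().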

library(XML)
library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##      filter, lag 
## 
## The following objects are masked from 'package:base':
## 
##      intersect, setdiff, setequal, union
library(data.table)
## 
## Attaching package: 'data.table'
## 
## The following object is masked from 'package:dplyr':
## 
##      last
dat = readHTMLTable('http://baseballdata.jp/mm/ctop.html')

## Inspect the contents
str(dat)
## List of 1
##  $ NULL:'data.frame':    60 obs. of  40 variables:
##   ..$ V1 : Factor w/ 54 levels "1","10","11",..: 54 1 8 19 19 40 50 51 52 53 ...
##   ..$ V2 : Factor w/ 13 levels "D","オ","ソ",..: 7 4 3 3 3 12 10 12 9 5 ...
##   ..$ V3 : Factor w/ 59 levels "T-岡田","エルドレッド",..: 38 28 55 57 43 12 48 41 21 18 ...
##   ..$ V4 : Factor w/ 49 levels ".183",".184",..: 49 48 47 46 46 45 44 43 42 41 ...
##   ..$ V5 : Factor w/ 21 levels "10","11","12",..: 21 6 12 13 7 6 7 18 18 7 ...
##   ..$ V6 : Factor w/ 9 levels "0","1","2","3",..: 9 5 4 7 3 3 4 1 1 2 ...
##   ..$ V7 : Factor w/ 21 levels "11","15","16",..: 21 20 18 20 20 18 18 19 17 13 ...
##   ..$ V8 : Factor w/ 24 levels "10","11","12",..: 24 15 16 17 18 14 14 19 16 11 ...
##   ..$ V9 : Factor w/ 11 levels "0","1","2","2塁打",..: 4 10 6 6 6 7 9 3 8 7 ...
##   ..$ V10: Factor w/ 5 levels "0","1","2","3",..: 5 2 2 1 2 4 1 2 2 2 ...
##   ..$ V11: Factor w/ 40 levels ".000",".100",..: 40 24 32 16 31 23 36 15 5 7 ...
##   ..$ V12: Factor w/ 51 levels ".234",".250",..: 51 49 50 42 40 48 45 38 33 47 ...
##   ..$ V13: Factor w/ 54 levels ".244",".269",..: 54 52 47 51 44 49 46 22 36 40 ...
##   ..$ V14: Factor w/ 56 levels ".535",".536",..: 56 55 53 51 44 52 49 31 39 45 ...
##   ..$ V15: Factor w/ 23 levels "10","11","13",..: 23 11 16 20 13 14 21 10 5 16 ...
##   ..$ V16: Factor w/ 13 levels "10","12","13",..: 13 12 3 12 12 3 3 10 9 1 ...
##   ..$ V17: Factor w/ 43 levels ".111",".158",..: 43 37 40 23 33 41 36 29 33 32 ...
##   ..$ V18: Factor w/ 30 levels "22","28","29",..: 30 17 20 23 26 21 17 28 12 12 ...
##   ..$ V19: Factor w/ 16 levels "10","11","12",..: 16 2 4 9 8 7 6 8 4 3 ...
##   ..$ V20: Factor w/ 52 levels ".125",".146",..: 52 20 25 47 40 44 41 39 42 34 ...
##   ..$ V21: Factor w/ 7 levels "0","1","2","3",..: 7 2 1 6 2 3 2 1 1 2 ...
##   ..$ V22: Factor w/ 5 levels "21","22","23",..: 5 4 4 4 2 4 4 4 4 4 ...
##   ..$ V23: Factor w/ 28 levels "100","102","103",..: 28 13 11 9 7 6 6 8 3 27 ...
##   ..$ V24: Factor w/ 30 levels "100","102","103",..: 30 28 20 1 1 21 23 3 23 12 ...
##   ..$ V25: Factor w/ 19 levels "10","11","12",..: 19 8 10 4 7 6 7 9 2 7 ...
##   ..$ V26: Factor w/ 18 levels "1","10","11",..: 18 7 8 16 13 4 4 13 11 6 ...
##   ..$ V27: Factor w/ 5 levels "0","1","2","3",..: 5 2 3 2 1 3 1 2 2 3 ...
##   ..$ V28: Factor w/ 13 levels "0","1","10","11",..: 13 6 4 1 7 7 7 3 9 7 ...
##   ..$ V29: Factor w/ 11 levels "0","1","11","2",..: 11 4 10 1 5 5 4 8 6 5 ...
##   ..$ V30: Factor w/ 15 levels ".---",".000",..: 15 14 10 1 14 14 6 5 9 14 ...
##   ..$ V31: Factor w/ 12 levels "0","1","10","2",..: 12 2 1 1 2 1 1 2 9 5 ...
##   ..$ V32: Factor w/ 10 levels "0","1","10","2",..: 10 2 1 1 2 1 1 1 7 5 ...
##   ..$ V33: Factor w/ 9 levels ".---",".000",..: 9 3 1 1 3 1 1 2 3 3 ...
##   ..$ V34: Factor w/ 5 levels "0","1","2","3",..: 5 2 3 2 2 1 2 1 1 1 ...
##   ..$ V35: Factor w/ 5 levels "0","1","2","3",..: 5 1 1 1 2 1 1 1 1 2 ...
##   ..$ V36: Factor w/ 4 levels "0","1","2","代打安打": 4 1 1 1 2 1 1 1 1 1 ...
##   ..$ V37: Factor w/ 6 levels ".---",".000",..: 6 1 1 1 5 1 1 1 1 2 ...
##   ..$ V38: Factor w/ 8 levels "0","1","2","3",..: 8 1 1 4 1 2 3 1 1 1 ...
##   ..$ V39: Factor w/ 6 levels "0","1","2","3",..: 6 5 1 1 2 3 2 2 5 1 ...
##   ..$ V40: Factor w/ 20 levels "10","11","12",..: 20 4 12 6 1 8 6 19 1 1 ...
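
The structure above (a list holding one data frame, every column a factor) can be reproduced offline. The following is only a minimal sketch: the inline HTML and its values are made up, and `header = FALSE` is passed explicitly so that the first row stays in the data, as it did for the real page. `which = 1` asks for the first table directly, and `stringsAsFactors = FALSE` keeps the cells as character strings.

```r
library(XML)

# A tiny inline HTML table standing in for the real page (made-up content).
# The header is written as an ordinary first row, so it ends up as data.
html <- paste0(
  "<html><body><table>",
  "<tr><td>rank</td><td>team</td><td>avg</td></tr>",
  "<tr><td>1</td><td>A</td><td>.378</td></tr>",
  "<tr><td>2</td><td>B</td><td>.371</td></tr>",
  "</table></body></html>"
)
doc <- htmlParse(html, asText = TRUE)

# which = 1: return the first table as a data frame instead of a list.
# header = FALSE: do not treat the first row as column names.
# stringsAsFactors = FALSE: keep cells as character, not factor.
tab <- readHTMLTable(doc, header = FALSE, which = 1, stringsAsFactors = FALSE)
str(tab)
```

With character columns there is no need to go through factor levels later, which may be more convenient than the default used in this post.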

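The data came through. Handy. But it cannot be used as-is: the columns are named V1 to V40, and the real column names sit in the first row as data. Let's tidy that up.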

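## readHTMLTable() returned a list; take out its only element, named "NULL"
dat = dat$"NULL"
## Grab the column names from the first row
dat_names = dat[1,] %>% unlist %>% as.character
## Assign the column names (setnames() modifies dat in place)
dat %>% setnames(dat_names)
## Final version: drop the first and last rows, which are not needed
dat_interleague = dat[2:59,]
## Check the contents
dat_interleague %>% head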
##   順位 球団     選手名 打率 打点 本塁打 安打数 単打 2塁打 3塁打 最近5試合
## 2    1   ヤ 山田 哲人 .378   15      4     37   24     8     1      .313
## 3    2   ソ 柳田 悠岐 .371   20      3     33   25     4     1      .412
## 4    3   ソ   李 大浩 .370   21      6     37   27     4     0      .250
## 5    3   ソ   中村 晃 .370   16      2     37   30     4     1      .409
## 6    5   中       ルナ .367   15      2     33   23     5     3      .300
## 7    6   阪   鳥谷 敬 .359   16      3     33   23     7     0      .450
##   出塁率 長打率   OPS 得点圏打数 得点圏安打 得点圏打率 UC打数 UC安打 UC率
## 2   .470   .602 1.072         22          9       .409     44     11 .250
## 3   .482   .539 1.021         27         13       .481     47     13 .277
## 4   .418   .590 1.008         31          9       .290     50     19 .380
## 5   .396   .490  .886         24          9       .375     53     18 .340
## 6   .462   .556 1.018         25         13       .520     48     17 .354
## 7   .434   .533  .967         33         13       .394     44     15 .341
##   UC本塁打 試合数 打席数 打数 得点 四球 死球 企盗塁 盗塁 盗塁成功率 企犠打
## 2        1     24    118   98   18   15    1      2    2      1.000      1
## 3        0     24    112   89   22   16    2     11    9       .818      0
## 4        5     24    110  100   13    8    1      0    0       .---      0
## 5        1     22    107  100   17    5    0      3    3      1.000      1
## 6        2     24    106   90   16   12    2      3    3      1.000      0
## 7        1     24    106   92   17   12    0      3    2       .667      0
##   犠打 犠打成功率 犠飛 代打数 代打安打 代打率 併殺 失策 三振
## 2    1      1.000    1      0        0   .---    0    4   14
## 3    0       .---    2      0        0   .---    0    0   23
## 4    0       .---    1      0        0   .---    3    0   16
## 5    1      1.000    1      1        1  1.000    0    1   10
## 6    0       .---    0      0        0   .---    1    2   18
## 7    0       .---    1      0        0   .---    2    1   16
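
The same header promotion can be written without hard-coding the row range `2:59`, which would have to change whenever the number of listed players changes. A minimal sketch on a toy data frame follows. The object names `raw` and `body` are made up, and the content of the unwanted last row is an assumption (here it simply repeats the header).

```r
# Toy table shaped like the scraped one: generic V* names, header in row 1,
# and a trailing row we do not want (assumed here to repeat the header).
raw <- data.frame(
  V1 = c("順位", "1", "2", "順位"),
  V2 = c("球団", "ヤ", "ソ", "球団"),
  V3 = c("打率", ".378", ".371", "打率"),
  stringsAsFactors = FALSE
)

# Take the column names from the first row.
names(raw) <- unlist(raw[1, ], use.names = FALSE)

# Keep everything between the first and last rows, whatever the row count.
body <- raw[-c(1, nrow(raw)), ]
rownames(body) <- NULL
print(body)
```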

Done.
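
One caveat: the columns of dat_interleague are still factors, as the str() output earlier shows, so they need converting before any arithmetic. Calling as.numeric() directly on a factor returns its internal level codes, not the values, so the conversion has to go through as.character(). A minimal sketch on a toy data frame follows. The English column names and the helper `to_number` are made up for illustration, and the values are copied from the first three rows of the output above.

```r
# Toy stand-in for three scraped columns (batting average, home runs,
# stolen-base success rate), stored as factors like the real data.
toy <- data.frame(
  avg     = factor(c(".378", ".371", ".370")),
  hr      = factor(c("4", "3", "6")),
  sb_rate = factor(c("1.000", ".818", ".---"))
)

# Convert a factor (or character) column to numeric via its labels.
# ".---" marks a rate with no attempts, so it is turned into NA.
to_number <- function(x) {
  x <- as.character(x)
  x[x == ".---"] <- NA
  as.numeric(x)
}

toy_num <- as.data.frame(lapply(toy, to_number))
str(toy_num)
```

Text columns such as the team and player names would of course be left out of such a conversion.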

That's all. Scraping with the XML package takes very little effort.