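This post scrapes the interleague batting statistics from データで楽しむプロ野球 (http://baseballdata.jp/mm/ctop.html) and tidies them into a table.
We use the XML package to read the page, with dplyr and data.table for the cleanup.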
library(XML)
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(data.table)
##
## Attaching package: 'data.table'
##
## The following objects are masked from 'package:dplyr':
##
## last
dat = readHTMLTable('http://baseballdata.jp/mm/ctop.html')
## Inspect the contents
str(dat)
## List of 1
## $ NULL:'data.frame': 60 obs. of 40 variables:
## ..$ V1 : Factor w/ 54 levels "1","10","11",..: 54 1 8 19 19 40 50 51 52 53 ...
## ..$ V2 : Factor w/ 13 levels "D","オ","ソ",..: 7 4 3 3 3 12 10 12 9 5 ...
## ..$ V3 : Factor w/ 59 levels "T-岡田","エルドレッド",..: 38 28 55 57 43 12 48 41 21 18 ...
## ..$ V4 : Factor w/ 49 levels ".183",".184",..: 49 48 47 46 46 45 44 43 42 41 ...
## ..$ V5 : Factor w/ 21 levels "10","11","12",..: 21 6 12 13 7 6 7 18 18 7 ...
## ..$ V6 : Factor w/ 9 levels "0","1","2","3",..: 9 5 4 7 3 3 4 1 1 2 ...
## ..$ V7 : Factor w/ 21 levels "11","15","16",..: 21 20 18 20 20 18 18 19 17 13 ...
## ..$ V8 : Factor w/ 24 levels "10","11","12",..: 24 15 16 17 18 14 14 19 16 11 ...
## ..$ V9 : Factor w/ 11 levels "0","1","2","2塁打",..: 4 10 6 6 6 7 9 3 8 7 ...
## ..$ V10: Factor w/ 5 levels "0","1","2","3",..: 5 2 2 1 2 4 1 2 2 2 ...
## ..$ V11: Factor w/ 40 levels ".000",".100",..: 40 24 32 16 31 23 36 15 5 7 ...
## ..$ V12: Factor w/ 51 levels ".234",".250",..: 51 49 50 42 40 48 45 38 33 47 ...
## ..$ V13: Factor w/ 54 levels ".244",".269",..: 54 52 47 51 44 49 46 22 36 40 ...
## ..$ V14: Factor w/ 56 levels ".535",".536",..: 56 55 53 51 44 52 49 31 39 45 ...
## ..$ V15: Factor w/ 23 levels "10","11","13",..: 23 11 16 20 13 14 21 10 5 16 ...
## ..$ V16: Factor w/ 13 levels "10","12","13",..: 13 12 3 12 12 3 3 10 9 1 ...
## ..$ V17: Factor w/ 43 levels ".111",".158",..: 43 37 40 23 33 41 36 29 33 32 ...
## ..$ V18: Factor w/ 30 levels "22","28","29",..: 30 17 20 23 26 21 17 28 12 12 ...
## ..$ V19: Factor w/ 16 levels "10","11","12",..: 16 2 4 9 8 7 6 8 4 3 ...
## ..$ V20: Factor w/ 52 levels ".125",".146",..: 52 20 25 47 40 44 41 39 42 34 ...
## ..$ V21: Factor w/ 7 levels "0","1","2","3",..: 7 2 1 6 2 3 2 1 1 2 ...
## ..$ V22: Factor w/ 5 levels "21","22","23",..: 5 4 4 4 2 4 4 4 4 4 ...
## ..$ V23: Factor w/ 28 levels "100","102","103",..: 28 13 11 9 7 6 6 8 3 27 ...
## ..$ V24: Factor w/ 30 levels "100","102","103",..: 30 28 20 1 1 21 23 3 23 12 ...
## ..$ V25: Factor w/ 19 levels "10","11","12",..: 19 8 10 4 7 6 7 9 2 7 ...
## ..$ V26: Factor w/ 18 levels "1","10","11",..: 18 7 8 16 13 4 4 13 11 6 ...
## ..$ V27: Factor w/ 5 levels "0","1","2","3",..: 5 2 3 2 1 3 1 2 2 3 ...
## ..$ V28: Factor w/ 13 levels "0","1","10","11",..: 13 6 4 1 7 7 7 3 9 7 ...
## ..$ V29: Factor w/ 11 levels "0","1","11","2",..: 11 4 10 1 5 5 4 8 6 5 ...
## ..$ V30: Factor w/ 15 levels ".---",".000",..: 15 14 10 1 14 14 6 5 9 14 ...
## ..$ V31: Factor w/ 12 levels "0","1","10","2",..: 12 2 1 1 2 1 1 2 9 5 ...
## ..$ V32: Factor w/ 10 levels "0","1","10","2",..: 10 2 1 1 2 1 1 1 7 5 ...
## ..$ V33: Factor w/ 9 levels ".---",".000",..: 9 3 1 1 3 1 1 2 3 3 ...
## ..$ V34: Factor w/ 5 levels "0","1","2","3",..: 5 2 3 2 2 1 2 1 1 1 ...
## ..$ V35: Factor w/ 5 levels "0","1","2","3",..: 5 1 1 1 2 1 1 1 1 2 ...
## ..$ V36: Factor w/ 4 levels "0","1","2","代打安打": 4 1 1 1 2 1 1 1 1 1 ...
## ..$ V37: Factor w/ 6 levels ".---",".000",..: 6 1 1 1 5 1 1 1 1 2 ...
## ..$ V38: Factor w/ 8 levels "0","1","2","3",..: 8 1 1 4 1 2 3 1 1 1 ...
## ..$ V39: Factor w/ 6 levels "0","1","2","3",..: 6 5 1 1 2 3 2 2 5 1 ...
## ..$ V40: Factor w/ 20 levels "10","11","12",..: 20 4 12 6 1 8 6 19 1 1 ...
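As the str() output shows, readHTMLTable returns a list of tables, and here every column is a factor. The following is a minimal offline sketch of two arguments that may reduce the cleanup: `which` to pick one table directly and `stringsAsFactors = FALSE` to keep cells as character strings. The inline HTML and the names `html_text`, `doc`, and `tbl` are hypothetical stand-ins, and the toy table has a clean header row, which the real page apparently does not, so this has not been checked against baseballdata.jp.

```r
library(XML)

# Toy HTML standing in for a scraped page (made-up values, not from the site)
html_text <- paste0(
  "<html><body><table>",
  "<tr><th>team</th><th>player</th><th>avg</th></tr>",
  "<tr><td>A</td><td>x</td><td>.300</td></tr>",
  "<tr><td>B</td><td>y</td><td>.250</td></tr>",
  "</table></body></html>"
)

# Parse the string as HTML instead of fetching a URL
doc <- htmlParse(html_text, asText = TRUE)

# which = 1 returns the first table as a data frame rather than a list;
# stringsAsFactors = FALSE keeps the cells as character strings
tbl <- readHTMLTable(doc, which = 1, stringsAsFactors = FALSE)
str(tbl)
```

With character columns, later steps such as turning a row into column names no longer have to go through factor levels.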
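The data came through, which is convenient. It is not usable as is, though: the columns are named V1 to V40, and the real header sits in the first data row. Let's clean that up.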
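## readHTMLTable returned a list whose only element is named "NULL"; extract it
dat = dat$"NULL"
## Take the column names from the first row
dat_names = dat[1,] %>% unlist %>% as.character
## Assign the column names (setnames modifies dat in place)
dat %>% setnames(dat_names)
## Final version: the first row (the header) and the last row are not needed
dat_interleague = dat[2:59,]
## Check the contents
dat_interleague %>% head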
## 順位 球団 選手名 打率 打点 本塁打 安打数 単打 2塁打 3塁打 最近5試合
## 2 1 ヤ 山田 哲人 .378 15 4 37 24 8 1 .313
## 3 2 ソ 柳田 悠岐 .371 20 3 33 25 4 1 .412
## 4 3 ソ 李 大浩 .370 21 6 37 27 4 0 .250
## 5 3 ソ 中村 晃 .370 16 2 37 30 4 1 .409
## 6 5 中 ルナ .367 15 2 33 23 5 3 .300
## 7 6 阪 鳥谷 敬 .359 16 3 33 23 7 0 .450
## 出塁率 長打率 OPS 得点圏打数 得点圏安打 得点圏打率 UC打数 UC安打 UC率
## 2 .470 .602 1.072 22 9 .409 44 11 .250
## 3 .482 .539 1.021 27 13 .481 47 13 .277
## 4 .418 .590 1.008 31 9 .290 50 19 .380
## 5 .396 .490 .886 24 9 .375 53 18 .340
## 6 .462 .556 1.018 25 13 .520 48 17 .354
## 7 .434 .533 .967 33 13 .394 44 15 .341
## UC本塁打 試合数 打席数 打数 得点 四球 死球 企盗塁 盗塁 盗塁成功率 企犠打
## 2 1 24 118 98 18 15 1 2 2 1.000 1
## 3 0 24 112 89 22 16 2 11 9 .818 0
## 4 5 24 110 100 13 8 1 0 0 .--- 0
## 5 1 22 107 100 17 5 0 3 3 1.000 1
## 6 2 24 106 90 16 12 2 3 3 1.000 0
## 7 1 24 106 92 17 12 0 3 2 .667 0
## 犠打 犠打成功率 犠飛 代打数 代打安打 代打率 併殺 失策 三振
## 2 1 1.000 1 0 0 .--- 0 4 14
## 3 0 .--- 2 0 0 .--- 0 0 23
## 4 0 .--- 1 0 0 .--- 3 0 16
## 5 1 1.000 1 1 1 1.000 0 1 10
## 6 0 .--- 0 0 0 .--- 1 2 18
## 7 0 .--- 1 0 0 .--- 2 1 16
Done.
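One caveat: the columns of dat_interleague are still factors, so values such as 打率 need converting before any arithmetic. Below is a minimal sketch of that conversion on a toy data frame. The values are copied from the first three rows of the output above, while the English column names and the helper `to_numeric` are hypothetical and not part of the original code.

```r
# Toy stand-in for three columns of dat_interleague
# (打率, 盗塁成功率, 本塁打), stored as factors like the scraped table
toy <- data.frame(
  avg     = factor(c(".378", ".371", ".370")),
  sb_rate = factor(c("1.000", ".818", ".---")),
  hr      = factor(c("4", "3", "6"))
)

# Hypothetical helper: factor -> character -> numeric.
# Going through as.character() matters, because as.numeric() applied
# directly to a factor returns the internal level codes.
# Placeholders such as ".---" become NA; the coercion warning is suppressed.
to_numeric <- function(x) suppressWarnings(as.numeric(as.character(x)))

toy_num <- as.data.frame(lapply(toy, to_numeric))
str(toy_num)
```

Applying the same helper to the real table would need to skip the text columns (球団, 選手名), which would otherwise turn into NA.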
That's all. Scraping with the XML package is easy.