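The lottery market keeps gaining market share in mainland China and worldwide, and more and more gamblers enjoy buying lottery tickets. In this article I take a mainland Chinese lottery, Fucai 3D (福彩3D), as the data set and examine whether it is possible to profit from it.
The Massachusetts Cash Winfall lottery had a loophole! The article explains:
Cash Winfall has a special rule: when the jackpot reaches US$2 million and nobody claims it, the smaller prizes grow; for example, the lowest prize can rise from US$5 to several tens of dollars. Statisticians say that anyone who buys more than US$100,000 worth of tickets is guaranteed to make a profit. So every few months, once the jackpot has accumulated to US$2 million, small groups of well-trained, savvy gamblers, some from MIT and some from Northeastern University, briefly take control of the whole game. They spend several hundred thousand dollars on tickets and take in several hundred thousand dollars each time, and that figure does not count the chance of winning the jackpot. Massachusetts lottery officials are now starting to restrict bulk purchases, for example by limiting each lottery outlet to US$5,000 of ticket sales per day…
Set up the environment and load the packages.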
## Remove all objects (includes hidden objects).
rm(list = ls(all = TRUE))
options(warn = -1) #Do not display warning messages.
suppressPackageStartupMessages(library('BBmisc'))
pkgs <- c('devtools', 'knitr', 'kableExtra', 'rvest', 'plyr', 'dplyr', 'magrittr', 'stringr', 'lubridate', 'openxlsx', 'DT', 'mvtnorm', 'PoisNor', 'flexmix', 'poilog', 'kfoots', 'mvrpois', 'rstan', 'matrixStats')
##https://github.com/lamortenera/kfoots
#'@ suppressAll(library('kfoots')) ##Fit mixture model or a hidden markov model
suppressAll(lib(pkgs))
rm(pkgs)
## Source the local scripts that modify or call hidden functions in mvrpois.
l_ply(dir(file.path(getwd(), 'function'), full.names = TRUE), source)
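Fucai 3D (福彩3D): the straight-pick trend data run from draw 2016158 to draw 2017158, for a total of 360 observations. The rule of this game is to predict three random digits, from 000 to 999.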
##
## http://baidu.lecai.com/lottery/draw/list/52?type=latest&num=100 ## downloading the txt or xlsx file requires logging in to an account
#'@ read_html('http://sports.sina.com.cn/l/tubiao/3d_jibenzoushitu.html') %>% html('table_A')
## http://zst.sina.aicai.com/gaopin_cqssc/?sDate=&eDate=&q=t100&sortTag=up ## Shishi Cai (时时彩) has up to five random variables and is more complex than Fucai 3D
## http://sports.sina.com.cn/l/tubiao/3d_jibenzoushitu.html ## Kuai 3 (快三)
ltry_data <- read.xlsx(xlsxFile = './data/lottery_data.xlsx', detectDates = TRUE) %>% .[1:5] %>% tbl_df %>% mutate(Result = as.numeric(paste0(Result1, Result2, Result3)))
datatable(ltry_data)
Figure 2.1: Fucai 3D draw data
Starting from the data in Figure 2.1, I first plot a frequency histogram.
## Number of observations
length(ltry_data$Result)
## [1] 360
## Number of distinct draw results (duplicates removed)
length(unique(ltry_data$Result))
## [1] 296
hist(ltry_data$Result, main = "3D Lottery Histogram", xlab = "Number (Digit from 0 to 999)")
Figure 2.2: Histogram of Fucai 3D draw results
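As a quick sanity check on the counts above (360 draws, 296 distinct results), the sketch below computes how many distinct values one would expect if each draw were uniform on 000 to 999. It assumes independent, equally likely draws and uses simulated numbers, not the real data; the variable names are illustrative only.

```r
# Expected number of distinct results when n_draws values are drawn
# independently and uniformly from n_outcomes equally likely outcomes.
n_outcomes <- 1000  # results 000 to 999
n_draws    <- 360   # number of observations reported above

# A given outcome is missed by every draw with probability (1 - 1/n_outcomes)^n_draws,
# so the expected number of outcomes seen at least once is:
expected_unique <- n_outcomes * (1 - (1 - 1 / n_outcomes)^n_draws)
round(expected_unique, 1)

# Simulation check on artificial draws (assumption: uniform and independent).
set.seed(2017)
sim_unique <- replicate(
  2000,
  length(unique(sample(0:999, n_draws, replace = TRUE)))
)
c(mean = mean(sim_unique), sd = sd(sim_unique))
```

Under these assumptions the expected count is about 302, so the observed 296 distinct results is close to what a uniform draw would give.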
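A three-digit random variable from 000 to 999 would normally be analysed with an ordinary frequency distribution. However, the three digits do not come out of a single slot; they are three separate random variables, so a multivariate Poisson model can be used to analyse them.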
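Treating the three positions as separate random variables can be illustrated with a per-position frequency check. The sketch below is only an illustration on simulated digits: the data frame `digits` stands in for the `Result1`, `Result2` and `Result3` columns, and the chi-squared test against a uniform law on 0 to 9 is my assumption about a reasonable first check, not a step taken in the original analysis.

```r
# Simulated stand-in for the three digit columns (360 draws, assumed
# independent and uniform on 0..9); replace with the real columns to use it.
set.seed(3)
digits <- data.frame(
  Result1 = sample(0:9, 360, replace = TRUE),
  Result2 = sample(0:9, 360, replace = TRUE),
  Result3 = sample(0:9, 360, replace = TRUE)
)

# Chi-squared goodness-of-fit test of each position against uniform 0..9.
digit_tests <- sapply(digits, function(x) {
  counts <- table(factor(x, levels = 0:9))   # keep digits with zero count
  test   <- chisq.test(counts, p = rep(0.1, 10))
  c(statistic = unname(test$statistic), p.value = test$p.value)
})
digit_tests

# Pairwise correlations between positions; values near 0 are what
# independent positions would produce.
cor(digits)
```

Large p-values and near-zero correlations would be consistent with three independent, uniformly distributed digits.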
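kfoots
: The kfoots package uses Markov chains. For example, if the current draw is 010, the model analyses and predicts the probability of the next result that follows it (the state transition rate).

PoisNor
: The PoisNor package provides a multivariate random-number generator and correlation-coefficient analysis.

poilog
: The poilog package fits the Poisson lognormal and bivariate Poisson lognormal distributions.

mvrpois
: The mvrpois package analyses multivariate Poisson models.

Here I use the multivariate Poisson model in mvrpois to analyse the trivariate Fucai 3D lottery data. For details, see Dimitris Karlis and Loukia Meligkotsidou (2005)¹ and Dimitris Karlis (2002)².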
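\[(X_{1},X_{2},X_{3})_{i} \sim 3\text{-}\mathrm{Pois}(\theta_{1i},\theta_{2i},\theta_{3i},\theta_{12i},\theta_{13i},\theta_{23i}) \quad \dots\ \text{equation 3.1}\]
The joint probability function (dropping the observation index \(i\)) is:
\[P(X = x) = \sum_{(y_{12},y_{13},y_{23})\in C}\frac{\exp\{-(\theta_{1}+\theta_{2}+\theta_{3}+\theta_{12}+\theta_{13}+\theta_{23})\}\,\theta_{1}^{x_{1}-y_{12}-y_{13}}\theta_{2}^{x_{2}-y_{12}-y_{23}}\theta_{3}^{x_{3}-y_{13}-y_{23}}\theta_{12}^{y_{12}}\theta_{13}^{y_{13}}\theta_{23}^{y_{23}}}{(x_{1}-y_{12}-y_{13})!\,(x_{2}-y_{12}-y_{23})!\,(x_{3}-y_{13}-y_{23})!\,y_{12}!\,y_{13}!\,y_{23}!} \quad \dots\ \text{equation 3.2}\]
where \(C \subset \mathbb{N}^3\) is defined as:
\[C = \left\{(y_{12},y_{13},y_{23}) \in \mathbb{N}^3 : \{y_{12}+y_{13}\leq x_{1}\} \cap \{y_{12}+y_{23}\leq x_{2}\} \cap \{y_{13}+y_{23}\leq x_{3}\} \neq \emptyset\right\}\]
That is, all three inequalities must hold at once, so that every exponent and every factorial argument in equation 3.2 is non-negative.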
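Equation 3.2 can be evaluated directly by looping over the set \(C\). The function below is a minimal base-R sketch of that sum, not the implementation used by mvrpois; the name `dtripois`, the parameter names `t1` to `t23`, and the numeric values are all hypothetical. It relies on the fact that each summand is a product of six ordinary Poisson probabilities.

```r
# Trivariate Poisson pmf with pairwise covariance terms, written as the
# direct sum over C from equation 3.2.
# x     : length-3 vector of non-negative integers (x1, x2, x3)
# theta : named vector with entries t1, t2, t3, t12, t13, t23 (hypothetical names)
dtripois <- function(x, theta) {
  x <- as.integer(x)
  rates <- unname(theta[c("t1", "t2", "t3", "t12", "t13", "t23")])
  total <- 0
  for (y12 in 0:min(x[1], x[2])) {
    for (y13 in 0:min(x[1], x[3])) {
      for (y23 in 0:min(x[2], x[3])) {
        # Remaining counts attributed to the three independent components.
        r <- c(x[1] - y12 - y13, x[2] - y12 - y23, x[3] - y13 - y23)
        if (any(r < 0)) next  # (y12, y13, y23) lies outside C
        # Product of six Poisson pmfs equals the summand of equation 3.2.
        total <- total + prod(dpois(c(r, y12, y13, y23), rates))
      }
    }
  }
  total
}

# Hypothetical parameter values, for illustration only.
theta <- c(t1 = 1, t2 = 1.5, t3 = 0.5, t12 = 0.3, t13 = 0.2, t23 = 0.1)
dtripois(c(1, 2, 0), theta)

# With all covariance terms set to 0 the model reduces to three
# independent Poisson variables.
theta0 <- c(t1 = 1, t2 = 2, t3 = 3, t12 = 0, t13 = 0, t23 = 0)
all.equal(dtripois(c(1, 2, 0), theta0), prod(dpois(c(1, 2, 0), c(1, 2, 3))))
```

The triple loop is easy to compare with the formula but slow for large counts; it is meant for checking small cases, not for fitting.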
References for multivariate Poisson models:
## =========================== eval=FALSE ==============================
## Get the trivariate sample data and convert it to matrix format.
m <- ltry_data %>% select(Result1, Result2, Result3) %>% as.matrix
#'@ m <- ltry_data %>% select(Date, Result1, Result2, Result3) %>% as.matrix
#'@ adply(m, 1, transform, mvp = mvp.fit(method = 'HMC'))
## ------------------------ PENDING --------------------------------------
## 1) missing files : 'karlis.stan' and 'karlis.covariates.stan'.
## 2) missing codes : cov() inside gibbs() inside mvp.fit.R
## ------------------------ PENDING --------------------------------------
## Hamiltonian Monte Carlo (HMC)
fit.HMC <- mvp.fit(m, method = 'HMC')
## Gibbs MCMC
fit.Gibbs <- mvp.fit(m, method = 'Gibbs')
## ==============================================================================
## 1) method = Monte Carlo samples, default iterations : n.mc = 10000.
tv.mc <- mvp.prob(m, seq(0, 9), logarithm = FALSE, method = 'MC')
adply(m, 1, transform, prob = mvp.prob(m[, c('Result1', 'Result2', 'Result3')], seq(0, 9), logarithm = FALSE, method = 'MC'))
## 2) method = 'recursive', but it is equal to 'analytical'.
tv.rc <- mvp.prob(m, seq(0, 9), logarithm = FALSE, method = 'recursive')
## 3) method = 'analytical', but it is equal to 'recursive'.
tv.an <- mvp.prob(m, seq(0, 9), logarithm = FALSE, method = 'analytical')
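The Monte Carlo idea behind a probability estimate of this kind can be sketched in base R with the usual trivariate-reduction construction, where each count is a sum of independent Poisson components. This is only an illustration of the general technique under hypothetical parameter values; it is not a description of how `mvp.prob` is implemented, and the names `theta`, `n_mc` and `target` are mine.

```r
set.seed(42)

# Hypothetical parameters: t1..t3 are individual rates, t12/t13/t23 are
# the shared components that induce covariance.
theta <- c(t1 = 1, t2 = 1.5, t3 = 0.5, t12 = 0.3, t13 = 0.2, t23 = 0.1)
n_mc  <- 10000  # number of Monte Carlo samples

# Draw the six independent Poisson components (one column per parameter).
y <- sapply(theta, function(rate) rpois(n_mc, rate))

# Trivariate reduction: each observed count adds its own component and
# the two shared components it takes part in.
x <- cbind(
  X1 = y[, "t1"] + y[, "t12"] + y[, "t13"],
  X2 = y[, "t2"] + y[, "t12"] + y[, "t23"],
  X3 = y[, "t3"] + y[, "t13"] + y[, "t23"]
)

colMeans(x)  # close to t1 + t12 + t13, t2 + t12 + t23, t3 + t13 + t23
cov(x)       # off-diagonal entries estimate t12, t13 and t23

# Monte Carlo estimate of P(X = target) as a relative frequency.
target <- c(1, 2, 0)
p_mc <- mean(x[, 1] == target[1] & x[, 2] == target[2] & x[, 3] == target[3])
p_mc
```

The estimate carries sampling error of order \(1/\sqrt{n_{mc}}\), which is why an exact evaluation is preferable whenever it is feasible.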
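The article "Understanding the Kelly Capital Growth Investment Strategy" notes that, although the probability of winning a lottery is extremely small, betting with the Kelly model over the long run can still be profitable, although it may take millions of years; see the figure below.
For that reason I use the Kelly investment model here. For more on Kelly betting, see: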
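As a rough illustration of the Kelly rule for a single win-or-lose bet, the sketch below computes the fraction of bankroll to stake from the win probability and the net odds. The function name `kelly_fraction` and the odds values are hypothetical and chosen only to show both outcomes; they are not actual Fucai 3D payouts, and this is not the staking code used in this analysis.

```r
# Kelly fraction for a single binary bet:
#   f* = (b * p - q) / b,  with q = 1 - p,
# where p is the win probability and b the net odds (profit per unit staked).
# A non-positive value means the bet has no edge, so the stake is set to 0.
kelly_fraction <- function(p, b) {
  stopifnot(p > 0, p < 1, b > 0)
  f <- (b * p - (1 - p)) / b
  max(f, 0)
}

p_win <- 1 / 1000  # one straight pick out of 000..999, assuming uniform draws

# Hypothetical net odds (assumptions for illustration, not real payouts).
kelly_fraction(p_win, b = 500)   # negative edge: stake nothing
kelly_fraction(p_win, b = 1500)  # positive edge: stake a small fraction
```

With \(p = 1/1000\) the fraction is positive only when the net odds exceed 999 to 1, which is why a rule change that inflates prizes, as in the Cash Winfall case, matters so much.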
Summary of the betting
options(warn = 0)
Session information for this document:
Category | session_info | Category | Sys.info |
---|---|---|---|
version | R version 3.4.4 (2018-03-15) | sysname | Linux |
system | x86_64, linux-gnu | release | 3.13.0-156-generic |
ui | X11 | version | #206-Ubuntu SMP Fri Aug 17 08:51:32 UTC 2018 |
language | (EN) | nodename | 5e78ee65cf45 |
collate | C.UTF-8 | machine | x86_64 |
tz | Etc/UTC | login | unknown |
date | 2018-08-29 | user | rstudio-user |
Current time | 2018-08-30 04:38:03 JST | effective_user | rstudio-user |
1. The 4th reference listed in section 7.3 (References).
2. The 3rd reference listed in section 7.3 (References).
3. So far, the following models have been implemented:
1) Maher (1982) - Modelling Association Football Scores - maher
2) Dixon and Coles (1997) - Modelling Association Football Scores and Inefficiencies in the Football Betting Market - dixon-coles
3) Karlis and Ntzoufras (2008) - Bayesian modelling of football outcomes (using the Skellam's distribution) - karlis-ntzoufras