#library(ggplot2)
# Many thanks to Rasmus Bååth for his Bayesian First Aid alternative to the proportion test:
# http://www.sumsar.net/blog/2014/06/bayesian-first-aid-prop-test/
library(rjags)
## Loading required package: coda
## Linked to JAGS 4.1.0
## Loaded modules: basemod,bugs
library(mcmc)
library(stringr)
source(file = "bayes_prp_tst.R")
source(file = "r_jags.R")
source(file = "generic.R")
An election for President of the United States occurs every four years on Election Day, held the first Tuesday after the first Monday in November. The 2016 Presidential election will be held on November 8, 2016.
The election process begins with the primary elections and caucuses and moves to nominating conventions, during which political parties each select a nominee to unite behind. The nominee also announces a Vice Presidential running mate at this time. The candidates then campaign across the country to explain their views and plans to voters and participate in debates with candidates from other parties. See: https://www.usa.gov/election
According to the Associated Press, Donald J. Trump and Hillary Clinton have each won enough delegates to claim their party’s nomination for president. Delegate totals include unpledged delegates, also known as superdelegates, who are free to support any candidate at the party conventions.
So far we have two presumptive nominees, one Republican and one Democrat: Donald J. Trump and Hillary Clinton. Although H. Clinton leads in the absolute number of pledged delegates won (1812) compared with D. Trump (1144), the picture changes when we look at each candidate's proportion (pledged delegates / total delegates). We see \({P}_{DT}=1144/1239=0.92\) for D. Trump and \({P}_{HC}=1812/2383=0.76\) for H. Clinton. Let’s try a proportion test with both frequentist (https://rpubs.com/alex-lev/111354) and Bayesian (https://en.wikipedia.org/wiki/Bayesian_statistics) models:
prop.test(c(1144,1812),c(1239,2382))
##
## 2-sample test for equality of proportions with continuity
## correction
##
## data: c(1144, 1812) out of c(1239, 2382)
## X-squared = 142.69, df = 1, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## 0.1393556 0.1858843
## sample estimates:
## prop 1 prop 2
## 0.9233253 0.7607053
The two proportions differ significantly (p-value < 2.2e-16).
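To see where the X-squared value above comes from, here is a minimal sketch that recomputes it by hand from the 2x2 table of pledged versus unpledged delegates, using the Yates continuity correction. The variable names are illustrative and not part of the original analysis; the counts are the ones passed to `prop.test()` above.

```r
# Illustrative names; counts are the same as in the prop.test() call above.
x <- c(1144, 1812)   # pledged delegates (successes)
n <- c(1239, 2382)   # total delegates (trials)

p_pooled <- sum(x) / sum(n)                          # common proportion under H0
observed <- cbind(x, n - x)                          # 2x2 table: pledged / unpledged
expected <- cbind(n * p_pooled, n * (1 - p_pooled))  # expected counts under H0

# Pearson chi-squared with Yates continuity correction (0.5 per cell)
x_squared <- sum((abs(observed - expected) - 0.5)^2 / expected)
p_value   <- pchisq(x_squared, df = 1, lower.tail = FALSE)

round(x_squared, 2)  # agrees with the X-squared reported by prop.test()
```

With one degree of freedom, a statistic this large leaves essentially no probability in the upper tail, which is why the reported p-value is below machine precision.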
fit<-bayes.prop.test(c(1144,1812),c(1239,2382))
summary(fit)
## Data
## number of successes: 1144, 1812
## number of trials: 1239, 2382
##
## Model parameters and generated quantities
## theta[i]: the relative frequency of success for Group i
## x_pred[i]: predicted number of successes in a replication for Group i
## theta_diff[i,j]: the difference between two groups (theta[i] - theta[j])
##
## Measures
## mean sd HDIlo HDIup %<comp %>comp
## theta[1] 0.923 0.008 0.908 0.937 0 1
## theta[2] 0.761 0.009 0.743 0.778 0 1
## x_pred[1] 1143.193 13.378 1115.000 1167.000 0 1
## x_pred[2] 1811.513 29.334 1751.000 1866.000 0 1
## theta_diff[1,2] 0.162 0.012 0.140 0.185 0 1
##
## 'HDIlo' and 'HDIup' are the limits of a 95% HDI credible interval.
## '%<comp' and '%>comp' are the probabilities of the respective parameter being
## smaller or larger than 0.5 (except for the theta_diff parameters where
## the comparison value comp is 0.0).
##
## Quantiles
## q2.5% q25% median q75% q97.5%
## theta[1] 0.907 0.918 0.923 0.928 0.937
## theta[2] 0.744 0.755 0.760 0.767 0.778
## x_pred[1] 1116.000 1134.000 1144.000 1152.000 1168.000
## x_pred[2] 1753.000 1792.000 1812.000 1832.000 1868.000
## theta_diff[1,2] 0.139 0.154 0.162 0.170 0.185
plot(fit)
diagnostics(fit)  # MCMC convergence diagnostics
##
## Iterations = 1:5000
## Thinning interval = 1
## Number of chains = 3
## Sample size per chain = 5000
##
## Diagnostic measures
## mean sd mcmc_se n_eff Rhat
## theta[1] 0.923 0.008 0.000 15587 1.001
## theta[2] 0.761 0.009 0.000 15466 1.000
## x_pred[1] 1143.193 13.378 0.110 14719 1.000
## x_pred[2] 1811.513 29.334 0.235 15572 1.000
## theta_diff[1,2] 0.162 0.012 NA NA NA
##
## mcmc_se: the estimated standard error of the MCMC approximation of the mean.
## n_eff: a crude measure of effective MCMC sample size.
## Rhat: the potential scale reduction factor (at convergence, Rhat=1).
##
## Model parameters and generated quantities
## theta: The relative frequency of success
## x_pred: Predicted number of successes in a replication
## theta_diff[i,j]: the difference between two groups (theta[i] - theta[j])
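The Bayesian test agrees: the 95% HDI for theta_diff[1,2] (0.140 to 0.185) excludes zero, and the diagnostics look adequate (Rhat close to 1 and n_eff of roughly 15,000 for every sampled parameter).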
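The Bayesian result can also be cross-checked without MCMC. The sketch below assumes a flat Beta(1, 1) prior on each proportion, which is the prior described in the Bayesian First Aid post credited at the top. Under that assumption the posterior for each group is Beta(1 + successes, 1 + failures), so it can be sampled directly with base R. The names `theta1`, `theta2`, and `theta_diff` are illustrative. The equal-tailed quantile interval computed here is not the same thing as an HDI, though for a near-symmetric posterior the two should be close.

```r
set.seed(2016)     # arbitrary seed, for reproducibility only
n_draws <- 15000   # illustrative; same total as 3 chains x 5000 samples

x <- c(1144, 1812) # pledged delegates
n <- c(1239, 2382) # total delegates

# Conjugate update: Beta(1, 1) prior + binomial data -> Beta(1 + x, 1 + n - x)
theta1 <- rbeta(n_draws, 1 + x[1], 1 + n[1] - x[1])
theta2 <- rbeta(n_draws, 1 + x[2], 1 + n[2] - x[2])
theta_diff <- theta1 - theta2

mean(theta_diff)                       # posterior mean of the difference
quantile(theta_diff, c(0.025, 0.975))  # 95% equal-tailed credible interval
mean(theta_diff > 0)                   # posterior probability that theta1 > theta2
```

If the assumption about the prior holds, these numbers should land close to the `theta_diff[1,2]` row of the summary, up to Monte Carlo noise.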
What if Sanders’s pledged delegates were added to Clinton’s? Ask the frequentist:
prop.test(c(1144,1812+1521),c(1239,2382+1569))
##
## 2-sample test for equality of proportions with continuity
## correction
##
## data: c(1144, 1812 + 1521) out of c(1239, 2382 + 1569)
## X-squared = 49.939, df = 1, p-value = 1.586e-12
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## 0.06056212 0.09892060
## sample estimates:
## prop 1 prop 2
## 0.9233253 0.8435839
And what about Bayes?
fit2<-bayes.prop.test(c(1144,1812+1521),c(1239,2382+1569))
plot(fit2)
Well, with Sanders’s pledged delegates added, H. Clinton’s proportion would rise to \({P}_{HC+S}=(1812+1521)/(2382+1569)=0.84\), but it would still be significantly lower than D. Trump’s 0.92.
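The pooled proportion above is a ratio of sums, so the parentheses matter when it is typed into R. A minimal check, with illustrative variable names, that also pulls the p-value out of the object returned by `prop.test()`:

```r
pledged <- 1812 + 1521   # Clinton + Sanders pledged delegates
total   <- 2382 + 1569   # Clinton + Sanders total delegates, as in the call above

p_combined <- pledged / total
round(p_combined, 2)     # 0.84

# Without parentheses R applies the division first and gives a different number
1812 + 1521 / 2382 + 1569

res <- prop.test(c(1144, pledged), c(1239, total))
res$p.value < 0.05       # TRUE: the two proportions still differ significantly
```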