* Suppose N students participate in a coin flip experiment, when they get heads they stop, when they get tails they keep going. All students will stop after the second trial no matter the results. Y is the binary indicator of whether they claim they cheated in the experiment. Estimate how many students cheat in this experiment.
## using bayes rule
# # we want to know p(cheat|1st head)
#
# p (c|1st head)
# p(cheat) = p(cheat\1st head)p(1st head) + p(cheat|1st teail)p(1st tail)
#
# p(c|1st tail) = p(c|2nd head)P(2nd head) + p(c|2nd tail)p(2nd tail) = 1*1/2 = 1/2
#
# so that
# p(cheat) = p(cheat|1st head)*1/2 + 1/2*1/2
#
# y/N = 1/2* X + 1/4
#
* Apply function
# apply(x, 1, function)
a <- matrix(rnorm(10), ncol = 2, nrow = 5)
a
## [,1] [,2]
## [1,] 1.0809507 0.1948797
## [2,] 0.8653803 -0.7371508
## [3,] -0.4959525 0.5607351
## [4,] -0.9269606 -0.2049032
## [5,] -0.4435452 -0.2072242
apply(a, 2, sum)
## [1] 0.07987277 -0.39366342
# lapply
a <- list(a1 = c(10, 10), a2 = 10)
lapply(a, sum)
## $a1
## [1] 20
##
## $a2
## [1] 10
# malapply
Q1 <- matrix(c(rep(1, 4), rep(2, 4), rep(3, 4), rep(4, 4)),4,4)
# Print `Q1`
print(Q1)
## [,1] [,2] [,3] [,4]
## [1,] 1 2 3 4
## [2,] 1 2 3 4
## [3,] 1 2 3 4
## [4,] 1 2 3 4
# Or use `mapply()`
Q2 <- mapply(rep,1:4,4)
# Print `Q2`
print(Q2)
## [,1] [,2] [,3] [,4]
## [1,] 1 2 3 4
## [2,] 1 2 3 4
## [3,] 1 2 3 4
## [4,] 1 2 3 4
The number used to multiply variable. explain how much changes x will need in order for y to change 1 unit
Draw sample from a population, and use sample mean to estimate the population mean, the confidence interval tell us how much confidence we have to make a conclusion that the population mean is in the range of this interval.
*Evaluate the estimator
MSE = variance + Bias ^ 2 Bias = E(theta^ - theta) Consistent: when n (sample size increases), MSE decreases Efficient: when MSE is small Unbised : (bias == 0)
* How to evaluate the results of a survey?
## coverage error: unknown chances of individuals included in the sample
## sample error: the sample population does not represent the characteristics of the population
## measurement error: the answer does not reflect the topic accurately
## unresponse error: lack of response of people in the sample
* Bivariate distribution simulation
library(mixtools) #for ellipse
## mixtools package, version 1.1.0, Released 2017-03-10
## This package is based upon work supported by the National Science Foundation under Grant No. SES-0518772.
N <- 200 # Number of random samples
set.seed(123)
# Target parameters for univariate normal distributions
rho <- -0.6
mu1 <- 1; s1 <- 2
mu2 <- 1; s2 <- 8
# Parameters for bivariate normal distribution
mu <- c(mu1,mu2) # Mean
sigma <- matrix(c(s1^2, s1*s2*rho, s1*s2*rho, s2^2),
2) # Covariance matrix
# Function to draw ellipse for bivariate normal data
ellipse_bvn <- function(bvn, alpha){
Xbar <- apply(bvn,2,mean)
S <- cov(bvn)
ellipse(Xbar, S, alpha = alpha, col="red")
}
library(MASS)
mybvd <- mvrnorm(N, mu = mu, Sigma = sigma)
colnames(mybvd) <- c('X1', 'X2')
mybvd
## X1 X2
## [1,] -1.736404159 -4.0180849
## [2,] -0.763548866 -1.1607058
## [3,] -0.531291499 13.5285462
## [4,] 0.063158545 1.4313013
## [5,] 1.486094242 2.1348926
## [6,] -0.396578755 14.8303129
## [7,] 1.656993892 4.8779822
## [8,] 3.508274041 -8.9707367
## [9,] -0.722467323 -4.8949803
## [10,] 1.640716027 -2.5504587
## [11,] -0.714274650 10.7590130
## [12,] 0.170067967 3.8177285
## [13,] -1.426202539 3.9040380
## [14,] 1.668278009 2.0109355
## [15,] 3.244774033 -3.2025552
## [16,] -3.849030985 14.8798986
## [17,] 1.067959411 5.0885666
## [18,] 4.584698775 -14.5492413
## [19,] 2.056429229 6.9098073
## [20,] 3.597733350 -2.4671867
## [21,] 3.229817128 -7.3986059
## [22,] 0.306377190 -0.8937362
## [23,] 0.546366453 -7.4749487
## [24,] 0.804098830 -5.0010233
## [25,] 2.348466898 -3.9092807
## [26,] 3.012010015 -12.5018473
## [27,] 1.055298950 7.8710550
## [28,] 1.929327638 2.4013683
## [29,] 1.038242136 -8.3166581
## [30,] 1.021983996 11.2735929
## [31,] -2.587796194 3.9331647
## [32,] 1.509456082 -1.3374422
## [33,] -0.452574165 8.1053422
## [34,] 1.057960552 8.2019534
## [35,] 0.872056448 7.7097054
## [36,] 2.198475623 6.8278254
## [37,] 0.594435291 5.4739025
## [38,] 0.422550453 0.4027326
## [39,] 0.875132067 -1.5256720
## [40,] 2.696195310 -1.8517094
## [41,] 3.099505959 -4.3626999
## [42,] 2.044297166 -0.5400615
## [43,] 0.241661163 -9.4834038
## [44,] 0.069872266 18.6210066
## [45,] -0.228015158 10.7028793
## [46,] -0.570859497 -8.4447416
## [47,] 1.660682477 -2.1969488
## [48,] 3.707469399 -2.3997955
## [49,] 1.065237605 7.3989795
## [50,] 0.345451903 0.2149374
## [51,] 1.270742340 3.1172281
## [52,] 1.913657932 0.9087929
## [53,] 1.590939884 0.7410889
## [54,] -0.849744846 11.9216521
## [55,] -1.216121341 -1.1952587
## [56,] -0.754504635 13.1477278
## [57,] 1.244269342 -11.6479051
## [58,] -0.715389699 5.5208729
## [59,] 1.022982756 2.0180939
## [60,] 3.125869241 3.1006562
## [61,] 1.340455332 4.1628219
## [62,] 2.392518479 -2.8972230
## [63,] 1.342232435 -1.6759222
## [64,] 0.239638978 -7.4619768
## [65,] -1.245475682 -8.1297028
## [66,] -1.797225392 3.0495924
## [67,] 0.648602111 4.6164910
## [68,] 3.678704110 1.8523134
## [69,] 0.456332211 8.4695457
## [70,] -1.698375883 17.3712908
## [71,] 0.292446224 -3.1325502
## [72,] 2.378261487 -17.6995545
## [73,] -1.324738845 8.8752407
## [74,] 4.065592521 -4.3306131
## [75,] 0.531083947 -4.7087665
## [76,] 0.417672003 9.3096865
## [77,] 1.082303652 -1.3197651
## [78,] 2.407236449 -8.7793864
## [79,] 0.104610155 2.3453090
## [80,] 1.134809447 -0.1166333
## [81,] 3.598511499 1.4528459
## [82,] -0.631815252 3.9011536
## [83,] 0.859437970 -2.0580665
## [84,] 0.610795363 6.2174169
## [85,] 1.090597280 -0.7918918
## [86,] 0.376402565 3.6203249
## [87,] -0.714485664 9.7167179
## [88,] -2.107300478 4.0795749
## [89,] 1.749139763 -1.5528064
## [90,] -0.696604831 10.1451907
## [91,] -2.065913819 8.6593266
## [92,] -1.331857659 5.1279851
## [93,] -1.087655475 2.6295972
## [94,] 2.686160033 -3.8800488
## [95,] -3.827617140 11.3916845
## [96,] 1.645030140 -3.8161144
## [97,] -4.647557860 18.0351387
## [98,] 1.197960842 13.5847150
## [99,] 1.261417646 -0.8898447
## [100,] 0.328009239 -7.5124457
## [101,] 3.004434546 -4.5061380
## [102,] 1.855551755 3.2377205
## [103,] 2.774550368 -0.7436759
## [104,] 3.078539331 -1.5223042
## [105,] 2.870972032 -6.5027706
## [106,] 0.538682494 0.5591598
## [107,] 5.127278500 -4.7849816
## [108,] 2.750718600 -12.3890402
## [109,] -0.457900258 -2.3420649
## [110,] -3.331165551 7.8515207
## [111,] -0.315138728 -3.9180366
## [112,] -0.941464730 5.6768499
## [113,] 5.717790592 -11.5158351
## [114,] 2.009308085 0.7024389
## [115,] 0.901791960 5.2392016
## [116,] -0.475281140 3.2364932
## [117,] 1.033220922 1.8707933
## [118,] 3.766595034 -3.8162375
## [119,] -0.571584834 -6.2053651
## [120,] 0.854146956 -7.4115400
## [121,] 0.482125929 1.8828178
## [122,] 0.279168300 -6.8734106
## [123,] 3.704384905 -2.5960622
## [124,] 0.287020292 -1.2089815
## [125,] -0.484424347 15.8715933
## [126,] 0.745316828 -4.3799628
## [127,] 0.801228092 2.8970535
## [128,] -0.086419120 1.4689962
## [129,] 0.113649301 -7.0170533
## [130,] 1.077616833 0.4280223
## [131,] -2.393234231 12.3028162
## [132,] 2.293545002 4.9002524
## [133,] 2.076158551 1.5057339
## [134,] -0.846657032 -2.7489919
## [135,] 2.973191036 -15.5103978
## [136,] 2.794770960 10.5470962
## [137,] 4.954747889 -10.3469521
## [138,] 0.390129841 6.9658049
## [139,] -2.735922079 16.0545339
## [140,] 2.961518554 -10.5209214
## [141,] -0.851383931 6.4594036
## [142,] -0.171322565 -1.3305386
## [143,] 0.351095208 -11.9789322
## [144,] 2.803116383 -11.1253713
## [145,] 3.080313967 -11.7936518
## [146,] 4.402417109 -2.8176031
## [147,] 2.669389607 -10.7128365
## [148,] 1.034933058 6.6402689
## [149,] -0.099364923 18.0306535
## [150,] 2.887641874 -9.2475710
## [151,] -1.569295450 7.0514014
## [152,] 3.154059476 7.6355763
## [153,] 1.253029950 3.7606128
## [154,] 2.076419638 -7.0917137
## [155,] 2.544889223 0.2627072
## [156,] 0.828218788 -1.3235699
## [157,] -0.345667282 5.4014603
## [158,] 1.516513271 -1.9700660
## [159,] 3.633888750 9.4136693
## [160,] -2.550767408 -2.6225196
## [161,] 0.006789361 9.4678553
## [162,] 1.292015427 -7.5483615
## [163,] 2.145156531 -9.1433353
## [164,] -4.646781417 26.6663106
## [165,] 0.242605099 -2.5327650
## [166,] 0.955580365 3.4358871
## [167,] -0.385534508 5.9979415
## [168,] 3.081227090 -2.6378280
## [169,] -0.984246494 4.9239410
## [170,] 1.259896746 4.0628067
## [171,] -2.507765219 -1.3117761
## [172,] 3.498537209 1.9248484
## [173,] 1.767580784 0.8407710
## [174,] -2.946587999 17.8183594
## [175,] 1.128190333 -5.0523698
## [176,] 3.289216065 -7.6200987
## [177,] 2.510466735 1.5453153
## [178,] 0.386681718 3.4474485
## [179,] 0.477476519 4.4940541
## [180,] 4.369756312 -2.2285071
## [181,] 2.273282554 -7.5110818
## [182,] -0.874010579 11.0543789
## [183,] 1.163404705 -1.8385242
## [184,] 3.728997289 -5.6635288
## [185,] 0.550893717 -1.0055027
## [186,] -0.908123949 -0.9129538
## [187,] -1.098379580 9.7639420
## [188,] 2.668773881 1.9545918
## [189,] 0.739536663 7.1358941
## [190,] 1.082386353 -3.0769053
## [191,] 1.743439346 2.8726012
## [192,] 4.776962727 -1.0699562
## [193,] -0.499850515 1.5406175
## [194,] 3.413814753 -5.9572384
## [195,] 3.532463349 -9.3416254
## [196,] -3.843073555 16.6034252
## [197,] 1.459907837 5.9922763
## [198,] 1.240277117 -9.2118190
## [199,] 3.732901095 -3.5795305
## [200,] 3.033781745 -8.3929459
gibbs<-function (n, mu1, s1, mu2, s2, rho)
{
mat <- matrix(ncol = 2, nrow = n)
x <- 0
y <- 0
mat[1, ] <- c(x, y)
for (i in 2:n) {
x <- rnorm(1, mu1 +
(s1/s2) * rho * (y - mu2), sqrt((1 - rho^2)*s1^2))
y <- rnorm(1, mu2 +
(s2/s1) * rho * (x - mu1), sqrt((1 - rho^2)*s2^2))
mat[i, ] <- c(x, y)
}
mat
}
bvn4 <- gibbs(N,mu1,s1,mu2,s2,rho)
colnames(bvn4) <- c("bvn4_X1","bvn4_X2")
## a simple format for gibbs sampling
gibbs <- function(n, rho) {
result <- matrix(ncol = 2, nrow = n)
x <- 0
y <- 0
result[1, ] <- c(x, y)
for (i in 2:n) {
x <- rnorm(1, y*rho, sqrt(1- rho^2))
y <- rnorm(1, x*rho, sqrt(1-rho^2))
result[i, ] <- c(x, y)
}
result
}
set.seed(12)
d <- rbind(
data.frame(group = 'A',
converted = rbinom(1000000, size = 1, p = 0.05)),
data.frame(group = 'B',
converted = rbinom(10000, size = 1, p = 0.06))
)
tab <- table(d)
## calculating conversion rate
aConverRate <- tab['A', '1']/sum(tab['A',])
bConverRate <- tab['B', '1']/sum(tab['B',])
## testing the significance using fisher's test for independnnce
fisher.test(tab)
##
## Fisher's Exact Test for Count Data
##
## data: tab
## p-value = 0.003239
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 1.044012 1.240187
## sample estimates:
## odds ratio
## 1.138793
##CommonRate
commonRate <- sum(tab[, '1'])/sum(tab)
The most common bayesian analysis for binomial experiments is to encode our priors and posteries as a beta distribution and measure how much mass is in the left tail of the distribution.
print(pbeta(aConverRate, shape1 =commonRate + tab['B', '1'],
shape2 = (1- commonRate) + tab['B','0']))
## [1] 0.001650022
This is the posterior estimate of seeing the probability larger than Aconversionrate given the observed B data. This means the unknown B conversion rate is likely to be larger than A conversion rate.
library(ggplot2)
plt <- data.frame(x = seq(from = 0.04, to = 0.07, length.out = 301))
plt$density <- dbeta(plt$x, shape1 = commonRate + tab['B', '1'],
shape2 = (1- commonRate) + tab['B','0'])
ggplot(data = plt) +
geom_line(aes(x= x, y = density)) +
geom_vline(aes(xintercept = aConverRate))