* Suppose N students participate in a coin flip experiment, when they get heads they stop, when they get tails they keep going. All students will stop after the second trial no matter the results. Y is the binary indicator of whether they claim they cheated in the experiment. Estimate how many students cheat in this experiment.

here

## using bayes rule
# # we want to know p(cheat|1st head)
# 
# p (c|1st head) 
# p(cheat) = p(cheat\1st head)p(1st head) + p(cheat|1st teail)p(1st tail)
# 
# p(c|1st tail) = p(c|2nd head)P(2nd head) + p(c|2nd tail)p(2nd tail) = 1*1/2 = 1/2
# 
# so that
# p(cheat) = p(cheat|1st head)*1/2 + 1/2*1/2
# 
# y/N = 1/2* X + 1/4
# 
* Apply function
# apply(x, 1, function)
a <- matrix(rnorm(10), ncol = 2, nrow = 5)
a
##            [,1]       [,2]
## [1,]  1.0809507  0.1948797
## [2,]  0.8653803 -0.7371508
## [3,] -0.4959525  0.5607351
## [4,] -0.9269606 -0.2049032
## [5,] -0.4435452 -0.2072242
apply(a, 2, sum)
## [1]  0.07987277 -0.39366342
# lapply
a <- list(a1 = c(10, 10), a2 = 10)
lapply(a, sum)
## $a1
## [1] 20
## 
## $a2
## [1] 10
# malapply
Q1 <- matrix(c(rep(1, 4), rep(2, 4), rep(3, 4), rep(4, 4)),4,4)

# Print `Q1`
print(Q1)
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    1    2    3    4
## [3,]    1    2    3    4
## [4,]    1    2    3    4
# Or use `mapply()`
Q2 <- mapply(rep,1:4,4)

# Print `Q2`
print(Q2)
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    1    2    3    4
## [3,]    1    2    3    4
## [4,]    1    2    3    4

The number used to multiply variable. explain how much changes x will need in order for y to change 1 unit

Draw sample from a population, and use sample mean to estimate the population mean, the confidence interval tell us how much confidence we have to make a conclusion that the population mean is in the range of this interval.

*Evaluate the estimator

MSE = variance + Bias ^ 2 Bias = E(theta^ - theta) Consistent: when n (sample size increases), MSE decreases Efficient: when MSE is small Unbised : (bias == 0)

* How to evaluate the results of a survey?
## coverage error: unknown chances of individuals included in the sample
## sample error: the sample population does not represent the characteristics of the population
## measurement error: the answer does not reflect the topic accurately
## unresponse error: lack of response of people in the sample
* Bivariate distribution simulation
library(mixtools)  #for ellipse
## mixtools package, version 1.1.0, Released 2017-03-10
## This package is based upon work supported by the National Science Foundation under Grant No. SES-0518772.
N <- 200 # Number of random samples
set.seed(123)
# Target parameters for univariate normal distributions
rho <- -0.6
mu1 <- 1; s1 <- 2
mu2 <- 1; s2 <- 8

# Parameters for bivariate normal distribution
mu <- c(mu1,mu2) # Mean 
sigma <- matrix(c(s1^2, s1*s2*rho, s1*s2*rho, s2^2),
           2) # Covariance matrix

# Function to draw ellipse for bivariate normal data
ellipse_bvn <- function(bvn, alpha){
  Xbar <- apply(bvn,2,mean)
  S <- cov(bvn)
  ellipse(Xbar, S, alpha = alpha, col="red")
}

library(MASS)
mybvd <- mvrnorm(N, mu = mu, Sigma = sigma)
colnames(mybvd) <- c('X1', 'X2')
mybvd
##                  X1          X2
##   [1,] -1.736404159  -4.0180849
##   [2,] -0.763548866  -1.1607058
##   [3,] -0.531291499  13.5285462
##   [4,]  0.063158545   1.4313013
##   [5,]  1.486094242   2.1348926
##   [6,] -0.396578755  14.8303129
##   [7,]  1.656993892   4.8779822
##   [8,]  3.508274041  -8.9707367
##   [9,] -0.722467323  -4.8949803
##  [10,]  1.640716027  -2.5504587
##  [11,] -0.714274650  10.7590130
##  [12,]  0.170067967   3.8177285
##  [13,] -1.426202539   3.9040380
##  [14,]  1.668278009   2.0109355
##  [15,]  3.244774033  -3.2025552
##  [16,] -3.849030985  14.8798986
##  [17,]  1.067959411   5.0885666
##  [18,]  4.584698775 -14.5492413
##  [19,]  2.056429229   6.9098073
##  [20,]  3.597733350  -2.4671867
##  [21,]  3.229817128  -7.3986059
##  [22,]  0.306377190  -0.8937362
##  [23,]  0.546366453  -7.4749487
##  [24,]  0.804098830  -5.0010233
##  [25,]  2.348466898  -3.9092807
##  [26,]  3.012010015 -12.5018473
##  [27,]  1.055298950   7.8710550
##  [28,]  1.929327638   2.4013683
##  [29,]  1.038242136  -8.3166581
##  [30,]  1.021983996  11.2735929
##  [31,] -2.587796194   3.9331647
##  [32,]  1.509456082  -1.3374422
##  [33,] -0.452574165   8.1053422
##  [34,]  1.057960552   8.2019534
##  [35,]  0.872056448   7.7097054
##  [36,]  2.198475623   6.8278254
##  [37,]  0.594435291   5.4739025
##  [38,]  0.422550453   0.4027326
##  [39,]  0.875132067  -1.5256720
##  [40,]  2.696195310  -1.8517094
##  [41,]  3.099505959  -4.3626999
##  [42,]  2.044297166  -0.5400615
##  [43,]  0.241661163  -9.4834038
##  [44,]  0.069872266  18.6210066
##  [45,] -0.228015158  10.7028793
##  [46,] -0.570859497  -8.4447416
##  [47,]  1.660682477  -2.1969488
##  [48,]  3.707469399  -2.3997955
##  [49,]  1.065237605   7.3989795
##  [50,]  0.345451903   0.2149374
##  [51,]  1.270742340   3.1172281
##  [52,]  1.913657932   0.9087929
##  [53,]  1.590939884   0.7410889
##  [54,] -0.849744846  11.9216521
##  [55,] -1.216121341  -1.1952587
##  [56,] -0.754504635  13.1477278
##  [57,]  1.244269342 -11.6479051
##  [58,] -0.715389699   5.5208729
##  [59,]  1.022982756   2.0180939
##  [60,]  3.125869241   3.1006562
##  [61,]  1.340455332   4.1628219
##  [62,]  2.392518479  -2.8972230
##  [63,]  1.342232435  -1.6759222
##  [64,]  0.239638978  -7.4619768
##  [65,] -1.245475682  -8.1297028
##  [66,] -1.797225392   3.0495924
##  [67,]  0.648602111   4.6164910
##  [68,]  3.678704110   1.8523134
##  [69,]  0.456332211   8.4695457
##  [70,] -1.698375883  17.3712908
##  [71,]  0.292446224  -3.1325502
##  [72,]  2.378261487 -17.6995545
##  [73,] -1.324738845   8.8752407
##  [74,]  4.065592521  -4.3306131
##  [75,]  0.531083947  -4.7087665
##  [76,]  0.417672003   9.3096865
##  [77,]  1.082303652  -1.3197651
##  [78,]  2.407236449  -8.7793864
##  [79,]  0.104610155   2.3453090
##  [80,]  1.134809447  -0.1166333
##  [81,]  3.598511499   1.4528459
##  [82,] -0.631815252   3.9011536
##  [83,]  0.859437970  -2.0580665
##  [84,]  0.610795363   6.2174169
##  [85,]  1.090597280  -0.7918918
##  [86,]  0.376402565   3.6203249
##  [87,] -0.714485664   9.7167179
##  [88,] -2.107300478   4.0795749
##  [89,]  1.749139763  -1.5528064
##  [90,] -0.696604831  10.1451907
##  [91,] -2.065913819   8.6593266
##  [92,] -1.331857659   5.1279851
##  [93,] -1.087655475   2.6295972
##  [94,]  2.686160033  -3.8800488
##  [95,] -3.827617140  11.3916845
##  [96,]  1.645030140  -3.8161144
##  [97,] -4.647557860  18.0351387
##  [98,]  1.197960842  13.5847150
##  [99,]  1.261417646  -0.8898447
## [100,]  0.328009239  -7.5124457
## [101,]  3.004434546  -4.5061380
## [102,]  1.855551755   3.2377205
## [103,]  2.774550368  -0.7436759
## [104,]  3.078539331  -1.5223042
## [105,]  2.870972032  -6.5027706
## [106,]  0.538682494   0.5591598
## [107,]  5.127278500  -4.7849816
## [108,]  2.750718600 -12.3890402
## [109,] -0.457900258  -2.3420649
## [110,] -3.331165551   7.8515207
## [111,] -0.315138728  -3.9180366
## [112,] -0.941464730   5.6768499
## [113,]  5.717790592 -11.5158351
## [114,]  2.009308085   0.7024389
## [115,]  0.901791960   5.2392016
## [116,] -0.475281140   3.2364932
## [117,]  1.033220922   1.8707933
## [118,]  3.766595034  -3.8162375
## [119,] -0.571584834  -6.2053651
## [120,]  0.854146956  -7.4115400
## [121,]  0.482125929   1.8828178
## [122,]  0.279168300  -6.8734106
## [123,]  3.704384905  -2.5960622
## [124,]  0.287020292  -1.2089815
## [125,] -0.484424347  15.8715933
## [126,]  0.745316828  -4.3799628
## [127,]  0.801228092   2.8970535
## [128,] -0.086419120   1.4689962
## [129,]  0.113649301  -7.0170533
## [130,]  1.077616833   0.4280223
## [131,] -2.393234231  12.3028162
## [132,]  2.293545002   4.9002524
## [133,]  2.076158551   1.5057339
## [134,] -0.846657032  -2.7489919
## [135,]  2.973191036 -15.5103978
## [136,]  2.794770960  10.5470962
## [137,]  4.954747889 -10.3469521
## [138,]  0.390129841   6.9658049
## [139,] -2.735922079  16.0545339
## [140,]  2.961518554 -10.5209214
## [141,] -0.851383931   6.4594036
## [142,] -0.171322565  -1.3305386
## [143,]  0.351095208 -11.9789322
## [144,]  2.803116383 -11.1253713
## [145,]  3.080313967 -11.7936518
## [146,]  4.402417109  -2.8176031
## [147,]  2.669389607 -10.7128365
## [148,]  1.034933058   6.6402689
## [149,] -0.099364923  18.0306535
## [150,]  2.887641874  -9.2475710
## [151,] -1.569295450   7.0514014
## [152,]  3.154059476   7.6355763
## [153,]  1.253029950   3.7606128
## [154,]  2.076419638  -7.0917137
## [155,]  2.544889223   0.2627072
## [156,]  0.828218788  -1.3235699
## [157,] -0.345667282   5.4014603
## [158,]  1.516513271  -1.9700660
## [159,]  3.633888750   9.4136693
## [160,] -2.550767408  -2.6225196
## [161,]  0.006789361   9.4678553
## [162,]  1.292015427  -7.5483615
## [163,]  2.145156531  -9.1433353
## [164,] -4.646781417  26.6663106
## [165,]  0.242605099  -2.5327650
## [166,]  0.955580365   3.4358871
## [167,] -0.385534508   5.9979415
## [168,]  3.081227090  -2.6378280
## [169,] -0.984246494   4.9239410
## [170,]  1.259896746   4.0628067
## [171,] -2.507765219  -1.3117761
## [172,]  3.498537209   1.9248484
## [173,]  1.767580784   0.8407710
## [174,] -2.946587999  17.8183594
## [175,]  1.128190333  -5.0523698
## [176,]  3.289216065  -7.6200987
## [177,]  2.510466735   1.5453153
## [178,]  0.386681718   3.4474485
## [179,]  0.477476519   4.4940541
## [180,]  4.369756312  -2.2285071
## [181,]  2.273282554  -7.5110818
## [182,] -0.874010579  11.0543789
## [183,]  1.163404705  -1.8385242
## [184,]  3.728997289  -5.6635288
## [185,]  0.550893717  -1.0055027
## [186,] -0.908123949  -0.9129538
## [187,] -1.098379580   9.7639420
## [188,]  2.668773881   1.9545918
## [189,]  0.739536663   7.1358941
## [190,]  1.082386353  -3.0769053
## [191,]  1.743439346   2.8726012
## [192,]  4.776962727  -1.0699562
## [193,] -0.499850515   1.5406175
## [194,]  3.413814753  -5.9572384
## [195,]  3.532463349  -9.3416254
## [196,] -3.843073555  16.6034252
## [197,]  1.459907837   5.9922763
## [198,]  1.240277117  -9.2118190
## [199,]  3.732901095  -3.5795305
## [200,]  3.033781745  -8.3929459
gibbs<-function (n, mu1, s1, mu2, s2, rho) 
{
  mat <- matrix(ncol = 2, nrow = n)
  x <- 0
  y <- 0
  mat[1, ] <- c(x, y)
  for (i in 2:n) {
    x <- rnorm(1, mu1 + 
          (s1/s2) * rho * (y - mu2), sqrt((1 - rho^2)*s1^2))
    y <- rnorm(1, mu2 + 
          (s2/s1) * rho * (x - mu1), sqrt((1 - rho^2)*s2^2))
    mat[i, ] <- c(x, y)
  }
  mat
}
bvn4 <- gibbs(N,mu1,s1,mu2,s2,rho)
colnames(bvn4) <- c("bvn4_X1","bvn4_X2")
## a simple format for gibbs sampling
gibbs <- function(n,  rho) {
    
    result <- matrix(ncol = 2, nrow = n)
    x <- 0
    y <- 0
    result[1, ] <- c(x, y)
    for (i in 2:n) {
        x <- rnorm(1, y*rho, sqrt(1- rho^2))
        y <- rnorm(1, x*rho, sqrt(1-rho^2))
        result[i, ] <- c(x, y)
    }
    result
}

A/B test summary

set.seed(12)
d <- rbind(
    data.frame(group = 'A', 
               converted = rbinom(1000000, size = 1, p = 0.05)),
    data.frame(group = 'B',
               converted = rbinom(10000, size = 1, p = 0.06))
)

tab <- table(d)

## calculating conversion rate
aConverRate <- tab['A', '1']/sum(tab['A',])
bConverRate <- tab['B', '1']/sum(tab['B',])
## testing the significance using fisher's test for independnnce
fisher.test(tab)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  tab
## p-value = 0.003239
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  1.044012 1.240187
## sample estimates:
## odds ratio 
##   1.138793
##CommonRate 
commonRate <- sum(tab[, '1'])/sum(tab)
Bayesian posterior estimate

The most common bayesian analysis for binomial experiments is to encode our priors and posteries as a beta distribution and measure how much mass is in the left tail of the distribution.

print(pbeta(aConverRate, shape1 =commonRate + tab['B', '1'],
            shape2 = (1- commonRate) + tab['B','0']))
## [1] 0.001650022

This is the posterior estimate of seeing the probability larger than Aconversionrate given the observed B data. This means the unknown B conversion rate is likely to be larger than A conversion rate.

library(ggplot2)
plt <- data.frame(x = seq(from = 0.04, to = 0.07, length.out = 301))
plt$density <- dbeta(plt$x, shape1 = commonRate + tab['B', '1'],
                     shape2 = (1- commonRate) + tab['B','0'])
ggplot(data = plt) +
    geom_line(aes(x= x, y = density)) + 
    geom_vline(aes(xintercept = aConverRate))