interview30-1.rmd

* Suppose N students participate in a coin flip experiment, when they get heads they stop, when they get tails they keep going. All students will stop after the second trial no matter the results. Y is the binary indicator of whether they claim they cheated in the experiment. Estimate how many students cheat in this experiment.

here

this used the second part of bayes rule, which is p(A) = p(A|Ec)*p(Ec) + p(A|E)*p(E)

## using bayes rule
# # we want to know p(cheat|1st head)
# 
# p (c|1st head) 
# p(cheat) = p(cheat\1st head)p(1st head) + p(cheat|1st teail)p(1st tail)
# 
# p(c|1st tail) = p(c|2nd head)P(2nd head) + p(c|2nd tail)p(2nd tail) = 1*1/2 = 1/2
# 
# so that
# p(cheat) = p(cheat|1st head)*1/2 + 1/2*1/2
# 
# y/N = 1/2* X + 1/4
#

* Apply function

# apply(x, 1, function)
a <- matrix(rnorm(10), ncol = 2, nrow = 5)
a

##            [,1]       [,2]
## [1,]  1.0809507  0.1948797
## [2,]  0.8653803 -0.7371508
## [3,] -0.4959525  0.5607351
## [4,] -0.9269606 -0.2049032
## [5,] -0.4435452 -0.2072242

apply(a, 2, sum)

## [1]  0.07987277 -0.39366342

# lapply
a <- list(a1 = c(10, 10), a2 = 10)
lapply(a, sum)

## $a1
## [1] 20
## 
## $a2
## [1] 10

# malapply
Q1 <- matrix(c(rep(1, 4), rep(2, 4), rep(3, 4), rep(4, 4)),4,4)

# Print `Q1`
print(Q1)

##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    1    2    3    4
## [3,]    1    2    3    4
## [4,]    1    2    3    4

# Or use `mapply()`
Q2 <- mapply(rep,1:4,4)

# Print `Q2`
print(Q2)

##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    1    2    3    4
## [3,]    1    2    3    4
## [4,]    1    2    3    4

explaining coefficients

The number used to multiply variable. explain how much changes x will need in order for y to change 1 unit

explain confidence interval

Draw sample from a population, and use sample mean to estimate the population mean, the confidence interval tell us how much confidence we have to make a conclusion that the population mean is in the range of this interval.

*Evaluate the estimator

MSE = variance + Bias ^ 2 Bias = E(theta^ - theta) Consistent: when n (sample size increases), MSE decreases Efficient: when MSE is small Unbised : (bias == 0)

* How to evaluate the results of a survey?

## coverage error: unknown chances of individuals included in the sample
## sample error: the sample population does not represent the characteristics of the population
## measurement error: the answer does not reflect the topic accurately
## unresponse error: lack of response of people in the sample

* Bivariate distribution simulation

library(mixtools)  #for ellipse

## mixtools package, version 1.1.0, Released 2017-03-10
## This package is based upon work supported by the National Science Foundation under Grant No. SES-0518772.

N <- 200 # Number of random samples
set.seed(123)
# Target parameters for univariate normal distributions
rho <- -0.6
mu1 <- 1; s1 <- 2
mu2 <- 1; s2 <- 8

# Parameters for bivariate normal distribution
mu <- c(mu1,mu2) # Mean 
sigma <- matrix(c(s1^2, s1*s2*rho, s1*s2*rho, s2^2),
           2) # Covariance matrix

# Function to draw ellipse for bivariate normal data
ellipse_bvn <- function(bvn, alpha){
  Xbar <- apply(bvn,2,mean)
  S <- cov(bvn)
  ellipse(Xbar, S, alpha = alpha, col="red")
}

library(MASS)
mybvd <- mvrnorm(N, mu = mu, Sigma = sigma)
colnames(mybvd) <- c('X1', 'X2')
mybvd

##                  X1          X2
##   [1,] -1.736404159  -4.0180849
##   [2,] -0.763548866  -1.1607058
##   [3,] -0.531291499  13.5285462
##   [4,]  0.063158545   1.4313013
##   [5,]  1.486094242   2.1348926
##   [6,] -0.396578755  14.8303129
##   [7,]  1.656993892   4.8779822
##   [8,]  3.508274041  -8.9707367
##   [9,] -0.722467323  -4.8949803
##  [10,]  1.640716027  -2.5504587
##  [11,] -0.714274650  10.7590130
##  [12,]  0.170067967   3.8177285
##  [13,] -1.426202539   3.9040380
##  [14,]  1.668278009   2.0109355
##  [15,]  3.244774033  -3.2025552
##  [16,] -3.849030985  14.8798986
##  [17,]  1.067959411   5.0885666
##  [18,]  4.584698775 -14.5492413
##  [19,]  2.056429229   6.9098073
##  [20,]  3.597733350  -2.4671867
##  [21,]  3.229817128  -7.3986059
##  [22,]  0.306377190  -0.8937362
##  [23,]  0.546366453  -7.4749487
##  [24,]  0.804098830  -5.0010233
##  [25,]  2.348466898  -3.9092807
##  [26,]  3.012010015 -12.5018473
##  [27,]  1.055298950   7.8710550
##  [28,]  1.929327638   2.4013683
##  [29,]  1.038242136  -8.3166581
##  [30,]  1.021983996  11.2735929
##  [31,] -2.587796194   3.9331647
##  [32,]  1.509456082  -1.3374422
##  [33,] -0.452574165   8.1053422
##  [34,]  1.057960552   8.2019534
##  [35,]  0.872056448   7.7097054
##  [36,]  2.198475623   6.8278254
##  [37,]  0.594435291   5.4739025
##  [38,]  0.422550453   0.4027326
##  [39,]  0.875132067  -1.5256720
##  [40,]  2.696195310  -1.8517094
##  [41,]  3.099505959  -4.3626999
##  [42,]  2.044297166  -0.5400615
##  [43,]  0.241661163  -9.4834038
##  [44,]  0.069872266  18.6210066
##  [45,] -0.228015158  10.7028793
##  [46,] -0.570859497  -8.4447416
##  [47,]  1.660682477  -2.1969488
##  [48,]  3.707469399  -2.3997955
##  [49,]  1.065237605   7.3989795
##  [50,]  0.345451903   0.2149374
##  [51,]  1.270742340   3.1172281
##  [52,]  1.913657932   0.9087929
##  [53,]  1.590939884   0.7410889
##  [54,] -0.849744846  11.9216521
##  [55,] -1.216121341  -1.1952587
##  [56,] -0.754504635  13.1477278
##  [57,]  1.244269342 -11.6479051
##  [58,] -0.715389699   5.5208729
##  [59,]  1.022982756   2.0180939
##  [60,]  3.125869241   3.1006562
##  [61,]  1.340455332   4.1628219
##  [62,]  2.392518479  -2.8972230
##  [63,]  1.342232435  -1.6759222
##  [64,]  0.239638978  -7.4619768
##  [65,] -1.245475682  -8.1297028
##  [66,] -1.797225392   3.0495924
##  [67,]  0.648602111   4.6164910
##  [68,]  3.678704110   1.8523134
##  [69,]  0.456332211   8.4695457
##  [70,] -1.698375883  17.3712908
##  [71,]  0.292446224  -3.1325502
##  [72,]  2.378261487 -17.6995545
##  [73,] -1.324738845   8.8752407
##  [74,]  4.065592521  -4.3306131
##  [75,]  0.531083947  -4.7087665
##  [76,]  0.417672003   9.3096865
##  [77,]  1.082303652  -1.3197651
##  [78,]  2.407236449  -8.7793864
##  [79,]  0.104610155   2.3453090
##  [80,]  1.134809447  -0.1166333
##  [81,]  3.598511499   1.4528459
##  [82,] -0.631815252   3.9011536
##  [83,]  0.859437970  -2.0580665
##  [84,]  0.610795363   6.2174169
##  [85,]  1.090597280  -0.7918918
##  [86,]  0.376402565   3.6203249
##  [87,] -0.714485664   9.7167179
##  [88,] -2.107300478   4.0795749
##  [89,]  1.749139763  -1.5528064
##  [90,] -0.696604831  10.1451907
##  [91,] -2.065913819   8.6593266
##  [92,] -1.331857659   5.1279851
##  [93,] -1.087655475   2.6295972
##  [94,]  2.686160033  -3.8800488
##  [95,] -3.827617140  11.3916845
##  [96,]  1.645030140  -3.8161144
##  [97,] -4.647557860  18.0351387
##  [98,]  1.197960842  13.5847150
##  [99,]  1.261417646  -0.8898447
## [100,]  0.328009239  -7.5124457
## [101,]  3.004434546  -4.5061380
## [102,]  1.855551755   3.2377205
## [103,]  2.774550368  -0.7436759
## [104,]  3.078539331  -1.5223042
## [105,]  2.870972032  -6.5027706
## [106,]  0.538682494   0.5591598
## [107,]  5.127278500  -4.7849816
## [108,]  2.750718600 -12.3890402
## [109,] -0.457900258  -2.3420649
## [110,] -3.331165551   7.8515207
## [111,] -0.315138728  -3.9180366
## [112,] -0.941464730   5.6768499
## [113,]  5.717790592 -11.5158351
## [114,]  2.009308085   0.7024389
## [115,]  0.901791960   5.2392016
## [116,] -0.475281140   3.2364932
## [117,]  1.033220922   1.8707933
## [118,]  3.766595034  -3.8162375
## [119,] -0.571584834  -6.2053651
## [120,]  0.854146956  -7.4115400
## [121,]  0.482125929   1.8828178
## [122,]  0.279168300  -6.8734106
## [123,]  3.704384905  -2.5960622
## [124,]  0.287020292  -1.2089815
## [125,] -0.484424347  15.8715933
## [126,]  0.745316828  -4.3799628
## [127,]  0.801228092   2.8970535
## [128,] -0.086419120   1.4689962
## [129,]  0.113649301  -7.0170533
## [130,]  1.077616833   0.4280223
## [131,] -2.393234231  12.3028162
## [132,]  2.293545002   4.9002524
## [133,]  2.076158551   1.5057339
## [134,] -0.846657032  -2.7489919
## [135,]  2.973191036 -15.5103978
## [136,]  2.794770960  10.5470962
## [137,]  4.954747889 -10.3469521
## [138,]  0.390129841   6.9658049
## [139,] -2.735922079  16.0545339
## [140,]  2.961518554 -10.5209214
## [141,] -0.851383931   6.4594036
## [142,] -0.171322565  -1.3305386
## [143,]  0.351095208 -11.9789322
## [144,]  2.803116383 -11.1253713
## [145,]  3.080313967 -11.7936518
## [146,]  4.402417109  -2.8176031
## [147,]  2.669389607 -10.7128365
## [148,]  1.034933058   6.6402689
## [149,] -0.099364923  18.0306535
## [150,]  2.887641874  -9.2475710
## [151,] -1.569295450   7.0514014
## [152,]  3.154059476   7.6355763
## [153,]  1.253029950   3.7606128
## [154,]  2.076419638  -7.0917137
## [155,]  2.544889223   0.2627072
## [156,]  0.828218788  -1.3235699
## [157,] -0.345667282   5.4014603
## [158,]  1.516513271  -1.9700660
## [159,]  3.633888750   9.4136693
## [160,] -2.550767408  -2.6225196
## [161,]  0.006789361   9.4678553
## [162,]  1.292015427  -7.5483615
## [163,]  2.145156531  -9.1433353
## [164,] -4.646781417  26.6663106
## [165,]  0.242605099  -2.5327650
## [166,]  0.955580365   3.4358871
## [167,] -0.385534508   5.9979415
## [168,]  3.081227090  -2.6378280
## [169,] -0.984246494   4.9239410
## [170,]  1.259896746   4.0628067
## [171,] -2.507765219  -1.3117761
## [172,]  3.498537209   1.9248484
## [173,]  1.767580784   0.8407710
## [174,] -2.946587999  17.8183594
## [175,]  1.128190333  -5.0523698
## [176,]  3.289216065  -7.6200987
## [177,]  2.510466735   1.5453153
## [178,]  0.386681718   3.4474485
## [179,]  0.477476519   4.4940541
## [180,]  4.369756312  -2.2285071
## [181,]  2.273282554  -7.5110818
## [182,] -0.874010579  11.0543789
## [183,]  1.163404705  -1.8385242
## [184,]  3.728997289  -5.6635288
## [185,]  0.550893717  -1.0055027
## [186,] -0.908123949  -0.9129538
## [187,] -1.098379580   9.7639420
## [188,]  2.668773881   1.9545918
## [189,]  0.739536663   7.1358941
## [190,]  1.082386353  -3.0769053
## [191,]  1.743439346   2.8726012
## [192,]  4.776962727  -1.0699562
## [193,] -0.499850515   1.5406175
## [194,]  3.413814753  -5.9572384
## [195,]  3.532463349  -9.3416254
## [196,] -3.843073555  16.6034252
## [197,]  1.459907837   5.9922763
## [198,]  1.240277117  -9.2118190
## [199,]  3.732901095  -3.5795305
## [200,]  3.033781745  -8.3929459

Second method: use gibbs sampling

gibbs<-function (n, mu1, s1, mu2, s2, rho) 
{
  mat <- matrix(ncol = 2, nrow = n)
  x <- 0
  y <- 0
  mat[1, ] <- c(x, y)
  for (i in 2:n) {
    x <- rnorm(1, mu1 + 
          (s1/s2) * rho * (y - mu2), sqrt((1 - rho^2)*s1^2))
    y <- rnorm(1, mu2 + 
          (s2/s1) * rho * (x - mu1), sqrt((1 - rho^2)*s2^2))
    mat[i, ] <- c(x, y)
  }
  mat
}
bvn4 <- gibbs(N,mu1,s1,mu2,s2,rho)
colnames(bvn4) <- c("bvn4_X1","bvn4_X2")

## a simple format for gibbs sampling
gibbs <- function(n,  rho) {
    
    result <- matrix(ncol = 2, nrow = n)
    x <- 0
    y <- 0
    result[1, ] <- c(x, y)
    for (i in 2:n) {
        x <- rnorm(1, y*rho, sqrt(1- rho^2))
        y <- rnorm(1, x*rho, sqrt(1-rho^2))
        result[i, ] <- c(x, y)
    }
    result
}

A/B test summary

set.seed(12)
d <- rbind(
    data.frame(group = 'A', 
               converted = rbinom(1000000, size = 1, p = 0.05)),
    data.frame(group = 'B',
               converted = rbinom(10000, size = 1, p = 0.06))
)

tab <- table(d)

## calculating conversion rate
aConverRate <- tab['A', '1']/sum(tab['A',])
bConverRate <- tab['B', '1']/sum(tab['B',])
## testing the significance using fisher's test for independnnce
fisher.test(tab)

## 
##  Fisher's Exact Test for Count Data
## 
## data:  tab
## p-value = 0.003239
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  1.044012 1.240187
## sample estimates:
## odds ratio 
##   1.138793

##CommonRate 
commonRate <- sum(tab[, '1'])/sum(tab)

Bayesian posterior estimate

The most common bayesian analysis for binomial experiments is to encode our priors and posteries as a beta distribution and measure how much mass is in the left tail of the distribution.

print(pbeta(aConverRate, shape1 =commonRate + tab['B', '1'],
            shape2 = (1- commonRate) + tab['B','0']))

## [1] 0.001650022

This is the posterior estimate of seeing the probability larger than Aconversionrate given the observed B data. This means the unknown B conversion rate is likely to be larger than A conversion rate.

library(ggplot2)
plt <- data.frame(x = seq(from = 0.04, to = 0.07, length.out = 301))
plt$density <- dbeta(plt$x, shape1 = commonRate + tab['B', '1'],
                     shape2 = (1- commonRate) + tab['B','0'])
ggplot(data = plt) +
    geom_line(aes(x= x, y = density)) + 
    geom_vline(aes(xintercept = aConverRate))

interview30-1.rmd

xg

7/30/2017

A/B test summary

Bayesian posterior estimate