A First Touch in Monte-Carlo Simulation

In this world filled with randomness, Monte-Carlo simulation (in homage to the famous Monte-Carlo casino) provides an easy way of approximating probabilities, statistics (e.g., \(\bar{x}\) ), and any quantities/intervals of interest. One can simulate a random/stochastic/probabilistic experiment (with uncertain outcomes) many times, and estimate the probability of an event (E) by the relative frequency of event occurrence over a long series of experiment. That is, repeat the experiment over and over again, and then count how many times an event occurs.Below is the underlying rationale of simulation experiments. Define

\(N_n(E)=\) the number of time \(E\) occurs in n repetitions

where $N_n(E) $. Accordingly, we can define \[P(E) = lim_{n\rightarrow \infty} \frac{N_n(E)}{n}\]

The statement of “limit” is for mathematical convenience and supposed to be taken seriously. We focus on two nice R functions – sample and replicate – that greatly simplify Monte-Carlo simulation experiments. Let’s simulate probabilities of tossing a coin.

##Simulating coin-tossing experiments
x = sample(c("H","T"), 10, replace=TRUE) # replace=TRUE 取後放回
x
##  [1] "H" "H" "T" "T" "H" "T" "T" "T" "H" "T"
table(x)
## x
## H T 
## 4 6
table(x)/10
## x
##   H   T 
## 0.4 0.6

發現抽10次,每次出來的答案都不太一樣 -> variablility high, confidence low

x = sample(c("H","T"), 100, replace=TRUE)
table(x)/100
## x
##    H    T 
## 0.51 0.49

How about increasing the number of tosses to 10,000 times?

x=sample(c("H","T"), 10000, replace=TRUE)
table(x)/10000
## x
##      H      T 
## 0.4891 0.5109

大數法則,我們要確保n不要太小,只要數量變大變異就會變小,信心變高。

Simulation of a coin-tossing game

KoP and Han are playing a coin-tossing game using a FAIR coin. Each time a head comes up, KoP wins a dollar from Han. Otherwise Han wins a dollar to KoP. KoP starts with zero dollars and decides to play the game for 50 tosses.

##Simulating a game of chance
win=sample(c(-1,1), size=50, replace=T)
cum.win=cumsum(win)
cum.win
##  [1] -1  0 -1  0 -1  0  1  0 -1 -2 -1 -2 -1 -2 -1  0 -1  0  1  2  3  4  3  4  5
## [26]  4  3  2  3  4  5  6  5  6  5  4  5  6  5  4  3  4  5  6  5  6  5  6  5  6
sum(win)
## [1] 6
#
win=sample(c(-1,1),
           size=50,replace=T,
           prob=c(0.6,0.4))

KoP = function (n=50) {
    win=sample(c(-1,1),size=n,replace=T)
    sum(win)
}

F=replicate(10000,KoP())
table(F)
## F
##  -24  -22  -20  -18  -16  -14  -12  -10   -8   -6   -4   -2    0    2    4    6 
##    1   13   18   40   64  178  264  391  610  824  941 1044 1073 1125  987  787 
##    8   10   12   14   16   18   20   22   24   26 
##  621  423  264  163   86   48   24    4    6    1
plot(table(F))

sum(F==0)/10000
## [1] 0.1073

This simulated probability, should be close to the theoretical probability 0.1122752. That is, KoP breaks even if there are exactly 25 heads in the Bernoulli experiment of 50 trials.

choose(50,25)*0.5^25*0.5^25
## [1] 0.1122752

A simulation like this allows us to answer questions that are difficult to address analytically. (a) What is the likely number of tosses where KoP will be in the lead?
(b) What will be the value of KoP’s highest fortune during the game?
(c) How likely KoP will have a maximum cumulative earning > 10?

#
KoP=function(n=50){
  win=sample(c(-1,1),size=n,replace=T)
  cum.win=cumsum(win) 
  c(F=sum(win),        # earnings
    L=sum(cum.win>0),  # 在每個賭局中有沒有處於領先 lead
    M=max(cum.win))    # 累積損益贏最大是多少 answer highest earning
}
S=replicate(1000, KoP()) # output: 1000*3
#
S[,1:10]
##   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## F    6    0    4    8    0   -4    0   -2    2    -6
## L   46   10   27   50   36    1    6    2   34     0
## M   10    2    5    8    5    1    2    1    5     0
# (a) What is the likely number of tosses where KoP will be in the lead? 
mean(S["L",])
## [1] 21.989
# (b) What will be the value of KoP’s highest fortune during the game? 
mean(S["M",])
## [1] 4.869
# (c) How likely KoP will have a maximum cumulative earning > 10?
sum(S["M",] > 10)/1000
## [1] 0.107

IC Module Analysis

Through this simple exercise, I hope you have started to realize the power of simulation. Let’s consider another case from high-tech sectors The TSMC produces sophisticated IC modules in production runs of several thousand at a time. It has been found that the fraction of defective modules can be very different in different production runs. These differences are caused by irregularities that sometimes occur in the electronic content. For a simple first model, we deliberately assume that there are just two possible values of the defective rate. In about 70% of the production runs, the electric current is regular, in which every IC module that is produced has an independent 10% chance of being defective. In the other 30% of production runs, when current is irregular, every IC module produced has an independent 40% chance of being defective.

Testing these IC modules is quite expensive. So it is valuable to make inferences about the overall rate of defective based on a small sample of tested modules in each production run, from which we test 10 IC modules and aim to answer the following questions.

  1. What is the probability of finding exactly 2 defective IC modules among the 10 tested?
  2. What is the conditional probability that the current is exclusively regular given that 2 defective IC modules are found among the 10 tested?
  3. What is P(at least 1 irregular current| k defectives among 10 tested), for k=0, 1, 2, 3?

computer simulation simulate production run!

preg=0.7
pbad.reg=0.1
pgood.reg=1-pbad.reg
#
pirreg=1-preg
pbad.irreg=0.4
pgood.irreg=1-pbad.irreg
# n = num of sample
ICmodule.sim = function(n=10) {
   simulated.modules=rep(NA,n)
   # 1 denotes regular
   # -1 denotes irregular
   labels=sample(c(1,-1),n,
                 prob=c(preg,pirreg),
                 replace=TRUE)
   if(any(labels==1)){
      simulated.modules[which(labels==1)]=
        sample(c("goodreg","badreg"),
               sum(labels==1),
               prob=c(pgood.reg,pbad.reg),
               replace=TRUE) 
   }
   if(any(labels==-1)){
      simulated.modules[which(labels==-1)]=
       sample(c("goodirreg","badirreg"),
              sum(labels==-1),
              prob=c(pgood.irreg,pbad.irreg),
              replace=TRUE) 
   }
   simulated.modules
}
ICmodule.sim()
##  [1] "badirreg"  "goodreg"   "goodreg"   "goodreg"   "goodirreg" "goodreg"  
##  [7] "goodreg"   "badirreg"  "goodreg"   "goodreg"
S=10000
sim.table=replicate(S,ICmodule.sim())
dim(sim.table)
## [1]    10 10000
badnum=c()
badnumreg=c()
badnumirreg=c()
numreg=c()
numirreg=c()
for(i in 1:ncol(sim.table)){
    badnumreg[i]=sum(sim.table[,i]=="badreg")
    badnumirreg[i]=sum(sim.table[,i]=="badirreg")
    badnum[i]=sum(sim.table[,i]=="badreg")+
              sum(sim.table[,i]=="badirreg")
    numreg[i]=sum(sim.table[,i]=="badreg")+
              sum(sim.table[,i]=="goodreg")
    numirreg[i]=sum(sim.table[,i]=="badirreg")+
                sum(sim.table[,i]=="goodirreg")
}

\[P(2\; defects)=\frac{num\;of \;runs\; with\; 2\, defects}{10,000}\]

sum(badnum==2)/S
## [1] 0.3057

$$$$

sum(numreg==10 & badnum==2)
## [1] 62
sum(numreg==10 & badnum==2)/sum(badnum==2)
## [1] 0.02028132

\[P(at \; least \; 1 \; irreg | k \; defects)=\frac{P(at \; least|irreg\,\&\,defects)}{P(k \; defects)}\]

k=1
sum(badnum==k)
## [1] 2788
sum(numirreg>=1 & badnum==k)
## [1] 2687
sum(numirreg>=1 & badnum==k)/sum(badnum==k)
## [1] 0.9637733

Wait Vanessa’s Offer?

Vanessa Pay

Vanessa Pay

#John's payoff is deterministic
John_pay = 12000

#The probability of getting the job from Vanessa
prob_interval = seq(0.1,0.9,0.1)

#Simulation runs
simulation_runs = 1:10000

#Earnings of waiting for Vanessa's uncertain offer
results1 = matrix(NA, nrow=length(prob_interval),
                       ncol=length(simulation_runs))
rownames(results1) = prob_interval 
colnames(results1) = simulation_runs

sim.School_pay=sample(c(21600,16800,12000,6000,0),length(simulation_runs),
                      prob = c(0.05,0.25,0.4,0.25,0.05),replace=TRUE)
# every time the prob we thought we get the job from Vanessa
for(p in prob_interval){
  # every prob we run 10,000 times of simulation
  for(i in 1:length(simulation_runs)){
    
    #The payoff from Vanessa is deterministic
    Vanessa_pay = 14000
    #The chance to get the job from Vanessa
    Vanessa_offer = sample(c(1,0),1, prob = c(p,1-p))
    
    Vanessa_pay = Vanessa_offer*Vanessa_pay
    
    #The payoff from school is uncertain
    School_pay = sim.School_pay[i]
    
    if(Vanessa_offer==1){
      results1[which(p==prob_interval),i] = Vanessa_pay
    }else{
      results1[which(p==prob_interval),i] = School_pay
    }
  }
  cat("P(Vanessa Offer=1)=",p, "\n")
}
## P(Vanessa Offer=1)= 0.1 
## P(Vanessa Offer=1)= 0.2 
## P(Vanessa Offer=1)= 0.3 
## P(Vanessa Offer=1)= 0.4 
## P(Vanessa Offer=1)= 0.5 
## P(Vanessa Offer=1)= 0.6 
## P(Vanessa Offer=1)= 0.7 
## P(Vanessa Offer=1)= 0.8 
## P(Vanessa Offer=1)= 0.9
# Earnings of waiting for Vanessa's uncertain offer
# we wanna know the percentile of payoff under different probability from Vanessa
# 你自己覺得自己機會高,也等於確定性高,分布比較不太會變動,報酬就是拿Vanessa的錢。

pctile.range = c(0.05,0.1,0.25,0.5,0.75,0.9,0.95)
sim.pctile1 = matrix(NA, ncol=length(pctile.range), 
                         nrow= length(prob_interval))
colnames(sim.pctile1)=pctile.range
rownames(sim.pctile1)=prob_interval
  
#apply(results1,1,summary)

for(i in 1:nrow(results1)){
  sim.pctile1[i,] = quantile(results1[i,], probs = pctile.range)
}

sim.pctile1
##      0.05   0.1  0.25   0.5  0.75   0.9  0.95
## 0.1  6000  6000  6000 12000 16800 16800 16800
## 0.2  6000  6000 12000 12000 14000 16800 16800
## 0.3  6000  6000 12000 14000 14000 16800 16800
## 0.4  6000  6000 12000 14000 14000 16800 16800
## 0.5  6000  6000 12000 14000 14000 16800 16800
## 0.6  6000  6000 12000 14000 14000 16800 16800
## 0.7  6000 12000 14000 14000 14000 14000 16800
## 0.8  6000 12000 14000 14000 14000 14000 16800
## 0.9 12000 14000 14000 14000 14000 14000 14000
# Prob(offer >= John)
# 大於John Pay 需要至少多少Vanessa工作的機會
Pbetter1=c() 
for(i in 1:nrow(results1)){
    Pbetter1[i]=
      sum(results1[i,] >= John_pay)/ncol(results1)
}

plot(prob_interval,Pbetter1,type='l',xaxt='n',lwd=3,
     xlab="P(Vanessa offer=1)",ylab="P(Earning>=John offer)")
axis(1,seq(0.1,0.9,0.1))
abline(h=0.95,col='red',lty=2,lwd=2)
abline(h=0.9,col='green',lty=3,lwd=2)