In this world filled with randomness, Monte-Carlo simulation (in homage to the famous Monte-Carlo casino) provides an easy way of approximating probabilities, statistics (e.g., \(\bar{x}\) ), and any quantities/intervals of interest. One can simulate a random/stochastic/probabilistic experiment (with uncertain outcomes) many times, and estimate the probability of an event (E) by the relative frequency of event occurrence over a long series of experiment. That is, repeat the experiment over and over again, and then count how many times an event occurs.Below is the underlying rationale of simulation experiments. Define
\(N_n(E)=\) the number of time \(E\) occurs in n repetitions
where $N_n(E) $. Accordingly, we can define \[P(E) = lim_{n\rightarrow \infty} \frac{N_n(E)}{n}\]
The statement of “limit” is for mathematical convenience and supposed to be taken seriously. We focus on two nice R functions – sample and replicate – that greatly simplify Monte-Carlo simulation experiments. Let’s simulate probabilities of tossing a coin.
##Simulating coin-tossing experiments
x = sample(c("H","T"), 10, replace=TRUE) # replace=TRUE 取後放回
x
## [1] "H" "H" "T" "T" "H" "T" "T" "T" "H" "T"
table(x)
## x
## H T
## 4 6
table(x)/10
## x
## H T
## 0.4 0.6
發現抽10次,每次出來的答案都不太一樣 -> variablility high, confidence low
x = sample(c("H","T"), 100, replace=TRUE)
table(x)/100
## x
## H T
## 0.51 0.49
How about increasing the number of tosses to 10,000 times?
x=sample(c("H","T"), 10000, replace=TRUE)
table(x)/10000
## x
## H T
## 0.4891 0.5109
大數法則,我們要確保n不要太小,只要數量變大變異就會變小,信心變高。
KoP and Han are playing a coin-tossing game using a FAIR coin. Each time a head comes up, KoP wins a dollar from Han. Otherwise Han wins a dollar to KoP. KoP starts with zero dollars and decides to play the game for 50 tosses.
##Simulating a game of chance
win=sample(c(-1,1), size=50, replace=T)
cum.win=cumsum(win)
cum.win
## [1] -1 0 -1 0 -1 0 1 0 -1 -2 -1 -2 -1 -2 -1 0 -1 0 1 2 3 4 3 4 5
## [26] 4 3 2 3 4 5 6 5 6 5 4 5 6 5 4 3 4 5 6 5 6 5 6 5 6
sum(win)
## [1] 6
#
win=sample(c(-1,1),
size=50,replace=T,
prob=c(0.6,0.4))
KoP = function (n=50) {
win=sample(c(-1,1),size=n,replace=T)
sum(win)
}
F=replicate(10000,KoP())
table(F)
## F
## -24 -22 -20 -18 -16 -14 -12 -10 -8 -6 -4 -2 0 2 4 6
## 1 13 18 40 64 178 264 391 610 824 941 1044 1073 1125 987 787
## 8 10 12 14 16 18 20 22 24 26
## 621 423 264 163 86 48 24 4 6 1
plot(table(F))
sum(F==0)/10000
## [1] 0.1073
This simulated probability, should be close to the theoretical probability 0.1122752. That is, KoP breaks even if there are exactly 25 heads in the Bernoulli experiment of 50 trials.
choose(50,25)*0.5^25*0.5^25
## [1] 0.1122752
A simulation like this allows us to answer questions that are difficult to address analytically. (a) What is the likely number of tosses where KoP will be in the lead?
(b) What will be the value of KoP’s highest fortune during the game?
(c) How likely KoP will have a maximum cumulative earning > 10?
#
KoP=function(n=50){
win=sample(c(-1,1),size=n,replace=T)
cum.win=cumsum(win)
c(F=sum(win), # earnings
L=sum(cum.win>0), # 在每個賭局中有沒有處於領先 lead
M=max(cum.win)) # 累積損益贏最大是多少 answer highest earning
}
S=replicate(1000, KoP()) # output: 1000*3
#
S[,1:10]
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## F 6 0 4 8 0 -4 0 -2 2 -6
## L 46 10 27 50 36 1 6 2 34 0
## M 10 2 5 8 5 1 2 1 5 0
# (a) What is the likely number of tosses where KoP will be in the lead?
mean(S["L",])
## [1] 21.989
# (b) What will be the value of KoP’s highest fortune during the game?
mean(S["M",])
## [1] 4.869
# (c) How likely KoP will have a maximum cumulative earning > 10?
sum(S["M",] > 10)/1000
## [1] 0.107
Through this simple exercise, I hope you have started to realize the power of simulation. Let’s consider another case from high-tech sectors The TSMC produces sophisticated IC modules in production runs of several thousand at a time. It has been found that the fraction of defective modules can be very different in different production runs. These differences are caused by irregularities that sometimes occur in the electronic content. For a simple first model, we deliberately assume that there are just two possible values of the defective rate. In about 70% of the production runs, the electric current is regular, in which every IC module that is produced has an independent 10% chance of being defective. In the other 30% of production runs, when current is irregular, every IC module produced has an independent 40% chance of being defective.
Testing these IC modules is quite expensive. So it is valuable to make inferences about the overall rate of defective based on a small sample of tested modules in each production run, from which we test 10 IC modules and aim to answer the following questions.
simulate production run!
preg=0.7
pbad.reg=0.1
pgood.reg=1-pbad.reg
#
pirreg=1-preg
pbad.irreg=0.4
pgood.irreg=1-pbad.irreg
# n = num of sample
ICmodule.sim = function(n=10) {
simulated.modules=rep(NA,n)
# 1 denotes regular
# -1 denotes irregular
labels=sample(c(1,-1),n,
prob=c(preg,pirreg),
replace=TRUE)
if(any(labels==1)){
simulated.modules[which(labels==1)]=
sample(c("goodreg","badreg"),
sum(labels==1),
prob=c(pgood.reg,pbad.reg),
replace=TRUE)
}
if(any(labels==-1)){
simulated.modules[which(labels==-1)]=
sample(c("goodirreg","badirreg"),
sum(labels==-1),
prob=c(pgood.irreg,pbad.irreg),
replace=TRUE)
}
simulated.modules
}
ICmodule.sim()
## [1] "badirreg" "goodreg" "goodreg" "goodreg" "goodirreg" "goodreg"
## [7] "goodreg" "badirreg" "goodreg" "goodreg"
S=10000
sim.table=replicate(S,ICmodule.sim())
dim(sim.table)
## [1] 10 10000
badnum=c()
badnumreg=c()
badnumirreg=c()
numreg=c()
numirreg=c()
for(i in 1:ncol(sim.table)){
badnumreg[i]=sum(sim.table[,i]=="badreg")
badnumirreg[i]=sum(sim.table[,i]=="badirreg")
badnum[i]=sum(sim.table[,i]=="badreg")+
sum(sim.table[,i]=="badirreg")
numreg[i]=sum(sim.table[,i]=="badreg")+
sum(sim.table[,i]=="goodreg")
numirreg[i]=sum(sim.table[,i]=="badirreg")+
sum(sim.table[,i]=="goodirreg")
}
\[P(2\; defects)=\frac{num\;of \;runs\; with\; 2\, defects}{10,000}\]
sum(badnum==2)/S
## [1] 0.3057
$$$$
sum(numreg==10 & badnum==2)
## [1] 62
sum(numreg==10 & badnum==2)/sum(badnum==2)
## [1] 0.02028132
\[P(at \; least \; 1 \; irreg | k \; defects)=\frac{P(at \; least|irreg\,\&\,defects)}{P(k \; defects)}\]
k=1
sum(badnum==k)
## [1] 2788
sum(numirreg>=1 & badnum==k)
## [1] 2687
sum(numirreg>=1 & badnum==k)/sum(badnum==k)
## [1] 0.9637733
Vanessa Pay
#John's payoff is deterministic
John_pay = 12000
#The probability of getting the job from Vanessa
prob_interval = seq(0.1,0.9,0.1)
#Simulation runs
simulation_runs = 1:10000
#Earnings of waiting for Vanessa's uncertain offer
results1 = matrix(NA, nrow=length(prob_interval),
ncol=length(simulation_runs))
rownames(results1) = prob_interval
colnames(results1) = simulation_runs
sim.School_pay=sample(c(21600,16800,12000,6000,0),length(simulation_runs),
prob = c(0.05,0.25,0.4,0.25,0.05),replace=TRUE)
# every time the prob we thought we get the job from Vanessa
for(p in prob_interval){
# every prob we run 10,000 times of simulation
for(i in 1:length(simulation_runs)){
#The payoff from Vanessa is deterministic
Vanessa_pay = 14000
#The chance to get the job from Vanessa
Vanessa_offer = sample(c(1,0),1, prob = c(p,1-p))
Vanessa_pay = Vanessa_offer*Vanessa_pay
#The payoff from school is uncertain
School_pay = sim.School_pay[i]
if(Vanessa_offer==1){
results1[which(p==prob_interval),i] = Vanessa_pay
}else{
results1[which(p==prob_interval),i] = School_pay
}
}
cat("P(Vanessa Offer=1)=",p, "\n")
}
## P(Vanessa Offer=1)= 0.1
## P(Vanessa Offer=1)= 0.2
## P(Vanessa Offer=1)= 0.3
## P(Vanessa Offer=1)= 0.4
## P(Vanessa Offer=1)= 0.5
## P(Vanessa Offer=1)= 0.6
## P(Vanessa Offer=1)= 0.7
## P(Vanessa Offer=1)= 0.8
## P(Vanessa Offer=1)= 0.9
# Earnings of waiting for Vanessa's uncertain offer
# we wanna know the percentile of payoff under different probability from Vanessa
# 你自己覺得自己機會高,也等於確定性高,分布比較不太會變動,報酬就是拿Vanessa的錢。
pctile.range = c(0.05,0.1,0.25,0.5,0.75,0.9,0.95)
sim.pctile1 = matrix(NA, ncol=length(pctile.range),
nrow= length(prob_interval))
colnames(sim.pctile1)=pctile.range
rownames(sim.pctile1)=prob_interval
#apply(results1,1,summary)
for(i in 1:nrow(results1)){
sim.pctile1[i,] = quantile(results1[i,], probs = pctile.range)
}
sim.pctile1
## 0.05 0.1 0.25 0.5 0.75 0.9 0.95
## 0.1 6000 6000 6000 12000 16800 16800 16800
## 0.2 6000 6000 12000 12000 14000 16800 16800
## 0.3 6000 6000 12000 14000 14000 16800 16800
## 0.4 6000 6000 12000 14000 14000 16800 16800
## 0.5 6000 6000 12000 14000 14000 16800 16800
## 0.6 6000 6000 12000 14000 14000 16800 16800
## 0.7 6000 12000 14000 14000 14000 14000 16800
## 0.8 6000 12000 14000 14000 14000 14000 16800
## 0.9 12000 14000 14000 14000 14000 14000 14000
# Prob(offer >= John)
# 大於John Pay 需要至少多少Vanessa工作的機會
Pbetter1=c()
for(i in 1:nrow(results1)){
Pbetter1[i]=
sum(results1[i,] >= John_pay)/ncol(results1)
}
plot(prob_interval,Pbetter1,type='l',xaxt='n',lwd=3,
xlab="P(Vanessa offer=1)",ylab="P(Earning>=John offer)")
axis(1,seq(0.1,0.9,0.1))
abline(h=0.95,col='red',lty=2,lwd=2)
abline(h=0.9,col='green',lty=3,lwd=2)