The algorithm can be implemented as:
a = 16807 ; b =0 ; M =2147483647 #choice of parameters
n = 1000 # nb of iterations
x = numeric(length=n)
x[1] = 3 #start condition
for(i in 2:n) #execution of the linear congruential method
{
x[i] = (a*x[i-1] + b) %% M
}
x = x/M #normalizing x
The results have been plotted as follows:
#Different plots of x (histogram, scatter plot and ACF of x)
par(mfrow=c(1,3))
hist(x,prob=TRUE)
u = seq(0,1,0.01)
lines(u,dunif(u),col=2) #adding the "true" uniform distribution to the histogram
plot(x,main="x",xlab="Iterations")
acf(x,main="ACF of x")
Both the histogram and the ACF indicate that there might be problems. Some of the points fall outside the confidence bounds of the ACF, and for large values of x the histogram seems to have few observations. The scatter plot looks fine and evenly distributed. The Kolmogorov-Smirnov test was conducted:
ks.test(x,"punif",alternative="two.sided")
##
## One-sample Kolmogorov-Smirnov test
##
## data: x
## D = 0.043, p-value = 0.04975
## alternative hypothesis: two-sided
The p-value is 0.04975, so the null hypothesis of uniformity would be rejected at the 5 % significance level.
The value of M and the initialization of x were changed and the algorithm was run again.
a = 16807 ; b =0 ; M =2147493647 #choice of parameters
n = 1000 # nb of iterations
x = numeric(length=n)
x[1] = 6 #start condition
for(i in 2:n)
{
x[i] = (a*x[i-1] + b) %% M
}
x = x/M #normalizing x
#Different plots of x (histogram, scatter plot and ACF of x)
par(mfrow=c(1,3))
hist(x,prob=TRUE)
u = seq(0,1,0.01)
lines(u,dunif(u),col=2) #adding the "true" uniform distribution to the histogram
plot(x,main="x",xlab="Iterations")
acf(x,main="ACF of x")
The plots look better: there is only a single violation in the ACF and there are no obviously under-populated areas in the histogram. The properties of the scatter plot seem unchanged.
The Kolmogorov-Smirnov test also gives a better result.
ks.test(x,"punif",alternative="two.sided")
##
## One-sample Kolmogorov-Smirnov test
##
## data: x
## D = 0.02, p-value = 0.818
## alternative hypothesis: two-sided
The p-value indicates that the null hypothesis cannot be rejected. It seems that this combination of parameters gives a more accurate approximation to the uniform distribution.
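As a quick benchmark (an added check, not part of the original code), the same test can be applied to R's built-in uniform generator:
# KS test on R's built-in generator for comparison
ks.test(runif(1000), "punif", alternative = "two.sided")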
One way of doing this is by inversion of the distribution function. The idea is to simulate a uniform random variable and then transform it to the target distribution by applying the inverse of the target cumulative distribution function to the uniform random variable. This method has been applied to the Cauchy distribution; the inverse CDF of the standard Cauchy is \( \tan(\frac{1}{2} \pi (2u-1)) \). The code for it is shown here:
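To see where this expression comes from: the CDF of the standard Cauchy is \( F(x) = \frac{1}{2} + \frac{1}{\pi}\arctan(x) \), so its inverse is \( F^{-1}(u) = \tan\big(\pi(u - \tfrac{1}{2})\big) = \tan\big(\tfrac{1}{2}\pi(2u-1)\big) \).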
## Simulation of the Cauchy by inversion
#initialization; n is the number of points simulated
rm(list=ls())
n=100000 ; u = runif(n)
#plotting the target density
h = seq(-10,10,.1); plot(h,dcauchy(h, location = 0, scale = 1),
type='l',lwd=3,col=3,xlab='x',ylab='density',main = 'The cauchy function and the estimated one (histogram)')
#the inverse CDF of the Cauchy evaluated at the generated uniform random variables
x = tan(1/2*pi*(2*u-1))
par(mfrow=c(1,1))
hist(x[abs(x)<10],add=TRUE,breaks=100,prob=TRUE)#plotting the histogram of the resulting distribution (restricted to |x|<10)
The result seems to be quite good. Unfortunately, the inverse of the distribution function can be hard to obtain; in such cases the rejection method can be used. The idea behind this method is to simulate uniformly distributed points under an auxiliary probability density function which (after suitable scaling) is greater than the target density everywhere. All points lying above the pdf of the target distribution are then discarded. Because the points were uniformly distributed under the auxiliary envelope, the accepted points are uniformly distributed under the target density. This method has some drawbacks in the form of resources spent on computing points that will be discarded. This leads to inefficiency, and we do not know in advance how many iterations are needed to achieve the desired sample size. The inefficiency can be reduced by choosing an envelope as close to the target pdf as possible.
This algorithm has been applied to the beta(2,5) and the N(0,1) distributions. First the code and the result for the beta(2,5) distribution are shown. A uniform distribution, scaled by the constant 2.5, has been chosen as the auxiliary function (see the check below).
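The constant 2.5 used below is enough to dominate the beta(2,5) density; a quick check of its maximum on (0,1) (an added check using base R's optimize):
# maximum of the beta(2,5) density; about 2.46, so 2.5 times the uniform density dominates it
optimize(function(x) dbeta(x, 2, 5), interval = c(0, 1), maximum = TRUE)$objective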
#rejection method 1
#initialization where n point is drawn from the auxiliary function
n = 100000; x= runif(n)
u = runif(n)
#evaluating the generated uniform random variable
# on the auxiliary function.
y= 2.5*dunif(x,0,1)
#y*u gives uniformly distributed random points under the auxiliary envelope;
#all points lying above the beta density are discarded
h = seq(0,1,.001)
f= x[y*u<=dbeta(x,2,5)]
#the target function is plotted
h = seq(0,1,.001)
plot(h,dbeta(h,2,5),
type='l',lwd=3,col=3,xlab='x',ylab='density', main = 'The beta(2,5) function and the estimated one (histogram)')
#Histogram of the estimated distribution
hist(f,breaks=100,add=TRUE,prob=TRUE)
The code and the result for the N(0,1) distribution are shown next. An exponential distribution has been chosen as the auxiliary function.
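For reference (a quick justification not given in the original text): for \( z \ge 0 \) the ratio of the standard normal density to the Exp(1) density is \( \frac{\varphi(z)}{e^{-z}} = \frac{1}{\sqrt{2\pi}}\,e^{z - z^2/2} \le \frac{e^{1/2}}{\sqrt{2\pi}} \approx 0.66 \), so the exponential density dominates the normal density on the positive half-line and can be used directly as the envelope; symmetry around zero is restored afterwards by flipping the sign of half of the simulated points.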
#rejection method 2
#initializing; n points are drawn from the auxiliary distribution
rm(list=ls())
n = 100000; z=rexp(n);
#the uniformly distributed random points under the auxiliary envelope will be y*u
u = dexp(z,1)
y= runif(n,0,1)
#making z (the points from the auxiliary function) symmetric around 0 by flipping the sign of half of them
z[(length(z)/2+1):length(z)]=-z[(length(z)/2+1):length(z)]
#y*u gives uniformly distributed random points under the auxiliary envelope;
#all points lying above the normal density are discarded
f= z[y*u<=dnorm(z,0,1)]
#plotting the target density
h = seq(-4,4,.001)
plot(h,dnorm(h,0,1),
type='l',lwd=3,col=3,xlab='x',ylab='density', main ='The N(0,1) function and the estimated one (histogram)')
#Histogram of the estimated distribution
hist(f,add=TRUE,breaks=100,prob=TRUE)
Based on the plots, the rejection method gives quite good results when a sufficiently large number of points is simulated.
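The efficiency of the rejection method can be gauged by the empirical acceptance rate. A minimal check, assuming the objects f and n from the block above are still in the workspace (for the exponential envelope used there the expected acceptance rate works out to 0.5):
# fraction of proposed points that were accepted
length(f)/n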
The length of the needle has been chosen to be 2, equal to the width of the plank. The problem has two symmetry axes, one vertical and one horizontal. Due to the horizontal symmetry it is only necessary to consider what happens above the center of the needle, and due to the vertical symmetry only what is to the right of the needle center. The system can then be described by two variables: x, the distance from the center of the plank to the center of the needle, and the angle between the needle and the horizontal axis. This gives the following condition for the needle to land touching the border: \( 1 < x + \cos(\frac{\pi}{2}u) \). x and the angle are simulated and the probabilities are estimated, where p1 is the probability of the needle touching the border and p2 the probability of the needle falling entirely within the plank.
#thrown needle
#the needle is thrown n times
#it is assumed that the plank is 2 units wide
#x indicates how far the center of the needle is from the center of the plank
#u is a scaling factor giving the angle of the needle, between 0 and pi/2
n=100000; x=runif(n); u=runif(n)
# the horizontal coordinate of the tip of the needle, measured from the center of the plank
l = x+cos(pi/2*u)
#calculating the probability of the needle touching a line, which
#happens when l exceeds half the width of the plank (l > 1)
p1 = length(l[l>1])/n
#calculating the probability of the needle not touching the line
p2=1-p1
p1
## [1] 0.6367
p2
## [1] 0.3633
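For reference (not part of the original solution), the theoretical probability follows directly from the stated condition: since x is uniform on (0,1), \( P(x + \cos\theta > 1) = \cos\theta \), and averaging over the uniform angle \( \theta \in [0, \pi/2] \) gives \( \frac{2}{\pi}\int_0^{\pi/2} \cos\theta \,\mathrm{d}\theta = \frac{2}{\pi} \approx 0.6366 \), in good agreement with the simulated p1.
2/pi # theoretical probability of the needle touching a line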
The idea behind this exercise is that you want to sell your car and get an offer, and the question is then how long you must wait before you get a better offer. It is assumed that the offers come from a Poisson(10) distribution. In the first simulation the value of the initial offer is fixed at 12. The code and result for the first simulation are shown below.
##simulating the probability of getting a better offer than 12
#Initializing. The waiting time before a better offer occurs is simulated n times
rm(list=ls())
n=100000
y=rep(NA,n)
#running until n better offers have occurred
for (i in 1:n){
x=rpois(1,10)#drawing an initial offer from the poisson distribution
t=1
while(x<=12){ #keep simulating until a better offer than 12 has occurred
x=rpois(1,10)#generating new offers
t=t+1 #increasing time
}
y[i]=t
}
mean(y) #computing mean waiting time
## [1] 4.823
hist(y,breaks=10,prob=TRUE, xlab='number of offers',main = 'waiting time for better offer') #plotting the histogram of the waiting times
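Since the offers are independent Poisson(10) draws, the number of draws until one exceeds 12 is geometrically distributed with success probability \( p = P(X > 12) \), so the mean waiting time should be \( 1/p \). A quick consistency check using base R's ppois (an added check, not part of the original code):
# theoretical mean waiting time for an offer larger than 12; about 4.8, close to the simulated mean
1/(1 - ppois(12, 10))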
In the next simulation we consider a random starting offer and the waiting time from that. We draw a random number from the Poisson distribution and make it the initial offer, then count the number of draws needed to get a better offer. Then we draw a new starting offer and repeat the procedure. The code and result are shown below.
##simulating the probability of getting a better offer than a random one
#Initializing. The offers are drawn from the Poisson distribution, and the waiting
#time before a better offer occurs is simulated n times
rm(list=ls())
n=100000
y=rep(NA,n)
#running until n better offers have occurred
for (i in 1:n){
z=rpois(1,10)#the initial offer is generated
x=rpois(1,10)#counter offers are simulated
t=1
while(x<=z){#keep simulating until a better offer has occurred
x=rpois(1,10)#generating new counter offers
t=t+1#keeping track of waiting time
}
y[i]=t
}
mean(y) #computing mean waiting time
## [1] 18.29
hist(y[y<100],breaks=10,prob=TRUE, xlab='number of offers',main = 'waiting time for better offer') #plotting the histogram of the waiting times (truncated at 100 for readability)
The result of the last simulation varies a lot between runs, so the estimate is quite uncertain.
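A possible explanation (an added sketch, not from the original analysis): given an initial offer z, the waiting time is geometric with mean \( 1/P(X > z) \), which grows extremely fast in z, so the overall average is dominated by rare, very high starting offers and is therefore very unstable.
# mean waiting time conditional on the initial offer z, for a few values of z
z = c(10, 15, 20, 25)
1/(1 - ppois(z, 10)) # grows very quickly with z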
The idea is to estimate the median of the Poisson distribution using the bootstrap. A sample of 200 observations is drawn from the Poisson distribution and its median is computed; this sample median is used as the reference for the bootstrap estimates. First the number of bootstrap samples is varied from 1 to 200, where each bootstrap sample contains 200 observations resampled with replacement from the original data. The results and code are shown below.
# this code estimates the median of a sample using the bootstrap.
# The sample is randomly drawn from a Poisson distribution with lambda 10.
rm(list=ls())
n=200 #the sample size
x=rpois(n,10) #the sample is drawn
true = median(x)
y=matrix(NA,n,n)
# the bootstrap is performed; the number of bootstrap samples is varied from 1 to 200
for (i in 1:n){
for (j in 1:i){
bootstrap=sample(x,200,replace=TRUE)
y[j,i]=median(bootstrap)
}
}
my = colMeans(y,na.rm=TRUE)
plot(my,xlab='number of bootstrap samples',ylab='estimated median',main = 'estimated median as a function of bootstrap samples')
lines(seq(1,n,1),matrix(true,n,1),col=2)
The line indicates the median of the original sample.
va = matrix(NA,200,1)
# The variance of the median is found
for (k in 1:n){
va[k] = mean((y[1:k,k]-my[k])^2)
}
plot(va,xlab='number of bootstrap samples',ylab='estimated variance of the median',main = 'estimated variance of the median as a function of bootstrap samples')
It can be seen that the uncertainty of the estimated variance of the median drops as the number of bootstrap samples increases. It is also seen that the bootstrap gives a fairly accurate estimate of the median.
Now the same procedure is repeated, but with the number of bootstrap samples fixed at 200 while the size of each bootstrap sample is varied from 1 to 200.
# this code estimates the median of a sample using the bootstrap.
# The sample is randomly drawn from a Poisson distribution with lambda 10.
rm(list=ls())
n=200 #the sample size
x=rpois(n,10) #the sample is drawn
true = median(x)
y=matrix(NA,n,n)
# the bootstrap is performed. The number of bootstrap samples is 200, and the number of
# observations in each sample is varied from 1 to 200
for (i in 1:n){
for (j in 1:n){
bootstrap=sample(x,i,replace=TRUE)
y[j,i]=median(bootstrap)
}
}
#the found medians are plotted
my = colMeans(y,na.rm=TRUE)
plot(my,xlab='sample size of each bootstrap',ylab='estimated median',main = 'estimated median as a function of bootstrap sample size')
lines(seq(1,n,1),matrix(true,n,1),col=2)
The line indicates the median of the original sample.
# The variance of the bootstrap medians is found
va = matrix(NA,200,1)
for (k in 1:n){
va[k] = mean((y[1:k,k]-my[k])^2)
}
plot(va[10:length(va)],xlab='sample size of each bootstrap',ylab='estimated variance of the median',main = 'estimated variance of the median as a function of bootstrap sample size')
This reveals something interesting: the variance of the estimated parameter appears to decrease asymptotically as the bootstrap sample size is increased.
The target of this exercise is to simulate the conditional distribution of Y2 | Y1 = 0.5 when Y1 and Y2 come from a bivariate, centered, standardized Gaussian distribution with a covariance of 0.7. First the Gaussian distribution is simulated in the following way.
########################
#Simulating 50000 points from a bivariate Gaussian distribution with rho = 0.7.
# Further it is investigated whether Y2|Y1=0.5 has a normal distribution.
rm(list=ls())
## Define target density
sd =1 ; rho = .7
Sigma = matrix(nr=2,nc=2,data=c(sd,rho,rho,sd))
Sigma
## [,1] [,2]
## [1,] 1.0 0.7
## [2,] 0.7 1.0
## Choleski factorisation
U = chol(Sigma) ; L = t(U)
L
## [,1] [,2]
## [1,] 1.0 0.0000
## [2,] 0.7 0.7141
L %*% U ## just checking
## [,1] [,2]
## [1,] 1.0 0.7
## [2,] 0.7 1.0
# simulation
n = 50000
x1 = rnorm(n) ; x2 = rnorm(n)
X = rbind(x1,x2)
Y = L %*% X
Y = t(Y)
plot(Y,asp=1,xlab='Y1',ylab='Y2',main = 'bivariate Gaussian with covariance of 0.7')
It is not possible to condition exactly on Y1 = 0.5 with a finite sample; instead Y1 in the interval 0.45 to 0.55 is considered, and the histogram of the corresponding Y2 values is used as an estimate of the true conditional distribution. The histogram is plotted together with the expected conditional distribution.
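For reference, the conditional distribution of a standardized bivariate Gaussian is available in closed form, \( Y_2 \mid Y_1 = y \sim N(\rho y,\, 1-\rho^2) \), so for \( \rho = 0.7 \) and \( y = 0.5 \) the expected conditional density is that of \( N(0.35,\, 0.51) \).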
# a sample for Y2|Y1=0.5 is drawn from the data; the simulated values for Y2 will be the
# ones having Y1 between 0.45 and 0.55.
Y = t(Y)
vec1 = Y[2,abs(Y[1,]-0.5)<0.05]
h = seq(-2,2,.01)
plot(h,dnorm(h,rho*0.5,sqrt(1-rho^2)), # expected conditional density N(rho*0.5, 1-rho^2); note dnorm takes the sd
type='l',lwd=3,col=3,xlab='x',ylab='density',main = 'estimate of the conditional distribution Y2|Y1=0.5')
#Histogram of the estimated distribution
hist(vec1,breaks=100,add=TRUE,prob=TRUE)
This distribution is clearly skewed away from the expected normal distribution. This is due to the way the interval for Y1 is chosen: it is symmetric around the desired value of 0.5, but the probability mass of the bivariate Gaussian within the interval is not, which leads to a higher probability of points lying on one side of 0.5. To do this properly, the interval should be chosen such that the probability mass on each side is the same.
A data sample containing the outcomes (heads or tails) of a number of coin tosses is created; the idea is then to estimate the probability of getting heads and the uncertainty of this estimate. First of all, the assumption is made that the number of heads follows the binomial distribution with probability mass function \( \binom{N}{k} q^k (1-q)^{N-k} \). The formula can be understood as follows: we want k successes, \( q^k \), and N-k failures, \( (1-q)^{N-k} \); the k successes can occur anywhere among the N trials, and there are \( \binom{N}{k} \) different ways of distributing k successes in a sequence of N trials. A way to estimate q is to draw a random q from the uniform distribution between 0 and 1 and then simulate 10 coin tosses with this q. The number of times a given q gives rise to the same number of heads and tails as the data is counted. This gives an estimate of the distribution from which q may come, and we choose q to be the most likely one: the more likely a certain q is to regenerate our data, the more likely it is to be accepted by this procedure. The variance of the accepted q values is then the uncertainty. This is first done with 10 tosses.
# This is a Bayesian approach to estimating the probability of getting heads or tails with a weighted coin
# The sample size and observed heads and tails are found
n=10;
s = sample(0:1,n,replace=TRUE)
data = s
s0 = sum(s)/n # observed fraction of heads in the sample
# The algorithm is implemented below
NL = 1000
N = 0
combinedq = matrix(NA,NL,1)
while (N<NL){
n=10;
q=runif(1,0,1)
s = rbinom(n,1,q)
s = sum(s)/n
if(s==s0){
N=N+1
combinedq[N,1]=q
}
}
# Statistical properties of the accepted q values are found
hist(combinedq[,1],prob=TRUE,xlab ='q values',main='estimation of q')
mean(combinedq[,1])
## [1] 0.2528
s0
## [1] 0.2
var(combinedq[,1])
## [1] 0.01472
We can see that the estimated value is close to the observed fraction of heads and that the distribution of q looks like a beta distribution, which it should according to theory.
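For comparison (a standard result, assuming a uniform prior on q): with k heads observed in n tosses the posterior is Beta(k+1, n-k+1), here Beta(3, 9), whose mean and variance can be checked directly:
# posterior mean and variance of a Beta(3, 9); should be close to the simulated values above
a = 3; b = 9
a/(a + b) # posterior mean, 0.25
a*b/((a + b)^2*(a + b + 1)) # posterior variance, about 0.0144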
If this is repeated with 100 tosses it will take a long time, because even for the most likely q the probability of regenerating the data exactly is very low. For this reason no attempt has been made to include the algorithm applied to a larger data sample in this report.
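A rough indication of the cost (an added note): averaging the exact-match probability over a uniform q gives \( \int_0^1 \binom{n}{k} q^k (1-q)^{n-k}\,\mathrm{d}q = \frac{1}{n+1} \), so on average about n+1 proposals are needed per accepted value, roughly 11 for n = 10 but about 101 for n = 100.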
For the chosen Markov chain there are four possibilities at step n+1: stay in 0, stay in 1, move from 0 to 1, or move from 1 to 0. The probabilities in this case are:
\( p(0 \rightarrow 0) = 1-\frac{1}{\sqrt{2}}\\ p(0 \rightarrow 1) = \frac{1}{\sqrt{2}}\\ p(1 \rightarrow 0) = 1-\frac{1}{\pi}\\ p(1 \rightarrow 1) = \frac{1}{\pi} \)
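The numbers quoted next can be obtained as the stationary distribution of the chain, found by balancing the flow between the two states: \( \pi(0) = \frac{p(1 \rightarrow 0)}{p(0 \rightarrow 1) + p(1 \rightarrow 0)} = \frac{1 - 1/\pi}{1/\sqrt{2} + 1 - 1/\pi} \approx 0.49, \qquad \pi(1) = 1 - \pi(0) \approx 0.51 \).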
This gives a probability of about 0.49 for the chain being in state 0 and about 0.51 for being in state 1 in the long run. The code and result for simulating this system are shown below.
#Initializing the chain
n = 1000 ; x = rep(NA,n)
x[1] = rbinom(n=1,size=1,prob=1/2)
# evaluating whether the chain jumps to a new state or not, given the
# number of iterations and the probabilities of moving between the states
for(i in 2:n){
if(x[i-1]==0){
x[i]=rbinom(n=1,size=1,prob=1/sqrt(2))#drawing the next state
}
if(x[i-1]==1){
x[i]=rbinom(n=1,size=1,prob=1/pi)#drawing the next state
}
}
hist(x,xlab = 'states')
This seems to be more or less the case.
The Metropolis algorithm has been used to simulate the Cauchy(0,1) distribution, to see whether it is a viable alternative to the rejection method. The algorithm works like the chain model above: the chain is in a certain state and has a probability of moving to a new state. In each iteration a new state is proposed and accepted with probability \( \min\left(\frac{f(Y)\,q(X_i)}{f(X_i)\,q(Y)},1\right) \). The N(0,1) distribution has been used as the auxiliary (proposal) function. The code and results are shown below.
# implementation of the Metropolis algorithm for drawing a sample from the
# Cauchy(0,1) distribution. N(0,1) is used as the proposal (auxiliary) function.
# Initializing the code
n = 100000 ; x = rep(NA,n)
x[1] = rnorm(n=1,0,1)
#running the chain
for (i in 2:n){
Y = rnorm(n=1,0,1)
# computing the probability of accepting the proposed new state
p = min((dcauchy(Y,0,1)*dnorm(x[i-1],0,1))/(dcauchy(x[i-1],0,1)*dnorm(Y,0,1)),1)
#deciding whether the chain moves to the new state by tossing an unfair coin
accept = rbinom(n=1,size=1,prob=p)
if(accept==1){
x[i]=Y
}
if(accept==0){
x[i]=x [i-1]
}
}
#plotting the target density and the estimated one for comparison.
h = seq(-10,10,.1); plot(h,dcauchy(h, location = 0, scale = 1),
type='l',lwd=3,col=3,xlab='x',ylab='density',main='Metropolis used to estimate Cauchy(0,1)')
hist(x[abs(x)<10],add=TRUE,breaks=100,prob=TRUE)
d = x
It is seen that the approximation is reasonable around the center of the Cauchy density, but the algorithm fails to reproduce the extreme values (the heavy tails) of the Cauchy distribution.
The Metropolis-Hastings algorithm is an improvement of the Metropolis algorithm used above; the change lies in how a new state is proposed and accepted. The auxiliary (proposal) function is now a function of the current state of the chain, and the acceptance probability is given by \( \min\left(\frac{f(Y)\,q(X_i \mid Y)}{f(X_i)\,q(Y \mid X_i)},1\right) \). A normal distribution centered at the current state has been used as the proposal; since this random-walk proposal is symmetric, the two q terms in the ratio cancel, although they are kept explicitly in the code. The code and results are shown below.
# implementation of the Metropolis-Hastings algorithm for drawing a sample from the
# Cauchy(0,1) distribution. A normal random walk is used as the proposal (auxiliary) function.
# Initializing the code
n = 100000 ; x = rep(NA,n)
x[1] = runif(n=1,0,1)
sigma=1
#running the chain
for (i in 2:n){
Y = rnorm(n=1,mean=x[i-1],sd=sigma) # propose candidate value
# compute Metropolis ratio
R = (dcauchy(Y) * dnorm(x[i-1],mean=Y,sd=sigma)) /
(dcauchy(x[i-1]) * dnorm(Y,mean=x[i-1],sd=sigma))
# compute acceptance probability
p = min(R,1)
accept = rbinom(n=1,size=1,prob=p)
if(accept==1){
x[i]=Y
}
if(accept==0){
x[i]=x[i-1]
}
}
#plotting the target density and the estimated one for comparison.
h = seq(-10,10,.1); plot(h,dcauchy(h, location = 0, scale = 1),
type='l',lwd=3,col=3,xlab='x',ylab='density',main='MH used to estimate Cauchy(0,1)')
hist(x[abs(x)<10],add=TRUE,breaks=100,prob=TRUE)
c = x
It can be seen that this algorithm is quite successful in estimating the target density. Further, it can be concluded that the change in how a new state is proposed and accepted ensures correct sampling of the extreme values.
Both the MH algorithm and the rejection method have been used to simulate from the \( \beta(\frac{1}{2},\frac{1}{2}) \) distribution as follows.
# Comparing the rejection method with the Metropolis-Hastings algorithm, estimating
# a beta (1/2,1/2) distribution
# Initialization of the MH algorithm
n = 100000 ; x = rep(NA,n)
x[1] = runif(n=1,0,1)
sigma=1
for (i in 2:n){
Y = rnorm(n=1,mean=x[i-1],sd=sigma) # propose candidate value
# compute Metropolis ratio
R = (dbeta(Y,1/2,1/2) * dnorm(x[i-1],mean=Y,sd=sigma)) /
(dbeta(x[i-1],1/2,1/2) * dnorm(Y,mean=x[i-1],sd=sigma))
# compute acceptance probability
p = min(R,1)
accept = rbinom(n=1,size=1,prob=p)
if(accept==1){
x[i]=Y
}
if(accept==0){
x[i]=x[i-1]
}
}
#plotting the result
par(mfrow=c(2,1))
h = seq(0,1,.01); plot(h,dbeta(h,1/2,1/2),
type='l',lwd=3,col=3,xlab='x',ylab='density',main='MH')
hist(x,add=TRUE,breaks=100,prob=TRUE)
a = x
#rejection method
#initialization where n point is drawn from the auxiliary function
n = 100000; x= runif(n)
u = runif(n)
#evaluating the generated uniform random variable
# on the auxiliary function.
# note: dbeta(x,1/2,1/2) is unbounded near 0 and 1, so the constant 3 does not dominate it
# everywhere; points very close to the endpoints are therefore slightly under-represented
y= 3*dunif(x,0,1)
#y*u gives uniformly distributed random points under the auxiliary envelope;
#all points lying above the beta density are discarded
h = seq(0,1,.001)
f= x[y*u<=dbeta(x,1/2,1/2)]
#the target function is plotted
h = seq(0,1,.01)
plot(h,dbeta(h,1/2,1/2),
type='l',lwd=3,col=3,xlab='x',ylab='density',main='rejection method')
#Histogram of the estimated distribution
hist(f,breaks=100,add=TRUE,prob=TRUE)
From the histograms alone it is hard to see whether they differ. For further investigation the autocorrelation function has been computed for each sample.
# plotting the ACF for all the estimated samples.
# f = beta(1/2,1/2) approximated with the rejection method
# a = beta(1/2,1/2) approximated with the MH method
# c = Cauchy(0,1) approximated with the MH method
# d = Cauchy(0,1) approximated with the Metropolis method
par(mfrow=c(2,2))
acf(f)
acf(a)
acf(c)
acf(d)
Here f = beta(1/2,1/2) approximated with the rejection method, a = beta(1/2,1/2) approximated with the MH method, c = Cauchy(0,1) approximated with the MH method and d = Cauchy(0,1) approximated with the Metropolis method. It is seen that the rejection method has no problems with autocorrelation (lag), which both the MH and the Metropolis algorithms suffer from. The MH sampler seems to have bigger problems with autocorrelation than the Metropolis sampler. The autocorrelation comes from the way the next state of the chain is chosen: many proposed values have a very low acceptance probability, so the chain often stays in the same state, possibly for several iterations, which results in high correlation between consecutive values. For the MH algorithm this is a bigger problem here because the proposal is centered at the current state, which makes it less likely for the chain to move to a state far from the previous one.
In order to sample from a conditional distribution, Gibbs sampling can be used. This has been done for the conditional distributions \( Y \mid X = x \sim N(\rho x, 1-\rho^2) \) (and symmetrically for X given Y), where X and Y come from a bivariate normal distribution with correlation \( \rho \). It has been done in the following way.
#Using Gibbs sampling to simulate numbers drawn from the bivariate Gaussian distribution
# initialization of the code
rm(list=ls())
n = 100000 ; x = rep(NA,n); y = rep(NA,n)
rho = 0.5
x[1] = rnorm(n=1,0,1)
# Implemented Gibbs sampling algorithm
for (t in 1:(n-1)){
y[t] = rnorm(1,(rho*x[t]),(1-rho^2))
x[t+1] =rnorm(1,(rho*y[t]),(1-rho^2))
}
#Plotting results
plot(x,y,xlab='Y1',ylab='Y2',main='the bivariate normal distribution with covariance of 0.5')
h = seq(-2,2,.01)
plot(h,dnorm(h,0,1-rho^2),
type='l',lwd=3,col=3,xlab='x',ylab='density',main='estimate of conditional distribution using gibbs sample')
#Histogram of the estimated distribution
hist(y,breaks=100,add=TRUE,prob=TRUE)
The histogram seems to be a little low in the middle and to overestimate the extreme values, but overall it captures the target density nicely. This could be further improved by using Gibbs sampling within an MH algorithm.
We wish to integrate the function \( f(x) = (\cos(50x)+\sin(20x))^2 \) on the interval from 0 to 1. The function has been plotted on this interval in the following way.
# plotting the integrand
h=seq(0,1,.01)
plot(h,(cos(50*h)+sin(20*h))^2,type="l",xlab = 'x', ylab = 'f(x)',main='plot of integrand function')
The chosen algorithm for integrating f(x) is to draw uniform random numbers on the interval, evaluate f(x) at these points and take their average value (Monte Carlo integration). This has been implemented in the following way.
#randomly selecting points from the integrand
n=10000
integrand = function(x) {(cos(50*x)+sin(20*x))^2}
x = runif(n,0,1)
#integrating the integrand with R's built-in function and
# as the mean of f evaluated at the randomly selected points
I = mean(integrand(x),na.rm=TRUE)
II = integrate(integrand,lower=0,upper=1)
The value found by the previously mentioned algorithm:
I
## [1] 0.9582
The value found using R's built-in integration function:
II
## 0.9652 with absolute error < 1.9e-10
It can be seen that the values are very close to each other.
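The Monte Carlo error of the estimate can be gauged from the sample standard deviation of the evaluated integrand (an added check, assuming integrand, x and n from the block above are still defined):
# approximate Monte Carlo standard error of I
sd(integrand(x))/sqrt(n)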
Monte Carlo integration in higher dimensions for the volume of the unit ball, using random numbers from the unit box \( C^d \), is based on \( V_d = \int_{R^d} \mathrm{I}_{B^d}(x)\,\mathrm{d}x = |C_d|\int_{R^d} \mathrm{I}_{B^d}(x)\frac{1}{|C_d|}\mathrm{I}_{C^d}(x)\,\mathrm{d}x = |C_d|\,E(\mathrm{I}_{B^d}(x)) \), and the variance of the estimate is given by \( \sigma^2_d = \frac{1}{n}\int_{R^d} (h(x)-\mu)^2 f(x)\,\mathrm{d}x = \frac{|C_d|}{n}\int_{R^d} (h(x)-\mu)^2\frac{1}{|C_d|}\mathrm{I}_{C^d}(x)\,\mathrm{d}x = \frac{|C_d|}{n}E\big((h(x)-\mu)^2\big) \), where \( h \) denotes the integrand and \( \mu \) its mean. The expressions for the multivariate normal sampling distribution would be the same except without the \( |C_d| \) factor in front of the expectation. Monte Carlo integration of the unit ball for the two cases has been implemented in the following way.
#Monte carlo integration in higher dimension. Integrating the unit ball
#in 2, 3 and 10 dimensions
#initializing the code
rm(list=ls())
n=10000
j=1
I=matrix(NA,3)
V=matrix(NA,3)
#drawing points from the unit box in 2, 3, 10 dimensions
for (d in c(2,3,10)){
x = matrix(NA,n,d)
for (i in 1:d){
x[,i] = runif(n,-1,1)
}
#performing the Monte Carlo integration
x = x[rowSums(x^2)<1,]
I[j] = (length(x[,1])/n)*2^(d)
#computing the variance on the estimate
V[j] = 2^(d)/n*mean(((length(x[,1])-I[j])/n)^2)
j=j+1
}
#Monte Carlo integration in higher dimensions, this time sampling from a
#multivariate standard normal distribution in 2, 3 and 10 dimensions
j=1
I2=matrix(NA,3)
V2=matrix(NA,3)
#drawing points from a multivariate standard normal distribution in 2, 3 and 10 dimensions
for (d in c(2,3,10)){
x = matrix(NA,n,d)
for (i in 1:d){
x[,i] = rnorm(n,0,1)
}
#performing the Monte Carlo integration
#note: if only a single point falls inside the ball, the subsetting below drops the matrix
#to a vector, so x[,1] on the next line fails; this is the source of the error reported below
x = x[rowSums(x^2)<1,]
I2[j] = (length(x[,1])/n)
#computing the variance on the estimate
V2[j] = 1/n*mean(((length(x[,1])-I[j])/n)^2)
j=j+1
}
## Error: incorrect number of dimensions
# finding true volume
j=1
I3=matrix(NA,3)
# computing the true value for the volume
for (d in c(2,3,10)){
I3[j] = pi^(d/2)/gamma(1+d/2)
j = j+1
}
d = c(2,3,10)
c = cbind(1,2,3)
plot(cbind(d,d,d),rbind(I,I2,I3), col=rbind(c,c,c), main =
'estimated Volume of the unit ball as a function of dimension',ylab = 'volume',
xlab = 'dimension')
legend(7.6,4,c('uniform', 'multivariate normal', 'theoretical'),lwd=c(2.5,2.5),col=c(1,2,3))
The unit-box approach gives quite good and accurate results, whereas the multivariate normal approach performs really badly. The reason behind this is probably that the normal distribution does not give a uniform sample across the ball. Another possibility is that I was unable to determine a correct normalization constant for the integral. The variance for the unit box (V) and for the normal distribution (V2) is shown below.
V
## [,1]
## [1,] 2.439e-04
## [2,] 2.203e-04
## [3,] 5.156e-07
V2
## [,1]
## [1,] 1.534e-05
## [2,] 3.971e-06
## [3,] NA
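One way to make the normal-sampling approach consistent (a sketch of a possible fix, not part of the original solution) is importance sampling: weight each point by the reciprocal of its sampling density, so that \( V_d = E_{\varphi}\big(\mathrm{I}_{B^d}(X)/\varphi(X)\big) \), where \( \varphi \) is the joint normal density. A minimal sketch for d = 3:
# importance-sampling estimate of the unit-ball volume, sketched for d = 3
n = 10000; d = 3
x = matrix(rnorm(n*d), n, d) # points from a d-dimensional standard normal
dens = apply(dnorm(x), 1, prod) # joint normal density of each point
inside = rowSums(x^2) < 1 # indicator of the unit ball
mean(inside/dens) # should be close to pi^(d/2)/gamma(1 + d/2), about 4.19 for d = 3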
This exercise is about using Monte Carlo simulation to optimize a function. First this is done with a uniform distribution as the auxiliary function and secondly with a normal distribution. The idea is to simulate points from the auxiliary function and take the simulated point which maximizes h(x) as the optimal value of x. First the uniform case.
# the objective function
h <- function(x){3*dnorm(x,m=-2,sd=.2) +
6*dnorm(x,m=1,sd=.5) + 1*dnorm(x,m=3,sd=.3)}
# finding the point which maximizing the objective function
n <- 100
U <- runif(n=n,min=-5,max=5)
istar <- which.max(h(U))
xstar <- U[istar]
hstar <- h(xstar)
Optimal x value found.
xstar
## [1] -2.02
Corresponding h(x).
hstar
## [1] 5.954
Then the case with a normal distribution as the auxiliary (proposal) function, using the MH algorithm to draw points from h.
rm(list=ls())
# the objective function
h <- function(x){3*dnorm(x,m=-2,sd=.2) +
6*dnorm(x,m=1,sd=.5) + 1*dnorm(x,m=3,sd=.3)}
#uses MH algorithm to draw random numbers from h
n = 100 ; U = rep(NA,n)
U[1] = runif(n=1,0,1)
sigma=1
for (i in 2:n){
Y = rnorm(n=1,mean=U[i-1],sd=sigma) # propose candidate value
# compute Metropolis ratio
R = (h(Y) * dnorm(U[i-1],mean=Y,sd=sigma)) /
(h(U[i-1]) * dnorm(Y,mean=U[i-1],sd=sigma))
# compute acceptance probability
p = min(R,1)
#toss an unfair coin to see if new step should be accepted
accept = rbinom(n=1,size=1,prob=p)
if(accept==1){
U[i]=Y
}
if(accept==0){
U[i]=U[i-1]
}
}
istar <- which.max(h(U))
xstar <- U[istar]
hstar <- h(xstar)
Optimal x value found.
xstar
## [1] -1.981
Corresponding h(x).
hstar
## [1] 5.957
The idea is to make a biased random walk such that the likelihood of going uphill is larger than that of going downhill. The implemented algorithm and results are shown below.
#initializing the object function
h = function(x,y)
{
3* dnorm(x,m=0,sd=.5)*dnorm(y,m=0,sd=.5) +
dnorm(x,m=-1,sd=.5)*dnorm(y,m=1,sd=.3) +
dnorm(x,m=1,sd=.5)*dnorm(y,m=1,sd=.3)
}
# plotting h
x = y = seq(-3,3,0.01)
H = outer(x,y,h)
image(x,y,H)
contour(x,y,H,add=TRUE)
#Initializing simulated annealing
n = 10000
X = matrix(NA,n,2)
k = 1
# initializing the starting point for the random walk
oldpointy = runif(1,-3,3); oldpointx = runif(1,-3,3)
X[1,1] = oldpointx; X[1,2] = oldpointy;
#implemented algorithm for simulated annealing
for(i in 2:n){
# drawing the step of the random walk from an N(0,1) auxiliary distribution
xstep = rnorm(1,0,1)
ystep = rnorm(1,0,1)
#computing new point
newpointx = oldpointx + xstep
newpointy = oldpointy + ystep
#ensuring the step stays within the boundaries
newpointx = max(newpointx,-3)
newpointx = min(newpointx,3)
newpointy = max(newpointy,-3)
newpointy = min(newpointy,3)
#the probability of accepting the step
Ti = 1/log(i)
deltah = h(newpointx,newpointy)-h(oldpointx,oldpointy)
paccept = min(exp(deltah/Ti),1)
#Toss an unfair coin to see if the step is accepted
accept = rbinom(1,1,paccept)
#updating the step if the step is accepted
if(accept == 1){
k = k+1
oldpointx = newpointx; oldpointy = newpointy;
X[k,1] = oldpointx; X[k,2] = oldpointy;
}
}
#plotting the random walk to verify the process
image(x,y,H)
lines(X,type='l',lwd=.5)
points(X[1,1],X[1,2],col=4,pch=16)
points(X[k,1],X[k,2],col=3,pch=16)
It is clearly seen that the random walk has a preference for going uphill and is not very likely to go downhill.
The optimal x1 and x2 values found:
X[k,1];X[k,2]
## [1] -0.2788
## [1] -0.1855
The values found seem reasonably close to (0,0), which is the true optimum.