Problem 1: Rejection sampling from a truncated Gaussian

  1. Samples from a Gaussian distribution.
set.seed(12654)
sample1 <- rnorm(mean = 1, sd = 2, n = 10000)
hist(sample1)

Sample from a truncated Gaussian: 2. Write Function.

trunc_norm <- function(tmean=0, tsd=1, lower=-Inf, upper=Inf, num_samp){
    v <- numeric(num_samp) 
    for(i in 1:num_samp){
        while(1){
            t <- rnorm(mean = tmean, sd = tsd, n= 1)
            if(t < upper && t > lower)break()
        }
        v[i] <- t
    }
    v
}
  1. Plot the histogram of 10000 samples from a Gaussian with mean 1, standard deviation 2, truncated to (0; 5).
set.seed(90654)
sample2 <- trunc_norm(tmean = 1, tsd = 2, lower = 0,  upper = 5, num_samp = 10000)
hist(sample2)

  1. Introduce some vectorization.
trunc_norm_vec <- function(tmean=0, tsd=1, lower=-Inf, upper=Inf, num_samp){
    v=numeric()
    while(length(v) < num_samp){
        t <- rnorm(mean = tmean, sd = tsd, num_samp)
        v <- c(v, t[t > lower & t<upper])
    }
    v[1:num_samp]
}
  1. Plot the same distribution.
set.seed(90654)
sample3 <- trunc_norm_vec(tmean = 1, tsd = 2, lower = 0,  upper = 5, num_samp = 10000)
hist(sample3)

  1. Use system.time() to compare the efficiency of both functions.
set.seed(90654)
system.time(trunc_norm(1,2,0,5,1000000))
##    user  system elapsed 
##    3.95    0.00    4.09
system.time(trunc_norm_vec(1,2,0,5,1000000))
##    user  system elapsed 
##    0.44    0.02    0.45

Problem 2: Calculating entropy

Calculate for 10000 probability distributions, each of which are 6-dimensional. 1. Generate samples and rescale.

set.seed(7477)
t1 <- matrix(rnorm(mean=1, sd=1, n=60000), 10000, 6)
t1[t1 < 0] <- 0
t1[rowSums(t1)>0] <- t1[rowSums(t1)>0]/rowSums(t1)[rowSums(t1)>0]
  1. Calculate entropy.
entropy <- function(v){
    v <- v[v!=0]
    return(-sum(v*log(v)))
}
entrp6 <- apply(t1,1,entropy)
  1. Plot.
hist(entrp6)

entrp_max <- max(entrp6, na.rm=TRUE)
entrp_max
## [1] 1.790128
t1[entrp6==entrp_max,]
## [1] 0.1672765 0.1810796 0.1593577 0.1757742 0.1528369 0.1636752

5.Make a guess about the probability distribution that has the theoretical maximum entropy.

t1[entrp6==min(entrp6),]
##       [,1] [,2] [,3] [,4] [,5] [,6]
##  [1,]    0    0    1    0    0    0
##  [2,]    0    0    0    1    0    0
##  [3,]    1    0    0    0    0    0
##  [4,]    1    0    0    0    0    0
##  [5,]    0    0    0    0    1    0
##  [6,]    1    0    0    0    0    0
##  [7,]    0    0    0    0    0    1
##  [8,]    0    0    0    1    0    0
##  [9,]    0    0    0    0    0    0
## [10,]    0    0    0    0    1    0
## [11,]    0    1    0    0    0    0
t2 <- t1[entrp6>0,]
entrppos <- entrp6[entrp6>0]
t2[entrppos==min(entrppos),]
## [1] 0.00000000 0.98672021 0.00000000 0.00000000 0.01327979 0.00000000

First look at the distributions with lowest entropy 0. One of them is all zero, and the others all have a probability equal to 1. Look at the vector with the minimal positive entropy, only 2 of the variables are nonzero.

entropy(rep(1/6,6))
## [1] 1.791759
entropy(c(1,0,0,0,0,0))
## [1] 0

It seems that the vector having maximum entropy is a vector whose probability is more even for the 6 dimensions. If the uncertainty of the distribution is very low, the entropy will also be very low. For example, look at the vector v=(1,0,0,0,0,0), the entropy is 0.