set.seed(12654)
sample1 <- rnorm(mean = 1, sd = 2, n = 10000)
hist(sample1)
Sample from a truncated Gaussian: 2. Write Function.
trunc_norm <- function(tmean=0, tsd=1, lower=-Inf, upper=Inf, num_samp){
v <- numeric(num_samp)
for(i in 1:num_samp){
while(1){
t <- rnorm(mean = tmean, sd = tsd, n= 1)
if(t < upper && t > lower)break()
}
v[i] <- t
}
v
}
set.seed(90654)
sample2 <- trunc_norm(tmean = 1, tsd = 2, lower = 0, upper = 5, num_samp = 10000)
hist(sample2)
trunc_norm_vec <- function(tmean=0, tsd=1, lower=-Inf, upper=Inf, num_samp){
v=numeric()
while(length(v) < num_samp){
t <- rnorm(mean = tmean, sd = tsd, num_samp)
v <- c(v, t[t > lower & t<upper])
}
v[1:num_samp]
}
set.seed(90654)
sample3 <- trunc_norm_vec(tmean = 1, tsd = 2, lower = 0, upper = 5, num_samp = 10000)
hist(sample3)
set.seed(90654)
system.time(trunc_norm(1,2,0,5,1000000))
## user system elapsed
## 3.95 0.00 4.09
system.time(trunc_norm_vec(1,2,0,5,1000000))
## user system elapsed
## 0.44 0.02 0.45
Calculate for 10000 probability distributions, each of which are 6-dimensional. 1. Generate samples and rescale.
set.seed(7477)
t1 <- matrix(rnorm(mean=1, sd=1, n=60000), 10000, 6)
t1[t1 < 0] <- 0
t1[rowSums(t1)>0] <- t1[rowSums(t1)>0]/rowSums(t1)[rowSums(t1)>0]
entropy <- function(v){
v <- v[v!=0]
return(-sum(v*log(v)))
}
entrp6 <- apply(t1,1,entropy)
hist(entrp6)
entrp_max <- max(entrp6, na.rm=TRUE)
entrp_max
## [1] 1.790128
t1[entrp6==entrp_max,]
## [1] 0.1672765 0.1810796 0.1593577 0.1757742 0.1528369 0.1636752
5.Make a guess about the probability distribution that has the theoretical maximum entropy.
t1[entrp6==min(entrp6),]
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0 0 1 0 0 0
## [2,] 0 0 0 1 0 0
## [3,] 1 0 0 0 0 0
## [4,] 1 0 0 0 0 0
## [5,] 0 0 0 0 1 0
## [6,] 1 0 0 0 0 0
## [7,] 0 0 0 0 0 1
## [8,] 0 0 0 1 0 0
## [9,] 0 0 0 0 0 0
## [10,] 0 0 0 0 1 0
## [11,] 0 1 0 0 0 0
t2 <- t1[entrp6>0,]
entrppos <- entrp6[entrp6>0]
t2[entrppos==min(entrppos),]
## [1] 0.00000000 0.98672021 0.00000000 0.00000000 0.01327979 0.00000000
First look at the distributions with lowest entropy 0. One of them is all zero, and the others all have a probability equal to 1. Look at the vector with the minimal positive entropy, only 2 of the variables are nonzero.
entropy(rep(1/6,6))
## [1] 1.791759
entropy(c(1,0,0,0,0,0))
## [1] 0
It seems that the vector having maximum entropy is a vector whose probability is more even for the 6 dimensions. If the uncertainty of the distribution is very low, the entropy will also be very low. For example, look at the vector v=(1,0,0,0,0,0), the entropy is 0.