Estimating Shannon Entropy

Suppose we want to estimate the Shannon entropy of the population from which a sample was drawn, where the population is best represented by a continuous random variable.

This estimation can be done with a variation of the box-counting algorithm, a standard method for estimating the fractal dimension of a phenomenon [TÉL1989, Saa2007, Lopes2009, Hausser2009].

In short, the hope is that \[ Sh = D_1 \log(d_N) + H_0 \] holds over some range of d_N.

It follows directly that Sh = H_0 for d_N = 1, that is, for the measure at unit scale.
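
As a rough illustration of the relation above, the following base-R sketch bins a large normal sample at several dyadic box widths, computes the plug-in entropy at each width, and fits a line against log(d_N). It is not the ebc.R implementation: the helper `binned_entropy` is hypothetical, and reading d_N as the number of boxes per unit length (the inverse box width) is an assumption made here so that d_N = 1 corresponds to unit scale.

```r
# Sketch only: plug-in (maximum-likelihood) entropy of a binned sample, in nats.
binned_entropy <- function(x, width) {
  counts <- table(floor(x / width))   # box index of every observation
  p <- counts / sum(counts)           # empirical box probabilities
  -sum(p * log(p))
}

set.seed(1)
x <- rnorm(1e5, mean = 0, sd = 8)     # large sample so small boxes stay populated
widths <- 2^(-(0:4))                  # dyadic box widths: 1, 1/2, ..., 1/16
d_N <- 1 / widths                     # ASSUMPTION: d_N = boxes per unit length
Sh <- sapply(widths, function(w) binned_entropy(x, w))

fit <- lm(Sh ~ log(d_N))
D1 <- unname(coef(fit)[2])            # slope of the fitted line
H0 <- unname(coef(fit)[1])            # intercept: entropy at unit scale
```

For a one-dimensional continuous variable the slope should come out close to 1, and the intercept close to the differential entropy of the population, which is about 3.5 nats for this choice of sd.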

Simulation

You need the ebc.R script, available at https://docs.google.com/file/d/0B6ZuqpeSKSqcWFlCN2hqT2tGSEk/edit?usp=sharing

Let’s draw a sample of size 100 from a normal population with mean 0 and entropy 3.5 nats, and test eight of the estimation methods in the entropy package [entropy]:

par(mfrow=c(3,3))  # 3x3 panel layout for the plots that follow
sample=get_sample(N=100,dist='normal',Sh=3.5,okgraph=TRUE)
## [1] "N(mean= 0 ,sd= 8.01296990322366 ); Sh0= 3.5"
Sh=c()  # collects the estimates returned for each method
for (met in c('ML','MM','Jeffreys','Laplace','SG','minimax','CS','shrink')){
    Sh=append(Sh,ebc_sample(sample,method=met, bins=set_bins('dyadic',1e5),okplot=TRUE,npts=6))
}

plot of chunk simulation1
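
The method names in the loop are estimator labels from the entropy package; 'ML' is the plug-in (maximum-likelihood) estimator and 'MM' is its Miller-Madow bias-corrected version. The base-R sketch below shows how those two differ on a vector of bin counts. The function names are hypothetical and the code only illustrates the formulas; it is not the package's implementation.

```r
# Plug-in (maximum-likelihood) entropy in nats from a vector of bin counts.
entropy_ml <- function(counts) {
  p <- counts[counts > 0] / sum(counts)   # drop empty bins: 0 * log(0) is taken as 0
  -sum(p * log(p))
}

# Miller-Madow estimate: plug-in value plus the bias correction (m - 1) / (2 n),
# where m is the number of non-empty bins and n the sample size.
entropy_mm <- function(counts) {
  m <- sum(counts > 0)
  n <- sum(counts)
  entropy_ml(counts) + (m - 1) / (2 * n)
}

counts <- c(5, 3, 2, 0)                   # illustrative counts, not data from this post
h_ml <- entropy_ml(counts)
h_mm <- entropy_mm(counts)
```

The correction shrinks as the sample grows relative to the number of occupied bins.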

summary(Sh)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3.23    3.30    3.48    3.44    3.57    3.63
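
The standard deviation printed by get_sample can be checked against the closed-form differential entropy of a normal distribution, H = 0.5 log(2 pi e sd^2). The short sketch below inverts that formula; the variable names are illustrative and it does not depend on ebc.R.

```r
H_target <- 3.5                                          # entropy in nats
sd_from_H <- exp(H_target - 0.5 * log(2 * pi * exp(1)))  # invert H = 0.5*log(2*pi*e*sd^2)
H_back <- 0.5 * log(2 * pi * exp(1) * sd_from_H^2)       # round trip back to the entropy
```

The result agrees with the sd of about 8.013 reported in the output above.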

Mixing samples

methods=c('MM','CS','shrink')
par(mfrow=c(4,3))
H0=3.5
dist='normal'
p1=50
base=1e5
factor=set_bins('fib',100)
res=data.frame()
sa=get_sample(base,dist=dist,Sh=H0)
## [1] "N(mean= 0 ,sd= 8.01296990322366 ); Sh0= 3.5"
sb=get_sample(base,dist=dist,Sh=H0)
## [1] "N(mean= 0 ,sd= 8.01296990322366 ); Sh0= 3.5"
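for (f in append(0,factor)){
    #sa=get_sample(base,dist=dist,p1=(-f),Sh=H0)
    #sb=get_sample(base,dist=dist,p1=f,Sh=H0)
    # mixed sample: two copies of sa shifted by -f and +f (sb is not used here)
    sc=append((sa-f),(sa+f))
    #H=(log((f+1)/f)+H0)
    hist(sc,breaks=30,main=paste('factor=',f))
    # expected entropy of an equal mix of two non-overlapping copies
    H=H0+log(2)
    #print(paste('H_esp=',H))
    v=c()
    for (met in methods){
        a=ebc_sample(sc,method=met,okplot=FALSE)
        v=append(v,a[1])  # keep only the first value returned
    }
    res=rbind(res,data.frame(factor=f,H0=H,rbind(v)))
}
names(res)[3:5]=methods
# turn the estimates into deviations from the expected entropy
res[,3:5]=res[,3:5]-res$H0
par(mfrow=c(1,1))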

plot of chunk unnamed-chunk-1

# empty frame sized to the three curves, then one line per method
plot(res$factor,res$factor,type='n',ylim=range(res[,3:5]),xlab='factor',ylab='Sh-H0')
lines(res$MM~res$factor,col=1)
lines(res$CS~res$factor,col=2)
lines(res$shrink~res$factor,col=3)
legend(x='bottomright',legend=names(res)[3:5],col=c(1,2,3),lty=1)

plot of chunk unnamed-chunk-1

summary(res[,3:5])
##        MM                CS              shrink       
##  Min.   :-0.6919   Min.   :-0.6919   Min.   :-0.6883  
##  1st Qu.:-0.5779   1st Qu.:-0.5778   1st Qu.:-0.5730  
##  Median :-0.1327   Median :-0.1327   Median :-0.1017  
##  Mean   :-0.2734   Mean   :-0.2733   Mean   :-0.2658  
##  3rd Qu.: 0.0007   3rd Qu.: 0.0010   3rd Qu.: 0.0060  
##  Max.   : 0.0010   Max.   : 0.0012   Max.   : 0.0096
res
##     factor    H0         MM        CS    shrink
## v        0 4.193 -0.6918576 -0.691943 -0.688310
## v1       2 4.193 -0.6621744 -0.662073 -0.657392
## v2       3 4.193 -0.6270026 -0.626889 -0.622132
## v3       5 4.193 -0.5287357 -0.528651 -0.523864
## v4       8 4.193 -0.3564810 -0.356429 -0.352334
## v5      13 4.193 -0.1327429 -0.132747 -0.101677
## v6      21 4.193 -0.0118106 -0.011461 -0.008249
## v7      34 4.193  0.0007141  0.001203  0.005828
## v8      55 4.193  0.0009705  0.001009  0.008400
## v9      89 4.193  0.0007618  0.001160  0.006246
## v10    144 4.193  0.0008960  0.001016  0.009642
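
The shape of the table can be cross-checked without the estimators. An equal mixture of two copies of the same density, shifted far enough apart that they no longer overlap, has entropy H0 + log(2), while at factor 0 the two copies coincide and the entropy is just H0, which explains the deviation of about -0.69 = -log(2) in the first row. The sketch below integrates -p log p numerically for the exact mixture density of two normals. It is an independent check under that assumption, not part of ebc.R, and the function name is hypothetical.

```r
H0 <- 3.5
sd0 <- exp(H0 - 0.5 * log(2 * pi * exp(1)))   # sd of a normal with H0 nats

# Differential entropy (nats) of 0.5*N(-f, sd0) + 0.5*N(f, sd0),
# approximated by a Riemann sum of -p(x) log p(x) on a fine grid.
mixture_entropy <- function(f, sd = sd0, step = 0.01) {
  x <- seq(-f - 10 * sd, f + 10 * sd, by = step)
  p <- 0.5 * dnorm(x, -f, sd) + 0.5 * dnorm(x, f, sd)
  terms <- ifelse(p > 0, -p * log(p), 0)      # guard against underflow to zero
  sum(terms) * step
}

shifts <- c(0, 13, 34, 144)                   # a few of the factors used above
gap <- sapply(shifts, mixture_entropy) - (H0 + log(2))
print(round(gap, 4))
```

The computed gap goes from -log(2) at factor 0 to essentially zero once the shift is several standard deviations, the same pattern as in the table.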

Bibliography

[entropy]: Jean Hausser and Korbinian Strimmer (2013). entropy: Estimation of Entropy, Mutual Information and Related Quantities. R package version 1.2.0. http://CRAN.R-project.org/package=entropy