Suppose we want to estimate the Shannon entropy of the population from which a sample was drawn, where that population is best represented by a continuous random variable.
This estimation can be done with a variation of the box-counting algorithm, a standard method for estimating the fractal dimension of a phenomenon (TÉL1989, Saa2007, Lopes2009, Hausser2009).
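In short, we expect the relation \[ Sh = -D_1 \log(d_N) + H_0 \] to hold over some range of \(d_N\), where:

- \(d_N\) is the bin width when the sample range is divided into \(N\) bins, that is, \[ d_N = \frac{\max(\text{sample}) - \min(\text{sample})}{N} \]
- \(Sh\) is the Shannon entropy estimate for \(N\) bins, \[ Sh = -\sum_{i=1}^{N} p_i \log(p_i), \] where \(p_i\) is the relative frequency of bin \(i\).
- \(D_1\) is the information dimension, which equals 1 for a continuous one-dimensional variable.

It follows directly that \(Sh = H_0\) when \(d_N = 1\), that is, when measuring at unit scale. The intercept \(H_0\) is therefore the quantity we estimate: for a continuous variable it is the differential entropy of the population.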
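As a rough illustration of the relation above, here is a minimal base-R sketch. It is not the ebc.R implementation: it assumes the plain plug-in (ML) entropy of equal-width bins, a hand-picked range of dyadic bin counts, and an ordinary least-squares fit of \(Sh\) on \(\log(d_N)\). The helper names `plugin_entropy` and `box_count_entropy` are hypothetical. The intercept estimates \(H_0\), and for a continuous one-dimensional variable the slope should be close to \(-1\).

```r
# Hypothetical helper: plug-in (maximum-likelihood) entropy, in nats,
# of x split into n_bins equal-width bins spanning its range.
plugin_entropy <- function(x, n_bins) {
  breaks <- seq(min(x), max(x), length.out = n_bins + 1)
  counts <- table(cut(x, breaks = breaks, include.lowest = TRUE))
  p <- as.numeric(counts) / length(x)   # relative frequencies p_i
  p <- p[p > 0]                         # empty bins contribute 0 (0 * log 0 = 0)
  -sum(p * log(p))
}

# Hypothetical helper: least-squares fit of Sh on log(d_N) over several
# bin counts; the intercept is the value of Sh at d_N = 1, i.e. the H0 estimate.
box_count_entropy <- function(x, n_bins = 2^(4:10)) {
  d_N <- (max(x) - min(x)) / n_bins     # bin widths, as defined above
  Sh <- sapply(n_bins, function(n) plugin_entropy(x, n))
  fit <- lm(Sh ~ log(d_N))
  c(H0 = unname(coef(fit)[1]), slope = unname(coef(fit)[2]))
}

# Normal sample whose population differential entropy is 3.5 nats
set.seed(1)
x <- rnorm(1e5, mean = 0, sd = 8.01296990322366)
box_count_entropy(x)
```

On this sample the fitted intercept should land near 3.5. The delicate part is the choice of bin counts: with too few bins the linear regime has not been reached, and with too many the plug-in estimate is biased downward because many bins are empty or nearly so.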
You need the ebc.R script, available at https://docs.google.com/file/d/0B6ZuqpeSKSqcWFlCN2hqT2tGSEk/edit?usp=sharing.
Let’s draw a sample of size 100 from a normal population with mean 0 and differential entropy 3.5 nats, and test eight of the estimators available in the entropy package [entropy]:
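```r
library(entropy)   # the estimators (ML, MM, CS, shrink, ...) come from the entropy package
source("ebc.R")    # get_sample, ebc_sample, set_bins; assumes ebc.R is in the working directory
par(mfrow = c(3, 3))   # 3 x 3 grid for the plots produced below
sample = get_sample(N = 100, dist = 'normal', Sh = 3.5, okgraph = TRUE)
## [1] "N(mean= 0 ,sd= 8.01296990322366 ); Sh0= 3.5"
Sh = c()   # collect the estimates from each method
for (met in c('ML', 'MM', 'Jeffreys', 'Laplace', 'SG', 'minimax', 'CS', 'shrink')) {
  Sh = append(Sh, ebc_sample(sample, method = met, bins = set_bins('dyadic', 1e5),
                             okplot = TRUE, npts = 6))
}
summary(Sh)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3.23    3.30    3.48    3.44    3.57    3.63 
```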
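The standard deviation printed by `get_sample` is consistent with inverting the differential entropy of a normal distribution, \(h = \tfrac{1}{2}\log(2\pi e \sigma^2)\), which gives \(\sigma = e^{h}/\sqrt{2\pi e}\). We have not checked how ebc.R computes it; the sketch below only verifies the formula in base R, and the function names are hypothetical.

```r
# Hypothetical helpers (base R only)
# Differential entropy, in nats, of a normal distribution with standard deviation sd
normal_entropy <- function(sd) 0.5 * log(2 * pi * exp(1) * sd^2)

# Inverse: standard deviation giving a target differential entropy h (nats)
normal_sd_for_entropy <- function(h) exp(h) / sqrt(2 * pi * exp(1))

sd_35 <- normal_sd_for_entropy(3.5)
print(sd_35, digits = 15)   # compare with the sd printed by get_sample
normal_entropy(sd_35)       # recovers 3.5
```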
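Next, we check the estimators on a mixture. A single sample `sa` of size 1e5 is shifted by \(-f\) and by \(+f\) (the factor) and both copies are pooled into `sc`. When \(f\) is large compared with the standard deviation, the two copies do not overlap and the pooled population has entropy \(H_0 + \log 2 \approx 4.193\) nats (stored in the column `H0` of `res`); when \(f = 0\), `sc` is just `sa` duplicated and its entropy is \(H_0\). The tables below report each estimate minus \(H_0 + \log 2\).

```r
methods = c('MM', 'CS', 'shrink')    # estimators to compare
par(mfrow = c(4, 3))                 # one histogram per factor value
H0 = 3.5                             # differential entropy of each component (nats)
dist = 'normal'
base = 1e5                           # sample size
factor = set_bins('fib', 100)        # shift values (Fibonacci numbers, see the output)
res = data.frame()
sa = get_sample(base, dist = dist, Sh = H0)
## [1] "N(mean= 0 ,sd= 8.01296990322366 ); Sh0= 3.5"
```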
```r
for (f in c(0, factor)) {
  # pool two copies of the same sample, shifted by -f and +f
  sc = c(sa - f, sa + f)
  hist(sc, breaks = 30, main = paste('factor=', f))
  # reference value: entropy of two non-overlapping copies
  # (stored in the column named H0)
  H = H0 + log(2)
  v = c()
  for (met in methods) {
    a = ebc_sample(sc, method = met, okplot = FALSE)
    v = append(v, a[1])   # keep the first returned value (the entropy estimate)
  }
  res = rbind(res, data.frame(factor = f, H0 = H, rbind(v)))
}
names(res)[3:5] = methods
# deviations from the reference value H0 + log(2)
res[, 3:5] = res[, 3:5] - res$H0
```
```r
par(mfrow = c(1, 1))
# empty frame sized to the deviations, then one line per estimator
plot(range(res$factor), range(res[, 3:5]), type = 'n',
     xlab = 'factor', ylab = 'Sh - (H0 + log(2))')
lines(res$MM ~ res$factor, col = 1)
lines(res$CS ~ res$factor, col = 2)
lines(res$shrink ~ res$factor, col = 3)
legend(x = 'bottomright', legend = names(res)[3:5], col = 1:3, lty = 1)
```
```r
summary(res[, 3:5])
##        MM                CS               shrink        
##  Min.   :-0.6919   Min.   :-0.6919   Min.   :-0.6883  
##  1st Qu.:-0.5779   1st Qu.:-0.5778   1st Qu.:-0.5730  
##  Median :-0.1327   Median :-0.1327   Median :-0.1017  
##  Mean   :-0.2734   Mean   :-0.2733   Mean   :-0.2658  
##  3rd Qu.: 0.0007   3rd Qu.: 0.0010   3rd Qu.: 0.0060  
##  Max.   : 0.0010   Max.   : 0.0012   Max.   : 0.0096  
res
##     factor    H0         MM        CS    shrink
## v        0 4.193 -0.6918576 -0.691943 -0.688310
## v1       2 4.193 -0.6621744 -0.662073 -0.657392
## v2       3 4.193 -0.6270026 -0.626889 -0.622132
## v3       5 4.193 -0.5287357 -0.528651 -0.523864
## v4       8 4.193 -0.3564810 -0.356429 -0.352334
## v5      13 4.193 -0.1327429 -0.132747 -0.101677
## v6      21 4.193 -0.0118106 -0.011461 -0.008249
## v7      34 4.193  0.0007141  0.001203  0.005828
## v8      55 4.193  0.0009705  0.001009  0.008400
## v9      89 4.193  0.0007618  0.001160  0.006246
## v10    144 4.193  0.0008960  0.001016  0.009642
```
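The reference value H0 + log(2) used above is the entropy of two copies that do not overlap; for small shifts it overstates the true entropy of the pooled population. As a sketch (hypothetical helper `mixture_entropy`, simple Riemann sum on a fine grid, base R only), the exact differential entropy of the equal-weight mixture of \(N(-f, \sigma^2)\) and \(N(f, \sigma^2)\) can be computed for the same factors and compared with the table above.

```r
# Hypothetical helper: differential entropy (nats) of the equal-weight mixture
# 0.5 * N(-f, sd^2) + 0.5 * N(f, sd^2), by a Riemann sum on a fine grid.
mixture_entropy <- function(f, sd, step = sd / 100) {
  x <- seq(-f - 10 * sd, f + 10 * sd, by = step)
  g <- 0.5 * dnorm(x, mean = -f, sd = sd) + 0.5 * dnorm(x, mean = f, sd = sd)
  -sum(g * log(g)) * step
}

H0 <- 3.5
sd0 <- exp(H0) / sqrt(2 * pi * exp(1))                # sd of a normal with entropy H0 (about 8.013)
factors <- c(0, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144)  # factor values appearing in res
exact <- sapply(factors, mixture_entropy, sd = sd0)

# exact entropy minus the reference value H0 + log(2) used in the loop
data.frame(factor = factors, exact_minus_ref = round(exact - (H0 + log(2)), 4))
```

At factor 0 the mixture is a single normal, so the exact value is H0 and the difference is -log(2) ≈ -0.693, close to what all three estimators report in the first row; at large factors the difference vanishes. At intermediate factors, the gap between this exact column and the estimates is a fairer measure of estimator error than the raw deviations in `res`.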
[entropy]: Jean Hausser and Korbinian Strimmer (2013). entropy: Estimation of Entropy, Mutual Information and Related Quantities. R package version 1.2.0. http://CRAN.R-project.org/package=entropy