Morse Code
1)
Since a = 1 - p and b = p, we can find the binary entropy H(a, b) = a*log2(1/a) + b*log2(1/b). We want a rate measure, so we need to divide by time. Since one of our symbols occupies two time steps (let's arbitrarily pick a), the expected duration per symbol is T = 2a + b.
Combining our equations with a = 1 - b:
T = 2(1 - b) + b
= 2 - 2b + b
= 2 - b
Our rate measure is therefore:
H(a,b) / T(a,b) = (a * log2(1/a) + b * log2(1/b)) / (2 - b)
# entropy rate H(a, b) / T(a, b) as a function of b = p
entropy.rate = function(b) {
  a = 1 - b
  return((a * log2(1 / a) + b * log2(1 / b)) / (2 - b))
}
# evaluate the entropy rate on a grid of p values and mark the maximum
p.seq = seq(from=.01, to=.99, by=.01)
rate.plot = sapply(p.seq, FUN=entropy.rate)
plot(p.seq, rate.plot, xlab="p.value",
     ylab="entropy rate", main="Mapping entropy rate", col="blue")
max.index = which.max(rate.plot)
max.value = rate.plot[max.index]
abline(v=p.seq[max.index], col="red")   # p at the maximum
abline(h=max.value, col="green")        # maximum entropy rate
Our maximum value occurs as p approaches ~.62 in the b position, with an entropy rate of ~.69.
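As a quick check (not part of the original grid search), base R's optimize() can refine the grid estimate; it should agree with the values read off the plot above.

# a minimal sketch: refine the grid estimate with optimize()
# (assumes entropy.rate() from above is in scope)
opt = optimize(entropy.rate, interval=c(.01, .99), maximum=TRUE)
opt$maximum    # p at the maximum, roughly .62
opt$objective  # entropy rate at the maximum, roughly .69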
Forward and backward prediction
Forward looking: P(x1, x2, x3) = P(x3|x2, x1) * P(x2|x1) * P(x1)
Backward looking: P(x3, x2, x1) = P(x1|x3, x2) * P(x2|x3) * P(x3)
Cognitively it should be more difficult to predict the reversed text. The basis for this intuition is that, in learning language, we have encoded a forward-looking language model. The same difficulty should not apply from a purely mechanical, statistical perspective: we could train a language model on reversed text, and the same statistical properties (encoded as conditional probabilities) would hold, just in reverse order. Re-training would be much more challenging for a human, but computationally our language model would just be calculating conditional probabilities in a different order.
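To make the symmetry concrete, here is a minimal sketch (not from the original write-up; the toy joint distribution and function names are illustrative assumptions). Both chain-rule factorizations recover the same joint probability.

# toy joint distribution over three binary symbols x1, x2, x3
set.seed(1)
joint = array(runif(8), dim=c(2, 2, 2))
joint = joint / sum(joint)

# forward factorization: P(x3|x2,x1) * P(x2|x1) * P(x1)
forward = function(x1, x2, x3) {
  p.x1      = sum(joint[x1, , ])
  p.x2.x1   = sum(joint[x1, x2, ]) / p.x1
  p.x3.x2x1 = joint[x1, x2, x3] / sum(joint[x1, x2, ])
  p.x3.x2x1 * p.x2.x1 * p.x1
}

# backward factorization: P(x1|x3,x2) * P(x2|x3) * P(x3)
backward = function(x1, x2, x3) {
  p.x3      = sum(joint[, , x3])
  p.x2.x3   = sum(joint[, x2, x3]) / p.x3
  p.x1.x3x2 = joint[x1, x2, x3] / sum(joint[, x2, x3])
  p.x1.x3x2 * p.x2.x3 * p.x3
}

forward(1, 2, 1)   # equals joint[1, 2, 1]
backward(1, 2, 1)  # identical to the forward value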
Question 3)
# plotting our cumulative probability function
binom.param = .25                       # P(h) = .25
P = function(n) {
  # geometric pmf: P(n) = (1 - p)^(n - 1) * p with p = binom.param
  return((1 - binom.param)^(n - 1) * (binom.param))
}
probs = sapply(1:100, FUN=P)
cdf = function(n) { sum(probs[1:n]) }   # cumulative probability up to n
c.probs = sapply(1:20, cdf)
plot(c.probs, pch=16, col="blue", xlab="n", ylab="cumulative probability")
We arbitrarily assign P(h) = .25 and P(t) = .75.
Our Shannon-Fano-Elias encoding becomes:
t = 1
h = 00
tt = 11
th = 100
ht = 001
hh = 0000
. . .
The sum from i = 1 to n of 2^(-codelength(i)) equals 1 - 2^(-codelength(n)), which is the probability that we should have a code word of length k_i given i time steps.
We can estimate the average number of bits added to our code per time step: 75% of the time we add a single bit (dividing our space by 2) and the remaining 25% of the time we add two bits. On average we add 1.25 bits per time step, so i and k(i) are linearly related.
This estimate improves as i increases, so it should be least predictive at the first time step i = 1 (where we estimate a codeword length of 1.25). For n = 1 we have 2^-1 = .5 (this of course makes sense because we have divided our space in half and added 1 bit). Our estimate 2^-1.25 is slightly lower (~.42).
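A minimal simulation sketch of the bit-accounting described above (assuming 1 bit per t and 2 bits per h, as in the estimate; not part of the original solution):

# each t adds 1 bit, each h adds 2 bits, with P(h) = .25
set.seed(1)
flips = sample(c("t", "h"), size=1e5, replace=TRUE, prob=c(.75, .25))
bits.per.step = ifelse(flips == "h", 2, 1)
mean(bits.per.step)   # close to the expected 1.25 bits per time step

# the n = 1 comparison discussed above
2^-1      # .5
2^-1.25   # approximately .42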