7E1. State the three motivating criteria that define information entropy. Try to express each in your own words.

In information theory, information is defined as "the reduction of uncertainty when we learn an outcome". To quantify uncertainty, a measure of uncertainty should possess three properties:

  1. Continuity. The measure should be continuous, so that a small change in the probabilities produces only a small change in the measured uncertainty, with no sudden jumps.

  2. Additivity. The uncertainty over combinations of independent events should be the sum of the separate uncertainties. For example, the uncertainty about a coin toss and a die roll together should equal the uncertainty about the toss plus the uncertainty about the roll (see the sketch after this list).

  3. Scalability. The measure should increase as the number of possible outcomes increases: more possible events means more uncertainty.

Information entropy is the only function (up to a multiplicative constant) that satisfies all three criteria. It is the negative of the average log-probability of an event: H(p) = -sum(p_i * log(p_i)).
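
As a quick check of the additivity criterion, here is a minimal R sketch (the entropy helper function is my own illustration, not from the exercise): for two independent events, the entropy of the joint distribution equals the sum of the individual entropies.

entropy <- function(p) {
  p <- p[p > 0]   # drop impossible outcomes; 0 * log(0) is taken as 0
  -sum(p * log(p))
}
p <- c(0.3, 0.7)               # the coin from 7E2
q <- c(0.2, 0.25, 0.25, 0.3)   # the die from 7E3
entropy(outer(p, q))           # joint distribution under independence
## [1] 1.987091
entropy(p) + entropy(q)
## [1] 1.987091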

7E2. Suppose a coin is weighted such that, when it is tossed and lands on a table, it comes up heads 70% of the time. What is the entropy of this coin?

The coin comes up heads 70% of the time, so p(heads) = 0.7 and p(tails) = 0.3. Information entropy measures the uncertainty contained in this probability distribution as the negative of the average log-probability of an event:

p <- c(0.3, 0.7)
-sum(p*log(p))
## [1] 0.6108643

The entropy is about 0.61.
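
For comparison (not asked in the exercise), a fair coin maximizes entropy for two outcomes, so the weighted coin is less uncertain than a fair one:

p_fair <- c(0.5, 0.5)
-sum(p_fair*log(p_fair))
## [1] 0.6931472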

7E3. Suppose a four-sided die is loaded such that, when tossed onto a table, it shows “1” 20%, “2” 25%, “3” 25%, and “4” 30% of the time. What is the entropy of this die?

Using the information entropy function again, the entropy of the four-sided die is calculated as

p <- c(0.2,0.25,0.25,0.3)
-sum(p*log(p))
## [1] 1.376227

The entropy is about 1.38.
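
Again for comparison (my own check, not part of the exercise), a fair four-sided die has entropy log(4) ≈ 1.386, so this loaded die is only slightly less uncertain than a fair one:

p_fair <- rep(0.25, 4)
-sum(p_fair*log(p_fair))
## [1] 1.386294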

7E4. Suppose another four-sided die is loaded such that it never shows “4”. The other three sides show equally often. What is the entropy of this die?

Again using the information entropy function. Since "4" never occurs, it has probability 0 and, by the convention 0 * log(0) = 0, contributes nothing to the entropy; the remaining three sides each occur with probability 1/3:

p <- rep(1/3, 3)
-sum(p*log(p))
## [1] 1.098612

The entropy is about 1.10, which is exactly log(3): with the impossible outcome dropped, this is effectively a fair three-sided die.
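
If we keep the impossible outcome in the probability vector, we only need to drop the zero before taking logs (a sketch of the same 0 * log(0) = 0 convention; the filtering step is my own way of handling it numerically, since log(0) is -Inf in R):

p <- c(1/3, 1/3, 1/3, 0)
p <- p[p > 0]   # drop the impossible "4"
-sum(p*log(p))
## [1] 1.098612
log(3)
## [1] 1.098612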