In information theory, information is defined as “the reduction of uncertainty when we learn the outcome”. To quantify uncertainty, a measure of uncertainty should possess certain properties. These are:
Continuity. The measure should be continuous, so that a small change in any of the probabilities produces only a small change in uncertainty rather than a sudden jump.
Additivity. For independent events, the uncertainty about their combinations should equal the sum of the uncertainties about each event.
Scalability. Uncertainty should increase as the number of possible outcomes increases.
Information entropy is the only function that satisfies all of these criteria.
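To check these properties numerically, here is a minimal sketch in R. The helper name entropy() and the example distributions (in particular the 0.6/0.4 split) are illustrative assumptions, not part of the original exercise.

```r
# Hypothetical helper: information entropy in nats, H(p) = -sum(p * log(p)).
entropy <- function(p) -sum(p * log(p))

# Scalability: for n equally likely outcomes, entropy grows with n
# (it equals log(n)).
uniform_H <- sapply(2:6, function(n) entropy(rep(1 / n, n)))
print(uniform_H)
all(diff(uniform_H) > 0)   # TRUE

# Additivity: for two independent events, the entropy of the joint
# distribution equals the sum of the separate entropies.
p <- c(0.3, 0.7)           # assumed example: rain vs. shine
q <- c(0.6, 0.4)           # assumed example: hot vs. cold
joint <- outer(p, q)       # joint probabilities of the four combinations
all.equal(entropy(joint), entropy(p) + entropy(q))   # TRUE

# Continuity: nudging the probabilities slightly changes H only slightly.
abs(entropy(c(0.30, 0.70)) - entropy(c(0.31, 0.69)))
```

Additivity holds exactly here because the log of a product of independent probabilities splits into a sum of logs; all.equal() is used only to absorb floating-point rounding.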
The coin in this example is weighted so that it comes up heads 70% of the time, that is, with probability 0.7 (and tails with probability 0.3). We can use information entropy to measure the uncertainty contained in this probability distribution. Information entropy is the negative of the average log-probability of an event.
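Written out, the definition and its application to the weighted coin look like this, using natural logarithms to match R's log():

$$H(p) = -\sum_{i=1}^{n} p_i \log p_i$$

$$H = -(0.3 \log 0.3 + 0.7 \log 0.7) \approx 0.361 + 0.250 = 0.611$$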
p <- c(0.3, 0.7)
-sum(p*log(p))
## [1] 0.6108643
The entropy is about 0.61 nats (R's log() is the natural logarithm).
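As a sanity check, here is a minimal sketch comparing the weighted coin with a fair coin. The fair-coin comparison and the helper name entropy() are additions for illustration, not part of the original question.

```r
# Hypothetical helper: entropy in nats (natural log), as in the calculation above.
entropy <- function(p) -sum(p * log(p))

H_weighted <- entropy(c(0.3, 0.7))   # weighted coin
H_fair     <- entropy(c(0.5, 0.5))   # fair coin, equal to log(2)

H_weighted < H_fair                  # TRUE: the weighted coin is more predictable
H_weighted / log(2)                  # the same entropy expressed in bits (about 0.88)
```

Dividing by log(2) converts nats to bits; the ordering of the two coins does not depend on the choice of base.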
For a loaded four-sided die that shows its faces with probabilities 0.2, 0.25, 0.25 and 0.3, the same formula gives
p <- c(0.2,0.25,0.25,0.3)
-sum(p*log(p))
## [1] 1.376227
The entropy is about 1.38.
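To put this value in context, a minimal sketch comparing the loaded die with a fair four-sided die; the fair die is an assumed reference point added for illustration, and entropy() is a hypothetical helper name.

```r
# Hypothetical helper: entropy in nats.
entropy <- function(p) -sum(p * log(p))

H_loaded <- entropy(c(0.2, 0.25, 0.25, 0.3))   # loaded four-sided die
H_fair   <- entropy(rep(1 / 4, 4))             # fair four-sided die, equal to log(4)

# The gap is small because the loaded probabilities are all close to 0.25.
c(loaded = H_loaded, fair = H_fair, gap = H_fair - H_loaded)
```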
For three equally likely outcomes, the information entropy function gives:
p <- rep(1/3, 3)  # three equally likely outcomes, exactly 1/3 each
-sum(p*log(p))
## [1] 1.098612
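The entropy is about 1.10, essentially log(3) ≈ 1.0986, the largest entropy possible for three outcomes, as expected when all outcomes are equally likely.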
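One caveat with the formula as written: it breaks when a probability is exactly zero, because in R 0 * log(0) evaluates to 0 * -Inf = NaN. A minimal sketch of the usual fix, the convention that 0 * log(0) = 0, is below. The four-element vector (a die whose fourth face never shows) is an assumed illustration, and both function names are hypothetical.

```r
# Naive formula: returns NaN if any probability is exactly zero.
entropy_naive <- function(p) -sum(p * log(p))

# Hypothetical helper applying the convention 0 * log(0) = 0
# by dropping zero-probability outcomes before summing.
entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

p4 <- c(1/3, 1/3, 1/3, 0)   # assumed: four-sided die whose fourth face never shows
entropy_naive(p4)           # NaN
entropy(p4)                 # same as log(3)
entropy(rep(1/3, 3))        # same as log(3)
```

An outcome that never happens adds no uncertainty, so the die with a dead face has the same entropy as a three-outcome distribution.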