In everyday parlance, entropy refers to the inevitable deterioration of a system (including a society). As you may remember from a physics course, there is a far more formal definition that originates in thermodynamics. I'll spare you the equation, but in brief, entropy is a measure of the amount of thermal energy (heat) that is NOT available to do work.
Entropy is used in many other contexts, such as cosmology and chemistry, and, for what interests us here, it is also used in information theory.
In information theory, entropy can refer to a measure of the uncertainty in a random variable, of its unpredictability, or of its information content. Let us consider the latter. What does information mean?
Let us consider a simple example: a standard deck of 52 cards. Consider 3 events E1, E2 and E3.
- E1 means the card is a heart. Its probability is 1/4.
- E2 means the card is a 7. Its probability is 1/13.
- E3 means the card is the seven of hearts. It is the intersection of E1 and E2, and since the two events are independent its probability is 1/4 × 1/13 = 1/52.
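To make these numbers concrete, here is a minimal Python sketch (the rank and suit labels are just illustrative) that enumerates the 52 cards, counts how many fall into each event, and confirms that P(E3) = P(E1) × P(E2).

```python
from fractions import Fraction

# Build a 52-card deck as (rank, suit) pairs -- the labels are illustrative.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]

def prob(event):
    """Probability of an event = favourable cards / 52."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

p_e1 = prob(lambda card: card[1] == "hearts")      # E1: the card is a heart
p_e2 = prob(lambda card: card[0] == "7")           # E2: the card is a 7
p_e3 = prob(lambda card: card == ("7", "hearts"))  # E3: the seven of hearts

print(p_e1, p_e2, p_e3)     # 1/4 1/13 1/52
assert p_e3 == p_e1 * p_e2  # E1 and E2 are independent
```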
An event that has a low probability is interesting from the information point of view: it has a high information content. Conversely, a high-probability event has less information content. We are interested in low-probability events.
Let us try to get a feeling for how much information the 3 events mentioned above contain. If we try to order them, it makes sense to write:
- I(E3) >= I(E2) >= I(E1) (since P(E3) <= P(E2) <= P(E1))
In addition, by pure intuition, since E1 and E2 are independent, learning E3 should deliver the information of E1 and E2 combined:
- I(E3) = I(E2) + I(E1)
Finally, an information function should never be negative:
- I(Ei) >= 0
It turns out that few functions satisfy the 3 conditions above, and it can be shown that they take the following form:
I(E) = -K log_a P(E), with K > 0
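As a quick sanity check, here is a small sketch assuming the common choice K = 1 and a = 2 (information measured in bits); it evaluates I(E) = -log_2 P(E) for the three card events and verifies the ordering, additivity and non-negativity conditions listed above.

```python
import math

def information(p, base=2):
    """Information content I(E) = -log_a P(E), here with K = 1."""
    return -math.log(p, base)

p_e1, p_e2, p_e3 = 1/4, 1/13, 1/52

i_e1 = information(p_e1)  # 2.00 bits
i_e2 = information(p_e2)  # ~3.70 bits
i_e3 = information(p_e3)  # ~5.70 bits

# Condition 1: the rarer the event, the larger its information content.
assert i_e3 >= i_e2 >= i_e1
# Condition 2: information of independent events adds up.
assert math.isclose(i_e3, i_e1 + i_e2)
# Condition 3: information is never negative (probabilities are <= 1).
assert all(i >= 0 for i in (i_e1, i_e2, i_e3))

print(f"I(E1) = {i_e1:.2f} bits, I(E2) = {i_e2:.2f} bits, I(E3) = {i_e3:.2f} bits")
```

Choosing a = 2 gives information in bits, while the natural logarithm (a = e) gives nats; the constant K merely rescales the unit.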