Shannon's Lossless Source Coding theorem states that the expected length of any (uniquely decodable) code is lower bounded by the entropy of the probability distribution over the symbols being encoded:
$$ H\left(X\right) \leq \mathbb{E}_{P_{X}}\left(l \circ f\right) $$
where $\circ$ denotes function composition, $f$ is the encoding function, and $l$ maps a codeword to its length, so $l \circ f$ gives the length of the codeword assigned to each symbol.
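As a quick numerical illustration (a minimal sketch; the four-symbol source and the prefix code below are invented for the example, not taken from the question):

```python
import math

def entropy(p):
    """Shannon entropy H(X) in bits of a distribution {symbol: probability}."""
    return -sum(px * math.log2(px) for px in p.values() if px > 0)

def expected_length(p, code):
    """Expected codeword length E[l(f(X))] for a code {symbol: codeword}."""
    return sum(p[x] * len(code[x]) for x in p)

# A toy four-symbol source and a prefix-free code for it (both made up here)
p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = {"a": "0", "b": "10", "c": "110", "d": "111"}

print(entropy(p))                # 1.75 bits
print(expected_length(p, code))  # 1.75 bits -- the bound holds with equality
```

Because every probability here is a power of two, the bound happens to be met with equality; for non-dyadic distributions any symbol-by-symbol code lands strictly above it.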
So any code you choose for your sequences of coin tosses must have an expected length of at least the entropy.
For a sequence of $n$ equally probable, independent coin tosses, there is only one code we can choose (up to a permutation): encode H as 0 and T as 1. That is, we must use $n$ bits to encode the $n$ tosses. Since this is the only encoding available, it is optimal.
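To spell out why nothing shorter is possible: each of the $2^{n}$ sequences has probability $2^{-n}$, so

$$ H\left(X\right) = -\sum_{x \in \{H,T\}^{n}} 2^{-n}\log_{2}2^{-n} = n \text{ bits}, $$

and the bit-per-toss code has length exactly $n$ for every sequence, meeting the bound with equality.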
For a sequence of tosses of a coin biased in favour of heads, we could still encode H as 0 and T as 1, but we would be wasting space, since we know that sequences containing mostly heads are more likely. We could save space by assigning shorter codewords to the more probable sequences (HHHHHHHTTT, for example). One such encoding strategy is Huffman Coding.
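A minimal sketch of that strategy, assuming purely for illustration (neither value is fixed above) a bias of $P(H)=0.9$ and blocks of 3 tosses:

```python
import heapq
import itertools
import math

def huffman_code(probs):
    """Build a binary Huffman code for a dict {symbol: probability}."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = itertools.count(len(heap))
    while len(heap) > 1:
        p1, _, code1 = heapq.heappop(heap)
        p2, _, code2 = heapq.heappop(heap)
        # Prepend 0 to one subtree's codewords and 1 to the other's
        merged = {s: "0" + c for s, c in code1.items()}
        merged.update({s: "1" + c for s, c in code2.items()})
        heapq.heappush(heap, (p1 + p2, next(counter), merged))
    return heap[0][2]

# Illustrative bias P(H) = 0.9; code blocks of 3 tosses at a time
p_heads = 0.9
blocks = {
    "".join(b): math.prod(p_heads if t == "H" else 1 - p_heads for t in b)
    for b in itertools.product("HT", repeat=3)
}

code = huffman_code(blocks)
avg_bits = sum(blocks[b] * len(code[b]) for b in blocks) / 3
print(len(code["HHH"]))  # 1: the most probable block gets the shortest codeword
print(avg_bits)          # about 0.53 bits per toss, versus H(X) of about 0.47
```

Coding longer blocks pushes the average length closer and closer to the entropy.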
An optimal encoding would be one where $\mathbb{E}_{P_{X}}\left(l \circ f\right)$ meets $H\left(X\right)$ with equality: in your example, one needing only 4.7 bits on average to encode a 0/1 sequence of length 10.
Shannon's theorem says we can't possibly do better than the entropy (with zero error), but we can do a lot worse. In the biased-coin case, mapping H to 0 and T to 1 wastes 5.3 bits per sequence on average!
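To make the numbers explicit (assuming a bias of roughly $P(H)=0.9$, which is about what a 4.7-bit entropy for 10 tosses implies; the exact value isn't stated here):

$$ H\left(X\right) = 10\left(-0.9\log_{2}0.9 - 0.1\log_{2}0.1\right) \approx 4.7 \text{ bits}, $$

so the naive 10-bit encoding exceeds the lower bound by about $10 - 4.7 = 5.3$ bits per sequence, on average.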