Shannon's Lossless Source Coding theorem states that the expected length of any (uniquely decodable) code is lower bounded by the entropy of the probability distribution over the symbols being encoded:
$$ H\left(X\right) \leq \mathbb{E}_{P_{X}}\left(l \circ f\right) $$
where $\circ$ denotes function composition, $f$ is the encoding function, and $l$ maps a codeword to its length, so $l \circ f$ gives the length of the codeword assigned to each symbol.
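As a quick numerical illustration (a minimal sketch; the four-symbol source and the prefix code below are invented for the example, not taken from the question):

```python
import math

def entropy(p):
    """Shannon entropy H(X) in bits of a distribution {symbol: probability}."""
    return -sum(px * math.log2(px) for px in p.values() if px > 0)

def expected_length(p, code):
    """Expected codeword length E[l(f(X))] for a code {symbol: codeword}."""
    return sum(p[x] * len(code[x]) for x in p)

# A toy four-symbol source and a prefix-free code for it (both made up here)
p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = {"a": "0", "b": "10", "c": "110", "d": "111"}

print(entropy(p))                # 1.75 bits
print(expected_length(p, code))  # 1.75 bits -- the bound holds with equality
```

Because every probability here is a power of two, the bound happens to be met with equality; for non-dyadic distributions any symbol-by-symbol code lands strictly above it.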
So any code you choose for your sequences of coin tosses must have an expected length of at least the entropy.
For a sequence of $n$ equally probable, independent coin tosses, there is only one code we can choose (up to a permutation): encode H as 0 and T as 1. That is, we must use $n$ bits to encode the $n$ tosses. Since this is the only encoding available, it is optimal.
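To spell out why nothing shorter is possible: each of the $2^{n}$ sequences has probability $2^{-n}$, so

$$ H\left(X\right) = -\sum_{x \in \{H,T\}^{n}} 2^{-n}\log_{2}2^{-n} = n \text{ bits}, $$

and the bit-per-toss code has length exactly $n$ for every sequence, meeting the bound with equality.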
For a sequence of tosses of a coin biased in favour of heads, we could still encode H as 0 and T as 1, but we would be wasting space, since we know that sequences containing mostly heads are more likely. We could save space by assigning shorter codewords to the more probable sequences (HHHHHHHTTT, for example). One such encoding strategy is Huffman Coding.
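A minimal sketch of that strategy, assuming purely for illustration (neither value is fixed above) a bias of $P(H)=0.9$ and blocks of 3 tosses:

```python
import heapq
import itertools
import math

def huffman_code(probs):
    """Build a binary Huffman code for a dict {symbol: probability}."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = itertools.count(len(heap))
    while len(heap) > 1:
        p1, _, code1 = heapq.heappop(heap)
        p2, _, code2 = heapq.heappop(heap)
        # Prepend 0 to one subtree's codewords and 1 to the other's
        merged = {s: "0" + c for s, c in code1.items()}
        merged.update({s: "1" + c for s, c in code2.items()})
        heapq.heappush(heap, (p1 + p2, next(counter), merged))
    return heap[0][2]

# Illustrative bias P(H) = 0.9; code blocks of 3 tosses at a time
p_heads = 0.9
blocks = {
    "".join(b): math.prod(p_heads if t == "H" else 1 - p_heads for t in b)
    for b in itertools.product("HT", repeat=3)
}

code = huffman_code(blocks)
avg_bits = sum(blocks[b] * len(code[b]) for b in blocks) / 3
print(len(code["HHH"]))  # 1: the most probable block gets the shortest codeword
print(avg_bits)          # about 0.53 bits per toss, versus H(X) of about 0.47
```

Coding longer blocks pushes the average length closer and closer to the entropy.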
An optimal encoding would be one where $\mathbb{E}_{P_{X}}\left(l \circ f\right)$ meets $H\left(X\right)$ with equality: in your example, one needing only 4.7 bits on average to encode a 0/1 sequence of length 10.
Shannon's theorem says we can't possibly do better than the entropy (with zero error), but we can do a lot worse. In the biased-coin case, mapping H to 0 and T to 1 wastes 5.3 bits per sequence on average!
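To make the numbers explicit (assuming a bias of roughly $P(H)=0.9$, which is about what a 4.7-bit entropy for 10 tosses implies; the exact value isn't stated here):

$$ H\left(X\right) = 10\left(-0.9\log_{2}0.9 - 0.1\log_{2}0.1\right) \approx 4.7 \text{ bits}, $$

so the naive 10-bit encoding exceeds the lower bound by about $10 - 4.7 = 5.3$ bits per sequence, on average.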