Huffman encoding performs best when the distribution of the alphabet symbols used by the string to be encoded is dyadic, i.e. every symbol's probability is a negative power of two.
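For instance (a quick Python sketch, with a made-up dyadic distribution): for probabilities 1/2, 1/4, 1/8, 1/8 the Huffman construction assigns each symbol a code of exactly -log2(p) bits, so the expected code length matches the entropy and nothing is lost to rounding.

```python
import heapq
from math import log2

def huffman_code_lengths(weights):
    """Huffman code length per symbol, built with the usual two-smallest merge."""
    heap = [(w, i, [s]) for i, (s, w) in enumerate(weights.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in weights}
    tie = len(heap)
    while len(heap) > 1:
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:      # every symbol under the merged node gets one bit deeper
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, tie, syms1 + syms2))
        tie += 1
    return lengths

# A dyadic distribution: every probability is a negative power of two.
probs = {'a': 1/2, 'b': 1/4, 'c': 1/8, 'd': 1/8}
for s, L in huffman_code_lengths(probs).items():
    print(s, L, -log2(probs[s]))     # the two numbers agree for every symbol
```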
Given an arbitrary bit string S, how can we find the best alphabet for encoding it? Suppose S is an ASCII file. Given the regularity of the 1-byte characters such files exhibit, we would expect that an optimal, or at least a pretty good, alphabet would consist of, say, 8-bit or 16-bit words (for which we then build codes after constructing the Huffman tree).
Is there an algorithm for finding the optimal word width (assuming we use fixed-length words)?
I would guess that to evaluate an alphabet fairly, we should also count the cost of storing the encoding itself. This addresses the degenerate case where the alphabet is a single symbol: the entire original string. Technically the message would then be just one bit, but the stored encoding tree would have to record that this one bit is the code for the original string, so we've trivially increased our message by two bits!
(Constant-length header information, such as the field recording the word width or the size of the encoding table, need not be considered in the comparison, of course.)
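To make the comparison concrete, here is one brute-force way to evaluate candidate widths, as a sketch under my own assumptions (the candidate widths, the padding rule, and the table-cost estimate are all placeholders, not a known algorithm): chop S into fixed-width words for each candidate width, build the Huffman code, and charge the encoded length plus a rough cost for storing the code table; the best width is the one with the smallest total.

```python
import heapq
from collections import Counter

def huffman_code_lengths(weights):
    """Huffman code length per symbol (a one-symbol alphabet gets a 1-bit code)."""
    if len(weights) == 1:
        return {s: 1 for s in weights}
    heap = [(w, i, [s]) for i, (s, w) in enumerate(weights.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in weights}
    tie = len(heap)
    while len(heap) > 1:
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, tie, syms1 + syms2))
        tie += 1
    return lengths

def total_cost(bits, width):
    """Encoded length plus a rough charge for storing the code table."""
    padded = bits + '0' * (-len(bits) % width)          # pad so the words divide evenly
    words = [padded[i:i + width] for i in range(0, len(padded), width)]
    freqs = Counter(words)
    lengths = huffman_code_lengths(freqs)
    encoded = sum(lengths[w] * n for w, n in freqs.items())
    # Table estimate: each entry stores the raw word plus its code.
    # (A canonical Huffman table could be stored more compactly; this is only an estimate.)
    table = sum(width + lengths[w] for w in freqs)
    return encoded + table

def best_width(bits, candidates=(1, 2, 4, 8, 16)):
    return min(candidates, key=lambda w: total_cost(bits, w))

# Example: an ASCII string viewed as a raw bit string.
S = ''.join(format(b, '08b') for b in b'abracadabra abracadabra abracadabra')
print({w: total_cost(S, w) for w in (1, 2, 4, 8, 16)})
print('best width:', best_width(S))
```

Each candidate width needs one frequency pass over S, so trying a handful of widths is linear in |S| times the number of candidates; whether something smarter than this exhaustive comparison exists is exactly what I'm asking.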