Why padding is used in Base64 encoding? [duplicate]

Question

Possible Duplicate:
Why does base64 encoding requires padding if the input length is not divisible by 3?

...these padding characters must then be discarded when decoding but still allow the calculation of the effective length of the unencoded text, when its input binary length would not be a multiple of 3 bytes. ...

But the length of the raw data can easily be calculated even if you strip the padding characters.

         |                Encoded
Raw Size | Total Size | Real Size | Padding Size
---------|------------|-----------|-------------
       1 |          4 |         2 |            2
       2 |          4 |         3 |            1
       3 |          4 |         4 |            0
       4 |          8 |         6 |            2
       5 |          8 |         7 |            1
       6 |          8 |         8 |            0
       7 |         12 |        10 |            2
       8 |         12 |        11 |            1
       9 |         12 |        12 |            0
      10 |         16 |        14 |            2
     . . .

So given the real encoded size (third column), you can always work out what the padded size would be:

PaddedSize = 4 * Ceil (RealSize / 4)
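A minimal sketch that checks this formula against Python's standard `base64` module for the raw sizes in the table above. The helper name `sizes` is made up for illustration, and the input is just zero bytes.

```python
import base64
import math

def sizes(raw_size: int) -> tuple[int, int, int]:
    """Return (total, real, padding) character counts for raw_size input bytes."""
    encoded = base64.b64encode(bytes(raw_size)).decode("ascii")
    real = len(encoded.rstrip("="))  # length without the trailing '=' characters
    return len(encoded), real, len(encoded) - real

for raw in range(1, 11):
    total, real, pad = sizes(raw)
    # The padded size is recoverable from the unpadded size alone.
    assert total == 4 * math.ceil(real / 4)
    print(raw, total, real, pad)
```

Each printed row should match the corresponding row of the table.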

So in theory, there was no need for padding; the algorithm could have handled it. Base64 is a popular industry standard used in many applications and devices, and these would have benefited from a smaller encoded size. So the question is: why is padding used in Base64 encoding?

@Ignacio: That question is not very good at explaining why, though.
– BastiBen, Commented Dec 1, 2010 at 8:44
I thought some duplication was allowed (blog.stackoverflow.com/2010/11/…) as long as enough information was put in the question and it was asked from a different perspective.
– Hemant, Commented Dec 1, 2010 at 9:13

Angus · Accepted Answer · 2010-12-01 08:46:21Z

7

It makes the encoded message an integer multiple of 4 characters. This might make writing a decoder slightly easier. You can load and process characters in blocks of 4 and convert each block to 3 output bytes, and the padding makes it easy to do this without running off the end of the string.
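As a rough illustration of the block-of-4 idea, here is a sketch of a decoder that relies on the padding being present. It assumes the standard Base64 alphabet, and the names `ALPHABET`, `INDEX` and `decode_padded` are invented for this example. It does not check that `=` appears only at the very end, so treat it as a sketch rather than a validating decoder.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}

def decode_padded(text: str) -> bytes:
    """Decode padded Base64 by reading fixed blocks of 4 characters."""
    if len(text) % 4 != 0:
        raise ValueError("padded Base64 length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(text), 4):
        block = text[i:i + 4]
        pad = block.count("=")
        # Treat '=' as zero bits so every block yields a 24-bit value.
        value = 0
        for ch in block:
            value = (value << 6) | (0 if ch == "=" else INDEX[ch])
        chunk = value.to_bytes(3, "big")
        out += chunk[:3 - pad]  # drop the bytes that only the padding produced
    return bytes(out)
```

Because every block has exactly 4 characters, the loop never needs a special case for a short final block.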

answered Dec 1, 2010 at 8:46


4 Comments

Hemant Over a year ago

As mentioned in the question, you can calculate the number of padding characters just from the size of the real encoded data. You can therefore append them before processing if you want. There is no need to actually transmit them over the wire!

Angus Over a year ago

The cost of transmitting them over the wire is very small (at most 2 bytes per message). I guess the designers thought that making it simpler (by making the encoded message a sequence of 4-byte blocks, rather than having a variable-length block at the end) was more important than making it slightly more efficient. If you were concerned about bandwidth you wouldn't design a system to use base64 anyway.

Hemant Over a year ago

Hmmm... I do tend to agree with the simplicity part! It's just that I assumed there would be a technical need for padding...

Rowland Shaw Over a year ago

@Hemant if padding is not mandatory, you take away the possibility for basic error detection

Piskvor left the building · Accepted Answer · 2010-12-01 09:00:55Z

1

As you note, the end-padding is at most 2 bytes in length regardless of the length of the message, so it's not a really significant saving - more of a micro-optimization. If your application is both the producer and consumer of the encoding, you could strip out the padding, but it's not really worth the hassle.
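If you do control both ends, stripping and restoring the padding could look roughly like this. It is a sketch using Python's standard `base64` module; `encode_unpadded` and `decode_unpadded` are hypothetical helper names, not part of any library.

```python
import base64

def encode_unpadded(data: bytes) -> str:
    """Encode to Base64 and drop the trailing '=' characters."""
    return base64.b64encode(data).decode("ascii").rstrip("=")

def decode_unpadded(text: str) -> bytes:
    """Restore the padding implied by the length, then decode."""
    # -len(text) % 4 is the number of '=' needed to reach a multiple of 4.
    return base64.b64decode(text + "=" * (-len(text) % 4))
```

For example, `encode_unpadded(b"Ma")` gives `"TWE"`, and `decode_unpadded("TWE")` gives back `b"Ma"`.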

edited Dec 1, 2010 at 9:00

answered Dec 1, 2010 at 8:42


3 Comments

Angus Over a year ago

If that were its purpose, it would be able to reliably do that, and it can't.

Hemant Over a year ago

Yeah, in one third of cases, a valid Base64-encoded string doesn't end with padding.

Piskvor left the building Over a year ago

@Angus,Hemant: Good point, edited.

BastiBen · Accepted Answer · 2015-03-31 07:05:38Z

Base64 is old and dates from a time when available RAM and CPU were limited. Writing software was also more complex (today's SDKs and toolkits are much more user-friendly than those of the 80s or 90s), and Base64 had to run on many different system architectures.

That said, the developer could assume that the "real" data, after decoding the Base64 data, would be approximately n bytes long, which in turn allowed him/her to do better memory management.

Today it doesn't really matter anymore, but back in the day when resources were limited, this was a good thing.

Update: Never thought I'd get a downvote after 5 years, but now I can see the problem with my answer. I guess we all get older. ;) Dear visitors, enjoy this answer with a grain of salt.

Calculating the decoded data size (first column) is very easy using the real encoded data size (third column): firstColumn = thirdColumn * 3 / 4 (assuming firstColumn and thirdColumn are integer variables, so the division truncates). That is simple integer arithmetic that can be done on any platform!
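A quick sanity check of that integer formula, sketched with Python's standard `base64` module. `decoded_size` is a made-up name for the expression in the comment above.

```python
import base64

def decoded_size(real_size: int) -> int:
    """Decoded byte count from the unpadded encoded length (integer division)."""
    return real_size * 3 // 4

for raw in range(1, 11):
    # Encode `raw` zero bytes and measure the length without '=' padding.
    real = len(base64.b64encode(bytes(raw)).rstrip(b"="))
    assert decoded_size(real) == raw
```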
