C#: Decoding Base64 string not returning full result / gibberish

Question

We have a base64-encoded string that is saved in a database. (The result still needs to be decompressed with bzip2, but that's another issue.) However, when we try to decode it in C#, we run into some issues.

    // get base 64 string from file
    string base64String = File.ReadAllText(@"D:\bzip2\base64text.txt", Encoding.UTF8);

    // decode from base64
    var largeCompressedTextAsBytes = Convert.FromBase64String(base64String);

    // convert to string
    var decodedString = Encoding.UTF8.GetString(largeCompressedTextAsBytes);

First of all, we think we have to remove the first part for it to convert at all:

=base64begin line=73 size=142698 crc=

Next, we get a result, but it is way too small (and all gibberish, though that might be because of the additional bzip2 compression):

��oW�k�_i�ۍ��ֶӽ^k��MZ�V�bzip2,7,16813,16573,16672,16636,15710,14413,7264,BZh61AY&SY�de�

We have tried removing the newlines from the text, to no avail:

    text.Replace(Environment.NewLine, "");

Does anybody have any ideas here?

Thank you

Schoof

Yes, it appears to be compressed with bzip2... When you say it's "way too small", what do you expect the compressed size to be? Note that when I converted that, the result started with "bzip2" rather than having anything before that...
– Jon Skeet, Commented Apr 17, 2019 at 13:03
Fundamentally, what created the data that was stored in the database? I very much doubt that the problem is with the base64 decoding.
– Jon Skeet, Commented Apr 17, 2019 at 13:05
If it's base64-encoded bzip2 then there is no point in attempting to view it as a string; that's only useful after decompression ...
– Alex K., Commented Apr 17, 2019 at 13:07
@AlexK. You are correct, but because the decode from bzip2 fails I thought the issue was with the base64 decode (because it returned such a small text). It turns out Visual Studio wasn't showing the full text when debugging... :)
– Schoof, Commented Apr 17, 2019 at 14:50

Jon Skeet · Accepted Answer · 2019-04-18 08:42:17Z

The first line of your data is effectively a header:

=base64begin line=73 size=142698 crc=1e0db1eda49fad0c242c2da2071ea521501a91ad

The rest is base64. After converting that base64 into binary, you end up with some text:

bzip2,7,16813,16573,16672,16636,15710,14413,7264,

... followed by a bzip2 file. I don't know what this "header" data is, but after removing that, the rest can be extracted using bunzip2. The result is an RTF file that contains some images.
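Since the format of that "header" is unknown, one way to skip it without parsing it is to look for the bzip2 signature itself. The following is a minimal sketch, not part of the original answer: it relies on the fact that a non-empty bzip2 stream starts with "BZh", a level digit from '1' to '9', and the block magic bytes 31 41 59 26 53 59 (which show up as "1AY&SY" in the question's output). The names `Bzip2Locator` and `FindBzip2Start` are hypothetical.

```csharp
using System;

static class Bzip2Locator
{
    // Block-header magic that follows "BZh" + level digit in a non-empty bzip2 stream.
    private static readonly byte[] BlockMagic = { 0x31, 0x41, 0x59, 0x26, 0x53, 0x59 };

    // Returns the index of the first plausible bzip2 stream start, or -1 if none is found.
    public static int FindBzip2Start(byte[] data)
    {
        for (int i = 0; i + 4 + BlockMagic.Length <= data.Length; i++)
        {
            // Stream signature: 'B', 'Z', 'h'
            if (data[i] != (byte)'B' || data[i + 1] != (byte)'Z' || data[i + 2] != (byte)'h')
                continue;

            // Compression level digit: '1'..'9'
            if (data[i + 3] < (byte)'1' || data[i + 3] > (byte)'9')
                continue;

            // First block header must follow immediately.
            bool match = true;
            for (int j = 0; j < BlockMagic.Length; j++)
            {
                if (data[i + 4 + j] != BlockMagic[j])
                {
                    match = false;
                    break;
                }
            }

            if (match)
                return i;
        }

        return -1;
    }
}
```

The returned offset can be passed to `new MemoryStream(bytes, offset, bytes.Length - offset)` before handing the data to a bzip2 decoder. This is a heuristic: it says nothing about what the numbers in the header mean, so it does not replace finding out how the data was produced.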

Your next steps should be to get more information about what's storing the data in the database, and exactly what its steps are. They appear to be:

1. Compress the file
2. Add the "header" prefix starting "bzip2"
3. Convert the result to base64
4. Add another "header" prefix with the CRC and length
5. Store the resulting text

You should try to find out precise details of all of these steps so that you can undo them, performing any checks (e.g. CRC checks) along the way.
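As a starting point for using the first line rather than discarding it, here is a rough sketch that splits the outer header into its `name=value` fields. It assumes the line always has the shape shown in the sample (`=base64begin` followed by space-separated pairs); that layout is inferred from the one example, and `OuterHeader` and `Parse` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

static class OuterHeader
{
    // Parses a line such as "=base64begin line=73 size=142698 crc=..." into key/value pairs.
    // Assumption: fields are separated by spaces and each has the form name=value.
    public static Dictionary<string, string> Parse(string headerLine)
    {
        const string marker = "=base64begin";
        if (headerLine == null)
            throw new ArgumentNullException(nameof(headerLine));

        string line = headerLine.Trim();
        if (!line.StartsWith(marker, StringComparison.Ordinal))
            throw new FormatException("Header does not start with '=base64begin'.");

        var fields = new Dictionary<string, string>(StringComparer.Ordinal);
        string rest = line.Substring(marker.Length);
        foreach (string token in rest.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int eq = token.IndexOf('=');
            if (eq <= 0)
                throw new FormatException("Unexpected header token: " + token);

            fields[token.Substring(0, eq)] = token.Substring(eq + 1);
        }

        return fields;
    }
}
```

With the sample header, `Parse(...)["size"]` gives "142698" and `Parse(...)["line"]` gives "73". What those values are measured against (for example, whether `size` counts decoded bytes or base64 characters) is not stated anywhere in the thread. The `crc` value is 40 hex digits, far longer than a 32-bit CRC, so the algorithm behind it also needs to be confirmed before it can be checked.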

Here's a complete program that extracts the file from the sample you've given. I've guessed at the "inner" header form, but you should really try to find out what's creating the header so you can validate my assumptions.

    using SharpCompress.Compressors.BZip2;
    using System;
    using System.IO;
    using System.Text;

    class Program
    {
        static void Main(string[] args)
        {
            string base64;
            using (var reader = File.OpenText(args[0]))
            {
                // Skip the first line, which has some header information
                // TODO: Use it instead, to validate the rest of the data.
                reader.ReadLine();
                base64 = reader.ReadToEnd();
            }

            byte[] bytes = Convert.FromBase64String(base64);
            int startOfBody = FindStartOfBody(bytes);
            using (var input = new MemoryStream(bytes, startOfBody, bytes.Length - startOfBody))
            {
                using (var bzip2 = new BZip2Stream(input, SharpCompress.Compressors.CompressionMode.Decompress, true))
                {
                    using (var output = File.OpenWrite(args[1]))
                    {
                        bzip2.CopyTo(output);
                    }
                }
            }
        }

        private static int FindStartOfBody(byte[] bytes)
        {
            // The file starts with a "header" of an unknown format, which we need to
            // skip. It looks like the format *might* be a sequence of comma-separated values
            // - Name of some kind (BZIP2)
            // - Number of further values
            // - The remaining values
            // That's what this code does.
            int offset = 0;

            // Skip the name
            GetNextHeaderValue(bytes, ref offset);

            // Find out how many more values there are
            string valueCountText = GetNextHeaderValue(bytes, ref offset);
            int valueCount = int.Parse(valueCountText);

            // Skip them
            for (int i = 0; i < valueCount; i++)
            {
                GetNextHeaderValue(bytes, ref offset);
            }

            // We'll now be positioned at the end
            return offset;
        }

        private static string GetNextHeaderValue(byte[] bytes, ref int offset)
        {
            StringBuilder builder = new StringBuilder();
            // TODO: Validation that we're not going past the end of the data...
            // We assume all header data is ASCII.
            for (; bytes[offset] != ','; offset++)
            {
                builder.Append((char) bytes[offset]);
            }
            // Move the offset past the comma
            offset++;
            return builder.ToString();
        }
    }

The result that you get is the correct one. It is indeed an RTF file with some images in it. How did you perform the conversion? I added the code I use to decode the base64 to a string to my initial post. How did you convert it further? We want to use .NET for this and only found SharpZipLib (github.com/icsharpcode/SharpZipLib), which crashes with the exception 'BZip2 input stream bad block header' when trying to decompress the decoded base64.
It seems like Visual Studio was not showing the full result; it does decode it correctly. When writing the result to a file we get the full result. The header data you are talking about, which part is that? Is it only the part you posted (bzip2,7,16813,16573,16672,16636,15710,14413,7264), or does it continue?
@Schoof: Yes, the header is just "bzip2,7,16813,16573,16672,16636,15710,14413,7264," - the part that bunzip2 wants starts with "BZ". I performed the conversion using bunzip2. I could look at trying SharpZipLib though - IIRC, I happened to write the bzip2 decoder for that :)
I tried SharpCompress, but there I get another exception when I try to convert the decoded base64 string: System.IndexOutOfRangeException: 'Index was outside the bounds of the array.' I am using: using (Stream stream = new MemoryStream(Encoding.UTF8.GetBytes(decodedbase64string))) using (var reader = ReaderFactory.Open(stream)). Any ideas there? Thanks a lot already! :)
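One thing worth checking in the last comment, though the thread does not confirm it as the cause: the snippet there turns the decoded bytes into a string with UTF-8 and then back into bytes. Compressed data is arbitrary binary, and bytes that are not valid UTF-8 are replaced during decoding, so the round trip generally does not give back the original bytes. A small sketch to demonstrate this (the names `BinaryRoundTrip` and `SurvivesUtf8RoundTrip` are hypothetical):

```csharp
using System;
using System.Linq;
using System.Text;

static class BinaryRoundTrip
{
    // Returns true only if decoding the bytes as UTF-8 text and re-encoding
    // that text yields exactly the same bytes.
    public static bool SurvivesUtf8RoundTrip(byte[] data)
    {
        string asText = Encoding.UTF8.GetString(data);   // invalid sequences become U+FFFD
        byte[] back = Encoding.UTF8.GetBytes(asText);    // U+FFFD is re-encoded as EF BF BD
        return data.SequenceEqual(back);
    }
}
```

Plain ASCII input such as the "bzip2,7,..." header survives, but a buffer containing bytes like 0xFF or a lone 0x80 does not. Passing the `byte[]` returned by `Convert.FromBase64String` straight into a `MemoryStream`, as the program in the answer does, avoids the problem.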
