
So basically I'm writing an application that looks for PNG files inside a binary file. It does this by reading the entire binary file into a byte array, converting that to a string with the Convert.ToBase64String method, and then using a regex that matches a PNG's header information and end chunk to find the images. The problem is that ToBase64String generates wildly different outputs depending on the length of the byte array, and the documentation on MSDN doesn't seem to elaborate on it. Anyway, here's an example of what I mean.

```csharp
byte[] somebytes = new byte[] { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08 };
Console.WriteLine(Convert.ToBase64String(somebytes));
```

The output in this case is "AQIDBAUGBwg=". Now if I skip a byte...

```csharp
using System.Linq; // Skip() and ToArray() are LINQ extension methods

byte[] somebytes = new byte[] { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08 };
somebytes = somebytes.Skip(1).ToArray();
Console.WriteLine(Convert.ToBase64String(somebytes));
```

The output is now "AgMEBQYHCA==" so almost every character has changed from the previous example.
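For context on why this happens: Base64 turns every group of 3 input bytes into 4 output characters, so the characters a given byte run produces depend on its offset modulo 3 (and, at the edges, on the neighbouring bytes). Padding can't fix that, because a PNG signature sitting at an arbitrary offset in a file can be encoded in three different alignments. The sketch below (the `Base64Alignment` class and its methods are hypothetical names, for illustration only) checks that dropping a multiple of 3 bytes leaves the tail of the encoding intact, while dropping 1 or 2 bytes does not.

```csharp
using System;
using System.Linq;

public static class Base64Alignment
{
    // Base64-encodes the input after dropping `skip` leading bytes.
    public static string EncodeAfterSkip(byte[] data, int skip) =>
        Convert.ToBase64String(data.Skip(skip).ToArray());

    // Base64 encodes each 3-byte group independently as 4 characters, so the
    // encoding of a suffix is a suffix of the full encoding only when the
    // number of skipped bytes is a multiple of 3.
    public static bool SuffixIsAligned(byte[] data, int skip)
    {
        string full = Convert.ToBase64String(data);
        string shifted = EncodeAfterSkip(data, skip);
        return full.EndsWith(shifted, StringComparison.Ordinal);
    }
}
```

With the 8 bytes from the question, skipping 3 bytes gives "BAUGBwg=", which is the tail of "AQIDBAUGBwg=", so `SuffixIsAligned(data, 3)` is true; skipping 1 byte gives "AgMEBQYHCA==" and the check is false.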

So am I hopelessly following the wrong path here for regexing a binary file, or is there a method (maybe padding?) that can guarantee more consistency across these conversions?

Update: Based on the feedback I've gathered, it seems I should move away from the regex solution and search for the start and end byte sequences manually. Not sure why I'm being downvoted, as I just wanted to understand why my other solution didn't work, and there don't seem to be any other posts on this topic. Anyway, thanks everyone for the quick feedback. I'll post the algorithm I used for finding images when I'm done in case it might benefit someone else.

  • Yes, you are. There's no point in doing base64 and then trying to find the header. Why not just search the binary? Or, if you must, use hexadecimal for consistency. Commented Oct 27, 2016 at 19:04
  • "It does this by reading in an entire binary in file into a byte array and then converting it to a string using the Convert.ToBase64String method and then using a regex that matches a PNG's header information and end chunk to find the images." What the...? You have a byte array, so search the byte array. Commented Oct 27, 2016 at 19:06
  • As for why it changes, it changes because you are shifting the array. Base64 takes 6 bits at a time and translates them into a character. If you shift by a byte (8 bits), then you are going to get totally different characters. Commented Oct 27, 2016 at 19:11
  • @Thermonuclear This is a terrible idea, especially if you want to add support for other file formats. Just search for the appropriate format signature in the byte array. Commented Oct 27, 2016 at 19:33
  • The deleted answer was correct. You need to examine the bytes of the image, not convert it to something else and try to parse it some other way. Commented Oct 27, 2016 at 19:42
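The hexadecimal suggestion in the first comment works where Base64 doesn't because each byte maps to exactly two hex characters, so a byte pattern always encodes to the same substring wherever it sits. A minimal sketch of that idea (the `HexSearch` helper is a hypothetical name; it relies on `BitConverter.ToString`) still has to reject string hits at odd character positions, since those straddle two bytes:

```csharp
using System;

public static class HexSearch
{
    // Each byte becomes exactly two uppercase hex characters ("0089504E47"),
    // so a byte pattern encodes the same way at any offset.
    public static string ToHex(byte[] data) =>
        BitConverter.ToString(data).Replace("-", string.Empty);

    // Returns the byte offset of the first occurrence of `pattern`, or -1.
    // A hit at an odd character index spans two bytes, so it is skipped.
    public static int IndexOfBytes(byte[] data, byte[] pattern)
    {
        string hay = ToHex(data);
        string needle = ToHex(pattern);
        int i = hay.IndexOf(needle, StringComparison.Ordinal);
        while (i >= 0 && i % 2 != 0)
        {
            i = hay.IndexOf(needle, i + 1, StringComparison.Ordinal);
        }
        return i < 0 ? -1 : i / 2;
    }
}
```

This doubles the memory used by the data and adds the odd-offset bookkeeping, so searching the byte array directly, as the other comments recommend, remains the simpler route.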

2 Answers


You confirmed in the comments that you are trying to pull resources from a .NET assembly built from C# (EXE or DLL). You can use reflection to pull them out: the Assembly methods GetManifestResourceStream, GetManifestResourceNames and GetManifestResourceInfo are a good starting point.
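A minimal sketch of that starting point, assuming the target is a .NET assembly you can load (for example with `Assembly.LoadFrom`); `ManifestResources` and `ReadAll` are hypothetical names. It only reads resources. Also note that images in WinForms/WPF projects are often stored inside a compiled `.resources` container rather than as standalone manifest resources, in which case they would still need to be unpacked (e.g. with `System.Resources.ResourceReader`).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;

public static class ManifestResources
{
    // Copies every embedded (manifest) resource of the assembly into memory,
    // keyed by its manifest resource name.
    public static Dictionary<string, byte[]> ReadAll(Assembly assembly)
    {
        var result = new Dictionary<string, byte[]>();
        foreach (string name in assembly.GetManifestResourceNames())
        {
            using (Stream stream = assembly.GetManifestResourceStream(name))
            {
                if (stream == null) continue; // resource could not be opened
                using (var buffer = new MemoryStream())
                {
                    stream.CopyTo(buffer);
                    result[name] = buffer.ToArray();
                }
            }
        }
        return result;
    }
}
```

Usage would look like `ManifestResources.ReadAll(Assembly.LoadFrom(path))`, where `path` is the EXE or DLL to inspect; the resource bytes can then be scanned for PNG signatures like any other buffer.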


4 Comments

Thanks for that, I will certainly try messing around with it. I may still continue reading the bytes manually, however, as ideally the image replacer would work on more than just .NET binaries, but this information is still very useful indeed!
@Thermonuclear, native C++ binaries are also structured and have APIs for working with resources. You should not try to do this yourself, as it's more complicated than you think. What if the current PNG is 1000 bytes and you want to replace it with one that's 1010 bytes? You'll overwrite code or other resources. The APIs will take care of that for you.
My application was originally only going to allow replacing an image with one equal to or smaller than the original, thus avoiding that problem; at least, that worked when I did it manually to a WPF application. However, if these APIs let me swap in a larger image, that's a very compelling reason to reconsider the scope of that feature.
Having played with the 'GetManifest' reflection methods, I don't see how it's possible to change the resources. I'm only seeing 'Get' methods, and when I use GetManifestResourceStream to grab a stream containing an image, the stream's 'CanWrite' property is 'false'. Please correct me if I'm wrong, but I don't see how I can set or modify the manifest resource streams using reflection.

As promised, here is the logic I've written to find the images in the binary, in case it helps someone else. I may ultimately use SledgeHammer's method, but it was important to me to be able to handle it this way as well.

```csharp
using System;
using System.Collections.Generic;

public class BinarySearch
{
    // Returns every non-overlapping slice of `source` that starts with
    // `beginningSequence` and ends with the first `endSequence` after it.
    public static IEnumerable<byte[]> Match(byte[] source, byte[] beginningSequence, byte[] endSequence)
    {
        int index = 0;
        IList<byte[]> matches = new List<byte[]>();
        while (index < source.Length)
        {
            var startIndex = FindSequence(source, beginningSequence, index);
            if (startIndex >= 0)
            {
                // Only look for the end marker after the start marker.
                var endIndex = FindSequence(source, endSequence, startIndex + beginningSequence.Length);
                if (endIndex >= 0)
                {
                    var length = (endIndex - startIndex) + endSequence.Length;
                    var buffer = new byte[length];
                    Array.Copy(source, startIndex, buffer, 0, length);
                    matches.Add(buffer);
                    index = endIndex + endSequence.Length; // resume after this match
                }
                else
                {
                    index = source.Length; // no end marker left
                }
            }
            else
            {
                index = source.Length; // no start marker left
            }
        }
        return matches;
    }

    // Naive search: index of the first occurrence of `sequence` at or after
    // `startIndex`, or -1 if there is none.
    private static int FindSequence(byte[] bytes, byte[] sequence, int startIndex = 0)
    {
        int currentIndex = startIndex;
        int sequenceIndex = 0;
        bool found = false;
        while (!found && currentIndex < bytes.Length)
        {
            if (bytes[currentIndex] == sequence[sequenceIndex])
            {
                if (sequenceIndex == (sequence.Length - 1)) { found = true; }
                else { sequenceIndex++; }
            }
            else
            {
                // Mismatch: restart one byte after where this partial match began.
                currentIndex -= sequenceIndex;
                sequenceIndex = 0;
            }
            currentIndex++;
        }
        return found ? (currentIndex - sequence.Length) : -1;
    }
}
```
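On .NET Core 2.1 and later, the hand-rolled FindSequence can be replaced with the span-based `MemoryExtensions.IndexOf`, which searches for a whole subsequence. This is a sketch with the same matching rules as `Match` above (non-overlapping matches, each ending at the first end sequence after its start); `SpanBinarySearch` is a hypothetical name, and unlike the original it rejects empty sequences up front instead of throwing `IndexOutOfRangeException`.

```csharp
using System;
using System.Collections.Generic;

public static class SpanBinarySearch
{
    public static List<byte[]> Match(byte[] source, byte[] begin, byte[] end)
    {
        if (begin.Length == 0 || end.Length == 0)
            throw new ArgumentException("Sequences must not be empty.");

        ReadOnlySpan<byte> beginSpan = begin;
        ReadOnlySpan<byte> endSpan = end;
        var matches = new List<byte[]>();
        int index = 0;
        while (index < source.Length)
        {
            // Find the next start marker in the unsearched remainder.
            ReadOnlySpan<byte> rest = source.AsSpan(index);
            int startRel = rest.IndexOf(beginSpan);
            if (startRel < 0) break;
            int start = index + startRel;

            // Find the first end marker after the start marker.
            int searchFrom = start + begin.Length;
            ReadOnlySpan<byte> tail = source.AsSpan(searchFrom);
            int endRel = tail.IndexOf(endSpan);
            if (endRel < 0) break;

            int stop = searchFrom + endRel + end.Length; // exclusive end of match
            matches.Add(source.AsSpan(start, stop - start).ToArray());
            index = stop;
        }
        return matches;
    }
}
```

The library search avoids the manual backtracking in FindSequence, and on current runtimes it is typically vectorized, which helps when scanning large binaries.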

Here's an example of its usage for PNG files, where binaryData is the byte array holding the file's contents.

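```csharp
// 8-byte PNG signature plus the first byte of the IHDR chunk's length field
var imageHeaderStart = new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A, 0x00 };
// Last two bytes of IEND's (zero) length field, "IEND", then IEND's fixed CRC
var imageEOF = new byte[] { 0x00, 0x00, 0x49, 0x45, 0x4E, 0x44, 0xAE, 0x42, 0x60, 0x82 };
var matches = BinarySearch.Match(binaryData, imageHeaderStart, imageEOF);
```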
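Searching for the IEND bytes works, but because every PNG chunk starts with a 4-byte big-endian length and a 4-byte type and ends with a 4-byte CRC, the end of an image can also be found by walking its chunks from the signature until IEND. This is a sketch of that alternative, not a drop-in replacement; `PngScanner`, `MeasurePng` and `ExtractAll` are hypothetical names, and it does not verify CRCs, it only checks that every chunk fits inside the buffer.

```csharp
using System;
using System.Collections.Generic;

public static class PngScanner
{
    private static readonly byte[] Signature = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

    // If a PNG signature starts at `start`, walks the chunk list and returns the
    // image's total length through the end of the IEND chunk; otherwise -1.
    // CRCs are NOT checked.
    public static int MeasurePng(byte[] data, int start)
    {
        for (int i = 0; i < Signature.Length; i++)
        {
            if (start + i >= data.Length || data[start + i] != Signature[i]) return -1;
        }

        int pos = start + Signature.Length;
        while (data.Length - pos >= 12) // length (4) + type (4) + CRC (4)
        {
            // Big-endian length of the chunk's data bytes only.
            long length = ((long)data[pos] << 24) | ((long)data[pos + 1] << 16)
                        | ((long)data[pos + 2] << 8) | data[pos + 3];
            bool isIend = data[pos + 4] == 'I' && data[pos + 5] == 'E'
                       && data[pos + 6] == 'N' && data[pos + 7] == 'D';
            long next = pos + 12L + length;
            if (next > data.Length) return -1; // truncated, or not really a PNG
            if (isIend) return (int)(next - start);
            pos = (int)next;
        }
        return -1;
    }

    // Scans the whole buffer and copies out every complete PNG it finds.
    public static List<byte[]> ExtractAll(byte[] data)
    {
        var images = new List<byte[]>();
        int i = 0;
        while (i <= data.Length - Signature.Length)
        {
            int length = MeasurePng(data, i);
            if (length > 0)
            {
                var image = new byte[length];
                Array.Copy(data, i, image, 0, length);
                images.Add(image);
                i += length; // continue after this image
            }
            else
            {
                i++;
            }
        }
        return images;
    }
}
```

Walking chunks avoids depending on an end marker and rejects stray signature bytes whose chunk lengths run past the end of the data; checking each chunk's CRC would make it stricter still.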

I'll add a link to the GitHub project upon its completion in case anyone is interested in my 'complete' implementation.
