I've been trying to reverse engineer the format of data files from the 1997 video game Helicops. I have experience reverse engineering binary files from games of this period but limited exposure to compression schemes.
I've worked out the header format (which I've documented here; it's obfuscated with an XOR "cipher" using the key `0xAA`), but the individual data blocks within each file use a compression scheme I haven't been able to identify.
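Undoing the header obfuscation only takes a byte-wise XOR. Here is a minimal Python sketch, assuming every header byte is XORed with the same single key `0xAA` (the function name `xor_decode` is just illustrative):

```python
def xor_decode(data: bytes, key: int = 0xAA) -> bytes:
    """Undo a single-byte XOR "cipher" by XORing every byte with `key`.

    XOR is its own inverse, so the same call also re-encodes the data.
    """
    return bytes(b ^ key for b in data)


# XORing twice with the same key returns the original bytes.
sample = b"PIC.DAT"
assert xor_decode(xor_decode(sample)) == sample
print(xor_decode(b"\x00\xff").hex())  # 0x00 ^ 0xAA = 0xAA, 0xFF ^ 0xAA = 0x55
```

To apply it to a DAT file, read the header bytes (e.g. with `open("PIC.DAT", "rb")`) and pass them in. The header length comes from the format notes and isn't assumed here.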
I'm certain the data are compressed. The header in each DAT file contains a table listing each data block's name, its starting offset, and two length values: the block's length in the DAT file and what appears to be its uncompressed length. Notably, the latter is 768 for several blocks that appear to be colour tables: 768 = 256 colours × 3 bytes/colour, these blocks have names ending in "ACT" (i.e., Adobe Color Table), and some contain obvious grayscale ramps.
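A decoded ACT block can be sanity-checked by splitting it into 256 RGB triples and counting pure grays. Below is a Python sketch, assuming the 768 bytes are stored as plain R, G, B triples with no header or padding, as the 256 × 3 arithmetic suggests. The helper names `parse_act` and `count_grays` are hypothetical.

```python
def parse_act(block: bytes) -> list[tuple[int, ...]]:
    """Split a 768-byte colour table into 256 (R, G, B) tuples.

    Assumes 3 bytes per colour (R, G, B) and no header or padding,
    as implied by 768 = 256 * 3.
    """
    if len(block) != 768:
        raise ValueError(f"expected 768 bytes, got {len(block)}")
    return [tuple(block[i:i + 3]) for i in range(0, len(block), 3)]


def count_grays(palette: list[tuple[int, ...]]) -> int:
    """Count entries whose red, green and blue components are equal."""
    return sum(1 for r, g, b in palette if r == g == b)


# A synthetic 256-step grayscale ramp: (0,0,0), (1,1,1), ..., (255,255,255).
ramp = bytes(v for v in range(256) for _ in range(3))
palette = parse_act(ramp)
print(len(palette), count_grays(palette))
```

A block that decompresses to exactly 768 bytes with a long run of gray entries would be a good sign that the decompression is correct.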
My full notes on the compression scheme are here, but I haven't been able to figure out much. Here's a summary:
- In multiple data files, `0xFF` is followed by 8 uncompressed bytes. This is consistent across files containing different types of recognizable data (ASCII text and colour palettes), so it seems likely that all the game's data files use the same compression scheme.
- Currently, I have no idea how `0xFF` translates to "the next 8 bytes are uncompressed data". Since the header uses an XOR cipher, I've tried XORing it with various values (`0xAA`, since that's the key for the header data; `0xF7`, since that gives `0x08`; etc.), but none of them produce anything that makes sense for `0xFF` and the other values that are clearly compression-related. My guess is that ranges of bits within each byte carry different compression-related information, but that the bytes need to be transformed in some way first.
- Judging by which bytes are evidently uncompressed data, many byte sequences must be related to the compression (see examples in the sample data below), but I haven't been able to decode them.
- I suspect the compression scheme uses some form of run-length encoding and/or something along the lines of LZ77, in which repeated sequences are replaced by references to earlier occurrences to reduce overall size.
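One way to test whether `0xFF` is a set of flags: if it means "8 literals", each bit may be a separate flag. The Python sketch below expands a byte into 8 flags, least significant bit first, under the *unconfirmed* assumption that a 1 bit means one literal byte (L) and a 0 bit means some other compression token (T). The other non-ASCII bytes in the sample data can be checked against it: `FE` is followed first by two non-ASCII bytes (`FD F4`) and then by 7 literals (" Tower "), while `7F` is followed by 7 literals ("HC_Work") and then by non-ASCII bytes (`FC F6`).

```python
def flag_pattern(flag: int) -> str:
    """Expand a candidate flag byte into 8 tokens, least significant bit first.

    Hypothesis only: a 1 bit means "copy one literal byte" (L) and a 0 bit
    means "a compression token follows" (T).
    """
    return "".join("L" if (flag >> bit) & 1 else "T" for bit in range(8))


for flag in (0xFF, 0xFE, 0xBF, 0x7F):
    print(f"{flag:02X}: {flag_pattern(flag)}")
```

The bit order is itself an assumption: reading most significant bit first, `FE` would predict 7 literals before the token, which doesn't match the `FE FD F4` row.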
I've uploaded a sample DAT file, PIC.DAT, here. It contains substantial quantities of ASCII text that make identifying compression-related bytes fairly easy. A list of its data blocks (parsed from its header) is here.
Here are some sample data from the data block starting at offset 184456 in PIC.DAT. PIC.DAT's header associates this block with the third mission set, "Data Space Demon", and the first mission in that set, "Tower Attack"; both strings appear in the data below:
| Bytes (hex) | Notes/ASCII (numbers are unsigned/signed decimal) |
|---|---|
| FF | |
| 52 45 4D 20 4D 69 73 73 | "REM Miss" |
| FF | |
| 33 31 2E 50 49 43 20 2D | "31.PIC -" |
| FE FD F4 | |
| 20 54 6F 77 65 72 20 | " Tower " |
| BF 28 | BF = 191/-65, 28 = 40/40 |
| 54 6F 6B 79 6F | "Tokyo" (not part of level name) |
| 05 03 29 FB 0D 0A EE F1 | |
| 44 61 74 61 20 | "Data " |
| FF | |
| 53 70 61 63 65 20 44 65 | "Space De" |
| EF | 239/-17 |
| 6D 6F 6E | "mon" |
| 2C 05 04 | |
| 41 74 74 | "Att" |
| F7 | 247/-9 |
| 61 63 6B | "ack" |
| 19 03 | |
| 43 6F 70 69 | "Copi" |
| FF | |
| 65 64 20 66 72 6F 6D 20 | "ed from " |
| 7F | |
| 48 43 5F 57 6F 72 6B | "HC_Work" |
| FC F6 FE 5E 0E | |
| 20 20 31 2D 32 33 2D | " 1-23-" |
| 7F | |
| 39 37 20 44 42 0D 0A | "97 DB\r\n" |
| 19 03 F1 2A 84 0F 8A 02 19 03 | |
| [remaining bytes follow] | |
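One hypothesis worth testing against the full block: each `FF`-like byte is a set of 8 flags read least significant bit first, where a 1 bit means one literal byte and a 0 bit means a 2-byte token (pairs such as `FD F4`, `EE F1` and `19 03` in the table sit where the 0 bits would fall). That layout resembles the LZSS variant in Haruhiko Okumura's widely copied 1989 LZSS.C, and the Python sketch below is a decoder for it. Its parameters (4096-byte ring buffer pre-filled with spaces, writing starting at offset 4078, each token holding a 12-bit buffer position plus a 4-bit length stored as length - 3) are Okumura's defaults and are assumptions here, not something established about Helicops. `SAMPLE` is simply the byte sequence from the table above.

```python
def lzss_decode(data: bytes, n: int = 4096, f: int = 18,
                threshold: int = 2, fill: int = 0x20) -> bytes:
    """Decode Okumura-style LZSS (all parameters are assumptions).

    n: ring-buffer size; f: maximum match length (writing starts at n - f);
    threshold: match lengths are stored as length - (threshold + 1);
    fill: byte the ring buffer is pre-filled with (a space).
    """
    ring = bytearray([fill]) * n
    r = n - f          # current write position in the ring buffer
    out = bytearray()
    i = 0              # read position in `data`
    flags = 0          # low bits: flag bits; bit 8 stays set while bits remain
    while i < len(data):
        flags >>= 1
        if not flags & 0x100:
            # All 8 flag bits consumed: read a new flag byte. The 0xFF00
            # marker shifts down one bit per token and runs out after 8.
            flags = data[i] | 0xFF00
            i += 1
            if i >= len(data):
                break
        if flags & 1:
            # 1 bit: copy one literal byte.
            c = data[i]
            i += 1
            out.append(c)
            ring[r] = c
            r = (r + 1) % n
        else:
            # 0 bit: two-byte back-reference into the ring buffer.
            if i + 1 >= len(data):
                break
            lo, hi = data[i], data[i + 1]
            i += 2
            pos = lo | ((hi & 0xF0) << 4)          # 12-bit position
            length = (hi & 0x0F) + threshold + 1   # 3..18 bytes
            for k in range(length):
                # Byte-by-byte copy so overlapping matches repeat correctly.
                c = ring[(pos + k) % n]
                out.append(c)
                ring[r] = c
                r = (r + 1) % n
    return bytes(out)


# Bytes from the table above (offset 184456 in PIC.DAT), in order.
SAMPLE = bytes.fromhex(
    "FF 52 45 4D 20 4D 69 73 73 FF 33 31 2E 50 49 43 20 2D"
    " FE FD F4 20 54 6F 77 65 72 20 BF 28 54 6F 6B 79 6F"
    " 05 03 29 FB 0D 0A EE F1 44 61 74 61 20"
    " FF 53 70 61 63 65 20 44 65 EF 6D 6F 6E 2C 05 04 41 74 74"
    " F7 61 63 6B 19 03 43 6F 70 69 FF 65 64 20 66 72 6F 6D 20"
    " 7F 48 43 5F 57 6F 72 6B FC F6 FE 5E 0E 20 20 31 2D 32 33 2D"
    " 7F 39 37 20 44 42 0D 0A 19 03 F1 2A 84 0F 8A 02 19 03"
)
print(lzss_decode(SAMPLE).decode("latin-1"))
```

Readable output would support the hypothesis; garbage would suggest these particular parameters (or the bit order) are wrong. If it holds up, the same function could be run on the palette blocks and the decoded length compared with the uncompressed length listed in the header.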
There are additional sample data here, including colour palettes (some of which appear to be minimally compressed).
I'd appreciate any suggestions regarding how this compression format might work! I'll credit any assistance in the file format documentation I'm writing.