I started working on this. I'm posting my results so far here as a "community wiki" answer for two reasons: first, if someone else wants to join in, there's a place to talk; second, if I get pulled away from this project, there'll be hints for someone else to start working.
The backup logic on the host is entirely contained within https://github.com/android/platform_system_core/blob/master/adb/commandline.cpp, in the function named backup. The function is very simple: it validates the command line options, sends the command mostly as-is to the adb daemon on the phone, and writes the phone's output to the file. There isn't even error-checking: if, for example, you refuse the backup on the phone, adb just writes out an empty file.
On the phone, the backup logic starts in service_to_fd() in https://github.com/android/platform_system_core/blob/master/adb/services.cpp. The function identifies that the command from the host is "backup", and passes the unparsed command to /system/bin/bu, which is a trivial shell script to launch com.android.commands.bu.Backup as the main-class of a new Android app process. That calls ServiceManager.getService("backup") to get the backup service as an IBackupManager, and calls IBackupManager.fullBackup(), passing it the still-unused file descriptor (very indirectly) connected to the backup.ab file on the host.
Control passes to fullBackup() in com.android.server.backup.BackupManagerService, which pops up the GUI asking the user to confirm or reject the backup. When the user does so, acknowledgeFullBackupOrRestore() (same file) is called. If the user approved the request, acknowledgeFullBackupOrRestore() figures out whether the backup is encrypted, and passes a message to BackupHandler (same file). BackupHandler then instantiates and kicks off a PerformAdbBackupTask (same file, line 4004 as of time of writing).
We finally start generating output there, in PerformAdbBackupTask.run(), between line 4151 and line 4330.
First, run() writes a header, which consists of either 4 or 9 ASCII lines:
- "ANDROID BACKUP"
- the backup format version: currently "4"
- either "0" if the backup is uncompressed or "1" if it is
- the encryption method: currently either "none" or "AES-256"
- (if encrypted), the "user password salt" encoded in hex, all caps
- (if encrypted), the "master key checksum salt" encoded in hex, all caps
- (if encrypted), the "number of PBKDF2 rounds used" as a decimal number: currently "10000"
- (if encrypted), the "IV of the user key" encoded in hex, all caps
- (if encrypted), the "master IV + key blob, encrypted by the user key" encoded in hex, all caps
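To make the header layout concrete, here is a minimal Python sketch that reads those 4 or 9 lines from a binary stream. The function name and the dictionary keys are made up for illustration, and it assumes each header line ends with a single "\n"; it is not taken from the Android sources.

```python
import io


def read_backup_header(stream):
    """Read the 4- or 9-line ASCII header from a binary stream.

    Hypothetical helper: the name and the dict keys are illustrative only.
    Assumes every header line is terminated by a single newline byte.
    """
    def next_line():
        line = stream.readline()
        if not line.endswith(b"\n"):
            # Also covers the empty file adb writes when the backup is refused.
            raise ValueError("truncated header")
        return line[:-1].decode("ascii")

    if next_line() != "ANDROID BACKUP":
        raise ValueError("not an Android backup file")

    header = {
        "version": int(next_line()),
        "compressed": next_line() == "1",
        "encryption": next_line(),
    }
    if header["encryption"] != "none":
        # The five extra lines that are only present for encrypted backups.
        header["user_salt"] = bytes.fromhex(next_line())
        header["checksum_salt"] = bytes.fromhex(next_line())
        header["pbkdf2_rounds"] = int(next_line())
        header["user_iv"] = bytes.fromhex(next_line())
        header["master_key_blob"] = bytes.fromhex(next_line())
    return header
```

After the call, the stream is positioned at the first byte of the backup data, so the same file object can be handed to whatever unpacks the payload.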
The actual backup data follows, either as (depending on compression and encryption) tar, deflate(tar), encrypt(tar), or encrypt(deflate(tar)).
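For the unencrypted cases (tar and deflate(tar)), getting back to the tar bytes can be sketched as follows. This assumes the deflate stream is zlib-wrapped, which is what Python's zlib.decompress expects by default, and that the whole file fits in memory; payload_to_tar is a hypothetical name.

```python
import zlib


def payload_to_tar(ab_bytes):
    """Return the raw tar bytes of an unencrypted backup held in memory.

    Hypothetical helper. Assumes a 4-line header with newline-terminated
    lines and a zlib-wrapped deflate stream when compression is on.
    """
    # Split off the four header lines; everything after them is payload.
    parts = ab_bytes.split(b"\n", 4)
    if len(parts) != 5 or parts[0] != b"ANDROID BACKUP":
        raise ValueError("not an Android backup file")
    _magic, _version, compressed, encryption, payload = parts

    if encryption != b"none":
        # encrypt(tar) and encrypt(deflate(tar)) need AES decryption first.
        raise NotImplementedError("encrypted backups are not handled here")
    if compressed == b"1":
        return zlib.decompress(payload)
    return payload
```

The encrypted variants are deliberately left out: the Python standard library has no AES implementation, so they would need a third-party crypto package.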
TODO: write up the code path that generates the tar output -- you can simply use tar as long as entries are in the proper order (see below).
Tar archive format
App data is stored under the apps/ directory, with one subdirectory per package. Each package starts with a _manifest file, followed by the APK (if requested) in a/, app files in f/, databases in db/, and shared preferences in sp/. If you requested external storage backup (using the -shared option), there will also be a shared/ directory in the archive containing external storage files.
    $ tar tvf mybackup.tar
    -rw------- 1000/1000      1019 2012-06-04 16:44 apps/org.myapp/_manifest
    -rw-r--r-- 1000/1000   1412208 2012-06-02 23:53 apps/org.myapp/a/org.myapp-1.apk
    -rw-rw---- 10091/10091     231 2012-06-02 23:41 apps/org.myapp/f/share_history.xml
    -rw-rw---- 10091/10091       0 2012-06-02 23:41 apps/org.myapp/db/myapp.db-journal
    -rw-rw---- 10091/10091    5120 2012-06-02 23:41 apps/org.myapp/db/myapp.db
    -rw-rw---- 10091/10091    1110 2012-06-03 01:29 apps/org.myapp/sp/org.myapp_preferences.xml
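As a rough illustration of that layout, the sketch below maps tar entry names to the package and the kind of data they hold. The function names and the labels on the right-hand side of DOMAINS are invented for this example; only the a/, f/, db/, sp/ tokens and _manifest come from the description above.

```python
import io
import tarfile

# Directory token in the archive -> illustrative label (labels are made up).
DOMAINS = {"a": "apk", "f": "files", "db": "databases", "sp": "shared_prefs"}


def classify(member_name):
    """Return (package, kind, relative_path) for an apps/ entry, else None."""
    parts = member_name.split("/", 3)
    if len(parts) < 3 or parts[0] != "apps":
        return None  # e.g. shared/ entries from external storage
    package, token = parts[1], parts[2]
    if token == "_manifest" and len(parts) == 3:
        return (package, "manifest", "")
    if token in DOMAINS and len(parts) == 4:
        return (package, DOMAINS[token], parts[3])
    return None


def list_app_entries(tar_bytes):
    """Classify every apps/ entry of an in-memory tar archive, in order."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as archive:
        names = archive.getnames()
    return [entry for entry in map(classify, names) if entry is not None]
```

For instance, classify("apps/org.myapp/db/myapp.db") returns ("org.myapp", "databases", "myapp.db"), while anything outside apps/ yields None.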
Encryption details
- An AES 256 key is derived from the backup encryption password using 10000 rounds of PBKDF2 with a randomly generated 512 bit salt.
- An AES 256 master key is randomly generated.
- A master key 'checksum' is generated by running the master key through 10000 rounds of PBKDF2 with a new randomly generated 512 bit salt.
- A random backup encryption IV is generated.
- The IV, master key, and checksum are concatenated and encrypted with the password-derived key from the first step. The resulting blob is saved in the header as a hex string.
- The actual backup data is encrypted with the master key and appended to end of the file.
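The first step of that scheme, deriving the AES 256 key from the password, might look like the sketch below. The text above does not say which hash PBKDF2 is paired with; HMAC-SHA1 is assumed here, and derive_user_key is a hypothetical name. Decrypting the key blob itself is not shown, since it needs an AES implementation that the Python standard library does not provide.

```python
import hashlib
import os


def derive_user_key(password, salt, rounds=10000):
    """Derive a 256-bit key from the backup password with PBKDF2.

    Assumption: HMAC-SHA1 is the PBKDF2 PRF; the description above
    does not name the hash, so verify this against the Android source.
    """
    return hashlib.pbkdf2_hmac(
        "sha1", password.encode("utf-8"), salt, rounds, dklen=32
    )


# Example: a fresh 512-bit salt, as used when creating a backup.
# When reading a backup, the salt comes from the header instead.
example_salt = os.urandom(64)
example_key = derive_user_key("my backup password", example_salt)
```

When unpacking, salt and rounds would be taken from the "user password salt" and "number of PBKDF2 rounds used" header lines rather than generated.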
Sample pack/unpack implementation (produces and consumes tar archives): https://github.com/nelenkov/android-backup-extractor
Some more details here: http://nelenkov.blogspot.com/2012/06/unpacking-android-backups.html
Perl scripts for packing/unpacking and fixing broken archives:
http://forum.xda-developers.com/showthread.php?p=27840175#post27840175