42

I am wondering if I can use node.js and levelup to access a copy of the blockchain database directly.

But from what I can see, I need to know the name of the key(s) before I can get any data, as this is used in the get method of levelup.

However, I have not been able to find out anything about the possible key(s) associated with the values in the key value pairs, so I'm at a loss as to how I might retrieve the data.

Firstly, what are the keys in the key value pairs, and secondly is it possible just to select the first n records?

5 Answers

59

Bitcoind since 0.8 maintains two databases, the block index (in $DATADIR/blocks/index) and the chainstate (in $DATADIR/chainstate). The block index maintains information for every block, and where it is stored on disk. The chain state maintains information about the resulting state of validation as a result of the currently best known chain.

Inside the block index, the used key/value pairs are:

  • 'b' + 32-byte block hash -> block index record. Each record stores:
    • The block header.
    • The height.
    • The number of transactions.
    • To what extent this block is validated.
    • In which file, and where in that file, the block data is stored.
    • In which file, and where in that file, the undo data is stored.
  • 'f' + 4-byte file number -> file information record. Each record stores:
    • The number of blocks stored in the block file with that number.
    • The size of the block file with that number ($DATADIR/blocks/blkNNNNN.dat).
    • The size of the undo file with that number ($DATADIR/blocks/revNNNNN.dat).
    • The lowest and highest height of blocks stored in the block file with that number.
    • The lowest and highest timestamp of blocks stored in the block file with that number.
  • 'l' -> 4-byte file number: the last block file number used.
  • 'R' -> 1-byte boolean ('1' if true): whether we're in the process of reindexing.
  • 'F' + 1-byte flag name length + flag name string -> 1 byte boolean ('1' if true, '0' if false): various flags that can be on or off. Currently defined flags include:
    • 'txindex': Whether the transaction index is enabled.
  • 't' + 32-byte transaction hash -> transaction index record. These are optional and only exist if 'txindex' is enabled (see above). Each record stores:
    • Which block file number the transaction is stored in.
    • Which offset into that file the block the transaction is part of is stored at.
    • The offset from the start of that block to the position where that transaction itself is stored.
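
To make the key layout above concrete, here is a minimal sketch of how such keys could be built before passing them to a LevelDB `get`. The helper names are hypothetical. It assumes that hashes are stored in the reverse byte order of the hex string you normally see, and that the 4-byte file number is little-endian; check both against your own database before relying on them.

```python
import struct

# Hash of the genesis block as it is usually displayed (hex, big-endian looking).
GENESIS_HASH = "000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f"


def hash_to_internal(hex_hash: str) -> bytes:
    """Convert a displayed hex hash to 32 raw bytes in reversed (stored) order.

    Assumption: the database stores hashes byte-reversed relative to the display form.
    """
    raw = bytes.fromhex(hex_hash)
    if len(raw) != 32:
        raise ValueError("expected a 32-byte hash")
    return raw[::-1]


def block_key(block_hash_hex: str) -> bytes:
    """'b' + 32-byte block hash."""
    return b"b" + hash_to_internal(block_hash_hex)


def tx_key(txid_hex: str) -> bytes:
    """'t' + 32-byte transaction hash (only present when 'txindex' is enabled)."""
    return b"t" + hash_to_internal(txid_hex)


def file_key(file_number: int) -> bytes:
    """'f' + 4-byte file number (little-endian is an assumption here)."""
    return b"f" + struct.pack("<I", file_number)


def flag_key(name: str) -> bytes:
    """'F' + 1-byte flag name length + flag name string."""
    encoded = name.encode("ascii")
    if len(encoded) > 255:
        raise ValueError("flag name too long for a 1-byte length")
    return b"F" + bytes([len(encoded)]) + encoded


if __name__ == "__main__":
    print(block_key(GENESIS_HASH).hex())
    print(file_key(0).hex())
    print(flag_key("txindex"))
```

The single-byte keys 'l' and 'R' need no helper: they are just `b"l"` and `b"R"`.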

Inside the chain state database, the following key/value pairs are stored:

  • 'C' + 32-byte transaction hash + output index length + output index (v0.15 onwards) -> a single unspent transaction output (UTXO) record. Each record contains information about the UTXO at the specified output index of the given transaction. This information consists of:
    • Whether the transaction was a coinbase or not.
    • Which height block contains the transaction.
    • The scriptPubKey and amount for this unspent output.
  • 'c' + 32-byte transaction hash (v0.14 and earlier) -> unspent transaction output record for that transaction. Unlike 'C', this entry represents all UTXOs from a single transaction. These records are only present for transactions that have at least one unspent output left. Each record stores:
    • The version of the transaction.
    • Whether the transaction was a coinbase or not.
    • Which height block contains the transaction.
    • Which outputs of that transaction are unspent.
    • The scriptPubKey and amount for those unspent outputs.
  • 'B' -> 32-byte block hash: the block hash up to which the database represents the unspent transaction outputs.

Recent versions of bitcoind (the exact version range still needs to be added here) obfuscate the values of the key/value pairs, so you need to XOR each value with the obfuscation key to get the real value.
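
As a rough illustration of the XOR step, here is a minimal sketch. The function name is hypothetical, and it assumes the obfuscation key is a short byte string that is repeated cyclically over the value. How the key itself is read from the database is left out; in the Bitcoin Core source that logic lives around dbwrapper.cpp, so check there for the details of your version.

```python
from itertools import cycle


def deobfuscate(value: bytes, obfuscation_key: bytes) -> bytes:
    """XOR every byte of value with the obfuscation key, repeating the key as needed.

    Assumption: the key is applied cyclically starting at the first byte of the value.
    An empty key means the database is not obfuscated, so the value is returned as is.
    """
    if not obfuscation_key:
        return value
    return bytes(b ^ k for b, k in zip(value, cycle(obfuscation_key)))


if __name__ == "__main__":
    key = bytes.fromhex("ff0f")          # made-up key, for illustration only
    stored = bytes.fromhex("010203")     # made-up stored value
    print(deobfuscate(stored, key).hex())
```

Since XOR is its own inverse, applying the same function twice with the same key gives back the stored bytes.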

I won't go into the specific serialization details of the particular records. They're often specially designed to be compact on disk, and not really intended to be easily usable by other applications (LevelDB doesn't support concurrent access from multiple applications anyway). There are several RPC methods for querying data from the databases (getblock, gettxoutsetinfo, gettxout) without needing direct access.

As you can see, only headers are stored inside this database. The actual blocks and transactions are stored in the block files, which are not databases, but just raw append-only files that contain the blocks in network format.

As to your second question: what is n? If you just want to access some records, sure, iterate over the keys and stop when you've read enough.
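
A minimal sketch of "iterate and stop when you've read enough", in Python rather than node.js. It works on any iterator of (key, value) byte pairs; with a LevelDB binding such as plyvel (an assumption, it is not what the question uses) that iterator would come from something like `db.iterator()`. The list below is a made-up stand-in for a real database, and you should only ever open a copy of the data directory, since LevelDB does not support concurrent access.

```python
from itertools import islice

# Key prefixes of the block index, as listed in the answer above.
BLOCK_INDEX_PREFIXES = {
    b"b": "block index record",
    b"f": "file information record",
    b"l": "last block file number",
    b"R": "reindexing flag",
    b"F": "named flag",
    b"t": "transaction index record",
}


def first_n_records(records, n):
    """Return the first n (key, value) pairs from any iterable, then stop reading."""
    return list(islice(records, n))


def describe_key(key: bytes) -> str:
    """Classify a block index key by its first byte."""
    return BLOCK_INDEX_PREFIXES.get(key[:1], "unknown")


if __name__ == "__main__":
    # Stand-in for a database iterator; keys and values here are invented.
    fake_db = iter([
        (b"F\x07txindex", b"1"),
        (b"R", b"1"),
        (b"b" + bytes(32), b"..."),
        (b"l", b"\x00\x00\x00\x00"),
    ])
    for key, value in first_n_records(fake_db, 3):
        print(describe_key(key), key.hex(), len(value))
```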

3
6

OK I know I shouldn't really answer my own question but... in the absence of a response to this question, I did a bit of hunting.

Github provided the answer in a file found in the bitcoin-leveldb repository.

The information is in a text file at the path leveldb->doc->table_format.txt

In short, there is no easily describable table-like structure. There are several, shall I call them "nested", structures in the database, including the fact that the database physically stores the data in separate logical files.

Here is the table_format.txt file contents as of this post.

File format

    <beginning_of_file>
    [data block 1]
    [data block 2]
    ...
    [data block N]
    [meta block 1]
    ...
    [meta block K]
    [metaindex block]
    [index block]
    [Footer]        (fixed size; starts at file_size - sizeof(Footer))
    <end_of_file>

The file contains internal pointers. Each such pointer is called a BlockHandle and contains the following information:

    offset: varint64
    size:   varint64

See https://developers.google.com/protocol-buffers/docs/encoding#varints for an explanation of varint64 format.

(1) The sequence of key/value pairs in the file are stored in sorted order and partitioned into a sequence of data blocks. These blocks come one after another at the beginning of the file. Each data block is formatted according to the code in block_builder.cc, and then optionally compressed.

(2) After the data blocks we store a bunch of meta blocks. The supported meta block types are described below. More meta block types may be added in the future. Each meta block is again formatted using block_builder.cc and then optionally compressed.

(3) A "metaindex" block. It contains one entry for every other meta block where the key is the name of the meta block and the value is a BlockHandle pointing to that meta block.

(4) An "index" block. This block contains one entry per data block, where the key is a string >= last key in that data block and before the first key in the successive data block. The value is the BlockHandle for the data block.

(6) At the very end of the file is a fixed length footer that contains the BlockHandle of the metaindex and index blocks as well as a magic number.

    metaindex_handle: char[p];      // Block handle for metaindex
    index_handle:     char[q];      // Block handle for index
    padding:          char[40-p-q]; // zeroed bytes to make fixed length
                                    // (40==2*BlockHandle::kMaxEncodedLength)
    magic:            fixed64;      // == 0xdb4775248b80fb57 (little-endian)
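
The footer layout described above can be sketched in a few lines of Python. This is only an illustration with hypothetical function names: it takes the footer to be the last 48 bytes of a table file (40 bytes of handles and padding plus the 8-byte magic) and reads the two BlockHandles as protocol-buffer style varint64 values (least significant 7-bit group first).

```python
import struct

FOOTER_SIZE = 48                       # 40 bytes of handles/padding + 8-byte magic
TABLE_MAGIC = 0xDB4775248B80FB57       # stored little-endian at the end of the file


def read_varint64(buf: bytes, pos: int):
    """Decode one protobuf-style varint64 starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
        if shift > 63:
            raise ValueError("varint too long")


def parse_footer(file_bytes: bytes) -> dict:
    """Return the (offset, size) BlockHandles of the metaindex and index blocks."""
    if len(file_bytes) < FOOTER_SIZE:
        raise ValueError("file too small to contain a footer")
    footer = file_bytes[-FOOTER_SIZE:]
    (magic,) = struct.unpack("<Q", footer[40:])
    if magic != TABLE_MAGIC:
        raise ValueError("bad magic number, not a table file")
    pos = 0
    handles = []
    for _ in range(2):                 # metaindex handle, then index handle
        offset, pos = read_varint64(footer, pos)
        size, pos = read_varint64(footer, pos)
        handles.append((offset, size))
    return {"metaindex": handles[0], "index": handles[1]}
```

Note that this varint64 is not the same encoding as the MSB base-128 varint that bitcoind uses inside its record values.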

"filter" Meta Block

If a "FilterPolicy" was specified when the database was opened, a filter block is stored in each table. The "metaindex" block contains an entry that maps from "filter.<N>" to the BlockHandle for the filter block where "<N>" is the string returned by the filter policy's "Name()" method.

The filter block stores a sequence of filters, where filter i contains the output of FilterPolicy::CreateFilter() on all keys that are stored in a block whose file offset falls within the range

    [ i*base ... (i+1)*base-1 ]

Currently, "base" is 2KB. So for example, if blocks X and Y start in the range [ 0KB .. 2KB-1 ], all of the keys in X and Y will be converted to a filter by calling FilterPolicy::CreateFilter(), and the resulting filter will be stored as the first filter in the filter block.

The filter block is formatted as follows:

    [filter 0]
    [filter 1]
    [filter 2]
    ...
    [filter N-1]

    [offset of filter 0]                  : 4 bytes
    [offset of filter 1]                  : 4 bytes
    [offset of filter 2]                  : 4 bytes
    ...
    [offset of filter N-1]                : 4 bytes

    [offset of beginning of offset array] : 4 bytes
    lg(base)                              : 1 byte

The offset array at the end of the filter block allows efficient mapping from a data block offset to the corresponding filter.
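
To show how the offset array is meant to be used, here is a small sketch with hypothetical function names that splits a filter block into its filters and then picks the filter for a given data block offset. It assumes the 4-byte offsets are little-endian unsigned integers, and it does nothing with the filter contents themselves (those depend on the FilterPolicy).

```python
import struct


def parse_filter_block(block: bytes):
    """Split a filter block into (list of filter byte strings, lg(base))."""
    if len(block) < 5:
        raise ValueError("filter block too short")
    base_lg = block[-1]                                   # final byte: lg(base)
    (array_start,) = struct.unpack_from("<I", block, len(block) - 5)
    array_end = len(block) - 5                            # offset array stops before the trailer
    if array_start > array_end:
        raise ValueError("corrupt filter block")
    count = (array_end - array_start) // 4
    offsets = [struct.unpack_from("<I", block, array_start + 4 * i)[0]
               for i in range(count)]
    offsets.append(array_start)                           # the last filter ends where the array begins
    filters = [block[offsets[i]:offsets[i + 1]] for i in range(count)]
    return filters, base_lg


def filter_for_block_offset(filters, base_lg: int, block_offset: int):
    """Return the filter covering a data block that starts at block_offset, or None."""
    index = block_offset >> base_lg                       # same as block_offset // base
    return filters[index] if index < len(filters) else None
```

With "base" at 2KB, lg(base) is 11, so a data block starting at file offset 2048 maps to filter 1.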

"stats" Meta Block

This meta block contains a bunch of stats. The key is the name of the statistic. The value contains the statistic.

TODO(postrelease): record following stats.

    data size
    index size
    key size (uncompressed)
    value size (uncompressed)
    number of entries
    number of data blocks

4
  • This explains what LevelDB's internal storage format for tables is. It doesn't answer the question, which is which key/value pairs are stored in it by bitcoind. Commented Jul 9, 2014 at 17:12
  • @PieterWuille this is the closest I could find. Are you aware of an answer? Commented Jul 9, 2014 at 20:14
  • See txdb.cpp in the source code :) Commented Jul 9, 2014 at 20:17
  • @PieterWuille if you make this an answer I will change the tick. Commented Jul 9, 2014 at 21:01
2

This post is pretty old and is really a reference to many, as it's been to me, but it took me quite a while to grasp it totally.
So I wrote this blog post to clarify certain points, particularly the key parameter endianness: https://imil.net/blog/posts/2020/bitcoin-leveldb-debugging/

One specific point is that the accepted reply states that the unspent transaction output record for that transaction is a format of c+32 bytes transaction hash, when it is really C (capital C, i.e. 0x43) instead.

1
  • The difference is due to a change in version 0.15 onwards. In v0.14 and prior, the UTXO database operated as per-transaction rather than per-output. Thus, a c record represents all that transaction's outputs in a pre-upgrade DB, while C records in newer databases store individual transaction outputs. I have updated the accepted answer with this. Commented Dec 12, 2021 at 5:22
1

I have been googling around a bit and keep seeing the following:

'b' + 32-byte block hash -> block index record. Each record stores: The block header. The height. The number of transactions. To what extent this block is validated. In which file, and where in that file, the block data is stored. In which file, and where in that file, the undo data is stored

This is fantastic...thanks..but...

What is the structure of this data?? How many bytes is 'block header'? How many bytes is 'the height', etc. Are these 32 bit integers? 64 bit? Are they big-endian, little-endian?

What would it look like, say, as a C style struct using stdint formats?

0
    import binascii

    BLOCK_HAVE_DATA = 8   #!< full block available in blk*.dat
    BLOCK_HAVE_UNDO = 16  #!< undo data available in rev*.dat


    def encode_varint(number):
        """Encodes a non-negative integer using the MSB base-128 scheme."""
        # * Variable-length integers: bytes are a MSB base-128 encoding of the number.
        # * The high bit in each byte signifies whether another digit follows. To make
        # * sure the encoding is one-to-one, one is subtracted from all but the last digit.
        # * Thus, the byte sequence a[] with length len, where all but the last byte
        # * has bit 128 set, encodes the number:
        # *
        # *   (a[len-1] & 0x7F) + sum(i=1..len-1, 128^i*((a[len-i-1] & 0x7F)+1))
        # *
        # * Properties:
        # * * Very small (0-127: 1 byte, 128-16511: 2 bytes, 16512-2113663: 3 bytes)
        # * * Every integer has exactly one encoding
        # * * Encoding does not depend on size of original integer type
        # * * No redundancy: every (infinite) byte sequence corresponds to a list
        # *   of encoded integers.
        # *
        # * 0:         [0x00]  256:        [0x81 0x00]
        # * 1:         [0x01]  16383:      [0xFE 0x7F]
        # * 127:       [0x7F]  16384:      [0xFF 0x00]
        # * 128:  [0x80 0x00]  16511:      [0xFF 0x7F]
        # * 255:  [0x80 0x7F]  65535: [0x82 0xFE 0x7F]
        # * 2^32:           [0x8E 0xFE 0xFE 0xFF 0x00]
        if number < 0:
            raise ValueError("Only non-negative integers can be encoded.")
        result = []
        while True:
            byte = number & 0x7F  # Extract lower 7 bits
            if result:
                byte |= 0x80  # Every byte except the final output byte has the high bit set
            result.append(byte)
            if number <= 0x7F:
                break
            number = (number >> 7) - 1  # Subtract one so each integer has exactly one encoding
        return bytes(reversed(result))  # Groups were produced least significant first


    def decode_varint(stream):
        """Decodes a variable-length integer from the MSB base-128 format."""
        n = 0
        while True:
            chData = ord(stream.get(1))
            n = (n << 7) | (chData & 0x7F)
            if chData & 0x80:
                n += 1
            else:
                return n


    def read_int(stream, bits):
        """Reads a little-endian field; fields wider than 64 bits are returned as hex."""
        data = stream.get(bits // 8)
        data.reverse()
        return binascii.b2a_hex(data) if bits > 64 else int(binascii.b2a_hex(data), 16)


    class Stream:
        '''Class to handle byte stream'''

        def __init__(self, hexdata):
            # The input is reversed here, so hexdata must hold the record bytes in reverse order.
            self.data = bytearray(bytes.fromhex(hexdata))
            self.data.reverse()

        def get(self, n):
            result = self.data[:n]
            self.data = self.data[n:]
            return result


    class BlockHeader:
        def __init__(self, stream):
            self.nVersion = read_int(stream, 32)
            self.hashPrev = read_int(stream, 256)
            self.hashMerkleRoot = read_int(stream, 256)
            self.nTime = read_int(stream, 32)
            self.nBits = read_int(stream, 32)
            self.nNonce = read_int(stream, 32)


    class VarintCBlockIndex:
        def __init__(self, stream):
            self.nVer = decode_varint(stream)
            self.nHeight = decode_varint(stream)
            self.nStatus = decode_varint(stream)
            self.nTx = decode_varint(stream)
            # One file number is shared by the block file and the undo file.
            self.nFile = decode_varint(stream) if self.nStatus & (BLOCK_HAVE_DATA | BLOCK_HAVE_UNDO) else -1
            self.nDataPos = decode_varint(stream) if self.nStatus & BLOCK_HAVE_DATA else -1
            self.nUndoPos = decode_varint(stream) if self.nStatus & BLOCK_HAVE_UNDO else -1


    if __name__ == '__main__':
        data_hex = '572fe3011b5bede64c91a5338fb300e3fdb6f30a4c67233b997f99fdd518b968b9a3fd65857bfe78b260071900000000001937917bd2caba204bb1aa530ec1de9d0f6736e5d85d96da9c8bba0000000129ffd98136b19a8e00021d00f0833ced8e'

        # Usage: the varint fields come first in the record, the block header last
        stream = Stream(data_hex)
        varint_cblockindex = VarintCBlockIndex(stream)
        block_header = BlockHeader(stream)

        # print all data from classes:
        print('varint_cblockindex:')
        print('\tnVer = ', varint_cblockindex.nVer)
        print('\tnHeight = ', varint_cblockindex.nHeight)
        print('\tnStatus = ', varint_cblockindex.nStatus)
        print('\tnTx = ', varint_cblockindex.nTx)
        print('\tnFile = ', varint_cblockindex.nFile)
        print('\tnDataPos = ', varint_cblockindex.nDataPos)
        print('\tnUndoPos = ', varint_cblockindex.nUndoPos)

        print('block_header:')
        print('\tnVersion = ', block_header.nVersion)
        print('\thashPrev = ', block_header.hashPrev)
        print('\thashMerkleRoot = ', block_header.hashMerkleRoot)
        print('\tnTime = ', block_header.nTime)
        print('\tnBits = ', block_header.nBits)
        print('\tnNonce = ', block_header.nNonce)

I hope this will help...

Key Structure (b + 32-byte block hash):

The key for each block index record begins with the letter 'b' to distinguish it from other types of entries in the database (e.g., transaction index entries start with 't').

Following the 'b' is the 32-byte (256-bit) hash of the block. This hash serves as a unique identifier for the block.

The block hash is stored in little-endian byte order in the database, i.e. the bytes appear in the reverse order of the hex string that block explorers and RPC output usually show.

Value Structure (Block Index Record):

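Each block index record associated with a specific block hash contains the fields described below. The sections are grouped by topic rather than by position: as the parsing code above shows, the serialized record starts with a client version varint, followed by the height, the status, the number of transactions, the file number and positions, and ends with the 80-byte block header.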

Block Header:

80 bytes in total. Contains the following fields:

  • Version (4 bytes)
  • Previous Block Hash (32 bytes)
  • Merkle Root (32 bytes)
  • Timestamp (4 bytes)
  • Bits (difficulty target, 4 bytes)
  • Nonce (4 bytes)
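
For the fixed-size part, the 80-byte header can be unpacked with Python's struct module. This is a sketch under the assumption that all integer fields are little-endian and that the two hashes are shown byte-reversed for display; the function name is hypothetical. The sample bytes are the well-known header of the genesis block.

```python
import hashlib
import struct

# Serialized header of the genesis block (80 bytes, as hex).
GENESIS_HEADER_HEX = (
    "01000000"
    + "00" * 32
    + "3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a"
    + "29ab5f49"
    + "ffff001d"
    + "1dac2b7c"
)


def parse_block_header(raw: bytes) -> dict:
    """Unpack the six header fields from exactly 80 bytes."""
    if len(raw) != 80:
        raise ValueError("a block header is exactly 80 bytes")
    version, prev_hash, merkle_root, timestamp, bits, nonce = struct.unpack(
        "<i32s32sIII", raw
    )
    return {
        "version": version,
        "prev_hash": prev_hash[::-1].hex(),      # reversed for display
        "merkle_root": merkle_root[::-1].hex(),  # reversed for display
        "time": timestamp,
        "bits": bits,
        "nonce": nonce,
        # The block hash is the double SHA-256 of the header, also reversed for display.
        "hash": hashlib.sha256(hashlib.sha256(raw).digest()).digest()[::-1].hex(),
    }


if __name__ == "__main__":
    for name, value in parse_block_header(bytes.fromhex(GENESIS_HEADER_HEX)).items():
        print(name, "=", value)
```

In a block index record you would feed this function the last 80 bytes of the value, after the varint fields have been consumed.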

Height:

A variable-length integer (varint) representing the block's height in the blockchain. Indicates how many blocks precede this block in the chain.

Number of Transactions:

A varint indicating the total number of transactions included in the block.

Validation Status:

A varint holding status flags. Besides recording to what extent the block has been validated, the flags indicate:

  • Whether the block's data is fully available (BLOCK_HAVE_DATA, value 8)

  • Whether the block's undo data (for transaction reversal) is available (BLOCK_HAVE_UNDO, value 16)

File Location and Position:

If the block's data or undo data is available:

  • A single varint indicating the file number. The block file (blkXXXXX.dat) and the undo file (revXXXXX.dat) share the same number, so it is stored only once.

If the block's data is available:

  • A varint indicating the byte offset within blkXXXXX.dat where the block data starts.

If the block's undo data is available:

  • A varint indicating the byte offset within revXXXXX.dat where the undo data starts.

Important Notes:

Varints: Bitcoin uses varints for space efficiency. Smaller numbers are encoded with fewer bytes than larger ones. This makes the size of the block index record variable depending on the block height, transaction count, etc.

Data Availability: The block data and undo data themselves are not stored in the LevelDB index. The validation status flags indicate whether they are present, and the file number and offsets say where to find them in the actual block files on disk.

Endianness: The block hash and other fields within the block header are stored in little-endian byte order in LevelDB. This means the least significant byte comes first.

Example (Simplified):

Let's say a block index record looks like this:

Key: b<block_hash>
Value: <varint_client_version><varint_height><varint_status><varint_tx_count><varint_file_num><varint_data_pos><varint_undo_pos><block_header_bytes>

This record tells you:

  • The block hash (block_hash)
  • The block height (varint_height)
  • The block's validation status (varint_status)
  • The number of transactions (varint_tx_count)
  • Where to find the block data and undo data on disk (if available)
  • The block header data (block_header_bytes)
