Pieter Wuille

There are some misunderstandings here. First, transactions are hashed in several ways for different purposes, but they are still stored in full in the blockchain. As you said, this is needed to verify the blockchain later, and to provide the information to other participants so that they can catch up with the network.

So, when and for what purpose are they hashed, then?

Transaction Identifier (txid)

Standard transactions range in size from roughly 192 bytes to 100,000 bytes, although the smallest possible transaction is 61 bytes, and the largest one ever seen was 999,657 bytes. Anyway, I think we can agree that some of them can be unwieldy to send around in full, just to reference them. ;)

This is where transaction IDs come in. The transaction ID is the digest obtained by applying the SHA-256 hash function twice to the serialized transaction. The resulting digest or hash is always 256 bits (hence the name), which can be represented with 32 bytes. Using the txid to reference transactions is obviously much less bandwidth intensive, which is why this is how peers exchange information about transactions: when a node learns about a new transaction, this happens in the form of an inv (inventory) message that presents it with a list of txids, to which the node may respond with a getdata message requesting the transactions it has not seen yet by their txids.
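To make the double hashing concrete, here is a minimal sketch in Python using only the standard library. It assumes the legacy (non-witness) serialization as input. The `raw_tx` bytes are a made-up placeholder rather than a real transaction, and the helper names `double_sha256` and `txid_hex` are mine, not taken from any Bitcoin library.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """Apply SHA-256 twice and return the 32-byte digest."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def txid_hex(serialized_tx: bytes) -> str:
    """Return the txid as hex.

    Block explorers and RPC output conventionally show the digest
    byte-reversed, so the bytes are reversed before hex encoding.
    """
    return double_sha256(serialized_tx)[::-1].hex()


# Placeholder bytes for illustration only, NOT a valid transaction.
raw_tx = bytes.fromhex("01000000") + b"\x00" * 57

print(len(double_sha256(raw_tx)))  # 32
print(txid_hex(raw_tx))
```

No matter how large the serialized transaction is, the reference to it stays at 32 bytes.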

Merkle tree

Transactions in a block have a fixed order. This can be used to create a Merkle tree from the transactions. Merkle trees are useful in that they allow lightweight nodes to confirm the membership of a transaction in a block without having full knowledge of the block's content. The Merkle tree is created by hashing the txids pairwise, then hashing the resulting hashes pairwise, and so on until only a single hash remains. This Merkle root commits to all transactions in a block: changing any part of the block's content would change the root, so nobody can alter the content unnoticed.
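The pairwise hashing can be sketched as follows. This is a simplified illustration, not the reference implementation: it works on raw 32-byte digests, uses double SHA-256 for every node, and duplicates the last hash whenever a level has an odd number of entries, which is how Bitcoin handles odd levels. The leaves here are made-up placeholders, and the function names are my own.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(txids: list[bytes]) -> bytes:
    """Reduce a list of 32-byte hashes to a single root hash."""
    if not txids:
        raise ValueError("a block contains at least one transaction")
    level = list(txids)
    while len(level) > 1:
        # With an odd number of entries, the last one is paired with itself.
        if len(level) % 2 == 1:
            level.append(level[-1])
        # Hash each adjacent pair to build the next level up.
        level = [
            double_sha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


# Three placeholder leaves standing in for real txids.
leaves = [double_sha256(bytes([i])) for i in range(3)]
root = merkle_root(leaves)
print(root.hex())
```

Changing a single leaf changes the root, which is what lets the root stand in for the whole list of transactions.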

Proof of work

A third application of hashing in Bitcoin is the consensus mechanism 'proof of work'. Bitcoin's central innovation is using the blockchain to create consensus about the order of transactions in the system. To that end, miners collect unconfirmed transactions into block templates and then check whether a template yields a valid block. To do so, they take the template's block header (which among other information contains the aforementioned Merkle root) and apply SHA-256 to it twice. If the resulting hash, interpreted as a number, is at or below the target set by the current difficulty, a new valid block has been found. Otherwise the miner changes the header, typically by incrementing the nonce, and tries again.
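The search for a valid header can be sketched as a toy loop. A header hash counts as valid when, read as a number, it is at or below a target, and a lower target means a harder search. Everything concrete below is an assumption for illustration: `toy_prefix` is 76 zero bytes standing in for the header fields that precede the 4-byte nonce, and `easy_target` is far easier than anything on the real network, so the loop finishes almost instantly.

```python
import hashlib
import struct


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def mine(header_prefix: bytes, target: int, max_nonce: int = 2**32):
    """Try nonces until the header hash is at or below the target.

    Returns (nonce, digest) on success, or None if no nonce works.
    """
    for nonce in range(max_nonce):
        # The nonce is appended as a 4-byte little-endian integer.
        header = header_prefix + struct.pack("<I", nonce)
        digest = double_sha256(header)
        # The digest is compared to the target as a little-endian number.
        if int.from_bytes(digest, "little") <= target:
            return nonce, digest
    return None


toy_prefix = b"\x00" * 76   # placeholder, not a real block header
easy_target = 2**244        # roughly 1 in 4096 hashes qualifies

result = mine(toy_prefix, easy_target)
if result is not None:
    nonce, digest = result
    print(nonce, digest[::-1].hex())
```

Finding a nonce takes many attempts, but checking the result takes a single double hash, which is why everyone else can verify a block cheaply.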

Addresses

While the public key is a point on an elliptic curve (secp256k1) that is derived from the private key, the address is in turn derived by hashing: for the common pay-to-public-key-hash address type, it encodes the RIPEMD-160 hash of the SHA-256 hash of the public key. This allows the public key to remain unknown until the received money is spent again.

The point of hashing

As we can see, while transactions are indeed kept in full in the blockchain, hashing allows us to:

  • Save bandwidth for transaction relay
  • Verify transactions without knowing full blocks
  • Limit block discovery and introduce digital scarcity
  • Protect sensitive information, while proving our possession of it
Murch