Merkle Trees and the Taproot Protocol

November 15, 2024 • 9 min read • by Jose Storopoli, PhD

Math Equations

This post has KaTeX enabled, so if you want to view the rendered math formulas, you'll have to unfortunately enable JavaScript.

Dedicated to John Peter, since I was tired of having to explain this to him every time we met.

This post gives an intuition to the Taproot protocol in Bitcoin, specifically how Merkle trees are used to hide the complexity of several possible spending conditions.

Taproot

Taproot was activated as a soft fork in the Bitcoin network on November 2021.

The design goals of Taproot are:

Increase privacy: hide the spending conditions and also hide the fact that you are using a multisig.
Reduce the amount of data on-chain: you only need to commit to the root of a Merkle tree, and not the leaves.
Use Schnorr: Schnorr signatures are more efficient and allow for signature aggregation.

My focus is on the middle point: how to use Merkle trees to hide the complexity of the spending conditions. I’m not gonna cover Schnorr signatures here, but you can check conduition’s excellent post on Schnorr signatures.

So let’s start with Merkle trees.

Merkle Trees

A Merkle tree is a binary tree where the leaves are the data and the internal nodes are the hash of their children. The root of the tree is called the Merkle root.

Here’s an example:

              root
             /    \
            /      \
           /        \
          /          \
         /            \
        /              \
    H(0 | 1)        H(2 | 3)
      /   \           /    \
     /     \         /      \
    0       1       2        3

In the picture above, the leaves are the numbers 0, 1, 2, and 3. Consider these as data that you want to commit to. We construct the tree by hashing (applying the hash function $H$) the leaves and then concatenating the hashes, and hashing the result until we reach the root.

Merkle Trees as Commitment Schemes

In cryptography, we have something called a commitment scheme.

A commitment scheme allows you to commit to a value without revealing it. This property is called hiding. Commitment schemes are designed so that a party cannot change the value or statement after they have committed to it. This property is called binding.

The classical example is a hash function. Say you have a value $x$ and you want to commit to it. You can hash $x$ and send the hash to the other party. In the future, you can reveal $x$ and the other party can hash it and check if it matches the hash you sent.

This is a commitment scheme because you cannot know the value of $x$ by looking at the hash. Hence, it is hiding. And you cannot change the value of $x$ without changing the hash, hence it is binding.

However, this is a commitment scheme for a single value. What if you have multiple values you want to commit to? This is where Merkle trees come in.

Merkle trees are commitment schemes. You commit to the root of the tree, and you can prove that a leaf is in the tree by revealing the path from the leaf to the root.

It is hiding because you cannot know the value of a leaf by looking at the root. And it is binding because you cannot change the value of a leaf without changing the root.

Note that the inclusion proof is logarithmic in the number of leaves, hence the complexity of the inclusion proof is $O(\log n)$, where $n$ is the number of leaves, or the depth of the desired leaf in the tree. This makes Merkle trees a very efficient commitment scheme.

Taproot and Merkle Trees

Now that we understand Merkle trees, let’s see how they are used in Taproot. The anatomy of a Pay-to-Taproot (P2TR) address is as follows:

Internal key: the public key of the owner.
Merkle root: the root of the Merkle tree of spending conditions.

These are also called the key path and the script path, respectively. You can find more about the Taproot soft fork in the BIP 341 that describes Taproot spending rules.

Note that there are ways to tweak the internal key that I will not cover here for simplicity. They are mainly used to disable the key path in a verifiable way and force the spending to only use script path conditions. Again, check BIP 341 for more details.

Here’s an example of a Taproot address:

               +------+
               | P2TR |
               +------+
                  |
         +--------+--------+
         |                 |
     +-----------+   +-------------+
     | Internal  |   | Root of the |
     |   Key     |   |  Merkle Tree|
     +-----------+   +-------------+
                         |
               +---------+---------+
               |                   |
           +-------+           +-------+
           |   S1  |           |  Node |
           +-------+           +-------+
                                   |
                           +-------+-------+
                           |               |
                       +-------+       +-------+
                       |   S2  |       |   S3  |
                       +-------+       +-------+

Here we can see that we have the internal key and the root of the Merkle tree. The internal key is the key path, and the Merkle tree is the script path. If you want to spend from this address, you can either use the internal key or any of the spending conditions $S_n$ that are leaves in the Merkle tree.

Let’s focus in the spending conditions $S_n$. We have 3 conditions in the example above. These are vanilla Pay-to-(Witness)-Script-Hash P2SH scripts, so you can have multisig, timelocks, etc. in these conditions. P2SH scripts are not revealed on-chain, you just commit to the hash of the script. They are only revealed when you spend from the address, where you need to reveal the script and Bitcoin consensus will not only check if the script is correct, but also that it matches the hash committed.

Yes, P2SH is a commitment scheme. It is hiding because you cannot know the script by looking at the hash. And it is binding because you cannot change the script without changing the hash.

In a Merkle tree, it takes $O(\log n)$ space to prove inclusion, where $n$ is the depth of the leaf that we want to prove, we order the leaves in the tree in such a way that the most likely conditions are closer to the root.

In this case we have $S_1$ as the most likely condition, and $S_2$ and $S_3$ as less likely conditions.

Suppose you want to spend from the address using $S_2$. How would you prove that $S_2$ is in the tree? Well, you need to reveal the path from $S_2$ to the root. This entails revealing the hash of the sibling of $S_2$, that is the hash of $S_3$, Ok now we got the “Node” in the picture above, but we still need to reveal the hash of the sibling of “Node”, that is $S_1$. This is enough to prove that $S_2$ is in the tree. See that we had to reveal the hashes of $S_1$ and $S_3$, since $S_2$ has depth $n = 3$ in the tree it took $\lceil O(\log 3) \rceil = 2$ steps to prove inclusion.

Now, suppose you want to spend from the address using $S_1$. Same thing, you need to reveal the path from $S_1$ to the root. This is easily done with just revealing the hash of “Node”. So a single operation is enough to prove inclusion. This is due to the fact that $S_1$ has depth $n = 2$ in the tree, hence it took $\lceil O(\log 2) \rceil = 1$ step to prove inclusion.

This is the beauty of Merkle trees.

Contrast this with other script addresses formats such as P2SH. In P2SH, you are only tied to a single script. You could have a bunch of nested IFs in the script, to emulate the same behavior as the Merkle tree, but good luck paying the fees for that monstrous script when you want to spend from the address.

Why is this useful?

I work at Alpen Labs, where we are developing Strata, a BitVM-based bridge for Bitcoin. To put it simply, BitVM is a computing paradigm to express Turing-complete Bitcoin contracts.

BitVM was only possible due to the Taproot soft fork. Before we dive into details, just one minor detail about Merkle trees in Taproot: they can have a maximum depth of 128. This means that you can have up to $2^{128}$ spending conditions. And each of this spending conditions is a script that follows the Bitcoin consensus rules. Mostly important of these is that the transaction size must be less than 4MB. So, you can have a Taproot address that encodes a Turing-complete contract with up to $2^{128}$ clauses. And each of these clauses can be a complex script up to 4MB in size. Hence, we can hide the complexity of a Turing-complete contract in a single Taproot address. This allows us to encode $2^{128} \cdot 4\text{MB}$ of data which is more than the estimated data content of the surface web, according to wolframalpha.

More specifically, we can encode a gigabyte-sized Groth16 verifier in Bitcoin script as a Taproot address by splitting the execution of the verifier into 4MB chunks and encoding each chunk as a spending condition as a leaf in a Taproot Merkle tree. And we can pass state between these chunks by using one-time signatures, such as Lamport Signatures. This involves encoding all the ellyptic curve operations and pairings required by the Groth16 verifier along with a way to express Lamport signature verification in Bitcoin script. But this is a topic for a future post.

If you want to know more about how to encode a Groth16 verifier using Bitcoin script, check the alpenlabs blog.

Jose Storopoli, PhD