Verifying a Block

In this lesson, we are going to deep dive into one component of the blockchain, and discover how you can verify, for yourself, a mined block. This gives you assurance that the blocks on the blockchain are legitimate, and that the transactions are appropriately mined. If you do this for all blocks in the blockchain, you can be sure that all transactions in the chain are fair. Open auditing is a great feature!

You'll need to first know what a blockchain is, and what a hash function is.

There are two steps we will cover, over two lessons. In the first lesson (this one), we will cover how to verify the block from the top-level information, specifically the information discoverable in the headers of the blockchain. The main critical part that will be missing from our analysis is verifying the transactions' Merkle tree, which we will deal with in the second lesson (coming soon!). For now, we will assume the Merkle tree root value is correct. In short, the Merkle tree root is a hash of the list of transactions, allowing us to verify the transactions, their order, and their values. If a single transaction in a Merkle tree is changed, the root will change, allowing us to see something has been tampered with. (By the way, this has the flow-on effect that the nonce no longer works for the block, and therefore it is not correctly mined.)

Useful Python functions

We will be using Python 3 in this lesson. If you are more familiar with another programming language, the general concepts will transfer over, but I will be talking about some specific Python features as well. If you use Python 2, I recommend switching to Python 3, as it makes the data types more explicit and it's more obvious what exactly is happening at each step (but it is a bit more verbose).

First, a slight detour. Python 3 makes a distinction between a bytes object and a string object. A string is a sequence of characters, such as the text you are reading now. A bytes object is a sequence of bytes, each of which is eight 1s and 0s. We can choose to represent a bytes object as a string object, which we do with a mapping called an encoding. You may be familiar with ASCII, which maps the numbers 0 to 127 to the English letters, digits, punctuation and some control characters (such as the newline character). With this encoding, we can take a single byte, map it to its character, and show that character in a process known as decoding.

Another encoding is hexadecimal. It takes a four-bit sequence and maps it to one of the characters 0123456789ABCDEF, in that order. The bit sequence 0010 maps to the character 2. The sequence 0111 maps to 7, while 1011 maps to B (it's the decimal number 11, and B sits at position 11 in that list when counting from zero).
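As a quick check of the mapping described above, here is a minimal sketch. The names hex_chars and hex_digit are only for illustration.

```python
# The sixteen hexadecimal characters, in order
hex_chars = "0123456789ABCDEF"

for bit_string in ["0010", "0111", "1011"]:
    value = int(bit_string, 2)            # read the bit string as a base-2 integer
    print(bit_string, "->", hex_chars[value])

# The built-in format() gives the same answer without a lookup table
hex_digit = format(0b1011, "X")
print(hex_digit)
```

Indexing into the string of characters is exactly the "mapping" an encoding performs, just written out by hand.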

That's all a little theoretical, so let's have a look at it in practice.

s = "I am a string"
b = s.encode("ascii")

print([bb for bb in b])
print([bin(bb) for bb in b])

The first line creates a simple string. The second encodes it into a bytes object. If you print(b), the result will look like the string, except for a b in front of the opening quote. Underneath, though, it's just a sequence of 1s and 0s in memory, and when we print it, Python uses ASCII to represent it. The third line shows the data structure a bit more clearly as a sequence of numbers, which are themselves just ones and zeros, as shown by the fourth line of code.
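To make the difference between the two types concrete, here is a small sketch of the round trip from string to bytes and back again. The name round_trip is just for illustration.

```python
s = "I am a string"
b = s.encode("ascii")            # str -> bytes

print(type(s), type(b))          # the two objects have different types
print(b[0])                      # indexing a bytes object gives an integer

round_trip = b.decode("ascii")   # bytes -> str
print(round_trip == s)
```

Encoding and decoding with the same encoding are inverse operations, so the round trip gives back the original string.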

One way to represent the same integers as hexadecimal in Python 3 is the binascii library:

import binascii

h = binascii.hexlify(b)
print(h)
print(len(h))

print(int(h[:2], 16))

The output from this shows the hexadecimal representation of the same numbers. Note they are in "pairs", which is why there are twice as many hexadecimal digits as characters in the string. The first pair, "49", is the hexadecimal representation of the number 73, which is converted in the last line of the above code block.
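If you would rather not import anything, bytes objects also have a built-in hex() method, and bytes.fromhex() goes the other way. A short sketch, with restored as an illustrative name:

```python
b = "I am a string".encode("ascii")

h = b.hex()                   # returns a str, whereas binascii.hexlify returns bytes
print(h)
print(len(h))                 # two hexadecimal digits per byte
print(int(h[:2], 16))         # the first pair, converted back to an integer

restored = bytes.fromhex(h)   # inverse operation: hex text back to raw bytes
print(restored == b)
```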

Bitcoin, and most blockchains, share their data either in a raw binary format or in hexadecimal format, which you'll be familiar with already, or will be very shortly after starting to use blockchains. Let's now have a look at a block in detail.

Block 123

You can view information about this block on blockchain.info, but I'll copy the relevant information to this page. The hash of this block is 00000000000000001e8d6829a8a21adc5d38d0a473b144b6765798e61f98bd1d, which matches the general criteria of "starting with lots of zeros". One small issue is that bitcoin displays its hashes "backwards" by pairs, compared to Python (and many other implementations). This is known as the endianness of the output, and is a topic for another day. The same hash in a "Python friendly format" is 1dbd981fe6985776b644b173a4d0385ddc1aa2a829688d1e0000000000000000.

To convert between the two hexadecimal string formats, here is a useful function.

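def swap_endianess(h):
    if h[:2] == "0x":
        h = h[2:]  # If the hash starts with "0x", remove it
    # Walk backwards through the string two characters (one byte) at a time.
    # The input is expected to have an even number of hexadecimal digits.
    return "".join(h[i:i+2] for i in range(len(h) - 2, -1, -2))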

If you aren't familiar with Python, this code splits the string into chunks of length two, reverses the order, and joins them back together. You can call it like this:

new_hash = swap_endianess("00000000000000001e8d6829a8a21adc5d38d0a473b144b6765798e61f98bd1d")
print(new_hash)
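As a cross-check, the same swap can be done by converting the text to raw bytes, reversing them, and converting back. This is a sketch of an alternative, not a replacement for the function above; swap_endianness_bytes is a hypothetical helper name.

```python
def swap_endianness_bytes(hex_string):
    """Reverse the byte order of a hex string (hypothetical helper for comparison)."""
    if hex_string.startswith("0x"):
        hex_string = hex_string[2:]
    raw = bytes.fromhex(hex_string)   # two hex characters become one byte
    return raw[::-1].hex()            # reverse the bytes, then back to hex text

block_hash = "00000000000000001e8d6829a8a21adc5d38d0a473b144b6765798e61f98bd1d"
swapped = swap_endianness_bytes(block_hash)
print(swapped)

# Swapping twice should give back the original string
print(swap_endianness_bytes(swapped) == block_hash)
```

Working on bytes rather than on slices of text makes it explicit that the unit being reversed is a byte, not a single hexadecimal digit.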

Finally, remember what we said about bytes objects? If you actually want to use this in our later code, i.e. to compute the hash-of-the-hash, you need to convert to a bytes object.

g = new_hash.encode("ascii")
print(g)

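g is a bytes object (try print(type(g))), so it can be passed to a hashing function (strings cannot), like this. Note that this hashes the ASCII characters of the hexadecimal text, not the raw bytes they stand for; we will convert to raw bytes with binascii.unhexlify when we verify the block.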

import hashlib

h = hashlib.sha256(g)
print(h.hexdigest())
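It is worth seeing that hashing hexadecimal text and hashing the raw bytes it stands for are different operations. The sketch below compares the two; the names as_text and as_raw are only for illustration.

```python
import hashlib

hex_text = "1dbd981fe6985776b644b173a4d0385ddc1aa2a829688d1e0000000000000000"

as_text = hex_text.encode("ascii")   # 64 bytes: one per hexadecimal character
as_raw = bytes.fromhex(hex_text)     # 32 bytes: one per pair of characters
print(len(as_text), len(as_raw))

text_digest = hashlib.sha256(as_text).hexdigest()
raw_digest = hashlib.sha256(as_raw).hexdigest()
print(text_digest == raw_digest)     # the inputs differ, so the digests differ
```

Block verification hashes the raw bytes, which is why the hexadecimal text has to be converted before the final hashing step.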

Verifying a Block

Now that the necessary Python is out of the way, let's have a look at how we can use those functions and concepts to verify a block from top-level information. You can see information about the schema on bitcoin.org, but note that it may change in the future. Bitcoin is consensus, not formality, and if the general consensus decides to change, then the format changes.

The linked site specifies the data needed, and the format it's in. The main pieces of the puzzle are these:

  • Version number, as a 32 bit (4 byte) value
  • Previous block hash, as a 32 byte value
  • Merkle root hash, as a 32 byte value
  • Time, as a 4 byte number
  • nBits, the target threshold for the block as a 4 byte number
  • nonce, the value that needs to be mined as a 4 byte number
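Adding up the sizes in the list above tells us how long the finished header should be, which is a handy sanity check later on. A minimal sketch, with field_sizes as an illustrative name:

```python
# Field sizes in bytes, in the order they appear in the header
field_sizes = {
    "version": 4,
    "previous_block_hash": 32,
    "merkle_root": 32,
    "time": 4,
    "bits": 4,
    "nonce": 4,
}

header_size = sum(field_sizes.values())
print(header_size)        # total header length in bytes
print(header_size * 2)    # length of the same header written as hexadecimal text
```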

Let's grab the relevant information from the block and then work on changing the format, one-by-one, to the format needed to verify this block's hash.

version = 1
previous_block_hash = "00000000000008a3a41b85b8b29ad444def299fee21793cd8b9e567eab02cd81"
merkle_root = "2b12fcf1b09288fcaff797d71e950e71ae42b91e8bdb2304758dfcffc2b620e3"
time = "2011-05-21 17:26:31"
bits = 440711666
nonce = 2504433986

First, we convert the version number to a four byte hexadecimal number. If we just use the hex function, it will not pad it out to four bytes:

hex(1)  # returns "0x1"

Instead, we can format it ourselves. The width of 10 covers the "0x" prefix plus eight hexadecimal digits, which is four bytes:

version = "{0:#0{1}x}".format(version,10)

Then, we use our function to change the endianness:

# Before: 0x00000001
version = swap_endianess(version)
# After: 01000000
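Python integers also have a built-in to_bytes method that handles the padding and the byte order in one step. This is a sketch of an alternative route to the same four-byte value; version_hex is an illustrative name.

```python
version = 1

# Four bytes, least significant byte first (little-endian)
version_hex = version.to_bytes(4, byteorder="little").hex()
print(version_hex)
```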

Next, we convert the two hash values to the required bitcoin format, by swapping the endianness:

previous_block_hash = swap_endianess(previous_block_hash)
merkle_root = swap_endianess(merkle_root)

The next value is the time, which is a little tricky. The string in time is just a human-readable representation, but we need the timestamp, which is the number of seconds since January 1st, 1970. Timestamps are useful, as they let us represent dates and times as a single integer. This code parses the string into a datetime object:

# dateutil is a third-party package (installed as python-dateutil)
from dateutil import parser
datetime_object = parser.parse(time)


This code converts it into a timestamp by working out how many seconds have passed since the 1st of January, 1970 (GMT/UTC):

from datetime import datetime
timestamp_as_int = int((datetime_object - datetime(1970, 1, 1)).total_seconds())
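If you prefer to stay within the standard library, the same conversion can be sketched with datetime.strptime. This assumes, as above, that the block time shown is in UTC; time_string and parsed are illustrative names.

```python
from datetime import datetime, timezone

time_string = "2011-05-21 17:26:31"
parsed = datetime.strptime(time_string, "%Y-%m-%d %H:%M:%S")

# Mark the value as UTC explicitly so the result does not depend on the local time zone
timestamp_as_int = int(parsed.replace(tzinfo=timezone.utc).timestamp())
print(timestamp_as_int)
print(hex(timestamp_as_int))
```

Attaching the time zone explicitly avoids a common pitfall: calling timestamp() on a datetime without one interprets it as local time.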

We then convert to hexadecimal and swap the endianness, as we did with the version number:

time = swap_endianess("{0:#0{1}x}".format(timestamp_as_int,10))

The last two pieces of required data are just numbers, and we now know how to convert those:

bits = swap_endianess("{0:#0{1}x}".format(bits,10))
nonce = swap_endianess("{0:#0{1}x}".format(nonce,10))

Each of these now needs to be encoded as a bytes object, and then concatenated (joined) together to form one long bytes object.

version = version.encode("ascii")
previous_block_hash = previous_block_hash.encode("ascii")
merkle_root = merkle_root.encode("ascii")
time = time.encode("ascii")
bits = bits.encode("ascii")
nonce = nonce.encode("ascii")

header_as_bytes = version + previous_block_hash + merkle_root + time + bits + nonce
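As a cross-check on the string-based approach, the same header can be assembled directly as raw bytes with the standard struct module. This is a sketch rather than the lesson's method; the timestamp value is the UTC conversion of the block time listed earlier, and the variable names are illustrative.

```python
import struct

version = 1
previous_block_hash = "00000000000008a3a41b85b8b29ad444def299fee21793cd8b9e567eab02cd81"
merkle_root = "2b12fcf1b09288fcaff797d71e950e71ae42b91e8bdb2304758dfcffc2b620e3"
timestamp = 1305998791   # 2011-05-21 17:26:31, assumed to be UTC
bits = 440711666
nonce = 2504433986

# "<" = little-endian, "L" = 4-byte unsigned integer, "32s" = 32 raw bytes
header = struct.pack(
    "<L32s32sLLL",
    version,
    bytes.fromhex(previous_block_hash)[::-1],   # reverse the displayed byte order
    bytes.fromhex(merkle_root)[::-1],
    timestamp,
    bits,
    nonce,
)

print(len(header))     # header length in bytes
print(header.hex())    # the header as hexadecimal text
```

The hexadecimal text printed here should match the concatenated header built from strings above, character for character.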

Finally, we take all this data, convert it from hexadecimal text to raw bytes, and then compute the SHA-256 hash twice, which gives us:

import binascii
import hashlib

block = binascii.unhexlify(header_as_bytes)

round1 = hashlib.sha256(block).digest()
round2 = hashlib.sha256(round1).hexdigest()
verify = swap_endianess(round2)

We can then verify that the result is the same:

assert verify == "00000000000000001e8d6829a8a21adc5d38d0a473b144b6765798e61f98bd1d"

We are done! If you got an AssertionError, check your code and try again. If nothing bad happened, then the hashes match, and you have verified a block!
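One further check you may want to try: "starting with lots of zeros" can be made precise by comparing the hash against the target threshold stored in nBits. The sketch below assumes the usual compact encoding, where the top byte is an exponent and the lower three bytes are a mantissa; it ignores the sign bit of the mantissa, which is not set for this block. The variable names are illustrative.

```python
block_hash = "00000000000000001e8d6829a8a21adc5d38d0a473b144b6765798e61f98bd1d"
bits = 440711666                    # 0x1a44b9f2

exponent = bits >> 24               # top byte: length of the target in bytes
mantissa = bits & 0x00FFFFFF        # low three bytes: leading digits of the target
target = mantissa * 256 ** (exponent - 3)

hash_as_int = int(block_hash, 16)   # the hash as displayed, read as one big number
meets_target = hash_as_int <= target

print("{:064x}".format(target))     # the target, padded to 64 hexadecimal digits
print(meets_target)
```

Printing the target next to the hash shows why the hash needs its leading zeros: a valid hash, read as a number, must not exceed the target.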

