We covered the basics of Blockchain in our previous post. A blockchain is a distributed digital ledger in which the data in each block is immutable and the blocks are ordered chronologically, each one linked to the block before it. Today the concept is applied not just to hundreds of cryptocurrencies but also to many use cases in the banking and finance industry, each with its own implementation. In fact, Blockchain has been extended to many other industries: supply chain management, real estate and retail, to name a few. In my opinion, one should always start with Bitcoin to learn a practical implementation of Blockchain. Bitcoin is the first implementation, and it has been time-tested by millions of users. Most other blockchain implementations are still at an early or proof-of-concept stage.
Bitcoin’s Block Structure
To learn more about Bitcoin, read one of my initial posts. Every block in the Bitcoin network has exactly the same structure, as shown in the above diagram. Each newly created block is ‘chained’ to the last added block of the blockchain by storing that block's digital fingerprint. Let us examine the fields of a block:
- Magic number (4 bytes): An identifier for the Blockchain network. It has a constant value of 0xD9B4BEF9 and indicates a) the start of the block and b) that the data is from the production network. You can read more on this concept on the wiki.
- Block size (4 bytes): Indicates how large the block is. From the very beginning until today (Dec 2016), each block has been limited to a maximum of 1 MB. However, a proposal to raise the limit to 2 MB might soon gain the consensus of the core development team (who can change protocol rules). A 4-byte field can represent sizes in the gigabyte range, so the field itself leaves room for future growth.
- Version (4 bytes): Indicates the block version, i.e. which set of protocol rules the block follows. Nodes running the Bitcoin protocol must understand this version in order to validate the block.
- Previous block hash (32 bytes): A digital fingerprint (hash) of the block header of the previous (last added) block of the blockchain. It is calculated by concatenating all the header fields (version, nonce, etc.), with each field serialized in little-endian byte order, and applying the SHA-256 cryptographic hash function twice. You can check the technical details on the Bitcoin wiki.
- Merkle Root (32 bytes): We will see this in the next section of the blog post.
- Timestamp (4 bytes), Difficulty Target (4 bytes), Nonce (4 bytes): We will see them in the next article on Bitcoin mining.
The block header is composed of the fields from Version to Nonce.
- Transaction Counter (variable: 1–9 bytes): The count of transactions included in the block.
- Transaction List (variable; the total block size is limited to 1 MB): Stores the transactions included in that block. Each individual transaction has its own structure, but we will cover Bitcoin transactions as a separate topic in the future.
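The 1–9 byte width of the transaction counter comes from a variable-length integer encoding. The following is a minimal sketch of how such a value could be decoded, assuming the prefix-byte scheme commonly documented for Bitcoin (values below 0xFD fit in one byte; the prefixes 0xFD, 0xFE and 0xFF announce 2, 4 and 8 further bytes). The function name `read_compact_size` is hypothetical and used only for illustration.

```python
import struct


def read_compact_size(data: bytes, offset: int = 0):
    """Decode a variable-length integer starting at `offset`.

    Returns a tuple (value, bytes_consumed). Assumes the common scheme:
    a prefix byte below 0xFD is the value itself, otherwise the prefix
    says how many little-endian bytes follow.
    """
    first = data[offset]
    if first < 0xFD:
        return first, 1
    if first == 0xFD:  # 2-byte unsigned integer follows
        return struct.unpack_from("<H", data, offset + 1)[0], 3
    if first == 0xFE:  # 4-byte unsigned integer follows
        return struct.unpack_from("<I", data, offset + 1)[0], 5
    # first == 0xFF: 8-byte unsigned integer follows
    return struct.unpack_from("<Q", data, offset + 1)[0], 9


# A block with a single transaction needs only one byte for the counter.
print(read_compact_size(bytes([0x01])))
# A block with 515 transactions needs three bytes: prefix 0xFD plus 0x0203.
print(read_compact_size(bytes.fromhex("fd0302")))
```

Most blocks hold fewer than 65,536 transactions, so in practice the counter tends to occupy 1 or 3 bytes.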
You can check these fields in human-readable format for a random block (#442424, where 442424 is the height of the block, i.e. the count of blocks since the first block was created) and for the genesis block (i.e. the first block that was mined).
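To make the header layout concrete, here is a rough sketch that assembles the 80-byte header (Version to Nonce) and hashes it twice with SHA-256. The field values are the widely published ones for the genesis block and are not taken from this post, so treat them as assumptions to verify against a block explorer. The helper names `build_header` and `block_hash` are hypothetical.

```python
import hashlib
import struct


def double_sha256(data: bytes) -> bytes:
    """Apply SHA-256 twice, as described for the block hash."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def build_header(version, prev_hash_hex, merkle_root_hex, timestamp, bits, nonce):
    """Serialize the six header fields in little-endian byte order.

    Hashes are usually displayed in reversed byte order, so the hex
    strings are reversed before being placed in the header.
    """
    return (
        struct.pack("<I", version)
        + bytes.fromhex(prev_hash_hex)[::-1]
        + bytes.fromhex(merkle_root_hex)[::-1]
        + struct.pack("<III", timestamp, bits, nonce)
    )


def block_hash(header: bytes) -> str:
    """Return the block hash in the reversed hex form that explorers display."""
    return double_sha256(header)[::-1].hex()


# Assumed values: the commonly published genesis block fields.
genesis_header = build_header(
    version=1,
    prev_hash_hex="00" * 32,  # no previous block
    merkle_root_hex="4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b",
    timestamp=1231006505,
    bits=0x1D00FFFF,  # difficulty target in compact form
    nonce=2083236893,
)

print(len(genesis_header))         # header length in bytes
print(block_hash(genesis_header))  # compare with a block explorer
```

If the assumed field values are right, the printed hash should match the genesis block hash shown by a block explorer. That same value is what the next block stores in its ‘previous block hash’ field.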
Merkle Root
As we saw in the section above, each block contains a list of transactions along with a summary of them. Once the block is part of the blockchain it is an immutable record, i.e. each transaction entry in it is permanent. It also means that if a transaction is present in one block, it will not be present in any other block of the blockchain. The transactions are summarized as a Merkle tree, or binary hash tree, a data structure widely used in computer science.
The root of the tree is the topmost node, so the tree is drawn upside down. The bottommost nodes are called leaf nodes. Each leaf node is simply a cryptographic hash of a transaction, and each node above the leaves is a hash of its two children. In the above diagram, the hashes of Transactions A, B, C and C form the leaves of the tree.
You might be wondering why Transaction C is repeated. Two consecutive nodes form one parent node, so the number of nodes at each level must be even; if it is not, the last node is duplicated.
The Merkle tree does not contain the transactions themselves; rather, it contains hashes (digital fingerprints) of all the transactions, arranged as a tree.
Hash of Transaction A = Hash[Tx(A)] = SHA256(SHA256(Transaction A))
Each hash is calculated by applying the SHA256 algorithm twice.
A parent node, Hash(AB), is constructed similarly: the 32-byte Hash[Tx(A)] and the 32-byte Hash[Tx(B)] are concatenated into a 64-byte string, and SHA256 is then applied twice to give the 32-byte Hash(AB).
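The construction described above can be sketched in a few lines. This is an illustrative sketch, not Bitcoin's actual code: the transactions are placeholder byte strings rather than real serialized transactions, and the function name `merkle_root` is hypothetical.

```python
import hashlib
from typing import List


def double_sha256(data: bytes) -> bytes:
    """Apply SHA-256 twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(tx_hashes: List[bytes]) -> bytes:
    """Reduce a list of 32-byte transaction hashes to a single root hash."""
    if not tx_hashes:
        raise ValueError("at least one transaction hash is required")
    level = list(tx_hashes)
    while len(level) > 1:
        # An odd number of nodes: duplicate the last one so every node has a pair.
        if len(level) % 2 == 1:
            level.append(level[-1])
        # Concatenate each pair (64 bytes) and hash twice to get the parent.
        level = [
            double_sha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


# Placeholder transactions A, B and C; C is duplicated inside merkle_root.
transactions = [b"Transaction A", b"Transaction B", b"Transaction C"]
leaves = [double_sha256(tx) for tx in transactions]
root = merkle_root(leaves)
print(root.hex())
```

With three transactions the leaves become A, B, C, C, matching the diagram: Hash(AB) and Hash(CC) are computed first, and the root is the double hash of those two values concatenated.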
This concept extends to any number of transactions. The biggest advantage is that it is easy and highly efficient to determine whether a particular transaction has been included in a block, since the block contains the Merkle root, which is a digital fingerprint of all the transactions contained in it.
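One way to see why the inclusion check is efficient: proving that a transaction belongs to a block only requires the sibling hashes along the path from its leaf to the root, not every transaction. The sketch below illustrates the idea under simplifying assumptions (placeholder transactions, and the hypothetical helper names `merkle_proof` and `verify_proof`); it is not the exact format Bitcoin nodes exchange.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """Apply SHA-256 twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_proof(leaves, index):
    """Return (root, proof) for the leaf at `index`.

    The proof is a list of (sibling_hash, sibling_is_left) pairs,
    one pair per level of the tree.
    """
    level = list(leaves)
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node if the count is odd
        sibling = index ^ 1          # the neighbour within the same pair
        proof.append((level[sibling], sibling < index))
        level = [
            double_sha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
        index //= 2                  # position of the parent in the next level
    return level[0], proof


def verify_proof(leaf, proof, root):
    """Recompute the path from the leaf upward and compare with the root."""
    current = leaf
    for sibling, sibling_is_left in proof:
        pair = sibling + current if sibling_is_left else current + sibling
        current = double_sha256(pair)
    return current == root


leaves = [double_sha256(tx) for tx in (b"Transaction A", b"Transaction B", b"Transaction C")]
root, proof = merkle_proof(leaves, 2)  # prove that Transaction C is included
print(len(proof))
print(verify_proof(leaves[2], proof, root))
print(verify_proof(double_sha256(b"Transaction X"), proof, root))
```

The proof grows with the height of the tree rather than with the number of transactions, so even a block with thousands of transactions needs only a dozen or so hashes per proof.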
What is SHA256?
SHA stands for Secure Hash Algorithm. It is used to prove data integrity. The same input will always produce exactly the same output. This output is always 256 bits (32 bytes) long, regardless of the length of the input (even if the input is millions of bytes).
As an analogy, consider this. We know that 2 atoms of hydrogen and 1 atom of oxygen always give one molecule of water. Now assume that, from different chemical processes, we obtain 2 atoms of hydrogen and 1 atom of oxygen separately. To prove that we have indeed obtained the expected result, we just have to combine the atoms: if we get 1 molecule of water, our results were right.
Any change in the input will result in a different output. In practice, it is computationally infeasible to find two different inputs that produce the same output.
However, the inputs cannot feasibly be determined from the output. That is why SHA256 is considered highly secure. As an example, suppose you have 3 jars of paint: red, blue and green. Combining them can create countless colors. However, if you are given a single jar whose color was formed by mixing the 3 jars, there is no way to find out the exact proportions of red, blue and green just by looking at it.
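These properties are easy to observe with Python's standard `hashlib` module. The snippet below is a small demonstration only; the helper name `sha256_hex` is hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


short_digest = sha256_hex(b"abc")
long_digest = sha256_hex(b"a" * 1_000_000)  # a million bytes of input

# Fixed output length: both digests are 32 bytes (64 hex characters).
print(len(bytes.fromhex(short_digest)), len(bytes.fromhex(long_digest)))

# Deterministic: hashing the same input again gives the same digest.
print(sha256_hex(b"abc") == short_digest)

# Sensitive to change: altering one character gives a different digest.
print(sha256_hex(b"abd") == short_digest)
```

The one-way property cannot be demonstrated by running code, since it is a statement about what no known method can do efficiently: there is no practical way to go from a digest back to its input other than guessing inputs and hashing them.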
Bitcoin’s Blockchain
I have been stressing how Bitcoin implements Blockchain. The reason is that the concepts of Blockchain can be implemented in different ways. Blockchain implementations can be categorized by many parameters, including the consensus mechanism (i.e. how non-trusted peers validate each other), public versus private, whether mining is involved, permission (whether anyone can join) and authority (whether all peers have the same level of authority).
Bitcoin has its own implementation method.
We have already covered the structure of each individual block in the Bitcoin blockchain. The very first block created is called the genesis block. You can find its details here. Its ‘previous block hash’ field is a 32-byte string of zeros, and it contains just 1 transaction. As blocks continue to be added one by one, the transaction list per block grows until blocks reach the maximum size of 1 MB. From then on, blocks contain a varying number of transactions, but the block size is approximately 1 MB. Every time a new block is generated, it is appended to the last added block of the blockchain. Of course, the crucial step of ‘mining’ is still pending, and we will cover it in our next blog.
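The chaining itself can be illustrated with a toy model. This sketch deliberately does not use the real Bitcoin block format: `ToyBlock` is a hypothetical class holding only a previous-block hash and an arbitrary payload, which is enough to show why changing an old block breaks every link after it.

```python
import hashlib
from dataclasses import dataclass


def double_sha256(data: bytes) -> bytes:
    """Apply SHA-256 twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


@dataclass
class ToyBlock:
    prev_hash: bytes  # fingerprint of the previous block (32 zero bytes for genesis)
    payload: bytes    # stand-in for the rest of the block's contents

    def hash(self) -> bytes:
        return double_sha256(self.prev_hash + self.payload)


def build_chain(payloads):
    """Create blocks one by one, each storing the hash of the block before it."""
    chain = []
    prev = bytes(32)  # the genesis block points at 32 bytes of zeros
    for payload in payloads:
        block = ToyBlock(prev, payload)
        chain.append(block)
        prev = block.hash()
    return chain


def is_valid_chain(chain) -> bool:
    """Check that every block really points at the hash of its predecessor."""
    expected_prev = bytes(32)
    for block in chain:
        if block.prev_hash != expected_prev:
            return False
        expected_prev = block.hash()
    return True


chain = build_chain([b"genesis", b"block 1", b"block 2"])
print(is_valid_chain(chain))     # the freshly built chain links up

chain[1].payload = b"tampered"   # modify a block in the middle
print(is_valid_chain(chain))     # block 2 no longer points at block 1's hash
```

In the real network, repairing the broken links would mean redoing the mining work for the tampered block and every block after it, which is what makes old records effectively permanent.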
Forking
There always exists one and only one path from the last added block back to the genesis block (first block) in the network. In the forward direction, from the genesis block onward, multiple paths may exist, but only 1 of them is valid. When 2 blocks are created at around the same time, only 1 is eventually accepted by the network. In the above figure, the block with height = 2 has 2 children, but finally only 1 is accepted. So a block always has exactly 1 parent but may temporarily have multiple children. Eventually, the ‘orphan’ blocks are identified and their transactions are picked up for inclusion in another block, so each transaction gets a chance to be included in the blockchain.
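The parent and child relationships can be modelled with a simple lookup table. The block labels below are hypothetical, chosen so that the block at height 2 has two children as in the figure. Picking the longest path as the winner is a simplification for illustration: Bitcoin nodes actually prefer the chain with the most accumulated proof-of-work.

```python
from typing import Dict, List, Optional

# Each block maps to its single parent; the genesis block has none.
parents: Dict[str, Optional[str]] = {
    "G": None,   # genesis, height 0
    "1": "G",    # height 1
    "2": "1",    # height 2
    "3a": "2",   # height 3, first child of block 2
    "3b": "2",   # height 3, competing child of block 2
    "4a": "3a",  # height 4, built on top of 3a
}


def path_to_genesis(tip: str) -> List[str]:
    """Follow parent links from a block back to genesis; the path is unique."""
    path = []
    current: Optional[str] = tip
    while current is not None:
        path.append(current)
        current = parents[current]
    return path


def children_of(block: str) -> List[str]:
    """A block may temporarily have several children."""
    return sorted(child for child, parent in parents.items() if parent == block)


def chain_tips() -> List[str]:
    """Blocks that no other block has built on yet."""
    with_children = {p for p in parents.values() if p is not None}
    return [block for block in parents if block not in with_children]


# Simplification: treat the tip with the longest path as the accepted chain.
best_tip = max(chain_tips(), key=lambda tip: len(path_to_genesis(tip)))
accepted = path_to_genesis(best_tip)
left_out = sorted(block for block in parents if block not in accepted)

print(children_of("2"))
print(accepted)
print(left_out)
```

Here block 3b is left out of the accepted chain, so any transactions that appear only in 3b would return to the pool of pending transactions and be eligible for a later block.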
Prakash M
BTC:1LpjDuu3jzJivUX8ejNJXHaxPzdhKBWDZD
ETH:0x5eb2EBc0DD9b2570Fea8e11a7d3fdD166FbfD0D0
BCH:qrvhztuwfw8eql4xswkfh7w6m32mamhdgu2ndflv2a