Blockchain – Principles and Practices
-
Introduction
Course Introduction
Hi. My name is Stephen Haunts, and welcome to my course, Blockchain - Principles and Practices, here at Pluralsight. In this course, we are going to look at the underlying principles and practices that are employed to create a blockchain. We are not going to look at a particular implementation like Ethereum, but we are going to look beneath the covers at the algorithms and data structures that make up a blockchain. This course is broken up as follows. In this module, I'm going to do a quick high-level overview of what blockchain is all about. Then, I'm going to look at some of the cryptographic primitives that we need to implement a blockchain. Don't worry if that sounds scary. The level of cryptographic knowledge needed to implement a blockchain is actually very easy to understand, as you will see. Then I'm going to talk about how we store transactions in a block. We'll build this up slowly in our case study, and I'll show you lots of code demos. Next, we're going to talk about a technique called proof of work. This aims to solve a problem called the Byzantine Generals' problem. Then in the final module of this course, we'll talk about how nodes in a network can maintain consensus between each other. I will then wrap up this course with a summary of what we've achieved and a recap of the key principles and practices discussed. The blockchain is getting a lot of attention at the moment, and I believe that as developers, when you are looking to use a new technology, it's important to have a good grasp of how the underlying technology works. You can then use this information to create your own implementation, or you can use a third-party implementation. This course is primarily aimed at developers, but it is not limited to developers only. Whilst this is a technical course, I'll be talking about the principles of blockchain by fully explaining them first and then looking at the code demos. 
If you are not interested in the code demos, then you can skip those videos and you will still be able to follow along, as I explain all the principles and practices up front. While discussing how a blockchain works, I'm going to use a fictional company called Globomantics in my examples. As part of their digital transformation, they want to use a blockchain to record details of settlement payments to their claimants. Globomantics wants to roll this out in their motor vehicle division first, so their implementation will include recording payments to claimants for repairs of damaged cars, stolen cars, and complete settlements when the car is written off in an accident. Globomantics have joined a consortium of insurance companies who are planning on writing their payments onto a blockchain, which means every business can validate and verify each other's payment details while still maintaining a level of anonymity with the data. This is what is known as a private consortium blockchain, and we'll discuss that in more detail in a moment. Before we dive into a deeper discussion about the principles and practices of the blockchain, let's first look at what a blockchain is and some of its use cases.
-
Thinking About Trust
So, what is a blockchain? Let's look at some different definitions from different sources. The first is by Don and Alex Tapscott, who are the authors of the book Blockchain Revolution. The blockchain is an incorruptible digital ledger of economic transactions that can be programmed to record, not just financial transactions, but virtually everything of value. Now let's consult our good friend Wikipedia. A blockchain is a continuously growing list of records called blocks, which are linked and secured using cryptography. These are two perfectly good definitions, and highlighting certain words helps build your mental model. First of all, we have digital ledger. So we know this is a ledger, like in an accounting system, that is made up of a series of transactions. Next, we have continuously growing list of records. So we know this is a ledger of transactions in an ever-growing list. Hopefully this is forming a good picture in your head. Next, we have the word incorruptible. So we have a continuously growing ledger of transactions that cannot be corrupted. Next, we can also see that these transactions are linked together in some way using cryptography. So we can see that a blockchain is an incorruptible, digital ledger that is a continuously growing list of records, linked and secured using cryptography. These definitions go a long way to covering what a blockchain is. Of course, these are quite simplistic as there is much more to it, but fundamentally, we have a good idea. A continuously growing list of transactions that are linked together and are incorruptible. This sounds quite appealing, doesn't it? But why do we need this? Well, this comes down to one word, trust. One of the big benefits of blockchains is to get trust on the internet by using decentralization. But what do we mean by this? Let's look at an example from the physical world. 
Most people have a bank account, and a bank can be seen as a large, monolithic institution that you trust to handle your money and transactions. You may be thinking, what is the big deal? I've been with my bank for years and not had any problems. But that's not really the issue. The thing we are talking about here is that as individuals, we are putting a lot of trust into an organization to manage our money. That is also a huge burden for the bank in question. They have to keep your data safe and secure to help maintain that trust. Because you are putting your trust in one organization, what is stopping them from accidentally losing some of your money, as we saw with the collapse of the Icelandic banking system, or someone fraudulently changing or writing transactions on their systems? These are all threats that the banking system has to deal with, and as consumers we trust them to get it right. But they don't always get it right, and things frequently do go wrong. Banks have also proven to be single points of failure when they have large IT failures, as has been shown in the past where deployments have taken down transaction processing systems. I was personally affected by that when the bank that I use had a deployment problem after outsourcing their software development, and we couldn't access our accounts or transactions for nearly 7 days. That shows just how reliant we are on these big monolithic organizations. If we take this concept to a higher level than just banks and look at traditional fiat currencies like the US dollar, the Euro, or the British pound, these again are very centralized concepts as they are controlled and regulated by governments. Generally this might not pose too much of a problem, but look at what happens with countries like Greece, where the economy ran into severe trouble and the government put caps or controls on the banks and stopped people withdrawing their own money. 
So as we have seen, we are used to centralized control from organizations, but there is another option, and that is to be decentralized. With that, let's take a very quick look at Bitcoin.
-
Bitcoin
Bitcoin is a form of digital currency that is created and held entirely electronically instead of being printed currency. Unlike banks and traditional fiat currencies, no one controls or runs Bitcoin. The Bitcoin algorithms control the rate at which new Bitcoins are introduced, and people are incentivized to help maintain and mine new coins by being paid themselves in Bitcoin. Bitcoin is the first in a new category of currencies called cryptocurrencies. Bitcoins can be spent electronically for both digital and physical goods. There are even vending machines that will let you purchase goods by using your mobile phone to pay for products with the Bitcoin that you own. But the most important feature of Bitcoin is that it is entirely decentralized. No one company or government owns or has any control over Bitcoin. So who created Bitcoin? This was a software developer going by the name of Satoshi Nakamoto. This is an assumed identity and today nobody really knows who he is, although there have been many claims. There's also a theory that Nakamoto is a group of people, but at the moment we just don't know. This may change though. Nakamoto originally published a paper in 2008 that described a currency independent of any centralized authority, where the money could be transferred and traded digitally, virtually instantly. Although Bitcoin is a digital currency, the underlying technology and data structures used are what we know of as a blockchain, and it is this data structure that we are going to explore in more detail in this course. If you wish to learn more about the cryptocurrency Bitcoin and how to use it, then I recommend the course, Introduction to Bitcoin and Decentralized Technology by Scott Driscoll.
-
Introducing Blockchain
So we have just introduced the term blockchain whilst discussing Bitcoin. Before we proceed with the rest of the course, I want to do a high-level introduction to what blockchain is, the different types that can be used, and a few typical applications. This will help you build up a mental model before we go more technical and start looking at implementation. Blockchain has often been talked about as the next revolution in computing, and it's redefining how we interact and transact on the internet. This may sound very grandiose, but as we discuss what a blockchain is and where it can be used, you'll start to see why these claims make sense. At the center of the blockchain we have a record of transactions much like a traditional accounting ledger. These transactions could be a movement of money between people or companies, or it could be any piece of information that is transactional, like the transferring of property deeds or the tracking of movement of inventory between different companies. We'll look at some more of these use cases later in this module. Data stored in a blockchain is designed to be kept in a way that makes it virtually impossible to change the data once it is in the blockchain without being detected by other users. As we discussed earlier, traditional banking transactions are verified by a central bank or authority. Blockchain applications could replace these more centralized systems with decentralized ones, where verification comes from the consensus of multiple users participating in the blockchain. So this all sounds great, but how does it work? Let's look at this at a high level now, and we'll dive into this in more detail within the rest of the course. A blockchain has to do two main things. It needs to gather data or transactions together and put them in blocks. Then those blocks need to be chained together securely using cryptography. 
When the transactions are put onto the block, we use cryptographic hashing or digital fingerprinting to link the transactions together. This means that if any part of a transaction changes, the entire block will fail verification, which will flag it to the other users. At the top of the block, we have a hash that represents all of the transactions in that block. I'm going to call this the block address. It's these block addresses that we use to chain the blocks together, because each block knows what its previous block's address is, which is how we form the chain. For a block to be entered in the chain, the person creating the block has to solve a complex mathematical puzzle when calculating the block address. This is called mining the block with proof-of-work. The puzzle is designed to be computationally expensive and has to be solved for each block in the chain. Let's say this problem takes 10 minutes to calculate. Now imagine we have 1000 blocks in our chain. So the total time spent calculating all the block addresses is 10,000 minutes, or roughly 166 hours. Now imagine someone wants to overwrite the data in the first block. They have to recalculate that block's mining puzzle, which takes 10 minutes. But because all of the other blocks are cryptographically linked, if you change a block, then you need to recalculate the hashes and addresses for the next block and so on and so on. So to change data in block 1 means that you need to spend roughly 166 hours, or nearly 7 days, recalculating the whole chain. Just imagine how long we are talking if the chain is longer and the puzzle takes longer than 10 minutes. The longer the chain, the more secure it is. When these blocks are added to the chain, a copy is sent to everyone participating in the network. This is how we establish trust because so many people have a copy. Let's say 100 people have copies of the whole chain and someone tries to change one of the blocks. 
They might add this into their chain, but there will be 99 other people who do not agree with their change. Now don't worry too much if you don't completely understand all this now. We are going to explore this in much more detail as we work through the remainder of the course.
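To make the chaining idea a little more concrete, here is a minimal sketch in C#. It is not the implementation built later in the course; the names SketchBlock, ChainSketch, Build, FirstInvalidBlock, and RemineHours are made up for illustration, each block holds a single string rather than real transactions, and there is no proof-of-work puzzle, only the hash linking and the re-mining arithmetic from the example above.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical, simplified block: one string of data instead of real transactions.
public class SketchBlock
{
    public string Data { get; set; }
    public string PreviousBlockAddress { get; set; }
    public string BlockAddress { get; set; }

    // The block address is the SHA-256 hash of the data plus the previous block's address.
    public string CalculateAddress()
    {
        using (var sha256 = SHA256.Create())
        {
            byte[] bytes = Encoding.UTF8.GetBytes(Data + PreviousBlockAddress);
            return Convert.ToBase64String(sha256.ComputeHash(bytes));
        }
    }
}

public static class ChainSketch
{
    // Builds a chain where each block stores the address of the block before it.
    public static List<SketchBlock> Build(params string[] items)
    {
        var chain = new List<SketchBlock>();
        string previous = string.Empty; // the first block has no predecessor
        foreach (string item in items)
        {
            var block = new SketchBlock { Data = item, PreviousBlockAddress = previous };
            block.BlockAddress = block.CalculateAddress();
            chain.Add(block);
            previous = block.BlockAddress;
        }
        return chain;
    }

    // Returns the index of the first block that fails verification, or -1 if the chain is intact.
    public static int FirstInvalidBlock(List<SketchBlock> chain)
    {
        string previous = string.Empty;
        for (int i = 0; i < chain.Count; i++)
        {
            if (chain[i].PreviousBlockAddress != previous ||
                chain[i].BlockAddress != chain[i].CalculateAddress())
            {
                return i;
            }
            previous = chain[i].BlockAddress;
        }
        return -1;
    }

    // Hours needed to redo the puzzle for every block, e.g. 1000 blocks at 10 minutes each.
    public static double RemineHours(int blocks, double minutesPerBlock)
    {
        return TimeSpan.FromMinutes(blocks * minutesPerBlock).TotalHours;
    }
}
```

If you build a chain with ChainSketch.Build and then change the Data of the first block, FirstInvalidBlock reports block 0. If you also recalculate that block's address, the failure simply moves to block 1, because its stored previous address no longer matches. That is the knock-on effect that forces an attacker to redo every later block, and RemineHours(1000, 10) gives the roughly 166 hours from the example.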
-
Private vs. Public Blockchain
After taking a brief look at how blockchain works, I want to talk a little bit about the difference between public and private blockchains. So far we have talked about Bitcoin as a blockchain implementation, and that is very much what we call a public blockchain. Let's look at this definition a little closer. In a public blockchain, anyone can write to the blockchain, and you have a public network of nodes all contributing to the creation and mining of blocks. Every node on the network contains a copy of the blockchain and can verify the full chain. Any internet user with a computer has the ability to set up as a node on the Bitcoin network, get a copy of the Bitcoin ledger, and start mining blocks. Public blockchains offer the best security and trust between peers because everyone can be involved in policing the system. Having so many people taking part in the network, competing to mine blocks with the proof-of-work algorithm, is very wasteful though. The complex hashing puzzles are very time consuming to solve and require a lot of electricity, which is why miners are paid in Bitcoin for creating successful blocks that are added to the chain. The overall benefit is that every transaction and generated block is completely public, and the users on each node can maintain anonymity. I think we can all agree that public blockchains offer the best level of trust and security, but this has led to nervousness with some large companies who want to adopt blockchain techniques, because they are not happy about having so much data open and in public for anyone to see or verify. The worry of having all this data public has led to the next level of blockchain, called private blockchains. There's been a fair bit of controversy around the existence of private blockchains. Blockchain purists, Bitcoin advocates, and online activists have maintained that private blockchains are not needed and that they don't offer the full anonymity and openness of the public blockchain. 
Members of different organizations and industries, like financial services and healthcare to name a few, disagree and see the benefits to maintaining an immutable ledger that is private. So exactly how does a private blockchain differ to a public one? In a private blockchain, the company will write transactions, mine, and verify blocks. This means they can be much more efficient as blocks can be mined much quicker than opening it up to a huge network of mining nodes. Whilst running a private blockchain doesn't have the same decentralized security as a public blockchain, trusting the business to maintain their own chain is no worse than trusting it with your current data and transactions. With a private blockchain, the company that owns it can also decide who can read the blockchain transactions or have the ability to verify them. This means they have control over the privacy of the data that is recorded onto the blockchain. This is very important in regulated industries such as financial services and healthcare who have very strict rules they have to adhere to around how visible certain data is. Now that we've looked at the differences between a public and private blockchain, let's look at a few general use cases for blockchains before we start diving in under the covers of this fantastic technology.
-
Blockchain Use Cases
Some people regard blockchain as enabling Web 3.0. By this, I mean that by bringing the elements of trust and decentralization onto the internet, we can open up lots of other possibilities. Before we move on to the next module, let's take a quick high-level look at some of the use cases for blockchain. The first and most well-known use for blockchain, as we have discussed previously, is digital currencies like Bitcoin. We've already touched on this in this module, but this is the most common use at the moment for blockchain technology. Another exciting use for blockchain, and one I'm particularly excited about, is electronic voting. At the moment, when you vote for a president or a prime minister, you have to go to a polling station, put a piece of paper in a box, and then trust a team of people to count the votes correctly. Even in this day and age, this is a terrible way of voting, and it can be easily rigged. Using a blockchain, votes for candidates can be registered anonymously from your personal mobile device, and then many nodes on the network can all be verifying and validating the votes so nobody can tamper with the vote data. This could completely revolutionize elections. Another exciting use is that of protecting intellectual property. Digital information is copied and redistributed with ease on the internet. This makes retaining and proving copyright quite hard. A blockchain can be used to store digital signatures of people's work along with a timestamp, which means there is an immutable proof of ownership for their work. Because we have the timestamp baked into the digital signature, we can prove when a piece of work, whether written work or an image, was placed onto the blockchain. Next, we have some great use cases for financial services companies. Anti-money laundering and know your customer compliance practices for financial services companies have a strong case for being applied to the blockchain. 
And I know a bit about this one, as I am currently, at the time of writing this course, building a blockchain-based payment system. Currently, financial companies have to perform labor intensive and expensive processes for each new customer. Know your customer costs could be slashed dramatically through cross-company client verification, where once a person has been identified, the results along with scans of the main identifying documents are placed onto the blockchain and digitally signed. This makes it easier for that person to be verified in the future, as their details are already baked onto the blockchain. By having this data on the blockchain, it can be utilized by many financial companies, and they are all contributing to the identity data, as well as verifying its integrity. Extending on from the idea of anti-money laundering and know your customer identity checking, we have more general identity management. Identity management has traditionally been a hard problem to solve on the internet. Having the ability to verify your identity is the cornerstone of banking transactions that happen online. Distributed blockchain ledgers give a much better way of proving who we are, along with the possibility to digitize personal documents. The identifying data for a person can be digitally signed and stored on a blockchain, which can be used by multiple organizations to verify people's identity. The permanent nature of the blockchain makes this data impossible to modify or forge once it's on the blockchain, which is a massive plus point. Finally, we have the recording of land registry deeds or any other type of public and official documents. Using accessible public ledgers like a blockchain can be useful for any record keeping. A good example of this is property titles and deeds. These types of documents can be victims of fraud, as well as expensive to administer. 
As properties are bought and sold, the details of the sale and transfer of ownership are permanently written onto the blockchain. This is by no means an exhaustive list of the potential applications of blockchain, but it should give you a great idea of the potential benefits. In the next module, we'll start laying some of the groundwork we need to build out our example system for Globomantics by looking at the cryptographic primitives that we are going to need for building our blockchain.
-
Understanding the Cryptographic Principles Used with Blockchain
Introduction
Hi. My name is Stephen Haunts, and welcome back to my course, Blockchain - Principles and Practices, here at Pluralsight. In this module, we're going to take a closer look at some of the cryptographic primitives you will need to build out a blockchain data structure. We're going to look at hashing, authenticated hashing, and digital signatures. These don't represent all the cryptographic primitives available in .NET, but they are the ones we need for blockchain. Cryptography is a fascinating subject, and if you want to understand this subject even further, then I recommend my Pluralsight course called Practical Cryptography in .NET. This course will go into a lot of depth about all of the cryptographic primitives available in .NET and how to use them all together. If you want to understand more about how encryption key management works, which we'll touch on later in this course, then I recommend another of my courses called Play by Play: Enterprise Data Encryption in Azure Revealed. In this course, I explain how to use the Microsoft Azure Key Vault to protect your encryption keys. Let us start off by looking at what hashing is.
-
Hashing
The process of hashing data is a very important technique used in cryptography, and it forms the backbone of what we're going to look at in regards to building up our blockchain implementation. A cryptographic hash function is an algorithm that takes an arbitrary block of data and returns a fixed-size string, the cryptographic hash value, such that any accidental or intentional change to that data will change the hash value. The data to be encoded is often called the message, and the hash value is also sometimes called the message digest, or simply the digest. The ideal cryptographic hash function has four main properties. First, it must be easy to compute the hash value for any given message. This means that with any block of data, it should be easy to run a hashing function to calculate the hash. Next, it should be infeasible to generate a message that has a given hash. This means it should be infeasible to generate some original data that will result in a predetermined hash code or digest. Next, it should be infeasible to modify a message without changing the hash. This means if you change just a single bit in the data that you want to hash, then the resulting hash code is completely different. And finally, it should be infeasible to find two different messages with the same hash. This means that if you've got two different blocks of data that you want to create a hash code for, they should not both end up with the same final hash code. When that does happen, it is referred to as a hash collision. Another way of thinking of a hash function is that of creating a unique fingerprint of a piece of data. Generating a hash digest of a block of data is very easy to do in .NET. There are various algorithms you can use, such as MD5, SHA-1, SHA-256, and SHA-512. A hash function is a one-way function. That means that once you've hashed some data, you cannot reverse it back to the original data. On the flip side to this, encryption is designed to be a two-way operation. 
Once you've encrypted some data using a key, you can then reverse your operation and decrypt the data by using the same key. When you're hashing data, the hash will be the same every time you perform the operation, unless the original data changes in some way. Even if the data only changes by one bit, the resulting hash code will be completely different. This makes hashing the perfect mechanism for checking the integrity of data. This is useful when you want to send data across a network to another recipient. Before sending the data, you calculate the hash of the original data to get its unique fingerprint. You then send the data and the hash to the recipient. They then recalculate the hash of the data they have just received, and then compare it to the hash that you sent. If the hash codes are the same, then the data has been successfully received without any data loss or corruption. If the hash codes do not match, then the data received is not the same as the data originally sent, and you shouldn't trust that data. The two most common hashing methods used are MD5 and the SHA family of hashes, SHA-1, SHA-256, and SHA-512. MD5 is generally not used much these days for new software, but it is still relevant if you are integrating with older systems that still use MD5 hashes. For this course, we will not be using it. Instead, we'll focus just on SHA-256. The SHA family is a family of cryptographic hash functions published by the National Institute of Standards and Technology, or NIST for short. The SHA family covers some different variants including SHA-1, which is a 160-bit hash function which resembles the earlier MD5 algorithm. This was designed by the National Security Agency to be part of the digital signature algorithm. Cryptographic weaknesses were discovered in SHA-1, and this standard was no longer approved for most cryptographic uses after 2010. 
Next, we have SHA-2, which is a family of two similar hash functions with different block sizes, known as SHA-256 and SHA-512. They differ in word sizes. SHA-256 uses 32-bit words, whereas SHA-512 uses 64-bit words. These versions of the SHA algorithm were also designed by the National Security Agency. Finally, we have SHA-3, which is a hash function formerly called Keccak, chosen in 2012 after a public competition amongst non-National Security Agency designers. It supports the same hash lengths as SHA-2, and its internal structure differs significantly from the rest of the SHA family. At the time of writing this course, SHA-3 is not directly supported by the .NET Framework, although third-party implementations are available. Implementing secure hashes into your applications is a very straightforward process. Let's take a look at some code to calculate the SHA-256 hash. First, we pass a byte array of our data to hash into the method. If there is a string that you wish to hash, then you'll need to convert that string into a byte array. We'll be doing this later in our blockchain implementation. Next, we create an instance of the SHA256 class by calling the static Create method. Once we have done this, we simply call ComputeHash and pass in our byte array of data to be hashed. Once we have done this, we are returned a byte array of our hashed data. What is then common to do is to convert this into a Base64 encoded string to make it human readable and easy to store in a database. You'll see examples of doing this later in the course.
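For readers following along without the video, the steps just described can be sketched roughly like this. The class name HashSketch and the method names are my own assumptions for illustration, not necessarily what appears on screen; the .NET calls themselves (SHA256.Create, ComputeHash, Encoding.UTF8.GetBytes, Convert.ToBase64String) are the standard ones. The IsIntact helper is an added illustration of the send-and-compare integrity check described earlier.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Illustrative helper; the class and method names are assumptions, not the course's code.
public static class HashSketch
{
    // Computes the SHA-256 hash of a byte array and returns the raw 32-byte digest.
    public static byte[] ComputeHashSha256(byte[] toBeHashed)
    {
        using (var sha256 = SHA256.Create())
        {
            return sha256.ComputeHash(toBeHashed);
        }
    }

    // Convenience wrapper: UTF-8 string in, Base64 encoded digest out.
    public static string ComputeHashSha256(string message)
    {
        byte[] hash = ComputeHashSha256(Encoding.UTF8.GetBytes(message));
        return Convert.ToBase64String(hash);
    }

    // Integrity check: recompute the hash of what was received and compare it
    // with the hash that the sender supplied alongside the data.
    public static bool IsIntact(string receivedMessage, string sentHash)
    {
        return ComputeHashSha256(receivedMessage) == sentHash;
    }
}
```

Calling HashSketch.ComputeHashSha256 twice on the same string gives the same Base64 value each time, while changing even one character gives a completely different value, which is what makes the IsIntact comparison meaningful.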
-
Authenticated Hashing (HMAC)
There is one step further that we can go with SHA-256 hashing, and that is to create what is called a hashed message authentication code. Fundamentally, this is the same as hashing, but is taken a step further. If you combine a one-way function such as SHA-256 with a secret cryptographic key, then you get what is called a hashed message authentication code, or an HMAC for short. Like a hash code, an HMAC is used when you want to verify the integrity of a message. An HMAC also allows you to verify the authenticity of that message, because only a person who knows the key can calculate the same hash of that message. An HMAC can be used with different hashing functions like MD5 or the SHA family of algorithms. The cryptographic strength of an HMAC depends on the size of the key that is used. In our case in this course, we are going to be using an HMAC with a 256-bit or 32-byte key. Hashed message authentication codes are used when you need to check both the integrity and the authenticity of a message. For example, consider a scenario in which you are sent a piece of data along with its hash. You can verify the integrity of the message by recomputing the hash of that message and comparing it with the original hash that you have received; however, you don't know for sure that the message and the hash were sent by someone you know or trust. If you used a hashed message authentication code, you could recompute the HMAC by using a secret key that only you and the trusted third party know, and compare it with the HMAC you just received. This serves the purpose of authentication. In the following example, I'll show you how to use an HMAC that uses a 32-byte or 256-bit key, with the HMAC based on the SHA-256 hashing algorithm. In addition to HMACSHA256, you can use HMACSHA1, HMACSHA512, and HMACMD5. The interface is all the same, but for the purpose of this course, we'll just be using the SHA-256 variant. 
On the screen now, you can see a snippet of code that calculates a SHA-256-based hashed message authentication code. Like the standard SHA-256 hash, the hashing class deals with byte arrays, but in this example, I've put in the necessary string to byte array conversions to show you what these look like. First, this method takes a string of our data to be hashed and a string representing the key. The key can be anything that you'd like, but it shouldn't be longer than 256 bits or 32 bytes in this case. Next, we use the Encoding.UTF8.GetBytes method to convert the data to be hashed and the key strings into byte arrays. Then we create an instance of the HMACSHA256 class and pass the byte array of our key into it. Next, on the HMAC object, we call ComputeHash and pass in the byte array of the message that we want to create the authenticated hash for. The result of this is also a byte array, which we convert into a Base64 encoded string using the Convert.ToBase64String method. Now that we have covered hashing and authenticated hashing, let's look at the final cryptographic primitive we need in this course, which is digital signatures.
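If you can't see the screen, the snippet being described probably looks something like the following sketch. The class name HmacSketch and the method name ComputeHmacSha256 are assumptions of mine; the calls to Encoding.UTF8.GetBytes, the HMACSHA256 constructor, ComputeHash, and Convert.ToBase64String follow the walkthrough above.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Illustrative helper; the class and method names are assumptions, not the course's code.
public static class HmacSketch
{
    // Computes an HMAC-SHA256 over the message using the given secret key,
    // returning the 32-byte result as a Base64 encoded string.
    public static string ComputeHmacSha256(string message, string key)
    {
        byte[] messageBytes = Encoding.UTF8.GetBytes(message);
        byte[] keyBytes = Encoding.UTF8.GetBytes(key);

        using (var hmac = new HMACSHA256(keyBytes))
        {
            return Convert.ToBase64String(hmac.ComputeHash(messageBytes));
        }
    }
}
```

The same message and key always produce the same value, so a recipient who shares the key can recompute the HMAC and compare it with the one received. Anyone who does not know the key will get a different value for the same message. In a real system you would normally generate the key from a cryptographic random number generator rather than typing in a string.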
-
Digital Signatures
An important function of cryptography is to ensure non-repudiation of a sent message. This is where the receiver of that message cannot deny that the message is authentic. Additional signatures is a technique used to help demonstrate that there's authenticity of the message. A valid digital signature gives the recipient a reason to believe that the message was created by a known sender, such that the sender cannot deny having sent that message. Digital signatures give you both authentication and non-repudiation. Authentication because the signatures had to be created by a user with a valid, private key, and non-repository udiation as a receiver can trust that the message was signed by a known sender, as only they know the private key. So how do digital signatures do all this? For the receiver of the message, a digital signature allows the receiver to believe that the message was sent by the correct sender. This can be thought of as the digital equivalent to a signature on a letter, except a digital signature is much harder to forge. A digital signature consists of the following three algorithms: the public and private key generation using RSA, a signing algorithm that uses the private key to create the signature, and a signature verification algorithm that uses the public key to test if the message is authentic. The private and public keys are two keys that are mathematically linked. The public key, as the name suggests, can be known by anyone, and the private key should only be known by you, the sender of the message. So therefore, it's this private key that has to be kept secret. If you want to learn more about asymmetric cryptography in private and public keys, then I highly recommend you watch my other course on Pluralsight called Practical Cryptography in .NET. For this course, we are only dealing with digital signatures instead of full asymmetric cryptography. Let's now look at an example of how a digital signature works. 
In this example, Alice is sending a message to Bob, and that message will be signed with a digital signature. First, Alice encrypts some data that she wants to send to Bob. For this example, it doesn't matter whether the data was encrypted with a symmetric or asymmetric encryption algorithm. Once this data has been encrypted, Alice takes a hash of that data. Next, Alice signs the hash with her private signing key. This creates a digital signature. Then, Alice sends the encrypted data, its hash, and the signature to Bob. First, Bob recalculates the hash of the encrypted data. Bob then verifies the digital signature using the calculated hash and Alice's public signing key. This will tell Bob whether the signature is valid or not. If it is valid, Bob can be confident that it was Alice that sent him the message in the first place, as it could only have been signed using her private signing key, which only Alice knows. If the signature was not valid, then Bob should not trust the origin and authenticity of the message. As we just mentioned, digital signature signing is based on RSA. With RSA, when we need to encrypt some data, you encrypt the data with the recipient's public key, and then the recipient decrypts it with their private key. With a digital signature, signing and verifying is the other way around. When the sender signs a message, they use their own private key, and then the recipient verifies the signature using the sender's public key. It is due to the fact that we've signed with the sender's private key that a recipient can trust that a message was sent by that sender, as it will only be them that knows that private key. Let's now take a quick look at how this is implemented in the .NET Framework. First of all, we need to generate our key pair. In this example, I'm storing keys in local memory variables. Typically, asymmetric keys are either stored in certificates or in a hardware security module, which is a hardware appliance stored in a data center.
If you want to see how to do this in a hardware security module, such as the Key Vault in Azure, then please watch my course, Enterprise Data Encryption in Azure Revealed. This code generates two keys, a public and a private key. It is the privateKey that we will use to generate our digital signature, and the publicKey that any recipient can use to validate the signature. To generate these, we use the RSACryptoServiceProvider class and call its ExportParameters method, passing in false to export the publicKey and true to export the privateKey. Next, we need to be able to generate a digital signature of some data. But in creating a digital signature, we don't calculate it on the actual piece of data itself. First of all, we calculate a hash or an authenticated hash of that data, and we then generate the signature from that hash. The reason for this is that the digital signature algorithm we are using is based on RSA, which has a limit on the amount of data that it can work on in one go. This limitation is that we cannot calculate the signature of data that is larger than the key length, which is 2048 bits in our case here. Of course, you can break your large piece of data into smaller chunks, but it really is much easier and more efficient just to create a hash of the data and use that instead. If you look at the method on the screen at the moment, you can see that we pass in a byte array of the hashed data. Then we create an instance of the RSACryptoServiceProvider class and pass our key size into the constructor. In this case, we are using a 2048-bit key. Then we need to import our private key with the ImportParameters method. This is because with a digital signature, we use the private key to calculate the signature. Once that is done, we then have to create an instance of the RSAPKCS1SignatureFormatter class. That name just rolls off the tongue, doesn't it?
Into this object we pass the instance of our RSACryptoServiceProvider class that we just created, which knows which private key to use. Next, we tell it what hashing algorithm we want to use. In this case, it's SHA-256. Now we are good to go. We call the CreateSignature method and pass in the byte array of the hash, which represents the data we want signed. The signature is returned as a byte array. Great, we have now created a digital signature. Now imagine that signature has been sent to our recipient, and they want to verify that signature. Well, they do that with the piece of code that you can see on the screen. This method takes two parameters. The first is the byte array of the hash of the original data, and the second parameter is a byte array of the signature that was generated by the previous method that we just looked at. Again, we create an instance of the RSACryptoServiceProvider class, and we pass in the key size of 2048 bits. Next, we import our publicKey. The public key can be known by anyone, which means anyone has the ability to verify the signature. Next, we create an instance of the RSAPKCS1SignatureDeformatter class, and pass in the RSA instance we just created. Notice that this time we are using RSAPKCS1SignatureDeformatter and not RSAPKCS1SignatureFormatter. Next, as before, we set the hashing algorithm to be SHA-256. Then we call VerifySignature by passing in the byte array of the original hash and the signature. This will return a Boolean. If the signature is valid, then we get true, or false otherwise. In this module, we have covered the three cryptographic primitives that are required for us to build our blockchain implementation: hashing with SHA-256; authenticated hashing using SHA-256 and a secret key; and finally, digital signatures using the RSA-based RSAPKCS1SignatureFormatter and RSAPKCS1SignatureDeformatter. These are the three building blocks that we need for the rest of this course.
As the course progresses, you will see how we use these three powerful primitives to build our essential data structures and algorithms of a simple blockchain implementation.
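The key generation, signing, and verification steps described in this section can be sketched roughly as follows. This is a reconstruction from the narration rather than the code shown on screen: the class name DigitalSignatureSketch, the method names, the field names, and the sample message are assumptions, while the .NET classes and calls are the ones named above.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public class DigitalSignatureSketch
{
    // Keys are held in memory for illustration only; see the narration on certificates and HSMs.
    private RSAParameters _publicKey;
    private RSAParameters _privateKey;

    // Generates a 2048-bit RSA key pair and exports both halves.
    public void AssignNewKey()
    {
        using (var rsa = new RSACryptoServiceProvider(2048))
        {
            _publicKey = rsa.ExportParameters(false);  // false = public parameters only
            _privateKey = rsa.ExportParameters(true);  // true = include private parameters
        }
    }

    // Signs a SHA-256 hash of the data with the private key.
    public byte[] SignData(byte[] hashOfDataToSign)
    {
        using (var rsa = new RSACryptoServiceProvider(2048))
        {
            rsa.ImportParameters(_privateKey);
            var formatter = new RSAPKCS1SignatureFormatter(rsa);
            formatter.SetHashAlgorithm("SHA256");
            return formatter.CreateSignature(hashOfDataToSign);
        }
    }

    // Verifies a signature over the hash with the public key.
    public bool VerifySignature(byte[] hashOfDataToSign, byte[] signature)
    {
        using (var rsa = new RSACryptoServiceProvider(2048))
        {
            rsa.ImportParameters(_publicKey);
            var deformatter = new RSAPKCS1SignatureDeformatter(rsa);
            deformatter.SetHashAlgorithm("SHA256");
            return deformatter.VerifySignature(hashOfDataToSign, signature);
        }
    }

    public static void Main()
    {
        var signer = new DigitalSignatureSketch();
        signer.AssignNewKey();

        byte[] hash;
        using (var sha256 = SHA256.Create())
        {
            hash = sha256.ComputeHash(Encoding.UTF8.GetBytes("Settlement agreed for claim ABC123"));
        }

        byte[] signature = signer.SignData(hash);
        Console.WriteLine("Signature valid: " + signer.VerifySignature(hash, signature));
    }
}
```

If the hash passed to VerifySignature differs from the one that was signed, even by a single bit, verification returns false, which is how a recipient detects a message that was altered after signing.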
-
Storing Transactions in Blocks
Overview
Hi, my name is Stephen Haunts, and welcome back to my course, Blockchain - Principles and Practices. In this module, we're going to start looking at our implementation of the blockchain data structure and algorithms. As the module progresses, we're going to walk through an application that will start off simple and build up over time. We're going to build our example as follows. First of all, we're going to have blocks with a single transaction. We'll create the ability to link blocks together, so that if any of the blocks are modified after they have been added to the blockchain, then the remainder of the chain will fail to verify. This will form the cornerstone of everything else that we'll build on for the rest of this module. Then we have blocks with multiple transactions, so that each block contains several different transactions instead of just one. We'll do this using a technique and a data structure called a Merkle tree. Next, we'll extend this version further by creating a transaction pool that feeds the blockchain, which in your implementation might be a message queue or a remote repository of transactions. We'll also extend the application to use authenticated hashes and calculate a digital signature for each block that is entered onto the chain. Finally, we'll talk about versioning of blocks to support the rotation of authenticated hashing keys. For each stage of our implementation, I'll first explain all the theory and work through it with diagrams. Then, I'll present a working implementation as a demo and walk you through how it works from a code perspective. If you are a software developer, then both the explanations and the code demos will be very helpful to you, as the code will help you build up your own mental model and give you something to experiment with. If you are not a software developer, but a solution designer or software or enterprise architect, then you may not necessarily be interested in watching the code demos.
If that's the case, and that's fine, you can watch the explanations in their own right and still follow along. The code demos are there to support the explanation sections for developers, but they are not required if you just want an understanding of how the blockchain data structure and algorithms work. If you are interested in the code demos, then the approach I'm going to take is to present a working solution for each stage of implementation, which I'll fully explain along with callouts and annotations on the screen. Then we'll run the demos and play around with them a few times. I thought this was a better way to learn the subject than you watching me type out the code in front of you. If you want to get the best out of these demos, then I encourage you to download the source code from Pluralsight, build the projects, and play around with them. Start by setting breakpoints and stepping through the code execution, and then add additional data to the transactions to support your use cases. Each of the sample applications is implemented as a console application, so you can just build and run them without any additional configuration. As I mentioned earlier in the course, all the sample applications have been written to support .NET Standard 2.0 and .NET Core 2.0 and higher. This means you can execute the samples on Windows, macOS, and Linux, using IDEs such as Microsoft's Visual Studio on Windows, Visual Studio for Mac, Visual Studio Code, or even the new JetBrains Rider IDE. You can, of course, also build the samples from the command line using the dotnet terminal application. Now let's revisit our scenario with Globomantics, the insurance company.
-
Overview of the Case Study
Globomantics is a fictional insurance company. As part of their digital transformation, they want to use the blockchain to record details of settlement payments to their claimants. Globomantics wants to roll this out in their Motor Vehicle Division first, so their implementation will first include the recording of payments to their claimants. This is where settlements are being made for vehicles that are being written off. This could either be for an accident where the car is beyond economical repair, or because the car has been stolen. Globomantics have joined a consortium of insurance companies, who are all planning on writing their payments onto a blockchain, which means every business can validate and verify each other's payments while still maintaining a level of anonymity with the data. Once a settlement has been agreed with the person making a claim, Globomantics wants to enter the details onto a blockchain, so there is a permanent record of that agreement embedded into that blockchain. The details they want to record are as follows. First of all, we have the claim number. This is the internal reference number at the insurance company that ties the data back to their insurance systems. Then we have the settlement amount. This is the value that Globomantics have agreed to pay out to the claimant once the settlement has been approved. Next we have the settlement date. This is the date that the settlement amount was finally agreed with the claimant. Then we have the car registration. This is the number on the registration plate for the vehicle, and this uniquely identifies that vehicle. Then we have the mileage. This is the mileage amount that was recorded on the odometer when the car was inspected after the accident. If the vehicle was stolen, then this number may just be an estimate, but it is the mileage amount that was submitted along with the claim. And then finally, we have the claim type.
For our example scenario, this will just be set to total loss. But if the system was being built out for a real insurance company, then it might include things like repairs, for when the vehicle is not being written off but is actually going to be repaired. Let's now move onto the next part of the course, where we're going to look at storing blocks in a blockchain where each block contains just a single transaction.
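As a rough illustration, the six settlement details listed above could be modeled as a simple C# class. The type and property names here, and the choice of decimal, DateTime, and int for the fields, are assumptions made for this sketch; the course's own data structure is covered in the demo later on.

```csharp
using System;

// Only total loss is used in the example scenario; other values could be added later.
public enum ClaimType
{
    TotalLoss = 0
}

// Hypothetical record of one settlement, with one property per detail in the case study.
public class ClaimSettlement
{
    public string ClaimNumber { get; set; }        // internal reference at the insurer
    public decimal SettlementAmount { get; set; }  // value agreed to pay the claimant
    public DateTime SettlementDate { get; set; }   // date the amount was agreed
    public string CarRegistration { get; set; }    // registration plate of the vehicle
    public int Mileage { get; set; }               // odometer reading, or an estimate if stolen
    public ClaimType ClaimType { get; set; }
}

public static class ClaimSettlementDemo
{
    public static void Main()
    {
        // Illustrative values only.
        var settlement = new ClaimSettlement
        {
            ClaimNumber = "ABC123",
            SettlementAmount = 1000.00m,
            SettlementDate = new DateTime(2018, 1, 15),
            CarRegistration = "QWE123",
            Mileage = 10000,
            ClaimType = ClaimType.TotalLoss
        };

        Console.WriteLine(settlement.ClaimNumber + " " + settlement.ClaimType);
    }
}
```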
-
Blocks with a Single Transaction
Now that we understand the requirements of our fictional company, let us dive in and build up our first example. This first example is a simple blockchain with a single transaction per block. To start with, we are going to keep things simple and build up a blockchain where the blocks only contain a single transaction. This means that the block will contain the details of an individual insurance claim settlement from our example scenario. The block will be split into two parts. First is the transaction details for our claim settlement. This contains the claim number, the settlement amount, the settlement date, the registration plate, the mileage, and the claim type. This represents the business domain data that we want to store on the blockchain. Every time Globomantics makes a settlement for a claim to one of their customers, those details will be entered into a block. As well as the transaction details, we also have a series of fields called the block header. These header fields do not contain any transactional details from the insurance claim, but they are needed to link the blocks together and to make them cryptographically secure. So let's go through them one by one. First there is the block number. This is the number of the block in the chain. As you write new blocks into the chain, the number increments. The first block in the chain is referred to as the genesis block. Next, we have the creation date. This is a date stamp for when the block was created. Next we have a reference to the hash of the previous block. I'll talk about this in more detail in a moment. And finally we have a reference to the next block in the chain. I've deliberately left the very first item in the block header that you can see on the screen now until the end. This is the block hash. The block hash is a unique fingerprint for all of the data that's going to be contained in the block. Let's look at how this is calculated.
To calculate the block hash, you follow a process like that shown on the screen. The transaction details, such as the claim number, settlement amount, settlement date, registration plate, mileage, and claim type, are concatenated together, along with the block number, the creation date, and the previous block hash. These concatenated fields are run through our hashing function to create our unique hash code, the result of which is then stored in the block hash field. The block hash is one of the most essential and fundamental parts of the blockchain ledger data structure. The hash function is a one-way operation, and its result becomes a unique fingerprint for the data that was passed into the hashing function. If just a single binary bit of any of the originating information in the transaction is modified, then the hash will be completely different. This property of a hashing function is important in detecting any changes to the data in a blockchain. Now that we have calculated the hash for a single block, let's look at what we need to do to link the blocks together. In the block header, we have fields called the previous block hash and the next block. The next block points to the next block in the chain. The previous block hash contains the block hash from its parent. Let's visualize this by constructing a chain. First we have block number 1. We calculate the block hash for this block and store the result in the block hash field. Because this is the first block in our chain, the previous block hash field will be empty, as there are no previous blocks. Then we create the second block and we pass in the block hash of the first block. This block hash is stored in the previous block hash. Then we calculate the block hash for block 2 as we did previously. By including the previous block hash in that second block's hash code, we are in effect cryptographically linking those two blocks together.
This will be important when we look at verifying the chain in a moment. Now let's add a third block into the chain. First of all, we create the block and pass in the transaction details that are required. Then we pass in the block hash of the most recently added block in the chain and store it in the previous block hash. Now we calculate the block hash for the new block, which as before creates a cryptographic link between the two blocks. If we go back and look at the very first block again, you will notice that we never set the previous block hash. Again, this is because as the head of the chain, there is nothing before it. In this case, the previous block hash is null. In a blockchain, the first block, or genesis block, is the only block that will have an empty previous block hash. For anyone that has taken a computer science class before, this may feel familiar, because conceptually this is pretty much the same as a doubly linked list. On the screen now you can see a textbook example of a doubly linked list. Each node in the list has two fields that need to be provided: a reference to the next node in the linked list, and a reference to the node before it. It is pretty much the same as what we have looked at, except we are using cryptographic hashes to represent the link back to the previous block. Now, when we traverse the blockchain, we'll only be going in one direction, which is to follow the next block reference. But conceptually they are very similar. Similarly, with the first node in the linked list, because it doesn't have a node before it, the previous node is null. If you are from a computer science background, then this mental model of the doubly linked list is an excellent way of understanding the basic premise of how a blockchain data structure works.
Now that we've built up a chain of blocks containing the block header and the transaction details, we want to perform a verification process on the blockchain to ensure that all the blocks have maintained their integrity, and that no data has changed after a block has been inserted into the blockchain. If, for example, we had 5 blocks in our chain, and some data in a transaction has been changed at block 2, when we verify that chain we would expect blocks 2, 3, 4, and 5 to fail verification. This is because the block hash from a block is also included in the block hash of the next block. So if that block's data changes in any way, then that block hash will change, which means the block hash of the next block will also be different, because of the inclusion of the previous block hash. Let's walk through this with our previous example by looking at the happy path case. First of all, in our first block, we recalculate the block hash, and compare this value to the version of the hash that was stored in the first block. If they match, then we know that none of the data has been changed. Then we do the same for block 2. We recalculate the block hash of block 2 and we provide it with the newly calculated block hash for block 1 that we have just calculated. If the block hash on block 1 hasn't changed, and none of the transactional data from block 2 has changed, then the newly calculated block hash will match the block hash that was stored in block 2. Then we do the same for block 3. We recalculate the block hash for block 3, and we provide it with the newly calculated hash from block 2 that we just calculated. If the hash from block 2 hasn't changed, and none of the transactional data from block 3 has changed, then the newly calculated block hash will match the hash that was stored in block 3. So we've just looked at a success case. None of the data has been changed or tampered with, so therefore we have integrity across the entire chain.
Let's now look at a failure case where the data in the first block gets changed after the block hash has been committed. This might be through data corruption or deliberate tampering with the data in the block. First of all, for block 1, we tamper with the data that has already been committed, and then recalculate the hash and compare this new version to the version stored in block 1. As the data has been tampered with and changed, the newly calculated hash will not match the hash that was committed into the block. This means we have a problem, as we have lost data integrity. This is bad enough, but let's carry on verifying the chain. So now we do the same for block 2. We recalculate the hash for block 2 and provide it with the newly calculated hash from the previous block that we just calculated. As the previous hash has changed, when we recalculate block 2's hash, it will no longer match the existing hash for that block. And as we can see, the tampering with the data in block 1 has cascaded down to block 2. Even though the transactional data from block 2 hasn't changed in any way, because the hashes have changed, we no longer have integrity. So let's follow this through and look at block 3. We recalculate the hash for block 3 and provide it with the newly calculated hash from block 2 that we just calculated. It will no longer match the existing hash for that block. So now not only has block 1 failed verification, but this has also cascaded down to blocks 2 and 3, even though their transactional data hasn't been changed in any way. This is one of the great features of the blockchain, because we are using cryptographic hashes and proofs to check the integrity of our chain. Change the data in a single block, and the rest of the chain that follows after that block will not verify. This makes tampering very easy to spot.
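The hash calculation, the linking through the previous block hash, and the verification walk described in this section can be sketched in a few lines of C#. This is a simplified illustration under stated assumptions: SimpleBlock and ChainVerifier are hypothetical names, the transaction details are collapsed into a single TransactionData string, and the chain is held in a plain list rather than through next-block references.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public class SimpleBlock
{
    public int BlockNumber { get; }
    public DateTime CreatedDate { get; set; }
    public string TransactionData { get; set; }   // stand-in for the concatenated claim fields
    public string PreviousBlockHash { get; }
    public string BlockHash { get; }

    public SimpleBlock(int blockNumber, string transactionData, DateTime createdDate, string previousBlockHash)
    {
        BlockNumber = blockNumber;
        TransactionData = transactionData;
        CreatedDate = createdDate;
        PreviousBlockHash = previousBlockHash;
        BlockHash = CalculateBlockHash(previousBlockHash);
    }

    // Concatenates transaction data and header fields, then hashes them with SHA-256.
    public string CalculateBlockHash(string previousBlockHash)
    {
        string blockData = TransactionData + BlockNumber + CreatedDate.ToString("o") + previousBlockHash;
        using (var sha256 = SHA256.Create())
        {
            return Convert.ToBase64String(sha256.ComputeHash(Encoding.UTF8.GetBytes(blockData)));
        }
    }
}

public static class ChainVerifier
{
    // Recalculates each block hash in order, feeding the newly calculated hash into the next block.
    public static List<bool> Verify(IList<SimpleBlock> chain)
    {
        var results = new List<bool>();
        string previousHash = null;               // the genesis block has no previous hash
        foreach (SimpleBlock block in chain)
        {
            string recalculated = block.CalculateBlockHash(previousHash);
            results.Add(recalculated == block.BlockHash);
            previousHash = recalculated;
        }
        return results;
    }

    public static void Main()
    {
        var created = new DateTime(2018, 1, 1, 0, 0, 0, DateTimeKind.Utc);
        var block1 = new SimpleBlock(1, "ABC123|1000.00|QWE123|10000|TotalLoss", created, null);
        var block2 = new SimpleBlock(2, "VBG345|2000.00|JKH567|20000|TotalLoss", created, block1.BlockHash);
        var block3 = new SimpleBlock(3, "XCF234|3000.00|DH23ED|30000|TotalLoss", created, block2.BlockHash);
        var chain = new List<SimpleBlock> { block1, block2, block3 };

        Console.WriteLine(string.Join(", ", Verify(chain)));   // success case

        block1.TransactionData = "ABC123|9999.00|QWE123|10000|TotalLoss";   // tamper with block 1
        Console.WriteLine(string.Join(", ", Verify(chain)));   // failure case
    }
}
```

In the first run every block verifies. After block 1 is tampered with, all three blocks fail, even though blocks 2 and 3 were never touched, because each recalculated hash feeds into the block that follows it.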
-
Demo - Blocks with a Single Transaction
Now that we have looked at how blocks are arranged, linked into chains, and verified, let's translate that into code. The code demos in this course are always in C# and conform to .NET Standard 2.0 and .NET Core 2.0 or higher. This means that all the demos work across Windows with Visual Studio and on the Mac with Visual Studio for Mac. You can also use JetBrains' new Rider IDE if you so wish. I will show you the demos in Visual Studio for Mac, but they work the same across Visual Studio for Windows and Rider. The community editions of these IDEs are perfectly suitable. In the following demo, I will present a code solution to implement what we have just talked about by creating a linked chain of blocks where each block contains one transaction. I'll explain how the code works and step through it a few times. I suggest you download the code and play around with it, as experimenting with it is the best way to learn the concepts. So let's take a look at the demo. So what I want to do first is just to look at some of the code that we've got, so in our solution here we have the BlockWithSingleTransaction. So first of all, we'll look at some of the interfaces I've got. So this is the data structure for a block, so at the top here, we can see that we have our transaction data that relates to the insurance claim settlement by Globomantics. So we have the claim number, the settlement amount, the settlement date, the car registration, the mileage, and the claim type. So these are our business domain objects or pieces of data. So then below that we have the block header data, so we have the block number, we have the block creation date, we have the block hash, and then we have the previous block hash.
And then below that we have some utility functions which our block is going to implement, so we have CalculateBlockHash, and that does as you'd expect, so when we pass in a previous block hash, it will calculate the total hash for the entire block, which will also factor in the hash of the previous block. I have a method called SetBlockHash, and this method will actually commit that block into the chain. And then we have a reference to our next block, and then we have another helper function called IsValidChain, which will help us check our chain's validity. So the next interface we have is the IBlockChain, and this is just kind of a wrapper, which helps us encapsulate the blockchain together. So in here we have a method called AcceptBlock, and this is what we'll use to accept a block into our chain, and then we have VerifyChain, which will kick start the verification process. Okay, so let's now have a quick look at some implementation code. So first of all, we have the block, and this implements our IBlock interface. So as we've just seen, we have the data that represents our transaction at the top here, and then we have the block header information. Then we have our constructor, so inside the constructor, we're taking the insurance claim data, as well as the block number. And we also have a reference to our parent block. And in the constructor we just assign that data, then at the bottom here we call SetBlockHash. What this will do is calculate the block hash, taking the parent's hash into account. So the actual method which calculates our hash is what we can see on the screen here. What we're doing is we're building a series of strings, so first of all, we're taking our transaction data, and we are concatenating the fields together just as strings, so that's the claim number, the settlement amount, the settlement date, car registration, mileage, and claim type, and then that gives us a string here called txnHash.
Then what I'm doing is I'm doing something similar, but for the block header data, so I'm concatenating together the block number, the creation date, and the hash of the previous block. And that gives us a string called blockheader. Then I'm combining those two together, so we're getting a third string with both those bits of information combined. And then I am calling ComputeHashSha256, so we take the string of our combined data and convert it into a byte array, pass that into our ComputeHashSha256, and then the result that comes back, which is also a byte array, we then convert into a base 64 string, which we can then store within our block. So we have a helper method here called SetBlockHash, and this does exactly that: it assigns the previous hash if we have one, and it then calculates the block hash and stores it within the block, calling the utility function that we've just gone through. Then a final method we have here is called IsValidChain, so starting from this block, it will effectively traverse down the chain checking whether it's valid. So we'll step through this in a moment, and then we just have the utility function, which is going to print some data out into the console for us. So next we have a class called BlockChain, and this is just really a wrapper that kind of encapsulates everything that we're doing. So in here we have two block references, so we have a reference to the CurrentBlock, so whichever the CurrentBlock in the chain is will be referenced here. We have a HeadBlock, which is going to store our genesis block, so the block at the very beginning or the head of the chain. And we also have a generic list of blocks, which we store in here as well. So in our AcceptBlock method, we take in a block, and we check whether the HeadBlock is null, so is this the first block that's been created. If it's null, then we set the HeadBlock to be this block.
And we also set the previous hash of that HeadBlock to be null, because there is nothing before it. Then we also set the CurrentBlock to be this block as well, because this is the first block, so the genesis block is also the CurrentBlock in the chain. And then we just add it into our list. Then we have a VerifyChain method, and literally what this does is it just goes to the HeadBlock and it calls IsValidChain, and then this will kick start our validation process, which we'll go through in a moment. Okay, so if we now look in the main method, because this is a console application, I'll just quickly run through what we're doing here. I'll run it and we'll just look at the results, and then we'll step through it in more detail. So what I'm doing here is I'm first creating an instance of our BlockChain class, and I am then creating eight blocks. So if we look here, for the first one, we're passing in the block number 0. We then pass in some business domain data, which are these fields here. And then this final parameter here is a reference to the parent block. So in this case, for the first one, the parent is null, because there isn't one. And on the second one, which we've called block 2, we're saying that the parent is block 1, and so on. And then down here, once we've created those blocks, we then accept them into our chain, so this effectively just stores them in that generic list inside the BlockChain object. And then we call Verify. And what we'll do for the moment is I'm just going to quickly comment out these lines here. Okay, so now let's quickly run the application. Okay, so let's create our BlockChain, let's create our 8 blocks. Let's quickly accept them into the chain. Then let's call Verify, so let's look at what happens here. Okay, so you can see that we've created a blockchain with our 8 blocks, and then we've called the verification on them, and then all of them have passed, so that's fantastic.
So let's now stop this. Now what we're going to do is I'm just going to quickly uncomment these lines here. So what I'm doing is once we've verified the chain and we know that the chain is intact and passes verification, I'm now going to go to my fourth block and I'm going to mess around with the data. I'm going to change the creation date that's in there. So I'm deliberately tampering with the data that's in that chain, or in that block, rather, and then I'm going to re-verify the chain and see what happens. So let's run this again. So let's create our blocks, set them into the chain. So we do our first verification, which we've already seen, where all of the blocks pass verification. And then I'm going to be very naughty and mess with some of the data in block 4. And then I'm going to call VerifyChain again. So let's see what's happened when we look at our console output. Okay, so at the top here, this is where we first ran the verification across the unmodified blocks, and we can see that everything has passed verification. So then we messed around with the fourth block in the chain, which is block number 3 in this case, because we're starting at 0. So the first three passed as we'd expect. But then, because we've modified the fourth block, that block has indeed failed verification, because when we recalculate the block hash, it is not going to match what we originally stored in the block. So that's failed verification, but then when it goes onto the next block and it passes its block hash in as the previous hash for the next block, we recalculate that block's hash and then compare them and we can see that it's also failed, because the change has cascaded down, and so on, until we get to the very end of the blockchain. So you can see that from a change in the fourth block, those failures have cascaded down, which is kind of one of the main points of the blockchain.
Okay, so what I'll do now is I will step through it again, but we'll step through it in a bit more detail this time. Okay, so we've created our blockchain. Now let's have a look at the creation process of one of the blocks, so let's go in here and step into it. So here we're assigning the data that we're passing in just into that class. And then we're calling SetBlockHash, and we're passing in the parent. Now in this case, it's going to be null, because this is our first block. So we're setting our previous block hash to be null, because we don't have a parent, because we're the genesis block. And now we're going to calculate our block hash. So first of all, let's concatenate all of the transaction data together. Now we do the same for the BlockHeader. And then we just quickly combine those two strings together. So that data there represents all of the data in our block. So now let's calculate a SHA-256 hash of that data. So if I step into here, we create our sha256 object, and we call ComputeHash on that object. Now, because it's a string, we turned it into a byte array, so this now represents all of our data. So I'll calculate the hash and return it, and then we convert that hash, which is a byte array at this point, into a base 64 string and then we'll return it, and then assign it to the block hash. So if we look at the contents of BlockHash now, that is a SHA-256 hash which has been converted into a base 64 string. Okay, so that's that block created. Now we do the same for all eight blocks. I won't step into each one individually, because it's basically the same. Okay, so now let's accept a block into the chain. So this first one here, so is it the HeadBlock? Yes, it is, because HeadBlock is null, so we quickly assign that and we set its previous block hash to be null, and then we set the CurrentBlock to be this block, and then we just add it into our list. And if we look at the second one, so is this the HeadBlock? No, it's not.
But we do set it to be the CurrentBlock, and then we add it into our list, and so on. Okay, let's now verify the chain, so let's see what that looks like. So we come into VerifyChain. So what we're going to do is we are going to step into the HeadBlock, because that's the starting point for our block chain, and we're going to call IsValidChain. So first of all, what we want to do is we want to recalculate the block hash into our local variable. So we'll go in as before. We concatenate all of our data together, create a sha256 hash, convert it to a base 64 string, and return it. So in here we've created an in memory version of our block hash. Now we compare this newly created block hash to the one that's stored in our block, so they're not different. So we then compare the previous block hash that we've got stored to the one we just passed in. And in this case, we are still valid. Print some data to the screen. We check if we have a next block. If we do, we then call IsValidChain on the next block, and we pass in the newly created block hash that we've just created. So now that newly created block hash from the previous block we are now referring to as our previous block hash. So when we recalculate the block hash, we're going to pass that previous block hash in, and then that, as you can see here, gets included in our BlockHeader. So if anything has changed in the previous block, because we are taking into account the hash from the previous block as the previous block hash, if anything is changed, then we'll get a different hash here. But in this case, we haven't, so nothing's changed. So our block is still valid. So I've skipped over the rest of them, because in this case we know that all of these are valid, because we've already seen them. Here we print out a message saying, "Blockchain integrity intact." Okay, so now what I'm doing is I'm going to go to block 4 and I'm just changing the data that's in block 4. So I'm now going to call isValid on the HeadBlock. 
I'm going to skip through some of these quite quickly. So, we know that this first block is valid. So let's now check the next block. So again, this one is valid. That's fine. So let's go to the next block in the chain. Okay, and this one is valid. Okay, so this next block should be where we start having the problem. So let's go into it. Recalculate the BlockHash. Now if we compare, so we've got the newly created block hash, and we can see it starts with capital Aoe, and compare that to the BlockHash that's been stored, and this one starts with 62JS, so we know that they're different. So this is a problem. So we set isValid to false. So we know that this block is now invalid. So now what we're going to do is we're going to pass through that newly created block hash, you know, the one that starts with Aoe, and we're now going to pass that into the next block. So we pass that previous block hash into our block creation, or our block hash creation, so now let's check here. So this one starts with itM5, and we compare that to the previously stored block hash, and that equals ZZKG at the beginning, so we know that this one is also different. So our problem has cascaded down. So let's try this just once more. So again, we've passed in that newly created previous block hash. We include that into our calculation with the new block hash here. So this one starts with f1M6, and we compare that to the block hash that's been previously stored, and again, it's different. This one starts with KY066. And again, this one is invalid. So I'm not going to go through the rest, because we know what's going to happen. So we can see here that we don't have blockchain integrity. So if I go back and look at our console app, we can see here that at block 4, because that was the block that we modified the data on, has failed verification and then cascaded down. 
Okay, so that's the sample code for the first part of this module where we have a block that has a single transaction, so what I encourage you to do is to download the sample code from Pluralsight, load this up as we've done here, and just have a play around with it, just get familiar with what's in there. So the code is relatively simple, it's quite straightforward for you to follow through. And I think that's probably going to be the best way for you to kind of learn what's going on by just playing around with that code. So let's now move onto the next part of this module.
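If you want to play with the idea outside of the .NET solution, here is a minimal sketch of the same process in Python rather than the course's C#. The class and field names (Block, claim_number, settlement_amount, build_chain, verify_chain) are assumptions made up for this illustration, not the names used in the sample code. It builds eight blocks, verifies them, then changes the creation date on the fourth block and verifies again, passing each recalculated hash down to the next block so the failure cascades.

```python
import base64
import hashlib
from dataclasses import dataclass
from typing import List, Optional


def sha256_b64(text: str) -> str:
    """Hash a string with SHA-256 and return the digest as a base 64 string."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


@dataclass
class Block:
    # Block header fields
    block_number: int
    created: str
    # Transaction data (hypothetical, cut-down fields)
    claim_number: str
    settlement_amount: float
    # Filled in when the block is set into the chain
    previous_block_hash: Optional[str] = None
    block_hash: Optional[str] = None

    def calculate_block_hash(self, previous_block_hash: Optional[str]) -> str:
        # Concatenate the transaction data, then the header, then hash the lot.
        transaction = f"{self.claim_number}{self.settlement_amount}"
        header = f"{self.block_number}{self.created}{previous_block_hash}"
        return sha256_b64(transaction + header)


def build_chain(blocks: List[Block]) -> List[Block]:
    previous = None  # the genesis block has no parent
    for block in blocks:
        block.previous_block_hash = previous
        block.block_hash = block.calculate_block_hash(previous)
        previous = block.block_hash
    return blocks


def verify_chain(blocks: List[Block]) -> List[bool]:
    results = []
    previous = None
    for block in blocks:
        recalculated = block.calculate_block_hash(previous)
        valid = (recalculated == block.block_hash
                 and previous == block.previous_block_hash)
        results.append(valid)
        # Pass the recalculated hash on, so a change cascades down the chain.
        previous = recalculated
    return results


chain = build_chain(
    [Block(i, f"2018-01-0{i + 1}", f"CLAIM-{i}", 1000.0 + i) for i in range(8)]
)
print(verify_chain(chain))     # every block passes on the untouched chain
chain[3].created = "1999-12-31"  # tamper with the fourth block
print(verify_chain(chain))     # the fourth block and everything after it fails
```

The important design choice is in verify_chain: the hash handed to the next block is the one we just recalculated, not the one stored in the block, which is what makes a single change invalidate every later block.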
-
Blocks with Multiple Transactions
Now that we have looked at building a basic blockchain containing a single transaction per block, let's move onto the next level where a single block will contain multiple transactions. What we have built so far with a single transaction per block is perfectly fine and it works, but it is much more common to have multiple transactions contained within a block. The mechanics of how the block operates with the block hash being used to link the blocks together doesn't change. It's exactly the same. What you need to do is generate a single hash code that represents all of the transactions that you want to store in that block. You could do this by generating a unique hash for each transaction like in the single transaction example, and then concatenate all these hashes together and then calculate a single hash to represent them all. This will work, but there is a data structure that is tailor-made for creating hash codes for multiple data points. That data structure is called a Merkle tree. The Merkle tree is a data structure that lets you take a number of pieces of data and then calculate a cascading tree of hashes of that data. The result will be that you will have a single hash that represents all of the different pieces of data. Let's look at a simple example. On the screen you can see four transactions. In those nodes we create a hash using our SHA-256 hash function like we did before. This means we end up with four hashes. Then we take the hashes of transactions 1 and 2 and we hash them together to produce a new hash, which we have called hash(1,2). This box represents a single hash of the 2 hashes from transactions 1 and 2. Then we do the same for transactions 3 and 4, which produces a new hash code called hash(3,4). Once we have created this layer in the tree, we then create a new node at the top, which is a hash of the combined hashes for hash(1,2) and hash(3,4). This is a simple example, and it's three layers deep. 
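The four-transaction tree just described can be sketched in a few lines. This is a Python illustration, not the open source Merkle tree used later in the demo, and the function name merkle_root is an assumption. It also assumes that a level with an odd number of hashes is padded by repeating its last hash, which is one common convention but not something this course specifies.

```python
import hashlib
from typing import List


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def merkle_root(transactions: List[str]) -> str:
    """Collapse a list of transactions into one hash by hashing in pairs."""
    if not transactions:
        raise ValueError("at least one transaction is required")
    # Bottom layer: one hash per transaction.
    level = [sha256_hex(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption: pad an odd layer by repeating the last hash.
            level.append(level[-1])
        # Next layer up: hash each pair of neighbouring hashes together.
        level = [sha256_hex(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


transactions = ["transaction 1", "transaction 2", "transaction 3", "transaction 4"]
print(merkle_root(transactions))
```

With four transactions the loop runs twice: once to produce hash(1,2) and hash(3,4), and once more to produce the single hash at the top of the tree.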
As we have more transaction nodes at the bottom of the tree, more layers will be introduced into the Merkle tree, because the hashes of those individual transactions are combined in pairs. In our previous examples, we used a single hash to represent the transaction to help create the overall block hash. Incorporating a Merkle tree means that the mechanism is exactly the same. Instead of a single hash representing one transaction, we'll use the hash at the top of the tree, which in our example is called hash(1,2,3,4), and that hash instead represents all of the transactions. If we had used the hash at the top of our tree in a block for that blockchain, and then we tampered with the data in transaction 2, if we recalculate the tree, the final hash at the top of the tree will be completely different to the hash that we used before, meaning that our blockchain would fail validation. The tree structure of a Merkle tree also has an interesting property when it comes to recalculating the final hash in the tree. Now say, for example, we modified the data in transaction 1, and then we want to recalculate the final hash at the top of the tree, we don't have to recalculate the hashes for the entire tree. We only recalculate hash(1), hash(1,2), and hash(1,2,3,4). The rest of the tree remains completely unchanged. Let's try another example. If we go and change the data stored in transaction 3, we only need to recalculate hash(3), hash(3,4), and then finally hash(1,2,3,4). Now imagine we have 500 transactions. At this scale, an optimization like this becomes very important, as recalculating all of those SHA-256 hashes for a tree that size will be very slow and very inefficient. So the Merkle tree data structure allows us to be much more optimal. Let's now incorporate this into our blockchain. 
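To see that recalculation property in numbers, here is a rough Python sketch that keeps every layer of the tree in memory and, when one transaction changes, recomputes only the hashes on the path from that leaf up to the root. The names build_levels and update_leaf are hypothetical, and to keep it short the sketch assumes the number of transactions is a power of two (so it uses 512 rather than 500).

```python
import hashlib
from typing import List


def h(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_levels(transactions: List[str]) -> List[List[str]]:
    """Return every layer of the tree, leaves first, root last.

    Assumption: len(transactions) is a power of two.
    """
    levels = [[h(tx) for tx in transactions]]
    while len(levels[-1]) > 1:
        below = levels[-1]
        levels.append([h(below[i] + below[i + 1])
                       for i in range(0, len(below), 2)])
    return levels


def update_leaf(levels: List[List[str]], index: int, new_transaction: str) -> int:
    """Change one transaction and recompute only its path to the root.

    Returns how many hashes had to be recalculated.
    """
    levels[0][index] = h(new_transaction)
    recomputed = 1
    for depth in range(1, len(levels)):
        index //= 2  # position of the parent in the layer above
        below = levels[depth - 1]
        levels[depth][index] = h(below[2 * index] + below[2 * index + 1])
        recomputed += 1
    return recomputed


transactions = [f"transaction {n}" for n in range(1, 513)]
levels = build_levels(transactions)
total_hashes = sum(len(layer) for layer in levels)
touched = update_leaf(levels, 0, "tampered transaction 1")
print(f"{touched} of {total_hashes} hashes recalculated")
```

For four transactions the path is hash(1), hash(1,2), and hash(1,2,3,4), exactly as described above; for 512 transactions it is ten hashes out of 1023.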
As I just mentioned, the mechanism for calculating the block doesn't change; we still have a single hash representing all of the transactions in the block, whereas before we just had a single transaction represented by a single hash. The key thing here is that it is still just a single hash. If you look on the screen, to illustrate this, we have a hash at the top of the tree feeding into our SHA-256 hash function, along with the data from the block header, so that's the block number, the previous block hash, and the block creation date. Once a SHA-256 hash has been calculated, we then store the result in our block hash. So as you can see, the way we calculated the hash for our overall block is identical and exactly the same as what we looked at earlier in the course. But we now have support for multiple transactions per block, which is much more efficient.
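As a small illustration of that last step, the following Python sketch feeds a Merkle root and the three block header fields into SHA-256 and returns the result as a base 64 string. The function name and the example values are assumptions for the sake of the sketch; the ordering (root hash first, then header) follows what is shown later in the demo.

```python
import base64
import hashlib
from typing import Optional


def calculate_block_hash(merkle_root: str, block_number: int,
                         created: str, previous_block_hash: Optional[str]) -> str:
    # The block header is just the three fields joined into one string.
    header = f"{block_number}{created}{previous_block_hash}"
    # One hash (the Merkle root) stands in for every transaction in the block.
    combined = merkle_root + header
    digest = hashlib.sha256(combined.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


# Hypothetical values: a placeholder root hash and a genesis block with no parent.
print(calculate_block_hash("example-root-hash", 0, "2018-01-01T00:00:00", None))
```

Whether the block holds one transaction or sixteen, the input to this function has the same shape, which is why the chaining logic does not need to change.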
-
Demo - Blocks with Multiple Transactions
Now that we've looked at how to incorporate multiple transactions into our block using the Merkle tree data structure, let's take a look at this in action by sharing our .NET solution upgraded to take this into account. Merkle trees are fairly complex to implement, and this course isn't really about how to build a Merkle tree. I'll introduce you to an open source implementation that's incorporated into our solution. So let's now take a look at our code solution. Before we look at the code, I just want to talk about a piece of open source code I've incorporated into the project. As we've already discussed, to use multiple transactions within our block and to calculate the hashes, we're going to use a data structure called a Merkle tree. Now it's beyond the scope of this course to sit down and write a Merkle tree from scratch, so what I've done is I've incorporated an open source one, which I found on GitHub. Now this is by a guy called Marc Clifton, and he has some sample code here, which is very helpful for us. So first of all, I just want to talk about the license, so the license for this code is the MIT license, so we are quite free to use this and incorporate it into our solution. Now what I've done is I've taken the code from here and I've incorporated it directly into the solution that we're going to walk through in a moment, mainly because at the time of writing or recording this course, the code here was contained in project files which were for traditional .NETs, or the traditional .NET Framework as opposed to .NET Core. So all I've done is I've just ported those files across into a .NET Core project, which works absolutely fine. Okay, so let's go back to our solution. We now have a project called BlockWithMultipleTransactions, so what I want to do is just quickly look at some of the changes to the interfaces. 
So if we look at the block, so previously we had all of the insurance company transactional details contained within the block, because we just had a single transaction for that block. But what I now have is a list of ITransactions, so we're now storing a list of transactions within this block. So if we have a look at what an ITransaction looks like, and you know, you'll recognize it from the previous example, but we have the claim number, the settlement amount, the settlement date, the car registration, mileage, and claim type, and we also have a method called CalculateTransactionHash, which calculates a hash of just the transaction details. And the IBlockChain hasn't changed from the previous example. Now in the Merkle folder, I have all of the code from GitHub, which Marc Clifton has written, so I've incorporated that into this project, and that's what we're going to go and use in our implementation. In the block then we're implementing our interface, and we have the generic list of transactions. Then we have the block information or the BlockHeader information that we had previously, so BlockNumber, CreationDate, BlockHash, PreviousBlockHash, and a reference to our NextBlock. But then also what I have here is a private member, which is our Merkle tree. So as we come in and initialize the block, we set the block number, set the creation date, and then we initialize our list of transactions. Then we have a method here called AddTransaction, where we just pass in a reference to our ITransaction, and then we add it into the list. So our CalculateBlockHash method looks a little bit different to what we previously had, so when we calculate the block header, we again concatenate the block header information together, so the block number, creation date, and the previous block hash. 
And then when we create our combined string, instead of concatenating all the transaction details together, what we do is we use the Merkle tree root node, so by the time we call this, we have already calculated the hash from that Merkle tree, and that final resulting hash at the top of the tree will be stored in that root node member. So we concatenate that together along with our block header, and then as before, we call ComputeHashSha256 by turning our combined string into a byte array. We calculate the hash, and then we convert the results from a byte array back into a base 64 encoded string. So looking at our SetBlockHash method, which is similar to what we had before, except down here we have a call to build the Merkle tree. So what we're doing here is we iterate through our list of transactions, because we've already loaded the transactions into the block. And then on the Merkle tree, we call AppendLeaf, and then what we're doing here is we're manually calculating the hash for each of those transactions, because if you remember, we had the CalculateTransactionHash method on our ITransaction. So we're passing those individual hashes into the Merkle tree as leaves of the tree. So if you remember back to our diagram that we discussed previously, we had the transactions at the bottom, and then as we calculated the Merkle tree, it built that pyramid up to the final hash, which is our root hash. So once we've added all of those transaction hashes into the tree, we then call merkleTree.BuildTree, and that goes away and does all of the calculations, which means we then have that root node hash, which is our final hash. Then again, we have the IsValidChain method, which again, because we're recalculating the block hash internally in this method, we also need to rebuild the Merkle tree, so if we've gone and tampered with any of the data in our transactions, we need to update the tree. So that's what this call to BuildMerkleTree does here. 
Then we calculate the new block hash, and then everything else is exactly the same as what we looked at before. So exactly the same process of validating our chain. Okay, so if we go to program.cs, so what I'm doing here is I'm creating a whole series of transactions. So we've got 16 transactions here. As we create them, we pass in our transaction data, so this is the business domain data from Globomantics. So we create those transactions. I then create four blocks. Then what I'm doing is out of those transactions I'm basically assigning four transactions per block. So transactions 1, 2, 3, and 4 going to block 1, then transactions 5, 6, 7, and 8 going to block 2, and so on, and so we've just populated some transactions into each block. Then what we're doing here is we're calling SetBlockHash. So if you remember back to the previous demo, the SetBlockHash process was incorporated as part of the constructor for the block, but what I've done is I've pulled this out just so we can see what was going on. So we call SetBlockHash, and that's going to go and basically calculate the block hash for each of our blocks and then chain the blocks together, so by this point we've actually built the chain. Then we call VerifyChain, and then at this point we should see that the entire chain verifies, okay. And what I'm going to do is I'm going to take one of the transactions, so in this case, transaction 5, and I'm going to mess around with the claim number. I want to put an invalid ClaimNumber in there, and then we're going to re-verify the chain. So if we just step into our transaction, literally what we're doing is assigning the data that we pass in just into member variables, so very straightforward. So now let's create a block. So we assign the block number. We assign the created date, and we just initialize our generic list of transactions, and we do that four times. So now what I'm going to do is I'm going to add some transactions in. 
So the first four transactions we'll go into block 1. And literally all we're doing is we're just putting that into the list. Okay, so now let's set the block hash, so block 1 is our genesis block, so we don't have a parent, so we pass null into it. Now what we need to do is we need to calculate the block hash. So to do that, we first of all need to calculate our Merkle tree. So we reinitialize our Merkle tree, then we assign all of the transactions into the Merkle tree. But to do that, we call CalculateTransactionHash on the transaction. So if we just look at that. So this will probably look familiar. So we're concatenating all of our transaction details together into a string. So that's our transaction details there. And then we are calculating a hash. So, we convert that string into a byte array, we create our sha256 hash function, hash the data, return it, then convert it to a base 64 string. Then that resulting string with a hash of that transaction then gets added into our Merkle tree as a leaf. So we're going to do that four times, because we have four transactions. Okay, so now we have four transactions loaded into our Merkle tree, so we call BuildTree on the merkleTree objects, and this is in the open source code that we took off GitHub. So at this point we will have a root calculated for our Merkle tree. So now we want to calculate our block hash. So we're going to pass in the previous block hash, but remember this is the genesis block, so the previous block hash is null. So we calculate our BlockHeader, so this is just a string, so it contains our BlockNumber, CreatedDate, and the previousBlockHash. And then what we're going to do is we're going to concatenate that along with the root node of our Merkle tree, so that root node is the master hash, which represents each of those four transactions. So if we look at that combined string, we can see that we have some hash data at the beginning and then the block header information. 
So now we're going to calculate another SHA-256 hash of that combined string, so again, convert it to a byte array, pass it into the hashing function, convert it to a base 64 string, and then we commit that block hash into our block. And again, we do that for each of the blocks. And then we just add those blocks into our blockchain object. Then we call VerifyChain. So, the verification process is basically identical to what we looked at in the last demo. So we start off with the head block. We call IsValidChain. Now we rebuild the Merkle tree, just as we've just seen, so again, we re-instance a Merkle tree, pass in our four transactions, call BuildTree, and then recalculate our block hash. That's fantastic. The previous block hash is also a match, and then we go onto the next block. Now we already know what's going to happen here, we know that these are all going to be valid. So we can see that our blockchain integrity is intact. Okay, so now let's mess around with transaction five. So we've changed the claim number in transaction five to be something different. We're now going to call VerifyChain. Now we know that the first block is going to be valid, because we assigned 4 transactions per block, and we've changed transaction 5, which is in block 2. So this one we know is going to be fine. So we'll just skip over that. Then we go to our next block. Okay, so we rebuild the Merkle tree, and we recalculate the block hash. So the block hash we've just created starts with xHV, but the one that's stored in the actual block is iR0D, so we know that the block hashes now do not match. Clearly that's false, and we now go on to validate our next block, so we're going to pass this newly created block hash into IsValidChain for the next block. We're going to recalculate the Merkle tree, and the breaking change from block 2 has cascaded down to the next block. And so on. I'm not going to go through each of them. 
As we can see here, it says Blockchain integrity is not intact.
-
Transaction Pools, Authenticated Hashes, and Digital Signatures
So far we built up an example of the blockchain data structure that can support multiple transactions per block. Now upon tampering with any of the data in the blocks, when we verify the chain, we can detect where that tampering has happened and ensure that the rest of the chain fails to verify. And now once we extend upon what we have built in three areas, first of all I want to discuss how transactions are handled. Then I want to look at incorporating some of the other cryptographic pillars into our solution that we discussed earlier in the course. But first, let's talk about how transactions are handled. The first example I want to talk about when it comes to handling transactions is the fully distributed public blockchain. This is what we generally associate with blockchains and implementations at Bitcoin and Ethereum, which we touched on earlier in the course, in these examples, you have many nodes on a network. Each node is responsible for both mining, or creating new blocks, and also verifying their own copy of the blockchain. To be able to mine blocks, each node needs to know about the transactions, all of the transactions. This means that whenever your Bitcoin transaction happens, these transaction details need to be communicated to each of the nodes in the network, so they will have a chance at trying to mine the next block. We'll talk about mining a little bit later in the course in more detail. The key point here, though, is that the transactions have to be communicated to every node. This means if you have a large network, like Bitcoin or Ethereum where you could have upward to 5000 nodes on the mining network, that transaction would need to reach all 5000+ nodes. You may be running a private, but fully distributed blockchain for your organization, which is essentially the same as a public blockchain, except the only authorized nodes are allowed on the network. 
This might be a consortium of banks or insurance companies, for example, but the principle is still the same. The transactions must be communicated to each node to allow them all a chance of mining that next block. Another style of blockchain that is in use today, which is a much more paired down variation of a fully distributed blockchain, this is more of a partial distributed blockchain. This is where you have a central authority that processes the transactions and mines the blocks, but the nodes on the network are there to only verify their copy of the blockchain. This means those other nodes won't be creating blocks, but they readily take copies of the blockchain so that they can verify the blocks that are contained within it. This is a much simpler variation of a blockchain, and you lose a lot of the trust benefits of the fully distributed blockchain, because you are reverting back to having a central authority, which controls the consortium of blocks by allowing third parties to verify what their company is doing. In a more private and restricted environment, this might be the way you have to go. In this case, it's fair to call it more of an immutable ledger with distributed verification as opposed to a fully distributed blockchain. Blockchain purists have disputed that this is even the type of blockchain. So unlike the fully distributed blockchains where the transactions have to be sent out to each node, in this example it's only the master blockchain, or the central authority, that deals with the transactions. This is a key difference, because we only have the central authority here to mind them in the blocks. Let's now take a look at adding some more of our cryptographic features into our blockchain. So far when we've been creating blocks, we've been using the SHA-256 hash to create the new block hash of the transactions, and then storing that in the block header. The reason we have done this is to allow us the main integrity of the chain. 
And also detect when we lose integrity if data is modified after the block has been created. In cryptography there are four core principles that you want to look for, and these are confidentiality, integrity, authentication, and non-repudiation. In this course, we're not going to cover confidentiality, which encompasses encryption of the data. If you want to learn more about encryption, then check out my other course, Practical Cryptography in .NET. For integrity, we've already covered that in the course, so the construction of the block hashes and the calculation of the Merkle trees using SHA-256 hashing. What I want to look at now is adding authentication and non-repudiation. Let's look at authentication first. Earlier in the course, we discussed hashed message authentication codes, or HMAC as they are sometimes known, and how they can be used for authentication. In their use, HMAC is very similar to use as a SHA-256 hash. The only difference is that you have to provide a secret key to the hashing function to calculate the hash. So for anyone who wants to recalculate and verify that hash, they also need to provide that same key. This means we can quite easily substitute the SHA-256 hash function in our blockchain. So far we have a hash message authentication code which is based on SHA-256. For our example company Globomantics, this is something they wish to do, because their blockchain is a private blockchain. Unless they have other organizations verifying the blockchain data, they only want certain companies to have access to do that. So controlling the hashing operation with a key works out well for them. The format of the output of the HMAC is exactly the same as the SHA-256 hash outputs. It's a string of data that we store into our block. So the substitution to a HMAC doesn't require us to change anything in the block header data structure. Mostly it's very easy for me to see, just substitute the SHA-256 hashing function with the HMAC. 
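As a rough idea of how small that substitution is, here is a Python sketch (not the demo's .NET code) that puts a plain SHA-256 hash next to an HMAC based on SHA-256, using Python's standard hmac module. The function names and the two demo keys are made up for illustration; a real key would come from a key store rather than sitting in source code.

```python
import base64
import hashlib
import hmac


def sha256_b64(text: str) -> str:
    """Plain SHA-256: anyone can recalculate this."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


def hmac_sha256_b64(key: bytes, text: str) -> str:
    """HMAC based on SHA-256: only holders of the key can recalculate this."""
    digest = hmac.new(key, text.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


block_data = "merkle-root-hash" + "block-header"   # placeholder block contents
key = b"demo-key-not-for-real-use"                 # hypothetical key

plain = sha256_b64(block_data)
authenticated = hmac_sha256_b64(key, block_data)
print(len(plain), len(authenticated))  # same output format either way

# A verifier with the right key gets a match; one with the wrong key does not.
print(hmac.compare_digest(authenticated, hmac_sha256_b64(key, block_data)))
print(hmac.compare_digest(authenticated, hmac_sha256_b64(b"wrong-key", block_data)))
```

Both functions return a base 64 string of the same length, which is why nothing in the block header has to change when you swap one for the other.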
The reality is a bit more complicated. Sure, the code for doing this is quite straightforward, and you'll see that in the demo, but a complexity comes from how you protect the HMAC key. Keys are hard to share between multiple parties securely. If you cannot get that HMAC key to your blockchain verify securely, then you'll lose the benefits of the HMAC in the first place. One very robust way to share a key is by using what is called a hardware security module, which is a hardware appliance in a datacenter designed to securely store encryption keys. You could also use a cloud-based hardware security module like the Azure key vault, or any equivalent in any of the other cloud providers. Implementing this is beyond the scope of this course, but if you want to learn more about key management and Azure key vault specifically, then please check out my courses Practical Cryptography in .NET and the Play by Play: Enterprise Data Encryption with Azure Revealed. Now we have looked at adding authentication to support the blockchain, let's now add in non-repudiation. Non-repudiation is a guarantee that the original offer of a piece of data can't further down the line refute that authorship. It's a way of proving that you did something. In this case, adding transactions to a block. We do this using digital signatures, which we discussed earlier in the course. A digital signature is a form of public/private key encryption. If you remember back to what we discussed earlier, we have standard encryption such as RSA. We encrypt using a recipient's public key, and the recipient decrypts these in their private key. This means that only they can read the message, as they are the owner of the private key, which they have to keep secret. With a digital signature, this is the other way around. To create the digital signature, you use your private key to create the signature. And then a recipient can verify that signature is valid by using the originator's public key. 
The fact that the signer has to use their private key means that only they could have assigned that data in the first place, as they are in possession of the private key. This is the principle that we're going to use to add digital signatures into our blockchain example. We want to assess that it was indeed Globomantics that created that block in the chain and no one else. To do this, we're going to need to add an additional piece of data into our block header. The field we need to add is a digital signature field in the block header. Then, once we have calculated the block hash using the SHA-256 hash or HMAC if you want to authentication, we feed that block hash into the digital signature algorithm using our private key. This will output a digital signature, which we then include in the block header. If you remember earlier in the course, we stated that there is a size limit to the amount of data that you can calculate with a digital signature, so you have to use a hash of that data in the first place. Well, that's exactly what we have done here. We have used the block hash to feed into our digital signature algorithm. So we have collapsed our transactions into a single hash using a Merkle tree, created a block hash by hashing the root of the Merkle tree along with the block number, creation date, and previous block hash from the block header, and then we have calculated a digital signature using that resulting block hash. So let's review what we end up with here. So we have the ability to create blocks that are authenticated to accompany for a user using a HMAC. That also gives us the integrity protection we require for validating the blockchain. And then we've added in support for non-repudiation by using digital signatures. This means that Globomantics cannot refute in the future that they created that block, because that block was digitally signed with their private key, which only they know. 
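To make the sign-with-private-key, verify-with-public-key idea concrete, here is a deliberately toy Python sketch. It uses textbook RSA with no padding, and the key is built from two publicly known Mersenne primes, so it offers no security whatsoever and is not how the demo's .NET code or any real library does it. All the names are hypothetical, and it assumes Python 3.8 or later for the modular inverse. In real code you would use a proper signature scheme from a maintained cryptography library.

```python
import base64
import hashlib

# TOY KEY, ILLUSTRATION ONLY: both primes are well-known Mersenne primes,
# so anyone can reconstruct this "private" key.
P = 2**521 - 1
Q = 2**607 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (Python 3.8+)


def block_hash_to_int(block_hash: str) -> int:
    # The block hash is a base 64 string wrapping a 32-byte SHA-256 digest.
    return int.from_bytes(base64.b64decode(block_hash), "big")


def sign_block_hash(block_hash: str) -> int:
    """Sign with the private key (D). Textbook RSA, no padding."""
    return pow(block_hash_to_int(block_hash), D, N)


def verify_block_signature(block_hash: str, signature: int) -> bool:
    """Verify with the public key (E, N)."""
    return pow(signature, E, N) == block_hash_to_int(block_hash)


block_hash = base64.b64encode(hashlib.sha256(b"example block").digest()).decode("ascii")
signature = sign_block_hash(block_hash)

tampered_hash = base64.b64encode(hashlib.sha256(b"tampered block").digest()).decode("ascii")
print(verify_block_signature(block_hash, signature))     # signature matches this block
print(verify_block_signature(tampered_hash, signature))  # signature does not match
```

Notice that only the short block hash is signed, not the transactions themselves, which mirrors the point about the size limit on what a signature algorithm can take as input.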
Imagine a scenario where a claimant who is a customer of Globomantics agrees a settlement for their vehicle that is being written off for say $3000. At the time the settlement is agreed, Globomantics adds a transaction into the transaction pool, and a block is created and entered into the blockchain using their HMAC key and private digital signature key. Then, six months down the line, the claimant gets in touch with Globomantics and claims the settlement amounts should have been for $5000. Globomantics can now easily turn around and prove that the settlement was indeed for $3000, as there was a timestamped entry in the blockchain that was created with their HMAC key and their private signing key. As Globomantics has other organizations verifying blockchain data as well, those are additional proof that the chain hasn't been tampered with. There is another measure that we can take to make this even more secure, and we'll discuss that later in the course. Before we move onto to looking at the code demo that incorporates these features, I first want to talk about block versioning. So if we're going to be using HMAC and digital signatures, it is good practice to routinely rotate those keys. This means that you're changing those keys over time. Now imagine Globomantics has a blockchain that is a 1000 blocks long, and now you want to change the HMAC digital signature keys from block 1001 onwards. This is fine and encouraged, but if you just replace the keys, you'll have a big problem in that you will no longer be able to successfully verify blocks 1 to 1000, because the keys used to create those blocks no longer exist. This means you will need to version the blocks so that even after the key rotation, blocks 1 to 1000 will still verify using the original keys, and then blocks 1001 onwards we use the new keys. So you need to store some kind of historic key ID. 
If you use something like Azure Key Vault, or an equivalent, to store your keys, then you get key versioning already supported. You will just need to store a key ID that represents that key in the block header. Now, I know what you're probably thinking. If we change the keys, why don't we just recalculate the block hashes for the entire chain? With what we have built up so far, we could indeed do that, even though it is quite inefficient. But when we move on to talking about proof of work later in the course, we'll make this pretty much impossible to do. The same also goes for bugs. Let's again imagine that you have a blockchain that's 1000 blocks long. But then you find a bug in the way the blocks are calculated and entered onto the blockchain, so your engineers fix the issue. If that fix gives a different block hash result for the first 1000 blocks, then again the existing chain will fail to verify. So this means you still need to include that original bug when validating the first 1000 blocks, and then use the bug fix for block 1001 onwards. I'll give you a moment to sit there and think about that and shout, "That stinks!" It's not ideal for sure, but remember the blockchain is an immutable ledger of blocks that cannot change after the fact, so you need to maintain the ability to verify those existing blocks, even if that means you need to carry forward bugs in your code.
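The key versioning idea can be sketched in a few lines. This is an illustrative Python example, not the course's C# code: the KEYS dictionary stands in for a real key vault, and the key values, field names, and function names are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical key store: every key version is kept, never overwritten.
KEYS = {
    1: b"original-hmac-key",  # used for blocks 1 to 1000
    2: b"rotated-hmac-key",   # used from block 1001 onwards
}


def block_hash(header: str, key_id: int) -> str:
    """Calculate the HMAC of the header with the key the block was created under."""
    return hmac.new(KEYS[key_id], header.encode("utf-8"), hashlib.sha256).hexdigest()


def make_block(number: int, payload: str, key_id: int) -> dict:
    """Create a block that records which key version produced its hash."""
    header = f"{number}|{payload}"
    return {"header": header, "key_id": key_id, "hash": block_hash(header, key_id)}


def verify_block(block: dict) -> bool:
    """Verify with the key named in the block, not with the newest key."""
    expected = block_hash(block["header"], block["key_id"])
    return hmac.compare_digest(expected, block["hash"])


old_block = make_block(1000, "settlement A", key_id=1)
new_block = make_block(1001, "settlement B", key_id=2)
print(verify_block(old_block), verify_block(new_block))
```

Because the key ID travels with the block, rotating to key 2 does not break verification of block 1000. The same pattern could carry a version number for the hashing logic itself, which is how a fixed bug and the original bug can coexist.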
-
Demo - Authenticated Hashes and Digital Signatures
Let's now look at our demo solution again, but this time I've added in a simple transaction pool that represents a queue that the transactions are placed on. I'm just using an in memory queue structure for this example, but this could quite easily be pulling transactions out of a replicated database or off of a distributed message queue like RabbitMQ. I'll also upgrade this solution to use hash message authentication codes, and digital signatures. Right. In this final demo for this module, we're just going to extend the previous demo, which was the multiple transactions per block, and we're going to extend it in three ways. So first of all we're going to introduce the concept of a transaction pool. Then we're going to swap out our SHA-256 hashing function for a hash message authentication code. And we're also going to add in digital signature support for our block. So let's just take a quick look at our interfaces again. So the block looks very similar to what we had before, except we have a new string in here, which is the BlockSignature. This is where we're going to store our digital signature further down the demo. And we also have an interface here for a KeyStore. Now the KeyStore, just for demo purposes, is very simple. So it's going to contain the hashing key for our hash message authentication code, and then we have a method for creating the digital signature for the block, where we pass in the block hash and then we can create the signature. And then we have another method to verify the digital signature, so we can tell whether it's valid or not. So the transaction pool that we discussed a moment ago, now for the purposes of this demo, I'm just using a queue, and it's an in memory queue. Now if you were doing this for real, you might use a database where you just pull the latest transaction out of the database, or you could use a distributed queue like RabbitMQ or Azure queues, or that kind of distributed queueing system. 
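As a rough picture of what such a pool looks like, here is a small Python sketch of an in-memory queue. It is not the demo's C# implementation; the class name, the method names, and the sample claim data are assumptions made for illustration.

```python
from collections import deque


class TransactionPool:
    """A first-in, first-out pool of pending transactions held in memory."""

    def __init__(self) -> None:
        self._queue = deque()

    def add(self, transaction: dict) -> None:
        """Place a new transaction at the back of the queue."""
        self._queue.append(transaction)

    def take(self, count: int) -> list:
        """Remove and return up to `count` transactions from the front."""
        taken = []
        while self._queue and len(taken) < count:
            taken.append(self._queue.popleft())
        return taken

    def __len__(self) -> int:
        return len(self._queue)


pool = TransactionPool()
for n in range(1, 7):
    pool.add({"claim_number": f"CLAIM-{n:03d}", "amount": 1000 * n})

# Pull the oldest four transactions out of the pool for the next block.
first_block_transactions = pool.take(4)
print([t["claim_number"] for t in first_block_transactions], len(pool))
```

Swapping the deque for a database table or a distributed queue would change only the inside of this class; the code that builds blocks would still just ask the pool for the next batch.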
For illustration purposes, I'm just going to use an in-memory queue. Okay, so let's have a look at our block implementation. Again, it's very similar to what we had before, but we've got the digital signature in here. We now have a second constructor where we can pass in a reference to a KeyStore. And then when we calculate the block hash, again, this is the same as what we had before, so we create the hash based on the block header and the root of the Merkle tree. But then I also have a little bit of logic here, so if the KeyStore is null, then I'm just going to use the standard SHA-256 hash that we had before. But if I've provided a KeyStore, then I'm actually going to use the HmacSha256. Now the other difference here is the SetBlockHash method. Again, it's very similar to what we had before, where we are setting the block hash, but there is a small bit of logic below here where if the KeyStore has been provided, then we're going to create a digital signature for that block hash, and we're then going to store it in the BlockSignature field. So remember, as we said before, when we create a digital signature, we're creating a signature on a hash of a piece of data, not the actual data, because there is a limit to the amount of data that you can sign in one go. That works perfectly for what we have here, because we've already created that hash. It's the block hash. And that's what we're going to create our signature for. And this is going to help us prove that it was actually us that created that block, because to create that digital signature, you have to have a private key, and it is only me, or Globomantics in this case, that has that private key. Okay, so I think what we'll do now is we'll just quickly run through this. Again, we'll prove that it works, and then we'll step through it in a little more detail.
Okay, so we set up all of our transactions, and then I'm just returning a reference to transaction 5, because that's the transaction that I'm going to mess around with in a moment. We then create our KeyStore, and I'm just going to pass in a random number, so I'll show you how that's done. We're going to call GenerateKey on our Hmac class here, and this is just generating a random number using the RNGCryptoServiceProvider that we talked about earlier in the course. This is our 32-byte, or 256-bit, key, which is going to be returned as a byte array, and that's what we're going to use as the secret key for our hash message authentication code. Now we're not really storing it anywhere, we're just using it in memory, because that's fine for the purposes of the demo. But in real life, you'd probably want to store that somewhere safe. Okay, so I'm going to store that key, which is our 32-byte array. I'm going to create our digital signature class, which we'll step into in a moment, and I'm then going to generate some digital signature keys. So as we discussed earlier in the course, we use our RSACryptoServiceProvider for creating those keys, and I'm going to create a public key and a private key. And again, we're just storing those in memory, but in a real-world scenario you'd probably want to store those in either a certificate or a hardware security module as your key vault. So I'm now going to create our four blocks, and I'm going to pass in the reference to the KeyStore, so that each block knows where to get its keys from. And I'm now going to go through and just add each of those transactions to our blocks. Then we're setting the block hash. We build our Merkle tree, we recalculate the block hash. And I'm going to create a digital signature of that block hash, so I'm going to go in and call SignData.
Now what I'm doing here is I'm taking that block hash and I'm converting it from a base 64 string back into a byte array, and then from that byte array of the hash data, I'm creating a digital signature. But I'm passing in our private key, because with a digital signature, you use your private key to sign the data. Then we return that as a string by converting it to a base 64 string, and then we store it in the block. Then we go through and accept the blocks. What I'm just going to quickly do is re-run this, because I actually forgot to show you where it's using the authenticated hash. I previously skipped over this by mistake. So, we're going to go in and calculate the block hash. As before, we concatenate all of our data for the block header into a string, and then we concatenate that along with the root of the Merkle tree, which represents all of our transactions. Now what we're going to do here, and this is the bit I forgot to show you previously, is instead of creating a SHA-256 hash, we're going to create an HMAC instead. So what we do is we pass in a byte array of that combined string, so we convert it back into a byte array, that's exactly the same as before, but we're also going to pass in our random key. And then we create the HMAC SHA-256 object, and then we call ComputeHash. So as you can see, the actual process for creating the hash is virtually identical to what we did with the initial SHA-256 hash, except we have the inclusion of this key here. Now if you remember back to what we discussed, what this means is that anyone who's going to verify our chain, so one of the people in the consortium of companies working with Globomantics, has to be in possession of that key. If they don't have the key, then they can't verify the chain, because they're not authenticated, and if they have the wrong key, they're not going to be able to verify the chain either.
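The switch from a plain hash to a keyed hash can be sketched as follows. This is a Python illustration of the step just described, not the demo's C# code: the function names, the field order in the concatenated string, and the sample values are assumptions.

```python
import base64
import hashlib
import hmac
import secrets


def generate_key() -> bytes:
    """Generate a random 32-byte (256-bit) secret key."""
    return secrets.token_bytes(32)


def calculate_block_hash(block_number, created, previous_hash, merkle_root, key=None) -> str:
    """Hash the header fields plus the Merkle root, keyed if a key is supplied."""
    combined = f"{block_number}{created}{previous_hash}{merkle_root}".encode("utf-8")
    if key is None:
        digest = hashlib.sha256(combined).digest()             # plain SHA-256
    else:
        digest = hmac.new(key, combined, hashlib.sha256).digest()  # HMAC-SHA-256
    return base64.b64encode(digest).decode("ascii")


consortium_key = generate_key()
outsider_key = generate_key()
fields = (1, "2018-01-01T09:00:00", "", "hypothetical-merkle-root")

stored_hash = calculate_block_hash(*fields, key=consortium_key)
print(calculate_block_hash(*fields, key=consortium_key) == stored_hash)  # same key
print(calculate_block_hash(*fields, key=outsider_key) == stored_hash)    # wrong key
```

Only a verifier holding the same key can reproduce the stored hash, which is what makes the hash authenticated rather than merely an integrity check.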
And then we create the digital signature, which I've already shown you. Okay, so now we accept those blocks onto the chain and then we call VerifyChain, exactly the same as before. We come in, we rebuild the Merkle tree. What we're going to do now is validate whether the digital signature is correct. So we come into VerifyBlock in our KeyStore, then we call VerifySignature. Now what we're doing is we're passing in our blockHash, which we're converting back into a byte array. And we're passing in the signature. So this is the signature that we want to verify, and this is the block hash, and we're going to check if they're valid. In this case, when we look at what gets passed back, it's true, so they are valid. So now we recalculate the block hash, so this is our new block hash, and we're then going to verify that the stored digital signature is valid for this new hash as well. In this case, yes it is, because we haven't tampered with any data. So we know that that's all valid. I'm not going to go through all of the blocks now. Okay, so as before, we're going to take transaction 5 and I'm going to mess around with the claim number. So now let's look at what happens when we verify the chain. Let's start off with the head block, recalculate the Merkle tree, and check that the signature is valid for the initial block hash. We know that it is, so that's good. We're then going to recalculate the block hash again, verify the signature against it, and check isValid. Now because it's the first block, and we know we haven't touched the first four transactions, we know this one's okay, and isValid is true. So that is good. Now let's go and look at the next block. And this is where we know that we're going to have the issue. So we rebuild the Merkle tree, and we validate the previously stored digital signature against the stored block hash, and we know that that's true.
So the original digital signature matches the block hash that was committed into the block. So now let's recalculate the block hash. Because we know that we've changed some of the transaction data, we're expecting this block hash to be different. This one starts NxIN, whereas we know that the previous one starts qczZ. So now if we verify the digital signature for this newly created block hash against the signature that's already stored in the block, we know that that is false. And if we check whether the block hashes match, we know that they don't, so that's false as well. So really what I could have done at this point is, because we know this is false, we could have just immediately gone and started looking at the next block. So again, let's just skip down here, and we can see that the blockchain integrity is not intact. We can see down here that from the second block down we've started getting invalid digital signatures and we've started failing verification. So let's look at what we've just done for this final demo. We've extended it with the concept of a transaction pool, which in our case is just a simple in-memory queue. We've added hash message authentication codes into our example, and to calculate those hash message authentication codes, we need a key. And we've also incorporated digital signatures into the block header as well.
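The verification pass in this demo can be approximated with a short Python sketch. It is deliberately simplified and is not the demo's C# code: a single hash over the transactions stands in for the Merkle root, signatures are left out, and all names and sample data are hypothetical.

```python
import hashlib
import hmac


def transactions_root(transactions: list) -> str:
    """Stand-in for the Merkle root: one hash over all of the transactions."""
    return hashlib.sha256("|".join(transactions).encode("utf-8")).hexdigest()


def block_hash(number: int, previous_hash: str, transactions: list, key: bytes) -> str:
    """Keyed hash over the block number, previous hash, and transaction root."""
    data = f"{number}{previous_hash}{transactions_root(transactions)}"
    return hmac.new(key, data.encode("utf-8"), hashlib.sha256).hexdigest()


def build_chain(batches: list, key: bytes) -> list:
    """Create blocks in order, feeding each hash into the next block."""
    chain, previous_hash = [], ""
    for number, transactions in enumerate(batches, start=1):
        current = block_hash(number, previous_hash, transactions, key)
        chain.append({"number": number, "transactions": list(transactions), "hash": current})
        previous_hash = current
    return chain


def verify_chain(chain: list, key: bytes) -> list:
    """Recalculate every hash from the current data and compare with the stored one."""
    results, previous_hash = [], ""
    for block in chain:
        recalculated = block_hash(block["number"], previous_hash, block["transactions"], key)
        results.append(hmac.compare_digest(recalculated, block["hash"]))
        previous_hash = recalculated  # carry the recalculated hash forward
    return results


key = b"demo-hmac-key"
chain = build_chain([["claim 1", "claim 2"], ["claim 5", "claim 6"], ["claim 9"]], key)
print(verify_chain(chain, key))  # every block verifies

chain[1]["transactions"][0] = "claim 5 (edited)"
print(verify_chain(chain, key))  # the edited block and everything after it fail
```

Because each recalculated hash is carried forward as the next block's previous hash, one edited transaction invalidates its own block and every block after it, which matches what the demo output shows from the second block down.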
-
Summary
Okay, we've covered a lot in this module. I hope you are excited about the possibilities. Let's quickly recap what we've achieved so far. We started off by implementing a basic blockchain where each block contains one transaction. We calculated the block hash by hashing the transaction data, and also the block number, creation date, and previous block hash. We then formed the blocks into a chain by including the previous block's hash in the block hash calculation for the next block. This means we have mathematically linked those blocks together. We also implemented a simple blockchain verification mechanism, so that we can detect tampering of the blockchain data. Next we upgraded the solution by including multiple transactions per block. To achieve this, we used a Merkle tree data structure that efficiently allows us to collapse multiple transactions down into a single hash. Then we upgraded our blockchain solution to incorporate authentication and non-repudiation into the block construction process. To add authentication support, we swapped out our SHA-256 function for an HMAC based on the SHA-256 algorithm, where we need to provide a secret key. Then we added in non-repudiation support by using a digital signature. This gives us the property that once a block has been placed onto the blockchain, we can then prove who created that block by the fact that they would have to know their private key in the first place to create a digital signature. Finally, we talked about block versioning by stating that whatever mechanism you use to create a block has to be maintained for the life of those blocks. This means you have to use the same keys to verify those blocks, even if you want to rotate the keys over time. Also, if you fix any bugs in the block creation process, you need to keep those bugs in place for the already existing blocks, and then use the fix for new blocks only. 
This can add greatly to code complexity if you find defects, but it is something you need to keep in mind. In the next module, we'll start to extend our blockchain solution further by incorporating a technique known as proof of work.
-
Applying Proof of Work
Overview
Hi, my name is Stephen Haunts, and welcome back to my course, Blockchain Principles and Practices. In the last module, we built up a working blockchain example that allows us to have multiple transactions per block. I did identify a problem with this, that if someone was to modify a transaction in the block, it would be quite easy to recalculate all of the block hashes for the chain. This ruins the point of us having an immutable ledger, as we can in fact go and change data in the blockchain. It's just inefficient to do so. So in this module, we are going to add to our blockchain implementation by making it immutable. We're going to do this using a technique called proof of work. This is a mathematical solution that makes it infeasible to recalculate the blocks once they are committed into the blockchain. First, though, we are going to look at a problem called the Byzantine Generals' Problem. So without further ado, let's look at how to make the blockchain more immutable.
-
Byzantine Generals' Problem
The Byzantine Generals' Problem was first described by Leslie Lamport, Robert Shostak, and Marshall Pease in a 1982 paper, The Byzantine Generals' Problem. The problem states that you have a group of generals who each command a unit in the Byzantine army. For our example, let's assume we have two armies on opposite hills and a city in a valley. The armies both want to launch an attack on the city, but there is a problem. To do this successfully, they will have to attack at the same time; otherwise, the city's defenses could overpower a single army in the attack. To coordinate the attack, the generals on each hill have to send messages to each other. The first general from army A will send a message to army B saying we will attack at 5 a.m. on Tuesday. Then, army B will send a reply to army A, confirming that they have received the message. This sounds fine in theory, but what happens if guards in the city capture and imprison the messenger, and then send a spy who delivers a different message, which is attack at noon on Friday, there is no need to send a reply? This is nearly four days later. It has been said that this problem has no real true solution, even when we factor in computers and modern communications for preventing message interception. Computers can communicate reliably, but there is never a 100% degree of certainty that they will. We can make our communications very reliable, but never totally foolproof. In the example we just looked at with the armies, it was intended for the recipient of the first message, army B, to send a response to confirm the original attack day. But what happens if the messenger with that reply gets caught, and the message is modified? You could request an additional acknowledgement response, but that messenger could get caught too, so you end up in this loop of waiting for responses. There are many practical solutions to this problem which are good enough for real-world communications.
When Satoshi Nakamoto first described Bitcoin, he wasn't claiming to completely solve the problem. But he proposed a method that makes it very hard for the city to scupper the plans of the generals and their armies by changing their message. The message from the generals of the other armies has the same bearing as a digital currency payment or a transaction on a blockchain. Instead of a message like, attack at 6 a.m. on Tuesday, the message could be, Pay Keith 200 pounds, or settle a claim for Globomantics for a vehicle at $4000. The key thing here is that once a message has been sent or a transaction has been put onto the blockchain, it can't then be changed after the fact. Let's look at how to solve this for the generals based on the same technique that we're going to use in our blockchain implementation.
-
Solving the Problem with Hashing
First, they will take their message of attack at 6 a.m. on Tuesday, and they will append some additional data to the end of the message. This piece of extra data is called a nonce. So now the message looks like attack at 6 a.m. Tuesday 3FrgH542dFe. The generals then agree to set a message policy where the hash of that complete message must result in a hash that starts with six 0s. If the hash doesn't start with that number of 0s, then the message is fake. The nonce we add to the end of the message is what determines how many 0s a hash starts with. Now, in our example case, the nonce is 3FrgH542dFe. Treat that nonce as a number. They didn't just guess that nonce, they had to calculate it. They start the nonce at 0, and they keep adding 1 to it until the resulting hash has the desired number of 0s at the front of it. This means in our example the final hash looks like what you can see on the screen now, so we hashed a message, attack at 6 a.m. Tuesday with our long nonce, and the resulting hash code starts with six 0s, which you can see in green on the screen now. This takes time to calculate; it's not an instant operation. When they first start out, the nonce would have been 0, so the hash operation might result in what you can see on the screen now, where instead of starting with six 0s, it starts with 384723. So they add 1 to the nonce and try again, and in the example on the screen now you can see that the resulting hash has one 0 at the start of it. That's not suitable, so they keep going. They keep doing this until they reach the desired goal. So again, as you can see on the screen, we have the attack at 6 a.m. Tuesday with the long number appended to it, and then the resulting hash has the six 0s at the beginning, so this is the correct solution to the puzzle that we're looking for.
To find the correct nonce for the hash requires a lot of computing power, as you have to keep incrementing the nonce by 1 each time until you find the solution to the hash puzzle, which in this case is six 0s at the beginning of the hash. If you did three 0s, then the problem is significantly easier to calculate. If you did ten 0s, then the problem is considerably harder to calculate. So let's go back to our generals. They write the message and then they get their communication experts to start calculating the hash. This takes six hours to figure out, because they have to try billions of combinations. Once they have found the correct nonce, they send the message over to the other generals and their waiting armies. To validate that the message is correct, the receiving general merely calculates the hash of the message again with the given nonce and checks the number of 0s. If it is six then the message is genuine; if it is not six then the message is not valid. If the message were intercepted and changed, the person changing the message would have to go through the same process of calculating the nonce to get a hash that starts with six 0s. Once the second general has received a message, if they wish to send a reply back to the other general, they have to go through exactly the same process of finding a correct nonce to give the required number of 0s at the start of the hash. If you look at the properties that we have here, the nonce is time-consuming to calculate for the message sender, but the message is straightforward for the recipients to verify. This technique is known as proof of work. This means the message sender proves their authenticity by spending a lot of effort doing the work to calculate the nonce to solve the hashing puzzle. Let's go back to the idea of the city intercepting the message. Let's assume that the city is now familiar with the technique being used to hash a message. So to help protect themselves, they buy a massive supercomputer.
They then catch a messenger and have the computing power to change the message and recalculate the nonce, but faster than the generals of the armies. It may take the generals 10 hours to calculate the correct hash, but the town can now do it in 1 hour with this powerful super computer. Now the general of the other army is getting conflicting messages, but when they verify the hashes, they all work out correctly. Is their plan wholly flawed, or is there something else that we can do?
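The generals' scheme from this section can be sketched in Python. This is an illustration under stated assumptions, not the course's code: the hash is SHA-256 shown in hexadecimal, the nonce is appended to the message after a space, the function names are hypothetical, and the puzzle uses three leading 0s rather than six so that it finishes quickly.

```python
import hashlib


def meets_policy(message: str, nonce: int, zeros: int) -> bool:
    """Verification: one hash, then check the number of leading 0s."""
    digest = hashlib.sha256(f"{message} {nonce}".encode("utf-8")).hexdigest()
    return digest.startswith("0" * zeros)


def find_nonce(message: str, zeros: int) -> int:
    """The work: start at 0 and keep adding 1 until the policy is met."""
    nonce = 0
    while not meets_policy(message, nonce, zeros):
        nonce += 1
    return nonce


message = "attack at 6 a.m. Tuesday"
nonce = find_nonce(message, zeros=3)
print(nonce, meets_policy(message, nonce, zeros=3))

# A spy who swaps the message but keeps the nonce will almost certainly fail the check,
# so they have to redo the whole search for their forged message.
print(meets_policy("attack at noon Friday", nonce, zeros=3))
```

Notice the imbalance between the two functions: find_nonce may hash thousands of times, while meets_policy hashes once. That imbalance is the whole idea, and it is also why an attacker with enough computing power can still redo the search, which is the weakness the supercomputer exposes.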
-
Strength in Numbers
Let's modify the example a bit. Instead of only two generals and their armies, we have many generals and more cities. The overall goal is the same for the generals. They want to get their messages to the other generals without fear of them being modified. The cities want to scupper these plans just as they did before. What you're about to look at is the fundamental idea that Satoshi Nakamoto provides with Bitcoin. The generals all combine their messages into a single block. They then calculate a hash for this block using the idea of the nonce to compute a hash with a fixed number of 0s at the beginning of it. If you remember back to what we discussed earlier in the course, this is the same: a single block of multiple transactions, but we are going to add a step into the block hash generation process. I will cover this in more detail later in the module. First of all, back to the generals. Previously we were calculating the hash puzzle so that we had six 0s at the beginning of the hash. In this new scheme, the generals increase this to 10. This means it takes significantly longer to calculate the hash. Each army is from a different country, and they each have their own computing resources available to them. Each of the armies receives all of the messages for the block, and they each race to calculate the hash and find the nonce that allows them to solve the hash puzzle. In cryptocurrency terminology, this race to solve the hash puzzle is called mining. It only takes one army to find the correct nonce. And once they have found it, they share it with their allies in the other armies straightaway. From the perspective of the cities, the idea of outperforming the combined efforts of the armies seems impossible, as there is so much computing power working to mine the hashes. It could be possible if only a few armies were competing to find the block hash, and the cities have purchased significant computing power and may have teamed up. 
This in Bitcoin terms is referred to as the 51% attack. It works when the cities together have the same amount of computing power as the armies combined, or more. But if you have, say, 20 armies, this starts becoming infeasible. This is why public blockchains like Bitcoin or Ethereum are open for anyone to join. The more nodes you have trying to calculate the block hash, the more secure it is. Imagine 5000 mining nodes working to calculate and validate blocks. As you can see, it is not impossible to defeat the system, but it becomes significantly harder with strength in numbers and computing power. This proof of work technique does have its problems, though, in that by its very nature of solving this complex hashing puzzle, it consumes a considerable amount of both computing power and electricity to power these computers.
-
Preventing Block Tampering
Earlier in the course, we discussed that even after transactions were placed into a block and the block hash is calculated, it's still possible for someone to modify a transaction in the block, as it is easy to recalculate all of the block hashes for the chain. This ruins the point of us having an immutable ledger, as we can in fact go and change data in the blockchain. It's just very inefficient to do so. Using the same techniques we discussed with our generals safely communicating, we can modify our existing blockchain solution, so that once a block has been confirmed, it is infeasible to change that data in the blockchain. To do this, we are going to extend the block hash calculation process to add in the concept of an incrementing nonce to calculate a hash that has a predetermined number of 0s at the beginning. Once this hash puzzle has been solved, we then link the blocks together in precisely the same way as before. The good thing about this is, the longer the blockchain gets, the more secure it becomes. If it took around 10 minutes to calculate a block hash for each block, and we have 5000 blocks, and we modify the data in the first block, we'd have to recalculate all of the block hashes for every single block that follows it. So for 5000 blocks at 10 minutes per block, that will be 833 hours of effort, or nearly 35 days to recalculate all of the block hashes. These days the block mining process is performed on high-powered graphics hardware or custom ASIC processors that can calculate billions of hashes per second. In our following example, we won't be using custom hardware; instead the code will be written in standard C# and will be CPU bound, but this code is more to show the general principles. Let's take a look before we plug this into our existing blockchain implementation.
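As a quick check of that estimate, the arithmetic can be worked through in a few lines of Python. The figures are the ones used above, 5000 blocks at 10 minutes per block; nothing else is assumed.

```python
blocks = 5000
minutes_per_block = 10

total_minutes = blocks * minutes_per_block  # minutes to redo every block hash
hours = total_minutes / 60                  # convert minutes to hours
days = hours / 24                           # convert hours to days

print(f"{total_minutes} minutes = {hours:.0f} hours = {days:.1f} days")
```

That works out at roughly 833 hours, or a little under 35 days of continuous hashing, and the figure grows with every block added to the chain.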
-
Proof of Work Demo
In this demo, we're going to look at some code that will perform the hash calculation process that our generals used. We'll have a message that we want to hash, and we'll keep repeating the hashing process and incrementing the nonce until we satisfy the difficulty level set for the hashing puzzle, which is a predetermined number of 0s at the beginning of the hash code. Let's take a look at the code now. What you can see on the screen now is a very simple example program that illustrates proof of work. In this project, we have a class called ProofOfWork in the ProofOfWork.cs file, which we'll take a look at in a moment. In program.cs, you can see an example of where we're using this ProofOfWork class. So in this case, we have six instances set up, and in those instances we pass in two pieces of data. The first is the information that we want to hash, and the second is the level of complexity we want for the hashing puzzle. So in our case here on this first line, we pass in the string Mary had a little lamb, which is the data we want to hash, and then on the first example, we have a hashing complexity of 0. So by 0, this is effectively the same as saying we're not doing any proof of work, because we don't care how many 0s are at the beginning of the hash. On the next line, we pass in the same message, Mary had a little lamb, and the complexity is set to 1, so this means we want one 0 at the beginning of the hash. And so on down to the bottom one where we have, again, the same message, Mary had a little lamb, but a hashing complexity of 6, which means we want six 0s at the beginning of the hash. So before we go in and take a look at the content of the ProofOfWork class, let's just quickly run this and see what happens. So as you can see, the application is now running. For each level you can see the value the nonce reached to find the correct hash, and the resulting hash.
So on the first line, we have difficulty level 0, which means there are no 0s required at the beginning of the hash. For difficulty level 1, the nonce value is 24, and that resulted in us having a hash with one 0 at the beginning. Then for level 2, the nonce went to 9478, and this gave us a hash which had two 0s at the beginning. Then for level 3, we ended up with a nonce which got to 93,521, and this resulted in a hash with three 0s at the beginning. And then for level 4, we had a nonce which was 2,286,428, and this resulted in a hash which has four 0s at the beginning. As you can see, the timings get gradually longer, so for level 4, we ended up taking just over 3 seconds to calculate that hash. Now it's still running at the moment and it's trying to get to level 5. When I've run this before, reaching it has taken about 5 or 6 minutes, so I'm not going to leave it running now, but I just wanted to run the application to show you what it's doing. Okay, so let's stop the application and go back. Now let's look at the ProofOfWork.cs class. So in this class, we are storing three pieces of information. We have the data that we want to hash, and we're storing that as a string. We have the difficulty level, which is the number of 0s we want at the beginning of the hash, and then we have the nonce value, which starts at 0. So we go into the constructor, we store the data to hash, and we store the difficulty. Then we have a public method called CalculateProofOfWork. Now what I'm doing in here is setting up a stopwatch, and this is what we're using to get the timing values. If we were to debug through this manually, all the timings would be skewed, because obviously we're breaking in the debugger. But what we're doing here is we're going into a while (true) loop, so we're just going to keep iterating around until we solve the puzzle. So first of all, what we want to do is hash the data.
So what we are doing here is we are getting the nonce and the data we want to hash, and we are concatenating them together. Then I get the result of this, and I convert it into a byte array, and I then compute the hash. And once I have done this, I convert the hash back into a base 64 string. And what I do is I then check to see whether the hash starts with the desired number of 0s, so if we are looking for a difficulty level of 3, we are checking to see whether the hash starts with three 0s. If it doesn't, then we add 1 onto the nonce, and then we loop back around and we try again, and we keep on going until that if condition is satisfied, at which point I stop the stopwatch, calculate the elapsed time, and then return the hash data. So as you can see, it's a fairly trivial piece of code, it's not very complicated what it's doing. In a real-world scenario where you're doing proof of work, you would be using high-end GPUs and ASIC mining rigs to calculate this, and that's not the purpose of this particular example. This example is CPU bound, but it's just to illustrate the principle of what's going on. So now that we've taken a very quick look at how the proof of work process works, let's look to now integrate this into our blockchain example.
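The class just walked through can be approximated in Python as follows. This is a sketch based on the description, not a port of the actual ProofOfWork.cs: the method and attribute names, the order in which the nonce and data are concatenated, and the use of SHA-256 are assumptions, so the nonce values it finds will not necessarily match the ones shown in the demo. It stops at difficulty 2 to keep the run short.

```python
import base64
import hashlib
import time


class ProofOfWork:
    """Search for a nonce that gives the hash a set number of leading 0s."""

    def __init__(self, data: str, difficulty: int) -> None:
        self.data = data              # the data we want to hash
        self.difficulty = difficulty  # number of 0s wanted at the start of the hash
        self.nonce = 0                # the nonce starts at 0
        self.elapsed = 0.0            # seconds taken by the last calculation

    def calculate(self) -> str:
        target = "0" * self.difficulty
        started = time.perf_counter()
        while True:
            # Concatenate the nonce and the data, hash it, and encode as base 64.
            candidate = f"{self.nonce}{self.data}".encode("utf-8")
            digest = hashlib.sha256(candidate).digest()
            hashed = base64.b64encode(digest).decode("ascii")
            if hashed.startswith(target):
                self.elapsed = time.perf_counter() - started
                return hashed
            self.nonce += 1  # not solved yet, so try the next nonce


results = []
for difficulty in range(0, 3):
    work = ProofOfWork("Mary had a little lamb", difficulty)
    hashed = work.calculate()
    results.append((difficulty, work.nonce, hashed))
    print(f"Difficulty {difficulty}: nonce={work.nonce} hash={hashed} ({work.elapsed:.3f}s)")
```

Because a base 64 character has 64 possible values, each extra 0 multiplies the expected number of attempts by roughly 64, which is consistent with the way the timings in the demo climb so steeply from one level to the next.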
-
Integrating Proof of Work
Now that we have taken a look at the broad concepts of how a proof of work calculation operates, let's plug it into our existing blockchain scenario. On the screen at the moment, you can see what we've built so far. We can include multiple transactions in our blockchain using the Merkle tree structure. The root of the Merkle tree contains a single hash that represents all of the hashed transactions in our block. To create the final block hash, we take the hash at the root of the Merkle tree, and we hash it along with the block number, the block creation date, and the previous block hash. This creates our master hash of that block, and it's also the hash we pass into our next block to use as the previous block hash. We also added an extra layer of security to our block hash by using authenticated hashes and digital signatures to sign the authenticity of the block hash. Luckily none of this changes when we want to incorporate proof of work. The actual work we have to do is straightforward, and this is represented with the amended diagram you can see on the screen now. As before, we calculate the initial block hash with our authenticated hashing function and digitally sign it. Before we store the resulting hash, we insert our additional step, which is to calculate the proof of work for that hash. To do this, we need to specify the level of complexity we want, say, three 0s. And we then pass the block hash into the proof of work function. This will iterate through, adding 1 to the nonce on each pass until the resulting hash contains three 0s at the beginning. Once this has been satisfied, we write the resulting hash into that block hash field. Let's remind ourselves of why this is important. Earlier in the course, we were in a situation where if we had, say, 1000 blocks in our chain, and we modified a transaction in an earlier block, we could then go and recalculate each block hash in turn along the chain to maintain the blockchain integrity.
This was not only possible, it was also quick to do. What we have done here, by adding the proof of work process after the initial block hash calculation, makes it much slower to recalculate the block hash. Before we take a look at a demo, I just want to show you a screenshot of a sample proof of work calculation. This illustrates timings at different levels of complexity. Level 0 is the same as not having any proof of work at all, it's just a straightforward hash. Level 1 has a nonce count of 24. This means it took 24 iterations to get a hash that started with 0. Level 2 took 9478 iterations to find a correct nonce. Again, at this level, the time taken to find a hash with two 0s at the start is still negligible. Level 3 took 93,521 iterations in 12 ms. While still a short timeframe, this took significantly more time to calculate. Level 4 took 2,286,428 iterations and took 3 seconds. That's still quite a short time to get a hash with four 0s at the front. Level 5 took 380,372,972 iterations and 8 minutes 25 seconds to calculate a hash with five 0s at the start. This is much more significant. Level 6 I never got to, as I shut down the laptop after 12 hours. As I mentioned before, this is running CPU bound as a C# application. Real mining rigs are set up to use high-end GPUs or custom ASIC hardware, so these timings will be much shorter. But the point here is to show how the level of complexity increases with the more 0s you want to add to the beginning of the hash.
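A rough way to see why each extra 0 costs so much more is sketched below in Python. It rests on two assumptions that are not stated in the screenshot itself: that the hash is compared as a Base64 string, as in the earlier proof of work demo, and that each leading character is equally likely to be any of the 64 Base64 symbols. Under those assumptions the average number of attempts is 64 raised to the difficulty level. The function name is hypothetical.

```python
def expected_attempts(difficulty, alphabet_size=64):
    """Average number of nonces to try before `difficulty` leading characters are all '0'."""
    return alphabet_size ** difficulty


# Nonce counts quoted from the sample run described above.
observed = {1: 24, 2: 9478, 3: 93521, 4: 2286428, 5: 380372972}

for level, nonce_count in observed.items():
    print(f"level {level}: expected about {expected_attempts(level):,}, observed {nonce_count:,}")
```

Any single run can land well above or below the average, since each attempt is effectively a fresh lottery ticket, but the growth from one level to the next follows the same steep curve.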
-
Integrating Proof of Work Demo
Let's now integrate the proof of work algorithm into our code base from earlier in the course. As you've seen already, the changes to make this happen are quite simple from the theoretical point of view. So let's now take a look at the code base and see this working. So first of all in our example, I just want to start by looking in the Block.cs file, and this contains our block class. Now this is virtually identical to what we saw earlier in the course, except for a few small differences. First of all, we are storing the difficulty level, which again is the number of 0s we want at the start of the hash, and we are storing an integer for our nonce. Now the other difference is we also have the inclusion of the method here, CalculateProofOfWork. This is the same as what we saw earlier in this module, when we looked at the proof of work demo. So again, we're iterating around, adding 1 onto the nonce every time we try and calculate the hash, to find the hash that is the correct solution to the puzzle. So here you can see we have the blockHash, which is calculated from the root of the Merkle tree, with the nonce appended to it. We create the hash of that, and we then check to see whether that hash starts with the correct number of 0s. So that's really the only real difference that we've made here. So after we calculate the Merkle tree, we then go and calculate the proof of work. So let's look at this in action in program.cs. What we do is we first of all start off by setting up some transactions and the KeyStore, just the same as what we did before. We then create our four blocks that we want for this example, and as you can see here, the third parameter on the constructor is our level of difficulty, so we're setting that to 3. We then add a whole bunch of transactions to our blocks for the demo, and we accept those blocks onto the chain, and we then verify the chain.
So at that point, we should have a completely intact blockchain where every single block verifies correctly. And then as we did in previous demos, we're then going to go mess around with one of the transactions and we're then going to re-verify the chain. So what we'll do first is just execute this and show you it working. Okay, so here's the results of it running, so at the top here, you can see that we have created four blocks, and you can see the timings it took to calculate the hashing puzzle. Now for the purpose of this demo, I've kept the hashing puzzle nice and simple. I've set it to three 0s that we want to find. And then you can see how we've verified that each block is intact. Then you can see we've modified one of the transactions in the second block this time, so the very first block, block 0, is verified correctly, and then you can see that blocks 1, 2, and 3 have failed verification and have invalid digital signatures because we've gone and messed with that block. So let's go back and we'll just quickly step through this to illustrate the point. So first of all, we just want to set up our transactions, and this is exactly the same as what we have done in previous demos. We then add those transactions into our transaction pool. We then create our KeyStore, which contains the keys for our authenticated hash and our digital signature. I then create the four blocks, and then we just want to go through and add all those transactions into our blocks. So that's transactions for block 1, block 2, block 3, and block 4. And then we call SetBlockHash. So I'm just going to go step into one of these, and this again, it's the same as what we had in previous demos, except instead of calling our original CalculateBlockHash method, we now call our CalculateProofOfWork. And we end up with a block hash, which then gets stored into our block. Now, as you can see here, that block hash has three 0s at the beginning of it. That's because we set the difficulty level to be 3.
So if I set that to 4 or 5, then we'll see that'll take a lot longer to calculate. But just for the purposes of the demo, I've kept it nice and simple. And we then calculate the digital signature and store it. Let me do that for the other blocks. We then accept those blocks into our chain. And then we verify the chain. So as you can see up to this point, we have four blocks and each block in that chain passes verification. So what I suggest you do is get the source code for this module and just have a step through it yourself and have a bit of a play around with it. So we have covered quite a lot that we built up over the last few modules, and it'll be a good exercise for yourself just to repeat the tests that we've done here and sort of step through and just see how all this works together. So let's now go and wrap up what we've learned in this module so far.
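If you want to experiment before opening the course source, the flow of the demo can be approximated with the heavily simplified Python sketch below. It is not the course's C# code: the class and function names are hypothetical, the Merkle tree is replaced by a plain join of the transaction strings, the authenticated hash and digital signature are left out, and the difficulty is lowered to 2 so it runs quickly.

```python
import base64
import hashlib
from dataclasses import dataclass


def sha256_b64(text):
    return base64.b64encode(hashlib.sha256(text.encode("utf-8")).digest()).decode("ascii")


@dataclass
class Block:
    number: int
    transactions: list
    difficulty: int
    previous_hash: str = ""
    nonce: int = 0
    block_hash: str = ""

    def header_hash(self):
        # Stand-in for "Merkle root + block number + previous block hash".
        return sha256_b64(f"{self.number}|{self.previous_hash}|{'|'.join(self.transactions)}")

    def proof_hash(self, nonce):
        return sha256_b64(f"{nonce}{self.header_hash()}")

    def mine(self):
        # Add 1 to the nonce until the hash starts with the required zeros.
        target = "0" * self.difficulty
        self.nonce = 0
        while not self.proof_hash(self.nonce).startswith(target):
            self.nonce += 1
        self.block_hash = self.proof_hash(self.nonce)


def build_chain(batches, difficulty):
    chain, previous_hash = [], ""
    for number, transactions in enumerate(batches):
        block = Block(number, list(transactions), difficulty, previous_hash)
        block.mine()
        chain.append(block)
        previous_hash = block.block_hash
    return chain


def verify_chain(chain):
    """Return one flag per block; once a block fails, every later block fails too."""
    results, expected_previous, intact = [], "", True
    for block in chain:
        recomputed = block.proof_hash(block.nonce)
        intact = (intact
                  and block.previous_hash == expected_previous
                  and recomputed == block.block_hash
                  and recomputed.startswith("0" * block.difficulty))
        results.append(intact)
        expected_previous = block.block_hash
    return results


# Four blocks of made-up claim records, mined at difficulty 2.
chain = build_chain([["claim-1", "claim-2"], ["claim-3"], ["claim-4"], ["claim-5"]], 2)
before = verify_chain(chain)
chain[1].transactions[0] = "claim-3 (amount changed)"  # tamper with the second block
after = verify_chain(chain)
print(before, after)
```

To make the tampered chain verify again, an attacker would have to re-mine the modified block and every block after it, which is exactly the cost that raising the difficulty level increases.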
-
Summary
Earlier in the course, we built up a working blockchain example that allowed us to have multiple transactions per block. But we did identify a problem with this, that if someone was to modify a transaction in an earlier block, it would be quite easy to recalculate all of the block hashes for that chain. This ruins the point of us having an immutable ledger, as we can, in fact, go and change data in the blockchain, and it is quite efficient to do so. This module was about looking for solutions to fix that problem. First of all we looked at the classic Byzantine Generals' Problem. That talked about the problem of sending a message from one army to another without it being changed. We then looked at a process to solve this called proof of work. This is a process where when you write a message, you then create a hash for it, but this hash has to have some special properties in that it must start with a set number of 0s. To calculate this hash, you need to have a number called a nonce that starts at 0, and you append this to the message you want to hash. Once you've created the hash, you check for the number of 0s at the start of the hash. If the hash doesn't satisfy the requirement of a set number of 0s, you then add one to the nonce and you try again. This is designed to be computationally expensive, so that the hash cannot be easily recalculated if the message changes. But it is quite easy for a third party to verify. Proof of work is the way in which blockchains like Bitcoin and Ethereum mine and create blocks. Proof of work is computationally expensive and therefore uses a lot of electricity. The people mining the blocks are rewarded if they solve the hashing puzzle. This compensates them, as they are using lots of power to create the blocks. This is a core principle for public blockchains.
Private blockchains, on the other hand, can still use cryptocurrency rewards for mining, but if you are limiting access to a few known entities to mine and verify blocks, then basing them on a cryptocurrency is not necessarily required. Proof of work is still the main way of mining blocks, but there is a new technique called proof of stake, which aims to address the issue of computational complexity and the huge energy requirements needed by proof of work. Instead of solving a complex hashing puzzle, the proof of stake concept states that a person can mine or validate block transactions according to how many coins he or she owns or holds. This means that the more Bitcoin or alternative coins owned by the miner, the more mining power he or she has in the block creation process. This is looking to be adopted in the future by blockchains such as Ethereum. In the next module, we'll take a look at how blockchains maintain consensus when there are many nodes, each with their own copies of the blockchain, all competing to create blocks.
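To make the proof of stake idea a little more concrete, here is a deliberately naive Python sketch in which the chance of being picked to create the next block is proportional to the coins held. This is only one simple reading of the description above; real proof of stake protocols involve considerably more than a weighted draw. The names and stake values are invented for illustration.

```python
import random


def pick_validator(stakes, rng):
    """Pick one participant with probability proportional to its stake."""
    names = list(stakes)
    weights = [stakes[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


stakes = {"alice": 60, "bob": 30, "carol": 10}  # hypothetical coin holdings
rng = random.Random(42)  # seeded so repeated runs behave the same way

picks = [pick_validator(stakes, rng) for _ in range(10_000)]
share = {name: picks.count(name) / len(picks) for name in stakes}
print(share)
```

Over many rounds each participant's share of blocks tracks its share of the total stake, and no hashing puzzle has to be solved to decide who goes next.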
-
Maintaining Consensus
Overview
Hi, my name is Stephen Haunts, and welcome back to my course, Blockchain Principles and Practices. Earlier in the course, we upgraded our blockchain sample application to use proof of work, which enables our blockchain to be immutable. By this we mean that once a block has been confirmed, it's computationally infeasible to change the data in the blockchain. What I want to do in this module is talk a little bit about how the blockchain maintains consensus. During the course, we talked about how a public blockchain like Bitcoin or Ethereum has many nodes on the network, each with their own copy of the blockchain that they can verify. Those nodes are responsible for trying to create blocks with multiple transactions. Each of these nodes competes to solve the proof of work hashing puzzle, and when a node does solve it, it has to let all the other nodes on the network know, so that they can add the block to their own blockchain. This means that each node is responsible for maintaining its own copy of the chain using the new blocks that it receives. So the goal here is to try and maintain a single, unambiguous chain of transactions across all of the nodes in the blockchain network. The act of trying to maintain the same blockchain independently is what we refer to as keeping consensus. Let's now dive in and look at why this poses a challenge.
-
The Challenge with Consensus
Each node in a blockchain network works in two states. The node is either verifying blocks that were created by its peers, or working to solve the proof of work hashing puzzle to create a new block for a series of transactions. Each node is independent of the others, so they are not synchronized in any way. They each react to their inputs accordingly. Due to network latencies as blocks and transactions are sent around to the nodes, each node will not necessarily have identical information to work with. Messages could get lost in transit, or arrive in different orders. Because of this, different nodes will be in different states at any given moment, and maintaining a consistent blockchain transaction history between nodes is quite challenging. This means the challenge is to get one unambiguous history of transaction data, even though we have message delivery delays and reliability issues between nodes. We want to keep the nodes independent of each other without resorting to a centralized solution. As the blockchain nodes are entirely independent, they are themselves responsible for coming to an agreement on which version of the transaction history is selected. The premise for how they choose the transaction history is based on how new blocks are picked, added to the chain, and protected from manipulation, which is the proof of work algorithm that we discussed earlier in the course. The proof of work process makes block creation computationally very expensive, and efforts to manipulate or change transactions in a blockchain even more computationally expensive. This means that the total amount of computational effort spent creating blocks is a good criterion for selecting the transaction history, should we get a conflict when adding blocks into the chain.
If all of the participating nodes follow the same selection criteria for the transaction history, nodes will all eventually agree on an identical version of the transaction history. By using the idea of the amount of computational effort to select the history, this leads us to a strategy to pick the correct transaction history. This method is referred to as picking the longest chain.
-
Longest Chain
The longest chain solution revolves around the idea that the chain or fork in a blockchain that consists of the most blocks will represent the most combined computational effort spent calculating the proof of work hashing puzzle for its blocks. Let's walk through this as an example. First, let's start with a case where all of the nodes on a network agree on an initial blockchain that consists of two blocks. These are represented by the two boxes that you can see on the screen now. Each box is a block in the blockchain. As we have seen earlier in this course, the makeup of a block is quite complicated, as we have multiple transactions and a block header, but for the examples in this module we'll just assume those and keep the diagrams simple. The letters in the boxes represent our block hashes. I've kept them as simple letters here just to keep the diagrams easy to understand. The box on the top has a block hash of B. And the block underneath has a block hash of A. For our example, all nodes in the network have this same blockchain of two blocks. As new transactions arrive at each node in the network, they each race to solve the proof of work hashing puzzle that we looked at earlier in the course. The blockchain you can see on the screen now shows the chain after a node has won the race to create the next block and has sent that block to all of the other nodes. This is block C. This is the best-case scenario, where the race to solve the hashing puzzle is won and all of the nodes accept the new block. In this case, all of the nodes have a consensus on the blockchain. What happens, though, if the winning block is delayed getting through to all the other nodes? In that time, another node finds a winning block hash, which it then starts to distribute to the other nodes. Once the blocks have finished propagating, we could end up with what you see in the blockchain on the screen now. We now have a fork in the blockchain where both blocks C and D are linked to block B.
In fact, this is more like a block tree than a blockchain. At the moment, we cannot reach a full consensus, as picking the longest chain is impossible, because both branches have the same length. The branches in play here are A, B, and C, and A, B, and D. Now that we have this initial fork in the blockchain, each node can decide for itself which branch to extend. It could try to build the next block on top of block C, or it could do it on top of block D. At this stage, there is no wrong answer, both are valid. So the nodes go off and all try to create new blocks. At around the same time, two nodes solve the hashing puzzle and send out their winning blocks. In this case, both of the nodes have built their block on top of block C. So we have what you can see on the screen now. In this case, we can already rule out the chain that is A, B, D. This is the shortest chain in the tree, so it can go. We'll talk about what happens to that block's transactions in a bit. In this state, though, we still do not have a clear winner for the longest chain, as we have the chain A, B, C, and E, along with the chain A, B, C, and F. The nodes again all go off and race to construct the next block. The nodes can either construct the next block so that it builds off block E, or off block F. Eventually one of the nodes solves the hashing puzzle, and sends a block to all of the nodes on the blockchain network. This winning block, G, was built on top of block F. This means we end up with the blockchain that we can see on the screen now. Can we apply the longest chain process now? Yes, we can. The chain A, B, C, F, and G is five blocks long, whilst A, B, C, E is only four blocks long. So the branch that is five blocks long wins. Every node is applying this same rule, so we end up with consensus, and the nodes end up with a chain like you can see on the screen. Conceptually this is quite easy to understand.
Over time we are continually reevaluating the chains to apply the longest chain rule, which means we achieve eventual consistency across our nodes in reaching consensus. In the example we've just walked through, we eliminated two blocks, D and E. Each of these blocks can contain any number of transactions, and in a currency like Bitcoin or Ethereum these represent actual transactions. We can't just delete them, as we would be losing details of people's transactions. When we apply this back to Globomantics, our fictional insurance company, by eliminating blocks D and E we could have been deleting the details of hundreds of insurance claim settlements. Let's now go through some of the consequences of selecting the longest chain in this manner.
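The walkthrough with blocks A to G can be replayed in a short Python sketch. The block tree is stored as a hypothetical mapping from each block to its parent, and the function names are invented for illustration; a real node would compare full block hashes and validate each block rather than compare single letters.

```python
# Each block maps to the block it was built on; A is the first block.
parents = {"A": None, "B": "A", "C": "B", "D": "B", "E": "C", "F": "C", "G": "F"}


def chain_to(tip, parents):
    """Walk back from a tip block to the first block and return the branch in order."""
    chain = []
    while tip is not None:
        chain.append(tip)
        tip = parents[tip]
    return chain[::-1]


def longest_chain(parents):
    """Return the single longest branch, or None while the longest branches are tied."""
    tips = set(parents) - {parent for parent in parents.values() if parent is not None}
    chains = [chain_to(tip, parents) for tip in sorted(tips)]
    best_length = max(len(chain) for chain in chains)
    winners = [chain for chain in chains if len(chain) == best_length]
    return winners[0] if len(winners) == 1 else None


winner = longest_chain(parents)
orphans = sorted(set(parents) - set(winner))
print(winner, orphans)

# The earlier stage of the example: C and D both built on B, so there is no winner yet.
tied = longest_chain({"A": None, "B": "A", "C": "B", "D": "B"})
print(tied)
```

Because every node applies the same rule to the same tree, each node arrives at the same branch on its own once the tie is broken, with no central coordinator involved.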
-
Longest Chain Consequences
As we have just mentioned, applying the longest chain process to the branches of the tree to set the common authoritative chain between the nodes does have some side effects. These are orphaned blocks and reclaimed mining rewards. When you have multiple nodes competing to mine blocks of transactions by solving the proof of work puzzle, you end up with forks in a blockchain as we have already covered. This means our blockchain structure is more like a block tree. As the nodes apply the longest chain selection criteria over the tree, we end up deciding on the authoritative branch to use. This means that the blocks that lose out in the selection are orphaned from the blockchain. We saw this in our earlier example, as blocks D and E were orphans. This means the transactions contained within those blocks no longer have a home in the blockchain. Now that these transactions have lost their chance of being part of the blockchain, they need to be given another chance. What happens now is that those transactions are placed back into the pool of transactions for that node, and they are given the opportunity to be mined again as part of another block. This means that transactions that once appeared as though they were added to the blockchain suddenly seem to disappear for a while until they are re-added into the blockchain as part of a new block. Let's now relate this to our example of Globomantics, and the record of insurance claims that were orphaned as part of blocks D and E. If we assumed our blocks only contained one transaction for simplicity, then as you can see on the screen now, the claim settlements for claim numbers 12345 and 67890 would no longer be confirmed on the blockchain, and they have to go back into the pool of transactions to be processed into a future block. This means we have a system that is eventually consistent.
This means that even though transactions have been sent to the nodes for inclusion in a block, confirmation is not instant. You have to wait for a node to solve the proof of work puzzle and for the block to be added, with the risk that if the block ends up in a fork, it could get orphaned in the future and reprocessed. But the transactions will make it onto the blockchain eventually. In our example throughout this course with Globomantics, we have gone under the assumption that it will be using a private blockchain, so that we could focus on explaining the data structures and algorithms used to create a blockchain. If Globomantics were using a more public blockchain system, such as Ethereum, then the miners of the blocks would receive a cryptocurrency reward for successfully solving the proof of work puzzle. They get rewarded because they are expending a lot of time and computational power to solve the hashing puzzle, so it's only fair they should get a reward. If the block a miner creates is subsequently orphaned from the blockchain, then that block becomes useless. If this is the case, the mining reward that was paid out to that miner is reclaimed, but they have another chance at trying to mine the next block.
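The step where orphaned transactions go back into the pool can be sketched as follows. This is a Python illustration with hypothetical names, one made-up transaction per block, and claim identifiers that are placeholders rather than data from the course. It also assumes that a transaction is only returned to the pool if the winning chain has not already confirmed it.

```python
from collections import deque

# One transaction per block for simplicity; blocks D and E lost the longest chain selection.
block_transactions = {
    "A": ["claim-10001"], "B": ["claim-10002"], "C": ["claim-10003"],
    "D": ["claim-12345"], "E": ["claim-67890"],
    "F": ["claim-10004"], "G": ["claim-10005"],
}
main_chain = ["A", "B", "C", "F", "G"]


def reclaim_orphans(block_transactions, main_chain, pool):
    """Put transactions from orphaned blocks back into the node's pending pool."""
    confirmed = {tx for block in main_chain for tx in block_transactions[block]}
    for block, transactions in block_transactions.items():
        if block in main_chain:
            continue
        for tx in transactions:
            # Skip anything the winning chain already confirmed or the pool already holds.
            if tx not in confirmed and tx not in pool:
                pool.append(tx)
    return pool


pool = reclaim_orphans(block_transactions, main_chain, deque())
print(list(pool))
```

The transactions are not lost; they simply wait in the pool until a later block picks them up, which is what makes the system eventually consistent.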
-
Summary
In this module we discussed that in a scenario where you have multiple nodes on a blockchain network competing to create blocks for the blockchain, each of these nodes is operating independently of the others to maintain their copy of the blockchain. Nodes are competing against each other to mine blocks by solving the proof of work hashing puzzle. When a node wins a race to create a block, it lets all the other nodes on the network know about this new block. In an ideal scenario, the block is added to the blockchain and the node goes on with trying to mine the next block. You can get into a scenario where two or more nodes mine a block at nearly the same time, and due to network latency when propagating blocks between nodes, multiple blocks get added onto the blockchain, creating a fork in the chain. This is not ideal, as you then have different nodes trying to mine new blocks attached to different forks in the chain. That means each node has to apply a set of rules to the blockchain to try and ensure multiple forks are not allowed and that a definitive chain is created. The most common technique for doing this is called the longest chain criteria. This is where once a fork grows a little, the longest chain in the fork will be the one that survives. This does have side effects in that you end up with orphaned blocks, whose transactions no longer live on the blockchain. This means those transactions are added back onto the pool of transactions for that node, and they are reprocessed back into a new block in the future. This means the processing and adding of transactions onto a blockchain is what is known as eventually consistent. If you are operating a public blockchain like Ethereum, then the miners of those orphaned blocks will have been paid a small cash reward for creating the block. If their block ends up as one of the orphaned blocks, then their reward is reclaimed.
We have now reached the end of the main content for this course, and the final module will summarize the key details that you have learned in this course.
-
Course Summary
Summary
Hi, my name is Stephen Haunts. Congratulations, you have now reached the end of this course, Blockchain Principles and Practices. In this module, I'm going to summarize what we have learned in this course. Then I will end with some additional resources you can turn to for more information. The main aim of this course is to teach architects and developers about the theory of how a blockchain works. I believe it is important for technical people to have a good grasp of the technology and how it works under the covers. This could be because you want to implement your own blockchain solution. Or it could be that you're going to implement a third-party solution like Ethereum, but just want a good theoretical background of how the technology works first. If either of these is the case, then this course is for you. Let's start our summary.
-
Thinking About Trust
We started out by discussing what a blockchain is by looking at some definitions. The first was by Don and Alex Tapscott, who are the authors of the book Blockchain Revolution. Blockchain is "an incorruptible digital ledger of economic transactions that can be programmed to record not just financial transactions but virtually everything of value." We also looked at a standard definition from Wikipedia. "Blockchain is a continuously growing list of records, called blocks, which are linked and secured using cryptography." These are two perfect definitions, and highlighting certain words helps us with our mental model. First of all, we have digital ledger. We know a ledger is like an accounting system, and this is backed up with the word transactions. Next we have "a continuously growing list of records." We know this is a ledger of transactions that is ever growing. Hopefully this is forming a good picture in your head. Then we have the word incorruptible, so we have a continuously growing ledger of transactions that cannot be corrupted. Next we can also see that these transactions are linked together in some way using cryptography. So we can see that our blockchain is an incorruptible digital ledger, a continuously growing list of records, linked and secured using cryptography. These simple definitions go a long way to covering what a blockchain is. Of course these are quite simplistic, as there is much more to it, but fundamentally we have a good idea. But why do we need this? Well, this comes down to just one word. Trust. One of the big benefits of blockchains is to get trust on the internet by using decentralization. We explored how traditional banking and business revolve around having a central authority. Blockchain is a great way to decentralize these companies so that they are policed and monitored by the participants in the blockchain instead of that central authority.
-
Public vs. Private Blockchain
We also spent some time looking at different types of blockchains in use today. The preferred type of blockchain that offers the best form of trust and decentralization is actually the public blockchain. In a public blockchain, anyone can write data to it. Each node on the network participates in that blockchain and can be involved in creating and verifying blocks. It offers the best level of security and trust between peers. The open nature of these blockchains, and the fact that anyone can participate, has made some companies cautious about using them in this form, which is giving rise to a different type of blockchain called a private blockchain. There's been a fair bit of controversy about the existence of private blockchains. Blockchain purists, Bitcoin advocates, and online activists maintain that private blockchains are not needed, and that they don't offer the full anonymity and openness of a public blockchain. Members of different organizations and industries, like financial services and healthcare, to name a few, disagree. They see the benefits in maintaining an immutable ledger. So exactly how does a private blockchain differ from a public one? With a private blockchain, the company that owns it also decides who can read the blockchain transactions or has the ability to verify them. That means that they have control over the privacy of the data that is recorded onto that blockchain. This is very important in regulated industries, such as financial services and healthcare, which have very strict rules that they have to adhere to around how visible certain data is.
-
Storing Transactions in Blocks
We started off our look at blockchains by understanding how we could process just one single transaction in a block first. The sample application code for this course was written to support .NET Standard 2.0 and .NET Core 2.0. This means that you can execute the samples on Windows, MacOS, and Linux, using IDEs such as Visual Studio on Windows, Visual Studio for Mac, Visual Studio Code, and JetBrains Rider. You can, of course, also build and run the examples from the command line in a terminal application. We went through our sample application. We based it around a fictional insurance company called Globomantics. As part of their digital transformation, they want to use the blockchain to record details of settlement payments to their claimants. Globomantics wants to roll this out in their Motor Vehicle Division first. Their implementation will include recording payments to claimants for settlement for vehicles that are being written off. This could either be from an accident where the car is beyond economical repair, or because the car has been stolen. You can see on the screen the type of data that I want to record in a transaction for a block to represent one of their insurance claims. The claim number, a settlement amount, a settlement date, a car registration plate, mileage, and the claim type. This allowed us to present an example for a blockchain that sits outside the standard cryptocurrency transfer of value. A block contains two sections. In the diagram we explored in the course, we had the transaction details at the top of the screen, containing the insurance claim details, and then the block header details below that. The block header consists of the block number, the creation date, the previous block hash, and the block hash, which is a SHA-256 hash of the transaction details together with the block number, creation date, and previous block hash from the header. Finally, we have a reference to the next block in the chain.
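As a reminder of how those fields fit together, here is a small Python sketch of a claim transaction and the single-transaction block hash. The field names follow the list above, but the class name, the separator used when joining fields, and the sample values are all invented for illustration; the course's C# classes may lay this out differently.

```python
import hashlib
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class ClaimTransaction:
    claim_number: str
    settlement_amount: float
    settlement_date: str
    car_registration: str
    mileage: int
    claim_type: str

    def serialise(self):
        # Join the fields in a fixed order so the same claim always hashes the same way.
        return "|".join(str(value) for value in astuple(self))


def calculate_block_hash(transaction, block_number, created, previous_hash):
    """SHA-256 over the transaction details plus the header fields."""
    payload = f"{transaction.serialise()}|{block_number}|{created}|{previous_hash}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


claim = ClaimTransaction("ABC123", 1000.0, "2018-01-01", "QWE123", 10000, "TotalLoss")
genesis_hash = calculate_block_hash(claim, 0, "2018-01-02T09:00:00Z", "")
next_hash = calculate_block_hash(claim, 1, "2018-01-02T09:05:00Z", genesis_hash)
print(genesis_hash)
print(next_hash)
```

Because the previous block hash is part of the input, changing anything in one block changes every hash that is calculated after it.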
The first block is called the genesis block, and its previous block hash will be null, or empty, as there is nothing before it. The next field refers to the next block in the chain. That next block points back to its parent with a previous block hash field. In traditional computer science terms, this is very similar in theory to a doubly linked list. To verify the chain, we then go on to the next block, passing across the newly calculated block hash and comparing it against the previous block hash that's stored in that block. We can do this by recalculating that block hash, incorporating the previous block hash. Then if the newly calculated hash matches what has already been set in the block, no data has changed. If it doesn't match, then the block has been modified and has lost its integrity. We keep on doing this down the chain, but if a block fails this integrity check, then all of the following blocks will also fail, as the previous block hashes change in each block. In other words, a change in one block causes a cascading failure through all of the remaining blocks. Once we had looked at building a basic blockchain containing a single transaction per block, we moved onto the next level where a single block contains multiple transactions. What we built originally, with a single transaction per block, was perfectly fine and it works. It's much more common, though, to have multiple transactions contained within a block. The mechanics of how a block operates, with the block hash being used to link the blocks together, doesn't change. It's exactly the same. What you need to do is generate a single hash that represents all of the transactions in that block. You could do this by generating a hash for each transaction, as in the single transaction example, and then concatenating all these hashes together and calculating a single hash to store in the block. This will work, but there is a data structure that is tailor-made for creating hash codes from multiple data points. And that data structure is called a Merkle tree.
A Merkle tree is a structure that lets you create a single hash from many hashes. In the example we can see on the screen at the moment, we have four hashes at the bottom. They are then collapsed into another layer of hashes by combining those hashes together in pairs. The result is that we have a single hash representing all of the nodes in the tree. Let's now incorporate this into the blockchain. The mechanism for calculating the block hash hasn't changed. You now have a single hash representing all of the transactions in a block, whereas before we just had a single transaction with a single hash. The key thing here is that it is still just a single hash. Look at the screen to see this illustrated. We have the hash at the top of the tree feeding into our SHA-256 hash function along with the data from the block header, which is the block number, previous block hash, and the block creation date. Once the SHA-256 hash has been calculated, we then store the result in the block hash. As you can see, the way we create the hash of the overall block is identical. We now have support for multiple transactions per block, which is much more efficient.
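As a rough illustration of the Merkle tree described above, the Python sketch below collapses a list of transaction hashes into a single root hash and then feeds that root into the block hash along with the header fields. The function names are hypothetical. Two details are assumptions rather than something stated in the course: hashes are combined by concatenating their hex strings, and a layer with an odd number of hashes duplicates its last hash.

```python
import hashlib
from typing import List, Optional


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def merkle_root(leaf_hashes: List[str]) -> str:
    """Collapse the leaf hashes layer by layer until one hash remains."""
    if not leaf_hashes:
        raise ValueError("at least one hash is required")
    layer = list(leaf_hashes)
    while len(layer) > 1:
        if len(layer) % 2 == 1:
            # Assumption: pad an odd layer by repeating its last hash.
            layer.append(layer[-1])
        # Combine neighbouring hashes in pairs to build the next layer up.
        layer = [sha256_hex(layer[i] + layer[i + 1])
                 for i in range(0, len(layer), 2)]
    return layer[0]


def block_hash(root: str, block_number: int, created: str,
               previous_block_hash: Optional[str]) -> str:
    # Same calculation as for a single transaction, with the Merkle
    # root standing in for the single transaction hash.
    payload = f"{root}|{block_number}|{created}|{previous_block_hash or ''}"
    return sha256_hex(payload)


# Four made-up transactions, hashed to form the bottom layer of the tree.
leaves = [sha256_hex(f"claim-{i}") for i in range(4)]
root = merkle_root(leaves)
```

With four leaves, the root is the hash of the two intermediate hashes, each of which is the hash of a pair of leaves. Changing any one transaction changes its leaf hash, which changes the root and therefore the block hash.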
-
Proof of Work
So at this point in the course, we had a blockchain where we can have multiple transactions per block, but we had a problem where the actual blockchain wasn't truly immutable. If you change a transaction early in the chain, the chain would fail verification, which would cause a cascading failure across the chain. In its current form, though, it is possible to recalculate the block hashes for the entire chain without too much difficulty. That's clearly not ideal, so we introduced a mechanism called proof of work. A proof of work is designed to be expensive and difficult to calculate, but very easy to verify once it has been completed. This means that it could take, say, 10 minutes to calculate the correct block hash for the block, but once you've accepted that block onto the chain, it's very quick to verify. This means that once a proof of work has been calculated, if anyone changes the data in a transaction earlier in the blockchain, they then have to recalculate the proof of work for that block and any blocks that follow. If, for example, we had 5000 blocks and each block took 10 minutes to calculate, it would take around 833 hours to recalculate the block hashes for all of the blocks, which is around 35 days. The concept behind the proof of work puzzle is quite straightforward. You take a block hash, and you append a number to it, starting at 0. You then create a SHA-256 hash of this block hash and the number, and check whether that hash starts with a specific number of 0s. If it doesn't, then you add one onto this number and you rehash it. You keep on going, increasing that number, or nonce as it is known, by 1 until you get the desired number of 0s at the beginning of the hash. Let's see an example from our sample code on the screen now, where we have set different difficulty levels, that is, the desired number of 0s, and recorded the time it took to calculate those hashes. As you can see, the amount of time it took to calculate the desired hash goes up with the difficulty level. 
This process of calculating the hash puzzle is also called mining the block.
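The proof of work puzzle described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the course's .NET sample: the `mine`, `verify`, and `rework_estimate` names are hypothetical, the nonce is appended to the block hash as decimal text, and the difficulty is counted in leading zero characters of the hex-encoded hash.

```python
import hashlib
from typing import Tuple


def mine(block_hash: str, difficulty: int) -> Tuple[int, str]:
    """Try nonces 0, 1, 2, ... until the hash starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        candidate = hashlib.sha256(f"{block_hash}{nonce}".encode("utf-8")).hexdigest()
        if candidate.startswith(prefix):
            return nonce, candidate
        nonce += 1


def verify(block_hash: str, nonce: int, difficulty: int) -> bool:
    """Checking a proof of work takes a single hash calculation."""
    candidate = hashlib.sha256(f"{block_hash}{nonce}".encode("utf-8")).hexdigest()
    return candidate.startswith("0" * difficulty)


def rework_estimate(blocks: int, minutes_per_block: float) -> Tuple[float, float]:
    """Hours and days needed to redo the proof of work for every block."""
    hours = blocks * minutes_per_block / 60
    return hours, hours / 24


nonce, mined_hash = mine("example-block-hash", 2)
hours, days = rework_estimate(5000, 10)
```

Mining has to try many nonces, while `verify` needs only one hash, which is the asymmetry the proof of work relies on. Each extra leading zero multiplies the expected number of attempts by 16 in this hex-based sketch, so low difficulty values are best for experimenting. The estimate function reproduces the arithmetic from the example: 5000 blocks at 10 minutes each is 50,000 minutes, which is roughly 833 hours, or just under 35 days.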
-
Maintaining Consensus
Once we had a working blockchain solution that had multiple transactions per block and was made immutable using proof of work, we then discussed the problem of having multiple nodes on a blockchain network all competing to create blocks, and how they can all maintain consensus. We talked about a protocol called the longest chain selection criteria. As blocks are created by a node, they are distributed across the network. If two nodes create a block at the same time and send it around, then we could end up with a problem where we have a fork in the chain. You can see that you have different nodes creating blocks off of different parents. As nodes create new blocks, each node inspects the forks, and the longest fork will always win, meaning the shorter forks will be discarded. When this happens, you end up with orphaned blocks, which means that the transactions they included are no longer part of the blockchain. In this case, those transactions are put back into the processing pool for each node, so they will get included in future blocks. They'll all get included eventually, but there will be a delay. In the case of blockchains like Bitcoin and Ethereum, the miners would have been paid a mining reward for creating their blocks. If their block is ultimately orphaned, then the mining reward is reclaimed, but they still have the opportunity to compete to create the next block in the chain.
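Here is a small Python sketch of the longest chain selection described above. It is a deliberately simplified model and not how any particular network implements it: each fork is just a list of blocks, the `Block` and `resolve_forks` names are hypothetical, length is measured as a plain block count, and a tie is broken by taking the first fork in the list, which is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Block:
    block_id: str
    transactions: Tuple[str, ...]


def resolve_forks(forks: List[List[Block]]) -> Tuple[List[Block], List[str]]:
    """Pick the longest fork and collect transactions to return to the pool."""
    winner = max(forks, key=len)  # assumption: first fork wins a tie
    included = {tx for block in winner for tx in block.transactions}
    back_to_pool: List[str] = []
    for fork in forks:
        if fork is winner:
            continue
        for block in fork:
            for tx in block.transactions:
                # Only transactions missing from the winning fork go back.
                if tx not in included and tx not in back_to_pool:
                    back_to_pool.append(tx)
    return winner, back_to_pool


# Two made-up forks that share the same first block.
shared = Block("g", ("tx0",))
fork_a = [shared, Block("a1", ("tx1", "tx2")), Block("a2", ("tx3",))]
fork_b = [shared, Block("b1", ("tx2", "tx4"))]
winner, back_to_pool = resolve_forks([fork_a, fork_b])
```

In this example the three-block fork wins and block `b1` is orphaned. Its transaction `tx2` already appears in the winning fork, so only `tx4` goes back into the processing pool to be picked up by a future block.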
-
What Next?
That concludes this course, Blockchain - Principles and Practices. I hope you've enjoyed taking the course and that you now have a good understanding of how the underlying blockchain data structures and algorithms work. If you're interested in cryptography, which is a fascinating subject, and you want to understand the subject even further, then I recommend my Pluralsight course called Practical Cryptography in .NET. That course goes into more depth about the cryptographic primitives available in .NET and how to use them all together. If you want to understand more about encryption key management, which we touched on very briefly in this course, then I recommend another of my courses called Play by Play: Enterprise Data Encryption in Azure Revealed. In that course, I explain how to use Microsoft Azure's Key Vault to protect your encryption keys. If you wish to learn more about the cryptocurrency Bitcoin and how to use it, then I recommend the course Introduction to Bitcoin and Decentralized Technology by Scott Driscoll. And finally, if you want to learn more about blockchains and Ethereum in particular, then I recommend the course Blockchain Fundamentals by Jan-Erik Sandberg. You can follow me on Twitter with the handle @stephenhaunts. It would be great to connect with you on there. And if you liked this course, then I'd really appreciate a tweet about it. Again, thank you very much for taking this course. I hope it has really helped you broaden your understanding of how blockchains work. Because there are lots of frameworks and platforms out there for you to use, I feel it's important to gain a good understanding of the principles behind how these technologies work. If you liked this course, then please tweet about it, and also use the rating buttons at Pluralsight to give it a score out of 5. You can also hit the Follow button on Pluralsight if you'd like to be notified about any of my future courses. You've been watching Blockchain - Principles and Practices. 
Thank you.