PBFT consensus algorithm

The Byzantine Generals Problem

In ancient times, generals from various places in Byzantium went to war. For example, there were 10 generals. They had to communicate with each other and reach a consensus before they could start together, otherwise the battle would fail.

Question 1. There is a general who will rebel. How to reach a consensus if there is a general who rebels.

Question 2. The communication channel must be secure. It is difficult to reach a consensus when communicating in an insecure channel.

It has been proved that a consensus can be reached when the betrayer is f and the total number of generals> 3f. 3f+1<=n---->(n-1)/3

Fault-tolerant node: It should be a hardware or network problem, and the node is not responding.

Malicious node: In addition to no response, it can also send wrong information to mislead others.

The number of error nodes that raft can tolerate f = (n-1)/2,

Suppose the total number of nodes is n and the error node is f. According to the consensus theorem, only one more than f can be f+1, f+f+1 = n

2f+1=n-->f = (n-1)/2 In summary, the maximum number of malicious nodes is (n-1)/2, and the system can reach a consensus.

The number of incorrect nodes that PBFT can tolerate is f = (n-1)/3,

Scenario 1. Suppose the total number of nodes is n, the malicious nodes are f, and the number of error nodes is f. After the malicious nodes are found, they will be eliminated by the system. Then, as long as there are 1 more intact nodes than the wrong nodes, f+1 That's it. Then f+f+f+1 = n-->3f+1=n ----> f=(n-1)/3
Case 2. The number of malicious nodes and error nodes is the same, the same as raft.

Basic process:

The client sends the status to the master node,
The master node broadcasts the request to other nodes, and the other nodes perform three-stage processing.
After the node processing is completed, place the order and return to the client.
After the client receives f+1 correct messages, it represents the end of the consensus.

Three-stage processing:

pre-pre, promise, commit

Message type <v,n,d,m>

v is the first few rounds of the primary node election, 1, 2. . . and many more

n is the number of the request sent by the client

d is a summary of the message content

m is the content of the message

The master node will send pre-prepare messages to other nodes after receiving the message. Start the three-stage process

The non-primary node receives the pre-prepare message and judges whether to accept it or not. The logic of not accepting, compared with the first received message, is that V and n are the same, but d and M are different, and it is not this message. Or the request number is not between the high and low water level.
If you accept the pre-prepare message, you want other nodes to broadcast and send promise messages, and other nodes do the same thing. If a node receives more than 2f+1 promise messages, it enters the commit phase.
The node will broadcast the commit message, and other nodes are doing the same operation. When 2f+1 commit messages are received, it is considered that most nodes have committed. At this time, the order will be placed and the correct message will be sent to the client.

View change

Replacing the master node, when the master node goes down, or other slave nodes think that the master node is doing evil, a view change will be triggered.

The process is divided into view-change, view-change-ack, new-view

When the master node is down, the slave node wants to send a view-change message to other nodes. The node with the smallest number will become the master node. When the new master node receives the view-change message from 2f+1 nodes, it will be considered as the same New node. Then he will send a new-view message, and then process the request that was not processed by the previous view.

Garbage collection

checkpoint and stable checkpoint and high and low water level

Checkpoint the latest request sequence number processed by the current node.

stable checkpoint Most nodes 2f+1 nodes have agreed to complete the request sequence number.

The stable checkpoint is to reduce the space occupied by the data, and its previous requests can be deleted. Each node no longer needs to store all the requests, saving the space of the node.

Node i will send <checkpoint ,n,d,i> to other nodes

n is the request sequence number of the current node.

d is a summary of the current state.

When receiving replies from 2f+1 nodes, it is recognized that most people have processed the request sequence number, and a stable checkpoint will be formed.

If other nodes that receive the i-node message have not processed the request sequence number, then no message will be returned.

High and low water level

The low water level is the last stable checkpoint, and the high water level is the low water level + L (L is the value we set).

When the processing request sequence number of node i exceeds the high water mark, it will wait, not processing it, and wait for the stable checkpoint to change before continuing processing.