The problem is raised.
It is unreliable to put all services on one machine, so the stand-alone system is changed to a distributed system; some machines in the distributed system may also hang, so there is a need for a Solutions/strategies to enable the entire distributed cluster to work normally, even if some machines will hang. Such a scheme/strategy is called consensus algorithm, the most famous consensus is Paxos, and other consensuses are mostly improved based on Paxos. In a distributed server cluster, each machine is running the same state machine, which can be simply understood as an ordinary application and is a deterministic program. Determinism means that the client is given an input, this machine Will give a definite output, and the output given by different machines is the same. In practical applications, this "input" is often an operation on the data, such as "reducing the balance by x yuan" after withdrawal. In fact, the state machine is running a sequence of operations. For example, the first operation of the sequence is "balance-100", the second operation is "balance+20", and so on. A series of operations are recorded in the log of each machine.
The fundamental requirement of distributed consistency
After the client requests a certain operation on a machine, how to make it All other machines agree and perform this operation together? This is the problem to be solved by the Paxos algorithm on each machine. (2). The role of the machine. Here, when a client requests a machine to perform a certain operation, we attribute the operation to a certain "value" and call this machine Proposer. The core idea of Paxos is that a value first It is proposed by Proposer and then accepted by most (more than half) machines before this value will be finally adopted (only a certain number of operations in the operation sequence will be valid). The machine that performs value recognition is called Acceptor. In addition, there is another type of role called Learner, which is mainly used to record the accepted value. A machine can be Proposer, Acceptor and Learner at the same time.
The system requires a good system , Should meet the following two requirements:
Safety: A value can only be adopted when it is proposed by the Proposer (it cannot be made out of nothing), and only one value can be adopted (only one operation can be performed at a certain point in time) Liveness: There may be multiple Proposers If multiple values are proposed, one of them must be adopted by the Acceptor in the end (cannot be mentioned for nothing). There may be a deadlock problem. I will talk about it later. In addition, as mentioned earlier, Paxos is for the system to work fault-tolerantly and correctly. Fault tolerance is reflected in: some machines may hang up, and they can be restored after the hang. Network communication is unreliable. Message packets may be delivered late and may be lost, but one thing is that the message will not be tampered with (the so-called "non-Byzantine general problem"). (4). The Paxos algorithm mechanism is complete, and here comes the actual Paxos algorithm. To repeat, the problem that this algorithm solves is: some Proposers propose their own values, how can the Acceptor finally adopt one (and only one) value? Here you need to introduce the proposal number first: the number of a certain value proposal of a certain Proposer, the format can be similar to "roundNum + proposerID". As you will see later, the value proposal may not be accepted this time; and other proposal proposals are also Not accepted, so we can proceed to the next round of proposals. This is the meaning of roundNum. Such numbers are ordered (compared according to roundNum), are unique (proposerID is unique), and are necessary for the Paxos algorithm (see the algorithm steps below). From a coding perspective, a proposal is a combination of value and proposal number.
More about this source textSource text required for additional translation information