redis Replication

At the base of Redis replication (excluding the high availability features provided as an additional layer by Redis Cluster or Redis Sentinel) there is a very simple to use and configure leader follower (master-slave) replication: it allows slave Redis instances to be exact copies of master instances. The slave will automatically reconnect to the master every time the link breaks, and will attempt to be an exact copy of it regardless of what happens to the master.

This system works using three main mechanisms:

When a master and a slave instances are well-connected, the master keeps the slave updated by sending a stream of commands to the slave, in order to replicate the effects on the dataset happening in the master side due to: client writes, keys expired or evicted, any other action changing the master dataset.

SUMMARY : 在replication.c:replicationFeedSlaves中实现的这个过程

When the link between the master and the slave breaks, for network issues or because a timeout is sensed in the master or the slave, the slave reconnects and attempts to proceed with a partial resynchronization: it means that it will try to just obtain the part of the stream of commands it missed during the disconnection.
When a partial resynchronization is not possible, the slave will ask for a full resynchronization. This will involve a more complex process in which the master needs to create a snapshot of all its data, send it to the slave, and then continue sending the stream of commands as the dataset changes.

Redis uses by default asynchronous replication, which being low latency and high performance, is the natural replication mode for the vast majority of Redis use cases. However Redis slaves asynchronously acknowledge the amount of data they received periodically with the master. So the master does not wait every time for a command to be processed by the slaves, however it knows, if needed, what slave already processed what command. This allows to have optional syncrhonous replication.

Synchronous replication of certain data can be requested by the clients using the WAIT command. However WAIT is only able to ensure that there are the specified number of acknowledged copies in the other Redis instances, it does not turn a set of Redis instances into a CP system （CP system means Consistency and Partition tolerance）with strong consistency: acknowledged writes can still be lost during a failover, depending on the exact configuration of the Redis persistence. However with WAIT the probability of losing a write after a failure event is greatly reduced to certain hard to trigger failure modes.

You could check the Sentinel or Redis Cluster documentation for more information about high availability and failover（故障恢复）. The rest of this document mainly describe the basic characteristics of Redis basic replication.

The following are some very important facts about Redis replication:

Redis uses asynchronous replication, with asynchronous slave-to-master acknowledges of the amount of data processed.

SUMMARY :如何实现的asynchronous replication？是通过new一个thread来实现？

A master can have multiple slaves.
Slaves are able to accept connections from other slaves. Aside from connecting a number of slaves to the same master, slaves can also be connected to other slaves in a cascading-like structure. Since Redis 4.0, all the sub-slaves will receive exactly the same replication stream from the master.

SUMMARY : 是谁向sub-slaves 发送replication stream？应该还是master，这样会比较简单一些。是这样的，最后一句话就证实了；

Redis replication is non-blocking on the master side. This means that the master will continue to handle queries when one or more slaves perform the initial synchronization or a partial resynchronization.
Replication is also largely non-blocking on the slave side. While the slave is performing the initial synchronization, it can handle queries using the old version of the dataset, assuming you configured Redis to do so in redis.conf. Otherwise, you can configure Redis slaves to return an error to clients if the replication stream is down. However, after the initial sync, the old dataset must be deleted and the new one must be loaded. The slave will block incoming connections during this brief window (that can be as long as many seconds for very large datasets). Since Redis 4.0 it is possible to configure Redis so that the deletion of the old data set happens in a different thread, however loading the new initial dataset will still happen in the main thread and block the slave.

SUMMARY :如何实现的？

Replication can be used both for scalability, in order to have multiple slaves for read-only queries (for example, slow O(N) operations can be offloaded to slaves), or simply for improving data safety and high availability.
It is possible to use replication to avoid the cost of having the master writing the full dataset to disk: a typical technique involves configuring your master redis.conf to avoid persisting to disk at all, then connect a slave configured to save from time to time, or with AOF enabled. However this setup must be handled with care, since a restarting master will start with an empty dataset: if the slave tries to synchronized with it, the slave will be emptied as well.

SUMMARY : 下面这一段描述了这种场景

Safety of replication when master has persistence turned off

In setups where Redis replication is used, it is strongly advised to have persistence turned on in the master and in the slaves. When this is not possible, for example because of latency concerns due to very slow disks, instances should be configured to avoid restarting automatically after a reboot.

To better understand why masters with persistence turned off configured to auto restart are dangerous, check the following failure mode where data is wiped from the master and all its slaves:

We have a setup with node A acting as master, with persistence turned down, and nodes B and C replicating from node A.
Node A crashes, however it has some auto-restart system, that restarts the process. However since persistence is turned off, the node restarts with an empty data set.
Nodes B and C will replicate from node A, which is empty, so they'll effectively destroy their copy of the data.

When Redis Sentinel is used for high availability, also turning off persistence on the master, together with auto restart of the process, is dangerous. For example the master can restart fast enough for Sentinel to don't detect a failure, so that the failure mode described above happens.

Every time data safety is important, and replication is used with master configured without persistence, auto restart of instances should be disabled.

How Redis replication works

Every Redis master has a replication ID: it is a large pseudo random string that marks a given story of the dataset. Each master also takes an offset that increments for every byte of replication stream that it is produced to be sent to slaves, in order to update the state of the slaves with the new changes modifying the dataset. The replication offset is incremented even if no slave is actually connected, so basically every given pair of:

Replication ID, offset

Identifies an exact version of the dataset of a master.

SUMMARY : struct redisServer的字段replid就表示replication ID，master_repl_offset就表示current replication offset

When slaves connects to masters, they use the PSYNC command in order to send their old master replication ID and the offsets they processed so far. This way the master can send just the incremental part needed. However if there is not enough backlog in the master buffers, or if the slave is referring to an history (replication ID) which is no longer known, than a full resynchronization happens: in this case the slave will get a full copy of the dataset, from scratch.

SUMMARY : struct redisServer的字段repl_backlog就表示Replication backlog for partial syncs，它是一个circular array。

The master starts a background saving process in order to produce an RDB file. At the same time it starts to buffer all new write commands received from the clients. When the background saving is complete, the master transfers the database file to the slave, which saves it on disk, and then loads it into memory. The master will then send all buffered commands to the slave. This is done as a stream of commands and is in the same format of the Redis protocol itself.

You can try it yourself via telnet. Connect to the Redis port while the server is doing some work and issue the SYNCcommand. You'll see a bulk transfer and then every command received by the master will be re-issued in the telnet session. Actually SYNC is an old protocol no longer used by newer Redis instances, but is still there for backward compatibility: it does not allow partial resynchronizations, so now PSYNC is used instead.

As already said, slaves are able to automatically reconnect when the master-slave link goes down for some reason. If the master receives multiple concurrent slave synchronization requests, it performs a single background save in order to serve all of them.

SUMMARY : 两个process之间是如何进行通信的？

Replication ID explained

In the previous section we said that if two instances have the same replication ID and replication offset, they have exactly the same data. However it is useful to understand what exctly is the replication ID, and why instances have actually two replication IDs the main ID and the secondary ID.

A replication ID basically marks a given history of the data set. Every time an instance restarts from scratch as a master, or a slave is promoted to master, a new replication ID is generated for this instance. The slaves connected to a master will inherit（继承） its replication ID after the handshake. So two instances with the same ID are related by the fact that they hold the same data, but potentially at a different time. It is the offset that works as a logical time to understand, for a given history (replication ID) who holds the most updated data set.

For instance if two instances A and B have the same replication ID, but one with offset 1000 and one with offset 1023, it means that the first lacks certain commands applied to the data set. It also means that A, by applying just a few commands, may reach exactly the same state of B.

The reason why Redis instances have two replication IDs is because of slaves that are promoted to masters. After a failover, the promoted slave requires to still remember what was its past replication ID, because such replication ID was the one of the former master. In this way, when other slaves will synchronize with the new master, they will try to perform a partial resynchronization using the old master replication ID. This will work as expected, because when the slave is promoted to master it sets its secondary ID to its main ID, remembering what was the offset when this ID switch happend. Later it will select a new random replication ID, because a new history begins. When handling the new slaves connecting, the master will match their IDs and offsets both with the current ID and the secondary ID (up to a given offset, for safety). In short this means that after a failover, slaves connecting to the new promoted master don't have to perform a full sync.

In case you wonder why a slave promoted to master needs to change its replication ID after a failover: it is possible that the old master is still working as a master because of some network partition: retaining the same replication ID would violate the fact that the same ID and same offset of any two random instances mean they have the same data set.

Diskless replication

Normally a full resynchronization requires to create an RDB file on disk, then reload the same RDB from disk in order to feed the slaves with the data.

With slow disks this can be a very stressing operation for the master. Redis version 2.8.18 is the first version to have support for diskless replication. In this setup the child process directly sends the RDB over the wire to slaves, without using the disk as intermediate storage.

Configuration

To configure basic Redis replication is trivial: just add the following line to the slave configuration file:

slaveof 192.168.1.1 6379

Of course you need to replace 192.168.1.1 6379 with your master IP address (or hostname) and port. Alternatively, you can call the SLAVEOF command and the master host will start a sync with the slave.

There are also a few parameters for tuning the replication backlog taken in memory by the master to perform the partial resynchronization. See the example redis.conf shipped with the Redis distribution for more information.

Diskless replication can be enabled using the repl-diskless-sync configuration parameter. The delay to start the transfer in order to wait more slaves to arrive after the first one, is controlled by the repl-diskless-sync-delayparameter. Please refer to the example redis.conf file in the Redis distribution for more details.

Read-only slave

Since Redis 2.6, slaves support a read-only mode that is enabled by default. This behavior is controlled by the slave-read-only option in the redis.conf file, and can be enabled and disabled at runtime using CONFIG SET.

Read-only slaves will reject all write commands, so that it is not possible to write to a slave because of a mistake. This does not mean that the feature is intended to expose a slave instance to the internet or more generally to a network where untrusted clients exist, because administrative commands like DEBUG or CONFIG are still enabled. However, security of read-only instances can be improved by disabling commands in redis.conf using the rename-commanddirective.

You may wonder why it is possible to revert the read-only setting and have slave instances that can be targeted by write operations. While those writes will be discarded if the slave and the master resynchronize or if the slave is restarted, there are a few legitimate use case for storing ephemeral data in writable slaves.

For example computing slow Set or Sorted set operations and storing them into local keys is an use case for writable slaves that was observed multiple times.

However note that writable slaves before version 4.0 were incapable of expiring keys with a time to live set. This means that if you use EXPIRE or other commands that set a maximum TTL for a key, the key will leak, and while you may no longer see it while accessing it with read commands, you will see it in the count of keys and it will still use memory. So in general mixing writable slaves (previous version 4.0) and keys with TTL is going to create issues.

Redis 4.0 RC3 and greater versions totally solve this problem and now writable slaves are able to evict keys with TTL as masters do, with the exceptions of keys written in DB numbers greater than 63 (but by default Redis instances only have 16 databases).

Also note that since Redis 4.0 slave writes are only local, and are not propagated to sub-slaves attached to the instance. Sub slaves instead will always receive the replication stream identical to the one sent by the top-level master to the intermediate slaves. So for example in the following setup:

A ---> B ---> C

Even if B is writable, C will not see B writes and will instead have identical dataset as the master instance A.

Setting a slave to authenticate to a master

If your master has a password via requirepass, it's trivial to configure the slave to use that password in all sync operations.

To do it on a running instance, use redis-cli and type:

config set masterauth <password>

To set it permanently, add this to your config file:

masterauth <password>

Allow writes only with N attached replicas

Starting with Redis 2.8, it is possible to configure a Redis master to accept write queries only if at least N slaves are currently connected to the master.

However, because Redis uses asynchronous replication it is not possible to ensure the slave actually received a given write, so there is always a window for data loss.

This is how the feature works:

Redis slaves ping the master every second, acknowledging the amount of replication stream processed.
Redis masters will remember the last time it received a ping from every slave.
The user can configure a minimum number of slaves that have a lag not greater than a maximum number of seconds.

If there are at least N slaves, with a lag less than M seconds, then the write will be accepted.

You may think of it as a best effort data safety mechanism, where consistency is not ensured for a given write, but at least the time window for data loss is restricted to a given number of seconds. In general bound data loss is better than unbound one.

If the conditions are not met, the master will instead reply with an error and the write will not be accepted.

There are two configuration parameters for this feature:

min-slaves-to-write <number of slaves>
min-slaves-max-lag <number of seconds>

For more information, please check the example redis.conf file shipped with the Redis source distribution.

The INFO and ROLE command

There are two Redis commands that provide a lot of information on the current replication parameters of master and slave instances. One is INFO. If the command is called with the replication argument as INFO replication only information relevant to the replication are displayed. Another more computer-friendly command is ROLE, that provides the replication status of masters and slaves together with their replication offsets, list of connected slaves and so forth.

Partial resynchronizations after restarts and failovers

Since Redis 4.0, when an instance is promoted to master after a failover, it will be still able to perform a partial resynchronization with the slaves of the old master. To do so, the slave remembers the old replication ID and offset of its former master, so can provide part of the backlog to the connecting slaves even if they ask for the old replication ID.

However the new replication ID of the promoted slave will be different, since it constitutes a different history of the data set. For example, the master can return available and can continue accepting writes for some time, so using the same replication ID in the promoted slave would violate the rule that a of replication ID and offset pair identifies only a single data set.

Moreover slaves when powered off gently and restarted, are able to store in the RDB file the information needed in order to resynchronize with their master. This is useful in case of upgrades. When this is needed, it is better to use the SHUTDOWN command in order to perform a save & quit operation on the slave.

It is not possilbe to partially resynchronize a slave that restarted via the AOF file. However the instance may be turned to RDB persistence before shutting down it, than can be restarted, and finally AOF can be enabled again.