Redis with failover replication

Redis is a nice tool for storing key-value data in different formats. Here is a pretty easy way to set up failover for replication. This setup is sometimes called a Redis cluster, but that is not accurate: it is just a few servers (preferably 3, for the Sentinel quorum) with one master and slaves in different configurations (slave of a slave, slave by priority, local slave...).




We need 3 servers with redis+sentinel. Tested on Redis version 2.x and 3.x.
1 - 10.1.1.1
2 - 10.1.1.2
3 - 10.1.1.3
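
Why three servers: Sentinel only authorizes a failover when a majority of the known Sentinels agree, so a two-node setup cannot tolerate the loss of either node. A minimal Python sketch of that arithmetic (the function names are mine, for illustration only):

```python
def majority(sentinels: int) -> int:
    """Smallest number of Sentinels that forms a strict majority."""
    return sentinels // 2 + 1


def tolerated_failures(sentinels: int) -> int:
    """How many Sentinels can be lost while a majority is still reachable."""
    return sentinels - majority(sentinels)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} sentinels: majority={majority(n)}, "
              f"can lose {tolerated_failures(n)}")
```

With 3 Sentinels the majority is 2, which matches the quorum value used in the "sentinel monitor" line of the Sentinel config.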

r1_redis.conf
#bind 127.0.0.1
protected-mode no
port 6379
...

r2_redis.conf r3_redis.conf
#bind 127.0.0.1
protected-mode no
port 6379
...
slaveof 10.1.1.1 6379

Sentinel config. The commented lines are not needed for bootstrap; Sentinel appends them to the file itself after it starts. I keep them here as an example.
r1_sentinel.conf r2_sentinel.conf r3_sentinel.conf
daemonize yes
pidfile "/var/run/redis/redis-sentinel.pid"
logfile "/var/log/redis/redis-sentinel.log"
port 16379
dir "/var/lib/redis"
protected-mode no
#sentinel myid 4809b5ae33b617b24e4ee061222a3cb11f4457cd
sentinel monitor redis-ha 10.1.1.1 6379 2
sentinel down-after-milliseconds redis-ha 3000
sentinel failover-timeout redis-ha 6000
#sentinel config-epoch redis-ha 24
#sentinel leader-epoch redis-ha 24
#sentinel known-slave redis-ha 10.1.1.2 6379
#sentinel known-slave redis-ha 10.1.1.3 6379
#sentinel known-sentinel redis-ha 10.1.1.2 16379 1316ff79cb3a4558119c53ca5038f86271683fa7
#sentinel known-sentinel redis-ha 10.1.1.3 16379 e36c6b59c49cbdc8f8056877a410644cda7a6255
#sentinel current-epoch 24
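
To see which node Sentinel currently considers the master, you can send it the SENTINEL get-master-addr-by-name command. Below is a rough Python sketch using only the standard library. The helper names are hypothetical, the port 16379 and the name redis-ha are taken from the config above, and reading the reply with a single recv() is a simplification that assumes the short reply arrives in one piece.

```python
import socket


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def parse_master_addr(reply: bytes):
    """Turn the two-element array reply into (host, port).

    Returns None for anything else, e.g. the null reply sent
    when the master name is unknown to this Sentinel.
    """
    lines = reply.split(b"\r\n")
    if lines[0] != b"*2" or len(lines) < 5:
        return None
    # Layout: *2, $len, host, $len, port
    return lines[2].decode(), int(lines[4])


def get_master_addr(sentinel_host, sentinel_port=16379,
                    name="redis-ha", timeout=1.0):
    """Ask one Sentinel for the current master address (needs a live Sentinel)."""
    with socket.create_connection((sentinel_host, sentinel_port),
                                  timeout=timeout) as sock:
        sock.sendall(encode_command("SENTINEL", "get-master-addr-by-name", name))
        return parse_master_addr(sock.recv(4096))

# Example call against the first server from the list above:
# get_master_addr("10.1.1.1")
```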

And finally, the HAProxy config. If you need to offload the network, you can also run a local Redis as a slave of the redis-ro pool, with failover too.
listen redis-rw
        bind 127.0.0.1:6379
        mode tcp
        balance leastconn
        option tcplog
        option tcp-check
        tcp-check connect
        tcp-check send PING\r\n
        tcp-check expect string +PONG
        tcp-check send info\ replication\r\n
        tcp-check expect string role:master
        tcp-check send QUIT\r\n
        tcp-check expect string +OK
        server redis-1 10.1.1.1:6379 check inter 2s backup
        server redis-2 10.1.1.2:6379 check inter 2s backup
        server redis-3 10.1.1.3:6379 check inter 2s backup

listen redis-ro
        bind 127.0.0.2:6379
        mode tcp
        balance leastconn
        option tcplog
        option tcp-check
        tcp-check connect
        tcp-check send PING\r\n
        tcp-check expect string +PONG
        tcp-check send info\ replication\r\n
        tcp-check expect string master_link_status:up
        tcp-check send QUIT\r\n
        tcp-check expect string +OK
        server redis-1 10.1.1.1:6379 check inter 2s
        server redis-2 10.1.1.2:6379 check inter 2s
        server redis-3 10.1.1.3:6379 check inter 2s
        server redis-rw 127.0.0.1:6379 backup

listen redis-local-ro
        bind 127.0.0.3:6379
        mode tcp
        balance leastconn
        option tcplog
        option tcp-check
        tcp-check connect
        tcp-check send PING\r\n
        tcp-check expect string +PONG
        tcp-check send info\ replication\r\n
        tcp-check expect string master_link_status:up
        tcp-check send QUIT\r\n
        tcp-check expect string +OK
        server redis-local 127.0.0.4:6379 check inter 2s
        server redis-ro 127.0.0.2:6379 backup
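
The tcp-check rules above decide pool membership from the "info replication" output: a node answering role:master passes the redis-rw check, and a node answering master_link_status:up passes the redis-ro check. The Python sketch below approximates that decision so you can reason about it offline. The function names and the sample outputs are illustrative assumptions, and HAProxy matches substrings while this sketch compares whole fields.

```python
def parse_info(text: str) -> dict:
    """Parse 'key:value' lines from INFO output, skipping section headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key] = value
    return fields


def pools_for(info_text: str) -> set:
    """Return the HAProxy pools whose health check this node would pass."""
    info = parse_info(info_text)
    pools = set()
    if info.get("role") == "master":
        pools.add("redis-rw")
    if info.get("master_link_status") == "up":
        pools.add("redis-ro")
    return pools


# Shortened, made-up samples of what a node might answer.
MASTER = "# Replication\r\nrole:master\r\nconnected_slaves:2\r\n"
SLAVE_OK = ("# Replication\r\nrole:slave\r\nmaster_host:10.1.1.1\r\n"
            "master_port:6379\r\nmaster_link_status:up\r\n")
SLAVE_SYNCING = ("# Replication\r\nrole:slave\r\nmaster_host:10.1.1.1\r\n"
                 "master_port:6379\r\nmaster_link_status:down\r\n")

if __name__ == "__main__":
    for name, sample in (("master", MASTER), ("slave", SLAVE_OK),
                         ("syncing slave", SLAVE_SYNCING)):
        print(name, sorted(pools_for(sample)))
```

Note that the master never reports master_link_status, so it is not in the redis-ro pool directly. Reads reach it only through the "server redis-rw ... backup" line, when no slave is healthy.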

What happens:
127.0.0.1:6379 - Redis Cluster RW
127.0.0.2:6379 - Redis Cluster RO
127.0.0.3:6379 - Local Redis RO with fallback to Redis Cluster RO
127.0.0.4:6379 - Local Redis, slave of Redis Cluster RO
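In this configuration we can lose servers, down to a single one in the best case, and it becomes the master without any reconfiguration of the apps. Keep in mind that an automatic promotion needs a majority of Sentinels, so with 3 Sentinels at least 2 must still be alive to elect a new master. At the moment of the switch, apps may try to write to a read-only node and so on, but the HAProxy tcp-check detects the status of all nodes within a few seconds and fixes the routing on the fly.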

Apps should use 127.0.0.1:6379 as the master and 127.0.0.3:6379 as the slave.
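
During a switch, a write sent to 127.0.0.1:6379 may briefly land on a node that is still read-only, and Redis answers such writes with a READONLY error. One way an app could ride this out is a small retry loop, sketched below in Python. ReadOnlyError is a stand-in for whatever exception your client library raises, and the attempts and delay values are assumptions loosely based on the 2s check interval above, not tested numbers.

```python
import time


class ReadOnlyError(Exception):
    """Stand-in for the client library's error on a READONLY reply."""


def write_with_retry(write, attempts=5, delay=2.0, sleep=time.sleep):
    """Call write() and retry while the node behind the proxy is read-only.

    write    -- zero-argument callable performing the Redis write
    attempts -- total number of tries before giving up
    delay    -- seconds to wait between tries
    sleep    -- injectable for tests; defaults to time.sleep
    """
    for attempt in range(1, attempts + 1):
        try:
            return write()
        except ReadOnlyError:
            if attempt == attempts:
                raise  # HAProxy did not switch in time; let the caller decide
            sleep(delay)
```

The sleep parameter exists only so the loop can be exercised without real waiting.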

Known problems:
1. Replication delay, which matters when you write and then read the same data from the local slave
2. Replication crashes when RO/RW traffic is too high (more than 200 Mbps on the master in one direction, tested on AWS)
3. Lots of problems with more than 5-6 slaves, and permanent replication problems with 10+ slaves
 
