Replace More Than One Dead Node In A Scylla Cluster

Scylla is a fault-tolerant system: a cluster can remain available even when more than one node is down.

Prerequisites

  • Verify the status of the cluster using the nodetool status command. A node with status DN is down and needs to be replaced.
Datacenter: DC1
Status=Up/Down
State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                                Rack
UN  192.168.1.201  112.82 KB  256     32.7%             8d5ed9f4-7764-4dbd-bad8-43fddce94b7c   B1
DN  192.168.1.202  91.11 KB   256     32.9%             125ed9f4-7777-1dbn-mac8-43fddce9123e   B1
DN  192.168.1.203  124.42 KB  256     32.6%             675ed9f4-6564-6dbd-can8-43fddce952gy   B1
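The DN rows can also be picked out mechanically, which helps on larger clusters. The following is a minimal sketch. It assumes POSIX sh and awk, and it assumes the column layout shown above, where the status code is the first column and the Host ID is the second-to-last column. `list_down_nodes` is a hypothetical helper name, not part of Scylla.

```shell
#!/bin/sh
# list_down_nodes (hypothetical helper): reads `nodetool status` output on
# stdin and prints the address and Host ID of every node reported as DN.
# Assumes the layout shown above: the status code is the first column and the
# Host ID is the second-to-last column (the Rack is last).
list_down_nodes() {
    awk '$1 == "DN" { print $2, $(NF - 1) }'
}

# Example (run on a node with status UN):
#   nodetool status | list_down_nodes
```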

Log in to one of the nodes with status UN and collect the following information from it:

  • cluster_name - grep cluster_name /etc/scylla/scylla.yaml
  • seeds - grep seeds: /etc/scylla/scylla.yaml
  • endpoint_snitch - grep endpoint_snitch /etc/scylla/scylla.yaml
  • Scylla version - scylla --version
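The four lookups above can be bundled into one function. The following is a sketch. It assumes the default configuration path /etc/scylla/scylla.yaml and assumes that commented-out lines should be ignored. `collect_node_info` is a hypothetical name, not a Scylla command.

```shell
#!/bin/sh
# collect_node_info (hypothetical helper): prints the settings listed above
# from a scylla.yaml file, plus the installed Scylla version if available.
# The path defaults to /etc/scylla/scylla.yaml; pass another path to override.
collect_node_info() {
    conf=${1:-/etc/scylla/scylla.yaml}
    # Skip commented-out lines so only active settings are reported.
    grep -v '^[[:space:]]*#' "$conf" |
        grep -E 'cluster_name:|seeds:|endpoint_snitch:'
    # Report the version only when the scylla binary is on PATH.
    if command -v scylla >/dev/null 2>&1; then
        scylla --version
    fi
}

# Example, on a node with status UN:
#   collect_node_info
```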

Procedure

Which procedure to use depends on the Replication Factor (RF) of your keyspaces:

  • If the number of failed nodes is smaller than the RF of your keyspaces, at least one replica of your data is still available, and you can use the Replace a Dead Node procedure to replace each dead node.
  • If the number of failed nodes is equal to or larger than the RF of your keyspaces, some of the data is lost and must be retrieved from backup. Use the Replace a Dead Node procedure, and then restore the data from backup.
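The rule above comes down to comparing two numbers. The following is a sketch. It assumes a single datacenter and assumes that you pass the smallest RF among your own (non-system) keyspaces, because that keyspace runs out of replicas first. `choose_procedure` is a hypothetical helper, not a Scylla tool. The replication settings can be read from the `system_schema.keyspaces` table with cqlsh.

```shell
#!/bin/sh
# choose_procedure (hypothetical helper): applies the rule above.
# Reads `nodetool status` output on stdin, counts the DN nodes, and compares
# that count with the replication factor passed as the first argument.
choose_procedure() {
    rf=$1
    down=$(awk '$1 == "DN" { n++ } END { print n + 0 }')
    if [ "$down" -lt "$rf" ]; then
        echo "$down dead node(s) < RF $rf: use Replace a Dead Node"
    else
        echo "$down dead node(s) >= RF $rf: use Replace a Dead Node, then restore from backup"
    fi
}

# Example: inspect replication settings, then pass the smallest RF you find
# (an assumption, as noted above).
#   cqlsh -e "SELECT keyspace_name, replication FROM system_schema.keyspaces;"
#   nodetool status | choose_procedure 3
```

With the cluster shown in the Prerequisites (two DN nodes), an RF of 3 points to the replace-only path, while an RF of 2 or less also calls for a restore from backup.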

Procedures