Remove a Node from a Scylla Cluster (Down Scale)

Use these instructions when you want to remove nodes in order to reduce the size of your cluster.

Procedure

  1. Run the nodetool status command to check the status of the nodes in your cluster:

    Datacenter: DC1
    Status=Up/Down
    State=Normal/Leaving/Joining/Moving
    --  Address        Load       Tokens  Owns (effective)                         Host ID         Rack
    UN  192.168.1.201  112.82 KB  256     32.7%             8d5ed9f4-7764-4dbd-bad8-43fddce94b7c   B1
    UN  192.168.1.202  91.11 KB   256     32.9%             125ed9f4-7777-1dbn-mac8-43fddce9123e   B1
    DN  192.168.1.203  124.42 KB  256     32.6%             675ed9f4-6564-6dbd-can8-43fddce952gy   B1
    
  2. If the node status is Up Normal (UN), use the nodetool decommission command.

    Using nodetool decommission guarantees that the node streams its own data to the remaining nodes, so there is no possibility of data loss. Aside from hard node failures, nodetool decommission is always preferred.

    Warning

    Review current disk space utilization on existing nodes and make sure the amount of data streamed from the node being removed can fit into the disk space available on the remaining nodes. If there is not enough disk space on the remaining nodes, the removal of a node will fail. Add more storage to remaining nodes before starting the removal procedure.

    Use the nodetool netstats command to monitor the progress of the token reallocation.

  3. If the node status is Down Normal (DN) and cannot be restored, use the nodetool removenode command.

    It is crucial to make sure the node in DN state is indeed down. It is recommended that you stop or kill the node before you continue. Otherwise, such a node, even after removenode starts, may still respond to queries using obsolete cluster topology. If you can access the node remotely, you can stop the service.

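    On systems that use systemd:

    sudo systemctl stop scylla-server

    On older systems that use the service command:

    sudo service scylla-server stop

    In a Docker container named some-scylla (this stops Scylla without stopping the container):

    docker exec -it some-scylla supervisorctl stop scylla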

    If there is no access, stop or kill the node using your virtualization or Cloud console, or physically.

    Note

    nodetool removenode notifies the other nodes that the token ranges owned by the removed node need to move to other nodes. The other nodes then redistribute the data using streaming. nodetool removenode does not guarantee the consistency of the rebalanced data: stream sources may not have the most recent data, nodes may be unavailable, or another kind of error may occur during the transition. Previously, removenode ignored such failures, so it was possible to end up in an inconsistent state in which QUORUM reads miss the replicas with the most recent data and return a stale result.

    For this reason, in order to be able to preserve consistency among replicas it is extremely advisable to:

    • Make sure all other nodes are in status UN

    • Run a full cluster repair before nodetool removenode so all existing replicas have the most up to date data.

    • In case of node failures during removenode, run repair again before running nodetool removenode again.
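
    The first check in the list above can be scripted. The following is a minimal sketch, not an official tool: others_all_un is a hypothetical helper name, and it assumes the nodetool status output has the column layout shown in step 1 (two-letter status code first, address second).

```shell
# Hypothetical helper: reads `nodetool status` output on stdin and succeeds
# only if every node other than the one being removed is reported as UN.
others_all_un() {
    target="$1"
    awk -v target="$target" '
        # Node rows start with a two-letter code: U or D, then N, L, J or M.
        $1 ~ /^[UD][NLJM]$/ && $2 != target && $1 != "UN" { bad = 1 }
        END { exit bad + 0 }
    '
}

# Example use on a live cluster (address taken from the sample output):
#   nodetool status | others_all_un 192.168.1.203 && echo "all other nodes are UN"
```

    A non-zero exit status means at least one other node is not UN, in which case the repair and removenode steps should wait.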

    Later versions of Scylla, starting from PR #7626, eliminate the above risk by failing the removenode operation if any of the nodes involved is down or fails. If you want the removenode operation to succeed even when some of the nodes are not available, you have to explicitly pass a list of nodes that can be skipped for the operation.

    Example using the REST API:

    curl -X POST "http://127.0.0.1:10000/storage_service/remove_node/?host_id=7bd303e9-4c7b-4915-84f6-343d0dbd9a49&ignore_nodes=127.0.0.3,127.0.0.5"
    

    (The ignore_nodes option is not yet available in nodetool removenode; see tools-#251.)
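
    To avoid hand-editing the query string, the request URL can be assembled from its parts. This is a sketch under stated assumptions: remove_node_url is a hypothetical helper name, and the 127.0.0.1:10000 address is simply the one used in the example above.

```shell
# Hypothetical helper: prints the remove_node URL for a Host ID and an
# optional comma-separated list of nodes to ignore.
remove_node_url() {
    host_id="$1"
    ignore_nodes="${2:-}"
    # Assumption: the REST API listens on 127.0.0.1:10000, as in the example above.
    url="http://127.0.0.1:10000/storage_service/remove_node/?host_id=${host_id}"
    if [ -n "$ignore_nodes" ]; then
        url="${url}&ignore_nodes=${ignore_nodes}"
    fi
    printf '%s\n' "$url"
}

# Example use:
#   curl -X POST "$(remove_node_url 7bd303e9-4c7b-4915-84f6-343d0dbd9a49 127.0.0.3,127.0.0.5)"
```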

    The nodetool removenode command takes the Host ID of the node to remove.

    For example:

    nodetool removenode 675ed9f4-6564-6dbd-can8-43fddce952gy
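
    The Host ID can be looked up from the nodetool status output instead of being copied by hand. A minimal sketch, assuming the column layout shown in step 1, where the Host ID is the second-to-last field of each node row; host_id_of is a hypothetical helper name.

```shell
# Hypothetical helper: reads `nodetool status` output on stdin and prints
# the Host ID of the node with the given address (nothing if not found).
host_id_of() {
    # The Host ID is the second-to-last field; the Rack is the last one.
    awk -v addr="$1" '$1 ~ /^[UD][NLJM]$/ && $2 == addr { print $(NF-1) }'
}

# Example use; the guard avoids calling removenode with an empty argument:
#   id=$(nodetool status | host_id_of 192.168.1.203)
#   [ -n "$id" ] && nodetool removenode "$id"
```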

    Use the nodetool netstats command to monitor the progress of the token reallocation.

  4. If the node status is Up Joining (UJ) and does not change to Up Normal (UN), you should remove the node.
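
    Steps 2 to 4 pick a procedure based on the two-letter status code. The mapping can be sketched as a small filter over the nodetool status output; removal_hint is a hypothetical helper name, and the sketch assumes the column layout shown in step 1.

```shell
# Hypothetical helper: reads `nodetool status` output on stdin and prints,
# for each node, the removal procedure that steps 2 to 4 describe.
removal_hint() {
    awk '
        $1 == "UN" { print $2 " UN nodetool decommission" }
        $1 == "DN" { print $2 " DN nodetool removenode (if the node cannot be restored)" }
        $1 == "UJ" { print $2 " UJ remove the node if it does not change to UN" }
    '
}

# Example use:
#   nodetool status | removal_hint
```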

  5. Verify that the node was removed by using the nodetool status command:

    Datacenter: DC1
    Status=Up/Down
    State=Normal/Leaving/Joining/Moving
    --  Address        Load       Tokens  Owns (effective)                         Host ID         Rack
    UN  192.168.1.201  112.82 KB  256     32.7%             8d5ed9f4-7764-4dbd-bad8-43fddce94b7c   B1
    UN  192.168.1.202  91.11 KB   256     32.9%             125ed9f4-7777-1dbn-mac8-43fddce9123e   B1
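
    This verification can also be done in a script. A minimal sketch, assuming the column layout shown above; node_absent is a hypothetical helper name.

```shell
# Hypothetical helper: reads `nodetool status` output on stdin and succeeds
# only if no node row has the given address.
node_absent() {
    awk -v addr="$1" '
        $1 ~ /^[UD][NLJM]$/ && $2 == addr { found = 1 }
        END { exit found + 0 }
    '
}

# Example use:
#   nodetool status | node_absent 192.168.1.203 && echo "node removed"
```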
    
  6. When a node is removed from the cluster, its data is not removed automatically. The data and commitlog stored on that node need to be removed manually. If you do not do this, the old data will still be counted against the load on that node.

    To delete the data, run:

    sudo rm -rf /var/lib/scylla/data
    sudo find /var/lib/scylla/commitlog -type f -delete
    sudo find /var/lib/scylla/hints -type f -delete
    sudo find /var/lib/scylla/view_hints -type f -delete
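
    The same deletions can be wrapped in a function that takes the base directory as a parameter, which makes it possible to try the cleanup on a scratch directory first. This is a sketch, not an official script: clean_scylla_dirs is a hypothetical helper name, and on a real node it would need to run with root privileges, as the sudo commands above do.

```shell
# Hypothetical helper: applies the deletions listed above under a given
# base directory (defaults to /var/lib/scylla).
clean_scylla_dirs() {
    base="${1:-/var/lib/scylla}"
    # Refuse anything that is not an existing directory, so that a typo
    # cannot turn into a delete somewhere unexpected.
    [ -d "$base" ] || { echo "not a directory: $base" >&2; return 1; }
    rm -rf "${base:?}/data"
    for dir in commitlog hints view_hints; do
        # Delete the files but keep the directories, as the find commands above do.
        if [ -d "$base/$dir" ]; then
            find "$base/$dir" -type f -delete
        fi
    done
}

# Example use on the removed node, as root:
#   clean_scylla_dirs /var/lib/scylla
```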
    

Additional Information