Cluster Platform Migration Using Node Cycling¶

This procedure describes how to migrate a ScyllaDB cluster to new instance types using the add-and-replace approach, which is commonly used for:

  • Migrating from one CPU architecture to another (e.g., x86_64 to ARM/Graviton)

  • Upgrading to newer instance types with better performance

  • Changing instance families within the same cloud provider

The add-and-replace approach maintains data replication throughout the migration and ensures zero downtime for client applications.

Note

This procedure does not change the ScyllaDB software version. All nodes (both existing and new) must run the same ScyllaDB version. For software version upgrades, see Upgrade.

Overview¶

The add-and-replace migration follows these steps:

  1. Add new nodes (on target instance type) to the existing cluster

  2. Wait for data to stream to the new nodes

  3. Decommission old nodes (on source instance type)

This approach keeps the cluster operational throughout the migration while maintaining the configured replication factor.

Key characteristics¶

  • Zero downtime: Client applications continue to operate during migration

  • Data safety: Replication factor is maintained throughout the process

  • Flexible: Works with both vnode-based and tablets-enabled clusters

  • Multi-DC support: Can migrate nodes across multiple datacenters

Warning

Ensure your cluster has sufficient capacity during the migration. At the peak of the process, your cluster will temporarily have double the number of nodes.

Prerequisites¶

Check cluster health¶

Before starting the migration, verify that your cluster is healthy:

  1. Check that all nodes are in Up Normal (UN) status:

    nodetool status
    

    All nodes should show UN status. Do not proceed if any nodes are down.

  2. Ensure no streaming or repair operations are in progress:

    nodetool netstats
    nodetool compactionstats
    

Plan the migration¶

Before provisioning new instances, plan the following:

Instance type mapping: Identify the source and target instance types. If your cluster uses vnodes (not tablets), be aware that mismatched shard counts between the source and target instance types can slow down repairs. With tablets enabled, mismatched shard counts are fully supported.
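A quick way to compare shard counts is to check how many CPUs ScyllaDB binds to on each node. A minimal sketch, assuming a standard scylla_setup installation and the default Prometheus metrics port (9180); paths, ports, and metric names may differ on your platform and version:

    # CPUs assigned to ScyllaDB by scylla_setup; the shard count follows from this
    cat /etc/scylla.d/cpuset.conf
    # Alternatively, count the shards reporting the reactor utilization metric
    curl -s http://localhost:9180/metrics | grep -c '^scylla_reactor_utilization{'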

Rack assignment planning: Each new node must be assigned to the same rack as the node it will replace. This maintains rack-aware topology for:

  • Rack-aware replication (NetworkTopologyStrategy)

  • Proper data distribution across failure domains

  • Minimizing data movement during decommission

Example mapping for a 3-node cluster:

Source nodes (to be decommissioned):     Target nodes (to be added):
192.168.1.10 - RACK0                 →   192.168.2.10 - RACK0
192.168.1.11 - RACK1                 →   192.168.2.11 - RACK1
192.168.1.12 - RACK2                 →   192.168.2.12 - RACK2

Create a backup¶

Back up your data before starting the migration, using one of the following methods:

  • ScyllaDB Manager (recommended): Use ScyllaDB Manager to perform a cluster-wide backup (a sketch of the command follows this list). See the ScyllaDB Manager documentation for details.

  • Snapshots: On each node in the cluster, create a snapshot:

    nodetool snapshot -t pre_migration_backup
    nodetool listsnapshots
    

    Note

    Snapshots are local to each node and do not protect against node or disk failure. For full disaster recovery, use ScyllaDB Manager backup.
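As an illustration of the ScyllaDB Manager option, a backup can be triggered with sctool. A minimal sketch, assuming the cluster is registered with Manager as my-cluster and an S3 bucket named my-backup-bucket is configured (both names are illustrative; flags may differ across Manager versions):

    # Trigger a cluster-wide backup through ScyllaDB Manager
    sctool backup -c my-cluster --location s3:my-backup-bucket

The command returns a task ID that you can use to track progress; see the ScyllaDB Manager documentation for the exact syntax in your version.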

Procedure¶

Adding new nodes¶

  1. Provision new instances with the target instance type. Ensure:

    • The same ScyllaDB version as existing nodes

    • Same network configuration and security groups

    • Appropriate storage configuration

  2. On each new node, configure /etc/scylla/scylla.yaml to join the existing cluster:

    • cluster_name: Must match the existing cluster name

    • seeds: IP address of an existing node in the cluster (used to discover cluster topology on join)

    • endpoint_snitch: Must match the existing cluster configuration

    • listen_address: IP address of the new node

    • rpc_address: IP address of the new node

    All other cluster-wide settings (tablets configuration, encryption settings, experimental features, etc.) must match the existing nodes.
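    For example, a minimal fragment of /etc/scylla/scylla.yaml for the first new node in the plan above (the cluster name and IPs are illustrative and must match your environment):

    cluster_name: 'my-cluster'
    seed_provider:
      - class_name: org.apache.cassandra.locator.SimpleSeedProvider
        parameters:
          - seeds: "192.168.1.10"
    endpoint_snitch: GossipingPropertyFileSnitch
    listen_address: 192.168.2.10
    rpc_address: 192.168.2.10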

    Caution

    Make sure that the ScyllaDB version on the new node is identical to the version on the other nodes in the cluster. Running nodes with different versions is not supported.

  3. If using GossipingPropertyFileSnitch, configure /etc/scylla/cassandra-rackdc.properties with the correct datacenter and rack assignment for this node:

    dc = <datacenter-name>
    rack = <rack-name>
    prefer_local = true
    

    Warning

    Each node must have the correct rack assignment. Using the same rack for all new nodes breaks rack-aware replication topology.

  4. Start ScyllaDB on the new node:

    sudo systemctl start scylla-server
    

    For Docker deployments:

    docker exec -it <container-name> supervisorctl start scylla
    
  5. Monitor the bootstrap process from an existing node:

    nodetool status
    

    The new node will appear with UJ (Up, Joining) status while streaming data from existing nodes. Wait until it transitions to UN (Up, Normal).

    Example output during bootstrap:

    Datacenter: dc1
    Status=Up/Down
    State=Normal/Leaving/Joining/Moving
    --  Address        Load       Tokens  Owns   Host ID                               Rack
    UN  192.168.1.10   500 MB     256     33.3%  8d5ed9f4-7764-4dbd-bad8-43fddce94b7c  RACK0
    UN  192.168.1.11   500 MB     256     33.3%  125ed9f4-7777-1dbd-aac8-43fddce9123e  RACK1
    UN  192.168.1.12   500 MB     256     33.3%  675ed9f4-6564-6dbd-cad8-43fddce952fa  RACK2
    UJ  192.168.2.10   250 MB     256     ?      a1b2c3d4-5678-90ab-cdef-112233445566  RACK0
    

    Example output after bootstrap completes:

    Datacenter: dc1
    Status=Up/Down
    State=Normal/Leaving/Joining/Moving
    --  Address        Load       Tokens  Owns   Host ID                               Rack
    UN  192.168.1.10   400 MB     256     25.0%  8d5ed9f4-7764-4dbd-bad8-43fddce94b7c  RACK0
    UN  192.168.1.11   400 MB     256     25.0%  125ed9f4-7777-1dbd-aac8-43fddce9123e  RACK1
    UN  192.168.1.12   400 MB     256     25.0%  675ed9f4-6564-6dbd-cad8-43fddce952fa  RACK2
    UN  192.168.2.10   400 MB     256     25.0%  a1b2c3d4-5678-90ab-cdef-112233445566  RACK0
    
  6. For tablets-enabled clusters, wait for tablet load balancing to complete. After the node reaches UN status, verify no streaming is in progress:

    nodetool netstats
    

    Wait until output shows “Not sending any streams” and no active receiving streams.

  7. Repeat steps 1-6 for each new node to be added.

Note

You can add multiple nodes in parallel if they are in different datacenters. Within a single datacenter, add nodes one at a time for best results.
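If you are scripting the migration, you can block until a newly added node finishes bootstrapping before adding the next one. A minimal sketch, assuming nodetool runs against the cluster and using an illustrative IP from the plan above:

    # Wait until the new node transitions from UJ (Up, Joining) to UN (Up, Normal)
    NEW_IP=192.168.2.10
    until nodetool status | grep "$NEW_IP" | grep -q '^UN'; do
        echo "waiting for $NEW_IP to finish bootstrapping..."
        sleep 30
    done
    echo "$NEW_IP is UN"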

Updating seed node configuration¶

If any of your original nodes are configured as seed nodes, you must update the seed configuration before decommissioning them.

  1. Check the current seed configuration on any node:

    grep -A 4 "seed_provider" /etc/scylla/scylla.yaml
    
  2. If the seeds include nodes you plan to decommission, update scylla.yaml on all new nodes to use the new node IPs as seeds:

    seed_provider:
      - class_name: org.apache.cassandra.locator.SimpleSeedProvider
        parameters:
          - seeds: "192.168.2.10,192.168.2.11,192.168.2.12"
    

    Note

    Updating seed configuration on the old nodes (that will be decommissioned) is optional. Seeds are only used during node startup to discover the cluster. If you don’t plan to restart the old nodes before decommissioning them, their seed configuration doesn’t matter. However, updating all nodes is recommended for safety in case an old node unexpectedly restarts during the migration.

  3. Restart ScyllaDB on each new node (one at a time) to apply the new seed configuration:

    sudo systemctl restart scylla-server
    

    Wait for the node to fully start before restarting the next node.

  4. After restarting the new nodes, verify the cluster is healthy:

    nodetool status
    nodetool describecluster
    

Warning

Complete this seed list update on all new nodes before decommissioning any old nodes. This ensures the new nodes can re-form the cluster after the old nodes are removed.
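If you prefer to script the rolling restart in steps 3-4, here is a minimal sketch, assuming passwordless SSH to each new node and the illustrative IPs used above:

    # Restart the new nodes one at a time, waiting for each to rejoin as UN
    for ip in 192.168.2.10 192.168.2.11 192.168.2.12; do
        ssh "$ip" sudo systemctl restart scylla-server
        sleep 30  # give the node time to go down before polling
        until nodetool status | grep "$ip" | grep -q '^UN'; do
            sleep 10
        done
    done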

Decommissioning old nodes¶

After all new nodes are added and healthy, decommission the old nodes one at a time.

  1. Verify all nodes are healthy before starting decommission:

    nodetool status
    

    All nodes should show UN status.

  2. On the node to be decommissioned, run:

    nodetool decommission
    

    This command blocks until the decommission is complete. The node will stream its data to the remaining nodes.

  3. Monitor the decommission progress from another node:

    nodetool status
    

    The decommissioning node will transition from UN → UL (Up, Leaving) → removed from the cluster.

    You can also monitor streaming progress:

    nodetool netstats
    
  4. After decommission completes, verify the node is no longer in the cluster:

    nodetool status
    

    The decommissioned node should no longer appear in the output.

  5. Run nodetool cleanup on the remaining nodes to remove data that no longer belongs to them after the topology change:

    nodetool cleanup
    

    Note

    nodetool cleanup can be resource-intensive. Run it on one node at a time during low-traffic periods.

  6. Wait for the cluster to stabilize before decommissioning the next node. Ensure no streaming operations are in progress.

  7. Repeat steps 1-6 for each old node to be decommissioned.
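The decommission loop can also be scripted. A minimal sketch, assuming passwordless SSH and the illustrative IPs from the migration plan; adjust the lists to your environment:

    # Decommission old nodes one at a time, then clean up the remaining nodes
    OLD_NODES="192.168.1.10 192.168.1.11 192.168.1.12"
    NEW_NODES="192.168.2.10 192.168.2.11 192.168.2.12"
    for old in $OLD_NODES; do
        # Blocks until the node has streamed its data and left the cluster
        ssh "$old" nodetool decommission
        # cleanup is resource-intensive; run it serially on each remaining node
        for node in $NEW_NODES; do
            ssh "$node" nodetool cleanup
        done
    done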

Post-migration verification¶

After all old nodes are decommissioned, verify the migration was successful.

Verify cluster topology¶

nodetool status

Confirm:

  • All nodes show UN (Up, Normal) status

  • Only the new instance type nodes are present

  • Nodes are balanced across racks

Verify schema agreement¶

nodetool describecluster

All nodes should report the same schema version.

Verify data connectivity¶

Connect to the cluster and run a test query:

cqlsh <node-ip> -e "SELECT count(*) FROM system_schema.keyspaces;"

Note

If ScyllaDB is configured with listen_interface, you must use the node’s interface IP address (not localhost) for cqlsh connections.

Verify ScyllaDB version¶

Confirm all nodes are running the same ScyllaDB version:

scylla --version
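To check every node in one pass, a minimal sketch assuming passwordless SSH and the illustrative new-node IPs:

    # Print the ScyllaDB version reported by each node
    for ip in 192.168.2.10 192.168.2.11 192.168.2.12; do
        echo -n "$ip: "; ssh "$ip" scylla --version
    done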

Verify data integrity (optional)¶

Run data validation on each keyspace to verify SSTable integrity:

nodetool scrub --mode=VALIDATE <keyspace_name>

Rollback¶

If issues occur during the migration, you can roll back by reversing the procedure.

During add phase¶

If a new node fails to bootstrap:

  1. Stop ScyllaDB on the new node:

    sudo systemctl stop scylla-server
    
  2. From an existing node, remove the failed node:

    nodetool removenode <host-id-of-failed-node>
    

During decommission phase¶

If a decommission operation gets stuck:

  1. If the node is still reachable, try stopping and restarting ScyllaDB

  2. If the node is unresponsive, from another node:

    nodetool removenode <host-id>
    

    See Remove a Node from a ScyllaDB Cluster for more details.

Full rollback¶

To roll back after the migration is complete (all nodes on new instance type), apply the same add-and-replace procedure in reverse:

  1. Add new nodes on the original instance type

  2. Wait for data streaming to complete

  3. Decommission the nodes on the new instance type

Troubleshooting¶

Node stuck in Joining (UJ) state¶

If a new node remains in UJ state for an extended period:

  • Check ScyllaDB logs for streaming errors: journalctl -u scylla-server

  • Verify network connectivity between nodes

  • Ensure sufficient disk space on all nodes

  • Check for any ongoing operations that may be blocking

Decommission taking too long¶

Decommission duration depends on data size. If it appears stuck:

  • Check streaming progress: nodetool netstats

  • Look for errors in ScyllaDB logs

  • Verify network bandwidth between nodes

Schema disagreement¶

If nodes report different schema versions:

  • Wait a few minutes for the schema to propagate

  • If disagreement persists, restart the nodes one by one

  • Run nodetool describecluster to verify agreement

Additional resources¶

  • Adding a New Node Into an Existing ScyllaDB Cluster

  • Remove a Node from a ScyllaDB Cluster

  • Replace a Running Node in a ScyllaDB Cluster

  • Upgrade
