Deploying Scylla in Multi-Datacenter

Building a production environment requires both resilience and workload balance on data clusters. ScyllaDB, which is compatible with Apache Cassandra, provides a multi-datacenter deployment capability.

This guide provides the how-to procedure for multi-datacenter deployment. You should know how to install ScyllaDB prior to beginning the following procedures.

We will review two cases: deployment in a single network domain and deployment across multiple network domains.

Deploying in a single network domain

In the first case, all your nodes are connected under a single domain. You have complete connectivity, and the required ports are open within your private network. In this example we have 3 nodes in each datacenter.

Data Center 1 West
Node1 172.31.0.4 (Seed)
Node2 172.31.0.5 (Seed)
Node3 172.31.0.6

Data Center 2 West
Node4 172.31.12.204 (Seed)
Node5 172.31.12.205 (Seed)
Node6 172.31.32.6

After installing Scylla on each node, modify the following settings in the scylla.yaml file, typically located under /etc/scylla/.

  • Make sure all nodes use the same cluster name, for example: cluster_name: 'my-cluster'
  • Define the seed node IP addresses in the seeds list.
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
        - seeds: "172.31.0.4,172.31.0.5,172.31.12.204,172.31.12.205"

It is highly recommended to have more than one seed node in a cluster for a production environment. However, refrain from adding all nodes to the seed pool. Cluster nodes contact the seed nodes during their initialization phase to discover the cluster. After each Scylla node is up and running, the gossip protocol keeps its view of the cluster up to date.

  1. Define the parameter listen_address: <Private IP Addr>. For example, for node6 we defined listen_address: 172.31.32.6
  2. Define the parameter rpc_address: <Private IP Addr>. For example, for node2 we defined rpc_address: 172.31.0.5
  3. A multi-datacenter cluster requires an appropriate snitch. Set the snitch parameter to endpoint_snitch: GossipingPropertyFileSnitch
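Once the file is edited, the settings above can be spot-checked on each node. The following is a minimal shell sketch, not an official Scylla tool: the helper name yaml_value is hypothetical, it assumes the file lives at /etc/scylla/scylla.yaml as described above, and it only understands simple top-level "key: value" lines rather than full YAML.

```shell
# Print the value of a simple top-level "key: value" line in a YAML file.
# yaml_value is a hypothetical helper, not part of Scylla. It does not
# parse nested YAML and is meant only for a quick sanity check.
yaml_value() {
    # $1 = file, $2 = key
    sed -n "s/^$2:[[:space:]]*//p" "$1" \
        | sed -e 's/[[:space:]]*#.*$//' -e 's/[[:space:]]*$//' \
        | tr -d "\"'" \
        | head -n 1
}

# Show the settings this guide asks you to edit, if the file is readable.
conf=/etc/scylla/scylla.yaml
if [ -r "$conf" ]; then
    for key in cluster_name listen_address rpc_address endpoint_snitch; do
        printf '%s = %s\n' "$key" "$(yaml_value "$conf" "$key")"
    done
fi
```

The nested seeds entry is deliberately not covered by this helper; inspect that one by eye.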

The next file to edit is cassandra-rackdc.properties, typically located under /etc/scylla/.

For nodes 1-3 we defined:

dc=DC1            #Describe data center name
rack=Rack1        #Describe rack name
prefer_local=true

For nodes 4-6 we defined:

dc=DC2            #Describe data center name
rack=Rack1        #Describe rack name
prefer_local=true
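
When the same file has to be written on several nodes, it can help to generate it rather than edit it by hand. A minimal sketch, assuming the three keys shown above are all you need; rackdc_properties is a hypothetical helper name, not a Scylla command.

```shell
# Print the contents of cassandra-rackdc.properties for one node.
# rackdc_properties is a hypothetical helper used only for illustration.
rackdc_properties() {
    # $1 = datacenter name, $2 = rack name
    printf 'dc=%s\nrack=%s\nprefer_local=true\n' "$1" "$2"
}

# Preview the file for nodes 1-3 of this example.
rackdc_properties DC1 Rack1

# To write it on a node in DC2 (run on the node itself):
# rackdc_properties DC2 Rack1 | sudo tee /etc/scylla/cassandra-rackdc.properties
```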

Now that the configuration files are set, first start Scylla on the seed nodes, one by one. Once all seed nodes are up, start Scylla on all other nodes. Starting Scylla can be done with the following commands:

sudo systemctl start scylla-server
sudo systemctl start scylla-jmx
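
The seeds-first ordering can be derived from the seed list instead of being tracked by hand. The sketch below only computes the order; start_order is a hypothetical helper, and the commented SSH loop assumes passwordless SSH and sudo on every node, which this guide does not set up.

```shell
# Print nodes in start order: seeds first, then the remaining nodes.
# start_order is a hypothetical helper, not part of Scylla.
# $1 = comma-separated seed list (as in scylla.yaml)
# $2 = space-separated list of all node addresses
start_order() {
    seeds=$(printf '%s' "$1" | tr ',' ' ')
    for s in $seeds; do
        printf '%s\n' "$s"
    done
    for n in $2; do
        case " $seeds " in
            *" $n "*) ;;                  # already listed as a seed
            *) printf '%s\n' "$n" ;;
        esac
    done
}

# The six nodes of the single-domain example.
SEEDS="172.31.0.4,172.31.0.5,172.31.12.204,172.31.12.205"
NODES="172.31.0.4 172.31.0.5 172.31.0.6 172.31.12.204 172.31.12.205 172.31.32.6"
start_order "$SEEDS" "$NODES"

# Hypothetical use over SSH (assumption: passwordless SSH and sudo):
# for ip in $(start_order "$SEEDS" "$NODES"); do
#     ssh "$ip" 'sudo systemctl start scylla-server'
# done
```

The loop does not wait for a node to finish joining, so for a one-by-one start you would still confirm each node is up before moving on to the next.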

Let all nodes join the ring and verify the cluster is up and running with the nodetool status command.

$ nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--   Address         Load            Tokens  Owns            Host ID                                 Rack
UN   172.31.0.6      988.99 MB       256     ?               9df475f3-7353-425b-ba0a-d27855c438d0    Rack1
UN   172.31.0.5      979.43 MB       256     ?               158afb86-e325-412f-ae96-8c1ce55b2662    Rack1
UN   172.31.0.4      955.03 MB       256     ?               61f99882-8cc3-4902-ac19-00cb4c6a1cac    Rack1
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--   Address         Load            Tokens  Owns            Host ID                                 Rack
UN   172.31.12.205   1.53 GB         256     ?               1dba07f5-7a65-4ea3-bc6d-9686e19a6de2    Rack1
UN   172.31.12.204   246.5 KB        256     ?               dde0c976-6495-4db2-9392-17c8ed5ca8ed    Rack1
UN   172.31.32.6     246.5 KB        256     ?               418ec044-c08b-4798-8f35-b5d14fa028da    Rack1
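
For scripted checks, the same output can be reduced to a single number. This is a rough sketch that assumes the column layout shown above, where each node row starts with a two-letter status/state code; nodes_not_un is a hypothetical helper name.

```shell
# Count nodes whose status/state is not UN (Up/Normal) in `nodetool status`
# output read from stdin. Node rows start with a two-letter code such as
# UN or DN; header and separator lines do not match and are ignored.
nodes_not_un() {
    awk '$1 ~ /^[UD][NLJM]$/ && $1 != "UN" { bad++ } END { print bad + 0 }'
}

# Usage on a live node:
# nodetool status | nodes_not_un
```

A result of 0 means every listed node reports UN. It does not confirm that the expected number of nodes is present, so compare the row count against your inventory as well.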

Deploying in different network domains

In the second case the nodes are located in different data centers under different domains. For example, you might have two different private networks, and each node has its own public IP address. This is a typical deployment in a hosted environment such as AWS.

Data Center 1 West
Node         Private IP       Public IP
Node1        172.31.0.4       54.187.36.59     (Seed)
Node2        172.31.0.5       54.187.142.201   (Seed)
Node3        172.31.0.6       54.187.168.20

Data Center 2 West
Node         Private IP       Public IP
Node4        172.31.12.204    54.191.72.56     (Seed)
Node5        172.31.12.205    54.187.25.99     (Seed)
Node6        172.31.32.6      54.191.2.121

Data Center 3 East
Node         Private IP       Public IP
Node7        10.111.217.15    54.160.174.243   (Seed)
Node8        10.101.212.62    54.235.9.159     (Seed)
Node9        10.143.251.171   54.146.228.25

Note that both the private and the public IP address of every node are listed.

After installing Scylla on each node, modify the following settings in the scylla.yaml file, typically located under /etc/scylla/.

  1. Make sure all nodes use the same cluster name, for example: cluster_name: 'my-cluster'
  2. Define the seeds parameter to include the public IP addresses of all seed nodes.
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
        - seeds: "54.187.36.59,54.187.142.201,54.191.72.56,54.187.25.99,54.160.174.243,54.235.9.159"

In this case we defined two seed nodes in each datacenter.

  1. Define the parameter listen_address: <Private IP Addr>. For example, for node4 we defined listen_address: 172.31.12.204.
  2. Define the parameter rpc_address: <Private IP Addr>. For example, for node7 we defined rpc_address: 10.111.217.15.
  3. Define the parameter broadcast_rpc_address: <Public IP Addr>. For example, for node7 we defined broadcast_rpc_address: 54.160.174.243.

Note: It is not mandatory to define broadcast_rpc_address for deployments using EC2-based snitches.

  4. A multi-datacenter cluster requires an appropriate snitch. In this example we set the snitch parameter to endpoint_snitch: Ec2MultiRegionSnitch
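
Taken together, the address and snitch settings above differ per node only in the two IP addresses. The following sketch prints them for one node so they can be compared against scylla.yaml; node_addresses is a hypothetical helper, and the output is a fragment to review, not a complete configuration file.

```shell
# Print the address and snitch lines of scylla.yaml for one node in the
# multi-domain example. node_addresses is a hypothetical helper.
node_addresses() {
    # $1 = private IP, $2 = public IP
    printf 'listen_address: %s\n' "$1"
    printf 'rpc_address: %s\n' "$1"
    printf 'broadcast_rpc_address: %s\n' "$2"
    printf 'endpoint_snitch: Ec2MultiRegionSnitch\n'
}

# Settings for node7 (Data Center 3 East) from the tables above.
node_addresses 10.111.217.15 54.160.174.243
```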

Although defining a datacenter suffix is not mandatory with the EC2 snitches, it makes a node's datacenter easier to identify. So, the next file to edit is cassandra-rackdc.properties, typically located under /etc/scylla/.

For nodes 1-3 we defined:

dc_suffix=_DC1

For nodes 4-6 we defined:

dc_suffix=_DC2

For nodes 7-9 we defined:

dc_suffix=_DC3

Now that the configuration files are set, first start Scylla on the seed nodes, one by one. Once all seed nodes are up, start Scylla on the rest of the nodes. Starting Scylla can be done with the following commands:

sudo systemctl start scylla-server
sudo systemctl start scylla-jmx

Let all nodes join the ring and verify the cluster is up and running with the nodetool status command.

$ nodetool status
Datacenter: us-west-2_DC2
=========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--   Address         Load            Tokens  Owns            Host ID                                 Rack
UN   54.191.2.121    120.97 KB       256     ?               c84b80ea-cb60-422b-bc72-fa86ede4ac2e    2b
UN   54.191.72.56    109.54 KB       256     ?               129087eb-9aea-4af6-92c6-99fdadb39c33    2c
UN   54.187.25.99    104.94 KB       256     ?               0540c7d7-2622-4f1f-a3f0-acb39282e0fc    2c
Datacenter: us-east_DC3
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--   Address         Load            Tokens  Owns            Host ID                                 Rack
UN   54.160.174.243  109.54 KB       256     ?               c7686ffd-7a5b-4124-858e-df2e61130aaa    1c
UN   54.235.9.159    109.75 KB       256     ?               39798227-9f6f-4868-8193-08570856c09a    1c
UN   54.146.228.25   128.33 KB       256     ?               7a4957a1-9590-4434-9746-9c8a6f796a0c    1c
Datacenter: us-west-2_DC1
=========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--   Address         Load            Tokens  Owns            Host ID                                 Rack
UN   54.187.36.59    114.35 KB       256     ?               4c3e1533-1b78-45bf-8bd4-818090f019ab    2c
UN   54.187.142.201  109.54 KB       256     ?               d99967d6-987c-4a54-829d-86d1b921470f    2c
UN   54.187.168.20   109.54 KB       256     ?               2329c2e0-64e1-41dc-8202-74403a40f851    2c

Note: In a multi-DC, multi-domain deployment, the seed list uses the seed nodes' public IP addresses. If static public IP addresses are not a viable option, deploy the multi-DC cluster over a Virtual Private Network to enable connectivity between the data centers.

Scylla recommends the following snitches for production environments:

Ec2MultiRegionSnitch - for AWS cloud-based deployments, both single and multi-datacenter

GossipingPropertyFileSnitch - for bare metal deployments
