Scylla Architecture - Fault Tolerance

Scylla replicates data according to a replication strategy that you choose. This strategy will determine the placement of the replicated data. Scylla runs nodes in a hash ring. All nodes are equal: there are no master, slave, or replica sets.

The Replication Factor (RF) is equivalent to the number of nodes where data (rows and partitions) are replicated. Data is replicated to multiple (RF=N) nodes.

An RF of 1 means there is only one copy of a row in a cluster and there is no way to recover the data if the node is compromised or goes down. RF=2 means that there are two copies of a row in a cluster. An RF of at least 3 is used in most systems or similar.

Data is always replicated automatically. Read or write operations can occur to data stored on any of the replicated nodes.

../../_images/1-write_op_RF_3.jpg

In the example above, our client sends a request to write partition 1 to node V; 1’s data is replicated to nodes W, X, and Z. We have a Replication Factor (RF) of 3. In this drawing, V is a coordinator node but not a replicator node. However, replicator nodes can also be coordinator nodes, and often are.

../../_images/2-read_op_RF_3.jpg

During a read operation, the client sends a request to the coordinator. Effectively because the RF=3, 3 nodes respond to the read request.

The Consistency Level (CL) determines how many replicas in a cluster that must acknowledge read or write operations before it is considered successful.

Some of the most common Consistency Levels used are:

  • ANY - A write is written to at least one node in the cluster. Provides lowest availability with highest consistency.
  • QUORUM - When a majority of the replicas respond, the request is honored. If RF=3, then 2 replicas respond. QUORUM can be calculated using the formula (n/2 +1) where n is the Replication Factor.
  • ONE - If one replica responds; the request is honored.
  • LOCAL_ONE - At least one replica in the local data center responds.
  • LOCAL_QUORUM - A quorum of replicas in the local datacenter responds.
  • EACH_QUORUM - (unsupported for reads) - A quorum of replicas in ALL datacenters must be written to.

During a write operation, the coordinator communicates with the replicas (the number of which depends on the Consistency Level and Replication Factor). The write is successful when the specified number of replicas confirm the write.

../../_images/3-write_op_RF_3_CL_1.jpg

In the above diagram, the double arrows indicate the write operation request going into the coordinator from the client and the acknowledgment being returned. Since the Consistency Level is one, the coordinator, V, must only wait for the write to be sent to and responded by a single node in the cluster which is W.

Since RF=3, our partition 1 is also written to nodes X and Z, but the coordinator does not need to wait for a response from them to confirm a successful write operation. In practice, acknowledgements from nodes X and Z can arrive to the coordinator at a later time, after the coordinator acknowledges the client.

When our Consistency Level is set to QUORUM, the coordinator must wait for a majority of nodes to acknowledge the write before it is considered successful. Since our Replication Factor is 3, we must wait for 2 acknowledgements (the third acknowledgement does not need to be returned):

../../_images/4-write_op_RF_3_CL_Quorum.jpg

During a read operation, the coordinator communicates with just enough replicas to guarantee that the required Consistency Level is met. Data is then returned to the client.

../../_images/5-read_op_RF_3_CL_1.jpg

The Consistency Level is tunable per operation in CQL. This is known as tunable consistency. Sometimes response latency is more important, making it necessary to adjust settings on a per- query or operation level to override keyspace or even data center-wide consistency settings. In other words, the Consistency Level setting allows you to choose a point in the consistency vs. latency tradeoff.

The Consistency Level and Replication Factor both impact performance. The lower the Consistency Level and/or Replication Factor, the faster the read or write operation. However, there will be less fault tolerance if a node goes down.

The Consistency Level itself impacts availability. A higher Consistency Level (more nodes required to be on line) means less availability with less tolerance to tolerate node failures. A lower Consistency Level means more availability and more fault tolerance.

The following table shows what Consistency Levels are available for a read or write operation:

Consistency Level Read Write
Any No Yes
1 Yes Yes
2 Yes Yes
3 Yes Yes
QUORUM Yes Yes
LOCAL_ONE Yes Yes
LOCAL_QUORUM Yes Yes
EACH_QUORUM No Yes
ALL Yes Yes

Scylla, as do many distributed database systems, adheres to the CAP Theorem. The CAP Theorem is the notion that Consistency, Availability and Partition Tolerance of data are mutually dependent in a distributed system. Increasing any 2 of these factors will reduce the third.

Scylla adheres to the CAP theorem in the following way:

../../_images/6-CAP_Theorem.jpg

Scylla chooses availability and partition tolerance over consistency, such that:

  • It’s impossible to be both consistent and highly available during a network partition;
  • If we sacrifice consistency, we can be highly available.

You’ll need to design your application around Scylla’s data modeling, but the net result is an application that will never go down.

See our Consistency Level Console Demo.

Terms for Review

Replication

The process of replicating data across nodes in a cluster.

Consistency Level

A configurable setting which dictates how many replicas in a cluster that must acknowledge read or write operations.

Tunable Consistency

The possibility for unique, per-query, Consistency Level settings. These are incremental and override fixed database settings intended to enforce data consistency. Such settings may be set directly from a CQL statement when response speed for a given query or operation is more important.

Replication Factor

The total number of replica nodes across a given cluster. A Replication Factor of 1 means that the data will only exist on a single node in the cluster and will not have any fault tolerance. This number is a setting defined for each keyspace. All replicas share equal priority; there are no primary or master replicas.