Scylla Architecture - Fault Tolerance¶
Scylla replicates data according to a replication strategy that you choose. This strategy will determine the placement of the replicated data. Scylla runs nodes in a hash ring. All nodes are equal: there are no master, slave, or replica sets.
The Replication Factor (RF) is equivalent to the number of nodes where data (rows and partitions) are replicated. Data is replicated to multiple (RF=N) nodes.
An RF of
1 means there is only one copy of a row in a cluster and there is no way to recover the data if the node is compromised or goes down. RF=2 means that there are two copies of a row in a cluster. An RF of at least
3 is used in most systems or similar.
Data is always replicated automatically. Read or write operations can occur to data stored on any of the replicated nodes.
In the example above, our client sends a request to write partition 1 to node V; 1’s data is replicated to nodes W, X, and Z. We have a Replication Factor (RF) of
3. In this drawing, V is a coordinator node but not a replicator node. However, replicator nodes can also be coordinator nodes, and often are.
During a read operation, the client sends a request to the coordinator. Effectively because the RF=3, 3 nodes respond to the read request.
The Consistency Level (CL) determines how many replicas in a cluster that must acknowledge read or write operations before it is considered successful.
Some of the most common Consistency Levels used are:
- ANY - A write must be written to at least one replica in the cluster. A read waits for a response from at least one replica. Provides the highest availability with the lowest consistency.
- QUORUM - When a majority of the replicas respond, the request is honored. If RF=3, then 2 replicas respond. QUORUM can be calculated using the formula (n/2 +1) where n is the Replication Factor.
- ONE - If one replica responds; the request is honored.
- LOCAL_ONE - At least one replica in the local data center responds.
- LOCAL_QUORUM - A quorum of replicas in the local datacenter responds.
- EACH_QUORUM - (unsupported for reads) - A quorum of replicas in ALL datacenters must be written to.
- ALL - A write must be written to all replicas in the cluster, a read waits for a response from all replicas. Provides the lowest availability with the highest consistency.
Regardless of the Consistency Level, a write is always sent to all replicas, as set by the Replication Factor. Consistency Level control when a client acknowledged, not how many replicas are updated.
During a write operation, the coordinator communicates with the replicas (the number of which depends on the Replication Factor). The write is successful when the specified number of replicas confirm the write.
In the above diagram, the double arrows indicate the write operation request going into the coordinator from the client and the acknowledgment being returned. Since the Consistency Level is one, the coordinator, V, must only wait for the write to be sent to and responded by a single node in the cluster which is W.
Since RF=3, our partition 1 is also written to nodes X and Z, but the coordinator does not need to wait for a response from them to confirm a successful write operation. In practice, acknowledgements from nodes X and Z can arrive to the coordinator at a later time, after the coordinator acknowledges the client.
When our Consistency Level is set to
QUORUM, the coordinator must wait for a majority of nodes to acknowledge the write before it is considered successful. Since our Replication Factor is 3, we must wait for 2 acknowledgements (the third acknowledgement does not need to be returned):
During a read operation, the coordinator communicates with just enough replicas to guarantee that the required Consistency Level is met. Data is then returned to the client.
The Consistency Level is tunable per operation in CQL. This is known as tunable consistency. Sometimes response latency is more important, making it necessary to adjust settings on a per- query or operation level to override keyspace or even data center-wide consistency settings. In other words, the Consistency Level setting allows you to choose a point in the consistency vs. latency tradeoff.
The Consistency Level and Replication Factor both impact performance. The lower the Consistency Level and/or Replication Factor, the faster the read or write operation. However, there will be less fault tolerance if a node goes down.
The Consistency Level itself impacts availability. A higher Consistency Level (more nodes required to be on line) means less availability with less tolerance to tolerate node failures. A lower Consistency Level means more availability and more fault tolerance.
The following table shows what Consistency Levels are available for a read or write operation:
Scylla, as do many distributed database systems, adheres to the CAP Theorem. The CAP Theorem is the notion that Consistency, Availability and Partition Tolerance of data are mutually dependent in a distributed system. Increasing any 2 of these factors will reduce the third.
Scylla adheres to the CAP theorem in the following way:
Scylla chooses availability and partition tolerance over consistency, such that:
- It’s impossible to be both consistent and highly available during a network partition;
- If we sacrifice consistency, we can be highly available.
You’ll need to design your application around Scylla’s data modeling, but the net result is an application that will never go down.
See our Consistency Level Console Demo.
Terms for Review¶
The process of replicating data across nodes in a cluster.
A configurable setting which dictates how many replicas in a cluster that must acknowledge read or write operations.
The possibility for unique, per-query, Consistency Level settings. These are incremental and override fixed database settings intended to enforce data consistency. Such settings may be set directly from a CQL statement when response speed for a given query or operation is more important.
The total number of replica nodes across a given cluster. A Replication Factor of
1 means that the data will only exist on a single node in the cluster and will not have any fault tolerance. This number is a setting defined for each keyspace. All replicas share equal priority; there are no primary or master replicas.