A typical write in Scylla works according to the scenarios described in our Fault Tolerance documentation.
But what happens when a write request is sent to a Scylla node that is unresponsive, for example because of heavy write load on the node, network issues, or hardware failure? To maintain availability and help keep replicas consistent, Scylla implements hinted handoff.
In other words, Scylla saves a copy of the writes intended for down nodes and replays them to those nodes once they are back up. Thus, when a node is down, the write operation flow looks like this:
The co-ordinator determines all the replica nodes;
Based on the replication factor (RF), the co-ordinator attempts to write to RF nodes;
If one node is down, acknowledgments are returned only from the remaining nodes (two nodes when RF is 3).
If the consistency level does not require responses from all replicas, the co-ordinator (V in this case) will respond to the client that the write was successful. The co-ordinator will also write and store a hint for the missing node.
Once the down node comes back up, the co-ordinator will replay the hint for that node. After the co-ordinator receives an acknowledgment of the write, the hint is deleted.
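The flow above can be sketched as a small in-memory model. This is only an illustration of the idea, not Scylla's implementation: the `Replica` and `Coordinator` classes, their methods, and the `required_acks` parameter (standing in for the consistency level) are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica node holding key/value data in memory."""
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)

    def write(self, key, value):
        # Return True as the acknowledgment, or False if the node is down.
        if not self.up:
            return False
        self.data[key] = value
        return True


@dataclass
class Coordinator:
    """Hypothetical co-ordinator that stores hints for unresponsive replicas."""
    replicas: list                             # the RF replica nodes for a key
    hints: dict = field(default_factory=dict)  # replica name -> [(key, value)]

    def write(self, key, value, required_acks):
        acks = 0
        for replica in self.replicas:
            if replica.write(key, value):
                acks += 1
            else:
                # No acknowledgment: keep a hint for the missing replica.
                self.hints.setdefault(replica.name, []).append((key, value))
        # The client sees success if the consistency level was satisfied.
        return acks >= required_acks

    def replay_hints(self, replica):
        pending = self.hints.get(replica.name, [])
        # A hint is dropped only after the replica acknowledges the write.
        remaining = [(k, v) for k, v in pending if not replica.write(k, v)]
        if remaining:
            self.hints[replica.name] = remaining
        else:
            self.hints.pop(replica.name, None)


# Usage: RF = 3, one replica down, two acknowledgments required.
nodes = [Replica("A"), Replica("B"), Replica("C", up=False)]
coordinator = Coordinator(nodes)
ok = coordinator.write("k", "v", required_acks=2)
hints_while_down = dict(coordinator.hints)

nodes[2].up = True               # the down node comes back
coordinator.replay_hints(nodes[2])
```

In this model the write succeeds with two acknowledgments, a hint for node `C` is kept while it is down, and the hint is removed once the replayed write is acknowledged.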
A co-ordinator stores hints for a handoff under the following conditions:
For down nodes;
If the replica doesn’t respond within
The co-ordinator will stop creating any hints for a dead node if the node’s downtime is greater than
Hinted handoff is enabled and managed by these settings in
hinted_handoff_enabled: enables or disables the hinted handoff feature completely or enumerates data centers where hints are allowed. By default, “true” enables hints to all nodes.
max_hint_window_in_ms: the maximum time a destination node can be down while hints are still generated for it. If a node is down longer than this period, new hints are not created; hint generation resumes once the destination node is back up. By default, this is set to 3 hours (10,800,000 ms).
hints_directory: the directory where Scylla will store hints. By default this is
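As a rough illustration of how the first two settings interact, the sketch below decides whether a new hint would be created for a destination node. It is an assumption-laden model, not Scylla code: the `settings` dictionary, the helper names, and the representation of the enumerated data centers as a Python set are all hypothetical. Only the setting names and the 3-hour default come from the text above.

```python
HOUR_MS = 60 * 60 * 1000

# Hypothetical in-memory view of the settings described above.
settings = {
    "hinted_handoff_enabled": True,        # True/False, or a set of data center names
    "max_hint_window_in_ms": 3 * HOUR_MS,  # default of 3 hours, in milliseconds
}


def hints_enabled_for(dc, enabled):
    """True if hints are allowed for nodes in data center `dc`."""
    if isinstance(enabled, bool):
        return enabled      # feature switched on or off completely
    return dc in enabled    # enumerated data centers (assumed to be a set)


def should_create_hint(dc, downtime_ms, settings):
    """True if a new hint would be created for a node down for `downtime_ms`."""
    if not hints_enabled_for(dc, settings["hinted_handoff_enabled"]):
        return False
    # No new hints once the node has been down longer than the hint window.
    return downtime_ms <= settings["max_hint_window_in_ms"]
```

For example, `should_create_hint("dc1", 10 * 60 * 1000, settings)` models a node that has been down for ten minutes, while a downtime of `4 * HOUR_MS` falls outside the default window.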
Storing a hint can also fail. Enabling hinted handoff therefore does not eliminate the need for repair; users must still run a full repair regularly to ensure data consistency across the cluster nodes.
Hinted handoff was released as production-ready in Scylla Open Source 3.0 and Scylla Enterprise 2019.1.
© 2016, The Apache Software Foundation.
Apache®, Apache Cassandra®, Cassandra®, the Apache feather logo and the Apache Cassandra® Eye logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.