Run a Repair

When you add a cluster to Scylla Manager, a repair task is automatically scheduled. By default this task runs every week, but you can change its schedule or add further repair tasks. Running repairs regularly keeps data consistent across the nodes of your clusters.

Why repair with Scylla Manager

Scylla Manager automates the repair process and allows you to manage how and when the repair occurs. The advantages of repairing the cluster with Scylla Manager are:

  • Clusters are repaired node by node, ensuring that each database shard performs exactly one repair task at a time. This gives the best repair parallelism on a node, shortens the overall repair time, and does not introduce unnecessary load.
  • If there is an error, Scylla Manager’s retry mechanism will try to run the repair again up to the number of retries that you set.
  • A pause and resume mechanism lets you restart a repair from where it left off.
  • Repair what you want, when you want, and how often you want. Manager gives you that flexibility.
  • The most apparent advantage is that with Manager you do not have to manually SSH into every node as you do with nodetool.

What can you repair with Scylla Manager

Scylla Manager can repair any item which it manages, specifically:

  • Specific tables, keyspaces, clusters, or data centers.
  • A group of tables, keyspaces, clusters or data centers.
  • All tables, keyspaces, clusters, or data centers.

What sort of repairs can I run with Scylla Manager

You can run two types of repairs:

  • Ad-hoc - a one-time repair that runs immediately
  • Scheduled - a repair that is scheduled in advance and can repeat

Schedule a Repair

By default, a cluster successfully added to Scylla Manager has a repair task created for it which repairs the entire cluster. This is a repeating task which runs every week. You can change this repair, add additional repairs, or delete this repair. You can schedule repairs to run in the future on a regular basis, to run once, or to run immediately on an as-needed basis. Any repair can be rescheduled, paused, resumed, or deleted. For information on what is repaired and the types of repairs available, see What can you repair with Scylla Manager.

Create a scheduled repair

While the recommended way to run a repair is across an entire cluster, repairs can also be scheduled to run on one or more datacenters, keyspaces, or tables. Scheduled repairs repeat at the interval you set (for example, every 7 days). The procedure here shows the most frequently used repair command. Additional parameters are described in the sctool Reference.

Procedure

Run the following sctool repair command, replacing the parameters with your own values:

  • -c - cluster name - replace prod-cluster with the name of your cluster
  • -s - start-time - replace 2018-01-02T15:04:05Z07:00 with the time you want the repair to begin
  • -i - interval - replace 7d with your own time interval

For example:

sctool repair -c prod-cluster -s 2018-01-02T15:04:05Z07:00 -i 7d
repair/3208ff15-6e8f-48b2-875c-d3c73f545410
  1. The command returns the task ID. You will need this ID for additional actions.
  2. If you want to run the repair only once, remove the -i argument.
  3. If you want the repair to start immediately, but still want it to repeat, keep the interval argument (-i), but remove the start-time argument (-s).
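
The three variations described above (repeating with a start time, one-time, and immediate but repeating) can be sketched as a small shell helper. This is only an illustration: build_repair_cmd is a hypothetical function, not part of sctool, and it prints the command instead of running it so that you can review it first. It uses only the -c, -s, and -i arguments shown in this procedure.

```shell
# Sketch: assemble (but do not run) an sctool repair command.
# build_repair_cmd is a hypothetical helper, not part of sctool.
build_repair_cmd() {
  cluster="$1"        # value for -c (cluster name)
  start="${2:-}"      # value for -s (start time); empty means "start now"
  interval="${3:-}"   # value for -i (interval); empty means "run once"
  cmd="sctool repair -c $cluster"
  if [ -n "$start" ]; then cmd="$cmd -s $start"; fi
  if [ -n "$interval" ]; then cmd="$cmd -i $interval"; fi
  printf '%s\n' "$cmd"
}

# Repeating repair with an explicit start time
build_repair_cmd prod-cluster 2018-01-02T15:04:05Z07:00 7d
# One-time repair at the given start time (no -i)
build_repair_cmd prod-cluster 2018-01-02T15:04:05Z07:00 ""
# Repair that starts immediately and repeats (no -s)
build_repair_cmd prod-cluster "" 7d
```

Printing the command rather than executing it keeps the sketch safe to try on a machine without access to Scylla Manager.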

Schedule an ad-hoc repair

An ad-hoc repair runs immediately and does not repeat. This procedure shows the most frequently used repair command. Additional parameters are described in the sctool Reference.

Procedure

  1. Run the following command, replacing the -c argument with your cluster name:
sctool repair -c prod-cluster
repair/3201ff14-6e8f-72b2-875c-d3c73f524410
  2. The command returns the task ID. You will need this ID for additional actions.
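
Because later actions need the task ID, a script may want to check the value it captured before using it. The following is a minimal sketch: is_repair_task_id is a hypothetical helper, not part of sctool, and it assumes that repair task IDs always have the form repair/<UUID>, which is inferred from the examples in this document rather than from a specification.

```shell
# Sketch: check that a string looks like a repair task ID.
# is_repair_task_id is a hypothetical helper, not part of sctool.
# Assumption: IDs have the form repair/<UUID>, as in the examples above.
is_repair_task_id() {
  printf '%s\n' "$1" |
    grep -Eq '^repair/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}

# In practice the value might come from: task_id=$(sctool repair -c prod-cluster)
# Here a sample value is used so the sketch runs without Scylla Manager.
task_id="repair/3201ff14-6e8f-72b2-875c-d3c73f524410"
if is_repair_task_id "$task_id"; then
  echo "captured task ID: $task_id"
else
  echo "unexpected output: $task_id" >&2
fi
```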

Reschedule a Repair

You can change the run time of a scheduled repair using the sctool task update command. The new time you set replaces the time which was previously set. This command requires the task ID which was generated when you created the repair. You can retrieve it using the command sctool task list.

The following example reschedules a task to run 3 hours from now, replacing its previous start time.

sctool task update -c prod-cluster repair/143d160f-e53c-4890-a9e7-149561376cfd -s now+3h
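
If you reschedule repairs from scripts, the relative start time can be parameterized. The sketch below is an illustration only: build_reschedule_cmd is a hypothetical helper, not part of sctool, and it assumes that the now+3h form shown above generalizes to other whole numbers of hours. It prints the command instead of running it.

```shell
# Sketch: assemble (but do not run) an sctool task update command that
# moves a repair to start a given number of hours from now.
# build_reschedule_cmd is a hypothetical helper, not part of sctool.
# Assumption: "now+<N>h" is accepted for any whole number of hours N.
build_reschedule_cmd() {
  cluster="$1"   # value for -c (cluster name)
  task_id="$2"   # task ID, for example repair/<UUID>
  hours="$3"     # whole number of hours from now
  case "$hours" in
    ''|*[!0-9]*)
      echo "hours must be a non-negative integer" >&2
      return 1 ;;
  esac
  printf 'sctool task update -c %s %s -s now+%sh\n' "$cluster" "$task_id" "$hours"
}

build_reschedule_cmd prod-cluster repair/143d160f-e53c-4890-a9e7-149561376cfd 3
```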

To start a scheduled repair immediately, run the following command, inserting your task ID and cluster name:

sctool task start repair/143d160f-e53c-4890-a9e7-149561376cfd -c prod-cluster

Pause a Repair

Pausing stops a specified task, provided it is running. You will need the task ID for this action. You can retrieve it using the command sctool task list. To start the task again, see Resume a Repair.

sctool task stop repair/143d160f-e53c-4890-a9e7-149561376cfd -c prod-cluster

Resume a Repair

Resuming restarts a repair that is currently in the paused state. To start running a repair which is scheduled, but is currently not running, use the task update command. See Reschedule a Repair. You will need the task ID for this action. You can retrieve it using the command sctool task list.

sctool task start repair/143d160f-e53c-4890-a9e7-149561376cfd -c prod-cluster

Delete a Repair

This action removes the repair from the task list. Once removed, you cannot resume the repair. You will have to create a new one. You will need the task ID for this action. This can be retrieved using the command sctool task list.

sctool task delete repair/143d160f-e53c-4890-a9e7-149561376cfd -c prod-cluster
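
The pause, resume, and delete commands above share one shape: sctool task, an action, the task ID, and the cluster name. A minimal sketch of a wrapper that assembles them is shown below. build_task_cmd is a hypothetical helper, not part of sctool, and it only prints the command so that nothing is stopped or deleted by accident.

```shell
# Sketch: assemble (but do not run) an sctool task command.
# build_task_cmd is a hypothetical helper, not part of sctool.
# Only the actions shown in this document are accepted.
build_task_cmd() {
  action="$1"    # stop (pause), start (resume), or delete
  task_id="$2"   # task ID, for example repair/<UUID>
  cluster="$3"   # value for -c (cluster name)
  case "$action" in
    start|stop|delete)
      printf 'sctool task %s %s -c %s\n' "$action" "$task_id" "$cluster" ;;
    *)
      echo "unsupported action: $action" >&2
      return 1 ;;
  esac
}

id="repair/143d160f-e53c-4890-a9e7-149561376cfd"
build_task_cmd stop "$id" prod-cluster     # pause a running repair
build_task_cmd start "$id" prod-cluster    # resume a paused repair
build_task_cmd delete "$id" prod-cluster   # remove the repair task
```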