Back Up Your Data

Even though Scylla is a fault-tolerant system, it is recommended to regularly back up the data to external storage. Backup is a per-node procedure, so make sure to back up each node in your cluster. A backup involves two procedures:

Full Backup - Snapshots

Snapshots are taken using nodetool snapshot. The command first flushes the MemTables from memory to SSTables on disk, then creates a hard link for each SSTable in each keyspace. Over time, SSTables are compacted, but the hard link keeps a copy of each file. This takes up an increasing amount of disk space, so it is important to free space by clearing unnecessary snapshots.
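To see why a snapshot keeps using disk space after compaction removes the original SSTable, the mechanism can be mimicked with ordinary files. The sketch below is an illustration only: the function name and file names are hypothetical, and it does not touch a Scylla node.

```shell
#!/usr/bin/env bash
# Illustration only: shows that a hard link outlives the file it was made from.

demo_hard_link() {
    local workdir
    workdir="$(mktemp -d)"
    mkdir "$workdir/snapshots"

    # Stand-in for an SSTable data file (hypothetical name).
    echo "sstable-data" > "$workdir/data.db"

    # A snapshot hard-links the file instead of copying it.
    ln "$workdir/data.db" "$workdir/snapshots/data.db"

    # Compaction later removes the original file...
    rm "$workdir/data.db"

    # ...but the content is still on disk and reachable through the link.
    cat "$workdir/snapshots/data.db"

    rm -rf "$workdir"
}

demo_hard_link   # prints: sstable-data
```

On a real node, snapshots that are no longer needed can be removed with nodetool clearsnapshot, optionally limited to one snapshot name with its -t option.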


1. The data can only be restored from a snapshot if the table schema exists. Back up your schema with the following command:
$ cqlsh -e "DESC SCHEMA" > <schema_name.cql>

For example:

$ cqlsh -e "DESC SCHEMA" > db_schema.cql

2. Take a snapshot:
$ nodetool snapshot <KEYSPACE_NAME>
For example:
$ nodetool snapshot mykeyspace
The snapshot is created under the Scylla data directory, /var/lib/scylla/data.
It will have the following structure:
For example:
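As a sketch of the two steps above and of where the result ends up, both commands can be wrapped in one small function. This is not an official tool: the function name backup_node, the tag format, and the schema file name are made up for illustration. The final find assumes that each table directory keeps its snapshots in a snapshots/<tag> subdirectory of the data directory.

```shell
#!/usr/bin/env bash
# Sketch: schema dump plus snapshot for one keyspace on the local node.

backup_node() {
    local keyspace="$1"
    local data_dir="${2:-/var/lib/scylla/data}"
    local tag
    tag="backup-$(date +%Y%m%d%H%M%S)"   # hypothetical naming scheme

    # Step 1: save the schema, which must exist before data can be restored.
    cqlsh -e "DESC SCHEMA" > "db_schema-$tag.cql" || return 1

    # Step 2: flush MemTables and hard-link the SSTables under the given tag.
    nodetool snapshot -t "$tag" "$keyspace" || return 1

    # List the resulting snapshot directories (assumed layout, see above).
    find "$data_dir/$keyspace" -type d -path "*/snapshots/$tag"
}
```

Calling backup_node mykeyspace on a node writes the schema file to the current directory and prints one snapshot directory per table. Repeat the call on every node in the cluster, since backup is a per-node procedure.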

Incremental Backup

Enabling incremental backup (disabled by default) creates a hard link from each SSTable, right after it is flushed, to a backups directory.
For a complete point-in-time backup, the following is required: a snapshot, plus the incremental backups and commit logs from the time of the snapshot onward. Make sure to delete unnecessary incremental backups, because Scylla does not do this automatically.
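Because old incremental backups are never removed automatically, some periodic cleanup is needed. The following is a minimal sketch. It assumes that the hard links sit in a backups subdirectory of each table directory and that anything older than the chosen age has already been copied to external storage. The function name and the retention period are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: delete incremental backup files older than a given number of days.

prune_incremental_backups() {
    local data_dir="$1"
    local max_age_days="$2"

    # Only regular files under a "backups" directory are considered
    # (assumed layout); each deleted path is printed first.
    find "$data_dir" -type f -path '*/backups/*' \
        -mtime +"$max_age_days" -print -delete
}
```

For example, prune_incremental_backups /var/lib/scylla/data 7 would remove backup files last modified more than seven days ago. Running it with -delete taken out first is a cheap way to review what would be removed.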


1. In the /etc/scylla/scylla.yaml file, set the incremental_backups parameter to true and restart the Scylla service. The incremental backup files are created under the Scylla data directory /var/lib/scylla/data
with the following structure:
For example:
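Before relying on incremental backups, it is worth confirming that the setting is actually active in the configuration file. The check below is a sketch. It assumes the parameter is spelled incremental_backups in scylla.yaml, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if incremental backups are switched on in the config.

incremental_backups_enabled() {
    local conf="${1:-/etc/scylla/scylla.yaml}"

    # Match an uncommented "incremental_backups: true" line,
    # optionally followed by a trailing comment.
    grep -Eq '^[[:space:]]*incremental_backups:[[:space:]]*true[[:space:]]*(#.*)?$' "$conf"
}
```

For example, on a systemd-based installation where the service is named scylla-server (an assumption, adjust if yours differs), incremental_backups_enabled && sudo systemctl restart scylla-server restarts the service only when the setting is present.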