
Scylla SStable

New in version 5.0.

Introduction

This tool allows you to examine the content of SStables by performing operations such as dumping their content, generating a histogram of write timestamps, validating their content, and more. See Supported Operations for the list of available operations.

Run scylla sstable --help for additional information about the tool and the operations.

This tool is similar to SStableDump, with notable differences:

  • Built on the ScyllaDB C++ codebase, it supports all SStable formats and components that ScyllaDB supports.

  • Expanded scope: this tool supports much more than dumping SStable data components (see Supported Operations).

  • More flexible in how the schema is obtained and where SStables are located: SStableDump only supports dumping SStables located in their native data directory, so to dump an SStable elsewhere, one has to clone the entire ScyllaDB data directory tree, including system table directories and even config files. scylla sstable can dump SStables from any path, with multiple choices for obtaining the schema (see Schema).

Currently, SStableDump works better on production systems, as it automatically loads the schema from the system tables, whereas scylla sstable has to be provided with the schema explicitly. On the other hand, scylla sstable works better for offline investigations, as it needs as little as a schema definition file and a single SStable. In the future, we plan to close this gap by adding support for automatic schema loading to scylla sstable, and to replace SStableDump with scylla sstable completely.

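Usage

Syntax

The command syntax is as follows:

scylla sstable <operation> [<options>] <path to SStable> [<path to SStable> ...]

You can specify more than one SStable. Options, such as those described under Schema and Supported Operations, are given after the operation name, as in the Examples below.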
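When driving the tool from scripts, building the argument list programmatically avoids quoting problems with paths. The following Python sketch is illustrative only: the helper name `sstable_command` is hypothetical, and it simply arranges the operation, any options, and one or more SStable paths in the order used by the Examples section. It does not run anything.

```python
from typing import Sequence


def sstable_command(operation: str, sstables: Sequence[str],
                    options: Sequence[str] = ()) -> list[str]:
    """Assemble a `scylla sstable` invocation (hypothetical helper).

    Order: the operation first, then any options, then one or more
    paths to SStable Data components.
    """
    if not sstables:
        raise ValueError("at least one SStable path is required")
    return ["scylla", "sstable", operation, *options, *sstables]


if __name__ == "__main__":
    cmd = sstable_command(
        "dump-data",
        ["/path/to/md-123456-big-Data.db", "/path/to/md-123457-big-Data.db"],
        options=["--merge"],
    )
    # Print the command as it would be typed in a shell.
    print(" ".join(cmd))
```

The resulting list can be passed unchanged to `subprocess.run(cmd, check=True)`.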

Schema

All operations need a schema to interpret the SStables. Currently, there are two ways to obtain the schema:

  • --schema-file FILENAME - Read the schema definition from a file.

  • --system-schema KEYSPACE.TABLE - Use the known definition of built-in tables (only works for system tables).

By default, the tool uses the first method with --schema-file schema.cql, i.e., it assumes there is a schema file named schema.cql in the working directory. If the schema cannot be loaded from this file, the tool exits with an error.
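Because a missing default schema.cql only surfaces as an error once the tool runs, a wrapper script may prefer to check for the file up front and pass its path explicitly. A minimal Python sketch, assuming the tool would be started from the directory given as `cwd`; the helper `schema_file_option` is hypothetical:

```python
import os
from typing import Optional

# Default name the tool looks for in its working directory.
DEFAULT_SCHEMA_FILE = "schema.cql"


def schema_file_option(schema_file: Optional[str] = None,
                       cwd: str = ".") -> list[str]:
    """Return `--schema-file` arguments after checking the file exists.

    Without an explicit file, falls back to schema.cql in `cwd`, like the
    tool's documented default. Absolute paths are used as-is.
    """
    path = schema_file if schema_file is not None else DEFAULT_SCHEMA_FILE
    full_path = os.path.join(cwd, path)  # join() keeps absolute paths intact
    if not os.path.isfile(full_path):
        raise FileNotFoundError(f"schema file not found: {full_path}")
    return ["--schema-file", full_path]
```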

The schema file should contain all definitions needed to interpret data belonging to the table.

Example schema.cql:

CREATE KEYSPACE ks WITH replication = {'class': 'NetworkTopologyStrategy', 'mydc1': 1, 'mydc2': 4};

CREATE TYPE ks.mytype (
    f1 int,
    f2 text
);

CREATE TABLE ks.cf (
    pk int,
    ck text,
    v1 int,
    v2 mytype,
    PRIMARY KEY (pk, ck)
);

Note:

  • In addition to the table itself, the definition also has to include any user-defined types the table uses.

  • The keyspace definition is optional; if it is missing, one will be auto-generated.

  • The schema file doesn’t have to be called schema.cql; this is just the default name. Any file name is supported (with any extension).
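If schema files are generated rather than written by hand, a small helper can assemble one from individual CQL statements. This Python sketch is hypothetical (`write_schema_file` is not part of ScyllaDB). It writes the optional keyspace first, then user-defined types, then the table, mirroring the order of the example above, and it can use a non-default file name.

```python
from pathlib import Path
from typing import Optional, Sequence


def write_schema_file(path: Path, table_ddl: str,
                      type_ddls: Sequence[str] = (),
                      keyspace_ddl: Optional[str] = None) -> Path:
    """Write a schema definition file for --schema-file (hypothetical helper).

    Order: optional keyspace, then user-defined types, then the table.
    A terminating semicolon is added to any statement that lacks one.
    """
    statements = ([keyspace_ddl] if keyspace_ddl else []) + list(type_ddls) + [table_ddl]
    lines = []
    for stmt in statements:
        stmt = stmt.strip()
        if not stmt.endswith(";"):
            stmt += ";"
        lines.append(stmt)
    # Blank line between statements, for readability only.
    path.write_text("\n\n".join(lines) + "\n", encoding="utf-8")
    return path
```

The generated file is then passed with --schema-file, for example --schema-file my_table.txt.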

Dropped columns

The examined SStable might have columns that were dropped from the schema definition. In this case, providing the up-to-date schema is not enough: the tool will fail when attempting to process a cell for the dropped column. Dropped columns can be provided to the tool in the schema definition file, in the form of INSERT statements into the system_schema.dropped_columns system table. Example:

INSERT INTO system_schema.dropped_columns (
    keyspace_name,
    table_name,
    column_name,
    dropped_time,
    type
) VALUES (
    'ks',
    'cf',
    'v1',
    1631011979170675,
    'int'
);

CREATE TABLE ks.cf (pk int PRIMARY KEY, v2 int);
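Writing the dropped_time value by hand is error-prone. The Python sketch below (hypothetical helper names) builds such an INSERT statement from a timezone-aware datetime. It assumes, as the 16-digit example value suggests, that dropped_time is expressed in microseconds since the Unix epoch; under that assumption, the example value 1631011979170675 corresponds to 2021-09-07 10:52:59.170675 UTC.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_micros(when: datetime) -> int:
    """Integer microseconds since the Unix epoch (`when` must be tz-aware)."""
    return (when - EPOCH) // timedelta(microseconds=1)


def dropped_column_insert(keyspace: str, table: str, column: str,
                          cql_type: str, dropped_at: datetime) -> str:
    """Build an INSERT into system_schema.dropped_columns (hypothetical helper).

    ASSUMPTION: dropped_time is in microseconds, like the example value.
    Strings become CQL literals, with embedded single quotes doubled.
    """
    def lit(s: str) -> str:
        return "'" + s.replace("'", "''") + "'"

    return (
        "INSERT INTO system_schema.dropped_columns "
        "(keyspace_name, table_name, column_name, dropped_time, type) "
        f"VALUES ({lit(keyspace)}, {lit(table)}, {lit(column)}, "
        f"{to_epoch_micros(dropped_at)}, {lit(cql_type)});"
    )
```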

System tables

If the examined table is a system table, that is, it belongs to one of the system keyspaces (system, system_schema, system_distributed or system_distributed_everywhere), you can tell the tool to use the known built-in definition of that table with the --system-schema flag. Example:

scylla sstable dump-data --system-schema system.local ./path/to/md-123456-big-Data.db
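A wrapper script can reject obviously wrong --system-schema arguments before invoking the tool. In this hypothetical Python sketch, only the keyspace is checked against the system keyspaces listed above; whether the tool actually has a built-in definition for a given table is still decided by the tool itself.

```python
# System keyspaces listed in this section.
SYSTEM_KEYSPACES = frozenset({
    "system", "system_schema", "system_distributed",
    "system_distributed_everywhere",
})


def system_schema_option(qualified_table: str) -> list[str]:
    """Return `--system-schema KEYSPACE.TABLE` arguments (hypothetical helper)."""
    keyspace, sep, table = qualified_table.partition(".")
    if not sep or not table:
        raise ValueError(f"expected KEYSPACE.TABLE, got {qualified_table!r}")
    if keyspace not in SYSTEM_KEYSPACES:
        raise ValueError(f"{keyspace!r} is not a system keyspace")
    return ["--system-schema", qualified_table]
```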

Supported Operations

The dump-* operations output JSON. For dump-data, you can choose a different output format with --output-format.

  • dump-data - Dumps the content of the SStable. You can use it with additional parameters:

    • --merge - Allows you to process multiple SStables as a unified stream (if not specified, multiple SStables are processed one by one).

    • --partition=<partition key> or --partitions-file=<file> - Allows you to narrow down the scope of the operation to specified partitions. Provide the partition keys in the hexdump format used by ScyllaDB (the hex representation of the raw buffer), either directly with --partition or in a file passed to --partitions-file.

    • --output-format=<format> - Allows you to specify the output format: json or text.

  • dump-index - Dumps the content of the SStable index.

  • dump-compression-info - Dumps the SStable compression information, including compression parameters and mappings between compressed and uncompressed data.

  • dump-summary - Dumps the summary of the SStable index.

  • dump-statistics - Dumps the statistics of the SStable, including metadata about the data component.

  • dump-scylla-metadata - Dumps the SStable’s scylla-specific metadata.

  • writetime-histogram - Generates a histogram of all the timestamps in the SStable. You can use it with a parameter:

    • --bucket=<unit> - Allows you to specify the unit of time to use as the histogram bucket (years, months, weeks, days, or hours).

  • validate - Validates the content of the SStable with the mutation fragment stream validator.

  • validate-checksums - Validates SStable checksums (full checksum and per-chunk checksum) against the SStable data.

  • decompress - Decompresses the data component of the SStable (the *-Data.db file) if compressed. The decompressed data is written to a *-Data.decompressed file.
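Producing the hex form of a partition key by hand is easy to get wrong. The Python sketch below rests on an assumption not stated in this document: that the raw buffer is ScyllaDB's compound key representation, in which each partition-key component is its CQL-serialized value prefixed with a 2-byte big-endian length. Only int, text and blob components are handled, and the helper names are hypothetical. Verify the output against a partition whose key you already know before relying on it.

```python
import struct


def encode_component(value) -> bytes:
    """CQL-serialize one partition-key component (illustrative subset).

    Python int -> CQL `int` (4-byte big-endian, two's complement; wrong for
    bigint keys), str -> `text` (UTF-8), bytes -> `blob` (as-is).
    """
    if isinstance(value, bool):
        raise TypeError("boolean components are not handled by this sketch")
    if isinstance(value, int):
        return struct.pack(">i", value)
    if isinstance(value, str):
        return value.encode("utf-8")
    if isinstance(value, bytes):
        return value
    raise TypeError(f"unsupported component type: {type(value).__name__}")


def partition_key_hex(*components) -> str:
    """Hex-encode a partition key, ASSUMING the compound layout described above.

    Each component is written as a 2-byte big-endian unsigned length
    followed by its serialized bytes, with no separator in between.
    """
    buf = bytearray()
    for component in components:
        raw = encode_component(component)
        if len(raw) > 0xFFFF:
            raise ValueError("component too long for a 2-byte length prefix")
        buf += struct.pack(">H", len(raw)) + raw
    return buf.hex()
```

Under that assumption, partition 1 of the ks.cf table from the schema example (pk int) would be selected with --partition=000400000001.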

Examples

Dumping the content of the SStable:

scylla sstable dump-data /path/to/md-123456-big-Data.db

Dumping the content of two SStables as a unified stream:

scylla sstable dump-data --merge /path/to/md-123456-big-Data.db /path/to/md-123457-big-Data.db

Validating the specified SStables:

scylla sstable validate /path/to/md-123456-big-Data.db /path/to/md-123457-big-Data.db
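To validate every SStable in a directory, a script can collect the *-Data.db files and pass them all to a single validate invocation, as in the last example above. The Python sketch below is illustrative: the function names are hypothetical, and actually running the command requires the scylla binary on PATH.

```python
import subprocess
from pathlib import Path


def data_components(directory: Path) -> list[Path]:
    """Find SStable Data components (files ending in -Data.db), sorted by path."""
    return sorted(p for p in directory.glob("*-Data.db") if p.is_file())


def validate_command(directory: Path, schema_file: Path) -> list[str]:
    """Build one `validate` invocation covering every SStable in `directory`."""
    sstables = data_components(directory)
    if not sstables:
        raise FileNotFoundError(f"no *-Data.db files in {directory}")
    return ["scylla", "sstable", "validate",
            "--schema-file", str(schema_file),
            *(str(p) for p in sstables)]


def run_validate(directory: Path, schema_file: Path) -> int:
    """Run the command and return its exit status (needs scylla on PATH)."""
    return subprocess.run(validate_command(directory, schema_file)).returncode
```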
© 2023, ScyllaDB. All rights reserved.
Last updated on 31 Mar 2023.