ScyllaDB University LIVE, FREE Virtual Training Event | March 21
Register for Free
ScyllaDB Documentation Logo Documentation
  • Deployments
    • Cloud
    • Server
  • Tools
    • ScyllaDB Manager
    • ScyllaDB Monitoring Stack
    • ScyllaDB Operator
  • Drivers
    • CQL Drivers
    • DynamoDB Drivers
    • Supported Driver Versions
  • Resources
    • ScyllaDB University
    • Community Forum
    • Tutorials
Install
Ask AI
ScyllaDB Docs ScyllaDB Manual Features Change Data Capture (CDC) Querying CDC Streams

Caution

You're viewing documentation for an unstable version of ScyllaDB Manual. Switch to the latest stable version.

Querying CDC Streams¶

Some use cases for CDC may require querying the log table periodically in short intervals. One way to do that would be to perform partition scans, where you don’t specify the partition (in this case, the stream) which you want to query, for example:

SELECT * FROM ks.t_scylla_cdc_log;

Although partition scans are convenient, they require the read coordinator to contact the entire cluster, not just a small set of replicas defined by the replication factor.

The recommended alternative is to query each stream separately:

SELECT * FROM ks.t_scylla_cdc_log WHERE "cdc$stream_id" = 0x365fd1a9ae34373954529ac8169dfb93;

With the above approach you can, for instance, build a distributed CDC consumer, where each of the consumer nodes queries only streams that are replicated to ScyllaDB nodes in proximity to the consumer node. This allows efficient, concurrent querying of streams, without putting strain on a single node due to a partition scan.

Reacting to stream changes¶

As explained in CDC Stream Changes, the set of used CDC stream IDs may change due to some events. You should then query the CDC description table to read the new set of stream IDs and the corresponding timestamp.

If you’re periodically querying streams and you don’t want to miss any writes that are sent to the old generation, you should query it at least one time after the old generation stops operating (which happens when the new generation starts operating).

Keep in mind that time is relative: every node has its own clock. Therefore you should make sure that the old generation stops operating from the point of view of every node in the cluster before you query it one last time and start querying the new generation.

Example: switching streams¶

Suppose that cdc_generation_timestamps contains the following entries:

 time
---------------------------------
 2020-03-25 16:05:29.484000+0000
 2020-03-25 12:44:43.006000+0000

(2 rows)

The currently operating generation’s timestamp is 2020-03-25 16:05:29.484000+0000 — the highest one in the above list. You’ve been periodically querying all streams in this generation. In the meantime, a new node is bootstrapped, hence a new generation appears:

 time
---------------------------------
 2020-03-25 17:21:45.360000+0000
 2020-03-25 16:05:29.484000+0000
 2020-03-25 12:44:43.006000+0000

(3 rows)

You should keep querying streams from generation 2020-03-25 16:05:29.484000+0000 until after you make sure that every node’s clock moved past 2020-03-25 17:21:45.360000+0000. One way to do that is to connect to each node and use the now() function:

$ cqlsh 127.0.0.1
Connected to  at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.0.8 | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh> select totimestamp(now()) from system.local;

 system.totimestamp(system.now())
----------------------------------
  2020-03-25 17:24:34.104000+0000

(1 rows)
cqlsh>
$ cqlsh 127.0.0.4
Connected to  at 127.0.0.4:9042.
[cqlsh 5.0.1 | Cassandra 3.0.8 | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh> select totimestamp(now()) from system.local;

 system.totimestamp(system.now())
----------------------------------
  2020-03-25 17:24:42.038000+0000

(1 rows)

and so on. After you make sure that every node uses the new generation, you can query streams from the previous generation one last time, and then switch to querying streams from the new generation.

Querying CDC Streams¶

The system tables used for CDC stream descriptions differ depending on whether your keyspace uses vnodes or tablets. The following sections describe how to query CDC streams for each keyspace type:

  • Vnode-based keyspaces

  • Tablets-based keyspaces

Note

We highly recommend using the newest releases of our client CDC libraries (Java CDC library, Go CDC library, Rust CDC library). They take care of correctly querying the stream description tables and they handle the upgrade procedure for you.

Vnode-based keyspaces¶

To query the log table without performing partition scans, you need to know which streams to look at. For this you can use the system_distributed.cdc_generation_timestamps and system_distributed.cdc_streams_descriptions_v2 tables.

Example: querying the CDC description table¶

  1. Retrieve the timestamp of the currently operating CDC generation from the cdc_generation_timestamps table. If you have a multi-node cluster, query the table with QUORUM or ALL consistency level so you don’t miss any entry:

    CONSISTENCY QUORUM;
    SELECT time FROM system_distributed.cdc_generation_timestamps WHERE key = 'timestamps';
    

    The query can return multiple entries:

     time
    ---------------------------------
     2020-03-25 16:05:29.484000+0000
     2020-03-25 12:44:43.006000+0000
    
    (2 rows)
    

    Take the highest one. In our case this is 2020-03-25 16:05:29.484000+0000.

  2. Retrieve the list of stream IDs in the current CDC generation from the cdc_streams_descriptions_v2 table. Unfortunately, to use the time-date value in a WHERE clause, you have to modify the format of the time-date a little by removing the last three 0s before the +. In our case, the modified time-date is 2020-03-25 16:05:29.484+0000:

    CONSISTENCY QUORUM;
    SELECT streams FROM system_distributed.cdc_streams_descriptions_v2 WHERE time = '2020-03-25 16:05:29.484+0000';
    

    The result consists of a number of rows (most likely returned in multiple pages, unless you turned off paging), each row containing a list of stream IDs, such as:

     streams
    --------------------------------------------------------------------------------------------------------------
     {0x7ffe0c687fcce86e0783343730000001, 0x80000000000000010d9ee5f1f4000001, 0x800555555555555653e250f2d8000001}
     {0x807ae73e07dbd4122e32d36e08000011, 0x80800000000000001facbbb618000011, 0x80838c6b76e19a1bc3581db310000011}
     {0x80838c6b76e19a1c6da83d4d14000021, 0x80855555555555566d556a0a18000021, 0x808aaaaaaaaaaaabf1008f4120000021}
     {0x80c5343222b6eee636e3ed42d0000031, 0x80c5555555555556efd251b0b8000031, 0x80caaaaaaaaaaaabb9bde28998000031}
     ...
    
    (256 rows)
    

    Save all stream IDs returned by the query. When we ran the example, the query returned 256 * 3 = 768 stream IDs.

  3. Use the obtained stream IDs to query your CDC log tables:

    CREATE TABLE ks.t (pk int, ck int, v int, primary key (pk, ck)) WITH cdc = {'enabled': true};
    INSERT INTO ks.t (pk, ck, v) values (0, 0, 0);
    SELECT * FROM ks.t_scylla_cdc_log WHERE "cdc$stream_id" = 0x7ffe0c687fcce86e0783343730000001;
    SELECT * FROM ks.t_scylla_cdc_log WHERE "cdc$stream_id" = 0x80000000000000010d9ee5f1f4000001;
    ...
    

    Each change will be present in exactly one of these stream IDs. When we ran the example, it was:

    SELECT * FROM ks.t_scylla_cdc_log WHERE "cdc$stream_id" = 0xced00000000000009663c8dc500005a1;
    
     cdc$stream_id                      | cdc$time                             | cdc$batch_seq_no | cdc$deleted_v | cdc$end_of_batch | cdc$operation | cdc$ttl | ck | pk | v
    ------------------------------------+--------------------------------------+------------------+---------------+------------------+---------------+---------+----+----+---
     0xced00000000000009663c8dc500005a1 | 7a370e64-819f-11eb-c419-1f717873d8fa |                0 |          null |             True |             2 |    null |  0 |  0 | 0
    
    (1 rows)
    

Query all streams to read the entire CDC log.

Tablets-based keyspaces¶

Scylla exposes two system tables to provide information about CDC streams for CDC consumers:

  • system.cdc_timestamps: This table records the timestamps when CDC streams are changed. CDC consumers use this table to learn when any changes to streams have occurred. After discovering a relevant timestamp in system.cdc_timestamps, the consumer can then query the system.cdc_streams table for that specific timestamp to get detailed information about the streams at that point in time.

  • system.cdc_streams: For each timestamp, this table shows the set of streams operating at that timestamp, as well as the changes from the previous timestamp (such as streams being opened or closed). Each row includes the stream’s state (stream_state), which describes whether the stream is active, opened, or closed at this timestamp.

The stream_state column in system.cdc_streams formally describes the lifecycle of a stream at a given timestamp:

  • 0 (active): The stream is active and can be queried for CDC data at this timestamp.

  • 1 (closed): The stream was closed at this timestamp; no new CDC data will be written to this stream after this point.

  • 2 (opened): The stream was opened at this timestamp; CDC data for this stream starts from this point.

To list all available CDC streams for a tablets-based keyspace:

  1. Retrieve the timestamps of the CDC stream sets for your table:

    SELECT timestamp FROM system.cdc_timestamps WHERE keyspace_name = 'ks' AND table_name = 't';
    

    The query returns all timestamps in descending order. The first timestamp is the timestamp for the currently operating CDC stream set. For example:

    timestamp
    ---------------------------------
    2025-09-02 15:34:42.467000+0000
    2025-09-02 15:33:27.888000+0000
    
    (2 rows)
    
  2. Retrieve all CDC streams for a specific timestamp (stream_state = 0 means the stream is active at this timestamp):

    SELECT stream_id FROM system.cdc_streams WHERE keyspace_name = 'ks' AND table_name = 't' AND timestamp = '2025-09-02 15:34:42.467+0000' AND stream_state = 0;
    

    For example, the query can return:

    stream_id
    ------------------------------------
    0xbfffffffffffffffa15608ebf0000001
    0xffffffffffffffff372c68c25c000001
    0x3fffffffffffffff73b3f26904000001
    0x7fffffffffffffff1ef74fe610000001
    
    (4 rows)
    

    Or, you can query for the streams that were opened or closed at a specific timestamp as follows:

    SELECT stream_id FROM system.cdc_streams WHERE keyspace_name = 'ks' AND table_name = 't' AND timestamp = '2025-09-02 15:34:42.467+0000' AND stream_state >= 1 AND stream_state <= 2;
    

    returns:

    stream_state | stream_id
    -------------+------------------------------------
               1 | 0xffffffffffffffffdb6cb86b34000001
               1 | 0x7fffffffffffffff0ded3e1868000001
               2 | 0xbfffffffffffffffa15608ebf0000001
               2 | 0xffffffffffffffff372c68c25c000001
               2 | 0x3fffffffffffffff73b3f26904000001
               2 | 0x7fffffffffffffff1ef74fe610000001
    
    (4 rows)
    
  3. Use the obtained stream IDs to query your CDC log tables:

    SELECT * FROM ks.t_scylla_cdc_log WHERE "cdc$stream_id" = 0xffffffffffffffffdb6cb86b34000001;
    SELECT * FROM ks.t_scylla_cdc_log WHERE "cdc$stream_id" = 0x7fffffffffffffff0ded3e1868000001;
    ...
    

    Query all streams to read the entire CDC log.

Garbage collection of CDC streams metadata¶

For tablets-based keyspaces, Scylla periodically performs garbage collection of old CDC streams metadata in the system.cdc_timestamps and system.cdc_streams tables. This process removes information about streams that are no longer needed, helping to prevent unbounded growth of the metadata tables.

The garbage collection process runs periodically in the background and examines streams that have been closed. It removes information about a stream if the stream’s close timestamp is older than the configured TTL of the CDC table. Since the stream has been closed for longer than the TTL, this means that all rows in this stream have also exceeded their TTL and expired, unless the table’s TTL was altered to a smaller value after some rows have been written.

Warning

When altering the TTL of a CDC table to a smaller value, you can lose information about streams that still contain live rows. Make sure to read all the information you need from the system.cdc_timestamps and system.cdc_streams tables before performing such alterations.

Was this page helpful?

PREVIOUS
CDC Stream Changes
NEXT
Advanced column types
  • Create an issue
  • Edit this page

On this page

  • Querying CDC Streams
    • Reacting to stream changes
      • Example: switching streams
    • Querying CDC Streams
      • Vnode-based keyspaces
        • Example: querying the CDC description table
      • Tablets-based keyspaces
        • Garbage collection of CDC streams metadata
ScyllaDB Manual
  • master
    • master
    • 2025.4
    • 2025.3
    • 2025.2
    • 2025.1
  • Getting Started
    • Install ScyllaDB
      • Launch ScyllaDB on AWS
      • Launch ScyllaDB on GCP
      • Launch ScyllaDB on Azure
      • ScyllaDB Web Installer for Linux
      • Install ScyllaDB Linux Packages
      • Install scylla-jmx Package
      • Run ScyllaDB in Docker
      • Install ScyllaDB Without root Privileges
      • Air-gapped Server Installation
      • ScyllaDB Housekeeping and how to disable it
      • ScyllaDB Developer Mode
    • Configure ScyllaDB
    • ScyllaDB Configuration Reference
    • ScyllaDB Requirements
      • System Requirements
      • OS Support
      • Cloud Instance Recommendations
      • ScyllaDB in a Shared Environment
    • Migrate to ScyllaDB
      • Migration Process from Cassandra to ScyllaDB
      • ScyllaDB and Apache Cassandra Compatibility
      • Migration Tools Overview
    • Integration Solutions
      • Integrate ScyllaDB with Spark
      • Integrate ScyllaDB with KairosDB
      • Integrate ScyllaDB with Presto
      • Integrate ScyllaDB with Elasticsearch
      • Integrate ScyllaDB with Kubernetes
      • Integrate ScyllaDB with the JanusGraph Graph Data System
      • Integrate ScyllaDB with DataDog
      • Integrate ScyllaDB with Kafka
      • Integrate ScyllaDB with IOTA Chronicle
      • Integrate ScyllaDB with Spring
      • Shard-Aware Kafka Connector for ScyllaDB
      • Install ScyllaDB with Ansible
      • Integrate ScyllaDB with Databricks
      • Integrate ScyllaDB with Jaeger Server
      • Integrate ScyllaDB with MindsDB
  • ScyllaDB for Administrators
    • Administration Guide
    • Procedures
      • Cluster Management
      • Backup & Restore
      • Change Configuration
      • Maintenance
      • Best Practices
      • Benchmarking ScyllaDB
      • Migrate from Cassandra to ScyllaDB
      • Disable Housekeeping
    • Security
      • ScyllaDB Security Checklist
      • Enable Authentication
      • Enable and Disable Authentication Without Downtime
      • Creating a Custom Superuser
      • Generate a cqlshrc File
      • Reset Authenticator Password
      • Enable Authorization
      • Grant Authorization CQL Reference
      • Certificate-based Authentication
      • Role Based Access Control (RBAC)
      • ScyllaDB Auditing Guide
      • Encryption: Data in Transit Client to Node
      • Encryption: Data in Transit Node to Node
      • Generating a self-signed Certificate Chain Using openssl
      • Configure SaslauthdAuthenticator
      • Encryption at Rest
      • LDAP Authentication
      • LDAP Authorization (Role Management)
      • Software Bill Of Materials (SBOM)
    • Admin Tools
      • Nodetool Reference
      • CQLSh
      • Admin REST API
      • Tracing
      • ScyllaDB SStable
      • ScyllaDB Types
      • SSTableLoader
      • cassandra-stress
      • SSTabledump
      • SSTableMetadata
      • ScyllaDB Logs
      • Seastar Perftune
      • Virtual Tables
      • Reading mutation fragments
      • Maintenance socket
      • Maintenance mode
      • Task manager
    • ScyllaDB Monitoring Stack
    • ScyllaDB Operator
    • ScyllaDB Manager
    • Upgrade Procedures
      • About Upgrade
      • Upgrade Guides
    • System Configuration
      • System Configuration Guide
      • scylla.yaml
      • ScyllaDB Snitches
    • Benchmarking ScyllaDB
    • ScyllaDB Diagnostic Tools
  • ScyllaDB for Developers
    • Develop with ScyllaDB
    • Tutorials and Example Projects
    • Learn to Use ScyllaDB
    • ScyllaDB Alternator
    • ScyllaDB Drivers
  • CQL Reference
    • CQLSh: the CQL shell
    • Reserved CQL Keywords and Types (Appendices)
    • Compaction
    • Consistency Levels
    • Consistency Level Calculator
    • Data Definition
    • Data Manipulation
      • SELECT
      • INSERT
      • UPDATE
      • DELETE
      • BATCH
    • Data Types
    • Definitions
    • Global Secondary Indexes
    • Expiring Data with Time to Live (TTL)
    • Functions
    • Wasm support for user-defined functions
    • JSON Support
    • Materialized Views
    • DESCRIBE SCHEMA
    • Service Levels
    • ScyllaDB CQL Extensions
  • Alternator: DynamoDB API in Scylla
    • Getting Started With ScyllaDB Alternator
    • ScyllaDB Alternator for DynamoDB users
    • Alternator-specific APIs
  • Features
    • Lightweight Transactions
    • Global Secondary Indexes
    • Local Secondary Indexes
    • Materialized Views
    • Counters
    • Change Data Capture
      • CDC Overview
      • The CDC Log Table
      • Basic operations in CDC
      • CDC Streams
      • CDC Stream Changes
      • Querying CDC Streams
      • Advanced column types
      • Preimages and postimages
      • Data Consistency in CDC
    • Workload Attributes
    • Workload Prioritization
    • Backup and Restore
    • Incremental Repair
  • ScyllaDB Architecture
    • Data Distribution with Tablets
    • ScyllaDB Ring Architecture
    • ScyllaDB Fault Tolerance
    • Consistency Level Console Demo
    • ScyllaDB Anti-Entropy
      • ScyllaDB Hinted Handoff
      • ScyllaDB Read Repair
      • ScyllaDB Repair
    • SSTable
      • ScyllaDB SSTable - 2.x
      • ScyllaDB SSTable - 3.x
    • Compaction Strategies
    • Raft Consensus Algorithm in ScyllaDB
    • Zero-token Nodes
  • Troubleshooting ScyllaDB
    • Errors and Support
      • Report a ScyllaDB problem
      • Error Messages
      • Change Log Level
    • ScyllaDB Startup
      • Ownership Problems
      • ScyllaDB will not Start
      • ScyllaDB Python Script broken
    • Upgrade
      • Inaccessible configuration files after ScyllaDB upgrade
    • Cluster and Node
      • Handling Node Failures
      • Failure to Add, Remove, or Replace a Node
      • Failed Decommission Problem
      • Cluster Timeouts
      • Node Joined With No Data
      • NullPointerException
      • Failed Schema Sync
    • Data Modeling
      • ScyllaDB Large Partitions Table
      • ScyllaDB Large Rows and Cells Table
      • Large Partitions Hunting
      • Failure to Update the Schema
    • Data Storage and SSTables
      • Space Utilization Increasing
      • Disk Space is not Reclaimed
      • SSTable Corruption Problem
      • Pointless Compactions
      • Limiting Compaction
    • CQL
      • Time Range Query Fails
      • COPY FROM Fails
      • CQL Connection Table
    • ScyllaDB Monitor and Manager
      • Manager and Monitoring integration
      • Manager lists healthy nodes as down
    • Installation and Removal
      • Removing ScyllaDB on Ubuntu breaks system packages
  • Knowledge Base
    • Upgrading from experimental CDC
    • Compaction
    • Consistency in ScyllaDB
    • Counting all rows in a table is slow
    • CQL Query Does Not Display Entire Result Set
    • When CQLSh query returns partial results with followed by “More”
    • Run ScyllaDB and supporting services as a custom user:group
    • Customizing CPUSET
    • Decoding Stack Traces
    • Snapshots and Disk Utilization
    • DPDK mode
    • Debug your database with Flame Graphs
    • Efficient Tombstone Garbage Collection in ICS
    • How to Change gc_grace_seconds for a Table
    • Gossip in ScyllaDB
    • Increase Permission Cache to Avoid Non-paged Queries
    • How does ScyllaDB LWT Differ from Apache Cassandra ?
    • Map CPUs to ScyllaDB Shards
    • ScyllaDB Memory Usage
    • NTP Configuration for ScyllaDB
    • Updating the Mode in perftune.yaml After a ScyllaDB Upgrade
    • POSIX networking for ScyllaDB
    • ScyllaDB consistency quiz for administrators
    • Recreate RAID devices
    • How to Safely Increase the Replication Factor
    • ScyllaDB and Spark integration
    • Increase ScyllaDB resource limits over systemd
    • ScyllaDB Seed Nodes
    • How to Set up a Swap Space
    • ScyllaDB Snapshots
    • ScyllaDB payload sent duplicated static columns
    • Stopping a local repair
    • System Limits
    • How to flush old tombstones from a table
    • Time to Live (TTL) and Compaction
    • ScyllaDB Nodes are Unresponsive
    • Update a Primary Key
    • Using the perf utility with ScyllaDB
    • Configure ScyllaDB Networking with Multiple NIC/IP Combinations
  • Reference
    • AWS Images
    • Azure Images
    • GCP Images
    • Configuration Parameters
    • Glossary
    • Limits
    • API Reference
      • Authorization Cache
      • Cache Service
      • Collectd
      • Column Family
      • Commit Log
      • Compaction Manager
      • Endpoint Snitch Info
      • Error Injection
      • Failure Detector
      • Gossiper
      • Hinted Handoff
      • LSA
      • Messaging Service
      • Raft
      • Storage Proxy
      • Storage Service
      • Stream Manager
      • System
      • Task Manager Test
      • Task Manager
      • Tasks
    • Metrics
  • ScyllaDB FAQ
  • 2024.2 and earlier documentation
Docs Tutorials University Contact Us About Us
© 2025, ScyllaDB. All rights reserved. | Terms of Service | Privacy Policy | ScyllaDB, and ScyllaDB Cloud, are registered trademarks of ScyllaDB, Inc.
Last updated on 05 Dec 2025.
Powered by Sphinx 7.4.7 & ScyllaDB Theme 1.8.9
Ask AI