Scylla Large Rows and Large Cells Tables

New in version 3.1.

This document describes how to detect large rows and large cells in Scylla. Scylla is not optimized for very large rows or large cells. They require allocation of large, contiguous memory areas and therefore may increase latency. Rows may also grow over time. For example, many insert operations may add elements to the same collection, or a large blob can be inserted in a single operation.

Similar to the large partitions table, the large rows and large cells tables are updated when sstables are written or deleted, for example, on memtable flush or during compaction.

Find Large Rows

  • Search for large rows.

For example:

> SELECT * FROM system.large_rows;

   keyspace_name | table_name | sstable_name                                                                         | row_size | partition_key | clustering_key | compaction_time
   --------------+------------+--------------------------------------------------------------------------------------+----------+---------------+----------------+---------------------------------
   mykeyspace    |         gr | /var/lib/scylla/data/mykeyspace/gr-67d502908ea211e98d05000000000000/mc-1-big-Data.db |  1206130 |             1 |              1 | 2019-06-14 13:03:24.039000+0000
Parameter        Description
keyspace_name    The keyspace name that holds the large row
table_name       The table name that holds the large row
sstable_name     The SSTable name that holds the large row
row_size         The size of the row, in bytes
partition_key    The partition key of the large row
clustering_key   The clustering key of the large row
compaction_time  The time when the compaction occurred
  • Search within all the large rows for a specific keyspace and/or table.

For example, to search in the keyspace demodb and the table tmcr:

SELECT * FROM system.large_rows WHERE keyspace_name = 'demodb' AND table_name = 'tmcr';

Find Large Cells

  • Search for large cells.

For example:

> SELECT * FROM system.large_cells;


keyspace_name | table_name | sstable_name                                                                         | cell_size | partition_key | clustering_key | column_name | compaction_time
--------------+------------+--------------------------------------------------------------------------------------+-----------+---------------+----------------+-------------+---------------------------------
mykeyspace    |         gr | /var/lib/scylla/data/mykeyspace/gr-67d502908ea211e98d05000000000000/mc-1-big-Data.db |   1206115 |             1 |              1 |        link | 2019-06-14 13:03:24.034000+0000
Parameter        Description
keyspace_name    The keyspace name that holds the large cell
table_name       The table name that holds the large cell
sstable_name     The SSTable name that holds the large cell
cell_size        The size of the cell, in bytes
partition_key    The partition key of the large cell
clustering_key   The clustering key of the large cell
column_name      The column that holds the large cell
compaction_time  The time when the compaction occurred
  • Search within all the large cells for a specific keyspace and/or table.

For example, to search in the keyspace demodb and the table tmcr:

SELECT * FROM system.large_cells WHERE keyspace_name = 'demodb' AND table_name = 'tmcr';

Configure

Configure the large row and large cell detection thresholds in the scylla.yaml file using the following parameters:

  • compaction_large_row_warning_threshold_mb parameter (default: 10MB).
  • compaction_large_cell_warning_threshold_mb parameter (default: 1MB).
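
For example, a scylla.yaml fragment that raises both thresholds might look like the following (the values shown are illustrative, not the defaults):

```yaml
# Record an entry in system.large_rows and log a warning
# when a row exceeds 20 MB (default: 10 MB)
compaction_large_row_warning_threshold_mb: 20

# Record an entry in system.large_cells and log a warning
# when a cell exceeds 5 MB (default: 1 MB)
compaction_large_cell_warning_threshold_mb: 5
```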

Once a threshold is exceeded, the relevant information is captured in the large_rows / large_cells table. In addition, a warning message is written to the Scylla log (refer to logging).

Storing

Large rows and large cells are stored in system tables with the following schemas:

CREATE TABLE system.large_rows (
    keyspace_name text,
    table_name text,
    sstable_name text,
    row_size bigint,
    partition_key text,
    clustering_key text,
    compaction_time timestamp,
    PRIMARY KEY ((keyspace_name, table_name), sstable_name, row_size, partition_key, clustering_key)
) WITH CLUSTERING ORDER BY (sstable_name ASC, row_size DESC, partition_key ASC, clustering_key ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = 'rows larger than specified threshold'
    AND compaction = {'class': 'SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

CREATE TABLE system.large_cells (
    keyspace_name text,
    table_name text,
    sstable_name text,
    cell_size bigint,
    partition_key text,
    clustering_key text,
    column_name text,
    compaction_time timestamp,
    PRIMARY KEY ((keyspace_name, table_name), sstable_name, cell_size, partition_key, clustering_key, column_name)
) WITH CLUSTERING ORDER BY (sstable_name ASC, cell_size DESC, partition_key ASC, clustering_key ASC, column_name ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = 'cells larger than specified threshold'
    AND compaction = {'class': 'SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
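
Note that both schemas cluster by size in descending order after sstable_name, so for a given base table the entries of each SSTable are returned largest first. As an illustrative sketch (the demodb keyspace and tmcr table are placeholder names):

```sql
-- Within each SSTable of demodb.tmcr, rows are returned largest first
-- (clustering order: sstable_name ASC, row_size DESC).
SELECT sstable_name, row_size, partition_key, clustering_key
FROM system.large_rows
WHERE keyspace_name = 'demodb' AND table_name = 'tmcr';
```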

Expiring Data

To prevent stale data from appearing, all rows in the system.large_rows and system.large_cells tables are inserted with a Time To Live (TTL) of 30 days.
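
To see how long an entry will remain before it expires, you can read the remaining TTL of a regular (non-key) column such as compaction_time. A minimal sketch, assuming cqlsh access:

```sql
-- TTL() returns the remaining time to live in seconds
-- (at most 30 days, i.e. 2592000 seconds).
SELECT keyspace_name, table_name, TTL(compaction_time)
FROM system.large_rows;
```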

Troubleshoot