Data Definition¶
This reference covers CQL specification version 3.3.1¶
CQL stores data in tables, whose schema defines the layout of said data in the table, and those tables are grouped in keyspaces. A keyspace defines a number of options that apply to all the tables it contains, most prominently of which is the replication strategy used by the keyspace. An application can have only one keyspace. However, it is also possible to have multiple keyspaces in case your application has different replication requirements.
This section describes the statements used to create, modify, and remove keyspaces and tables.
Common Definitions¶
Keyspace and table names are defined by the following grammar:
keyspace_name: `name`
table_name: [ `keyspace_name` '.' ] `name`
name: `unquoted_name` | `quoted_name`
unquoted_name: re('[a-zA-Z_0-9]{1, 48}')
quoted_name: '"' `unquoted_name` '"'
Both keyspace and table names consist of only alphanumeric characters, cannot be empty, and are limited in
size to 48 characters (that limit exists mostly to avoid filenames, which may include the keyspace and table name, to go
over the limits of certain file systems). By default, keyspace and table names are case insensitive (myTable
is
equivalent to mytable
), but case sensitivity can be forced by using double-quotes ("myTable"
is different from
mytable
).
Further, a table is always part of a keyspace, and a table name can be provided fully-qualified by the keyspace it is part of. If it is not fully-qualified, the table is assumed to be in the current keyspace (see USE statement).
Further, valid column names are simply defined as:
column_name: `identifier`
We also define the notion of statement options for use in the following section:
options: `option` ( AND `option` )*
option: `identifier` '=' ( `identifier` | `constant` | `map_literal` )
In all cases, for creating keyspaces and tables, if you are using Reserved Keywords, enclose them in single or double-quotes.
CREATE KEYSPACE¶
A keyspace is created using a CREATE KEYSPACE
statement:
create_keyspace_statement: CREATE KEYSPACE [ IF NOT EXISTS ] `keyspace_name` WITH `options`
For example:
CREATE KEYSPACE Excalibur
WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1' : 1, 'DC2' : 3}
AND durable_writes = true;
CREATE KEYSPACE Excelsior
WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3};
The supported options
are:
name |
kind |
mandatory |
default |
description |
---|---|---|---|---|
|
map |
yes |
The replication strategy and options to use for the keyspace (see details below). |
|
|
simple |
no |
true |
Whether to use the commit log for updates on this keyspace (disable this option at your own risk!). |
The replication
property is mandatory and must at least contains the 'class'
sub-option, which defines the
replication strategy class to use. The rest of the sub-options depend on what replication
strategy is used. By default, Scylla supports the following 'class'
:
'NetworkTopologyStrategy'
: A replication strategy that lets you set the replication factor independently for each data-center. The rest of the sub-options are key-value pairs where a key is a data-center name and its value is the associated replication factor, or a single key value with Auto-Expand Replication Factor. Recommended for production.'SimpleStrategy'
: defines a replication factor for the whole cluster. The only sub-options supported is'replication_factor'
to define that replication factor and is mandatory.
Note
Using NetworkTopologyStrategy is recommended. Using SimpleStrategy will make it harder to add Data Center in the future.
Attempting to create a keyspace that already exists will return an error unless the IF NOT EXISTS
option is used. If
it is used, the statement will be a no-op if the keyspace already exists.
Auto-Expand Replication Factor¶
New in version 3.2: Scylla Open Source
Using the ‘NetworkTopologyStrategy’, you can set a single replication factor for all Data Centers, including all existing Data Centers.
CREATE KEYSPACE Excalibur
WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 3}
AND durable_writes = false;
Note that when adding new data centers, you need to set the Replication Factor using the auto-expansion option a second time, to include the added Data Centers.
For Example :
ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 3}
In addition, you can set the Replication Factor for the new Data Center manually. This is useful if you want the new Data Center to have a different Replication Factor. If you choose this method, you still need to state the old Data Centers’ replication factor. If not, the new value will replace the old one.
For Example :
ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 3, 'new-dc' : 5}
USE¶
The USE
statement allows you to change the current keyspace (for the connection on which it is executed). Some objects in CQL are bound to a keyspace (tables, user-defined types, functions, …), and the current keyspace is the
default keyspace used when those objects are referred without a fully-qualified name (that is, without being prefixed a
keyspace name). A USE
statement simply takes the specified keyspace and uses the name as an argument for all future actions until this name is changed.
use_statement: USE `keyspace_name`
ALTER KEYSPACE¶
An ALTER KEYSPACE
statement lets you modify the options of a keyspace:
alter_keyspace_statement: ALTER KEYSPACE `keyspace_name` WITH `options`
For instance:
ALTER KEYSPACE Excelsior
WITH replication = { 'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 0};
ALTER KEYSPACE Excelsior
WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 4};
The supported options are the same as creating a keyspace.
DROP KEYSPACE¶
Dropping a keyspace can be done using the DROP KEYSPACE
statement:
drop_keyspace_statement: DROP KEYSPACE [ IF EXISTS ] `keyspace_name`
For instance:
DROP KEYSPACE Excelsior;
Dropping a keyspace results in the immediate removal of that keyspace, including all the tables, UTD and functions in it, and all the data contained in those tables.
Note
By default, when a table or a keyspace is removed, a snapshot is taken so that you can restore it later. As a result, the disk space remains the same and is not immediately reclaimed. Refer to this article or this FAQ entry.
If the keyspace does not exist, the statement will return an error unless IF EXISTS
is used, in which case the
operation is a no-op.
CREATE TABLE¶
Creating a new table uses the CREATE TABLE
statement:
create_table_statement: CREATE TABLE [ IF NOT EXISTS ] `table_name`
: '('
: `column_definition`
: ( ',' `column_definition` )*
: [ ',' PRIMARY KEY '(' `primary_key` ')' ]
: ')' [ WITH `table_options` ]
column_definition: `column_name` `cql_type` [ STATIC ] [ PRIMARY KEY]
primary_key: `partition_key` [ ',' `clustering_columns` ]
partition_key: `column_name`
: | '(' `column_name` ( ',' `column_name` )* ')'
clustering_columns: `column_name` ( ',' `column_name` )*
table_options: COMPACT STORAGE [ AND `table_options` ]
: | CLUSTERING ORDER BY '(' `clustering_order` ')' [ AND `table_options` ]
: | scylla_encryption_options: '=' '{'[`cipher_algorithm` : <hash>]','[`secret_key_strength` : <len>]','[`key_provider`: <provider>]'}'
: | `options`
clustering_order: `column_name` (ASC | DESC) ( ',' `column_name` (ASC | DESC) )*
For instance:
CREATE TABLE monkeySpecies (
species text PRIMARY KEY,
common_name text,
population varint,
average_size int
) WITH comment='Important biological records'
AND read_repair_chance = 1.0;
CREATE TABLE timeline (
userid uuid,
posted_month int,
posted_time uuid,
body text,
posted_by text,
PRIMARY KEY (userid, posted_month, posted_time)
) WITH compaction = { 'class' : 'LeveledCompactionStrategy' };
CREATE TABLE loads (
machine inet,
cpu int,
mtime timeuuid,
load float,
PRIMARY KEY ((machine, cpu), mtime)
) WITH CLUSTERING ORDER BY (mtime DESC);
CREATE TABLE users_picture (
userid uuid,
pictureid uuid,
body text,
posted_by text,
PRIMARY KEY (userid, pictureid, posted_by)
) WITH compression = {'sstable_compression': 'LZ4Compressor'};
CREATE TABLE data_atrest (
pk text PRIMARY KEY,
c0 int
) WITH scylla_encryption_options = {
'cipher_algorithm' : 'AES/ECB/PKCS5Padding',
'secret_key_strength' : 128,
'key_provider': 'LocalFileSystemKeyProviderFactory',
'secret_key_file': '/etc/scylla/data_encryption_keys/secret_key'};
A CQL table has a name and is composed of a set of rows. Creating a table amounts to defining which columns the rows will be composed, which of those columns compose the primary key, as well as optional options for the table.
Attempting to create an already existing table will return an error unless the IF NOT EXISTS
directive is used. If
it is used, the statement will be a no-op if the table already exists.
Column definitions¶
Every row in a CQL table has a set of predefined columns defined at the time of the table creation (or added later using an alter statement).
A column_definition
is primarily comprised of the name of the column defined and its type,
which restricts which values are accepted for that column. Additionally, a column definition can have the following
modifiers:
STATIC
declares the column as being a static column.
PRIMARY KEY
declares the column as being the sole component of the primary key of the table.
Static columns¶
Some columns can be declared as STATIC
in a table definition. A column that is static will be “shared” by all the
rows belonging to the same partition (having the same partition key). For instance:
CREATE TABLE t (
pk int,
t int,
v text,
s text static,
PRIMARY KEY (pk, t)
);
INSERT INTO t (pk, t, v, s) VALUES (0, 0, 'val0', 'static0');
INSERT INTO t (pk, t, v, s) VALUES (0, 1, 'val1', 'static1');
SELECT * FROM t;
pk | t | v | s
----+---+--------+-----------
0 | 0 | 'val0' | 'static1'
0 | 1 | 'val1' | 'static1'
As can be seen, the s
value is the same (static1
) for both of the rows in the partition (the partition key in
that example being pk
, both rows are in that same partition): the 2nd insertion has overridden the value for s
.
Static columns have the following restrictions:
tables with the
COMPACT STORAGE
option (see below) cannot use them.a table without clustering columns cannot have static columns (in a table without clustering columns, every partition has only one row, and so every column is inherently static).
only non
PRIMARY KEY
columns can be static.
The Primary key¶
Within a table, a row is uniquely identified by its PRIMARY KEY
, and hence all tables must define a PRIMARY KEY
(and only one). A PRIMARY KEY
definition is composed of one or more of the columns defined in the table.
Syntactically, the primary key is defined by the keywords PRIMARY KEY
, followed by a comma-separated list of the column
names composing it within parenthesis. However, if the primary key has only one column, one can alternatively follow that
column definition by the PRIMARY KEY
keywords. The order of the columns in the primary key definition matter.
A CQL primary key is composed of 2 parts:
the partition key part. It is the first component of the primary key definition. It can be a single column or, using additional parenthesis, can be multiple columns. A table always has at least a partition key, the smallest possible table definition is:
CREATE TABLE t (k text PRIMARY KEY);
the clustering columns. Those are the columns after the first component of the primary key definition, and the order of those columns define the clustering order.
Some examples of primary key definition are:
PRIMARY KEY (a)
:a
is the partition key, and there are no clustering columns.PRIMARY KEY (a, b, c)
:a
is the partition key, andb
andc
are the clustering columns.PRIMARY KEY ((a, b), c)
:a
andb
compose the partition key (this is often called a composite partition key), andc
is the clustering column.
The partition key¶
Within a table, CQL defines the notion of a partition. A partition is simply the set of rows that share the same value for their partition key. Note that if the partition key is composed of multiple columns, then rows belong to the same partition only when they have the same values for all those partition key columns. So, for instance, given the following table definition and content:
CREATE TABLE t (
a int,
b int,
c int,
d int,
PRIMARY KEY ((a, b), c, d)
);
SELECT * FROM t;
a | b | c | d
---+---+---+---
0 | 0 | 0 | 0 // row 1
0 | 0 | 1 | 1 // row 2
0 | 1 | 2 | 2 // row 3
0 | 1 | 3 | 3 // row 4
1 | 1 | 4 | 4 // row 5
row 1
and row 2
are in the same partition, row 3
and row 4
are also in the same partition (but a
different one) and row 5
is in yet another partition.
Note that a table always has a partition key and that if the table has no clustering columns, then every partition of that table is only comprised of a single row (since the primary key uniquely identifies rows and the primary key is equal to the partition key if there are no clustering columns).
The most important property of partition is that all the rows belonging to the same partition are guarantee to be stored on the same set of replica nodes. In other words, the partition key of a table defines which of the rows will be localized together in the cluster, and it is thus important to choose your partition key wisely so that rows that need to be fetched together are in the same partition (so that querying those rows together require contacting a minimum of nodes).
However, please note that there is a flip-side to this guarantee: as all rows sharing a partition key are guaranteed to be stored on the same set of replica nodes, a partition key that groups too much data can create a hotspot.
Another useful property of a partition is that when writing data, all the updates belonging to a single partition are done atomically and in isolation, which is not the case across partitions.
The proper choice of the partition key and clustering columns for a table is probably one of the most important aspects of data modeling in Scylla. It largely impacts which queries can be performed and how efficient they are.
The clustering columns¶
The clustering columns of a table define the clustering order for the partition of that table. For a given partition, all the rows are physically ordered inside Scylla by that clustering order. For instance, given:
CREATE TABLE t (
a int,
b int,
c int,
PRIMARY KEY (a, b, c)
);
SELECT * FROM t;
a | b | c
---+---+---
0 | 0 | 4 // row 1
0 | 1 | 9 // row 2
0 | 2 | 2 // row 3
0 | 3 | 3 // row 4
then the rows (which all belong to the same partition) are all stored internally in the order of the values of their
b
column (the order they are displayed above). So, where the partition keys of the table let you group rows on the
same replica set, the clustering columns control how those rows are stored on the replica. That sorting allows the
retrieval of a range of rows within a partition (for instance, in the example above, SELECT * FROM t WHERE a = 0 AND b
> 1 and b <= 3
) is very efficient.
Table options¶
A CQL table has a number of options that can be set at creation (and, for most of them, altered later). These options are specified after the WITH
keyword.
Amongst those options, two important ones cannot be changed after creation and influence which queries can be done
against the table: the COMPACT STORAGE
option and the CLUSTERING ORDER
option. Those, as well as the other
options of a table are described in the following sections.
Compact tables¶
A compact table is one defined with the COMPACT STORAGE
option. This option is only maintained for backward
compatibility for definitions created before CQL version 3 and shouldn’t be used for new tables. Declaring a
table with this option creates limitations for the table, which are largely arbitrary (and exists for historical
reasons). Amongst these limitations:
a compact table cannot use collections nor static columns.
if a compact table has at least one clustering column, then it must have exactly one column outside of the primary key ones. This implies that you cannot add or remove columns in particular after creation.
a compact table is limited as to the indexes it can create, and no materialized view can be created on it.
Reversing the clustering order¶
The clustering order of a table is defined by the clustering columns of that table. By
default, that ordering is based on the natural order of the clustering order, but the CLUSTERING ORDER
lets you
change that clustering order to use the reverse natural order for some (potentially all) of the columns.
The CLUSTERING ORDER
option takes the comma-separated list of the clustering column, each with an ASC
(for
ascendant, e.g. the natural order) or DESC
(for descendant, e.g. the reverse natural order). Note in particular
that the default (if the CLUSTERING ORDER
option is not used) is strictly equivalent to using the option with all
clustering columns using the ASC
modifier.
Note that this option is basically a hint for the storage engine to change the order in which it stores the row, but it has three visible consequences:
- # it limits which
ORDER BY
clause is allowed for selects on that table. You can only order results by the clustering order or the reverse clustering order. Meaning that if a table has two clustering columns
a
andb
, and you defineWITH CLUSTERING ORDER (a DESC, b ASC)
, then in queries, you will be allowed to useORDER BY (a DESC, b ASC)
and (reverse clustering order)ORDER BY (a ASC, b DESC)
but notORDER BY (a ASC, b ASC)
(norORDER BY (a DESC, b DESC)
).- # it also changes the default order of results when queried (if no
ORDER BY
is provided). Results are always returned in clustering order (within a partition).
- # it has a small performance impact on some queries as queries in reverse clustering order are slower than the one in
forward clustering order. In practice, this means that if you plan on querying mostly in the reverse natural order of your columns (which is common with time series, for instance, where you often want data from the newest to the oldest), it is an optimization to declare a descending clustering order.
Other table options¶
A table supports the following options:
option |
kind |
default |
description |
---|---|---|---|
|
simple |
none |
A free-form, human-readable comment. |
|
simple |
0 |
The probability that extra nodes are queried (e.g. more nodes than required by the consistency level) for the purpose of read repairs. |
|
simple |
0.1 |
The probability that extra nodes are queried (e.g. more nodes than required by the consistency level) belonging to the same data center as the read coordinator for the purpose of read repairs. |
|
simple |
99PERCENTILE |
|
|
simple |
864000 |
Time to wait before garbage collecting tombstones (deletion markers). |
|
simple |
0.00075 |
The target probability of false-positive of the sstable bloom filters. Sstable bloom filters will be sized to provide the provided probability (thus lowering this value impact the size of bloom filters in-memory and on-disk). |
|
simple |
0 |
The default expiration time (“TTL”) in seconds for a table. |
|
map |
see below |
|
|
map |
see below |
|
|
simple |
0 |
Time (in ms) before Scylla flushes memtables to disk. |
|
map |
see below |
Speculative retry options¶
By default, Scylla read coordinators only query as many replicas as necessary to satisfy
consistency levels: one for consistency level ONE
, a quorum for QUORUM
, and so on.
speculative_retry
determines when coordinators may query additional replicas, which is useful
when replicas are slow or unresponsive. The following are legal values (case-insensitive):
Format |
Example |
Description |
---|---|---|
|
90.5PERCENTILE |
Coordinators record average per-table response times for all replicas.
If a replica takes longer than |
|
90.5P |
Synonym for |
|
25ms |
If a replica takes more than |
|
Coordinators always query all replicas. |
|
|
Coordinators never query additional replicas. |
This setting does not affect reads with consistency level ALL
because they already query all replicas.
Note that frequently reading from additional replicas can hurt cluster performance.
When in doubt, keep the default 99PERCENTILE
.
Compaction options¶
The compaction
options must at least define the 'class'
sub-option, which defines the compaction strategy class
to use. The default supported class are 'SizeTieredCompactionStrategy'
,
'LeveledCompactionStrategy'
, 'IncrementalCompactionStrategy'
, and 'DateTieredCompactionStrategy'
Custom strategy can be provided by specifying the full class name as a string constant.
All default strategies support a number of common options, as well as options specific to the strategy chosen (see the section corresponding to your strategy for details: STCS, LCS, ICS, and TWCS). DTCS is not recommended, and TWCS should be used instead.
Compression options¶
The compression
options define if and how the sstables of the table are compressed. The following sub-options are
available:
Option |
Default |
Description |
---|---|---|
|
LZ4Compressor |
The compression algorithm to use. Default compressors are LZ4Compressor,
SnappyCompressor, and DeflateCompressor. Use |
|
true |
Enable/disable sstable compression. |
|
4KB |
On disk SSTables are compressed by block (to allow random reads). This defines the size (in KB) of the block. Bigger values may improve the compression rate, but increases the minimum size of data to be read from disk for a read. |
For instance:
CREATE TABLE id (id int PRIMARY KEY) WITH compression = {'sstable_compression': 'LZ4Compressor'};
CDC options¶
New in version 3.2: Scylla Open Source
The following options are to be used with Change Data Capture. Available as an experimental feature from Scylla Open Source 3.2. To use this feature, you must enable the experimental tag in the scylla.yaml.
option |
default |
description |
---|---|---|
|
|
When set to |
|
|
When set to |
|
86400 seconds 24 hours |
Time after which data stored in CDC will be removed and won’t be accessible to the client anymore. |
For example:
CREATE TABLE customer_data (
cust_id uuid,
cust_first-name text,
cust_last-name text,
cust_phone text,
cust_get-sms text,
PRIMARY KEY (customer_id)
) WITH cdc = { 'enabled' : 'true', 'preimage' : 'true' };
Encryption options¶
Encryption options are used when enabling or disabling encryption at rest, available in Scylla Enterprise from version 2019.1.1.
Note
When the key_provider
is LocalFileSystemKeyProviderFactory
, you must indicate where the key resides using the secret_key_file: <path>
parameter. Refer to Encryption at Rest for details.
Other considerations:¶
Adding new columns (see
ALTER TABLE
below) is a constant time operation. There is thus no need to try to anticipate future usage when creating a table.
ALTER TABLE¶
Altering an existing table uses the ALTER TABLE
statement:
alter_table_statement: ALTER TABLE `table_name` `alter_table_instruction`
alter_table_instruction: ADD `column_name` `cql_type` ( ',' `column_name` `cql_type` )*
: | DROP `column_name` ( `column_name` )*
: | ALTER `column_name` TYPE `cql_type`
: | WITH `options`
: | scylla_encryption_options: '=' '{'[`cipher_algorithm` : <hash>]','[`secret_key_strength` : <len>]','[`key_provider`: <provider>]'}'
For instance:
ALTER TABLE addamsFamily ADD gravesite varchar;
ALTER TABLE addamsFamily
WITH comment = 'A most excellent and useful table'
AND read_repair_chance = 0.2;
ALTER TABLE data_atrest (
pk text PRIMARY KEY,
c0 int
) WITH scylla_encryption_options = {
'cipher_algorithm' : 'AES/ECB/PKCS5Padding',
'secret_key_strength' : 128,
'key_provider': 'LocalFileSystemKeyProviderFactory',
'secret_key_file': '/etc/scylla/data_encryption_keys/secret_key'};
ALTER TABLE customer_data
WITH cdc = { 'enabled' : 'true', 'preimage' : 'true' };
The ALTER TABLE
statement can:
Add new column(s) to the table (through the
ADD
instruction). Note that the primary key of a table cannot be changed, and thus newly added column will, by extension, never be part of the primary key. Also, note that compact tables have restrictions regarding column addition. Note that this is constant (in the amount of data the cluster contains) time operation.Remove column(s) from the table. This drops both the column and all its content, but note that while the column becomes immediately unavailable, its content is only removed lazily during compaction. Please also note the warnings below. Due to lazy removal, the altering itself is a constant (in the amount of data removed or contained in the cluster) time operation.
Change data type of the column to a compatible type.
Change some of the table options (through the
WITH
instruction). The supported options are the same that when creating a table (outside ofCOMPACT STORAGE
andCLUSTERING ORDER
that cannot be changed after creation). Note that setting anycompaction
sub-options has the effect of erasing all previouscompaction
options, so you need to re-specify all the sub-options if you want to keep them. The same note applies to the set ofcompression
sub-options.Change or add any of the Encryption options above.
Change or add any of the CDC options above.
Warning
Dropping a column assumes that the timestamps used for the value of this column are “real” timestamp in microseconds. Using “real” timestamps in microseconds is the default is and is strongly recommended, but as Scylla allows the client to provide any timestamp on any table, it is theoretically possible to use another convention. Please be aware that if you do so, dropping a column will not work correctly.
Warning
Once a column is dropped, it is allowed to re-add a column with the same name as the dropped one unless the type of the dropped column was a (non-frozen) column (due to an internal technical limitation).
DROP TABLE¶
Dropping a table uses the DROP TABLE
statement:
drop_table_statement ::= DROP TABLE [ IF EXISTS ] table_name
Dropping a table results in the immediate removal of the table, including all data it contains.
Note
By default, when a table or a keyspace is removed, a snapshot is taken so that you can restore it later. As a result, the disk space remains the same and is not immediately reclaimed. Refer to this article or this FAQ entry.
If the table does not exist, the statement will return an error unless IF EXISTS
is used, in which case the
operation is a no-op.
TRUNCATE¶
A table can be truncated using the TRUNCATE
statement:
truncate_statement ::= TRUNCATE [ TABLE ] table_name
Note that TRUNCATE TABLE foo
is allowed for consistency with other DDL statements, but tables are the only object
that can be truncated currently and so the TABLE
keyword can be omitted.
Truncating a table permanently removes all existing data from the table, but without removing the table itself.
Caution
Do not run any operation on a table that is being truncated. Truncate operation is an administrative operation, and running any other operation on the same table in parallel may cause the truncating table’s data to end up in an undefined state.
Apache Cassandra Query Language
Copyright
© 2016, The Apache Software Foundation.
Apache®, Apache Cassandra®, Cassandra®, the Apache feather logo and the Apache Cassandra® Eye logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.