How to Report a Scylla Problem

To save time and increase the likelihood of a solution, please follow these guidelines when reporting a Scylla problem.

Information to Include in the Report

Run the node_health_check script (starting from Scylla version 2.0, it is included in the default path). It generates an archive file (output_files.tgz) containing configuration data (hardware, OS, Scylla SW, etc.) and system logs, as well as a textual report file (<node_IP>-health-check-report.txt) based on the collected information.

To see the available options, run:

./node_health_check.sh -h

This script performs system review and generates health check report based on
the configuration data (hardware, OS, Scylla SW, etc.) collected from the node.

Usage:
-p   Port to use for nodetool commands (default: 7199)
-q   Port to use for cqlsh (default: 9042)
-c   Print cfstats output
-d   Print data model info
-n   Print network info
-a   Print all
-h   Display this help and exit

Note: the output for the above options is collected, but not printed in the report.
If you wish to have it printed, supply the relevant flag(s).
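
For example, to run the script and have all of the collected output printed in the report, pass the -a flag:

./node_health_check.sh -a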
  • Generate a UUID: export report_uuid=$(uuidgen) (this UUID will be used to upload the configuration data archive, the textual report, and any core dump files)
  • Upload the files to S3:
curl --request PUT --upload-file output_files.tgz "scylladb-users-upload.s3.amazonaws.com/$report_uuid/output_files.tgz"
curl --request PUT --upload-file [node_IP]-health-check-report.txt "scylladb-users-upload.s3.amazonaws.com/$report_uuid/[node_IP]-health-check-report.txt"

Core Dump

When Scylla fails, it should create a core dump which can later be used to debug the issue.

Core dumps are written to /var/lib/scylla/coredump.
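
To check whether a dump was written, list that directory:

ls -lh /var/lib/scylla/coredump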

Systemd

If Scylla restarts for some reason and there is no core dump file, make sure the OS is configured to generate core dumps. Note that you will need spare disk space larger than Scylla's RAM.

The core dump file location is defined in /etc/systemd/coredump.conf.d/custom.conf.
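
As a sketch, such a custom.conf might look like the following (the size limits shown are example values, not defaults; per the note above, they should exceed Scylla's RAM):

# /etc/systemd/coredump.conf.d/custom.conf
[Coredump]
Storage=external
# Example limits only; set these larger than the node's RAM
ProcessSizeMax=1024G
ExternalSizeMax=1024G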

No Systemd

If files are still not written, it might be that the Automatic Bug Reporting Tool (ABRT) is running and all core dumps are piped directly to it. Check the /proc/sys/kernel/core_pattern file; if it contains something like |/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t %h %e 636f726500, replace it with just core.
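
For example, to inspect the current pattern and reset it to a plain core file name:

cat /proc/sys/kernel/core_pattern
echo core | sudo tee /proc/sys/kernel/core_pattern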

Send the Core Dump

The dump file can be very large. Make sure to compress it with xz or similar: xz -z core.21692

Upload the compressed file to our dedicated S3 bucket:

curl --request PUT --upload-file "yourfile" "scylladb-users-upload.s3.amazonaws.com/$report_uuid/yourfile"

Where report_uuid is the UUID you generated earlier. This method has a 5 GB size limit per file. If your core file is bigger, split it using

split -b 1GB core.19644.xz

and upload the files one by one using the same method above, with the same UUID.

For example:

for f in xa*; do sudo curl --request PUT --upload-file "$f" "scylladb-users-upload.s3.amazonaws.com/$report_uuid/$f"; done
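
The pieces can later be reassembled on the receiving side with cat:

cat xa* > core.19644.xz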

Prometheus

When using Grafana and Prometheus to monitor Scylla, sharing the metrics stored in Prometheus is very useful. Here is how to do it (from the monitoring server):

  1. sudo docker ps to validate that the Prometheus instance is running.
  2. sudo docker cp a64bf3ba0b7f:/prometheus /tmp/prometheus_data to copy the Prometheus database out of the container; use your CONTAINER ID instead of a64bf3ba0b7f.
  3. sudo tar -zcvf /tmp/prometheus_data.tar.gz /tmp/prometheus_data/ to compress the directory.
  4. Upload the file /tmp/prometheus_data.tar.gz to S3, as shown below.
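
For example, reusing the UUID generated earlier:

curl --request PUT --upload-file /tmp/prometheus_data.tar.gz "scylladb-users-upload.s3.amazonaws.com/$report_uuid/prometheus_data.tar.gz"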
