How to Report a Performance Problem

To save time and increase the likelihood of a solution, please follow these guidelines when reporting a Scylla performance problem.

Information to Include in Your Report

System setup

The following should allow us to reproduce your benchmark or to get close to it.

Include the complete Scylla setup information, as detailed in How to Report a Scylla Problem, including OS, Scylla version, etc.

Hardware:
  • How many nodes are in the Scylla cluster?
  • What is the spec of each node (CPU, disk, RAM, network)?
  • How many loaders (stress machines) are you using?
  • What is the spec of each loader machine?
Stress:
  • What stress software are you using (cassandra-stress, YCSB, other)?
  • What are the exact parameters you are using?
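
As an illustration, the per-machine hardware details asked for above could be gathered with a small helper like the one below. This is only a sketch: the function name node_spec and the key=value output format are made up for this example and are not part of any Scylla tooling. It assumes a Linux host; lsblk is used only if it is installed.

```shell
#!/usr/bin/env bash
# node_spec: print a short hardware summary for one machine.
# Hypothetical helper for illustration; not part of Scylla.
node_spec() {
    echo "kernel=$(uname -sr)"
    # nproc is part of GNU coreutils; fall back to getconf elsewhere.
    echo "cpus=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null)"
    # /proc/meminfo exists on Linux only.
    if [ -r /proc/meminfo ]; then
        echo "mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)"
    fi
    # One line per disk: name, size, rotational flag, model.
    if command -v lsblk >/dev/null 2>&1; then
        lsblk -d -o NAME,SIZE,ROTA,MODEL 2>/dev/null
    fi
    return 0
}

# Usage: run on every Scylla node and every loader, and attach the output.
#   node_spec > "spec-$(hostname).txt"
```

Network details (NIC model and link speed) are not covered by this sketch and still need to be added by hand.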

Check out Scylla benchmark results for an example of the level of detail required.

Real-Time Metrics

The following should help identify the bottleneck of your deployment. Take these measurements while the system is under maximum load.

Scylla Metrics

You can read Scylla’s metrics using scyllatop, or use an external monitoring service like Grafana.

Metrics to Collect - Client Side

Check the client CPU using top. If the CPU is close to 100%, the bottleneck is the client CPU. In this case, you should add more loaders to stress Scylla.
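
If you want a single number to paste into the report instead of a top screenshot, the same check can be approximated by sampling /proc/stat twice. This is a sketch of an alternative to top, not a replacement for it: the function names busy_pct and sample_busy_pct are invented for this example, and it assumes a Linux loader.

```shell
#!/usr/bin/env bash
# busy_pct FIRST SECOND
# FIRST and SECOND are two samples of the aggregate "cpu" line of /proc/stat.
# Prints the integer percentage of non-idle CPU time between the samples.
busy_pct() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, " "); split(b, y, " ")
        # Fields 2..9: user nice system idle iowait irq softirq steal.
        for (i = 2; i <= 9; i++) total += y[i] - x[i]
        idle = (y[5] - x[5]) + (y[6] - x[6])    # idle + iowait
        if (total <= 0) { print 0; exit }
        printf "%d\n", 100 * (total - idle) / total
    }'
}

# sample_busy_pct [SECONDS]: take two samples SECONDS apart (default 1).
sample_busy_pct() {
    local first second
    first=$(grep '^cpu ' /proc/stat)
    sleep "${1:-1}"
    second=$(grep '^cpu ' /proc/stat)
    busy_pct "$first" "$second"
}

# Usage on a loader while the benchmark is running:
#   sample_busy_pct 5
```

A value close to 100 on a loader points to the client as the bottleneck, as described above.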

Metrics to Collect - Server Side

Check the gauge-load using scyllatop *gauge-load. If the load is close to 100%, the bottleneck is the Scylla CPU. Note that checking the CPU load using top is not a good metric for Scylla.

  • Use sar -P ALL to see if one of the Scylla cores is busier than the others. Use perf top -C0 to check the load on one CPU (0 in this example).
  • Use iostat -x 1 to observe the disk utilization. If %util is close to 100%, the disk might be the bottleneck.
  • Use sudo perf record --call-graph dwarf -C 0 -F 99 -p $(ps -C scylla -o pid --no-headers) -g sleep 10 to collect runtime stats. Prerequisite: install debug info.

Alternatively, you can run sudo ./collect-runtime-info.sh (source), which does all of the above except scyllatop and uploads the compressed result to S3. You can also see the results in the ./report directory.
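
For the iostat check in the list above, a small filter can pick out the devices whose %util is near saturation. The sketch below is an illustration only: busy_disks is a hypothetical helper name, the default threshold of 90 is an arbitrary choice, and it assumes the sysstat iostat layout, where a header line starting with "Device" names the columns.

```shell
#!/usr/bin/env bash
# busy_disks [THRESHOLD]
# Reads `iostat -x` output on stdin and prints "device %util" for every
# device line whose %util is at or above THRESHOLD (default 90).
busy_disks() {
    local threshold="${1:-90}"
    awk -v t="$threshold" '
        # Blank lines and the avg-cpu block end the current device table.
        NF == 0 || /^avg-cpu/ { col = 0; next }
        # Locate the %util column by name; its position varies by version.
        /^Device/ {
            col = 0
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        col && NF >= col && $col + 0 >= t { print $1, $col }
    '
}

# Usage: five one-second reports taken under maximum load.
#   iostat -x 1 5 | busy_disks 90
```

Note that the first report printed by iostat covers the time since boot, so a device listed only in that first report may not be busy during the benchmark itself.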

Prometheus

When using Grafana and Prometheus to monitor Scylla, sharing the metrics stored in Prometheus is very useful. Here is how to do it (from the monitoring server):

  1. Run sudo docker ps to validate that the Prometheus instance is running.
  2. Run sudo docker cp a64bf3ba0b7f:/prometheus /tmp/prometheus_data to copy the DB out of the container. Use your CONTAINER ID instead of a64bf3ba0b7f.
  3. Run sudo tar -zcvf /tmp/prometheus_data.tar.gz /tmp/prometheus_data/ to compress the data.
  4. Upload the file /tmp/prometheus_data.tar.gz to upload.scylladb.com (see curl above).
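
Steps 2 and 3 above can be wrapped in one function so that only the container ID has to be supplied. This is a minimal sketch: the names run, export_prometheus and the DRY_RUN variable are invented for this example. Find the container ID with sudo docker ps first (step 1). The upload in step 4 is left out because the exact command is not shown in this section.

```shell
#!/usr/bin/env bash
# run CMD...: execute CMD, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# export_prometheus CONTAINER_ID [DATA_DIR]
# Copies /prometheus out of the container and compresses it.
# DATA_DIR defaults to /tmp/prometheus_data, as in the steps above.
export_prometheus() {
    local container_id="${1:-}"
    local data_dir="${2:-/tmp/prometheus_data}"
    if [ -z "$container_id" ]; then
        echo "usage: export_prometheus CONTAINER_ID [DATA_DIR]" >&2
        return 1
    fi
    # Step 2: copy the Prometheus DB out of the container.
    run sudo docker cp "${container_id}:/prometheus" "$data_dir" || return 1
    # Step 3: compress the copied data.
    run sudo tar -zcvf "${data_dir}.tar.gz" "$data_dir" || return 1
}

# Usage: preview the commands first, then run them for real.
#   DRY_RUN=1 export_prometheus a64bf3ba0b7f
#   export_prometheus a64bf3ba0b7f
```

The dry-run mode is there so the commands can be reviewed before anything is copied; the archive still has to be uploaded as described in step 4.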
