NTP Configuration for Scylla

Topic: System administration

Learn: How to configure time synchronization for Scylla

Audience: Scylla and Apache Cassandra administrators

Apache Cassandra and Scylla depend on an accurate system clock. Kyle Kingsbury, author of the Jepsen distributed systems testing tool, writes:

Apache Cassandra uses wall-clock timestamps provided by the server, or optionally by the client, to order writes. It makes several guarantees about the monotonicity of writes and reads given timestamps. For instance, Cassandra guarantees most of the time that if you write successfully to a quorum of nodes, any subsequent read from a quorum of nodes will see that write or one with a greater timestamp.

So servers need to keep their time in sync. Not a hard problem, since we all have NTP on our Linux systems, right? Not quite. The way that NTP ships out of the box is fine for a stand-alone server, but can be a problem for a distributed data store.
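To make the risk concrete, here is a minimal sketch of timestamp-based write ordering. It is a simplified model for illustration, not Cassandra's or Scylla's actual reconciliation code; the `Write` class, `reconcile`, `node_clock`, and the skew values are all hypothetical, and tie-breaking is deliberately ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    value: str
    timestamp_us: int  # wall-clock timestamp assigned by the coordinating node


def reconcile(a: Write, b: Write) -> Write:
    """Keep the write with the greater timestamp (simplified last-write-wins)."""
    return a if a.timestamp_us >= b.timestamp_us else b


def node_clock(true_time_us: int, skew_us: int) -> int:
    """What a node believes the time is, given its (hypothetical) clock error."""
    return true_time_us + skew_us


# Node A's clock runs 15 ms fast; node B's clock is accurate.
first = Write("old", node_clock(1_000_000, skew_us=15_000))  # written first, via node A
second = Write("new", node_clock(1_010_000, skew_us=0))      # written 10 ms later, via node B

winner = reconcile(first, second)
print(winner.value)  # the earlier write wins because its timestamp is larger
```

In this model the write that really happened second is silently discarded, which is the kind of surprise that tight clock synchronization is meant to prevent.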

You did WHAT in the Pool?

The default NTP configuration that comes with a typical Linux system uses “NTP pools”, lists of publicly available time servers contributed by public-minded Internet timekeeping system administrators. The pools are a valuable service, but in order to spare the NTP traffic load on any given server, they’re managed with DNS round robin. One client that tries to resolve the hostname 0.pool.ntp.org will get a different result from another client.
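One way to see the round robin in action is to resolve a pool hostname on two different nodes and compare the answers. The following is a rough sketch using Python's standard `socket.getaddrinfo`; the helper name `resolve_ipv4` and its `resolver` parameter (there only so the function can be exercised without network access) are assumptions made for this example.

```python
import socket


def resolve_ipv4(hostname, resolver=socket.getaddrinfo):
    """Return the sorted set of IPv4 addresses a hostname currently resolves to."""
    # Port 123 is the NTP port; SOCK_DGRAM because NTP runs over UDP.
    results = resolver(hostname, 123, socket.AF_INET, socket.SOCK_DGRAM)
    # Each result is (family, type, proto, canonname, sockaddr);
    # for IPv4 the sockaddr is an (address, port) pair.
    return sorted({sockaddr[0] for _family, _type, _proto, _canon, sockaddr in results})
```

Calling `resolve_ipv4("0.pool.ntp.org")` on two nodes, or on one node a few minutes apart, will typically return different address lists.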

As Viliam Holub points out in a two-part series (part 1, part 2), if Apache Cassandra nodes in a cluster independently obtain their time from random pool servers out on the Internet, the chance that two nodes will have widely (by NTP standards) differing times is high. For example, in a 10-node cluster, 50% of the time some pair of nodes will have clocks that differ by more than 10.9ms. The problem only grows as more nodes are added.
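The quantity that matters here is the largest difference between any two nodes, not how far each node is from true time. A small sketch of that calculation follows; the node names and offsets are made-up illustration values, and `worst_pair` is a hypothetical helper, not part of any NTP tooling.

```python
from itertools import combinations


def worst_pair(offsets_us):
    """Return (node_a, node_b, skew_us) for the two nodes whose clocks differ most."""
    (name_a, off_a), (name_b, off_b) = max(
        combinations(offsets_us.items(), 2),
        key=lambda pair: abs(pair[0][1] - pair[1][1]),
    )
    return name_a, name_b, abs(off_a - off_b)


# Hypothetical per-node offsets from true time, in microseconds.
offsets_us = {"node1": -4200, "node2": 3100, "node3": 7500}
THRESHOLD_US = 10_900  # the 10.9 ms figure quoted above

a, b, skew = worst_pair(offsets_us)
print(f"{a} and {b} differ by {skew / 1000:.1f} ms; over threshold: {skew > THRESHOLD_US}")
```

Because the number of pairs grows quadratically with the number of nodes, every added node gives the cluster more chances to contain one badly matched pair.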

The solution is to take the ntp.conf file that came with your Linux distribution, remove the default “pool” servers, and put in your data center’s own NTP servers.

Instead of lines that look something like:

server 0.fedora.pool.ntp.org iburst
server 1.fedora.pool.ntp.org iburst

Or

server 0.debian.pool.ntp.org iburst
server 1.debian.pool.ntp.org iburst

use your own servers. So ntp.conf will have “server” lines pointing to your own NTP servers, and look more like:

# Begin ntp.conf

# Store clock drift -- see ntp.conf(5)
driftfile /var/lib/ntp/drift

# Restrict all access by default
restrict default nomodify notrap nopeer noquery

# Allow localhost access and LAN management
restrict 127.0.0.1
restrict ::1
restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap

# Use our company’s NTP servers only
server 0.ntp.example.com iburst
server 1.ntp.example.com iburst
server 2.ntp.example.com iburst

# End ntp.conf

The same ntp.conf can be deployed to all the servers in your data center: not just the Apache Cassandra nodes, but also the application servers that use them. It’s much more important for the time to be in sync throughout the cluster than for any node to match some random machine out on the Internet. It’s also helpful to keep the data store time the same as the application server’s time, for ease in troubleshooting and matching up log entries.
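Since every client gets an identical file, it is easy to generate it from one list of servers and push it out with whatever deployment tool you already use. A minimal sketch, assuming the same example hostnames and LAN range as the sample file above; the function name `render_client_ntp_conf` and its parameters are hypothetical.

```python
def render_client_ntp_conf(servers, lan_network="192.168.1.0", lan_mask="255.255.255.0"):
    """Build the text of a client ntp.conf that uses only the given servers."""
    lines = [
        "# Store clock drift -- see ntp.conf(5)",
        "driftfile /var/lib/ntp/drift",
        "# Restrict all access by default",
        "restrict default nomodify notrap nopeer noquery",
        "# Allow localhost access and LAN management",
        "restrict 127.0.0.1",
        "restrict ::1",
        f"restrict {lan_network} mask {lan_mask} nomodify notrap",
        "# Use our own NTP servers only",
    ]
    lines += [f"server {host} iburst" for host in servers]
    return "\n".join(lines) + "\n"


# Example hostnames taken from the sample file; replace with your own servers.
conf = render_client_ntp_conf(
    ["0.ntp.example.com", "1.ntp.example.com", "2.ntp.example.com"]
)
print(conf)
```

Keeping the server list in one place means a change of NTP servers is a single edit followed by a redeploy, rather than a hand edit on every node.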

Dedicated NTP appliances are available, and might be a good choice for large sites. Otherwise, any standard Linux system should make a good NTP server.

On the NTP servers, you can go ahead and use the “pool.ntp.org” server lines that shipped with your Linux distribution if you don’t have a known good time server. But a good hosting provider or business-class ISP probably has NTP servers that are close to you on the network, and that would be better choices to replace the pool entries.

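Your NTP servers should peer with each other. Each server only needs peer lines for the others, so on a given server its own line can be left out: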

peer 0.ntp.example.com prefer
peer 1.ntp.example.com
peer 2.ntp.example.com
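If you generate the server configurations, the peer lines differ slightly per machine. The sketch below assumes that each server lists every other server as a peer and skips itself; that choice, the `peer_lines` helper, and its parameters are assumptions for illustration rather than something the NTP documentation prescribes in this form.

```python
def peer_lines(all_servers, this_server, preferred=None):
    """Return the ntp.conf peer lines for one server, leaving out the server itself."""
    lines = []
    for host in all_servers:
        if host == this_server:
            continue  # a server does not need to peer with itself
        suffix = " prefer" if host == preferred else ""
        lines.append(f"peer {host}{suffix}")
    return lines


ntp_servers = ["0.ntp.example.com", "1.ntp.example.com", "2.ntp.example.com"]
for line in peer_lines(ntp_servers, "1.ntp.example.com", preferred="0.ntp.example.com"):
    print(line)
```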

Almost done.

Pass the Fudge?

What happens when the network goes down? In most cases, NTP should just work. Your NTP servers will establish a new consensus time among themselves. Old-school NTP documentation had “fudge” lines to let the NTP server rely on the local system clock if the network connection failed. On modern versions of NTP, the “fudge” functionality has been replaced with Orphan mode.

Add an “orphan” line to ntp.conf on each NTP server:

tos orphan 9

With that line in place, the NTP servers will do the right thing and stay synchronized among themselves if there’s a problem reaching the servers on the outside.
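To check that the servers really are tracking each other, you can look at the peer table that `ntpq -pn` prints. The sketch below parses text in that column layout; the `SAMPLE` text is a made-up illustration using documentation-range addresses, not output captured from a real system, and `parse_ntpq_peers` is a hypothetical helper.

```python
def parse_ntpq_peers(text):
    """Parse `ntpq -pn` style text into a list of peer dicts."""
    peers = []
    for line in text.splitlines():
        # Skip blank lines, the column header, and the ===== separator.
        if not line.strip() or line.lstrip().startswith("remote") or line.startswith("="):
            continue
        # Column 0 is the tally code ("*" marks the currently selected peer).
        tally, fields = line[0], line[1:].split()
        if len(fields) != 10:
            continue  # not a peer row in the expected layout
        peers.append({
            "selected": tally == "*",
            "remote": fields[0],
            "stratum": int(fields[2]),
            "offset_ms": float(fields[8]),  # offset column is in milliseconds
        })
    return peers


# Made-up sample in the shape of `ntpq -pn` output.
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.0.2.10      203.0.113.5      2 u   33   64  377    0.215   -0.012   0.034
+192.0.2.11      203.0.113.5      2 u   12   64  377    0.301    0.047   0.051
"""

for peer in parse_ntpq_peers(SAMPLE):
    print(peer["remote"], peer["offset_ms"], "selected" if peer["selected"] else "")
```

Small offsets against your own servers, with one peer marked as selected, suggest the setup is behaving as intended.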

That’s all it takes. One relatively simple system administration project can save a bunch of troubleshooting grief later on. Once your NTP servers are working, have a look at the instructions for joining the NTP pool yourself, so that you can help share the correct time with others.
