A Self-Sourcing Cassandra Cluster with SaltStack and EC2

Anybody doing something interesting to a production Cassandra cluster is generally advised, for a host of excellent reasons, to try it out in a test environment first. Here's how to make those environments effectively disposable.

The something interesting we're trying to do to our Cassandra cluster is actually two somethings: upgrading from v2 to v3, while also factoring Cassandra itself out from the group of EC2 servers that currently run Cassandra-and-also-some-other-important-stuff. We have a "pets" situation and want a "cattle" situation, per Bill Baker: pets have names and you care deeply about each one's welfare, while cattle are, not to put too fine a point on it, fungible. If we can bring new dedicated nodes into the cluster, start removing the original nodes as replication takes its course, and finally upgrade this Database of Theseus, that'll be some significant progress -- and without downtime, even! But it's going to take a lot of testing, to say nothing of managing the new nodes for real.

We already use SaltStack to monitor and manage other areas of our infrastructure besides the data pipeline, and SaltStack includes a "salt-cloud" module which can work with EC2. I'd rather have a single infra-as-code solution, so that part's all good. What isn't: the official Cassandra formula is geared more towards single-node instances or some-assembly-required clusters, and provisioning is a separate concern. I expect to be creating and destroying clusters with abandon, so I need this to be as automatic as possible.

Salt-Cloud Configuration

The first part of connecting salt-cloud is to set up a provider and a profile. By default these live on the Salt master in /etc/salt/cloud.providers.d and /etc/salt/cloud.profiles.d. We keep everything in source control and symlink these directories.

Our cloud stuff is hosted on AWS, so we're using the EC2 provider. That part is basically stock, but in profiles we do need to define a template for the Cassandra nodes themselves.
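
For reference, a minimal provider definition looks something like this -- every value below is a placeholder for your own credentials, key pair, and region:

my-ec2-provider:
  driver: ec2
  id: 'your-access-key-id'
  key: 'your-secret-access-key'
  keyname: your-keypair
  private_key: /etc/salt/keys/your-keypair.pem
  ssh_username: ubuntu
  location: us-east-1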

etc/cloud.profiles.d/ec2.conf

cassandra_node:
  provider: [your provider from etc/cloud.providers.d/ec2.conf]
  image: ami-abc123
  ssh_interface: private_ips
  size: m5.large
  securitygroup:
    - default
    - others
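
A single throwaway instance can be launched straight from the profile (the name here is arbitrary) to confirm the provider and AMI are wired up correctly, though the real cluster layout comes from a map file:

sudo salt-cloud -p cassandra_node cassandra-scratch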

cassandra-test.map

With the cassandra_node template defined in the profile configuration, we can establish the cluster layout in a map file. The filename doesn't matter; mine is cassandra-test.map. One important thing to note is that we're establishing a naming convention for our nodes: cassandra-*. Each node is also defined as a t2.small, overriding the profile's default m5.large -- we don't need all that horsepower while we're just testing! t2.micro instances, however, did prove to be too underpowered to run Cassandra.

cassandra_node:
  - cassandra-1:
      size: t2.small
      cassandra-seed: true
  - cassandra-2:
      size: t2.small
      cassandra-seed: true
  - cassandra-3:
      size: t2.small

cassandra-seed (and size, for that matter) is a grain, a fact each Salt-managed "minion" knows about itself. When Cassandra comes up in a multi-node configuration, each node looks for help joining the cluster from a list of "seed" nodes. Without seeds, nothing can join the cluster; however, only non-seeds bootstrap data from the seeds when they join, so it's not a good idea to make everything a seed. The seed layout also needs a starting point, like a topological sort: if A has B and C for seeds, B has A and C, and C has A and B, no node can come up first, which is effectively the same situation as having no seeds at all. Since two of our instances know that they're special, we can use grain matching to target them specifically.
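
Once the nodes are up, that grain gives us a direct handle on the seed pair -- for example, a quick connectivity check with grain targeting:

sudo salt -G 'cassandra-seed:true' test.ping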

Pillar and Mine

The Salt "pillar" is a centralized configuration database stored on the master. Minions make local copies on initialization, and their caches can be updated with salt minion-name saltutil.refresh_pillar. Pillars can target nodes based on name, grains, or other criteria, and are commonly used to store configuration. We have a lot of configuration, and most of it will be the same for all nodes, so using pillars is a natural fit.

srv/salt/pillar/top.sls

Like the top.sls for Salt's own states, the pillar top.sls defines what new minions get by default. First, we declare that the pillars we're adding apply to minions whose names match the pattern cassandra-*.

base:
  'cassandra-*':
    - system-user-ubuntu
    - mine-network-info
    - java
    - cassandra
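
Once a node exists, it's worth sanity-checking what it actually received, e.g.:

sudo salt 'cassandra-1' pillar.items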

srv/salt/pillar/system-user-ubuntu.sls

Nothing special here, just a user so we can ssh in and poke things. The private key for the user is defined in the cloud provider configuration.

system:
  user: ubuntu
  home: /home/ubuntu

srv/salt/pillar/mine-network-info.sls

The Salt "mine" is another centralized database, this one storing grain information so minions can retrieve facts about other minions from the master instead of dealing with peer-to-peer communication. Minions use a mine_functions pillar (or salt-minion configuration, but we're sticking with the pillar) to determine whether and what to store. For Cassandra nodes, we want internal network configuration and the public DNS name, which latter each node has to get by asking AWS where it is with curl.

mine_functions:
  network.interfaces: [eth0]
  network.ip_addrs: [eth0]
  # ask amazon's network config what we're public as
  public_dns:
    - mine_function: cmd.run
    - 'curl -s http://169.254.169.254/latest/meta-data/public-hostname'
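
Mine data is refreshed on an interval (mine_interval, 60 minutes by default), so after changing mine_functions it's worth pushing the pillar and the mine by hand:

sudo salt 'cassandra-*' saltutil.refresh_pillar
sudo salt 'cassandra-*' mine.update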

srv/salt/pillar/java.sls

Cassandra requires Java 8 to be installed (prospective Java 9 support became prospective Java 11 support and is due with Cassandra 4). This pillar sets up the official Java formula accordingly -- or rather, it did until Oracle archived the Java 8 binaries in April 2019. We're now pulling it from Artifactory, which is a whole other thing.

java:
  # vitals
  release: '8'
  major: '0'
  minor: '202'
  development: false
  
  # tarball
  prefix: /usr/share/java # unpack here
  version_name: jdk1.8.0_202 # root directory name
  source_url: https://download.oracle.com/otn-pub/java/jdk/8u202-b08/1961070e4c9b4e26a04e7f5a083f551e/server-jre-8u202-linux-x64.tar.gz
  source_hash: sha256=61292e9d9ef84d9702f0e30f57b208e8fbd9a272d87cd530aece4f5213c98e4e
  dl_opts: -b oraclelicense=accept-securebackup-cookie -L

srv/salt/pillar/cassandra.sls

Finally, the Cassandra pillar defines properties common to all nodes in the cluster. My upgrade plan is to bring everything up on 2.2.12, switch the central pillar definition over, and then supply the new version number to each minion by refreshing its pillar as part of the upgrade process.

cassandra:
  version: '2.2.12'
  cluster_name: 'Test Cluster'
  authenticator: 'AllowAllAuthenticator'
  endpoint_snitch: 'Ec2Snitch'
  twcs_jar:
    '2.2.12': 'TimeWindowCompactionStrategy-2.2.5.jar'
    '3.0.8': 'TimeWindowCompactionStrategy-3.0.0.jar'
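
Since the version lives in one place, confirming what a given node will install (once it exists) is a one-liner:

sudo salt 'cassandra-1' pillar.get cassandra:version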

The twcs_jar dictionary gets into one of the reasons I'm not using the official formula: we're using the TimeWindowCompactionStrategy. TWCS was integrated into Cassandra starting in 3.0.8 or 3.8, but it has to be compiled and installed separately for earlier versions. Pre-integration versions of TWCS also have a different package name (com.jeffjirsa instead of org.apache). 3.0.8 is the common point, having the org.apache TWCS built in but also being a valid compilation target for the com.jeffjirsa TWCS. After upgrading to 3.0.8 I'll be able to ALTER TABLE to apply the org.apache version before proceeding.

With the provider, profile, map file, and pillars set up, we can now spin up a barebones cluster of Ubuntu VMs and retrieve the centrally-stored network information from the Salt mine:

sudo salt-cloud -m cassandra-test.map

sudo salt 'cassandra-1' 'mine.get' '*' 'public_dns'

We can't do much else, since we don't have anything installed on the nodes yet, but it's progress!

The Cassandra State

The state definition includes everything a Cassandra node has to have in order to be part of the cluster: the installed binaries, a cassandra group and user, a config file, a data directory, and a running SystemD unit. The definition itself is sort of an ouroboros of YAML and Jinja:

srv/salt/cassandra/defaults.yaml

First, there's a perfectly ordinary YAML file with some defaults. These could easily be in the pillar we set up above (or the pillar config could all be in this file); the principal distinction seems to be in whether you want to propagate changes via saltutil.refresh_pillar, or by (re)applying the Cassandra state either directly or via highstate. This is definitely more complicated than it needs to be right now, but given that this is my first major SaltStack project, I don't yet know enough to land on one side or the other, or if combining a defaults file with the pillar configuration will eventually be necessary.

cassandra:
  dc: dc1
  rack: rack1

srv/salt/cassandra/map.jinja

The map template loads the defaults file and merges it with the pillar, creating a server dictionary with all the Cassandra parameters we're setting.

{% import_yaml "cassandra/defaults.yaml" as default_settings %}

{% set server = salt['pillar.get']('cassandra', default=default_settings.cassandra, merge=True) %}

srv/salt/cassandra/init.sls

Finally, the Cassandra state entrypoint init.sls is another Jinja template that happens to look a lot like a YAML file and renders a YAML file, which for SaltStack is good enough. Jinja is required here since values from the server dictionary, like the server version or the TWCS JAR filename, need to be interpolated at the time the state is applied.

When the Cassandra state is applied to a fresh minion:

  1. wget will be installed
  2. CASSANDRA_VERSION environment variable will be set to the value defined in the pillar
  3. A user and group named cassandra will be created
  4. A script named install.sh will download and extract Cassandra itself, once the above three conditions are met
  5. A node configuration file named cassandra.yaml will be generated from a Jinja template and installed to /etc/cassandra
  6. If necessary, the TWCS jar will be added to the Cassandra lib directory
  7. The directory /var/lib/cassandra will be created and chowned to the cassandra user
  8. A SystemD unit for Cassandra will be installed and started once all its prerequisites are in order

{% from "cassandra/map.jinja" import server with context %}

wget:
  pkg.installed

cassandra:
  environ.setenv:
    - name: CASSANDRA_VERSION
    - value: {{ server.version }}

  cmd.script:
    - require:
      - pkg: wget
      - user: cassandra
      - environ: CASSANDRA_VERSION
    - source: salt://cassandra/files/install.sh
    - user: root
    - cwd: ~

  group.present: []

  user.present:
    - require:
      - group: cassandra
    - gid_from_name: True
    - createhome: False

  service.running:
    - enable: True
    - require:
      - file: /etc/cassandra/cassandra.yaml
      - file: /etc/systemd/system/cassandra.service
{%- if server.twcs_jar[server.version] %}
      - file: /opt/cassandra/lib/{{ server.twcs_jar[server.version] }}
{%- endif %}

# Main configuration
/etc/cassandra/cassandra.yaml:
  file.managed:
    - source: salt://cassandra/files/{{ server.version }}/cassandra.yaml
    - template: jinja
    - makedirs: True
    - user: cassandra
    - group: cassandra
    - mode: 644

# Load TWCS jar if necessary
{%- if server.twcs_jar[server.version] %}
/opt/cassandra/lib/{{ server.twcs_jar[server.version] }}:
  file.managed:
    - require:
      - user: cassandra
      - group: cassandra
    - source: salt://cassandra/files/{{ server.version }}/{{ server.twcs_jar[server.version] }}
    - user: cassandra
    - group: cassandra
    - mode: 644
{%- endif %}

# Data directory
/var/lib/cassandra:
  file.directory:
    - user: cassandra
    - group: cassandra
    - mode: 755

# SystemD unit
/etc/systemd/system/cassandra.service:
  file.managed:
    - source: salt://cassandra/files/cassandra.service
    - user: root
    - group: root
    - mode: 644
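
With the state written, a dry run against a single node shows everything it would change without actually touching it:

sudo salt 'cassandra-1' state.apply cassandra test=True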

srv/salt/cassandra/files/install.sh

This script downloads and extracts the target version of Cassandra and points the symlink /opt/cassandra to it. If the target version already exists, it just updates the symlink since everything else is already set up.

#!/bin/bash

update_symlink() {
  # -f: don't complain on the first run, before the symlink exists
  rm -f /opt/cassandra
  ln -s "/opt/apache-cassandra-$CASSANDRA_VERSION" /opt/cassandra

  echo "Updated symlink"
}

# already installed?
if [ -d "/opt/apache-cassandra-$CASSANDRA_VERSION" ]; then
  echo "Cassandra $CASSANDRA_VERSION is already installed!"

  update_symlink

  exit 0
fi

# download and extract
wget "https://archive.apache.org/dist/cassandra/$CASSANDRA_VERSION/apache-cassandra-$CASSANDRA_VERSION-bin.tar.gz"
tar xf "apache-cassandra-$CASSANDRA_VERSION-bin.tar.gz"
rm "apache-cassandra-$CASSANDRA_VERSION-bin.tar.gz"

# install to /opt and link /opt/cassandra
mv "apache-cassandra-$CASSANDRA_VERSION" /opt
update_symlink

# create log directory
mkdir -p /opt/cassandra/logs

# set ownership
chown -R cassandra:cassandra "/opt/apache-cassandra-$CASSANDRA_VERSION"
chown cassandra:cassandra /opt/cassandra

It's probably possible to do most of this, at least the symlink juggling and directory management, with "pure" Salt (and the environment variable could be eliminated by rendering install.sh as a Jinja template with the server dictionary), but the script does what I want it to and it's already idempotent and centrally managed.
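
For the record, the tarball-and-symlink portion in "pure" Salt might look roughly like the following sketch. It's untested and not what's deployed here, and server.tarball_hash is a hypothetical pillar value (archive.extracted requires a hash for remote sources):

{%- from "cassandra/map.jinja" import server with context %}

cassandra-tarball:
  archive.extracted:
    - name: /opt
    - source: https://archive.apache.org/dist/cassandra/{{ server.version }}/apache-cassandra-{{ server.version }}-bin.tar.gz
    - source_hash: {{ server.tarball_hash }}  # hypothetical: a per-version sha256 kept in the pillar
    - user: cassandra
    - group: cassandra
    - require:
      - user: cassandra

/opt/cassandra:
  file.symlink:
    - target: /opt/apache-cassandra-{{ server.version }}
    - require:
      - archive: cassandra-tarball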

srv/salt/cassandra/files/cassandra.service

This is a basic SystemD unit, with some system limits customized to give Cassandra enough room to run. It starts whatever Cassandra executable it finds at /opt/cassandra, so all that's necessary to resume operations after the symlink changes during the upgrade is to restart the service.

[Unit]
Description=Apache Cassandra database server
Documentation=http://cassandra.apache.org
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=forking
User=cassandra
Group=cassandra
ExecStart=/opt/cassandra/bin/cassandra -Dcassandra.config=file:///etc/cassandra/cassandra.yaml
LimitNOFILE=100000
LimitNPROC=32768
LimitMEMLOCK=infinity
LimitAS=infinity

[Install]
WantedBy=multi-user.target
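
When the upgrade happens for real, the per-node sequence should be roughly: bump the version in the cassandra pillar, refresh that node's pillar, re-apply the state (which downloads the new tarball and moves the symlink), and restart the service. Something like, one node at a time:

sudo salt 'cassandra-1' saltutil.refresh_pillar
sudo salt 'cassandra-1' state.apply cassandra
sudo salt 'cassandra-1' service.restart cassandra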

srv/salt/cassandra/files/2.2.12/cassandra.yaml

The full cassandra.yaml is enormous, so I won't reproduce it here. The interesting parts are where values are being automatically interpolated by Salt. Like the Cassandra state, this is actually a Jinja template which renders a YAML file.

First, we get a list of internal IP addresses corresponding to cassandra-seed minions from the Salt mine and build a list of known_seeds.

{%- from 'cassandra/map.jinja' import server with context -%}
{% set known_seeds = [] %}
{% for minion, ip_array in salt['mine.get']('cassandra-seed:true', 'network.ip_addrs', 'grain').items() if ip_array is not sameas false and known_seeds|length < 2 %}
{%   for ip in ip_array %}
{%     do known_seeds.append(ip) %}
{%   endfor %}
{% endfor %}

This becomes the list of seeds the node looks for when trying to join the cluster.

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "{{ known_seeds|unique|join(',') }}"
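
Rendered against a live cluster, that ends up as a plain comma-separated list of the two seeds' private IPs (the addresses here are made up):

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "172.31.0.11,172.31.0.12"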

Listen and broadcast addresses are configured per node. The broadcast addresses are a little special due to our network configuration needs: each node has to get its public DNS name from the Salt mine. This is perhaps a bit overcomplicated compared to a custom grain or capturing the output of the Salt modules at render time, but it's there, it works, and at this point messing with it isn't a great use of time.

listen_address: {{ grains['fqdn'] }}
broadcast_address: {{ salt['mine.get'](grains['id'], 'public_dns').items()[0][1] }}
rpc_address: {{ grains['fqdn'] }}
broadcast_rpc_address: {{ salt['mine.get'](grains['id'], 'public_dns').items()[0][1] }}

The cluster name and other central settings are interpolated from the pillar+defaults server dictionary.

cluster_name: "{{ server.cluster_name }}"
...
authenticator: "{{ server.authenticator }}"
...
endpoint_snitch: "{{ server.endpoint_snitch }}"

The changes to the Cassandra 3.0.8 configuration are identical.

srv/salt/cassandra/files/2.2.12/TimeWindowCompactionStrategy-2.2.5.jar

See this post on TheLastPickle for directions on building the TWCS jar.

Highstate

Finally, the Salt highstate needs to ensure that our cassandra-* nodes have the Java and Cassandra states applied. Since salt-cloud minions arrive already configured, however, we have to exclude the default salt.minion state from our Cassandra nodes; otherwise a highstate would blow away the cloud-specific configuration.

srv/salt/top.sls changes

base:
  'not cassandra-*':
    - match: compound
    - salt.minion
  'cassandra-*':
    - sun-java
    - sun-java.env
    - cassandra
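
For nodes that are already up, the highstate can be pushed immediately:

sudo salt 'cassandra-*' state.highstate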

Startup!

Set the Salt config dir to etc with -c and pass in the map file with -m:

sudo salt-cloud -c etc -m cassandra-test.map

To clean up:

sudo salt-cloud -d cassandra-1 cassandra-2 cassandra-3