adobe-platform/mesos-systemd

mesos-systemd

Adobe Platform scripts to bootstrap a CoreOS cluster & run Mesos/Marathon/Zookeeper-Exhibitor.

Provides node-level services as Fleet Units for every machine in the cluster.

Most services (logging, metrics, monitoring) run on all nodes; some run only on specific tiers, based on the metadata that is injected into Fleet.

The aim of this setup is to move instance provisioning steps to the CoreOS machine level, automated via fleetctl/systemctl. Almost all of our systemd units use docker to run our services. Consequently, we're able to use the vanilla CoreOS EC2 AMI (i.e., we don't bake AMIs at all). That being said, this repo also contains methods that deal with sensitive data/secrets to configure various services (more below).

DISCLAIMER:

This repository may reference private repositories or scripts. Most should be replaceable with your own, but either way, proceed with caution: this project is highly experimental and certain nuances may not be well documented. If you want to use this repo, you may have to prune the code a bit and edit/delete certain files.

Concepts

The purpose of this repository is to house all setup scripts and systemd/fleetd units in a central location, separate from our infrastructure provisioning scripts (cloudformation).

All setup behavior is defined in the init script.

Assumptions:

  • Your infrastructure has 3 tiers: control, proxy, worker
  • ALL nodes run a bootstrap.service, whatever that may be.
  • Some of the scripts require /etc/environment to contain certain variables (usually cloudformation parameters such as route53 entries)
  • S3 buckets are set correctly and all required credential files (SSH keys, datadog & sumologic credentials) are properly provided to init & can be downloaded using behance/docker-aws-s3-downloader
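The /etc/environment assumption above can be checked before running anything else. The following is a minimal sketch, not part of this repo: the REQUIRED_VARS list is hypothetical (the real set depends on which setup scripts you keep), and the parser only handles plain KEY=VALUE lines.

```python
# Hypothetical list of variables; adjust to the setup scripts you actually use.
REQUIRED_VARS = ("CONTROL_TIER_S3SECURE_BUCKET", "SECURE_FILES")


def parse_environment(text):
    """Parse KEY=VALUE lines, ignoring blank lines and comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"')
    return env


def missing_vars(text, required=REQUIRED_VARS):
    """Return the required variable names that are absent or empty."""
    env = parse_environment(text)
    return [name for name in required if not env.get(name)]
```

Passing the contents of /etc/environment to missing_vars returns the names that still need to be provided by cloudformation; an empty list means the assumption holds.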

init bootstrap

Our bootstrap.service just clones this repo and runs the init script.

From there, the init script does the following:

  1. ensures that any credentials/secure files are downloaded from S3 (to allow docker & git to pull private dependencies)
  2. configures SSH to allow github.com access
  3. copies .dockercfg into /root # TODO: refactor process as this is a hack
  4. runs ALL scripts in v2/setup
    • these scripts are always run with sudo (i.e., as root)
    • they set up things such as motds, aliases, and drop-ins for various services
  5. starts tier-specific template units that are specified by the running machine's IP (provided by CoreOS / cloudinit)
    • these are started via fleet, even though they are NOT global units and run on specific machines
    • the rationale is to give us granular control over certain units, such as mesos-slaves. It allows us to control individual nodes, or perform rolling actions (such as deploys), while retaining visibility into the cluster as a whole.
  6. submits and starts generic fleet units
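Steps 4 to 6 above can be sketched as a function that only plans the commands, without running them. This is an illustration under assumptions, not the actual init script: the v2/fleet-local/<tier> and v2/fleet directory names, the *.sh and *.service globs, and the alphabetical ordering are all hypothetical, and instantiating template units per machine IP is left out.

```python
from pathlib import Path


def plan_init(repo_root, tier):
    """Return the commands for steps 4-6 as argv lists, without running them."""
    root = Path(repo_root)
    commands = []

    # Step 4: every script in v2/setup runs as root.
    for script in sorted((root / "v2" / "setup").glob("*.sh")):
        commands.append(["sudo", "bash", str(script)])

    # Step 5: tier-specific units (hypothetical layout: v2/fleet-local/<tier>).
    for unit in sorted((root / "v2" / "fleet-local" / tier).glob("*.service")):
        commands.append(["fleetctl", "start", str(unit)])

    # Step 6: generic units (hypothetical layout: v2/fleet) are submitted, then started.
    for unit in sorted((root / "v2" / "fleet").glob("*.service")):
        commands.append(["fleetctl", "submit", str(unit)])
        commands.append(["fleetctl", "start", unit.name])

    return commands
```

Returning argv lists rather than executing them keeps the ordering easy to inspect; a caller could pass each entry to subprocess.run once the plan looks right.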

Services

Global Services (run on ALL nodes in ALL tiers)

Monitoring

Util/Automated Maintenance

MISC

Control Tier Nodes:

Proxy Tier Nodes:

  • CAPCOM - private Container-Proxy Manager (stay tuned!)
  • Heatshield Proxy (our version of nginx) or HAProxy

Worker Tier Nodes:

  • Mesos Slave

Key/Secret Management & Configuration

All secrets & key management is a bit ad hoc. Most of the setup scripts, which house the logic for setting up the data that the fleet units then use, require a few things to download secrets & keys:

  • the $CONTROL_TIER_S3SECURE_BUCKET environment variable, written into /etc/environment by cloudformation
  • behance/docker-aws-s3-downloader container to download files
  • IAM roles to access $CONTROL_TIER_S3SECURE_BUCKET

Secrets make it onto the nodes in the form of flat text files that live within $CONTROL_TIER_S3SECURE_BUCKET. Each setup script knows which file(s) it needs to download and how to read, set, or use the data for its corresponding units. For example, the datadog unit requires an etcd key, /ddapikey. Knowing this, we have a datadog setup script which downloads a .datadog file from $CONTROL_TIER_S3SECURE_BUCKET, expects it to be in a certain format, and sets the etcd key.
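The datadog example above boils down to "read the file, validate it, set one etcd key". A minimal sketch of the validation half follows; it is not the repo's setup script. The function name is hypothetical, and the use of `etcdctl set` (the etcd v2 command) is an assumption about how the key gets written.

```python
def datadog_etcd_command(dotfile_text, etcd_key="/ddapikey"):
    """Build the etcdctl argv that stores the Datadog API key.

    The .datadog file is expected to hold just the key and nothing else,
    so anything empty or containing whitespace-separated tokens is rejected.
    """
    api_key = dotfile_text.strip()
    if not api_key or len(api_key.split()) != 1:
        raise ValueError(".datadog must contain the API key and nothing else")
    # Assumption: the key is written with the etcd v2 CLI.
    return ["etcdctl", "set", etcd_key, api_key]
```

Rejecting malformed files before touching etcd avoids storing a value that the datadog unit would later fail on.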

Files in S3

We are planning to deprecate the following in favor of other solutions (DynamoDB + KMS?).

Services, dotfiles, dotfile formats

| Service | File | Format |
|---|---|---|
| Datadog | .datadog | Just the key. Nothing else. |
| Sumologic | .sumologic | ID=YOURID |
| | | SECRET=YOURSECRET |
| Flight Director | .flight-director | /FD/GITHUB_CLIENT_ID (YOUR GITHUB APP ID) |
| | | /FD/GITHUB_CLIENT_SECRET (YOUR GITHUB APP SECRET) |
| | | /FD/GITHUB_ALLOWED_TEAMS org/team |
| HUD | .hud | /HUD/client-id (GITHUB_APP_ID can == value in .flight-director) |
| | | /HUD/client-secret (GITHUB_APP_SECRET can == value in .flight-director) |
| Marathon | .marathon | /marathon/username a-username |
| | | /marathon/password a-password |

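The formats listed above fall into two shapes: KEY=VALUE lines (.sumologic) and "etcd key, then value" lines (.flight-director, .hud, .marathon). The sketch below parses both. It is illustrative only: the function names are hypothetical, and it assumes the etcd key and its value are separated by a single space, which the listing suggests but does not state.

```python
def parse_sumologic(text):
    """Parse a .sumologic file and require both ID and SECRET."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" in line:
            key, _, value = line.partition("=")
            fields[key.strip()] = value.strip()
    missing = sorted({"ID", "SECRET"} - fields.keys())
    if missing:
        raise ValueError("missing fields in .sumologic: " + ", ".join(missing))
    return fields


def parse_etcd_dotfile(text):
    """Parse lines of the form '/etcd/key value' into a dict.

    Assumption: key and value are separated by the first space on the line.
    """
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        value = value.strip()
        if not key.startswith("/") or not value:
            raise ValueError("expected '/etcd/key value', got: %r" % line)
        pairs[key] = value
    return pairs
```

For a .marathon file in the format shown, parse_etcd_dotfile yields a mapping from /marathon/username and /marathon/password to their values, which a setup script could then write to etcd one key at a time.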
MISC
  • .dockercfg to download private containers
  • id_rsa to clone any private repositories

Nothing special needs to be done for these two, as long as the cloudformation templates set the following in /etc/environment:

$SECURE_FILES=.dockercfg:id_rsa,0600,.ssh/id_rsa

The format of this environment variable just needs to conform to the format expected by behance/docker-aws-s3-downloader.
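To make the example value above easier to read, here is one possible interpretation expressed as a parser. This is a guess inferred only from that single example, not from the behance/docker-aws-s3-downloader documentation: it assumes entries are separated by ":" and that each entry is "source[,mode[,destination]]". Check the downloader's own README before relying on it.

```python
def parse_secure_files(value):
    """Split a SECURE_FILES value into per-file entries.

    Assumed (unverified) grammar: entry[:entry...], where each entry is
    source[,mode[,destination]]. A missing destination defaults to the source name.
    """
    entries = []
    for item in value.split(":"):
        if not item:
            continue
        parts = item.split(",")
        if len(parts) > 3:
            raise ValueError("too many fields in entry: %r" % item)
        source = parts[0]
        mode = parts[1] if len(parts) > 1 and parts[1] else None
        dest = parts[2] if len(parts) > 2 and parts[2] else source
        entries.append({"source": source, "mode": mode, "dest": dest})
    return entries
```

Under this reading, the example describes two files: .dockercfg downloaded as-is, and id_rsa written to .ssh/id_rsa with mode 0600.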