Simple architecture for Docker Swarm and Traefik footprints

If you are just getting started with Docker and have moved beyond running simple standalone containers that live and die on one host with no orchestrator managing them, then Docker Swarm is a good next step into the world of container orchestration. It's simple to set up, stable, pretty powerful, and a natural next step for your skills after working with vanilla Docker.

With Docker Swarm you stop thinking about containers and start thinking about services instead. A service is made up of one or more replicas (container instances), and you can scale them up and down as you see fit. The other benefit Docker services bring you is that a service on a swarm can have a published port on the swarm routing mesh ingress network. That port is accessible via any swarm node in the cluster, and the mesh routes each request to a replica container for the service regardless of which node it is running on. This is quite powerful.

However, once you get set up and comfortable with these concepts, you will quickly realize the downsides of having to work with and manage “published ports” for your deployed swarm services. You are forced to either publish your known services on fixed published ports so your upstream load balancers can route traffic to the right backends for given FQDNs, OR write your own service discovery bridge to a layer 7 routing engine to manage this dynamically for you. The point is that the latter is a complex thing to wrangle. One fantastic tool out there is Traefik, an open source layer 7 proxy/router of sorts that works out of the box with many container orchestration systems, including Docker Swarm. It also works with Kubernetes, Rancher and other service discovery backends such as Consul. In any case, the general architecture principles discussed below are applicable when using Traefik regardless of the orchestrator your services, pods or otherwise are deployed on.

In short, one of the most powerful things (out of many) about Traefik is that it simply lets you --label your docker services to declare which FQDNs (i.e. Host headers) they will receive HTTP(S) traffic for. This is huge because as you deploy services you can specify the layer 7 criteria that determine how they get traffic (not just Host headers, but any header and/or path combination, amongst other things), all expressed with labels.

That said, this post is not about explaining Docker Swarm or Traefik, but rather about sharing a simple architecture and naming conventions for using these two technologies, which together enable a pattern of sorts for both a physical and a logical service topology you can use for any workload.

Let's assume you are now managing this and have that solved; well… how quick are you to respond to new releases of your backend services? Can you run multiple versions of a given app concurrently (i.e. old version and new version)? Can you hot swap traffic between service version X or Y? How easy is it to deploy a new application and have it immediately accessible via an FQDN?

The following is intended to aid with some of that.

Physical architecture overview

  • You have a Docker Swarm cluster made up of N hosts
  • The Swarm has two possible classifications of docker overlay networks on it:
    • external: for services that receive requests directly from outside sources
    • internal: for services that receive requests from internal sources
  • You can run both networks on one target swarm, or only one of them depending on your posture preference.
  • Each network has a single designated Traefik service that is published on a fixed (well known and reserved) swarm port. For this example, let's say the external Traefik port is 45900 and the internal Traefik port is 45800.
  • Each Traefik service is constrained to run only on swarm manager nodes (a requirement w/ Traefik at the time of this post)
  • This Traefik service proxies all inbound HTTP/S traffic to other application services on the shared internal or external network respectively.
  • Each fixed internal/external Traefik published port on the swarm receives its traffic from a corresponding designated load balancer device (hardware or software) that resides on the network. This load balancer (as shown in the diagrams) may actually have one or more other devices on either side of it for network security purposes etc. The point is that it represents another device chain in the network, residing outside of your swarm footprint, whose ultimate responsibility is routing HTTP(S) traffic to your internal or external Traefik services.
  • DNS for your deployed services points to an appropriate device that will eventually proxy traffic to the appropriate internal or external load balancer chain
  • Each of these DNS names is specified within a traefik.[servicename].frontend.rule=Host:[name1],[nameN].. label on each docker service
  • Each docker service to be proxied by Traefik also denotes its local http(s) port that Traefik should proxy to on the shared external/internal network via the docker service label traefik.port
  • Other Traefik routing labels may also be applied if needed (path, or header based)
  • The above results in all HTTP(S) traffic being routed through the Traefik service and on to the target swarm service as appropriate based on the label configuration.
  • Provisioning new swarm/traefik cluster footprints should be completely automated and easily reproducible. Want to upgrade your swarm? Provision and configure a new cluster with the latest software, migrate your services and throw away the old one.

[Diagram: physical architecture overview]
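
To make the labeling part of the list above a bit more concrete, here is a minimal Python sketch of how automation might build the two labels mentioned (the frontend rule and traefik.port) and turn them into --label arguments for docker service create. The function names, the example service name and the hostname are just illustrative assumptions; the label shapes are copied from the bullets above and would need adjusting for whatever Traefik version you run.

```python
def traefik_labels(service_name, hostnames, port):
    """Build the two labels described above for one docker service.

    The label shapes follow the list above; service_name, hostnames and
    port are whatever your own tooling supplies (hypothetical here).
    """
    if not hostnames:
        raise ValueError("at least one hostname is required")
    return {
        f"traefik.{service_name}.frontend.rule": "Host:" + ",".join(hostnames),
        "traefik.port": str(port),
    }


def label_args(labels):
    """Flatten labels into `--label key=value` arguments for `docker service create`."""
    args = []
    for key in sorted(labels):
        args += ["--label", f"{key}={labels[key]}"]
    return args


# Example values only: one service answering for one FQDN on its local port 443.
labels = traefik_labels("myapp-pre-prod-1-0-0", ["myapp.test.com"], 443)
print(" ".join(label_args(labels)))
```

Keeping the labels in a plain dict means the same data could be passed on the command line or written into a compose file, whichever your tooling prefers.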

Logical architecture overview

  • Your applications are deployed as docker “services” on a target swarm cluster
  • An application runs within the scope of a logical “context” (i.e. “dev”, “pre-prod”, “production”, or “qa” etc)
  • A “context” generally implies a set of corresponding configuration that is different in some way to any other “context” and many services deployed in the same “context” generally talk to one another in one larger cohesive unit. Think of “context” as a qualifier of sorts for service discovery and or configuration resolution.
  • An application can have an optional “classifier” to give it additional categorization
  • A binary Docker image (i.e. myapp:1.0.0) paired with a “context”, “version” and optional “classifier” yields a unique deployed docker service.
    • a deployed docker service has a naming convention [appname]-[context]-[version][-classifier] (i.e. myapp-pre-prod-1-0-0)
  • This generated “name” can be used to automatically provision appropriate convention based DNS names for each service deployed
  • The combination of a Docker image version, in scope of a “context” falls into one of three categories:
    • current: The current version of the application receiving live traffic bound to FQDNs representative of live traffic (i.e. myapp.test.com)
    • previous: The previous version of the application which receives traffic bound to unique FQDNs (i.e. myapp-pv.test.com)
    • next: The upcoming version of the application which receives traffic bound to special testing FQDNs (i.e. myapp-nv.test.com)
  • “Where” HTTP/S traffic goes for given standard FQDNs can easily be controlled by hot-swapping Traefik frontend rules via Docker service labels (i.e. traefik.[servicename].frontend.rule)

Which “version” a particular conduit of traffic flows to (current, previous, next) is determined by convention based names for services and DNS, and managed via docker service labels, either manually or via automated DevOps tooling.
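
The naming conventions above are easy to express in code. The following Python sketch is one possible reading of them: the dot-to-hyphen mapping for versions is inferred from the myapp-pre-prod-1-0-0 example, the -pv and -nv suffixes come from the example FQDNs, and the function names and the default test.com domain are assumptions made for illustration. Note that the example FQDNs above do not include the context, so neither does this sketch.

```python
# Suffix per traffic conduit, taken from the example FQDNs above.
CONDUIT_SUFFIX = {"current": "", "previous": "-pv", "next": "-nv"}


def service_name(app, context, version, classifier=None):
    """[appname]-[context]-[version][-classifier], with dots in the version hyphenated."""
    parts = [app, context, version.replace(".", "-")]
    if classifier:
        parts.append(classifier)
    return "-".join(parts)


def conduit_fqdn(app, conduit, domain="test.com"):
    """FQDN for a conduit, e.g. myapp-nv.test.com for the 'next' version."""
    return f"{app}{CONDUIT_SUFFIX[conduit]}.{domain}"


print(service_name("myapp", "pre-prod", "1.0.0"))  # myapp-pre-prod-1-0-0
for conduit in ("current", "previous", "next"):
    print(conduit, conduit_fqdn("myapp", conduit))
```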

Deploying services

Everything previously discussed is simply the model, pattern, architecture… whatever you’d like to call it. Deploying and configuring services onto such a footprint can be done entirely with the native Docker tooling: compose files, stacks etc and “docker service” commands. However, you probably don’t want to do that manually. Secondly, the Docker compose format, while great for defining services and everything required to launch them, does not really provide any way to define the higher level logical architecture previously described: services, their contexts and the current/next/previous traffic conduits for dropping services into.

In the next section I give some configuration examples (YAML) that let you describe swarms and services and their desired state at a higher level. To be clear, this is NOT a docker compose compatible format, but rather simply a proposed way you could describe things… and subsequently build automation on top of it for deploying services and hot-swapping which versions are current, previous and/or next.

The “how” of how you deploy and manage your services based on this swarm traefik footprint architecture and service naming conventions is up to you. You can build automation around this using any programming language of your choice or DevOps tooling such as Salt, Python, Ansible, Go, Bash… whatever… use whatever works for you!

Defining “swarms”

The following is a sample YAML config file format you could use to define a target “swarm” footprint instance as described above. Using this information along with “service state” files (next section) you could automate a lot of things around management and deployment of services into the previously described architecture.

[swarm-name].yml files

swarm_mgr_uri: "http://myswarm1.test.com:[port]"

# The dns names of your upstream load balancers
# that manage ingress access to the swarm cluster nodes
swarm_lb_endpoint_internal: "myswarm1-intlb.test.com"
swarm_lb_endpoint_external: "myswarm1-extlb.test.com"

# The published Traefik ports on the target swarm
# for external/internal networks respectively
traefik_swarm_port_internal_https: 45800
traefik_swarm_port_external_https: 45900

# List of logical supported "contexts" this swarm would
# accept workloads for (as defined in service state files below)
contexts:
  - "dev"
  - "qa"
  - "production"
  - "pre-prod"

# This could be better described via other discovery
# means or inventory apis describing all hosts in
# the target swarm cluster. {id} here could be used
# for iteration by a program assuming numerically described
# docker swarm hosts
swarm_host_info:
  template: "myswarm1-node{id}.test.com"
  total_nodes: 5
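
As a rough idea of how a program might consume a file like this, here is a small Python sketch that expands the {id} template into hostnames and derives the node:port targets an upstream load balancer would forward to. The config is written as a plain dict mirroring the YAML above (so no YAML library is needed to run it), and the function names and the assumption that node ids start at 1 are mine, not part of the format.

```python
# A dict mirroring the relevant keys of the [swarm-name].yml example above.
swarm = {
    "traefik_swarm_port_internal_https": 45800,
    "traefik_swarm_port_external_https": 45900,
    "swarm_host_info": {"template": "myswarm1-node{id}.test.com", "total_nodes": 5},
}


def swarm_hosts(cfg, first_id=1):
    """Expand the {id} template; numbering from 1 is an assumption."""
    info = cfg["swarm_host_info"]
    last_id = first_id + info["total_nodes"]
    return [info["template"].format(id=i) for i in range(first_id, last_id)]


def lb_backends(cfg, network):
    """node:port targets for the 'internal' or 'external' load balancer chain."""
    port = cfg[f"traefik_swarm_port_{network}_https"]
    return [f"{host}:{port}" for host in swarm_hosts(cfg)]


for backend in lb_backends(swarm, "external"):
    print(backend)
```

Because the published Traefik port is reachable through any swarm node via the routing mesh, the sketch lists every node as a backend rather than only the managers.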

Defining “service state”

How do you define services at a higher level that resides “above” your compose files for actually deploying each service? You need a way to convey what a service is, its naming convention components, health checks and other things like vanity names for fixed DNS names (non auto generated DNS names) that are relevant to it. You also need to be able to describe the desired state of which versions of the image behind the service should be deployed to the different “conduits” within each “context” as described previously (current, previous, next).

The following is an example higher level YAML config file format you can use to define a “service” and its desired state when deployed to a swarm footprint. Using this information combined with “swarm info” files (previous section) as well as standard docker compose files for each service you could automate a lot of things around management and deployment of services into the previously described architecture.

What services get deployed to which “swarm” clusters can be handled by mappings in other tooling configuration you manage. It is not defined here; rather what is defined is what “contexts” are desired targets for this service. Given the context names, you could find compatible swarms, by looking at what “contexts” are listed in your inventory of [swarm-name].yml files described previously.
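
The context matching described above could be as simple as the following Python sketch. The swarm inventory is shown as a dict of already-loaded [swarm-name].yml contents; myswarm2 and the function name are hypothetical and only there to show a swarm that accepts fewer contexts.

```python
def compatible_swarms(service_contexts, swarms):
    """Map each desired context to the names of swarms that list it."""
    return {
        ctx: sorted(
            name for name, cfg in swarms.items() if ctx in cfg.get("contexts", [])
        )
        for ctx in service_contexts
    }


# Inventory keyed by swarm name; myswarm2 is a made-up second swarm.
swarms = {
    "myswarm1": {"contexts": ["dev", "qa", "production", "pre-prod"]},
    "myswarm2": {"contexts": ["dev", "qa"]},
}

matches = compatible_swarms(["dev", "production"], swarms)
for ctx, names in matches.items():
    print(ctx, "->", ", ".join(names) or "no compatible swarm")
```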

[service-state].yml files


# the formal name that all generated service names/dns
# will be prefixed with.
# This generally equals the Docker image name (without tag)
formal_name: "myapp"

# an application type to qualify consuming
# deployment automation behavior or just classify
# what kind of service this app is
app_type: "ruby-sinatra | whatever!"

# Classifiers are additional descriptors for 
# classifying some feature or function this service provides
# classifiers are related to service_ports
classifiers:
  mode-a:
    desc: "operation mode A"
  mode-b:
    desc: "operation mode B"

# Aliases are other "prefixes" similar to "formal_name"
# that would be used to generate additional FQDNs (DNS names)
# for accessing the service and for subsequent labeling
aliases:
  - "myalias1"
  - "myalias2"

# The ports that the service containers EXPOSE
# as defined in the Dockerfile, and subsequently
# would be mapped to a published swarm port
service_ports:
  443:
    name: "https access port"
    desc: "description"
    protocol: "https"
    classifiers:
      - "mode-a"
  8943:
    name: "http monitor access port"
    desc: "description"
    protocol: "http"
    classifiers:
      - "mode-b"

# Service checks are where you could define
# bindings to service_ports and health checks
# that can be used to validate the service via
# that port.
service_checks:
  - ports: [443]
    path: "/health"
    layers: [0,1,2,3]
    headers:
      - "test2: yes"
    method: "GET"
    timeout: 10
    retries: 3
    tags: ["foo","health"]
  

# Contexts are where you define the desired contexts
# and then what image versions for the formal_name
# should be assigned to the appropriate conduits
# for traffic (next, previous, current)
contexts:
  dev:
    versions:
      current: "1.5.0"
      previous: "1.0.0"
      next: "2.0.0"
  production:
    versions:
      current: "1.0.0"
      previous: "0.9.0"
      next: "1.5.0"
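
To show how the “contexts” section might drive automation, here is a Python sketch that expands it into the list of docker services that should exist, and a helper that rotates the conduits when a new version goes live. The dict mirrors the YAML above; the function names are hypothetical, and the rotation rule (next becomes current, the old current becomes previous, next is left empty) is my assumption about what a hot swap would do, so adapt it to your own release process.

```python
# A dict mirroring the relevant parts of the [service-state].yml example above.
service_state = {
    "formal_name": "myapp",
    "contexts": {
        "dev": {"versions": {"current": "1.5.0", "previous": "1.0.0", "next": "2.0.0"}},
        "production": {"versions": {"current": "1.0.0", "previous": "0.9.0", "next": "1.5.0"}},
    },
}


def desired_services(state):
    """One entry per context/conduit pair, named [appname]-[context]-[version]."""
    app = state["formal_name"]
    plan = []
    for context, cfg in state["contexts"].items():
        for conduit, version in cfg["versions"].items():
            plan.append({
                "context": context,
                "conduit": conduit,
                "image": f"{app}:{version}",
                "service": f"{app}-{context}-{version.replace('.', '-')}",
            })
    return plan


def promote_next(versions):
    """Assumed hot-swap rule: next -> current, old current -> previous."""
    if not versions.get("next"):
        raise ValueError("no next version to promote")
    return {"current": versions["next"], "previous": versions["current"], "next": None}


for entry in desired_services(service_state):
    print(entry["context"], entry["conduit"], entry["service"], entry["image"])
print(promote_next(service_state["contexts"]["production"]["versions"]))
```

After a promotion, your tooling would rewrite the Traefik frontend rule labels on the affected services so the live FQDNs follow the new “current” version.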

Summary

Hopefully this will be of use to others out there who are getting started with Docker Swarm and Traefik and are looking for a way to manage services and roll out new versions of those apps in a clean and repeatable manner. I’ve seen this architecture work quite well under production usage, but do note that it does require custom development to consume the config files for deploying and hot-swapping services into target swarms configured under such a footprint.

Where to next

Where to go from here? Given the major announcements coming out of Docker Inc. around ending Docker Cloud (swarm) and introducing Kubernetes support, Swarm will likely still be maintained long term given its large user base in the community, but new features will probably not be heavily developed from here on out. It is clear Kubernetes has won the orchestrator wars, and the next step will likely be migrating to that platform. It's certainly a more complex beast, but it has many amazing capabilities and a lot of momentum behind it. Onward and upward!
