    Docker Swarms, what are they?

    Ian Stuart / October 26, 2017
    Before the Swarm - The Good Old Days

    In the "Good Old Days", computers were physical things... real tin... and you had either the Personal Computer or the Mainframe.

    PCs were "small": they had 1 CPU (well, modern ones have more), limited RAM (16GB is a lot), and relatively small storage systems (a couple of TB if you're lucky). PCs were designed to be used by 1 person at a time. PCs sat on your desk. PCs became portable, then ultra-portable, and even evolved into hand-helds (take another look at the specs for your smart-phone!)

    Mainframes, on the other hand, were designed to be a centralised resource, and used by multiple people... simultaneously. Mainframes have a large number of CPUs (128 or more), masses of RAM (128GB), and vast tracts of storage (multiple terabytes, after RAIDing). Mainframes have redundant Power Supplies and hot-swap everything. Mainframes just run - forever!

    The upside of the mainframe is that there is a huge pool of resources available; the downside is that someone else may be using them. Oh, and that one person's bad code can wipe out the entire machine.

    The upside of the PC is that all the resources are yours; the downside is that those resources are limited. Oh, and that each machine can be different.

    Before the Swarm - Trading the Real for the Virtual

    One of the techniques used to merge the huge pool of resources from the mainframe with the isolation of the PC was to create "Virtual" computers within one physical machine. There are a number of techniques for this:

    • Solaris Containers (cap the RAM & localised libraries within a common OS)
    • VMs (allocate resources to make a virtual PC)
      Generally those resources are specific to that VM, however the VM is essentially a complete PC in all other respects
    • Docker Containers (a cross between VMs and Solaris Containers: almost a complete system, but require a computer to run on.)

    Before the Swarm - The Cloud

    The Cloud can be considered a Virtual Hosting system - it's a lot of smaller bits of tin joined together to pretend to be one big system, designed to then provide resources for Virtual systems to run.

    Before the Swarm - Docker Containers

    I assume you kinda know about Docker Containers - jumping to Swarms without that knowledge is going to be hard work!

    The main point to remember is that Docker Containers run on machines - be they real or virtual.

    Docker Containers require a Docker Server to run.

    Remember this:

    FROM python:3.4-alpine
    ADD . /code
    WORKDIR /code
    RUN pip install -r requirements.txt
    CMD ["python", "app.py"]

    .... followed by cd player_engine; docker build --tag morg/player_engine . ; docker run --rm -v /path/to/data:/path/in/container -p 12345:8000 morg/player_engine (and repeat for each Container, if it is a multi-Container application)

    Before the Swarm - Docker Compose

    In many cases, a complete application was made up from multiple Docker Containers, so the idea of docker-compose was born.

    This uses a single compose file to describe all the Docker Containers needed for the application to run, and how to tie them together.

    A Compose file lists all the Containers in the app, and how to build them:

    version: '2'
    services:
      db:
        container_name: postgres
        image: postgres:9.6.3
        environment:
          - POSTGRES_PASSWORD=my_password
          - POSTGRES_USER=my_username
        volumes:
          - /path/to/postgres/data:/var/lib/postgresql/data/
      app:
        build: ./player_engine
        depends_on:
          - db
      app2:
        build: ./world_engine
        depends_on:
          - db
      ui:
        image: nginx
        ports:
          - "80:80"
        volumes:
          - /path/to/web/content:/usr/share/nginx/html
        read_only: true

    Note that there are different versions of the compose file format: version 2 is generally for single-host installations, and version 3 is for Swarms.

    We only need to expose ports if the service is going to be used by the outside world (so the ui has a port, but the rest don't). Connections between services go across the internal network, using the service name as the hostname (so the database connection would be dbi:Pg:host=db, and the http connections from ui to app2 would be http://app2/).

    We can then manage applications thus:

    • docker-compose up -d (start the app, and detach from the console)
    • docker-compose ps (see what's running)
      Note that the Containers will be called dir_app (eg morg_db; morg_app; morg_app2; morg_ui)
    • docker-compose stop (shut everything down)

    Remember - these are all still running on the one host.
    (ref: https://docs.docker.com/compose/compose-file/compose-file-v2/)

    What is a Docker Swarm?

    A Docker Swarm is essentially a Docker-Compose'd application running over multiple machines. Each machine is called a node.

    A Docker Swarm requires a manager node and probably some worker nodes - and creates a space for Containers to run which is [mostly] independent of the underlying machines.

    A Docker Swarm is designed to run several containers (probably more than once), and provide a form of redundancy by having Containers migrate from one host to another within the swarm. This is sometimes referred to as a High Availability Cluster.

    To avoid confusion: what was a Docker Image became a [Docker] App, and is referred to here as a Service. Each running Container of a Service is called a Task.

    docker swarm init will create a new swarm on the current host/node

    (ref: https://docs.docker.com/engine/reference/commandline/swarm/ )

    Docker Swarm - Nodes

    When a brand new swarm is created, the initial node is a manager node. This creation process tells you how to join other nodes into the swarm.

    Creating the new node is not part of Docker, and thus not explained here.... however, we assume the nodes exist, and that you can ssh into them.

    To add a node to a swarm:

    1. ssh into an existing manager node, and type docker swarm join-token worker (or manager, to specifically add a manager node)
    2. ssh into the node to add, and cut'n'paste the output of the above command into the console

    To leave a swarm

    1. ssh into the node that's leaving, and type docker swarm leave.

    Working with nodes in the swarm

    You really want between 3 & 7 manager nodes (ref: https://docs.docker.com/engine/swarm/how-swarm-mode-works/nodes/#manager-nodes).
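
    The reason for suggesting an odd number of managers is that the swarm needs a majority (a quorum) of its managers to agree before it changes anything. The following is a minimal sketch of that arithmetic, assuming the simple majority rule; the function names are made up for illustration and are not part of Docker:

```python
def quorum(managers: int) -> int:
    """Smallest number of managers that still forms a majority."""
    return managers // 2 + 1


def tolerated_failures(managers: int) -> int:
    """How many managers can be lost while a majority remains."""
    return managers - quorum(managers)


if __name__ == "__main__":
    for count in (1, 3, 4, 5, 7):
        print(f"{count} managers: quorum {quorum(count)}, "
              f"can lose {tolerated_failures(count)}")
```

    Note that, under this rule, 4 managers tolerate no more failures than 3 do, which is why odd numbers are usually suggested.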

    • docker node ls to see what nodes are registered with the swarm
    • docker node inspect NAME to get details about a named node
    • docker node ps NAME will tell you what's running on that node

    Nodes can also have labels (which is useful when assigning containers to nodes): docker node update --label-add type=world somenode-a will add a label to the node somenode-a.

    Nodes can also be promoted to, or demoted from, manager status: docker node promote / docker node demote

    Docker Swarm - Networks

    For Containers to talk to each other across a swarm, they need to be part of an internal network.

    Docker networks have a SCOPE and a DRIVER

    • SCOPE is what can see them - generally local or swarm
    • DRIVER is
      • null - no network,
      • bridge - a gateway between node & container,
      • host - actually talking to the outside world directly via the hosts' network stack, or
      • overlay - only available to services in the swarm, but extended to any node that happens to have an instance of that service running on it.

    Any Docker host (Swarm or otherwise) gets three networks automatically: bridge, none, and host - and unless you specify otherwise, every Container will connect to the host's bridge network.

    At creation, every Docker Swarm will get an overlay network called ingress - this is how the containers on the various hosts talk to each other.

    NOTE: When an overlay network is created in a Swarm, it spans multiple nodes, and ties together those Services which are connected to it (and only those Services)

    From a manager node:

    • docker network create -d overlay world will create an overlay network called world
    • docker network ls to tell you what networks are defined,
    • docker network inspect NAME will tell you about it.

    Docker Swarm - Routing & Load Balancing

    The ingress overlay network performs one very important task: it enables any node in the network to respond to a request on a published port, and pass that request onto an active task - whether the task is on that node or not.

    For more details on routing & load-balancing, https://docs.docker.com/engine/swarm/ingress/ is a good starting reference.
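
    To make that routing behaviour concrete, here is a toy model in Python. It is only a sketch: the class name is hypothetical, and handing requests to tasks in strict rotation is an assumption made for illustration, not a description of how Docker's load-balancer is implemented.

```python
from itertools import cycle


class ToyRoutingMesh:
    """Toy model: every node accepts requests, tasks live on only some nodes."""

    def __init__(self, nodes, tasks_by_node):
        self.nodes = set(nodes)
        # Flatten to (node, task) pairs and hand them out in rotation.
        active = [(node, task)
                  for node, tasks in sorted(tasks_by_node.items())
                  for task in tasks]
        if not active:
            raise ValueError("no active tasks to route to")
        self._next_task = cycle(active)

    def handle(self, entry_node):
        """Accept a request on any swarm node and pass it to an active task."""
        if entry_node not in self.nodes:
            raise KeyError(f"{entry_node} is not in the swarm")
        serving_node, task = next(self._next_task)
        return entry_node, serving_node, task
```

    With tasks only on worker-1 and worker-2, a request arriving at worker-0 is still answered: handle("worker-0") returns the entry node together with the node and task that actually served the request.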

    Docker Swarm - Stacks

    Docker Stacks work almost the same as Docker Compose, except we can also define some deploy details (such as networks to use, and the number of replicas to run.)

    Docker Stacks are an overarching term to encompass the full set of Services... a single Stack will usually hide multiple Services, each of which can be running multiple replicas of a particular Container.

    • docker stack ls tells you what Stacks are running
    • docker stack services NAME tells you what Services are running in the stack, what mode they are in, and how many replicas there are.
    • docker stack ps NAME tells you which Tasks are running, and which nodes they're [currently] on.
      (Note that docker stack ps NAME and docker service ps NAME are subtly different things)

    Creating services on specific nodes

    When a Stack is deployed, then the nodes the Tasks are on can be specified.

    • From the command line:
      docker service create --name my_thing --constraint 'node.labels.type==world' container
    • Defined within a compose file
      docker stack deploy --compose-file my_world.yml world will start a complete stack, as defined by the my_world.yml compose file.
      my_world.yml (trimmed for brevity):
          version: '3'
          services:
            postgres:
              image: postgres
              deploy:
                placement:
                  constraints:
                    - node.role == worker
                    - node.labels.type == world
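
    Placement constraints are ANDed together: a node must satisfy every one of them before a Task can be put on it. The sketch below models that check in Python. The node is represented as a plain dictionary keyed by attribute name, which is an assumption made purely for illustration (Docker does not expose nodes this way), and only the == and != operators are handled.

```python
import re

# One constraint looks like "node.labels.type == world" or "node.role != manager".
CONSTRAINT = re.compile(r"^\s*([\w.]+)\s*(==|!=)\s*(\S+)\s*$")


def node_matches(node: dict, constraints: list) -> bool:
    """Return True if the node satisfies every constraint in the list."""
    for text in constraints:
        match = CONSTRAINT.match(text)
        if match is None:
            raise ValueError(f"cannot parse constraint: {text!r}")
        key, operator, wanted = match.groups()
        actual = node.get(key)  # a missing attribute never equals anything
        if (actual == wanted) != (operator == "=="):
            return False
    return True
```

    For example, a node described as {"node.role": "worker", "node.labels.type": "world"} passes both constraints in the compose file above, whereas a worker without the type label does not.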

    Creating services on specific networks

    When a Stack is deployed, then the networks for the Tasks can be specified.

    • From the command line:
      docker service create --name my_thing --network world container
    • Within a compose file
      docker stack deploy --compose-file my_world.yml world will start a complete stack, as defined by the my_world.yml compose file
      my_world.yml (trimmed for brevity):
          version: '3'
          services:
            postgres:
              image: postgres
              networks:
                - world

    Note: when deploying a stack, services are named stack_app, rather than the dir_app of Docker Compose.

    Tip: We have found that starting up a new Task on a node can take some time if that Image needs to be pulled from a Registry, so we tend to pre-pull all Images onto our nodes.

    Docker Swarm - Scaling

    One of the main advantages of Swarms is that you can Scale according to need. Scaling covers two things:

    1. The number of active nodes in the swarm, and
    2. The number of replicas of a Task.

    Whilst Docker itself does not particularly control the former, it absolutely manages the second.

    Scaling - Node availability

    • Nodes have a Status, which can be Ready or Down (eg powered up, or switched off)
    • Nodes have an Availability, which can be Active, Pause, or Drain
      • Active means Tasks can be assigned to that Node
      • Pause means that no new Tasks can be assigned to that node, however existing Tasks will continue to run
      • Drain means that Tasks will be actively moved to another available Node
        (Note: Tasks that move Node will lose access to any data they have stored on that node - see Volumes below)
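
    The combination of Status and Availability can be summarised in a couple of small predicates. This is only a sketch of the rules in the list above, using the values as docker node ls prints them (Ready or Down; Active, Pause, or Drain); the function names are hypothetical.

```python
def can_take_new_tasks(status: str, availability: str) -> bool:
    """A node is a candidate for new Tasks only when it is up and Active."""
    return status == "Ready" and availability == "Active"


def keeps_existing_tasks(status: str, availability: str) -> bool:
    """Existing Tasks stay put unless the node is down or being drained."""
    return status == "Ready" and availability in ("Active", "Pause")


if __name__ == "__main__":
    for status, availability in [("Ready", "Active"), ("Ready", "Pause"),
                                 ("Ready", "Drain"), ("Down", "Active")]:
        print(status, availability,
              can_take_new_tasks(status, availability),
              keeps_existing_tasks(status, availability))
```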

    Changing Node Status depends on the technology provider, so is not something you can do with Docker commands.

    Node Availability is settable even when the node is Down: docker node update --availability active worker-2 will set the Availability of the node named worker-2 to Active.... even if worker-2 is powered-off.

    When updating a node, the common method is to Drain the node first (to push all Tasks off the node), then update it.

    Scaling - Service Replication

    The great advantage of a Swarm is that any particular Task can be replicated a number of times. There are two variations on replication, which are defined at create or deploy time:

    • replicated - This is the default mode: you say how many tasks to run, and the swarm distributes that number of tasks across the available nodes (using the ingress overlay network to manage communications)
    • global - In this mode, a copy of the task is run on every node in the swarm.
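
    The difference between the two modes can be sketched in a few lines of Python. This is a toy model under stated assumptions: tasks are dealt out to nodes in simple rotation, and the task names are illustrative; Docker's real scheduler weighs more than this.

```python
def place_replicated(service: str, replicas: int, nodes: list) -> dict:
    """Spread a fixed number of tasks over the nodes, one at a time in turn."""
    if not nodes:
        raise ValueError("no nodes available")
    placement = {node: [] for node in nodes}
    for slot in range(1, replicas + 1):
        node = nodes[(slot - 1) % len(nodes)]
        placement[node].append(f"{service}.{slot}")
    return placement


def place_global(service: str, nodes: list) -> dict:
    """Run exactly one task on every node, however many nodes there are."""
    return {node: [f"{service}.{node}"] for node in nodes}
```

    In replicated mode the number of tasks is fixed and the nodes share them out; in global mode the number of tasks follows the number of nodes.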

    When scaling a Service, we define the number of tasks to run, not an adjustment: docker service scale frontend=50

    The following two commands are functionally equivalent:

    • docker service scale frontend=50
    • docker service update --replicas=50 frontend

    Using the scale command, we can scale multiple services simultaneously: docker service scale backend=30 frontend=50.
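
    The point that a scale value is a target, not an adjustment, can be shown with a small Python model. The dictionary of replica counts and the function name are hypothetical; the sketch only mimics the service=count argument form used above.

```python
def apply_scale(current: dict, *targets: str) -> dict:
    """Apply 'service=count' arguments; each count replaces the old value."""
    updated = dict(current)
    for target in targets:
        name, _, count = target.partition("=")
        if name not in updated:
            raise KeyError(f"unknown service: {name}")
        updated[name] = int(count)
    return updated
```

    Running the same scale request twice leaves the counts unchanged, which would not be true if the numbers were adjustments.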

    Docker Swarm - Volumes

    Volumes are an issue in swarms.

    With non-swarm Docker, Volumes and Mounts are bits of host storage connected into the container to provide persistent storage.

    When we switch to Swarm Mode, the Task can be started on any available Node, and could migrate to another Node - and a traditional mount can't cope with that.

    This document cannot go into all the permutations & ramifications when it comes to persistent data storage & off-container Mounts - however the output of docker service inspect NAME will include a section on mounts (if there are any) - eg:

    "Mounts": [
        {
            "Type": "bind",
            "Source": "/disk/remote/users",
            "Target": "/users"
        }
    ]

    For more details on Volumes, see https://docs.docker.com/engine/admin/volumes/volumes/

    Docker Swarm - an example

    Let us make up an example.... let us create a MORG (Multiplayer Online Roleplaying Game). The architecture has five components:

    • A World Engine (the bit that handles all the animals, scenery, and background world timeline)
    • A Player Engine (the bit that the player interacts with). Each logged-in player has their own engine.
    • A Database (obviously)
    • The Nexus (which spawns new Player Engines, and proxies the Player Engines to the outside world)
    • The Web Server (the bit with the documentation, handles the accounting, the logins, and generally sells the business)

    We know that the Web Server (the UI) and the Nexus need to talk to the outside world, so whatever nodes those services run on need to be Bridged to the Host, and have access to a public IP. We also want to isolate the Player Engines from the rest of our system (just in case someone hacks in), so we separate them onto two Overlay networks:

    [Diagram: the MORG services split across two Overlay networks]

    Docker Swarm - Trouble-shooting

    Like any system, an application on a swarm can go wrong.... but what can you do about it? How do you find out what's going on?

    As mentioned above, docker node ls will tell you what nodes are in the swarm.

    Let me tweak our MORG slightly: let's have another node, and run a second UI on it.... this would mean our MORG could look like this:

    user@manager-0:~$ docker node ls
    ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS
    cexrk3c52oak9hf85s5zen1pc     worker-3            Ready               Active              
    j7s90bc9wbc9ff59ys8mflu9d     db_worker-0         Ready               Active              
    kqmvimx60pcugt04pb8gnei5g     manager-1           Ready               Active              Reachable
    mlaqauzfetbpgvlk0fyjt8imd     worker-0            Ready               Pause              
    mtbqnf9tz1tp1jnkamr0qe8jt     worker-1            Ready               Active              
    nbnkmmz5jk1q7782vqbw171lt *   manager-0           Ready               Active              Leader
    ss2g23eb2dkp2vrequ92oyrtn     app_worker-1        Ready               Active              
    tggvrsw6uswyj7xfkqwc0mqfb     worker-2            Down                Active              
    tteeehm7yangfdwx1uw6fwuwi     manager-2           Ready               Active              Reachable
    y0ym2p4sbe21d8d892unonbt9     app_worker-0        Ready               Active              

    We have 10 nodes in the swarm: 3 manager nodes and 7 worker nodes. We've named one as a Database worker and a pair as Application workers, and the remaining 4 are general workers. We also have worker-0 Paused, and worker-2 Down.
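
    That kind of summary can also be produced by a script. The sketch below assumes the node list has been captured with docker node ls --format '{{.Hostname}} {{.Status}} {{.Availability}} {{.ManagerStatus}}', so each line holds three or four space-separated fields; the sample lines are copied from the listing above, and the function name is made up for illustration.

```python
SAMPLE = [
    "worker-3 Ready Active",
    "db_worker-0 Ready Active",
    "manager-1 Ready Active Reachable",
    "worker-0 Ready Pause",
    "worker-1 Ready Active",
    "manager-0 Ready Active Leader",
    "app_worker-1 Ready Active",
    "worker-2 Down Active",
    "manager-2 Ready Active Reachable",
    "app_worker-0 Ready Active",
]


def summarise_nodes(lines):
    """Count managers and workers, and list nodes that cannot take new Tasks."""
    managers, workers, unavailable = 0, 0, []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        hostname, status, availability = fields[:3]
        if len(fields) > 3:  # only managers have a MANAGER STATUS value
            managers += 1
        else:
            workers += 1
        if status != "Ready" or availability != "Active":
            unavailable.append(hostname)
    return managers, workers, unavailable


if __name__ == "__main__":
    print(summarise_nodes(SAMPLE))
```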

    We can also look at what's running:

    user@manager-0:~$ docker service ls
    ID                  NAME                       MODE         REPLICAS    IMAGE                              PORTS
    0m41yy2cebyd        player_person_engine_aaa   replicated   1/1         example.com:5000/morg/player   
    dfwnvkaljlg0        player_person_engine_bbb   replicated   1/1         example.com:5000/morg/player   
    g4f5fiuk4yfn        world_postgres             replicated   1/1         postgres:9.6                                                     
    nje711xbhqk6        world_ui                   replicated   2/2         example.com:5000/morg/web:latest   *:80->80/tcp,*:443->443/tcp
    o1b028d4hevw        player_person_engine_ccc   replicated   1/1         example.com:5000/morg/player   
    ovd9kz8nft92        player_person_engine_ddd   replicated   1/1         example.com:5000/morg/player   
    p3puex972ii5        player_person_engine_aab   replicated   1/1         example.com:5000/morg/player   
    wch91afkv61h        player_nexus               replicated   1/1         example.com:5000/morg/hub:latest   *:8000->8000/tcp,*:8001->8001/tcp,*:8081->8081/tcp
    xozeg7tffe82        world_world_engine         replicated   1/1         example.com:5000/morg/world

    (note we are showing 2 replicas for the ui service in the world stack, even though the diagram doesn't)

    We can see that the Nexus is exposing ports 8000, 8001, and 8081, and the UI is exposing ports 80 & 443.

    We can see that there are two stacks running: one called world and one called player - and we can deduce that the stacks seem to follow the networks.

    So where are the services actually running?

    Partly, this doesn't matter - that's the point of the swarm: the manager nodes load-balance Containers across the worker nodes... but if you really want to find out:

    user@manager-0:~$ docker service ps world_ui
    ID                  NAME         IMAGE                         NODE                DESIRED STATE       CURRENT STATE        ERROR               PORTS
    l7dycihy62sz        world_ui.1   example.com:5000/web:latest   app_worker-1        Running             Running 2 days ago                       
    oh9hfpmo4rwd        world_ui.2   example.com:5000/web:latest   app_worker-0        Running             Running 2 days ago                       

    Let's say we want to change the number of replicas for the ui: docker service scale world_ui=n. If you scale to 0, you effectively shut it down; if you ramp it up, it gets spread across the worker nodes.

    Need to see logs?

    Logs are a challenge under Docker - mostly logs are streamed into the aether, and not kept. That stream, however, can be tapped into.

    1. To see the log stream for a service (ie, all the containers): docker service logs world_ui
    2. To see the log of a specific container: docker logs 71fd8bacae58
      (Note that this needs to be on the specific node running the container, and that 71fd8bacae58 is the container-id on that node!)

    How many services of a particular name are running, on which nodes?

    This is a common question, as it's a good indicator of Load (question: how many of your Containers can you run on your nodes, before things start to slow down/break down?)

    ubuntu@manager-0:~$ docker node ps $(docker node ls -q) --filter name=player --filter "desired-state=running" --format '{{.ID}} {{.Node}}' | sort | uniq | cut -d" " -f2 | sort | uniq -c
          3 worker-0
          2 worker-1
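
    For anyone who would rather do the counting in a script than in a shell pipeline, the same tally can be sketched in Python. It assumes the input is the "ID NODE" lines produced by the --format option in the command above; the task IDs in the sample are invented placeholders, and the duplicate line stands in for the repeats that the sort | uniq step removes.

```python
from collections import Counter

# Hypothetical "ID NODE" lines, including one duplicate.
SAMPLE_TASKS = [
    "aaa1 worker-0",
    "aaa2 worker-0",
    "bbb1 worker-1",
    "aaa3 worker-0",
    "bbb2 worker-1",
    "aaa1 worker-0",
]


def tasks_per_node(lines):
    """Count unique task IDs per node from 'ID NODE' lines."""
    unique = set()
    for line in lines:
        parts = line.split()
        if len(parts) == 2:  # ignore blank or malformed lines
            unique.add((parts[0], parts[1]))
    return Counter(node for _task_id, node in unique)


if __name__ == "__main__":
    for node, count in sorted(tasks_per_node(SAMPLE_TASKS).items()):
        print(f"{count:7d} {node}")
```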