
    Impact of a Rancher upgrade

    Ian Stuart / September 5, 2018
  • Rancher (https://rancher.com) is a Kubernetes cluster management tool.

    When deployed as a single-node install, upgrades have consequences - in the case of Rancher, the cluster goes off-line for a few seconds.

    The Proof is in the pudding

    Preparation

    Acquire 6 machines (in my case, I've got 6 VMs in a local cloud-provisioning service, so I'm calling them nodes).

    Ensure all have docker-ce installed, and whatever file-editing tools you prefer.
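
    If any of the nodes are missing docker-ce, Docker's convenience script is one quick way to get it - this is an assumption about your setup, so adjust for your distribution (the Rancher node will also need docker-compose for the file below):

    $> curl -fsSL https://get.docker.com | sh
    $> sudo usermod -aG docker $USER    # optional: run docker without sudo (re-login required)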

    Set up Rancher

    Log into the node chosen as the Rancher server, and create a docker-compose.yml file:

    version: "3.3"
      services:
        rancher:
          image: rancher/rancher:v2.0.6
          container_name: rancher
          volumes:
            - rancher_data_local:/var/lib/rancher
          ports:
            - 80:80
            - 443:443
          restart: unless-stopped
      volumes:
        rancher_data_local: {}
    

    Start rancher:

    $> docker-compose up -d --no-deps rancher
    
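    Rancher can take a minute or so to come up; a couple of ways to check on it (both are standard docker/docker-compose commands, nothing Rancher-specific):

    $> docker-compose ps rancher    # the container should show as "Up"
    $> docker logs -f rancher       # follow Rancher's startup logs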

    Set up a Kubernetes cluster

    Log into the Rancher web UI and create a test cluster: 3 control-plane/etcd nodes & 2 worker nodes.

    Set up kubectl to use this cluster (there are several ways: I happen to have multiple clusters defined in ~/.kube/config; others use the KUBECONFIG environment variable).
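
    For example, with the KUBECONFIG route, you could save the kubeconfig that Rancher generates for the test cluster to its own file and point kubectl at it (the file path here is just an illustration):

    $> export KUBECONFIG=~/.kube/rancher-upgrade-test.yml
    $> kubectl config get-contexts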

    Confirm you're looking at the right cluster:

    $> kubectl config current-context
    rancher-upgrade-test
    

    Create a Kubernetes deployment

    We will test our cluster using the busybox (https://hub.docker.com/_/busybox/) docker image, running a loop that just echoes the time of day every second. We will spin up four of these pods.

    Create a deployment yaml file - counter_pod.yml:

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: counter
    spec:
      replicas: 4
      template:
        metadata:
          labels:
            app: tester
        spec:
          containers:
          - name: count
            image: busybox
            args: [/bin/sh, -c, 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 1; done']
    

    Start the deployment:

    $> kubectl create -f counter_pod.yml
    
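    Optionally, a quick way to confirm the deployment has finished rolling out before watching it:

    $> kubectl rollout status deployment/counter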

    Observe the cluster:

    Use two terminals to watch data from the cluster - one watching the pod list, the other following a pod's logs:

    $> watch kubectl get pods -o wide
    
    Every 2.0s: kubectl get pods -o wide                      my_host: Wed Sep  5 10:21:38 2018
    NAME                      READY   STATUS   RESTARTS   AGE  IP           NODE
    counter-67c96c6f5c-k4r79  1/1     Running  0          1m   10.42.3.8    kubeworker0-test-1
    counter-67c96c6f5c-knq9l  1/1     Running  0          1m   10.42.3.7    kubeworker0-test-1
    counter-67c96c6f5c-xbg2w  1/1     Running  0          1m   10.42.4.6    kubeworker0-test-2
    counter-67c96c6f5c-zn9kn  1/1     Running  0          1m   10.42.4.5    kubeworker0-test-2
    

    and:

    $> kubectl logs -f counter-67c96c6f5c-k4r79
    
    6381: Wed Sep  5 09:22:10 UTC 2018
    6382: Wed Sep  5 09:22:11 UTC 2018
    6383: Wed Sep  5 09:22:12 UTC 2018
    6384: Wed Sep  5 09:22:13 UTC 2018
    6385: Wed Sep  5 09:22:14 UTC 2018
    6386: Wed Sep  5 09:22:15 UTC 2018
    6387: Wed Sep  5 09:22:16 UTC 2018
    6388: Wed Sep  5 09:22:17 UTC 2018
    6389: Wed Sep  5 09:22:18 UTC 2018
    6390: Wed Sep  5 09:22:19 UTC 2018
    6391: Wed Sep  5 09:22:20 UTC 2018
    6392: Wed Sep  5 09:22:21 UTC 2018
    

    Upgrade Rancher

    Go back to the Rancher node and edit the docker-compose.yml file, updating the rancher image tag from v2.0.6 to v2.0.8.
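
    The only change is the image line, so the relevant part of the file becomes:

    services:
      rancher:
        image: rancher/rancher:v2.0.8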

    Upgrade the docker image:

    $> docker-compose up -d --no-deps rancher
    

    Observe the effects

    • The log trace will end with an "error: unexpected EOF" note
    • The pod list will be replaced by "No resources found. Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods)"
    • The Rancher UI will put up a red box saying "This cluster is currently Unavailable; areas that interact directly with it will not be available until the API is ready."

    After a few seconds, the pod list will come back - with all the same pod names - and shortly after that, the UI will show the cluster as available again.
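
    Once the API is back, you can re-attach to the same pod's logs (the follow from earlier died with the unexpected EOF); if the counter has carried on climbing rather than starting again from 0, the pod itself kept running throughout. The pod name here is the one from my run above:

    $> kubectl logs --tail=5 -f counter-67c96c6f5c-k4r79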


    Addendum

    [ 7th September 2018 ]
    I performed the same test using an HA Rancher install (Rancher running in a 3-machine Kubernetes cluster) - and the same thing happened: the clusters all go off-line whilst Rancher is upgraded, and then come back.