Standard Installation

This guide takes you through the steps required to set up Cilium on Kubernetes using the cilium-etcd-operator. The cilium-etcd-operator removes the need for an external kvstore; you can learn more about it in the section What is the cilium-etcd-operator? This method is suitable for small and medium scale deployments where etcd performance is not critical. Refer to the Limitations section below for the exact limitations of this deployment method.

Should you encounter any issues during the installation, refer to the Troubleshooting section and/or seek help on the Slack channel.

Requirements

Make sure your Kubernetes environment meets the following requirements:

  • Kubernetes >= 1.9
  • Linux kernel >= 4.9
  • Kubernetes in CNI mode
  • Running kube-dns/coredns (when using the etcd-operator installation method)
  • BPF filesystem mounted on all worker nodes
  • Enable PodCIDR allocation (--allocate-node-cidrs) in the kube-controller-manager (recommended)

Refer to the section Requirements for detailed instructions on how to prepare your Kubernetes environment.
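Two of these requirements, the kernel version and the BPF filesystem mount, can be checked directly on each worker node. The following POSIX shell sketch shows one way to do it. The functions kernel_at_least and bpf_fs_mounted are hypothetical helpers, not part of Cilium, and the sketch assumes the BPF filesystem is mounted at the conventional /sys/fs/bpf path and that /proc/mounts is readable.

```sh
# Hypothetical helpers for checking node requirements (not part of Cilium).

# kernel_at_least REQUIRED ACTUAL
# Succeeds if kernel release ACTUAL (e.g. "4.19.0-6-amd64") is at least
# REQUIRED (e.g. "4.9"). Only the major and minor numbers are compared.
kernel_at_least() {
    req_major=${1%%.*}
    req_rest=${1#*.}
    req_minor=${req_rest%%.*}
    act_major=${2%%.*}
    act_rest=${2#*.}
    # Drop everything after the minor number ("19.0-6-amd64" -> "19").
    act_minor=${act_rest%%[!0-9]*}
    if [ "$act_major" -gt "$req_major" ]; then
        return 0
    fi
    [ "$act_major" -eq "$req_major" ] && [ "$act_minor" -ge "$req_minor" ]
}

# bpf_fs_mounted [MOUNTS_FILE]
# Succeeds if a filesystem of type "bpf" is mounted at /sys/fs/bpf according
# to MOUNTS_FILE (defaults to /proc/mounts).
bpf_fs_mounted() {
    grep -q '^[^ ]* /sys/fs/bpf bpf ' "${1:-/proc/mounts}"
}

# Example usage on a worker node:
#   kernel_at_least 4.9 "$(uname -r)" && echo "kernel OK"
#   bpf_fs_mounted && echo "BPF filesystem mounted"
```

Only the major and minor version numbers are compared, which is all the 4.9 requirement needs; vendor suffixes such as "-generic" are ignored.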

For CoreDNS: Enable reverse lookups

In order for the TLS certificates between etcd peers to work correctly, a DNS reverse lookup on a pod IP must map back to the pod name. If you are using CoreDNS, check the CoreDNS ConfigMap and verify that in-addr.arpa and ip6.arpa are listed as wildcards in the kubernetes block, like this:

kubectl -n kube-system edit cm coredns
[...]
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          upstream
          fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        proxy . /etc/resolv.conf
        cache 30
    }

The contents may differ from the above. What matters is that in-addr.arpa and ip6.arpa are listed as wildcards next to cluster.local.
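Instead of reading the ConfigMap in an editor, you can check it from a script. The sketch below uses a hypothetical helper, corefile_has_reverse_zones, that is not part of Cilium or CoreDNS. It assumes the Corefile is stored under the Corefile key of the coredns ConfigMap, as in the example above, and it only inspects lines that start with the kubernetes plugin, so a fallthrough line alone does not count.

```sh
# corefile_has_reverse_zones
# Reads a Corefile on stdin and succeeds if a "kubernetes" plugin line lists
# both in-addr.arpa and ip6.arpa as zones. Hypothetical helper.
corefile_has_reverse_zones() {
    grep -E '^[[:space:]]*kubernetes[[:space:]]' \
        | grep -w 'in-addr\.arpa' \
        | grep -qw 'ip6\.arpa'
}

# Example usage against a live cluster:
#   kubectl -n kube-system get configmap coredns -o jsonpath='{.data.Corefile}' \
#       | corefile_has_reverse_zones && echo "reverse zones configured"
```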

You can validate this by looking up a pod IP with the host utility from any pod:

host 10.60.20.86
86.20.60.10.in-addr.arpa domain name pointer cilium-etcd-972nprv9dp.cilium-etcd.kube-system.svc.cluster.local.
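As the output shows, the name queried in a reverse lookup is the pod IP with its octets reversed under in-addr.arpa. The hypothetical helper below builds that name for an IPv4 address, which can be handy when querying a DNS server directly. host and dig -x perform this reversal themselves, and IPv6 addresses (which live under ip6.arpa) are not handled by this sketch.

```sh
# ptr_name IPV4
# Prints the reverse-lookup (PTR) name for a dotted-quad IPv4 address,
# e.g. 10.60.20.86 -> 86.20.60.10.in-addr.arpa. Hypothetical helper.
ptr_name() {
    old_ifs=$IFS
    IFS=.
    # Split the address on "." into the positional parameters $1..$4.
    set -- $1
    IFS=$old_ifs
    printf '%s.%s.%s.%s.in-addr.arpa\n' "$4" "$3" "$2" "$1"
}
```

For example, dig "$(ptr_name 10.60.20.86)" PTR asks for the same record as host 10.60.20.86.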

Deploy Cilium + cilium-etcd-operator

The following all-in-one YAML will deploy all required components to bring up Cilium including an etcd cluster managed by the cilium-etcd-operator.

Note

It is important to always deploy Cilium and the cilium-etcd-operator together. The cilium-etcd-operator is not able to bootstrap without running Cilium instances. It requires a CNI plugin to provide networking between the etcd pods forming the cluster. Cilium has special logic built in that allows etcd pods to communicate during the bootstrapping phase of Cilium.

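Apply exactly one of the manifests below: pick the block for your container runtime, then the line whose directory in the URL (1.10 through 1.14) matches your Kubernetes version.

For Docker as container runtime: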

kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/1.5.3/examples/kubernetes/1.14/cilium.yaml
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/1.5.3/examples/kubernetes/1.13/cilium.yaml
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/1.5.3/examples/kubernetes/1.12/cilium.yaml
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/1.5.3/examples/kubernetes/1.11/cilium.yaml
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/1.5.3/examples/kubernetes/1.10/cilium.yaml

For CRI-O as container runtime:

kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/1.5.3/examples/kubernetes/1.14/cilium-crio.yaml
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/1.5.3/examples/kubernetes/1.13/cilium-crio.yaml
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/1.5.3/examples/kubernetes/1.12/cilium-crio.yaml
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/1.5.3/examples/kubernetes/1.11/cilium-crio.yaml
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/1.5.3/examples/kubernetes/1.10/cilium-crio.yaml

For containerd as container runtime:

kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/1.5.3/examples/kubernetes/1.14/cilium-containerd.yaml
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/1.5.3/examples/kubernetes/1.13/cilium-containerd.yaml
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/1.5.3/examples/kubernetes/1.12/cilium-containerd.yaml
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/1.5.3/examples/kubernetes/1.11/cilium-containerd.yaml
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/1.5.3/examples/kubernetes/1.10/cilium-containerd.yaml
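If you script the installation, a small helper can build the manifest URL from the two choices above. This is a hypothetical sketch that only knows the Cilium 1.5.3 URLs listed here; the runtime keywords docker, crio and containerd are names chosen for this example.

```sh
# cilium_manifest_url K8S_VERSION RUNTIME
# Prints the Cilium 1.5.3 all-in-one manifest URL for a Kubernetes minor
# version (1.10 to 1.14) and a container runtime. Hypothetical helper.
cilium_manifest_url() {
    k8s=$1
    case "$2" in
        docker)     file=cilium.yaml ;;
        crio)       file=cilium-crio.yaml ;;
        containerd) file=cilium-containerd.yaml ;;
        *) echo "unknown runtime: $2" >&2; return 1 ;;
    esac
    case "$k8s" in
        1.10|1.11|1.12|1.13|1.14) ;;
        *) echo "no manifest listed for Kubernetes $k8s" >&2; return 1 ;;
    esac
    printf 'https://raw.githubusercontent.com/cilium/cilium/1.5.3/examples/kubernetes/%s/%s\n' \
        "$k8s" "$file"
}

# Example usage:
#   kubectl apply -f "$(cilium_manifest_url 1.14 containerd)"
```

Failing on unknown inputs, rather than guessing a URL, keeps the script from applying a manifest for the wrong Kubernetes version.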

Validate the Installation

You can monitor the progress as Cilium and all required components are being installed:

kubectl -n kube-system get pods --watch
NAME                                    READY   STATUS              RESTARTS   AGE
cilium-etcd-operator-6ffbd46df9-pn6cf   1/1     Running             0          7s
cilium-operator-cb4578bc5-q52qk         0/1     Pending             0          8s
cilium-s8w5m                            0/1     PodInitializing     0          7s
coredns-86c58d9df4-4g7dd                0/1     ContainerCreating   0          8m57s
coredns-86c58d9df4-4l6b2                0/1     ContainerCreating   0          8m57s

It may take a couple of minutes for the etcd-operator to bring up the necessary number of etcd pods to achieve quorum. Once it reaches quorum, all components should be healthy and ready:

cilium-etcd-8d95ggpjmw                  1/1     Running   0          78s
cilium-etcd-operator-6ffbd46df9-pn6cf   1/1     Running   0          4m12s
cilium-etcd-t695lgxf4x                  1/1     Running   0          118s
cilium-etcd-zw285m6t9g                  1/1     Running   0          2m41s
cilium-operator-cb4578bc5-q52qk         1/1     Running   0          4m13s
cilium-s8w5m                            1/1     Running   0          4m12s
coredns-86c58d9df4-4g7dd                1/1     Running   0          13m
coredns-86c58d9df4-4l6b2                1/1     Running   0          13m
etcd-operator-5cf67779fd-hd9j7          1/1     Running   0          2m42s
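Rather than watching the output by eye, you can check readiness from a script. The sketch below is a hypothetical helper that parses the default column layout of kubectl get pods shown above (NAME, READY, STATUS, ...). It treats a pod as ready when all of its containers are ready (for example 1/1) and its STATUS is Running.

```sh
# all_pods_ready
# Reads "kubectl get pods" output on stdin and succeeds only if at least one
# pod is listed and every pod is fully ready and Running. Hypothetical helper.
all_pods_ready() {
    awk '
        $1 == "NAME" { next }              # skip the header line
        {
            split($2, r, "/")              # READY column, e.g. "1/1"
            if (r[1] != r[2] || $3 != "Running") bad++
            n++
        }
        END { exit (n > 0 && bad == 0) ? 0 : 1 }
    '
}

# Example usage against a live cluster:
#   kubectl -n kube-system get pods | all_pods_ready && echo "all pods ready"
```

Assuming kubectl 1.11 or newer, kubectl -n kube-system wait --for=condition=Ready pod --all --timeout=300s performs a comparable check based on each pod's Ready condition.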

Troubleshooting

  • Make sure that kube-dns or coredns is running and healthy in the kube-system namespace. A functioning Kubernetes DNS is strictly required for Cilium to resolve the ClusterIP of the etcd cluster. If kube-dns or coredns was already running before Cilium was deployed, its pods may be managed by the previous CNI plugin. cilium-operator automatically restarts these pods to ensure that they are managed by the Cilium CNI plugin. You can also restart the pods manually if required, and validate that Cilium is managing kube-dns or coredns by running:

    kubectl -n kube-system get cep

    You should see kube-dns-xxx or coredns-xxx pods.

  • In order for the entire system to come up, the following components have to be running at the same time:

    • kube-dns or coredns
    • cilium-xxx
    • cilium-etcd-operator
    • etcd-operator
    • cilium-etcd-xxx

    All timeouts are configured so that this typically works out smoothly, even if some of the pods restart once or twice. If any of the above pods gets stuck in a long CrashLoopBackOff, you can speed up bootstrapping by restarting the pods to reset the CrashLoopBackOff delay.
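Both troubleshooting steps above come down to deleting pods so that their controllers re-create them; a replacement pod starts without the accumulated CrashLoopBackOff delay. The sketch below assumes the default kubectl get pods columns and, for the DNS restart, that the kube-dns or coredns pods carry the common k8s-app=kube-dns label. crashlooping_pods is a hypothetical helper.

```sh
# crashlooping_pods
# Reads "kubectl get pods" output on stdin and prints the name of every pod
# whose STATUS column is CrashLoopBackOff. Hypothetical helper.
crashlooping_pods() {
    awk '$1 != "NAME" && $3 == "CrashLoopBackOff" { print $1 }'
}

# Example usage against a live cluster (not executed here):
#   # Restart the DNS pods so that they are managed by Cilium, then check
#   # that CiliumEndpoints exist for them.
#   kubectl -n kube-system delete pod -l k8s-app=kube-dns
#   kubectl -n kube-system get cep
#
#   # Delete every pod stuck in CrashLoopBackOff; its controller re-creates it.
#   for pod in $(kubectl -n kube-system get pods | crashlooping_pods); do
#       kubectl -n kube-system delete pod "$pod"
#   done
```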

What is the cilium-etcd-operator?

The cilium-etcd-operator uses and extends the etcd-operator to guarantee quorum, auto-create certificates, and manage compaction:

  • Automatic re-creation of the etcd cluster when the cluster loses quorum. In the same situation, the standard etcd-operator refuses to bring up new etcd nodes and the etcd cluster becomes unusable.
  • Automatic creation of certificates and keys. This simplifies the installation of the operator and makes the certificates and keys required to access the etcd cluster available to Cilium using a well known Kubernetes secret name.
  • Compaction is automatically handled.

Limitations

Using the cilium-etcd-operator offers several advantages, including simple installation, automatic management of the etcd cluster (compaction and restart on quorum loss), and automatic use of TLS. However, it has several disadvantages that can become relevant as you scale up your clusters:

  • etcd nodes operated by the etcd-operator do not use persistent storage. When the etcd cluster loses quorum, the cilium-etcd-operator automatically re-creates it, and Cilium then automatically recovers and re-creates all state in etcd. This can take a couple of seconds and may cause minor disruptions, as ongoing distributed locks are invalidated and security identities have to be re-allocated.
  • etcd is very sensitive to disk IO latency and requires fast disk access at a certain scale. The cilium-etcd-operator takes no measures to provide fast disk access, so performance depends on whatever storage is provided to the pods in your Kubernetes cluster. See etcd Hardware recommendations for more details.
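How often quorum loss happens follows from etcd's majority rule: a cluster of n members stays available only while at least floor(n/2) + 1 of them are healthy. The hypothetical helpers below compute that arithmetic. For the three cilium-etcd pods in the validation output above, quorum is 2, so the cluster tolerates losing one member, while losing two triggers the re-creation described in the first limitation.

```sh
# etcd_quorum MEMBERS
# Prints the number of healthy members an etcd cluster of size MEMBERS needs
# to keep quorum (a strict majority). Hypothetical helper.
etcd_quorum() {
    echo $(( $1 / 2 + 1 ))
}

# etcd_fault_tolerance MEMBERS
# Prints how many members can fail before the cluster loses quorum.
etcd_fault_tolerance() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

# Example:
#   etcd_quorum 3            # prints 2
#   etcd_fault_tolerance 3   # prints 1
#   etcd_fault_tolerance 4   # prints 1 (an even size adds no tolerance)
```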