Installation on AWS EKS

Create an EKS Cluster

The first step is to create an EKS cluster. This guide will use eksctl, but you can also follow the Getting Started with Amazon EKS guide.

Install eksctl

For Linux:

curl --silent --location "https://github.com/weaveworks/eksctl/releases/download/latest_release/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin

For macOS (Homebrew):

brew install weaveworks/tap/eksctl
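
Either way, you can confirm that the binary is installed and on your PATH by printing its version (a quick sanity check, not part of the original steps):

eksctl version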

Create the cluster

Create an EKS cluster with eksctl. See the eksctl documentation for details on how to set credentials, change the region, VPC, cluster size, etc.

eksctl create cluster
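
The defaults are sufficient for this guide and produce the output shown below. If you prefer to set the cluster name, region, or node count explicitly, the same command accepts flags for these; the values here are only placeholders, see the eksctl documentation for the full list:

eksctl create cluster --name test-cluster --region us-west-2 --nodes 2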

You should see something like this:

[ℹ]  using region us-west-2
[ℹ]  setting availability zones to [us-west-2b us-west-2a us-west-2c]
[ℹ]  subnets for us-west-2b - public:192.168.0.0/19 private:192.168.96.0/19
[ℹ]  subnets for us-west-2a - public:192.168.32.0/19 private:192.168.128.0/19
[ℹ]  subnets for us-west-2c - public:192.168.64.0/19 private:192.168.160.0/19
[ℹ]  nodegroup "ng-1e83ec43" will use "ami-0a2abab4107669c1b" [AmazonLinux2/1.11]
[ℹ]  creating EKS cluster "ridiculous-gopher-1548608219" in "us-west-2" region
[ℹ]  will create 2 separate CloudFormation stacks for cluster itself and the initial nodegroup
[ℹ]  if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=us-west-2 --name=ridiculous-gopher-1548608219'
[ℹ]  creating cluster stack "eksctl-ridiculous-gopher-1548608219-cluster"
[ℹ]  creating nodegroup stack "eksctl-ridiculous-gopher-1548608219-nodegroup-ng-1e83ec43"
[✔]  all EKS cluster resource for "ridiculous-gopher-1548608219" had been created
[✔]  saved kubeconfig as "/Users/tgraf/.kube/config"
[ℹ]  nodegroup "ng-1e83ec43" has 0 node(s)
[ℹ]  waiting for at least 2 node(s) to become ready in "ng-1e83ec43"
[ℹ]  nodegroup "ng-1e83ec43" has 2 node(s)
[ℹ]  node "ip-192-168-4-64.us-west-2.compute.internal" is ready
[ℹ]  node "ip-192-168-42-60.us-west-2.compute.internal" is ready
[ℹ]  kubectl command should work with "/Users/tgraf/.kube/config", try 'kubectl get nodes'
[✔]  EKS cluster "ridiculous-gopher-1548608219" in "us-west-2" region is ready
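
As the last log lines suggest, you can confirm that kubectl now points at the new cluster and that both nodes registered:

kubectl get nodes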

Disable SNAT in aws-node agent

Disable the SNAT behavior of the aws-node DaemonSet, which causes all traffic leaving a node to be automatically masqueraded.

kubectl -n kube-system set env ds aws-node AWS_VPC_K8S_CNI_EXTERNALSNAT=true
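
To verify that the environment variable was applied, you can inspect the DaemonSet afterwards (a simple check, not part of the original steps):

kubectl -n kube-system describe ds aws-node | grep AWS_VPC_K8S_CNI_EXTERNALSNAT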

For CoreDNS: Enable reverse lookups

In order for the TLS certificates between etcd peers to work correctly, a DNS reverse lookup on a pod IP must map back to the pod name. If you are using CoreDNS, check the CoreDNS ConfigMap and validate that in-addr.arpa and ip6.arpa are listed as wildcards for the kubernetes block, like this:

kubectl -n kube-system edit cm coredns
[...]
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          upstream
          fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        proxy . /etc/resolv.conf
        cache 30
    }

The contents may differ from the above. What matters is that in-addr.arpa and ip6.arpa are listed as wildcards next to cluster.local.
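
If you prefer to inspect the Corefile without opening an editor, an equivalent read-only check is to print it directly:

kubectl -n kube-system get configmap coredns -o jsonpath='{.data.Corefile}'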

You can validate this by looking up a pod IP with the host utility from any pod:

host 10.60.20.86
86.20.60.10.in-addr.arpa domain name pointer cilium-etcd-972nprv9dp.cilium-etcd.kube-system.svc.cluster.local.
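
If none of your pods have the host utility installed, a throwaway busybox pod can perform the same reverse lookup with nslookup; the pod name, image tag and IP below are only illustrative:

kubectl run -it --rm dns-check --image=busybox:1.28 --restart=Never -- nslookup 10.60.20.86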

Deploy Cilium + cilium-etcd-operator

The following all-in-one YAML will deploy all required components to bring up Cilium including an etcd cluster managed by the cilium-etcd-operator.

Note

It is important to always deploy Cilium and the cilium-etcd-operator together. The cilium-etcd-operator is not able to bootstrap without running Cilium instances. It requires a CNI plugin to provide networking between the etcd pods forming the cluster. Cilium has special logic built in that allows etcd pods to communicate during the bootstrapping phase of Cilium.
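
The correct manifest depends on the Kubernetes version your EKS cluster runs. If you are unsure, check the server version first (a quick helper step, not part of the original guide):

kubectl version --short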

For Docker as the container runtime, apply the manifest matching your cluster's Kubernetes version (each command below targets a different Kubernetes release; run only one):

kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.4/examples/kubernetes/1.14/cilium.yaml
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.4/examples/kubernetes/1.13/cilium.yaml
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.4/examples/kubernetes/1.12/cilium.yaml
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.4/examples/kubernetes/1.11/cilium.yaml
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.4/examples/kubernetes/1.10/cilium.yaml
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.4/examples/kubernetes/1.9/cilium.yaml
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.4/examples/kubernetes/1.8/cilium.yaml

For CRI-O as the container runtime, again apply only the manifest matching your Kubernetes version:

kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.4/examples/kubernetes/1.14/cilium-crio.yaml
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.4/examples/kubernetes/1.13/cilium-crio.yaml
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.4/examples/kubernetes/1.12/cilium-crio.yaml
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.4/examples/kubernetes/1.11/cilium-crio.yaml
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.4/examples/kubernetes/1.10/cilium-crio.yaml
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.4/examples/kubernetes/1.9/cilium-crio.yaml
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.4/examples/kubernetes/1.8/cilium-crio.yaml

Note

You may notice that the kube-dns-* pods get restarted. The cilium-operator will automatically restart the CoreDNS pods if they are not managed by the Cilium CNI plugin.
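
If you want to confirm that the restarted CoreDNS pods are now managed by Cilium, you can list endpoints from one of the Cilium agent pods; the pod name below is a placeholder taken from the example output in the next section:

kubectl -n kube-system exec cilium-cvp8q -- cilium endpoint list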

Validate the Installation

You can monitor the progress as Cilium and all required components are installed:

kubectl -n kube-system get pods --watch
NAME                                    READY   STATUS              RESTARTS   AGE
aws-node-vgc7n                          1/1     Running             0          2m55s
aws-node-x6sjm                          1/1     Running             0          3m35s
cilium-cvp8q                            0/1     Init:0/1            0          53s
cilium-etcd-operator-6d9975f5df-2vflw   0/1     ContainerCreating   0          54s
cilium-operator-788c55554-gkpbf         0/1     ContainerCreating   0          54s
cilium-tdzcx                            0/1     Init:0/1            0          53s
coredns-77b578f78d-km6r4                1/1     Running             0          11m
coredns-77b578f78d-qr6gq                1/1     Running             0          11m
kube-proxy-l47rx                        1/1     Running             0          6m28s
kube-proxy-zj6v5                        1/1     Running             0          6m28s

It may take a couple of minutes for the etcd-operator to bring up the necessary number of etcd pods to achieve quorum. Once it reaches quorum, all components should be healthy and ready:

kubectl -n kube-system get pods
NAME                                    READY   STATUS    RESTARTS   AGE
aws-node-vgc7n                          1/1     Running   0          2m
aws-node-x6sjm                          1/1     Running   0          3m
cilium-cvp8q                            1/1     Running   0          42s
cilium-etcd-operator-6d9975f5df-2vflw   1/1     Running   0          43s
cilium-etcd-p2ggsb22nc                  1/1     Running   0          28s
cilium-operator-788c55554-gkpbf         1/1     Running   2          43s
cilium-tdzcx                            1/1     Running   0          42s
coredns-77b578f78d-2khwp                1/1     Running   0          13s
coredns-77b578f78d-bs6rp                1/1     Running   0          13s
etcd-operator-7b9768bc99-294wf          1/1     Running   0          37s
kube-proxy-l47rx                        1/1     Running   0          6m
kube-proxy-zj6v5                        1/1     Running   0          6m
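
As a final check, you can query the agent's own health from inside one of the Cilium pods; the pod name is again a placeholder from the listing above:

kubectl -n kube-system exec cilium-cvp8q -- cilium status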