Setting up Cilium in AWS ENI mode

Note

The AWS ENI integration is still subject to some limitations. See Limitations for details.

Create an AWS cluster

Set up a Kubernetes cluster on AWS. You can use any method you prefer, but for the simplicity of this tutorial, we are going to use eksctl. For more details on how to set up an EKS cluster using eksctl, see the section Installation on AWS EKS.

eksctl create cluster -n eni-cluster -N 0
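
Once eksctl finishes, your kubeconfig is updated to point at the new cluster. As an optional sanity check (assuming the AWS CLI is configured for the same account and region), confirm that the control plane is reachable and active:

kubectl cluster-info
aws eks describe-cluster --name eni-cluster --query cluster.status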

Disable the aws-node DaemonSet (EKS only)

If you are running an EKS cluster, disable the aws-node DaemonSet so it does not interfere with the ENIs managed by Cilium:

kubectl -n kube-system set image daemonset/aws-node aws-node=docker.io/spaster/alpine-sleep
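
To verify that the replacement image has rolled out (this assumes the aws-node pods carry the standard k8s-app=aws-node label used by the AWS VPC CNI manifests), you can run:

kubectl -n kube-system rollout status daemonset/aws-node
kubectl -n kube-system get pods -l k8s-app=aws-node -o wide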

Prepare & Deploy Cilium

Install Helm, which is used to generate the deployment artifacts from the Helm templates.

Set up the Helm repository:

helm repo add cilium https://helm.cilium.io/
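
If the repository had been added previously, update the local chart cache so the version referenced below is available:

helm repo update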

Deploy the Cilium release via Helm:

helm install cilium cilium/cilium --version 1.7.0 \
  --namespace kube-system \
  --set global.eni=true \
  --set global.egressMasqueradeInterfaces=eth0 \
  --set global.tunnel=disabled \
  --set global.nodeinit.enabled=true

Note

The above options assume that masquerading is desired and that the VM is connected to the VPC via eth0. All traffic that does not stay within the VPC will be routed via eth0 and masqueraded.

If you want to avoid masquerading, set global.masquerade=false. You must ensure that the security groups associated with the ENIs (eth1, eth2, …) allow for egress traffic to outside of the VPC. By default, the security groups for pod ENIs are derived from the primary ENI (eth0).
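
For reference, the corresponding Helm invocation with masquerading disabled might look like the following (a sketch only; the remaining values are unchanged from the command above):

helm install cilium cilium/cilium --version 1.7.0 \
  --namespace kube-system \
  --set global.eni=true \
  --set global.masquerade=false \
  --set global.tunnel=disabled \
  --set global.nodeinit.enabled=true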

Scale up the cluster

eksctl get nodegroup --cluster eni-cluster
CLUSTER                     NODEGROUP       CREATED                 MIN SIZE        MAX SIZE        DESIRED CAPACITY        INSTANCE TYPE   IMAGE ID
eni-cluster                 ng-25560078     2019-07-23T06:05:35Z    0               2               0                       m5.large        ami-0923e4b35a30a5f53
eksctl scale nodegroup --cluster eni-cluster -n ng-25560078 -N 2
[ℹ]  scaling nodegroup stack "eksctl-eni-cluster-nodegroup-ng-25560078" in cluster eksctl-eni-cluster-cluster
[ℹ]  scaling nodegroup, desired capacity from 0 to 2
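
Once the new instances have joined the cluster, they appear as nodes:

kubectl get nodes -o wide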

Validate the Installation

You can monitor Cilium and all required components as they are being installed:

kubectl -n kube-system get pods --watch
NAME                                    READY   STATUS              RESTARTS   AGE
cilium-operator-cb4578bc5-q52qk         0/1     Pending             0          8s
cilium-s8w5m                            0/1     PodInitializing     0          7s
coredns-86c58d9df4-4g7dd                0/1     ContainerCreating   0          8m57s
coredns-86c58d9df4-4l6b2                0/1     ContainerCreating   0          8m57s

It may take a couple of minutes for all components to come up:

cilium-operator-cb4578bc5-q52qk         1/1     Running   0          4m13s
cilium-s8w5m                            1/1     Running   0          4m12s
coredns-86c58d9df4-4g7dd                1/1     Running   0          13m
coredns-86c58d9df4-4l6b2                1/1     Running   0          13m
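
To inspect the agent's health and ENI IPAM state, you can run cilium status inside one of the cilium pods (the pod name will differ in your cluster):

kubectl -n kube-system exec cilium-s8w5m -- cilium status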

Deploy the connectivity test

You can deploy the “connectivity-check” to test connectivity between pods.

kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/1.7.0/examples/kubernetes/connectivity-check/connectivity-check.yaml

It will deploy a series of deployments which use various connectivity paths to connect to each other. The connectivity paths cover variants with and without service load-balancing as well as various network policy combinations. The pod name indicates the connectivity variant, and the readiness and liveness gates indicate the success or failure of the test:

kubectl get pods
NAME                                                     READY   STATUS             RESTARTS   AGE
echo-a-9b85dd869-292s2                                   1/1     Running            0          8m37s
echo-b-c7d9f4686-gdwcs                                   1/1     Running            0          8m37s
host-to-b-multi-node-clusterip-6d496f7cf9-956jb          1/1     Running            0          8m37s
host-to-b-multi-node-headless-bd589bbcf-jwbh2            1/1     Running            0          8m37s
pod-to-a-7cc4b6c5b8-9jfjb                                1/1     Running            0          8m36s
pod-to-a-allowed-cnp-6cc776bb4d-2cszk                    1/1     Running            0          8m36s
pod-to-a-external-1111-5c75bd66db-sxfck                  1/1     Running            0          8m35s
pod-to-a-l3-denied-cnp-7fdd9975dd-2pp96                  1/1     Running            0          8m36s
pod-to-b-intra-node-9d9d4d6f9-qccfs                      1/1     Running            0          8m35s
pod-to-b-multi-node-clusterip-5956c84b7c-hwzfg           1/1     Running            0          8m35s
pod-to-b-multi-node-headless-6698899447-xlhfw            1/1     Running            0          8m35s
pod-to-external-fqdn-allow-google-cnp-667649bbf6-v6rf8   0/1     Running            0          8m35s
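
If one of the checks stays not ready, as pod-to-external-fqdn-allow-google-cnp does in the listing above, describing the pod shows which readiness or liveness probe is failing (substitute the pod name from your own cluster):

kubectl describe pod pod-to-external-fqdn-allow-google-cnp-667649bbf6-v6rf8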

Install Hubble

Hubble is a fully distributed networking and security observability platform for cloud native workloads. It is built on top of Cilium and eBPF to enable deep visibility into the communication and behavior of services as well as the networking infrastructure in a completely transparent manner. For more information, visit the Hubble GitHub page.

Generate the deployment files using Helm and deploy them:

git clone https://github.com/cilium/hubble.git
cd hubble/install/kubernetes

helm template hubble \
    --namespace kube-system \
    --set metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,http}" \
    --set ui.enabled=true \
> hubble.yaml
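
Optionally, sanity-check the rendered manifest before applying it (on kubectl versions prior to 1.18, use --dry-run instead of --dry-run=client):

kubectl apply --dry-run=client -f hubble.yaml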

Deploy Hubble:

kubectl apply -f hubble.yaml
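
The Hubble components are deployed into the kube-system namespace as well. A simple way to check on them without relying on specific chart labels:

kubectl -n kube-system get pods | grep hubble
kubectl -n kube-system get svc | grep hubble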

Limitations

  • The AWS ENI integration of Cilium is currently only enabled for IPv4.
  • When applying L7 policies at egress, the source identity context is lost as it is currently not carried in the packet. This means that traffic will look like it is coming from outside of the cluster to the receiving pod.

Troubleshooting

Make sure to disable DHCP on ENIs

Cilium will use both the primary and secondary IP addresses assigned to ENIs. Use of the primary IP address optimizes the number of IPs available to pods, but it can conflict with a DHCP agent running on the node that assigns the primary IP of the ENI to the node's interface. A common scenario where this happens is if NetworkManager is running on the node and automatically performs DHCP on all network interfaces of the VM. Be sure to disable DHCP on any ENIs that get attached to the node, or disable NetworkManager entirely.
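
If NetworkManager must remain enabled, one common approach is to mark the secondary ENIs as unmanaged via a drop-in configuration file. The snippet below is a sketch only; adjust the interface names to match the ENIs attached to your instances:

# Mark secondary ENIs as unmanaged so NetworkManager does not run DHCP on them
cat <<'EOF' > /etc/NetworkManager/conf.d/99-unmanaged-enis.conf
[keyfile]
unmanaged-devices=interface-name:eth1;interface-name:eth2
EOF

# Restart NetworkManager for the change to take effect
systemctl restart NetworkManager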