Kubernetes Cilium Upgrade¶
Cilium should be upgraded using Kubernetes rolling upgrade functionality in order to minimize network disruptions for running workloads.
If you have followed the installation guide from Deploying the DaemonSet, you have probably
deployed a single cilium.yaml file. That file contains all the
necessary components to run Cilium in your Kubernetes cluster. Those components
include the ConfigMap, the DaemonSet, and the RBAC rules that allow
Cilium to access the Kubernetes api-server.
Since Cilium might need more or fewer permissions to access the Kubernetes
api-server between releases, the RBAC rules might change between versions as well.
The safest way to upgrade Cilium to version “v1.1” is by updating the
RBAC rules and the
DaemonSet files separately, which makes sure the
ConfigMap initially set up by
cilium.yaml and already stored in Kubernetes
will not be affected by the upgrade.
It is also recommended to upgrade the ConfigMap, but this is a process that
should be done manually before upgrading the RBAC and the DaemonSet. Upgrading the
ConfigMap first will not affect current Cilium pods, as the
ConfigMap configurations are only used when a pod is restarted.
1. To update your current ConfigMap, store it locally so you can modify it:
$ kubectl get configmap -n kube-system cilium-config -o yaml --export > cilium-cm-old.yaml
$ cat ./cilium-cm-old.yaml
apiVersion: v1
data:
  clean-cilium-state: "false"
  debug: "true"
  disable-ipv4: "false"
  etcd-config: |-
    ---
    endpoints:
    - https://192.168.33.11:2379
    #
    # In case you want to use TLS in etcd, uncomment the 'ca-file' line
    # and create a kubernetes secret by following the tutorial in
    # https://cilium.link/etcd-config
    ca-file: '/var/lib/etcd-secrets/etcd-ca'
    #
    # In case you want client to server authentication, uncomment the following
    # lines and add the certificate and key in cilium-etcd-secrets below
    key-file: '/var/lib/etcd-secrets/etcd-client-key'
    cert-file: '/var/lib/etcd-secrets/etcd-client-crt'
kind: ConfigMap
metadata:
  creationTimestamp: null
  name: cilium-config
  selfLink: /api/v1/namespaces/kube-system/configmaps/cilium-config
2. Download the ConfigMap with the changes for “v1.1”:
Verify its contents:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  # This etcd-config contains the etcd endpoints of your cluster. If you use
  # TLS please make sure you follow the tutorial in https://cilium.link/etcd-config
  etcd-config: |-
    ---
    endpoints:
    - http://127.0.0.1:31079
    #
    # In case you want to use TLS in etcd, uncomment the 'ca-file' line
    # and create a kubernetes secret by following the tutorial in
    # https://cilium.link/etcd-config
    #ca-file: '/var/lib/etcd-secrets/etcd-ca'
    #
    # In case you want client to server authentication, uncomment the following
    # lines and create a kubernetes secret by following the tutorial in
    # https://cilium.link/etcd-config
    #key-file: '/var/lib/etcd-secrets/etcd-client-key'
    #cert-file: '/var/lib/etcd-secrets/etcd-client-crt'
  # If you want to run cilium in debug mode change this value to true
  debug: "false"
  disable-ipv4: "false"
  # If a serious issue occurs during Cilium startup, this
  # invasive option may be set to true to remove all persistent
  # state. Endpoints will not be restored using knowledge from a
  # prior Cilium run, so they may receive new IP addresses upon
  # restart. This also triggers clean-cilium-bpf-state.
  clean-cilium-state: "false"
  # If you want to clean cilium BPF state, set this to true;
  # Removes all BPF maps from the filesystem. Upon restart,
  # endpoints are restored with the same IP addresses, however
  # any ongoing connections may be disrupted briefly.
  # Loadbalancing decisions will be reset, so any ongoing
  # connections via a service may be loadbalanced to a different
  # backend after restart.
  clean-cilium-bpf-state: "false"
  legacy-host-allows-world: "false"
  # Regular expression matching compatible Istio sidecar istio-proxy
  # container image names
  sidecar-istio-proxy-image: "cilium/istio_proxy"
3. Add the new options manually to your old ConfigMap, and make the necessary
changes. In this example, the debug option is meant to be kept as "true",
etcd-config is kept unchanged, and legacy-host-allows-world is a new
option, but after reading the Upgrade notes the value was left at its
default.
After making the necessary changes, the old
ConfigMap was migrated with the
new options while keeping the configuration that we wanted:
$ cat ./cilium-cm-old.yaml
apiVersion: v1
data:
  debug: "true"
  disable-ipv4: "false"
  # If you want to clean cilium state; change this value to true
  clean-cilium-state: "false"
  legacy-host-allows-world: "false"
  etcd-config: |-
    ---
    endpoints:
    - https://192.168.33.11:2379
    #
    # In case you want to use TLS in etcd, uncomment the 'ca-file' line
    # and create a kubernetes secret by following the tutorial in
    # https://cilium.link/etcd-config
    ca-file: '/var/lib/etcd-secrets/etcd-ca'
    #
    # In case you want client to server authentication, uncomment the following
    # lines and add the certificate and key in cilium-etcd-secrets below
    key-file: '/var/lib/etcd-secrets/etcd-client-key'
    cert-file: '/var/lib/etcd-secrets/etcd-client-crt'
kind: ConfigMap
metadata:
  creationTimestamp: null
  name: cilium-config
  selfLink: /api/v1/namespaces/kube-system/configmaps/cilium-config
After adding the options, save the file with your changes and install the
ConfigMap in the kube-system namespace of your cluster.
$ kubectl apply -n kube-system -f ./cilium-cm-old.yaml
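The manual comparison in step 3 can be partly automated. The following is a minimal sketch, not part of the official procedure: the function names are illustrative, and it assumes both files use two-space indentation under the data section, as in the examples above. It lists the option names that appear only in the new ConfigMap, so you know which ones to review in the Upgrade notes.

```shell
# Print the option names that sit directly under "data:" in a ConfigMap file.
# Assumption: keys under data are indented by exactly two spaces.
configmap_keys() {
  awk '
    /^data:/ { in_data = 1; next }     # start of the data section
    /^[^ #]/ { in_data = 0 }           # any other top-level key ends it
    in_data && /^  [A-Za-z0-9_.-]+:/ { # two-space keys only, skips block scalars
      key = $1; sub(/:$/, "", key); print key
    }
  ' "$1" | sort
}

# Print the keys present in the new file ($2) but missing from the old one ($1).
new_configmap_keys() {
  old_keys=$(mktemp)
  new_keys=$(mktemp)
  configmap_keys "$1" > "$old_keys"
  configmap_keys "$2" > "$new_keys"
  comm -13 "$old_keys" "$new_keys"
  rm -f "$old_keys" "$new_keys"
}
```

For example, `new_configmap_keys ./cilium-cm-old.yaml ./cilium-cm-new.yaml`, where cilium-cm-new.yaml is a hypothetical name for the downloaded file. The output only names the options; the values still need to be chosen by hand.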
Upgrading Cilium DaemonSet and RBAC¶
Simply pick your Kubernetes version and run kubectl apply for the
RBAC and the DaemonSet files.
Both files are dedicated to “v1.1” for each Kubernetes version.
You can also substitute the desired Cilium version number for vX.Y.Z in the command below, but be aware that the copy of the spec file stored in Kubernetes might then be out of sync with the CLI flags or options expected by each Cilium version.
kubectl set image daemonset/cilium -n kube-system cilium-agent=docker.io/cilium/cilium:vX.Y.Z
To monitor the rollout and confirm it is complete, run:
kubectl rollout status daemonset/cilium -n kube-system
To undo the rollout via rollback, run:
kubectl rollout undo daemonset/cilium -n kube-system
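The three commands above can be chained so that a failed rollout is undone automatically. This is a minimal sketch under stated assumptions: the function name is illustrative, and the 300 second timeout passed to `--timeout` is an arbitrary value that should be tuned to the size of your cluster.

```shell
# Set the cilium-agent image to the given tag, wait for the rollout,
# and roll back if the rollout does not complete in time.
upgrade_cilium() {
  version="$1"
  kubectl set image daemonset/cilium -n kube-system \
    "cilium-agent=docker.io/cilium/cilium:${version}" || return 1
  # Assumption: 300s is long enough for every node to restart its agent.
  if ! kubectl rollout status daemonset/cilium -n kube-system --timeout=300s; then
    echo "rollout of ${version} did not complete, rolling back" >&2
    kubectl rollout undo daemonset/cilium -n kube-system
    return 1
  fi
}
```

Call it as `upgrade_cilium vX.Y.Z`. The same caveat applies as for the manual command: only the image changes, so flags or options in the stored spec are not updated.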
Cilium will continue to forward traffic at L3/L4 during the roll-out, and all endpoints and their configuration will be preserved across the upgrade rollout. However, because the L7 proxies implementing HTTP, gRPC, and Kafka-aware filtering currently reside in the same Pod as Cilium, they are removed and re-installed as part of the rollout. As a result, any proxied connections will be lost and clients must reconnect.
Occasionally, when encountering issues with a particular version of Cilium, it may be useful to alternatively downgrade an instance or deployment. The above instructions may be used, replacing the “v1.1” version with the desired version.
Particular versions of Cilium may introduce new features, however, so if Cilium is configured with the newer feature, and a downgrade is performed, then the downgrade may leave Cilium in a bad state. Below is a table of features which have been introduced in later versions of Cilium. If you are using a feature in the below table, then a downgrade cannot be safely implemented unless you also disable the usage of the feature.
|Feature||Minimum version||Mitigation||Feature Link|
|CIDR policies matching on IPv6 prefix ranges|| ||Remove policies that contain IPv6 CIDR rules||Github PR|
|CIDR policies matching on default prefix|| ||Remove policies that match a
The below issues have been fixed in Cilium 1.1, but require user interaction to mitigate or remediate the issue for users upgrading from an earlier release.
Traffic from world to endpoints is classified as from host¶
In Cilium 1.0, all traffic from the host, including from local processes and
traffic that is masqueraded from the outside world to the host IP, would be
classified as from the host entity.
Furthermore, to allow Kubernetes agents to perform health checks over IP into
the endpoints, the host is allowed by default. This means that all traffic from
the outside world is also allowed by default, regardless of security policy.
- Cilium 1.0 or earlier deployed using the DaemonSet and ConfigMap YAMLs provided with that release, or
- Later versions of Cilium deployed using the YAMLs provided with Cilium 1.0 or earlier.
Affected environments will see no output for one or more of the below commands:
$ kubectl get ds cilium -n kube-system -o yaml | grep -B 3 -A 2 -i legacy-host-allows-world
$ kubectl get cm cilium-config -n kube-system -o yaml | grep -i legacy-host-allows-world
Unaffected environments will see the following output (note the configMapKeyRef key in the Cilium DaemonSet and the
legacy-host-allows-world: "false" setting of the ConfigMap):
$ kubectl get ds cilium -n kube-system -o yaml | grep -B 3 -A 2 -i legacy-host-allows-world
        - name: CILIUM_LEGACY_HOST_ALLOWS_WORLD
          valueFrom:
            configMapKeyRef:
              name: cilium-config
              optional: true
              key: legacy-host-allows-world
$ kubectl get cm cilium-config -n kube-system -o yaml | grep -i legacy-host-allows-world
  legacy-host-allows-world: "false"
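The two checks can be combined into a single verdict. The sketch below is illustrative only: the function name is an assumption, and it works on saved copies of the two YAML dumps rather than on the live cluster. It mirrors the rule stated above, so it tests only whether the key is present, not whether the ConfigMap value is "false".

```shell
# Print "affected" if either saved YAML dump lacks the
# legacy-host-allows-world key, and "unaffected" otherwise.
# $1: output of `kubectl get ds cilium -n kube-system -o yaml`
# $2: output of `kubectl get cm cilium-config -n kube-system -o yaml`
legacy_host_status() {
  if grep -qi legacy-host-allows-world "$1" &&
     grep -qi legacy-host-allows-world "$2"; then
    echo "unaffected"
  else
    echo "affected"
  fi
}
```

For example, redirect each of the two `kubectl get ... -o yaml` commands to a file, then run `legacy_host_status ds.yaml cm.yaml`. If the result is "unaffected", it is still worth confirming by eye that the ConfigMap value is "false".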
Users who are not reliant upon IP-based health checks for their Kubernetes pods
may mitigate this issue on earlier versions of Cilium by adding the argument
--allow-localhost=policy to the Cilium container in the Cilium DaemonSet.
This prevents the automatic insertion of the L3 allow policy in Kubernetes
environments. Note, however, that with this option, if the Cilium Network Policy
allows traffic from the host, then it will still allow access from the outside
world.
$ kubectl edit ds cilium -n kube-system
(Edit the "args" section to add the option "--allow-localhost=policy")
$ kubectl rollout status daemonset/cilium -n kube-system
(Wait for kubernetes to redeploy Cilium with the new options)
Cilium 1.1 and later only classify traffic from a process on the local host as
from the host entity; other traffic that is masqueraded to the host IP is
now classified as from the world entity.
Fresh deployments using the Cilium 1.1 YAMLs are not affected.
Affected users are recommended to upgrade using the steps below.
1. Redeploy the Cilium DaemonSet with the YAMLs provided with the Cilium 1.1 or later release. The instructions for this are found at the top of the Upgrade Guide.
2. Add the config option legacy-host-allows-world: "false" to the Cilium ConfigMap under the “data” section.
$ kubectl edit configmap cilium-config -n kube-system
(Add a new line with the config option above in the "data" section)
3. (Optional) Update the Cilium Network Policies to allow specific traffic from the outside world. For more information, see Network Policy.
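If an interactive editor is inconvenient, for instance in a script, the ConfigMap option can also be added with `kubectl patch`. This is a sketch of an alternative to the `kubectl edit` step, not the documented procedure; the function name is illustrative.

```shell
# Add or overwrite legacy-host-allows-world in the cilium-config ConfigMap.
# Defaults to "false", the value recommended for upgraded environments.
set_legacy_host_allows_world() {
  value="${1:-false}"
  kubectl patch configmap cilium-config -n kube-system --type merge \
    -p "{\"data\":{\"legacy-host-allows-world\":\"${value}\"}}"
}
```

A merge patch leaves the other keys under data untouched. As with any ConfigMap change, running Cilium pods only pick up the new value when they are restarted.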
MTU handling behavior change in Cilium 1.1¶
Cilium 1.0 by default configured the MTU of all Cilium-related devices and
endpoint devices to 1450 bytes, to guarantee that packets sent from an endpoint
would remain below the MTU of a tunnel. This had the side-effect that when a
Cilium-managed pod made a request to an outside (world) IP, if the response
came back in 1500B chunks, then it would be fragmented when transmitted to the
cilium_host device. These fragments would then pass through the Cilium policy
logic. Later IP fragments would not contain L4 ports, so if any L4 or L4+L7
policy was applied to the destination endpoint, then the fragments would be
dropped. This could cause disruption to network traffic.
- Cilium 1.0 or earlier.
Cilium 1.1 and later are not affected.
There is no known mitigation for users running Cilium 1.0 at this time.
Cilium 1.1 fixes the above issue by increasing the MTU of the Cilium-related devices and endpoint devices to 1500B (or larger based on container runtime settings), then configuring a route within the endpoint at a lower MTU to ensure that transmitted packets will fit within tunnel encapsulation. This addresses the above issue for all new pods.
Endpoints that were deployed on Cilium 1.0 must be redeployed to remediate this issue.
When upgrading from Cilium 1.0 to 1.1 or later, existing pods will not automatically inherit these new settings. To apply the new MTU settings to existing endpoints, they must be re-deployed. To fetch a list of affected pods in kubernetes environments, run the following command:
$ kubectl get cep --all-namespaces
NAMESPACE     NAME                         AGE
default       deathstar-765fd545f9-m6bpt   50m
default       deathstar-765fd545f9-vlfth   50m
default       tiefighter                   50m
default       xwing                        50m
kube-system   cilium-health-k8s1           27s
kube-system   cilium-health-k8s2           25s
kube-system   kube-dns-59d8c5f9b5-g2pnt    2h
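To narrow that listing down to the endpoints that need a manual redeploy, a small filter can drop the header row and the cilium-health endpoints, which Cilium recreates by itself. This is a minimal sketch; the function name is illustrative and it assumes the three-column output format shown above.

```shell
# Read `kubectl get cep --all-namespaces` output on stdin and print
# "<namespace> <name>" for every endpoint except the cilium-health ones.
pods_to_redeploy() {
  awk 'NR > 1 && $2 !~ /^cilium-health/ { print $1, $2 }'
}
```

Usage: `kubectl get cep --all-namespaces | pods_to_redeploy`. Review the resulting list before deleting anything, since pods that are not managed by a controller will not come back on their own.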
cilium-health endpoints do not need to be redeployed, as Cilium will
redeploy them automatically upon upgrade. Depending on how the other endpoints
were originally deployed, redeploying them may be as simple as running
kubectl delete pod <podname>. Once each pod has been redeployed, you can
fetch a list of the related interfaces and confirm that the new MTU settings
have been applied via the following commands:
$ kubectl get cep --all-namespaces -o yaml | grep -e "pod-name:" -e "interface-name"
      pod-name: default:deathstar-765fd545f9-m6bpt
      interface-name: lxc55330
      pod-name: default:deathstar-765fd545f9-vlfth
      interface-name: lxc4fe9b
      pod-name: default:tiefighter
      interface-name: lxcf1e94
      pod-name: default:xwing
      interface-name: lxc7cb0f
      pod-name: ':'
      interface-name: cilium_health
      pod-name: ':'
      interface-name: cilium_health
      pod-name: kube-system:kube-dns-59d8c5f9b5-g2pnt
      interface-name: lxc0e2f6
$ ip link show lxc0e2f6 | grep mtu
22: [email protected]: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
The first command above lists all Cilium endpoints and their corresponding interface names, and the second command demonstrates how to find the MTU for the interface. Typically the MTU should be 1500 bytes after the endpoints have been re-deployed, unless the Cilium CNI configuration requests a different MTU.
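When many interfaces need checking, the MTU can be pulled out of the `ip link` output and compared against the expected value in a script. The following sketch uses illustrative function names and assumes the usual `ip link show` layout, in which the word mtu is immediately followed by its value.

```shell
# Read `ip link show <dev>` output on stdin and print the MTU value.
link_mtu() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "mtu") { print $(i + 1); exit } }'
}

# Succeed only if the interface output on stdin reports the expected MTU ($1).
mtu_matches() {
  expected="$1"
  actual=$(link_mtu)
  [ "$actual" = "$expected" ]
}
```

Usage: `ip link show lxc0e2f6 | mtu_matches 1500 || echo "lxc0e2f6 still has the old MTU"`. Replace 1500 with whatever MTU your Cilium CNI configuration requests.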