This upgrade guide is intended for Cilium 1.3 or later running on Kubernetes. It assumes that Cilium has been deployed using the standard procedures described in the Deployment guide. If you have installed Cilium using the Requirements guide, then this is automatically the case. If you are looking for instructions for upgrading from a version of Cilium prior to 1.3, please consult the documentation of that release.
Running a pre-flight DaemonSet¶
When rolling out an upgrade with Kubernetes, Kubernetes first terminates the pod, then pulls the new image version, and finally spins up the new image. To reduce the downtime of the agent, the new image version can be pre-pulled. Pre-pulling also verifies that the new image version can be pulled and avoids ErrImagePull errors during the rollout.
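As a minimal sketch, assuming the pre-flight manifest for your target Cilium version has been saved locally as cilium-pre-flight.yaml (the file name matches the step below; the local path is an assumption), the pre-flight DaemonSet can be deployed with:

kubectl apply -f cilium-pre-flight.yaml   # pre-flight manifest for your target Cilium version (runs in kube-system)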
After running the cilium-pre-flight.yaml, make sure the number of READY pods matches the number of Cilium pods running.
kubectl get daemonset -n kube-system | grep cilium
NAME                      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
cilium                    2         2         2       2            2           <none>          1h20m
cilium-pre-flight-check   2         2         2       2            2           <none>          7m15s
Once the number of READY pods is the same, you can delete the cilium-pre-flight-check DaemonSet and proceed with the upgrade.
kubectl -n kube-system delete ds cilium-pre-flight-check
Upgrading Micro Versions¶
Micro versions within a particular minor version, e.g. 1.2.x -> 1.2.y, are
always 100% compatible for both up- and downgrades. Upgrading or downgrading is
as simple as changing the image tag version in the DaemonSet:
kubectl -n kube-system set image daemonset/cilium cilium-agent=docker.io/cilium/cilium:vX.Y.Z
kubectl -n kube-system rollout status daemonset/cilium
Kubernetes will automatically restart all Cilium pods according to the
UpgradeStrategy specified in the DaemonSet.
Upgrading directly between minor versions is not recommended as RBAC and DaemonSet definitions are subject to change between minor versions. See Upgrading Minor Versions for instructions on how to upgrade or downgrade between different minor versions.
Upgrading Minor Versions¶
Step 1: Upgrade to latest micro version (Recommended)¶
When upgrading from one minor release to another minor release, for example 1.x to 1.y, it is recommended to first upgrade to the latest micro release as documented in Upgrading Micro Versions. This ensures that downgrading by rolling back a failed minor release upgrade is always possible and seamless.
Step 2: Upgrade the ConfigMap (Optional)¶
The configuration of Cilium is stored in a ConfigMap called
cilium-config. The format is compatible between minor releases so
configuration parameters are automatically preserved across upgrades. However,
new minor releases may introduce new functionality that requires opt-in via the
ConfigMap. Refer to the Version Specific Notes for a list of new
configuration options for each minor version.
Step 3: Apply new RBAC and DaemonSet definitions¶
As minor versions typically introduce new functionality which requires changes
to the RBAC and DaemonSet definitions, the YAML definitions have to be
upgraded. The following links refer to version specific RBAC and DaemonSet
files. Both files are dedicated to "v1.5" for each Kubernetes version.
Below we will show examples of how Cilium should be upgraded using the Kubernetes
rolling upgrade functionality in order to preserve any existing Cilium
configuration changes (e.g., etcd configuration) and minimize network
disruptions for running workloads. These instructions upgrade Cilium to version
"v1.5" by updating the RBAC rules and DaemonSet files separately. Rather than
installing all configuration in one cilium.yaml file, which could override any
custom configuration, installing these files separately allows the upgrade to be
staged and leaves user configuration unaffected by the upgrade.
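As a sketch of this staged approach (the file names below are placeholders for the version specific RBAC and DaemonSet manifests, not the actual download locations):

kubectl apply -f cilium-rbac.yaml                          # 1. apply the v1.5 RBAC definitions
kubectl apply -f cilium-ds.yaml                            # 2. apply the v1.5 DaemonSet definition
kubectl -n kube-system rollout status daemonset/cilium     # 3. wait for the rolling upgrade to finish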
Occasionally, it may be necessary to undo the rollout because a step was missed
or something went wrong during the upgrade. To undo the rollout, change the image
tag back to the previous version or undo the rollout with kubectl rollout undo:
$ kubectl rollout undo daemonset/cilium -n kube-system
This will revert the latest changes to the Cilium
DaemonSet and return
Cilium to the state it was in prior to the upgrade.
When rolling back after new features of the new minor version have already
been consumed, consult the downgrade section, if one exists, in the
Version Specific Notes to check for and prepare for incompatible feature use
before downgrading/rolling back. This step is only required after new
functionality introduced in the new minor version has already been
explicitly used by importing policy or by opting into new features via the
ConfigMap.
Version Specific Notes¶
This section documents the specific steps required for upgrading from one version of Cilium to another. It first lists the upgrade transitions suggested by the Cilium developers to avoid known issues during upgrade, followed by sections covering specific upgrade transitions, ordered by version.
The table below lists suggested upgrade transitions, from a specified current
version running in a cluster to a specified target version. If a specific
combination is not listed in the table below, then it may not be safe. In that
case, consider staging the upgrade, for example upgrading from 1.1.x to the
latest 1.1.y micro release before subsequently upgrading to the next minor
version.
|Current version|Target version|Full YAML update|L3 impact|L7 impact|
|---|---|---|---|---|
| | |Required|N/A|Clients must reconnect|
| | |Required|Temporary disruption|Clients must reconnect|
| | |Required|Minimal to None|Clients must reconnect|
| | |Required|Minimal to None|Clients must reconnect|
- Clients must reconnect: Any traffic flowing via a proxy (for example, because an L7 policy is in place) will be disrupted during upgrade. Endpoints communicating via the proxy must reconnect to re-establish connections.
- Temporary disruption: All traffic may be temporarily disrupted during upgrade. Connections should successfully re-establish without requiring clients to reconnect.
1.5 Upgrade Notes¶
Cilium versions 1.5.0 and 1.5.1 contain two serious regressions; do not use either version.
- The load-balancing layer contained a defect which could result in incorrect routing decisions being made as well as reverse NAT not being performed correctly.
- A bug in the handling of kvstore keys resulted in the lease of kvstore keys not being updated correctly after a restart of the agent. This would result in the deletion of all keys subject to the lease about 15-30 minutes after the agent was last restarted. The majority of kvstore keys would get re-created immediately by the owning agent to recover. However, ipcache entries would remain lost for about 5 minutes until a periodic synchronization kicked in or until pods were restarted.
Upgrading from >=1.4.0 to 1.5.y¶
In v1.4, the TCP conntrack table size ConfigMap parameter
ct-global-max-entries-tcp was ineffective due to a bug and thus the default value
(1000000) was used instead. To prevent breaking established TCP connections,
bpf-ct-global-tcp-max must be set to 1000000 in the ConfigMap before upgrading.
Refer to the section Upgrading a ConfigMap on how to upgrade the ConfigMap.
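A minimal sketch of the relevant cilium-config fragment; only the key shown here matters for this step:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  # Keep the effective v1.4 default so established TCP connections survive the upgrade
  bpf-ct-global-tcp-max: "1000000"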
If you previously upgraded to v1.5, downgraded to <v1.5, and now want to upgrade to v1.5 again, then you must run the following DaemonSet before doing the upgrade:
Follow the standard procedures to perform the upgrade as described in Upgrading Minor Versions.
New Default Values¶
- The connection-tracking garbage collector interval is now dynamic. It will automatically adjust based on the percentage of the connection tracking table that has been cleared in the last run. The interval will vary between 10 seconds and 30 minutes (or 12 hours for LRU based maps). This should automatically optimize CPU consumption as much as possible while keeping the connection tracking table utilization below 25%. If needed, the interval can be set to a static value with the option --conntrack-gc-interval (see the sketch below). If connectivity fails and cilium monitor --type drop shows xx drop (CT: Map insertion failed), then it is likely that the connection tracking table is filling up and the automatic adjustment of the garbage collector interval is insufficient. Set --conntrack-gc-interval to an interval lower than the default. Alternatively, the value for bpf-ct-global-tcp-max can be increased. Setting these options is a trade-off: a lower --conntrack-gc-interval costs CPU, while a higher bpf-ct-global-tcp-max costs memory.
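If you do need to pin the interval, a sketch of the corresponding cilium-config entry could look as follows. The 5 minute value is purely illustrative, and the assumption is that the ConfigMap key mirrors the --conntrack-gc-interval flag name (all agent options can be set in the ConfigMap) and takes a duration string:

data:
  # Run the conntrack garbage collector every 5 minutes instead of using the
  # dynamic interval (example value only).
  conntrack-gc-interval: "5m0s"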
New ConfigMap Options¶
All options available in the cilium-agent can now be specified in the Cilium ConfigMap without requiring an environment variable to be set in the DaemonSet.
enable-k8s-event-handover: enables use of the kvstore to optimize Kubernetes event handling by listening for k8s events in the operator and mirroring them into the kvstore for reduced overhead in large clusters.
enable-legacy-services: enables legacy services (pre-v1.5) to prevent terminating established connections to services when upgrading Cilium from < v1.5 to v1.5. When the option is not disabled, legacy services are enabled by default. Legacy services need to stay enabled until a user is confident that they will no longer need to downgrade to < v1.5. Disabling and then re-enabling legacy services is not possible without breaking the connections established before the upgrade.
Deprecated and Removed Options¶
- --conntrack-garbage-collector-interval has been deprecated. Please use the option --conntrack-gc-interval, which parses a duration string instead of an integer in seconds. Support for the deprecated option will be removed in 1.6.
- The legacy-host-allows-world option is now removed as planned.
- monitor-aggregation-level: Superseded by monitor-aggregation.
- ct-global-max-entries-tcp: Superseded by bpf-ct-global-tcp-max (see the sketch after this list).
- ct-global-max-entries-other: Superseded by bpf-ct-global-any-max.
- The cilium-metrics-config ConfigMap is superseded by the prometheus-serve-addr option in the cilium-config ConfigMap.
- --auto-ipv6-node-routes was removed as planned. Use --auto-direct-node-routes instead.
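For users carrying the old connection tracking keys forward, a sketch of the rename in cilium-config (the values shown are the defaults from the v1.5 ConfigMap template later in this guide):

data:
  # previously: ct-global-max-entries-tcp
  bpf-ct-global-tcp-max: "524288"
  # previously: ct-global-max-entries-other
  bpf-ct-global-any-max: "262144"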
1.4 Upgrade Notes¶
Upgrading from >=1.2.5 to 1.4.y¶
- Follow the standard procedures to perform the upgrade as described in Upgrading Minor Versions.
Changes that may require action¶
- The --serve option was removed from cilium-bugtool in favor of a much reduced binary size. If you want to continue using the option, please use an older version of the cilium-bugtool binary.
- The DNS Polling option used by toFQDNs.matchName rules is disabled by default in 1.4.x due to limitations in the implementation. It has been replaced by DNS Proxy support, which must be explicitly enabled via changes to the policy described below. To ease upgrade, users may opt to enable DNS Polling in v1.4.x by adding the --tofqdns-enable-poller option to cilium-agent without changing policies. For instructions on how to safely upgrade see Upgrading DNS Polling deployments to DNS Proxy.
- The DaemonSet now uses dnsPolicy: ClusterFirstWithHostNet in order for Cilium to look up Kubernetes service names via DNS. This in turn requires the cluster to run a cluster DNS such as kube-dns or CoreDNS. If you are not running cluster DNS, remove the dnsPolicy field; this will mean that you cannot use the etcd-operator. More details can be found in the kube-dns section. A sketch of the relevant DaemonSet fragment follows this list.
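A sketch of the relevant DaemonSet fragment; only the dnsPolicy line is significant here, the surrounding fields are assumed boilerplate of the agent pod spec:

spec:
  template:
    spec:
      hostNetwork: true
      # Allows the agent, which runs on the host network, to resolve Kubernetes
      # service names (e.g. the etcd-operator service) via the cluster DNS.
      dnsPolicy: ClusterFirstWithHostNet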
New ConfigMap Options¶
enable-ipv4: If set to true, all endpoints are allocated an IPv4 address.
enable-ipv6: If set to true, all endpoints are allocated an IPv6 address.
preallocate-bpf-maps: If set to true, reduce per-packet latency at the expense of up-front memory allocation for entries in BPF maps. If this value is modified, then during the next Cilium startup the restore of existing endpoints and tracking of ongoing connections may be disrupted. This may lead to policy drops or a change in loadbalancing decisions for a connection for some time. Endpoints may need to be recreated to restore connectivity. If this option is set to false during an upgrade to 1.4.0 or later, then it may cause one-time disruptions during the upgrade.
auto-direct-node-routes: If set to true, enable automatic L2 routing between nodes. This is useful when running in direct routing mode and can be used as an alternative to running a routing daemon. Routes to other Cilium managed nodes will then be installed automatically.
install-iptables-rules: If set to false, then Cilium will not install any iptables rules, which are mainly for interaction with kube-proxy. By default it is set to true.
masquerade: The agent can optionally be set up to masquerade all network traffic leaving the main networking device if masquerade is set to true. By default it is set to true.
datapath-mode: Cilium can operate in two different datapath modes, that is, either based upon veth devices (default) or ipvlan devices (beta). The latter requires an additional setting to specify the ipvlan master device.
- New ipvlan-specific CNI integration mode options (beta):
ipvlan-master-device: When running Cilium in ipvlan datapath mode, an ipvlan master device must be specified. This typically points to a networking device that is facing the external network. Be aware that this will be used by all nodes, so the device name must be consistent on all nodes where this is going to be deployed.
- New flannel-specific CNI integration mode options (beta):
flannel-master-device: When running Cilium with policy enforcement enabled on top of Flannel, the BPF programs will be installed on the network interface specified in this option and on each network interface belonging to a pod.
flannel-uninstall-on-exit: If flannel-master-device is specified, this determines whether Cilium should remove BPF programs from the master device and interfaces belonging to pods when the Cilium DaemonSet is deleted. If true, Cilium will remove the programs from the pods.
flannel-manage-existing-containers: On startup, installs a BPF program to allow for policy enforcement on pods that are currently managed by Flannel. This also requires the Cilium DaemonSet to be running with hostPID: true, which is not enabled by default. A sketch of these flannel options follows this list.
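A sketch of how these flannel options could look in cilium-config; the cni0 device name is the common flannel default but is an assumption for your cluster, and enabling the last option is shown only for illustration:

data:
  # Interface managed by flannel on each node (adjust if your bridge differs)
  flannel-master-device: "cni0"
  # Keep BPF programs attached when the Cilium DaemonSet is removed
  flannel-uninstall-on-exit: "false"
  # Attach BPF programs to pods that flannel already manages
  # (requires hostPID: true on the Cilium DaemonSet)
  flannel-manage-existing-containers: "true"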
Deprecated ConfigMap Options¶
disable-ipv4: Superseded by
enable-ipv4, with the logic reversed.
legacy-host-allows-world: This option allowed users to specify Cilium 1.0-style policies that treated traffic that is masqueraded from the outside world as though it arrived from the local host. As of Cilium 1.4, the option is disabled by default if not specified in the ConfigMap, and the option is scheduled to be removed in Cilium 1.5 or later.
1.3 Upgrade Notes¶
Upgrading from 1.2.x to 1.3.y¶
- If you are running Cilium 1.0.x or 1.1.x, please upgrade to 1.2.x first. It is also possible to upgrade from 1.0 or 1.1 directly to 1.3 by combining the upgrade instructions for each minor release. See the documentation for said releases for further information.
- Upgrade to Cilium 1.2.4 or later using the guide Upgrading Micro Versions.
- Follow the standard procedures to perform the upgrade as described in Upgrading Minor Versions.
New ConfigMap Options¶
ct-global-max-entries-tcp / ct-global-max-entries-other: Specifies the maximum number of connections supported across all endpoints, split by protocol: tcp or other. One pair of maps uses these values for IPv4 connections, and another pair of maps uses these values for IPv6 connections. If these values are modified, then during the next Cilium startup the tracking of ongoing connections may be disrupted. This may lead to brief policy drops or a change in loadbalancing decisions for a connection.
clean-cilium-bpf-state: Similar to clean-cilium-state but only cleans the BPF state while preserving all other state. Endpoints will still be restored and IP allocations will prevail, but all datapath state is cleaned when Cilium starts up. Not required for normal operation.
Upgrade Impact¶
Upgrades are designed to have minimal impact on your running deployment. Networking connectivity, policy enforcement and load balancing will remain functional in general. The following is a list of operations that will not be available during the upgrade:
- API aware policy rules are enforced in user space proxies and are currently running as part of the Cilium pod unless Cilium is configured to run in Istio mode. Upgrading Cilium will cause the proxy to restart, which will result in a connectivity outage and cause connections to be reset.
- Existing policy will remain effective but implementation of new policy rules will be postponed to after the upgrade has been completed on a particular node.
- Monitoring components such as cilium monitor will experience a brief outage while the Cilium pod is restarting. Events are queued up and read after the upgrade. If the number of events exceeds the event buffer size, events will be lost.
Upgrading a ConfigMap¶
This section describes the procedure to upgrade an existing
ConfigMap to the
template of another version.
Export the current ConfigMap¶
$ kubectl get configmap -n kube-system cilium-config -o yaml --export > cilium-cm-old.yaml
$ cat ./cilium-cm-old.yaml
apiVersion: v1
data:
  clean-cilium-state: "false"
  debug: "true"
  disable-ipv4: "false"
  etcd-config: |-
    ---
    endpoints:
    - https://192.168.33.11:2379
    #
    # In case you want to use TLS in etcd, uncomment the 'ca-file' line
    # and create a kubernetes secret by following the tutorial in
    # https://cilium.link/etcd-config
    ca-file: '/var/lib/etcd-secrets/etcd-client-ca.crt'
    #
    # In case you want client to server authentication, uncomment the following
    # lines and add the certificate and key in cilium-etcd-secrets below
    key-file: '/var/lib/etcd-secrets/etcd-client.key'
    cert-file: '/var/lib/etcd-secrets/etcd-client.crt'
kind: ConfigMap
metadata:
  creationTimestamp: null
  name: cilium-config
  selfLink: /api/v1/namespaces/kube-system/configmaps/cilium-config
Download the ConfigMap with the changes for v1.5¶
Verify its contents:
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  # This etcd-config contains the etcd endpoints of your cluster. If you use
  # TLS please make sure you follow the tutorial in https://cilium.link/etcd-config
  etcd-config: |-
    ---
    endpoints:
    - https://cilium-etcd-client.kube-system.svc:2379
    #
    # In case you want to use TLS in etcd, uncomment the 'ca-file' line
    # and create a kubernetes secret by following the tutorial in
    # https://cilium.link/etcd-config
    ca-file: '/var/lib/etcd-secrets/etcd-client-ca.crt'
    #
    # In case you want client to server authentication, uncomment the following
    # lines and create a kubernetes secret by following the tutorial in
    # https://cilium.link/etcd-config
    key-file: '/var/lib/etcd-secrets/etcd-client.key'
    cert-file: '/var/lib/etcd-secrets/etcd-client.crt'

  # If you want to run cilium in debug mode change this value to true
  debug: "false"

  # If you want metrics enabled in all of your Cilium agents, set the port for
  # which the Cilium agents will have their metrics exposed.
  # This option deprecates the "prometheus-serve-addr" in the
  # "cilium-metrics-config" ConfigMap
  # NOTE that this will open the port on ALL nodes where Cilium pods are
  # scheduled.
  # prometheus-serve-addr: ":9090"

  # Enable IPv4 addressing. If enabled, all endpoints are allocated an IPv4
  # address.
  enable-ipv4: "true"

  # Enable IPv6 addressing. If enabled, all endpoints are allocated an IPv6
  # address.
  enable-ipv6: "false"

  # If a serious issue occurs during Cilium startup, this
  # invasive option may be set to true to remove all persistent
  # state. Endpoints will not be restored using knowledge from a
  # prior Cilium run, so they may receive new IP addresses upon
  # restart. This also triggers clean-cilium-bpf-state.
  clean-cilium-state: "false"

  # If you want to clean cilium BPF state, set this to true;
  # Removes all BPF maps from the filesystem. Upon restart,
  # endpoints are restored with the same IP addresses, however
  # any ongoing connections may be disrupted briefly.
  # Loadbalancing decisions will be reset, so any ongoing
  # connections via a service may be loadbalanced to a different
  # backend after restart.
  clean-cilium-bpf-state: "false"

  # Users who wish to specify their own custom CNI configuration file must set
  # custom-cni-conf to "true", otherwise Cilium may overwrite the configuration.
  custom-cni-conf: "false"

  # If you want cilium monitor to aggregate tracing for packets, set this level
  # to "low", "medium", or "maximum". The higher the level, the less packets
  # that will be seen in monitor output.
  monitor-aggregation: "none"

  # ct-global-max-entries-* specifies the maximum number of connections
  # supported across all endpoints, split by protocol: tcp or other. One pair
  # of maps uses these values for IPv4 connections, and another pair of maps
  # use these values for IPv6 connections.
  #
  # If these values are modified, then during the next Cilium startup the
  # tracking of ongoing connections may be disrupted. This may lead to brief
  # policy drops or a change in loadbalancing decisions for a connection.
  #
  # For users upgrading from Cilium 1.2 or earlier, to minimize disruption
  # during the upgrade process, comment out these options.
  bpf-ct-global-tcp-max: "524288"
  bpf-ct-global-any-max: "262144"

  # Pre-allocation of map entries allows per-packet latency to be reduced, at
  # the expense of up-front memory allocation for the entries in the maps. The
  # default value below will minimize memory usage in the default installation;
  # users who are sensitive to latency may consider setting this to "true".
  #
  # This option was introduced in Cilium 1.4. Cilium 1.3 and earlier ignore
  # this option and behave as though it is set to "true".
  #
  # If this value is modified, then during the next Cilium startup the restore
  # of existing endpoints and tracking of ongoing connections may be disrupted.
  # This may lead to policy drops or a change in loadbalancing decisions for a
  # connection for some time. Endpoints may need to be recreated to restore
  # connectivity.
  #
  # If this option is set to "false" during an upgrade from 1.3 or earlier to
  # 1.4 or later, then it may cause one-time disruptions during the upgrade.
  preallocate-bpf-maps: "false"

  # Regular expression matching compatible Istio sidecar istio-proxy
  # container image names
  sidecar-istio-proxy-image: "cilium/istio_proxy"

  # Encapsulation mode for communication between nodes
  # Possible values:
  #   - disabled
  #   - vxlan (default)
  #   - geneve
  tunnel: "vxlan"

  # Name of the cluster. Only relevant when building a mesh of clusters.
  cluster-name: default

  # Unique ID of the cluster. Must be unique across all connected clusters and
  # in the range of 1 and 255. Only relevant when building a mesh of clusters.
  #cluster-id: 1

  # Interface to be used when running Cilium on top of a CNI plugin.
  # For flannel, use "cni0"
  flannel-master-device: ""

  # When running Cilium with policy enforcement enabled on top of a CNI plugin
  # the BPF programs will be installed on the network interface specified in
  # 'flannel-master-device' and on all network interfaces belonging to
  # a container. When the Cilium DaemonSet is removed, the BPF programs will
  # be kept in the interfaces unless this option is set to "true".
  flannel-uninstall-on-exit: "false"

  # Installs a BPF program to allow for policy enforcement in already running
  # containers managed by Flannel.
  # NOTE: This requires Cilium DaemonSet to be running in the hostPID.
  # To run in this mode in Kubernetes change the value of the hostPID from
  # false to true. Can be found under the path `spec.spec.hostPID`
  flannel-manage-existing-containers: "false"

  # DNS Polling periodically issues a DNS lookup for each `matchName` from
  # cilium-agent. The result is used to regenerate endpoint policy.
  # DNS lookups are repeated with an interval of 5 seconds, and are made for
  # A(IPv4) and AAAA(IPv6) addresses. Should a lookup fail, the most recent IP
  # data is used instead. An IP change will trigger a regeneration of the Cilium
  # policy for each endpoint and increment the per cilium-agent policy
  # repository revision.
  #
  # This option is disabled by default starting from version 1.4.x in favor
  # of a more powerful DNS proxy-based implementation, see [0] for details.
  # Enable this option if you want to use FQDN policies but do not want to use
  # the DNS proxy.
  #
  # To ease upgrade, users may opt to set this option to "true".
  # Otherwise please refer to the Upgrade Guide [1] which explains how to
  # prepare policy rules for upgrade.
  #
  # [0] http://docs.cilium.io/en/stable/policy/language/#dns-based
  # [1] http://docs.cilium.io/en/stable/install/upgrade/#changes-that-may-require-action
  tofqdns-enable-poller: "false"

  # wait-bpf-mount makes init container wait until bpf filesystem is mounted
  wait-bpf-mount: "false"

  # Enable legacy services (prior v1.5) to prevent from terminating existing
  # connections with services when upgrading Cilium from < v1.5 to v1.5.
  enable-legacy-services: "false"
Add new options¶
Add the new options manually to your old ConfigMap, and make the necessary changes.
In this example, the debug option is meant to be kept with its original value ("true"),
etcd-config is kept unchanged, and legacy-host-allows-world is a new option, but after
reading the Version Specific Notes the value was kept unchanged from the default value.
After making the necessary changes, the old
ConfigMap was migrated with the
new options while keeping the configuration that we wanted:
$ cat ./cilium-cm-old.yaml
apiVersion: v1
data:
  debug: "true"
  disable-ipv4: "false"
  # If you want to clean cilium state; change this value to true
  clean-cilium-state: "false"
  legacy-host-allows-world: "false"
  etcd-config: |-
    ---
    endpoints:
    - https://192.168.33.11:2379
    #
    # In case you want to use TLS in etcd, uncomment the 'ca-file' line
    # and create a kubernetes secret by following the tutorial in
    # https://cilium.link/etcd-config
    ca-file: '/var/lib/etcd-secrets/etcd-client-ca.crt'
    #
    # In case you want client to server authentication, uncomment the following
    # lines and add the certificate and key in cilium-etcd-secrets below
    key-file: '/var/lib/etcd-secrets/etcd-client.key'
    cert-file: '/var/lib/etcd-secrets/etcd-client.crt'
kind: ConfigMap
metadata:
  creationTimestamp: null
  name: cilium-config
  selfLink: /api/v1/namespaces/kube-system/configmaps/cilium-config
Apply new ConfigMap¶
After adding the options, manually save the file with your changes and install the
ConfigMap in the kube-system namespace of your cluster.
$ kubectl apply -n kube-system -f ./cilium-cm-old.yaml
Restrictions on unique prefix lengths for CIDR policy rules¶
The Linux kernel applies limitations on the complexity of BPF code that is loaded into the kernel so that the code may be verified as safe to execute on packets. Over time, Linux releases become more intelligent about the verification of programs which allows more complex programs to be loaded. However, the complexity limitations affect some features in Cilium depending on the kernel version that is used with Cilium.
One such limitation affects Cilium’s configuration of CIDR policies. On Linux kernels 4.10 and earlier, this manifests as a restriction on the number of unique prefix lengths supported in CIDR policy rules.
Unique prefix lengths are counted by looking at the prefix portion of CIDR
rules and considering which prefix lengths are unique. For example, in the
following policy example, the toCIDR section specifies a /32, and the
toCIDRSet section specifies a /8 with a /12 removed from it. In
addition, three prefix lengths are always counted: the host prefix length for
the protocol (IPv4: /32, IPv6: /128), the default prefix length
(/0), and the cluster prefix length (default IPv4: /8).
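As a sketch of such a policy (the original example is not reproduced on this page; the selector labels and addresses below are illustrative, while the toCIDR /32 and the toCIDRSet /8 with a /12 exception match the description above):

apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "cidr-rule"
spec:
  endpointSelector:
    matchLabels:
      app: myService          # illustrative selector
  egress:
  - toCIDR:
    - 20.1.1.1/32             # a single /32 destination
  - toCIDRSet:
    - cidr: 10.0.0.0/8        # a /8 ...
      except:
      - 10.96.0.0/12          # ... with a /12 removed from it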
All in all, the following example counts as seven unique prefix lengths in IPv4:
- /32 (from toCIDR, also from host prefix)
- /12 (from toCIDRSet)
- /11 (from toCIDRSet)
- /10 (from toCIDRSet)
- /9 (from toCIDRSet)
- /8 (from cluster prefix)
- /0 (from default prefix)
Affected versions:
- Any version of Cilium running on Linux 4.10 or earlier
When a CIDR policy with too many unique prefix lengths is imported, Cilium will reject the policy with a message like the following:
The supported count of unique prefix lengths may differ between Cilium minor releases, for example Cilium 1.1 supports 20 unique prefix lengths on Linux 4.10 or older, while Cilium 1.2 only supports 18 (for IPv4) or 4 (for IPv6).
Users may construct CIDR policies that use fewer unique prefix lengths. This can be achieved by composing or decomposing adjacent prefixes.
Upgrade the host Linux version to 4.11 or later. This step is beyond the scope of the Cilium guide.
Upgrading DNS Polling deployments to DNS Proxy (preferred)¶
In cilium versions 1.2 and 1.3, DNS Polling was automatically used to
obtain IP information for use in toFQDNs.matchName rules in DNS based
policies.
Cilium 1.4 and later have switched to a DNS Proxy scheme - the old
DNS Polling behaviour may be enabled via a CLI option - and expect a
pod to make a DNS request that can be intercepted. Existing pods may have
already-cached DNS lookups that the proxy cannot intercept and thus cilium will
block these on upgrade. New connections with DNS requests that can be
intercepted will be allowed per-policy without special action.
Cilium deployments already configured with DNS Proxy rules are not
impacted and will retain DNS data when restarted or upgraded.
Deployments that require a seamless transition to DNS Proxy
may use Running a pre-flight DaemonSet to create a copy of DNS information on each cilium
node for use by the upgraded cilium-agent at startup. This data is used to
allow L3 connections (via toFQDNs rules) without a DNS request from pods.
Running a pre-flight DaemonSet accomplishes this via the
--tofqdns-pre-cache CLI option,
which reads DNS cache data for use on startup.
DNS data obtained via polling must be recorded for use on startup, and rules must be added to intercept DNS lookups. The steps below are split into a section on seamlessly upgrading DNS Polling, followed by a section on converting to interception of DNS data via the DNS Proxy.
Policy rules may be prepared to use the DNS Proxy before an
upgrade to 1.4. The new policy rule fields
toPorts.rules.dns.matchName/matchPattern will be ignored by older cilium
versions and can be safely implemented prior to an upgrade.
The following example allows DNS access to
kube-dns via the DNS Proxy and allows all DNS requests to
kube-dns. For completeness,
toFQDNs rules are included as examples of the syntax for those L3 policies
as well. Existing toFQDNs rules do not need to be modified but will now use
IPs seen in DNS requests and allowed by the DNS rules.
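A sketch of such a policy follows; the endpoint selector and the FQDN names are illustrative, while the kube-dns selector and the toPorts.rules.dns fields show the DNS Proxy syntax described above:

apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "tofqdn-dns-visibility"
spec:
  endpointSelector:
    matchLabels:
      app: my-app                    # illustrative selector
  egress:
  # Allow DNS requests to kube-dns and intercept them with the DNS Proxy
  - toEndpoints:
    - matchLabels:
        "k8s:io.kubernetes.pod.namespace": kube-system
        k8s-app: kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: ANY
      rules:
        dns:
        - matchPattern: "*"
  # L3 rules based on names observed through the DNS Proxy
  - toFQDNs:
    - matchName: "api.example.com"
    - matchPattern: "*.example.com"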
Upgrade steps - DNS Polling¶
- Set the tofqdns-enable-poller field to true in the cilium ConfigMap used in the upgrade. Alternatively, pass --tofqdns-enable-poller=true to the upgraded cilium-agent.
- Add tofqdns-pre-cache: "/var/run/cilium/dns-precache-upgrade.json" to the ConfigMap (see the sketch after this list). Alternatively, pass --tofqdns-pre-cache="/var/run/cilium/dns-precache-upgrade.json" to the upgraded cilium-agent.
- Deploy the cilium pre-flight DaemonSet helper (see Running a pre-flight DaemonSet). This will download the cilium container image and also create DNS pre-cache data at /var/run/cilium/dns-precache-upgrade.json. This data will have a TTL of 1 week.
- Deploy the new cilium DaemonSet
- (optional) Remove tofqdns-pre-cache: "/var/run/cilium/dns-precache-upgrade.json" from the cilium ConfigMap. The data will automatically age out after 1 week.
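Putting the first two items together, a sketch of the cilium-config fragment used during the transition:

data:
  # Keep the 1.2/1.3-style DNS Polling active after the upgrade
  tofqdns-enable-poller: "true"
  # Seed the upgraded agent with the DNS data written by the pre-flight DaemonSet
  tofqdns-pre-cache: "/var/run/cilium/dns-precache-upgrade.json"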
Conversion steps - DNS Proxy (preferred)¶
- Update existing policies to intercept DNS requests. See DNS Policy and IP Discovery or the example above
- Allow pods to make DNS requests to populate the cilium-agent cache. To check which exact queries are in the DNS cache and when they will expire, use cilium fqdn cache list.
- Set the tofqdns-enable-poller field to false in the cilium ConfigMap.
- Restart the cilium pods with the new ConfigMap. They will restore Endpoint policy with DNS information from intercepted DNS requests stored in the cache