Setting up Cluster Mesh¶
This is a step-by-step guide on how to build a mesh of Kubernetes clusters by connecting them together, enabling pod-to-pod connectivity across all clusters, define global services to load-balance between clusters and enforce security policies to restrict access.
- PodCIDR ranges in all clusters must be non-conflicting.
- This guide and the referenced scripts assume that Cilium was installed using the Standard Installation instructions which leads to etcd being managed by Cilium using etcd-operator. You can use any way to manage etcd but you will have to adjust some of the scripts to account for different secret names and adjust the LoadBalancer to expose the etcd pods.
- Nodes in all clusters must have IP connectivity between each other. This requirement is typically met by establishing peering or VPN tunnels between the networks of the nodes of each cluster.
- All nodes must have a unique IP address assigned them. Node IPs of clusters being connected together may not conflict with each other.
- Cilium must be configured to use etcd as the kvstore. Consul is not supported by cluster mesh at this point.
- It is highly recommended to use a TLS protected etcd cluster with Cilium. The
server certificate of etcd must whitelist the host name
*.mesh.cilium.io. If you are using the
cilium-etcd-operatoras set up in the Standard Installation instructions then this is automatically taken care of.
- The network between clusters must allow the inter-cluster communication. The exact ports are documented in the Firewall Rules section.
Prepare the clusters¶
Specify the cluster name and ID¶
Each cluster must be assigned a unique human-readable name. The name will be
used to group nodes of a cluster together. The cluster name is specified with
--cluster-name=NAME argument or
cluster-name ConfigMap option.
To ensure scalability of identity allocation and policy enforcement, each
cluster continues to manage its own security identity allocation. In order to
guarantee compatibility with identities across clusters, each cluster is
configured with a unique cluster ID configured with the
cluster-id ConfigMap option. The value must be between 1 and
kubectl -n kube-system edit cm cilium-config [ ... add/edit ... ] cluster-name: cluster1 cluster-id: "1"
Repeat this step for each cluster.
Expose the Cilium etcd to other clusters¶
The Cilium etcd must be exposed to other clusters. There are many ways to
achieve this. The method documented in this guide will work with cloud
providers that implement the Kubernetes
LoadBalancer service type:
The example used here exposes the etcd cluster as managed by
cilium-etcd-operator installed by the standard installation instructions as
an internal service which means that it is only exposed inside of a VPC and not
publicly accessible outside of the VPC. It is recommended to use a static IP
for the ServiceIP to avoid requiring to update the IP mapping as done in one of
the later steps.
If you are running the cilium-etcd-operator you can simply apply the following service to expose etcd:
Make sure that you create the service in namespace in which cilium and/or
etcd is running. Depending on which installation method you chose, this
Extract the TLS keys and generate the etcd configuration¶
The cluster mesh control plane performs TLS based authentication and encryption. For this purpose, the TLS keys and certificates of each etcd need to be made available to all clusters that wish to connect.
cilium/clustermesh-toolsrepository. It contains scripts to extracts the secrets and generate a Kubernetes secret in form of a YAML file:
git clone https://github.com/cilium/clustermesh-tools.git cd clustermesh-tools
Ensure that the kubectl context is pointing to the cluster you want to extract the secret from.
Extract the TLS certificate, key and root CA authority.
This will extract the keys that Cilium is using to connect to the etcd in the local cluster. The key files are written to
Repeat this step for all clusters you want to connect with each other.
Generate a single Kubernetes secret from all the keys and certificates extracted. The secret will contain the etcd configuration with the service IP or host name of the etcd including the keys and certificates to access it.
./generate-secret-yaml.sh > clustermesh.yaml
The key files in
config/ and the secret represented as YAML are
sensitive. Anyone gaining access to these files is able to connect to the
etcd instances in the local cluster. Delete the files after the you are done
setting up the cluster mesh.
Ensure that the etcd service names can be resolved¶
For TLS authentication to work properly, agents will connect to etcd in remote
clusters using a pre-defined naming schema
order for DNS resolution to work on these virtual host name, the names are
statically mapped to the service IP via the
The following script will generate the required segment which has to be inserted into the
./generate-name-mapping.sh > ds.patch
ds.patchwill look something like this:
spec: template: spec: hostAliases: - ip: "10.138.0.18" hostnames: - cluster1.mesh.cilium.io - ip: "10.138.0.19" hostnames: - cluster2.mesh.cilium.io
Apply the patch to all DaemonSets in all clusters:
kubectl -n kube-system patch ds cilium -p "$(cat ds.patch)"
Establish connections between clusters¶
1. Import the
cilium-clustermesh secret that you generated in the last
chapter into all of your clusters:
kubectl apply -f clustermesh.yaml
- Restart the cilium-agent in all clusters so it picks up the new cluster
name, cluster id and mounts the
cilium-clustermeshsecret. Cilium will automatically establish connectivity between the clusters.
kubectl -n kube-system delete pod -l k8s-app=cilium
Test pod connectivity between clusters¶
cilium node list to see the full list of nodes discovered. You can run
this command inside any Cilium pod in any cluster:
$ kubectl -n kube-system exec -ti cilium-g6btl cilium node list Name IPv4 Address Endpoint CIDR IPv6 Address Endpoint CIDR cluster5/ip-172-0-117-60.us-west-2.compute.internal 22.214.171.124 10.2.2.0/24 <nil> f00d::a02:200:0:0/112 cluster5/ip-172-0-186-231.us-west-2.compute.internal 126.96.36.199 10.2.3.0/24 <nil> f00d::a02:300:0:0/112 cluster5/ip-172-0-50-227.us-west-2.compute.internal 188.8.131.52 10.2.0.0/24 <nil> f00d::a02:0:0:0/112 cluster5/ip-172-0-51-175.us-west-2.compute.internal 184.108.40.206 10.2.1.0/24 <nil> f00d::a02:100:0:0/112 cluster7/ip-172-0-121-242.us-west-2.compute.internal 220.127.116.11 10.4.2.0/24 <nil> f00d::a04:200:0:0/112 cluster7/ip-172-0-58-194.us-west-2.compute.internal 18.104.22.168 10.4.1.0/24 <nil> f00d::a04:100:0:0/112 cluster7/ip-172-0-60-118.us-west-2.compute.internal 22.214.171.124 10.4.0.0/24 <nil> f00d::a04:0:0:0/112
$ kubectl exec -ti pod-cluster5-xxx curl <pod-ip-cluster7> [...]
Load-balancing with Global Services¶
Establishing load-balancing between clusters is achieved by defining a
Kubernetes service with identical name and namespace in each cluster and adding
io.cilium/global-service: "true"` to declare it global.
Cilium will automatically perform load-balancing to pods in both clusters.
apiVersion: v1 kind: Service metadata: name: rebel-base annotations: io.cilium/global-service: "true" spec: type: ClusterIP ports: - port: 80 selector: name: rebel-base
Deploying a simple example service¶
In cluster 1, deploy:
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.4/examples/kubernetes/clustermesh/global-service-example/cluster1.yaml
In cluster 2, deploy:
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.4/examples/kubernetes/clustermesh/global-service-example/cluster2.yaml
From either cluster, access the global service:
kubectl exec -ti xwing-xxx -- curl rebel-base
You will see replies from pods in both clusters.
As addressing and network security is decoupled, network security enforcement
automatically spans across clusters. Note that Kubernetes security policies are
not automatically distributed across clusters, it is your responsibility to
NetworkPolicy in all clusters.
Allowing specific communication between clusters¶
The following policy illustrates how to allow particular pods to allow
communicate between two clusters. The cluster name refers to the name given via
--cluster-name agent option or
cluster-name ConfigMap option.
apiVersion: "cilium.io/v2" kind: CiliumNetworkPolicy metadata: name: "allow-cross-cluster" description: "Allow x-wing in cluster1 to contact rebel-base in cluster2" spec: endpointSelector: matchLabels: name: x-wing io.cilium.k8s.policy.cluster: cluster1 egress: - toEndpoints: - matchLabels: name: rebel-base io.cilium.k8s.policy.cluster: cluster2
Check the Cilium pod logs for
Look for the following message to ensure that ClusterMesh is initialized:level=info msg="Initializing ClusterMesh routing" path=/var/lib/cilium/clustermesh/ subsys=daemon
As remote clusters are discovered an info log message
New remote cluster discoveredalong with the remote cluster name is logged.
kubectl exec -ti [...] bashin one of the Cilium pods and check the contents of the directory
/var/lib/cilium/clustermesh/. It must contain a configuration file for each remote cluster along with all the required SSL certificates and keys. The filenames must match the cluster names as provided by the
cluster-nameConfigMap option. If the directory is empty or incomplete, regenerate the secret again and ensure that the secret is correctly mounted into the DaemonSet.
cilium node listin one of the Cilium pods and validate that all nodes are discovered correctly. If the node discovery is not working, runcilium kvstore get --recursive cilium/state/nodes/v1/
and check if an entry exists for each node.
When using global services, ensure that the
cilium-operatordeployment is running and healthy. It is responsible to propagate Kubernetes services into the kvstore. You can validate the correct functionality of the operator by running:cilium kvstore get --recursive cilium/state/services/v1/
An entry must exist for each global service
You can confirm the correct propagation of service backend endpoints by running
cilium service listand
cilium bpf lb listto see the complete mapping of service IPs to backend/pod IPs.
cilium debuginfoand look for the section “k8s-service-cache”. In that section, you will find the contents of the service correlation cache. it will list the Kubernetes services and endpoints of the local cluster. It will also have a section
externalEndpointswhich must list all endpoints of remote clusters.
- L7 security policies currently only work across multiple clusters if worker nodes have routes installed allowing to route pod IPs of all clusters. This is given when running in direct routing mode by running a routing daemon or
--auto-direct-node-routesbut won’t work automatically when using tunnel/encapsulation mode.
- The number of clusters that can be connected together is currently limited to 255. This limitation will be lifted in the future when running in direct routing mode or when running in encapsulation mode with encryption enabled.
- Future versions will put an API server before etcd to provide better scalability and simplify the installation to support any etcd support
- Introduction of IPSec and use of ESP or utilization of the traffic class field in the IPv6 header will allow to use more than 8 bits for the cluster-id and thus support more than 256 clusters.