Installation on Microsoft Azure Cloud (beta)¶
This guide explains how to configure Cilium in Azure Cloud to use Azure IPAM (beta).
Note
This is a beta feature. See the Limitations and Troubleshooting sections if you face any issues. Please provide feedback and file a GitHub issue if you experience any problems.
Create an Azure Kubernetes cluster¶
Set up a Kubernetes cluster on Azure. You can use any method available as long as your Kubernetes cluster has CNI enabled in the kubelet configuration. To keep this guide simple, we will set up a managed AKS cluster:
Note
Do NOT specify the '--network-policy' flag when creating the cluster, as this will cause the Azure CNI plugin to push down unwanted iptables rules.
export RESOURCE_GROUP_NAME=aks-test
export CLUSTER_NAME=aks-test
export LOCATION=westeurope
az group create --name $RESOURCE_GROUP_NAME --location $LOCATION
az aks create \
--resource-group $RESOURCE_GROUP_NAME \
--name $CLUSTER_NAME \
--location $LOCATION \
--node-count 2 \
--network-plugin azure
Note
When setting up AKS, it is important to use the flag --network-plugin azure to ensure that CNI mode is enabled.
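As an optional sanity check (not part of the original instructions), you can verify that the cluster was created in CNI mode by inspecting its network profile; assuming the environment variables defined above are still set, the following should print azure:

az aks show --resource-group $RESOURCE_GROUP_NAME --name $CLUSTER_NAME --query networkProfile.networkPlugin -o tsv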
Create a service principal for cilium-operator¶
In order to allow cilium-operator to interact with the Azure API, a service principal is required. You can reuse an existing service principal if you want but it is recommended to create a dedicated service principal for cilium-operator:
az ad sp create-for-rbac --name cilium-operator > azure-sp.json
The contents of azure-sp.json should look like this:
{
"appId": "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa",
"displayName": "cilium-operator",
"name": "http://cilium-operator",
"password": "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb",
"tenant": "cccccccc-cccc-cccc-cccc-cccccccccccc"
}
Extract the relevant credentials to access the Azure API:
AZURE_SUBSCRIPTION_ID="$(az account show | jq -r .id)"
AZURE_CLIENT_ID="$(jq -r .appId < azure-sp.json)"
AZURE_CLIENT_SECRET="$(jq -r .password < azure-sp.json)"
AZURE_TENANT_ID="$(jq -r .tenant < azure-sp.json)"
AZURE_NODE_RESOURCE_GROUP="$(az aks show --resource-group $RESOURCE_GROUP_NAME --name $CLUSTER_NAME | jq -r .nodeResourceGroup)"
Note
AZURE_NODE_RESOURCE_GROUP must be set to the resource group of the node pool, not the resource group of the AKS cluster.
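Before deploying Cilium, you may want to confirm that none of these variables ended up empty. A minimal sketch, assuming a Bash-compatible shell:

for v in AZURE_SUBSCRIPTION_ID AZURE_CLIENT_ID AZURE_CLIENT_SECRET AZURE_TENANT_ID AZURE_NODE_RESOURCE_GROUP; do
  # report whether each variable is set, without printing the secret values themselves
  [ -n "${!v}" ] && echo "$v is set" || echo "$v is EMPTY"
done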
Retrieve Credentials to access cluster¶
az aks get-credentials --name $CLUSTER_NAME --resource-group $RESOURCE_GROUP_NAME
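To confirm that kubectl now points at the new cluster, you can optionally list its nodes:

kubectl get nodes -o wide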
Deploy Cilium¶
Note
First, make sure you have Helm 3 installed.
If you have (or are planning to have) Helm 2 charts (and Tiller) in the same cluster, there should be no issue as both versions are mutually compatible in order to support gradual migration. The Cilium chart targets Helm 3 (v3.0.3 and above).
Setup Helm repository:
helm repo add cilium https://helm.cilium.io/
Deploy Cilium release via Helm:
helm install cilium cilium/cilium --version 1.8.7 \
  --namespace kube-system \
  --set global.azure.enabled=true \
  --set global.azure.resourceGroup=$AZURE_NODE_RESOURCE_GROUP \
  --set global.azure.subscriptionID=$AZURE_SUBSCRIPTION_ID \
  --set global.azure.tenantID=$AZURE_TENANT_ID \
  --set global.azure.clientID=$AZURE_CLIENT_ID \
  --set global.azure.clientSecret=$AZURE_CLIENT_SECRET \
  --set global.tunnel=disabled \
  --set config.ipam=azure \
  --set global.masquerade=false \
  --set global.nodeinit.enabled=true
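As an optional follow-up check (not part of the original guide), you can confirm that the release was created and that the IPAM mode was rendered into the agent configuration, assuming the chart's default cilium-config ConfigMap name:

helm status cilium --namespace kube-system
kubectl -n kube-system get configmap cilium-config -o yaml | grep ipam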
Restart unmanaged Pods¶
If you did not set nodeinit.restartPods=true in the Helm options when deploying Cilium, then unmanaged pods need to be restarted manually. Restart all already running pods which are not running in host-networking mode to ensure that Cilium starts managing them. This is required to ensure that all pods which have been running before Cilium was deployed have network connectivity provided by Cilium and that NetworkPolicy applies to them:
kubectl get pods --all-namespaces -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,HOSTNETWORK:.spec.hostNetwork --no-headers=true | grep '<none>' | awk '{print "-n "$1" "$2}' | xargs -L 1 -r kubectl delete pod
pod "event-exporter-v0.2.3-f9c896d75-cbvcz" deleted
pod "fluentd-gcp-scaler-69d79984cb-nfwwk" deleted
pod "heapster-v1.6.0-beta.1-56d5d5d87f-qw8pv" deleted
pod "kube-dns-5f8689dbc9-2nzft" deleted
pod "kube-dns-5f8689dbc9-j7x5f" deleted
pod "kube-dns-autoscaler-76fcd5f658-22r72" deleted
pod "kube-state-metrics-7d9774bbd5-n6m5k" deleted
pod "l7-default-backend-6f8697844f-d2rq2" deleted
pod "metrics-server-v0.3.1-54699c9cc8-7l5w2" deleted
Note
This may error out on macOS due to -r being unsupported by xargs. In this case you can safely run this command without -r, with the symptom that it will hang if there are no pods to restart. You can stop this with ctrl-c.
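On macOS, a variant that avoids the -r flag entirely is to collect the pod list first and only invoke the delete if it is non-empty. A sketch using the same selection logic as the command above:

PODS=$(kubectl get pods --all-namespaces -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,HOSTNETWORK:.spec.hostNetwork --no-headers=true | grep '<none>' | awk '{print "-n "$1" "$2}')
# only delete if there is at least one unmanaged pod
[ -n "$PODS" ] && echo "$PODS" | xargs -L 1 kubectl delete pod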
Validate the Installation¶
You can monitor Cilium and all required components as they are being installed:
kubectl -n kube-system get pods --watch
NAME READY STATUS RESTARTS AGE
cilium-operator-cb4578bc5-q52qk 0/1 Pending 0 8s
cilium-s8w5m 0/1 PodInitializing 0 7s
coredns-86c58d9df4-4g7dd 0/1 ContainerCreating 0 8m57s
coredns-86c58d9df4-4l6b2 0/1 ContainerCreating 0 8m57s
It may take a couple of minutes for all components to come up:
cilium-operator-cb4578bc5-q52qk 1/1 Running 0 4m13s
cilium-s8w5m 1/1 Running 0 4m12s
coredns-86c58d9df4-4g7dd 1/1 Running 0 13m
coredns-86c58d9df4-4l6b2 1/1 Running 0 13m
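As an additional optional check, you can run cilium status inside one of the agent pods; among other things, its output includes the agent's IPAM allocation summary:

kubectl -n kube-system exec ds/cilium -- cilium status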
Deploy the connectivity test¶
You can deploy the “connectivity-check” to test connectivity between pods. It is recommended to create a separate namespace for this.
kubectl create ns cilium-test
Deploy the check with:
kubectl apply -n cilium-test -f https://raw.githubusercontent.com/cilium/cilium/v1.8/examples/kubernetes/connectivity-check/connectivity-check.yaml
It will deploy a series of deployments which will use various connectivity paths to connect to each other. The connectivity paths include variants with and without service load-balancing and various network policy combinations. The pod name indicates the connectivity variant and the readiness and liveness gate indicates success or failure of the test:
$ kubectl get pods -n cilium-test
NAME READY STATUS RESTARTS AGE
echo-a-6788c799fd-42qxx 1/1 Running 0 69s
echo-b-59757679d4-pjtdl 1/1 Running 0 69s
echo-b-host-f86bd784d-wnh4v 1/1 Running 0 68s
host-to-b-multi-node-clusterip-585db65b4d-x74nz 1/1 Running 0 68s
host-to-b-multi-node-headless-77c64bc7d8-kgf8p 1/1 Running 0 67s
pod-to-a-allowed-cnp-87b5895c8-bfw4x 1/1 Running 0 68s
pod-to-a-b76ddb6b4-2v4kb 1/1 Running 0 68s
pod-to-a-denied-cnp-677d9f567b-kkjp4 1/1 Running 0 68s
pod-to-b-intra-node-nodeport-8484fb6d89-bwj8q 1/1 Running 0 68s
pod-to-b-multi-node-clusterip-f7655dbc8-h5bwk 1/1 Running 0 68s
pod-to-b-multi-node-headless-5fd98b9648-5bjj8 1/1 Running 0 68s
pod-to-b-multi-node-nodeport-74bd8d7bd5-kmfmm 1/1 Running 0 68s
pod-to-external-1111-7489c7c46d-jhtkr 1/1 Running 0 68s
pod-to-external-fqdn-allow-google-cnp-b7b6bcdcb-97p75 1/1 Running 0 68s
Note
If you deploy the connectivity check to a single node cluster, pods that check multi-node functionalities will remain in the Pending state. This is expected since these pods need at least 2 nodes to be scheduled successfully.
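Once the tests have passed, the connectivity check workloads can be removed by deleting the namespace created for them:

kubectl delete ns cilium-test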
Specify Environment Variables¶
Specify the namespace in which Cilium is installed as the CILIUM_NAMESPACE environment variable. Subsequent commands reference this environment variable.
export CILIUM_NAMESPACE=kube-system
Enable Hubble¶
Hubble is a fully distributed networking and security observability platform for cloud native workloads. It is built on top of Cilium and eBPF to enable deep visibility into the communication and behavior of services as well as the networking infrastructure in a completely transparent manner.
Hubble can be configured to be in local mode or distributed mode (beta).
In local mode, Hubble listens on a UNIX domain socket. You can connect to a Hubble instance by running the hubble command from inside the Cilium pod. This provides networking visibility for traffic observed by the local Cilium agent.

helm upgrade cilium cilium/cilium --version 1.8.7 \
  --namespace $CILIUM_NAMESPACE \
  --reuse-values \
  --set global.hubble.enabled=true \
  --set global.hubble.metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,http}"
In distributed mode (beta), Hubble listens on a TCP port on the host network. This allows Hubble Relay to communicate with all the Hubble instances in the cluster. Hubble CLI and Hubble UI in turn connect to Hubble Relay to provide cluster-wide networking visibility.
Warning
In Distributed mode, Hubble runs a gRPC service over plain-text HTTP on the host network without any authentication/authorization. The main consequence is that anybody who can reach the Hubble gRPC service can obtain all the networking metadata from the host. It is therefore strongly discouraged to enable distributed mode in a production environment.
helm upgrade cilium cilium/cilium --version 1.8.7 \
  --namespace $CILIUM_NAMESPACE \
  --reuse-values \
  --set global.hubble.enabled=true \
  --set global.hubble.listenAddress=":4244" \
  --set global.hubble.metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,http}" \
  --set global.hubble.relay.enabled=true \
  --set global.hubble.ui.enabled=true
Restart the Cilium daemonset to allow Cilium agent to pick up the ConfigMap changes:
kubectl rollout restart -n $CILIUM_NAMESPACE ds/cilium
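You can optionally wait for the restart to complete before validating Hubble:

kubectl rollout status -n $CILIUM_NAMESPACE ds/cilium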
Pick one Cilium instance and validate that Hubble is properly configured to listen on a UNIX domain socket:
kubectl exec -n $CILIUM_NAMESPACE -t ds/cilium -- hubble observe
(Distributed mode only) To validate that Hubble Relay is running, install the hubble CLI.

Linux: Download the latest hubble release:

export HUBBLE_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/hubble/master/stable.txt)
curl -LO "https://github.com/cilium/hubble/releases/download/$HUBBLE_VERSION/hubble-linux-amd64.tar.gz"
curl -LO "https://github.com/cilium/hubble/releases/download/$HUBBLE_VERSION/hubble-linux-amd64.tar.gz.sha256sum"
sha256sum --check hubble-linux-amd64.tar.gz.sha256sum
tar zxf hubble-linux-amd64.tar.gz

and move the hubble CLI to a directory listed in the $PATH environment variable. For example:

sudo mv hubble /usr/local/bin
macOS: Download the latest hubble release:

export HUBBLE_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/hubble/master/stable.txt)
curl -LO "https://github.com/cilium/hubble/releases/download/$HUBBLE_VERSION/hubble-darwin-amd64.tar.gz"
curl -LO "https://github.com/cilium/hubble/releases/download/$HUBBLE_VERSION/hubble-darwin-amd64.tar.gz.sha256sum"
shasum -a 256 -c hubble-darwin-amd64.tar.gz.sha256sum
tar zxf hubble-darwin-amd64.tar.gz

and move the hubble CLI to a directory listed in the $PATH environment variable. For example:

sudo mv hubble /usr/local/bin
Windows: Download the latest hubble release:

curl -LO "https://raw.githubusercontent.com/cilium/hubble/master/stable.txt"
set /p HUBBLE_VERSION=<stable.txt
curl -LO "https://github.com/cilium/hubble/releases/download/%HUBBLE_VERSION%/hubble-windows-amd64.tar.gz"
curl -LO "https://github.com/cilium/hubble/releases/download/%HUBBLE_VERSION%/hubble-windows-amd64.tar.gz.sha256sum"
certutil -hashfile hubble-windows-amd64.tar.gz SHA256
type hubble-windows-amd64.tar.gz.sha256sum
:: verify that the checksum from the two commands above match
tar zxf hubble-windows-amd64.tar.gz

and move the hubble.exe CLI to a directory listed in the %PATH% environment variable after extracting it from the tarball.

Once the hubble CLI is installed, set up a port forwarding for the hubble-relay service and run the hubble observe command:

kubectl port-forward -n $CILIUM_NAMESPACE svc/hubble-relay 4245:80
hubble observe --server localhost:4245
(For Linux / macOS) For convenience, you may set and export the HUBBLE_DEFAULT_SOCKET_PATH environment variable:

$ export HUBBLE_DEFAULT_SOCKET_PATH=localhost:4245

This will allow you to use the hubble status and hubble observe commands without having to specify the server address via the --server flag.

(Distributed mode only) To validate that Hubble UI is properly configured, set up a port forwarding for the hubble-ui service:

kubectl port-forward -n $CILIUM_NAMESPACE svc/hubble-ui 12000:80
and then open http://localhost:12000/.
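Once connected through Hubble Relay, you may find it useful to narrow the flow output down, for example to the connectivity test namespace. The filter flags below exist in recent hubble CLI releases, but check hubble observe --help for the exact set supported by your version:

hubble observe --server localhost:4245 --namespace cilium-test --follow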
Limitations¶
- All VMs and VM scale sets used in a cluster must belong to the same resource group.
Troubleshooting¶
- If kubectl exec to a pod fails to connect, restarting the tunnelfront pod may help.
- Pods may fail to gain a .spec.hostNetwork status even if restarted and managed by Cilium.
- If some connectivity tests fail to reach the ready state, you may need to restart the unmanaged pods again.
- Some connectivity tests may fail. This is being tracked in Cilium GitHub issue #12113. hubble observe may report one or more nodes being unavailable and hubble-ui may fail to connect to the backends.
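When troubleshooting IP allocation issues, a useful first step (not from the original guide) is to check the cilium-operator logs for Azure API errors, assuming the chart's default deployment name:

kubectl -n kube-system logs deployment/cilium-operator | grep -i azure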