Installation on OpenShift OKD

OpenShift Requirements

  1. Choose your preferred cloud provider. This guide was tested on AWS, Azure & GCP.
  2. Read the OpenShift documentation to find out about provider-specific prerequisites.
  3. Get the OpenShift Installer (see the sketch below).
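
For example, a minimal sketch of getting the installer on a Linux host; the release version and download URL below are placeholders, pick a current release from the OKD releases page on GitHub:

# Sketch only: <VERSION> is a placeholder for the OKD release you choose
# from the OKD GitHub releases page.
curl -LO "https://github.com/okd-project/okd/releases/download/<VERSION>/openshift-install-linux-<VERSION>.tar.gz"
tar -xzf "openshift-install-linux-<VERSION>.tar.gz" openshift-install
sudo install -m 0755 openshift-install /usr/local/bin/openshift-install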

Note

It’s highly recommended to read the docs, unless you have installed OpenShift in the past. Here are a few notes that you may find useful.

  • with the AWS provider, openshift-install will not work properly when MFA credentials are stored in ~/.aws/credentials; traditional (static) credentials are required
  • with the Azure provider, openshift-install will prompt for credentials and store them in ~/.azure/osServicePrincipal.json; it does not simply pick up az login credentials. It's recommended to set up a dedicated service principal and use it (see the sketch after this list)
  • with the GCP provider, openshift-install will only work with a service account key, which has to be set using the GOOGLE_CREDENTIALS environment variable (e.g. GOOGLE_CREDENTIALS=service-account.json). Follow the OpenShift Installer documentation to assign the required roles to your service account.
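
For illustration, here is a hedged sketch of the credential setups described above; the service principal name and the subscription ID are placeholders, not values used elsewhere in this guide:

# AWS: use static (non-MFA) access keys, e.g. as written by `aws configure`.
aws configure

# Azure: create a dedicated service principal for the installer; the name and
# subscription ID below are placeholders. openshift-install will still prompt
# for these values and cache them in ~/.azure/osServicePrincipal.json.
az ad sp create-for-rbac --name openshift-okd-installer \
   --role Contributor --scopes "/subscriptions/<SUBSCRIPTION_ID>"

# GCP: point the installer at a service account key file.
export GOOGLE_CREDENTIALS=service-account.json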

Create an OpenShift OKD Cluster

First, set cluster name:

CLUSTER_NAME="cluster-1"

Now, create configuration files:

Note

The sample output below shows the AWS provider, but the process works the same way with other providers.

$ openshift-install create install-config --dir "${CLUSTER_NAME}"
? SSH Public Key ~/.ssh/id_rsa.pub
? Platform aws
INFO Credentials loaded from default AWS environment variables
? Region eu-west-1
? Base Domain openshift-test-1.cilium.rocks
? Cluster Name cluster-1
? Pull Secret [? for help] **********************************

Then set networkType: Cilium:

sed -i 's/networkType: OVNKubernetes/networkType: Cilium/' "${CLUSTER_NAME}/install-config.yaml"

The resulting configuration will look like this:

apiVersion: v1
baseDomain: openshift-test-1.cilium.rocks
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform: {}
  replicas: 3
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform: {}
  replicas: 3
metadata:
  creationTimestamp: null
  name: cluster-1
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: Cilium
  serviceNetwork:
  - 172.30.0.0/16
platform:
  aws:
    region: eu-west-1
publish: External
pullSecret: '{"auths":{"fake":{"auth": "bar"}}}'
sshKey: |
  ssh-rsa <REDACTED>

You may wish to make a few changes, e.g. increase the number of nodes. If you change any of the CIDRs, you will need to make sure that the Helm values used below reflect those changes: clusterNetwork should match clusterPoolIPv4PodCIDR and clusterPoolIPv4MaskSize. Also make sure that clusterNetwork does not conflict with machineNetwork (which represents the VPC CIDR in AWS).
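
For instance, a hypothetical change of the pod CIDR (the values below are only an example) and the Helm values that would have to match it in the helm template step further down:

# Hypothetical example: clusterNetwork changed to 10.132.0.0/14 (hostPrefix: 23)
# in install-config.yaml. The Helm values used below would then have to be:
#   ipam.operator.clusterPoolIPv4PodCIDR=10.132.0.0/14
#   ipam.operator.clusterPoolIPv4MaskSize=23
#   nativeRoutingCIDR=10.132.0.0/14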

Next, generate OpenShift manifests:

openshift-install create manifests --dir "${CLUSTER_NAME}"

Now, define the cilium namespace:

cat << EOF > "${CLUSTER_NAME}/manifests/cluster-network-03-cilium-namespace.yaml"
apiVersion: v1
kind: Namespace
metadata:
  name: cilium
  annotations:
    # node selector is required to make cilium-operator run on control plane nodes
    openshift.io/node-selector: ""
  labels:
    name: cilium
    # run level sets priority for Cilium to be deployed prior to other components
    openshift.io/run-level: "0"
    # enable cluster logging for Cilium namespace
    openshift.io/cluster-logging: "true"
    # enable cluster monitoring for Cilium namespace
    openshift.io/cluster-monitoring: "true"
EOF

Note

First, make sure you have Helm 3 installed. Helm 2 is no longer supported.
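
You can verify the installed version, for example:

helm version --short   # should report a v3.x version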

Set up the Helm repository:

helm repo add cilium https://helm.cilium.io/

Next, render the Cilium manifests:

helm template cilium cilium/cilium --version 1.9.0 \
   --namespace cilium \
   --set ipam.mode=cluster-pool \
   --set cni.binPath=/var/lib/cni/bin \
   --set cni.confPath=/var/run/multus/cni/net.d \
   --set ipam.operator.clusterPoolIPv4PodCIDR=10.128.0.0/14 \
   --set ipam.operator.clusterPoolIPv4MaskSize=23 \
   --set nativeRoutingCIDR=10.128.0.0/14 \
   --set bpf.masquerade=false \
   --set endpointRoutes.enabled=true \
   --output-dir "${PWD}"

Copy the Cilium manifests to ${CLUSTER_NAME}/manifests:

for resource in cilium/templates/*; do
    cp "${resource}" "${CLUSTER_NAME}/manifests/cluster-network-04-cilium-$(basename "${resource}")"
done
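
Optionally, confirm that the Cilium manifests are now in place next to the generated OpenShift manifests:

ls "${CLUSTER_NAME}/manifests/" | grep cluster-network-04-cilium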

Create the cluster:

Note

The sample output below shows the AWS provider, but the process works the same way with other providers.

$ openshift-install create cluster --dir "${CLUSTER_NAME}"
WARNING   Discarding the Bootstrap Ignition Config that was provided in the target directory because its dependencies are dirty and it needs to be regenerated
INFO Consuming OpenShift Install (Manifests) from target directory
INFO Consuming Master Machines from target directory
INFO Consuming Worker Machines from target directory
INFO Consuming Bootstrap Ignition Config from target directory
INFO Consuming Common Manifests from target directory
INFO Consuming Openshift Manifests from target directory
INFO Credentials loaded from default AWS environment variables
INFO Creating infrastructure resources...
INFO Waiting up to 20m0s for the Kubernetes API at https://api.cluster-1.openshift-test-1.cilium.rocks:6443...
INFO API v1.18.3 up
INFO Waiting up to 40m0s for bootstrapping to complete...

Next, the firewall configuration must be updated to allow Cilium ports. Note that openshift-install does not support custom firewall rules, so you will need to use one of the following scripts if you are using AWS or GCP. Azure does not need additional configuration.

Warning

You need to execute the following commands to configure the firewall rules as soon as INFO Waiting up to 40m0s for bootstrapping to complete... appears in the logs, or the installation will fail. These changes only need to be applied once; OpenShift will not override them.

AWS: this script depends on jq and the AWS CLI (aws). Make sure to run it in the same working directory where the ${CLUSTER_NAME} directory is present.

infraID="$(jq -r < "${CLUSTER_NAME}/metadata.json" '.infraID')"
aws_region="$(jq -r < "${CLUSTER_NAME}/metadata.json" '.aws.region')"
cluster_tag="$(jq -r < "${CLUSTER_NAME}/metadata.json" '.aws.identifier[0] | to_entries | "Name=tag:\(.[0].key),Values=\(.[0].value)"')"

worker_sg="$(aws ec2 describe-security-groups --region "${aws_region}" --filters "${cluster_tag}" "Name=tag:Name,Values=${infraID}-worker-sg" | jq -r '.SecurityGroups[0].GroupId')"
master_sg="$(aws ec2 describe-security-groups --region "${aws_region}" --filters "${cluster_tag}" "Name=tag:Name,Values=${infraID}-master-sg" | jq -r '.SecurityGroups[0].GroupId')"

aws ec2 authorize-security-group-ingress --region "${aws_region}" \
   --ip-permissions \
      "IpProtocol=udp,FromPort=8472,ToPort=8472,UserIdGroupPairs=[{GroupId=${worker_sg}},{GroupId=${master_sg}}]" \
      "IpProtocol=tcp,FromPort=4240,ToPort=4240,UserIdGroupPairs=[{GroupId=${worker_sg}},{GroupId=${master_sg}}]" \
   --group-id "${worker_sg}"

aws ec2 authorize-security-group-ingress --region "${aws_region}" \
   --ip-permissions \
      "IpProtocol=udp,FromPort=8472,ToPort=8472,UserIdGroupPairs=[{GroupId=${worker_sg}},{GroupId=${master_sg}}]" \
      "IpProtocol=tcp,FromPort=4240,ToPort=4240,UserIdGroupPairs=[{GroupId=${worker_sg}},{GroupId=${master_sg}}]" \
   --group-id "${master_sg}"
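
Optionally, you can verify that the rules were added, for example:

aws ec2 describe-security-groups --region "${aws_region}" \
   --group-ids "${worker_sg}" "${master_sg}" \
   | jq '.SecurityGroups[].IpPermissions[] | select(.FromPort == 4240 or .FromPort == 8472)'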

GCP: this script depends on jq and the Google Cloud SDK (gcloud). Make sure to run it in the same working directory where the ${CLUSTER_NAME} directory is present.

infraID="$(jq -r < "${CLUSTER_NAME}/metadata.json" '.infraID')"
gcp_projectID="$(jq -r < "${CLUSTER_NAME}/metadata.json" '.gcp.projectID')"

gcloud compute firewall-rules create \
   --project="${gcp_projectID}" \
   --network="${infraID}-network" \
   --allow=tcp:4240,udp:8472,icmp \
   --source-tags="${infraID}-worker,${infraID}-master" \
   --target-tags="${infraID}-worker,${infraID}-master" \
     "${infraID}-cilium"

Accessing the cluster

To access the cluster, you will need to use the kubeconfig file from the ${CLUSTER_NAME}/auth directory:

export KUBECONFIG="${CLUSTER_NAME}/auth/kubeconfig"
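
For example, to confirm that the cluster is reachable and the Cilium pods are running:

kubectl get nodes
kubectl -n cilium get pods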

Prepare cluster for Cilium connectivity test

In order for the Cilium connectivity test pods to run on OpenShift, a simple custom SecurityContextConstraints object is required. It allows the hostPort/hostNetwork usage that some of the connectivity test pods rely on; it sets only allowHostPorts and allowHostNetwork without granting any other privileges.

kubectl apply -f - << EOF
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: cilium-test
allowHostPorts: true
allowHostNetwork: true
users:
  - system:serviceaccount:cilium-test:default
priority: null
readOnlyRootFilesystem: false
runAsUser:
  type: MustRunAsRange
seLinuxContext:
  type: MustRunAs
volumes: null
allowHostDirVolumePlugin: false
allowHostIPC: false
allowHostPID: false
allowPrivilegeEscalation: false
allowPrivilegedContainer: false
allowedCapabilities: null
defaultAddCapabilities: null
requiredDropCapabilities: null
groups: null
EOF
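
You can confirm the object was created with:

kubectl get scc cilium-test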

Deploy the connectivity test

You can deploy the “connectivity-check” to test connectivity between pods. It is recommended to create a separate namespace for this.

kubectl create ns cilium-test

Deploy the check with:

kubectl apply -n cilium-test -f https://raw.githubusercontent.com/cilium/cilium/v1.9/examples/kubernetes/connectivity-check/connectivity-check.yaml

This deploys a series of deployments that use various connectivity paths to connect to each other. The connectivity paths include paths with and without service load-balancing, as well as various network policy combinations. The pod name indicates the connectivity variant, and the readiness and liveness gates indicate the success or failure of the test:

$ kubectl get pods -n cilium-test
NAME                                                     READY   STATUS    RESTARTS   AGE
echo-a-76c5d9bd76-q8d99                                  1/1     Running   0          66s
echo-b-795c4b4f76-9wrrx                                  1/1     Running   0          66s
echo-b-host-6b7fc94b7c-xtsff                             1/1     Running   0          66s
host-to-b-multi-node-clusterip-85476cd779-bpg4b          1/1     Running   0          66s
host-to-b-multi-node-headless-dc6c44cb5-8jdz8            1/1     Running   0          65s
pod-to-a-79546bc469-rl2qq                                1/1     Running   0          66s
pod-to-a-allowed-cnp-58b7f7fb8f-lkq7p                    1/1     Running   0          66s
pod-to-a-denied-cnp-6967cb6f7f-7h9fn                     1/1     Running   0          66s
pod-to-b-intra-node-nodeport-9b487cf89-6ptrt             1/1     Running   0          65s
pod-to-b-multi-node-clusterip-7db5dfdcf7-jkjpw           1/1     Running   0          66s
pod-to-b-multi-node-headless-7d44b85d69-mtscc            1/1     Running   0          66s
pod-to-b-multi-node-nodeport-7ffc76db7c-rrw82            1/1     Running   0          65s
pod-to-external-1111-d56f47579-d79dz                     1/1     Running   0          66s
pod-to-external-fqdn-allow-google-cnp-78986f4bcf-btjn7   0/1     Running   0          66s

Note

If you deploy the connectivity check to a single-node cluster, pods that check multi-node functionality will remain in the Pending state. This is expected since these pods need at least two nodes to be scheduled successfully.

Cleanup after connectivity test

Remove cilium-test namespace:

kubectl delete ns cilium-test

Remove SecurityContextConstraints:

kubectl delete scc cilium-test