BGP Control Plane Troubleshooting Guide

This document lists common problems, and their solutions, that you may encounter when configuring the BGP Control Plane.

Even though a CiliumBGPPeeringPolicy was applied, BGP peering is not established

Check whether the target Node is selected by the nodeSelector of the CiliumBGPPeeringPolicy. The easiest way to do this is with the cilium bgp peers command:

$ cilium bgp peers
Node                              Local AS   Peer AS   Peer Address   Session State   Uptime   Family         Received   Advertised
node0                             65001      65000     10.0.1.1       active          0s       ipv4/unicast   0          0
                                                                                               ipv6/unicast   0          0
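If you want to check this output from a script instead of by eye, the following is a minimal sketch. It assumes the column layout shown above (a header line, one row per peer, then indented continuation rows for extra address families); the layout may differ between Cilium versions. The function names `parse_bgp_peers` and `not_established` are hypothetical helpers, not part of the Cilium CLI.

```python
from typing import List, Tuple


def parse_bgp_peers(output: str) -> List[dict]:
    """Parse `cilium bgp peers` text output into one dict per peer.

    Assumes the layout shown above: a header line, one row per peer, and
    indented continuation rows listing additional address families.
    """
    lines = [line for line in output.splitlines() if line.strip()]
    peers: List[dict] = []
    for line in lines[1:]:  # skip the header line
        fields = line.split()
        if not line[0].isspace():
            # Peer row: Node, Local AS, Peer AS, Peer Address, Session State,
            # Uptime, Family, Received, Advertised
            node, local_as, peer_as, address, state, uptime, family, rx, tx = fields
            peers.append({
                "node": node,
                "local_as": int(local_as),
                "peer_as": int(peer_as),
                "peer_address": address,
                "state": state,
                "uptime": uptime,
                "families": {family: (int(rx), int(tx))},
            })
        else:
            # Continuation row: Family, Received, Advertised of the previous peer
            family, rx, tx = fields
            peers[-1]["families"][family] = (int(rx), int(tx))
    return peers


def not_established(peers: List[dict]) -> List[Tuple[str, str, str]]:
    """Return (node, peer address, state) for every session that is not up."""
    return [(p["node"], p["peer_address"], p["state"])
            for p in peers if p["state"] != "established"]
```

For the sample output above, not_established(parse_bgp_peers(output)) returns [("node0", "10.0.1.1", "active")]. An empty result list from parse_bgp_peers means no Node was selected at all.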

If the Node is selected correctly, its name and BGP session state are displayed even when the session is not established (in the example above, the session is still in the active state). If nothing is displayed, the nodeSelector probably does not match the Node's labels. If the Node is selected but the state never becomes established, check the BGP settings of both Cilium and the target peer.
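When nothing is displayed, it helps to compare the policy's nodeSelector with the Node's labels (for example, as listed by kubectl get nodes --show-labels). The sketch below evaluates a selector using the general Kubernetes label-selector rules: all matchLabels entries and all matchExpressions requirements must hold. The function name `selector_matches` and the `rack` label in the usage are hypothetical. It treats a missing or empty selector as matching every node; check the CiliumBGPPeeringPolicy reference for your Cilium version to confirm how an empty nodeSelector is handled.

```python
from typing import Dict, Optional


def selector_matches(selector: Optional[dict], labels: Dict[str, str]) -> bool:
    """Evaluate a Kubernetes-style label selector against a Node's labels.

    matchLabels and matchExpressions are ANDed together, as in
    Kubernetes label selectors.
    """
    if not selector:
        # Assumption: an empty selector selects every node
        return True
    for key, value in (selector.get("matchLabels") or {}).items():
        if labels.get(key) != value:
            return False
    for expr in selector.get("matchExpressions") or []:
        key, op = expr["key"], expr["operator"]
        values = expr.get("values") or []
        if op == "In":
            ok = key in labels and labels[key] in values
        elif op == "NotIn":
            ok = key not in labels or labels[key] not in values
        elif op == "Exists":
            ok = key in labels
        elif op == "DoesNotExist":
            ok = key not in labels
        else:
            raise ValueError(f"unknown operator: {op}")
        if not ok:
            return False
    return True
```

For example, a Node labeled rack=rack0 matches {"matchLabels": {"rack": "rack0"}} but not {"matchLabels": {"rack": "rack1"}}; a typo in either the label key or value is enough to deselect the Node.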

Node is selected by a CiliumBGPPeeringPolicy, but the BGP session is not established

You can usually identify the cause from the logs of your peer router or of Cilium. Messages logged by the BGP Control Plane carry the field subsys=bgp-control-plane, which you can use to filter for BGP Control Plane errors:

$ kubectl -n <your namespace> logs <cilium pod running on the target node> | grep bgp-control-plane
...
level=warning msg="sent notification" Data="as number mismatch expected 65003, received 65000" Key=10.0.1.1 Topic=Peer asn=65001 component=gobgp.BgpServerInstance subsys=bgp-control-plane

The example above shows that the BGP session was not established because the configured peerASN (65003) does not match the ASN actually sent by the peer (65000).
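grep is enough for interactive use. If you want to post-process the logs, for example to extract the configured and received ASNs, the following sketch parses the key=value / key="quoted value" fields seen in the log line above. It assumes that log format and the GoBGP message wording "as number mismatch expected X, received Y"; both may change between versions. The names `parse_log_line`, `bgp_control_plane_entries`, and `asn_mismatch` are hypothetical helpers.

```python
import re
from typing import Dict, Iterable, Iterator, Optional, Tuple

# key=value or key="quoted value" pairs, as in the log line above
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')
# Assumed wording of the GoBGP notification shown above
_ASN_MISMATCH = re.compile(r"as number mismatch expected (\d+), received (\d+)")


def parse_log_line(line: str) -> Dict[str, str]:
    """Split one log line into a {field: value} dict, unquoting quoted values."""
    fields = {}
    for key, value in _PAIR.findall(line):
        if value.startswith('"') and value.endswith('"'):
            value = value[1:-1].replace('\\"', '"')
        fields[key] = value
    return fields


def bgp_control_plane_entries(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield parsed entries whose subsys field is bgp-control-plane."""
    for line in lines:
        fields = parse_log_line(line)
        if fields.get("subsys") == "bgp-control-plane":
            yield fields


def asn_mismatch(entry: Dict[str, str]) -> Optional[Tuple[int, int]]:
    """Return (configured peerASN, ASN received from the peer), if present."""
    match = _ASN_MISMATCH.search(entry.get("Data", ""))
    return (int(match.group(1)), int(match.group(2))) if match else None
```

Applied to the log line above, asn_mismatch returns (65003, 65000). You could feed the functions the lines of kubectl ... logs output read from a file or from standard input.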

BGP peering can fail for many reasons, such as a BGP capability mismatch or an incorrect peer IP address. BGP-layer errors usually appear in the logs, but lower-level problems, such as no connectivity to the peer IP or an eBGP peer that is more than one hop away, may not be logged at all. In such cases, packet capture tools like Wireshark or tcpdump can help.
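Before capturing packets, a quick first check is whether a TCP connection to the peer can be opened at all. The sketch below assumes the peer router listens on the standard BGP port 179 and that you run it from the node's host network namespace, since the Cilium agent uses host networking. The function name `bgp_port_reachable` is hypothetical. A successful connection only proves TCP reachability; it says nothing about BGP settings or about TTL handling for an eBGP peer that is more than one hop away, which still calls for tcpdump or Wireshark.

```python
import socket


def bgp_port_reachable(peer_ip: str, port: int = 179, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to peer_ip:port can be opened.

    Only checks basic TCP reachability; it does not validate BGP
    configuration or multihop/TTL behaviour.
    """
    try:
        with socket.create_connection((peer_ip, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or no route to host
        return False
```

For the peer in the examples above, you would call bgp_port_reachable("10.0.1.1").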

An existing BGP session went down immediately after applying a new CiliumBGPPeeringPolicy

A node may be selected by multiple CiliumBGPPeeringPolicy objects, depending on their nodeSelector fields. If more than one policy applies to a node, the BGP Control Plane clears all pre-existing BGP state configured on that node. First, roll back the most recently applied CiliumBGPPeeringPolicy, then check the logs of the node where the BGP session went down. If multiple policies were applied, you should find a log entry like this:

level=error msg="Policy selection failed" component=Controller.Reconcile error="more then one CiliumBGPPeeringPolicy applies to this node, please ensure only a single Policy matches this node's labels" subsys=bgp-control-plane

If you find such a log entry, review the nodeSelector of each policy and make sure that no node is selected by more than one CiliumBGPPeeringPolicy.
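To spot overlapping policies before applying a new one, you can cross-check every policy's selector against every node's labels, for example using data taken from kubectl get ciliumbgppeeringpolicies -o yaml and kubectl get nodes --show-labels. The sketch below is simplified: it assumes the selectors use only matchLabels (no matchExpressions) and treats an empty matchLabels as selecting every node. The names `matches` and `overlapping_nodes`, and the policy and label names in the usage, are hypothetical.

```python
from typing import Dict, List


def matches(match_labels: Dict[str, str], labels: Dict[str, str]) -> bool:
    """matchLabels-only check: every key/value pair must be present on the node."""
    return all(labels.get(k) == v for k, v in match_labels.items())


def overlapping_nodes(
    policies: Dict[str, Dict[str, str]],
    nodes: Dict[str, Dict[str, str]],
) -> Dict[str, List[str]]:
    """Map each node selected by more than one policy to those policy names.

    policies: policy name -> matchLabels of its nodeSelector
    nodes:    node name   -> node labels
    """
    result: Dict[str, List[str]] = {}
    for node, labels in nodes.items():
        selecting = sorted(name for name, sel in policies.items() if matches(sel, labels))
        if len(selecting) > 1:
            result[node] = selecting
    return result
```

For instance, with a policy selecting rack=rack0 and another selecting bgp=enabled, a node carrying both labels is reported with both policy names, while nodes matching only one policy are not reported.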