Monitoring & Metrics
Cilium and Hubble can both be configured to serve Prometheus metrics. Prometheus is a pluggable metrics collection and storage system that can act as a data source for Grafana, a metrics visualization frontend. Unlike push-based collectors such as statsd, Prometheus pulls metrics from each source.
Cilium and Hubble metrics can be enabled independently of each other.
Cilium Metrics
Cilium metrics provide insights into the state of Cilium itself, namely of the cilium-agent, cilium-envoy, and cilium-operator processes.
To run Cilium with Prometheus metrics enabled, deploy it with the prometheus.enabled=true Helm value set.
Cilium metrics are exported under the cilium_ Prometheus namespace. Envoy metrics are exported under the envoy_ Prometheus namespace, of which the Cilium-defined metrics are exported under the envoy_cilium_ namespace. When running and collecting in Kubernetes, they will be tagged with a pod name and namespace.
Installation
You can enable metrics for cilium-agent (including Envoy) with the Helm value prometheus.enabled=true. To enable metrics for cilium-operator, use operator.prometheus.enabled=true.
```
helm install cilium cilium/cilium --version 1.14.4 \
  --namespace kube-system \
  --set prometheus.enabled=true \
  --set operator.prometheus.enabled=true
```
The ports can be configured via prometheus.port, envoy.prometheus.port, or operator.prometheus.port respectively.
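For example, a sketch of overriding these ports at install time (the values shown simply restate the defaults used elsewhere on this page):

```
helm install cilium cilium/cilium --version 1.14.4 \
  --namespace kube-system \
  --set prometheus.enabled=true \
  --set prometheus.port=9962 \
  --set envoy.prometheus.port=9964 \
  --set operator.prometheus.enabled=true \
  --set operator.prometheus.port=9963
```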
When metrics are enabled, all Cilium components will have the following annotations. They can be used to signal Prometheus whether to scrape metrics:

```
prometheus.io/scrape: true
prometheus.io/port: 9962
```
To collect Envoy metrics, the Cilium chart will create a Kubernetes headless service named cilium-agent with the prometheus.io/scrape: 'true' annotation set:

```
prometheus.io/scrape: true
prometheus.io/port: 9964
```
This additional headless service is needed because each component can only carry one Prometheus scrape and port annotation.
Prometheus will pick up the Cilium and Envoy metrics automatically if the following option is set in the scrape_configs section:

```
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: (.+):(?:\d+);(\d+)
        replacement: ${1}:${2}
        target_label: __address__
```
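If you want to verify that the agent is serving metrics before wiring up Prometheus, a quick sketch (assuming the default cilium DaemonSet name created by the chart):

```
# Forward the agent's default metrics port (9962) to your machine
kubectl -n kube-system port-forward ds/cilium 9962:9962

# In another terminal, fetch a few metric lines
curl -s http://localhost:9962/metrics | grep '^cilium_' | head
```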
Hubble Metrics
While Cilium metrics let you monitor the state of Cilium itself, Hubble metrics let you monitor the network behavior of your Cilium-managed Kubernetes pods with respect to connectivity and security.
Installation
To deploy Cilium with Hubble metrics enabled, you need to enable Hubble with hubble.enabled=true and provide a set of Hubble metrics you want to enable via hubble.metrics.enabled.
Some of the metrics can also be configured with additional options. See the Hubble exported metrics section for the full list of available metrics and their options.
```
helm install cilium cilium/cilium --version 1.14.4 \
  --namespace kube-system \
  --set prometheus.enabled=true \
  --set operator.prometheus.enabled=true \
  --set hubble.enabled=true \
  --set hubble.metrics.enableOpenMetrics=true \
  --set hubble.metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,httpV2:exemplars=true;labelsContext=source_ip\,source_namespace\,source_workload\,destination_ip\,destination_namespace\,destination_workload\,traffic_direction}"
```
The port of the Hubble metrics can be configured with the hubble.metrics.port Helm value.
Note
L7 metrics such as HTTP are only emitted for pods that enable Layer 7 Protocol Visibility.
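For illustration, a minimal sketch of an L7-aware CiliumNetworkPolicy that sends ingress HTTP on port 80 through the proxy so that HTTP metrics are emitted for the selected pods. The policy name and pod selector are placeholders, and note that applying an ingress policy also restricts ingress traffic to what the policy allows:

```
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: l7-http-visibility        # illustrative name
spec:
  endpointSelector:
    matchLabels:
      app: my-app                 # illustrative pod selector
  ingress:
    - toPorts:
        - ports:
            - port: "80"
              protocol: TCP
          rules:
            http:
              - {}                # match all HTTP requests; routes them through the L7 proxy
```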
When deployed with a non-empty hubble.metrics.enabled Helm value, the Cilium chart will create a Kubernetes headless service named hubble-metrics with the prometheus.io/scrape: 'true' annotation set:

```
prometheus.io/scrape: true
prometheus.io/port: 9965
```
Set the following options in the scrape_configs section of Prometheus to have it scrape all Hubble metrics from the endpoints automatically:

```
scrape_configs:
  - job_name: 'kubernetes-endpoints'
    scrape_interval: 30s
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: (.+)(?::\d+);(\d+)
        replacement: $1:$2
```
OpenMetrics
Additionally, you can opt in to OpenMetrics by setting hubble.metrics.enableOpenMetrics=true.
Enabling OpenMetrics configures the Hubble metrics endpoint to support exporting metrics in OpenMetrics format when explicitly requested by clients.
Using OpenMetrics supports additional functionality such as Exemplars, which enables associating metrics with traces by embedding trace IDs into the exported metrics.
Prometheus needs to be configured to take advantage of OpenMetrics and will only scrape exemplars when the exemplars storage feature is enabled.
OpenMetrics imposes a few additional requirements on metrics names and labels, so this functionality is currently opt-in, though we believe all of the Hubble metrics conform to the OpenMetrics requirements.
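For example, Prometheus itself must be started with its exemplar storage feature flag before it will scrape exemplars (this is a Prometheus flag, not a Cilium or Helm setting):

```
prometheus --config.file=/etc/prometheus/prometheus.yml \
  --enable-feature=exemplar-storage
```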
Cluster Mesh API Server Metrics
Cluster Mesh API Server metrics provide insights into the state of the clustermesh-apiserver process, the kvstoremesh process (if enabled), and the sidecar etcd instance.
Cluster Mesh API Server metrics are exported under the cilium_clustermesh_apiserver_ Prometheus namespace. KVStoreMesh metrics are exported under the cilium_kvstoremesh_ Prometheus namespace. Etcd metrics are exported under the etcd_ Prometheus namespace.
Installation
You can enable the metrics for the different Cluster Mesh API Server components by setting the following values:
- clustermesh-apiserver: clustermesh.apiserver.metrics.enabled=true
- kvstoremesh: clustermesh.apiserver.metrics.kvstoremesh.enabled=true
- sidecar etcd instance: clustermesh.apiserver.metrics.etcd.enabled=true
```
helm install cilium cilium/cilium --version 1.14.4 \
  --namespace kube-system \
  --set clustermesh.useAPIServer=true \
  --set clustermesh.apiserver.metrics.enabled=true \
  --set clustermesh.apiserver.metrics.kvstoremesh.enabled=true \
  --set clustermesh.apiserver.metrics.etcd.enabled=true
```
You can configure the ports via clustermesh.apiserver.metrics.port, clustermesh.apiserver.metrics.kvstoremesh.port, and clustermesh.apiserver.metrics.etcd.port respectively.
You can automatically create a Prometheus Operator ServiceMonitor by setting clustermesh.apiserver.metrics.serviceMonitor.enabled=true.
Example Prometheus & Grafana Deployment
If you don’t have an existing Prometheus and Grafana stack running, you can deploy a stack with:
```
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/1.14.4/examples/kubernetes/addons/prometheus/monitoring-example.yaml
```
It will run Prometheus and Grafana in the cilium-monitoring namespace. If you have enabled either Cilium or Hubble metrics, they will automatically be scraped by Prometheus. You can then expose Grafana to access it via your browser:

```
kubectl -n cilium-monitoring port-forward service/grafana --address 0.0.0.0 --address :: 3000:3000
```

Open your browser and access http://localhost:3000/
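If you also want to browse the Prometheus UI directly, the same approach works (assuming the example manifest exposes a prometheus service on its default port 9090):

```
kubectl -n cilium-monitoring port-forward service/prometheus --address 0.0.0.0 --address :: 9090:9090
```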
Metrics Reference
cilium-agent
Configuration
To expose any metrics, invoke cilium-agent with the --prometheus-serve-addr option. This option takes an IP:Port pair, but passing an empty IP (e.g. :9962) will bind the server to all available interfaces (there is usually only one in a container).
To customize cilium-agent metrics, configure the --metrics option with "+metric_a -metric_b -metric_c", where +/- means to enable/disable the metric. For example, for very large clusters, users may consider disabling the following two metrics as they generate too much data:
- cilium_node_connectivity_status
- cilium_node_connectivity_latency_seconds

You can then configure the agent with --metrics="-cilium_node_connectivity_status -cilium_node_connectivity_latency_seconds".
Exported Metrics by Default
Endpoint
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Enabled | Number of endpoints managed by this agent |
| | | Disabled | Maximum interface index observed for existing endpoints |
| | | Enabled | Count of all endpoint regenerations that have completed |
| | | Enabled | Endpoint regeneration time stats |
| | | Enabled | Count of all endpoints |
The default enabled status of endpoint_max_ifindex is dynamic. On earlier kernels (typically with version lower than 5.10), Cilium must store the interface index for each endpoint in the conntrack map, which reserves 16 bits for this field. If Cilium is running on such a kernel, this metric will be enabled by default. It can be used to implement an alert if the ifindex is approaching the limit of 65535. This may be the case in instances of significant Endpoint churn.
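A sketch of such an alert as a Prometheus alerting rule, using the cilium_ metric namespace described earlier (the rule name, threshold, and duration are illustrative):

```
groups:
  - name: cilium-endpoints
    rules:
      - alert: CiliumEndpointIfindexNearLimit
        # Fire well before the 16-bit conntrack field limit of 65535 is reached
        expr: cilium_endpoint_max_ifindex > 60000
        for: 15m
        labels:
          severity: warning
```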
Services
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Enabled | Number of services events labeled by action type |
Cluster health
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Enabled | Number of nodes that cannot be reached |
| | | Enabled | Number of health endpoints that cannot be reached |
Node Connectivity
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Enabled | The last observed status of both ICMP and HTTP connectivity between the current Cilium agent and other Cilium nodes |
| | | Enabled | The last observed latency between the current Cilium agent and other Cilium nodes in seconds |
Clustermesh
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Enabled | The total number of global services in the cluster mesh |
| | | Enabled | The total number of remote clusters meshed with the local cluster |
| | | Enabled | The total number of failures related to the remote cluster |
| | | Enabled | The total number of nodes in the remote cluster |
| | | Enabled | The timestamp of the last failure of the remote cluster |
| | | Enabled | The readiness status of the remote cluster |
Datapath
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Enabled | Number of conntrack dump resets. Happens when a BPF entry gets removed while dumping the map is in progress. |
| | | Enabled | Number of times that the conntrack garbage collector process was run |
| | | Enabled | The number of alive and deleted conntrack entries at the end of a garbage collector run labeled by datapath family |
| | | Enabled | The number of alive and deleted conntrack entries at the end of a garbage collector run |
| | | Enabled | Duration in seconds of the garbage collector process |
IPSec
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Enabled | Total number of xfrm errors |
| | | Enabled | Number of keys in use |
| | | Enabled | Number of XFRM states |
| | | Enabled | Number of XFRM policies |
eBPF
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Disabled | Duration of eBPF system call performed |
| | | Enabled | Number of eBPF map operations performed. |
| | | Enabled | Map pressure is defined as a ratio of the required map size compared to its configured size. Values < 1.0 indicate the map's utilization, while values >= 1.0 indicate that the map is full. Policy map metrics are only reported when the ratio is over 0.1, ie 10% full. |
| | | Enabled | Max memory used by eBPF maps installed in the system |
| | | Enabled | Max memory used by eBPF programs installed in the system |
Both bpf_maps_virtual_memory_max_bytes and bpf_progs_virtual_memory_max_bytes currently report the system-wide memory usage of eBPF, including eBPF that is not directly managed by Cilium. This might change in the future and only report the eBPF memory usage directly managed by Cilium.
Drops/Forwards (L3/L4)
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Enabled | Total dropped packets |
| | | Enabled | Total dropped bytes |
| | | Enabled | Total forwarded packets |
| | | Enabled | Total forwarded bytes |
Policy
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Enabled | Number of policies currently loaded |
| | | Enabled | Total number of policies regenerated successfully |
| | | Enabled | Policy regeneration time stats labeled by the scope |
| | | Enabled | Highest policy revision number in the agent |
| | | Enabled | Number of times a policy import has failed |
| | | Enabled | Number of policy changes by outcome |
| | | Enabled | Number of endpoints labeled by policy enforcement status |
| | | Enabled | Time in seconds between a policy change and it being fully deployed into the datapath, labeled by the policy's source |
Policy L7 (HTTP/Kafka)
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Enabled | Number of redirects installed for endpoints |
| | | Enabled | Seconds waited for upstream server to reply to a request |
| | | Disabled | Number of total datapath update timeouts due to FQDN IP updates |
| | | Enabled | Number of total L7 requests/responses |
Identity
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Enabled | Number of identities currently allocated |
| | | Enabled | Number of errors interacting with the ipcache |
| | | Enabled | Number of events interacting with the ipcache |
Events external to Cilium
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Enabled | Last timestamp when we received an event |
Controllers
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Enabled | Number of times that a controller process was run |
| | | Enabled | Duration in seconds of the controller process |
| | | Enabled | Number of failing controllers |
SubProcess
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Enabled | Number of times that Cilium has started a subprocess |
Kubernetes
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Enabled | Number of Kubernetes events received |
| | | Enabled | Number of Kubernetes events processed |
| | | Enabled | Duration in seconds in how long it took to complete a CNP status update |
| | | Enabled | Number of terminating endpoint events received from Kubernetes |
Kubernetes Rest Client
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Enabled | Duration of processed API calls labeled by path and method |
| | | Enabled | Kubernetes client rate limiter latency in seconds. Broken down by path and method |
| | | Enabled | Number of API calls made to kube-apiserver labeled by host, method and return code |
IPAM
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Enabled | Number of IPAM events received labeled by action and datapath family type |
| | | Enabled | Number of allocated IP addresses |
KVstore
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Enabled | Duration of kvstore operation |
| | | Enabled | Seconds waited before a received event was queued |
| | | Enabled | Number of quorum errors |
| | | Enabled | Number of elements queued for synchronization in the kvstore |
| | | Enabled | Whether the initial synchronization from/to the kvstore has completed |
Agent
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Enabled | Duration of various bootstrap phases |
| | | Enabled | Processing time of all the API calls made to the cilium-agent, labeled by API method, API path and returned HTTP code. |
FQDN
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Enabled | Number of FQDNs that have been cleaned on FQDN garbage collector job |
| | | Disabled | Number of domains inside the DNS cache that have not expired (by TTL), per endpoint |
| | | Disabled | Number of IPs inside the DNS cache associated with a domain that has not expired (by TTL), per endpoint |
| | | Disabled | Number of IPs associated with domains that have expired (by TTL) yet still associated with an active connection (aka zombie), per endpoint |
API Rate Limiting
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Enabled | Most recent adjustment factor for automatic adjustment |
| | | Enabled | Total number of API requests processed |
| | | Enabled | Mean and estimated processing duration in seconds |
| | | Enabled | Current rate limiting configuration (limit and burst) |
| | | Enabled | Current and maximum allowed number of requests in flight |
| | | Enabled | Mean, min, and max wait duration |
| | | Disabled | Histogram of wait duration per API call processed |
cilium-operator
Configuration
cilium-operator can be configured to serve metrics by running with the option --enable-metrics. By default, the operator will expose metrics on port 9963; the port can be changed with the option --operator-prometheus-serve-addr.
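As a rough sketch, outside of Helm these options translate into cilium-operator container arguments of the following shape (the port simply restates the default mentioned above):

```
# Illustrative excerpt of a cilium-operator container spec
args:
  - --enable-metrics
  - --operator-prometheus-serve-addr=:9963
```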
Exported Metrics
All metrics are exported under the cilium_operator_ Prometheus namespace.
IPAM
Note
IPAM metrics are all Enabled only if using the AWS, Alibabacloud or Azure IPAM plugins.
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Enabled | Number of IPs allocated |
| | | Enabled | Number of IP allocation operations. |
| | | Enabled | Number of IP release operations. |
| | | Enabled | Number of interfaces creation operations. |
| | | Enabled | Release ip or interface latency in seconds |
| | | Enabled | Allocation ip or interface latency in seconds |
| | | Enabled | Number of interfaces with addresses available |
| | | Enabled | Number of nodes unable to allocate more addresses |
| | | Enabled | Number of synchronization operations with external IPAM API |
| | | Enabled | Duration of interactions with external IPAM API. |
| | | Enabled | Duration of rate limiting while accessing external IPAM API |
| | | Enabled | Number of available IPs on a node (taking into account plugin specific NIC/Address limits). |
| | | Enabled | Number of currently used IPs on a node. |
| | | Enabled | Number of IPs needed to satisfy allocation on a node. |
Hubble
Configuration
Hubble metrics are served by a Hubble instance running inside cilium-agent. The command-line options to configure them are --enable-hubble, --hubble-metrics-server, and --hubble-metrics. --hubble-metrics-server takes an IP:Port pair, but passing an empty IP (e.g. :9965) will bind the server to all available interfaces. --hubble-metrics takes a comma-separated list of metrics.
Some metrics can take additional semicolon-separated options per metric, e.g. --hubble-metrics="dns:query;ignoreAAAA,http:destinationContext=workload-name" will enable the dns metric with the query and ignoreAAAA options, and the http metric with the destinationContext=workload-name option.
Context Options
Hubble metrics support configuration via context options. Supported context options for all metrics:
- sourceContext - Configures the source label on metrics for both egress and ingress traffic.
- sourceEgressContext - Configures the source label on metrics for egress traffic (takes precedence over sourceContext).
- sourceIngressContext - Configures the source label on metrics for ingress traffic (takes precedence over sourceContext).
- destinationContext - Configures the destination label on metrics for both egress and ingress traffic.
- destinationEgressContext - Configures the destination label on metrics for egress traffic (takes precedence over destinationContext).
- destinationIngressContext - Configures the destination label on metrics for ingress traffic (takes precedence over destinationContext).
- labelsContext - Configures a list of labels to be enabled on metrics.
There are also some context options that are specific to certain metrics. See the documentation for the individual metrics to see what options are available for each.
See below for details on each of the different context options.
Most Hubble metrics can be configured to add the source and/or destination context as a label using the sourceContext and destinationContext options. The possible values are:

| Option Value | Description |
|---|---|
| | All Cilium security identity labels |
| | Kubernetes namespace name |
| | Kubernetes pod name and namespace name in the form of |
| | Kubernetes pod name. |
| | All known DNS names of the source or destination (comma-separated) |
| | The IPv4 or IPv6 address |
| | Reserved identity label. |
| | Kubernetes pod's workload name (workloads are: Deployment, Statefulset, Daemonset, ReplicationController, CronJob, Job, DeploymentConfig (OpenShift), etc). |
| | Kubernetes pod's app name, derived from pod labels ( |
When specifying the source and/or destination context, multiple contexts can be specified by separating them via the | symbol. When multiple are specified, the first non-empty value is added to the metric as a label. For example, a metric configuration of flow:destinationContext=dns|ip will first try to use the DNS name of the target for the label. If no DNS name is known for the target, it will fall back and use the IP address of the target instead.
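For instance, that fallback configuration would be passed via the --hubble-metrics flag described in the Configuration section like so:

```
--hubble-metrics="flow:destinationContext=dns|ip"
```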
Note
There are 3 cases in which the identity label list contains multiple reserved labels:
- reserved:kube-apiserver and reserved:host
- reserved:kube-apiserver and reserved:remote-node
- reserved:kube-apiserver and reserved:world

In all of these 3 cases, the reserved-identity context returns reserved:kube-apiserver.
Hubble metrics can also be configured with a labelsContext, which allows providing a list of labels that should be added to the metric. Unlike sourceContext and destinationContext, instead of different values being put into the same metric label, the labelsContext puts them into separate labels.
| Option Value | Description |
|---|---|
| | The source IP of the flow. |
| | The namespace of the pod if the flow source is from a Kubernetes pod. |
| | The pod name if the flow source is from a Kubernetes pod. |
| | The name of the source pod's workload (Deployment, Statefulset, Daemonset, ReplicationController, CronJob, Job, DeploymentConfig (OpenShift)). |
| | The app name of the source pod, derived from pod labels ( |
| | The destination IP of the flow. |
| | The namespace of the pod if the flow destination is from a Kubernetes pod. |
| | The pod name if the flow destination is from a Kubernetes pod. |
| | The name of the destination pod's workload (Deployment, Statefulset, Daemonset, ReplicationController, CronJob, Job, DeploymentConfig (OpenShift)). |
| | The app name of the destination pod, derived from pod labels ( |
| | Identifies the traffic direction of the flow. Possible values are |
When specifying the flow context, multiple values can be specified by separating them via the , symbol. All labels listed are included in the metric, even if empty. For example, a metric configuration of http:labelsContext=source_namespace,source_pod will add the source_namespace and source_pod labels to all Hubble HTTP metrics.
Note
To limit metrics cardinality, Hubble will remove data series bound to a specific pod one minute after that pod is deleted. A metric series is considered to be bound to a specific pod when at least one of the following conditions is met:
- sourceContext is set to pod and the metric series has a source label matching <pod_namespace>/<pod_name>
- destinationContext is set to pod and the metric series has a destination label matching <pod_namespace>/<pod_name>
- labelsContext contains both source_namespace and source_pod and the metric series labels match the namespace and name of the deleted pod
- labelsContext contains both destination_namespace and destination_pod and the metric series labels match the namespace and name of the deleted pod
Exported Metrics
Hubble metrics are exported under the hubble_ Prometheus namespace.
lost events
Unlike the other metrics, this metric is not directly tied to network flows. It is enabled if any of the other metrics is enabled.
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Enabled | Number of lost events |
Labels
source identifies the source of lost events, one of:
- perf_event_ring_buffer
- observer_events_queue
- hubble_ring_buffer
dns
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Disabled | Number of DNS queries observed |
| | | Disabled | Number of DNS responses observed |
| | | Disabled | Number of DNS response types |
Options
| Option Key | Option Value | Description |
|---|---|---|
| | N/A | Include the query as label "query" |
| | N/A | Ignore any AAAA requests/responses |
This metric supports Context Options.
drop
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Disabled | Number of drops |
Options
This metric supports Context Options.
flow
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Disabled | Total number of flows processed |
Options
This metric supports Context Options.
flows-to-world
This metric counts all non-reply flows containing the reserved:world label in their destination identity. By default, dropped flows are counted if and only if the drop reason is Policy denied. Set the any-drop option to count all dropped flows.
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Disabled | Total number of flows to reserved:world |
Options
| Option Key | Option Value | Description |
|---|---|---|
| | N/A | Count any dropped flows regardless of the drop reason. |
| | N/A | Include the destination port as label |
| | N/A | Only count non-reply SYNs for TCP flows. |
This metric supports Context Options.
http
Deprecated, use httpV2 instead. These metrics cannot be enabled at the same time as httpV2.
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Disabled | Count of HTTP requests |
| | | Disabled | Count of HTTP responses |
| | | Disabled | Histogram of HTTP request duration in seconds |
Labels
- method is the HTTP method of the request/response.
- protocol is the HTTP protocol of the request (for example: HTTP/1.1, HTTP/2).
- status is the HTTP status code of the response.
- reporter identifies the origin of the request/response. It is set to client if it originated from the client, server if it originated from the server, or unknown if its origin is unknown.
Options
This metric supports Context Options.
httpV2
httpV2 is an updated version of the existing http metrics. These metrics cannot be enabled at the same time as http.
The main difference is that http_requests_total and http_responses_total have been consolidated and use the response flow data.
Additionally, the source/destination-related labels of the http_request_duration_seconds metric are now from the perspective of the request. In the http metrics, the source/destination were swapped, because the metric uses the response flow data, where the source/destination are swapped, but in httpV2 we correctly account for this.
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Disabled | Count of HTTP requests |
| | | Disabled | Histogram of HTTP request duration in seconds |
Labels
- method is the HTTP method of the request/response.
- protocol is the HTTP protocol of the request (for example: HTTP/1.1, HTTP/2).
- status is the HTTP status code of the response.
- reporter identifies the origin of the request/response. It is set to client if it originated from the client, server if it originated from the server, or unknown if its origin is unknown.
Options
| Option Key | Option Value | Description |
|---|---|---|
| | | Include extracted trace IDs in HTTP metrics. Requires OpenMetrics to be enabled. |
This metric supports Context Options.
icmp
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Disabled | Number of ICMP messages |
Options
This metric supports Context Options.
kafka
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Disabled | Count of Kafka requests by topic |
| | | Disabled | Histogram of Kafka request duration by topic |
Options
This metric supports Context Options.
port-distribution
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Disabled | Numbers of packets distributed by destination port |
Options
This metric supports Context Options.
tcp
| Name | Labels | Default | Description |
|---|---|---|---|
| | | Disabled | TCP flag occurrences |
Options
This metric supports Context Options.
clustermesh-apiserver
Configuration
To expose any metrics, invoke clustermesh-apiserver with the --prometheus-serve-addr option. This option takes an IP:Port pair, but passing an empty IP (e.g. :9962) will bind the server to all available interfaces (there is usually only one in a container).
Exported Metrics
All metrics are exported under the cilium_clustermesh_apiserver_ Prometheus namespace.
KVstore
| Name | Labels | Description |
|---|---|---|
| | | Duration of kvstore operation |
| | | Seconds waited before a received event was queued |
| | | Number of quorum errors |
| | | Number of elements queued for synchronization in the kvstore |
| | | Whether the initial synchronization from/to the kvstore has completed |
API Rate Limiting
| Name | Labels | Description |
|---|---|---|
| | | Total number of API requests processed |
| | | Mean and estimated processing duration in seconds |
| | | Current rate limiting configuration (limit and burst) |
| | | Current and maximum allowed number of requests in flight |
| | | Mean, min, and max wait duration |
kvstoremesh
Configuration
To expose any metrics, invoke kvstoremesh with the --prometheus-serve-addr option. This option takes an IP:Port pair, but passing an empty IP (e.g. :9964) binds the server to all available interfaces (there is usually only one interface in a container).
Exported Metrics
All metrics are exported under the cilium_kvstoremesh_ Prometheus namespace.
Remote clusters
| Name | Labels | Description |
|---|---|---|
| | | The total number of remote clusters meshed with the local cluster |
| | | The total number of failures related to the remote cluster |
| | | The timestamp of the last failure of the remote cluster |
| | | The readiness status of the remote cluster |
KVstore
| Name | Labels | Description |
|---|---|---|
| | | Duration of kvstore operation |
| | | Seconds waited before a received event was queued |
| | | Number of quorum errors |
| | | Number of elements queued for synchronization in the kvstore |
| | | Whether the initial synchronization from/to the kvstore has completed |
API Rate Limiting
| Name | Labels | Description |
|---|---|---|
| | | Total number of API requests processed |
| | | Mean and estimated processing duration in seconds |
| | | Current rate limiting configuration (limit and burst) |
| | | Current and maximum allowed number of requests in flight |
| | | Mean, min, and max wait duration |