StateDB in Cilium
Warning
StateDB and the reconciler are still under active development and the APIs & metrics documented here are not guaranteed to be stable yet.
Introduction
StateDB is an in-memory database developed for the Cilium project to manage control-plane state. It aims to simplify access and indexing of state and to increase resilience, modularity and testability by separating the control-plane state from the controllers that operate on it.
This document focuses on how StateDB is leveraged by Cilium and how to develop new features using it. For a detailed guide on the StateDB API itself see the StateDB documentation.
We assume familiarity with the Hive framework. If you’re not familiar with it, consider reading through Guide to the Hive first.
Motivation
StateDB is a project born from lessons learned from development and production struggles. It aims to be a tool to systematically improve the resilience, testability and inspectability of the Cilium agent.
For developers it aims to offer simpler and safer ways to extend the agent by giving a unified API (Table[Obj]) for accessing shared state. The immutable data structures backing StateDB allow for lockless readers, which improves resiliency compared to the RWMutex+hashmap+callback pattern, where a bug in a controller observing the state may cause critical functions to either stop or significantly decrease throughput. Additionally, having flexible ways to access and index the state creates opportunities to deduplicate state. Many components of the agent have historically functioned through callback-based subscriptions and maintained their own copies of state, which has a significant impact on memory usage and GC overhead.
Unifying state storage behind a database-like abstraction allows building reusable utilities for inspecting the state (cilium-dbg shell -- db), reconciling state (StateDB reconciler) and observing operations on state (StateDB metrics). At scale this leads to an architecture that is easier to understand (smaller API surface), operate (state can be inspected) and extend (easy to access data).
The separation of state from logic operating on it (e.g. moving away from kitchen-sink “Manager” pattern) also opens up the ability to do wider and more meaningful integration testing on components of the agent. When most of the inputs and outputs of a component are tables, we can combine multiple components into an integration test that is solely defined in terms of test inputs and expected outputs. This allows more validation to be performed with fairly simple integration tests rather than with slower and costly end-to-end tests.
Architecture vision
The agent in this architectural style can be broadly considered to consist of:
User intent tables: objects from external data sources that tell the agent what it should do. These would be for example the Kubernetes core objects like Pods or the Cilium specific CRDs such as CiliumNetworkPolicy, or data ingested from other sources such as kvstore.
Controllers: control-loops that observe the user intent tables and compute the contents of the desired state tables.
Desired state tables: the internal state that the controllers produce to succinctly describe what should be done. For example a desired state table could describe what the contents of a BPF map should be or what routes should be installed.
Reconcilers: control-loops that observe the desired state tables and reconcile them against a target such as a BPF map or the Linux routing table. The reconciler is usually an instance of the StateDB reconciler which is defined in terms of a table of objects with a status field and the operations Update, Delete and Prune.
Dividing the agent this way, we achieve a nice separation of concerns:
Separating the user intent into its own tables keeps the parsing and validation apart from the computation we'll perform on the data. It also makes the data easier to reuse, as it's purely about representing the outside intent internally in an efficient way without tying it too much to the implementation details of a specific feature.
By defining the controller as essentially a function from input tables to output tables, it becomes easy to understand and test.
Separating the reconciliation from the desired state computation keeps the complex logic of dealing with low-level errors and retrying apart from the pure "business logic" computation.
Using the generic reconcilers allows reusing a tried-and-tested and instrumented retry implementation.
The control-plane of the agent is essentially everything outside the reconcilers. This allows us to integration test, simulate or benchmark the control-plane code without an unreasonable amount of scaffolding. The easier it is to write reliable integration tests, the more resilient the codebase becomes.
What we’re trying to achieve is well summarized by Fred Brooks in “The Mythical Man Month”:
Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowchart; it’ll be obvious.
Defining tables
StateDB documentation gives a good introduction to how to create a table and its indexes, so we won’t repeat that here, but instead focus on Cilium specific details.
Let’s start off with some guidelines that you might want to consider:
By default, publicly provide Table[Obj] so that new features can build on it and it can be used in tests. Also export the table's indexes or the query functions (var ByName = nameIndex.Query).
Do not export RWTable[Obj] if outside modules do not need to directly write into the table. If other modules do write into the table, consider defining "writer functions" that validate that the writes are well-formed.
If the table is closely associated with a specific feature, define it alongside the implementation of the feature. If the table is shared by many modules, consider defining it in daemon/k8s or pkg/datapath/tables so it is easy to discover.
Make sure the object can be JSON marshalled so it can be inspected. If you need to store non-marshallable data (e.g. functions), make those fields private or mark them with the json:"-" struct tag.
If the object contains a map or set and it is often mutated, consider using the immutable part.Map or part.Set from cilium/statedb. Since these are immutable they don't need to be deep-copied when modifying the object and there's no risk of accidentally mutating them in-place.
When designing a table, consider how it can be used in tests outside your module. It's a good idea to export your table constructor (New*Table) so it can be used by itself in an integration test of a module that depends on it.
Take into account the fact that objects must be immutable by designing them to be cheap to shallow-clone. For example, this could mean splitting off fields that are constant from creation into their own struct that's referenced from the object.
Write benchmarks for your table to understand the cost of the indexing and storage. See benchmarks_test.go in cilium/statedb for examples.
If the object is small (<100 bytes), prefer storing it by value instead of by reference, e.g. Table[MyObject] instead of Table[*MyObject]. This reduces memory fragmentation and makes it safer to use since the fields can't be accidentally mutated (anything inside that's held by reference can of course still be mutated accidentally). Note though that each index will store a separate copy of the object. Measure if needed.
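The "cheap to shallow-clone" guideline can be illustrated with a stdlib-only sketch using hypothetical types: fields that are constant after creation live behind a shared pointer, so a clone copies only the small mutable shell, no matter how large the constant part grows:

```go
package main

import "fmt"

// endpointInfo holds fields that never change after the object is created.
// It is shared by reference between clones, so cloning never copies it.
type endpointInfo struct {
	Name   string
	Labels map[string]string // never mutated after creation
}

// Endpoint is the object stored in the table: a small mutable shell
// around the shared constant part.
type Endpoint struct {
	info  *endpointInfo // shared, constant
	State string        // mutable, copied on clone
}

// clone returns a shallow copy: the constant part stays shared and only
// the small mutable shell is duplicated.
func (e Endpoint) clone() Endpoint {
	return e // by-value copy; e.info still points at the shared part
}

func main() {
	e := Endpoint{info: &endpointInfo{Name: "a"}, State: "init"}
	c := e.clone()
	c.State = "ready"
	// The clone diverged in its mutable field but shares the constant part.
	fmt.Println(e.State, c.State, e.info == c.info)
}
```

The discipline this relies on is that nothing ever writes through the shared pointer after construction.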
With that out of the way, let’s get concrete with a code example of a simple table and a controller that populates it:
package main
import (
"context"
"fmt"
"strconv"
"time"
"github.com/cilium/hive/cell"
"github.com/cilium/hive/job"
"github.com/cilium/statedb"
"github.com/cilium/statedb/index"
)
// Example is our object that we want to index and store in a table.
type Example struct {
ID uint64
CreatedAt time.Time
}
// TableHeader defines how cilium-dbg displays the header
func (e *Example) TableHeader() []string {
return []string{
"ID",
"CreatedAt",
}
}
// TableRow defines how cilium-dbg displays a row
func (e *Example) TableRow() []string {
return []string{
strconv.FormatUint(e.ID, 10),
e.CreatedAt.String(),
}
}
// TableName is a constant for the table name. This is used in cilium-dbg
// to refer to this table.
const TableName = "examples"
var (
// idIndex defines the primary index for the Example object.
idIndex = statedb.Index[Example, uint64]{
Name: "id",
FromObject: func(e Example) index.KeySet {
return index.NewKeySet(index.Uint64(e.ID))
},
FromKey: index.Uint64,
FromString: index.Uint64String,
Unique: true,
}
// ByID exports the query function for the id index. It's a convention
// for providing a readable short-hand for creating queries.
// ("query" is essentially just the index name + the key created with
// the "FromKey" method defined above).
ByID = idIndex.Query
)
// NewExampleTable creates the table and registers it.
func NewExampleTable(db *statedb.DB) (statedb.RWTable[Example], error) {
tbl, err := statedb.NewTable(
TableName,
idIndex,
)
if err != nil {
return nil, err
}
return tbl, db.RegisterTable(tbl)
}
// Cell provides the Table[Example] and registers a controller to populate
// the table.
var Cell = cell.Module(
"example",
"Examples",
// Provide RWTable[Example] privately
cell.ProvidePrivate(NewExampleTable),
// Provide Table[Example] publicly
cell.Provide(statedb.RWTable[Example].ToTable),
// Register a controller that manages the contents of the
// table.
cell.Invoke(registerExampleController),
)
type exampleController struct {
db *statedb.DB
examples statedb.RWTable[Example]
}
// loop is a simple control-loop that once a second inserts an example object
// with an increasing [ID]. When 5 objects are reached it deletes everything
// and starts over.
func (e *exampleController) loop(ctx context.Context, health cell.Health) error {
id := uint64(0)
tick := time.NewTicker(time.Second)
defer tick.Stop()
health.OK("Starting")
for {
var tickTime time.Time
select {
case tickTime = <-tick.C:
case <-ctx.Done():
return nil
}
wtxn := e.db.WriteTxn(e.examples)
id++
if id <= 5 {
e.examples.Insert(wtxn, Example{ID: id, CreatedAt: tickTime})
} else {
e.examples.DeleteAll(wtxn)
id = 0
}
wtxn.Commit()
// Report the health of the job. This can be inspected with
// "cilium-dbg status --all-health" or with "cilium-dbg shell -- db/show health".
health.OK(fmt.Sprintf("%d examples inserted", id))
}
}
func registerExampleController(jg job.Group, db *statedb.DB, examples statedb.RWTable[Example]) {
// Construct the controller and add the loop() method as a one-shot background
// job to the module's job group.
// When the controller doesn't have any useful API to outside we can use this
// pattern instead of "Provide(NewController)" to keep things internal.
ctrl := &exampleController{db, examples}
jg.Add(job.OneShot(
"loop",
ctrl.loop,
))
}
To understand how the table defined by our example module can be consumed, we can construct a small mini-application:
package main
import (
"context"
"fmt"
"time"
"github.com/cilium/hive/cell"
"github.com/cilium/hive/job"
"github.com/cilium/statedb"
"github.com/cilium/cilium/pkg/hive"
"github.com/cilium/cilium/pkg/logging"
)
func followExamples(jg job.Group, db *statedb.DB, table statedb.Table[Example]) {
jg.Add(job.OneShot(
"follow",
func(ctx context.Context, _ cell.Health) error {
// Start tracking changes to the table. This instructs the database
// to keep deleted objects off to the side for us to observe.
wtxn := db.WriteTxn(table)
changeIterator, err := table.Changes(wtxn)
wtxn.Commit()
if err != nil {
return err
}
for {
// Iterate over the changed objects.
changes, watch := changeIterator.Next(db.ReadTxn())
for change, rev := range changes {
e := change.Object
fmt.Printf("ID: %d, CreatedAt: %s (revision: %d, deleted: %v)\n",
e.ID, e.CreatedAt.Format(time.Stamp), rev, change.Deleted)
}
// Wait until there are new changes to consume.
select {
case <-ctx.Done():
return nil
case <-watch:
}
}
},
))
}
func main() {
hive.New(
cell.Module("app", "Example app",
Cell,
cell.Invoke(followExamples),
),
).Run(logging.DefaultSlogLogger)
}
You can find and run the above examples in contrib/examples/statedb:
$ cd contrib/examples/statedb && go run .
Pitfalls
Here are some common mistakes to be aware of:
Object is mutated after insertion to the database. Since StateDB queries do not return copies, all readers will see the modifications.
Object (stored by reference, e.g. *T) returned from a query is mutated and then inserted. StateDB will catch this and panic. Objects stored by reference must be (shallow) cloned before mutating.
Query is made with a ReadTxn and the results are used in a WriteTxn. The results may have changed between the ReadTxn and the WriteTxn! If you want optimistic concurrency control, use CompareAndSwap in the write transaction.
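The first two pitfalls are both pointer-aliasing mistakes. A stdlib-only sketch, where a plain map stands in for Table[*Obj] and no StateDB APIs are used:

```go
package main

import "fmt"

// Obj is a hypothetical object stored by reference.
type Obj struct {
	ID    uint64
	State string
}

func main() {
	table := map[uint64]*Obj{} // stands in for Table[*Obj]

	obj := &Obj{ID: 1, State: "init"}
	table[obj.ID] = obj

	// WRONG: mutating after "insertion". Every reader holding this
	// pointer now silently sees "ready" with no transaction in between.
	obj.State = "ready"

	// RIGHT: shallow-clone first, mutate the clone, then insert the clone.
	// Readers keep seeing the old object until the new one is committed.
	clone := *table[1]
	clone.State = "done"
	table[1] = &clone

	fmt.Println(obj.State, table[1].State)
}
```

With the real Table[*Obj] the second mistake panics rather than corrupting readers, but the fix is the same: clone before mutating.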
Inspecting with cilium-dbg
StateDB comes with script commands to inspect the tables. These can be invoked via cilium-dbg shell.
The db command lists all registered tables:
root@kind-worker:/home/cilium# cilium-dbg shell -- db
Name Object count Deleted objects Indexes Initializers Go type Last WriteTxn
health 61 0 identifier, level [] types.Status health (107.3us ago, locked for 43.7us)
sysctl 20 0 name, status [] *tables.Sysctl sysctl (9.4m ago, locked for 12.8us)
mtu 2 0 cidr [] mtu.RouteMTU mtu (19.4m ago, locked for 5.4us)
...
The show command prints out the table using the TableRow and TableHeader methods:
root@kind-worker:/home/cilium# cilium-dbg shell -- db/show mtu
Prefix DeviceMTU RouteMTU RoutePostEncryptMTU
::/0 1500 1450 1450
0.0.0.0/0 1500 1450 1450
The db/get, db/prefix, db/list and db/lowerbound commands allow querying a table, provided that the Index.FromString method has been defined:
root@kind-worker:/home/cilium# cilium-dbg shell -- db prefix --index=name devices cilium
Name Index Selected Type MTU HWAddr Flags Addresses
cilium_host 3 false veth 1500 c2:f6:99:50:af:71 up|broadcast|multicast 10.244.1.105, fe80::c0f6:99ff:fe50:af71
cilium_net 2 false veth 1500 5e:70:20:4d:8a:bc up|broadcast|multicast fe80::5c70:20ff:fe4d:8abc
cilium_vxlan 4 false vxlan 1500 b2:c6:10:14:48:47 up|broadcast|multicast fe80::b0c6:10ff:fe14:4847
The shell session can also be run interactively:
# cilium-dbg shell
/¯¯\
/¯¯\__/¯¯\
\__/¯¯\__/ Cilium 1.17.0-dev a5b41b93507e 2024-08-08T13:18:08+02:00 go version go1.23.1 linux/amd64
/¯¯\__/¯¯\ Welcome to the Cilium Shell! Type 'help' for list of commands.
\__/¯¯\__/
\__/
cilium> help db
db
Describe StateDB configuration
The 'db' command describes the StateDB configuration,
showing
...
cilium> db
Name Object count Zombie objects Indexes Initializers Go type Last WriteTxn
health 65 0 identifier, level [] types.Status health (993.6ms ago, locked for 25.7us)
sysctl 20 0 name, status [] *tables.Sysctl sysctl (5.3s ago, locked for 8.6us)
mtu 2 0 cidr [] mtu.RouteMTU mtu (4.4s ago, locked for 3.1us)
...
cilium> db/show mtu
Prefix DeviceMTU RouteMTU RoutePostEncryptMTU
::/0 1500 1450 1450
0.0.0.0/0 1500 1450 1450
cilium> db/show --out=/tmp/devices.json --format=json devices
...
Kubernetes reflection
To reflect Kubernetes objects from the API server into a table, the reflector utility in pkg/k8s can be used. For example, we can define a table of pods and reflect them from Kubernetes into the table:
package main
import (
"log/slog"
"github.com/cilium/hive/cell"
"github.com/cilium/hive/job"
"github.com/cilium/statedb"
"github.com/cilium/statedb/index"
"k8s.io/client-go/tools/cache"
"github.com/cilium/cilium/pkg/k8s"
"github.com/cilium/cilium/pkg/k8s/client"
v1 "github.com/cilium/cilium/pkg/k8s/slim/k8s/api/core/v1"
"github.com/cilium/cilium/pkg/k8s/utils"
)
const PodTableName = "pods"
var (
// podNameIndex is the primary index for pods which indexes them by namespace+name.
podNameIndex = statedb.Index[*v1.Pod, string]{
Name: "name",
FromObject: func(obj *v1.Pod) index.KeySet {
return index.NewKeySet(index.String(obj.Namespace + "/" + obj.Name))
},
FromKey: index.String,
FromString: index.FromString,
Unique: true,
}
PodByName = podNameIndex.Query
)
// NewPodTable creates the pod table and registers it.
func NewPodTable(db *statedb.DB) (statedb.RWTable[*v1.Pod], error) {
tbl, err := statedb.NewTable(
PodTableName,
podNameIndex,
)
if err != nil {
return nil, err
}
return tbl, db.RegisterTable(tbl)
}
// PodListerWatcher is the lister watcher for pod objects. This is separately
// defined so integration tests can provide their own if needed.
type PodListerWatcher cache.ListerWatcher
func newPodListerWatcher(log *slog.Logger, cs client.Clientset) PodListerWatcher {
if !cs.IsEnabled() {
log.Error("client not configured, please set --k8s-kubeconfig-path")
return nil
}
return PodListerWatcher(utils.ListerWatcherFromTyped(cs.Slim().CoreV1().Pods("")))
}
// registerReflector creates and registers a reflector for pods.
func registerReflector(
jg job.Group,
lw PodListerWatcher,
db *statedb.DB,
pods statedb.RWTable[*v1.Pod],
) error {
if lw == nil {
return nil
}
cfg := k8s.ReflectorConfig[*v1.Pod]{
Name: "pods",
Table: pods,
ListerWatcher: lw,
// More options available to e.g. transform the objects.
}
return k8s.RegisterReflector(
jg,
db,
cfg,
)
}
// PodsCell provides Table[*v1.Pod] and registers a reflector to populate
// the table from the api-server.
var PodsCell = cell.Module(
"pods",
"Pods table",
cell.ProvidePrivate(
NewPodTable,
newPodListerWatcher,
),
cell.Provide(statedb.RWTable[*v1.Pod].ToTable),
cell.Invoke(registerReflector),
)
As earlier, we can then construct a small application to try this out:
package main
import (
"context"
"fmt"
"os"
"github.com/cilium/cilium/pkg/hive"
"github.com/cilium/cilium/pkg/k8s/client"
v1 "github.com/cilium/cilium/pkg/k8s/slim/k8s/api/core/v1"
"github.com/cilium/cilium/pkg/logging"
"github.com/cilium/hive/cell"
"github.com/cilium/hive/job"
"github.com/cilium/statedb"
"github.com/spf13/pflag"
)
func followPods(jg job.Group, db *statedb.DB, table statedb.Table[*v1.Pod]) {
jg.Add(job.OneShot(
"follow-pods",
func(ctx context.Context, _ cell.Health) error {
wtxn := db.WriteTxn(table)
changeIterator, err := table.Changes(wtxn)
wtxn.Commit()
if err != nil {
return err
}
for {
// Iterate over the changed objects.
changes, watch := changeIterator.Next(db.ReadTxn())
for change, rev := range changes {
pod := change.Object
fmt.Printf("Pod(%s/%s): %s (revision: %d, deleted: %v)\n",
pod.Namespace, pod.Name, pod.Status.Phase,
rev, change.Deleted)
}
// Wait until there are new changes to consume.
select {
case <-ctx.Done():
return nil
case <-watch:
}
}
},
))
}
var app = cell.Module(
"app",
"Example app",
client.Cell, // client.Clientset
PodsCell, // Table[*Pod]
cell.Invoke(followPods),
)
func main() {
h := hive.New(app)
h.RegisterFlags(pflag.CommandLine)
if err := pflag.CommandLine.Parse(os.Args[1:]); err != nil {
panic(err)
}
h.Run(logging.DefaultSlogLogger)
}
You can run the example in contrib/examples/statedb_k8s to watch the pods in your current cluster:
$ cd contrib/examples/statedb_k8s && go run . --k8s-kubeconfig-path ~/.kube/config
level=info msg=Starting
time="2024-09-05T11:22:15+02:00" level=info msg="Establishing connection to apiserver" host="https://127.0.0.1:44261" subsys=k8s-client
time="2024-09-05T11:22:15+02:00" level=info msg="Connected to apiserver" subsys=k8s-client
level=info msg=Started duration=9.675917ms
Pod(default/nginx): Running (revision: 1, deleted: false)
Pod(kube-system/cilium-envoy-8xwp7): Running (revision: 2, deleted: false)
...
Reconcilers
The StateDB reconciler can be used to reconcile changes in a table against a target system.
To set up the reconciler you will need the following.
Add reconciler.Status as a field to your object (there can be multiple):
type MyObject struct {
ID uint64
// ...
Status reconciler.Status
}
Implement the reconciliation operations (reconciler.Operations):
type myObjectOps struct { ... }
var _ reconciler.Operations[*MyObject] = &myObjectOps{}
// Update reconciles the changed [obj] with the target.
func (ops *myObjectOps) Update(ctx context.Context, txn statedb.ReadTxn, obj *MyObject) error {
// Synchronize the target state with [obj]. [obj] is a clone and can be updated from here.
// [txn] can be used to access other tables, but note that Update() is only called when [obj] is
// marked pending.
...
// Return nil or an error. If the error is non-nil, the operation is retried with exponential backoff.
// If the object changes, the retrying resets and Update() is called with the latest object.
return err
}
// Delete removes the [obj] from the target.
func (ops *myObjectOps) Delete(ctx context.Context, txn statedb.ReadTxn, obj *MyObject) error {
...
// If the error is non-nil the delete is retried until it succeeds or an object is recreated
// with the same primary key.
return err
}
// Prune removes any stale/unexpected state in the target.
func (ops *myObjectOps) Prune(ctx context.Context, txn statedb.ReadTxn, objs iter.Seq2[*MyObject, statedb.Revision]) error {
// Compute the difference between [objs] and the target and remove anything unexpected in the target.
...
// If the returned error is non-nil, it is logged and metrics are incremented. Failed pruning is
// currently not retried, but Prune() is called periodically according to the configuration.
return err
}
Register the reconciler:
func registerReconciler(
params reconciler.Params,
ops reconciler.Operations[*MyObject],
tbl statedb.RWTable[*MyObject],
) error {
// Reconciler[...] is an API the reconciler provides. Often not needed.
// Currently it only contains the Prune() method to trigger immediate pruning.
var r reconciler.Reconciler[*MyObject]
r, err := reconciler.Register(
params,
tbl,
(*MyObject).Clone,
(*MyObject).SetStatus,
(*MyObject).GetStatus,
ops,
nil, /* optional batch operations */
)
return err
}
var Cell = cell.Module(
"example",
"Example module",
...,
cell.Invoke(registerReconciler),
)
Insert objects with the Status set to pending:
var myObjects statedb.RWTable[*MyObject]
wtxn := db.WriteTxn(myObjects)
myObjects.Insert(wtxn, &MyObject{ID: 123, Status: reconciler.StatusPending()})
wtxn.Commit()
The reconciler watches the table (using Changes()) and calls Update for each changed object that is Pending, or Delete for each deleted object. On errors the object will be retried (with configurable backoff) until the operation succeeds.
See the full runnable example in the StateDB repository.
The reconciler runs a background job which reports its health status. The status is degraded if any objects failed to be reconciled and are queued for retries.
Health can be inspected either with cilium-dbg status --all-health or cilium-dbg statedb health.
BPF maps
BPF maps can be reconciled with the operations returned by bpf.NewMapOps.
The target object needs to implement the BinaryKey and BinaryValue methods to construct the BPF key and value respectively. These can either construct the binary value on the fly, or reference a struct defining the value. The example below uses a struct as this is the prevalent style in Cilium.
// MyKey defines the raw BPF key
type MyKey struct { ... }
// MyValue defines the raw BPF value
type MyValue struct { ... }
type MyObject struct {
Key MyKey
Value MyValue
Status reconciler.Status
}
func (m *MyObject) BinaryKey() encoding.BinaryMarshaler {
return bpf.StructBinaryMarshaler{&m.Key}
}
func (m *MyObject) BinaryValue() encoding.BinaryMarshaler {
return bpf.StructBinaryMarshaler{&m.Value}
}
func registerReconciler(params reconciler.Params, objs statedb.RWTable[*MyObject], m *bpf.Map) error {
ops := bpf.NewMapOps[*MyObject](m)
_, err := reconciler.Register(
params,
objs,
func(obj *MyObject) *MyObject {
o := *obj // shallow clone so the stored object is not mutated in place
return &o
},
func(obj *MyObject, s reconciler.Status) *MyObject {
obj.Status = s
return obj
},
func(obj *MyObject) reconciler.Status {
return obj.Status
},
ops,
nil,
)
return err
}
For a real-world example see pkg/maps/bwmap/cell.go.
Script commands
StateDB comes with a rich set of script commands for inspecting and manipulating tables:
# Show the registered tables
db
# Insert an object
db/insert my-table example.yaml
# Compare the contents of 'my-table' with a file. Retries until matches.
db/cmp my-table expected.table
# Show the contents of the table
db/show
# Write the object to a file
db/get my-table 'Foo' --format=yaml --out=foo.yaml
# Delete the object and assert that table is empty.
db/delete my-table example.yaml
db/empty my-table
-- expected.table --
Name Color
Foo Red
-- example.yaml --
name: Foo
color: Red
See help db for the full reference in cilium-dbg shell or in the break prompt in tests.
A good reference is also the existing tests. These can be found with git grep db/insert.
Metrics
Metrics are available for both StateDB and the reconciler, but they are disabled
by default due to their fine granularity. These are defined in pkg/hive/statedb_metrics.go
and pkg/hive/reconciler_metrics.go
. As this documentation is manually maintained it may
be out-of-date so if things are not working, check the source code.
The metrics can be enabled by adding them to the helm prometheus.metrics
option with
the syntax +cilium_<name>
, where <name>
is the name of the metric in the table below.
For example, here is how to turn on all the metrics:
prometheus:
enabled: true
metrics:
- +cilium_statedb_write_txn_duration_seconds
- +cilium_statedb_write_txn_acquisition_seconds
- +cilium_statedb_table_contention_seconds
- +cilium_statedb_table_objects
- +cilium_statedb_table_revision
- +cilium_statedb_table_delete_trackers
- +cilium_statedb_table_graveyard_objects
- +cilium_statedb_table_graveyard_low_watermark
- +cilium_statedb_table_graveyard_cleaning_duration_seconds
- +cilium_reconciler_count
- +cilium_reconciler_duration_seconds
- +cilium_reconciler_errors_total
- +cilium_reconciler_errors_current
- +cilium_reconciler_prune_count
- +cilium_reconciler_prune_errors_total
- +cilium_reconciler_prune_duration_seconds
These are still under development and the metric names may change.
Even when disabled, the metrics can be inspected with the metrics and metrics/plot script commands, as Cilium keeps samples of all metrics for the past 2 hours. These metrics are also available in the sysdump in HTML form (look for cilium-dbg-shell----metrics-html.html).
# kubectl exec -it -n kube-system ds/cilium -- cilium-dbg shell
/¯¯\
/¯¯\__/¯¯\
\__/¯¯\__/ Cilium 1.17.0-dev a5b41b93507e 2024-08-08T13:18:08+02:00 go version go1.23.1 linux/amd64
/¯¯\__/¯¯\ Welcome to the Cilium Shell! Type 'help' for list of commands.
\__/¯¯\__/
\__/
# Dump the sampled StateDB metrics from the last 2 hours
cilium> metrics --sampled statedb
Metric Labels 5min 30min 60min 120min
cilium_statedb_table_contention_seconds handle=devices-controller table=devices 0s / 0s / 0s 0s / 0s / 0s 0s / 0s / 0s 0s / 0s / 0s
...
# Plot the rate of change in the "health" table
# (indicative of number of object writes per second)
cilium> metrics/plot --rate statedb_table_revision.*health
cilium_statedb_table_revision (rate per second)
[ table=health ]
╭────────────────────────────────────────────────────────────────────╮
2.4 ┤ .... ... ... . │
│ . . . . . . . .. │
│ . ............ ............. ............. .......│
1.2 ┤ . │
│ . │
│ . │
0.0 ┤. │
╰───┬───────────────────────────────┬──────────────────────────────┬─╯
-120min -60min now
# Plot the write transaction duration for the "devices" table
# (indicative of how long the table is locked during writes)
cilium> metrics/plot statedb_write_txn_duration.*devices
... omitted p50 and p90 plots ...
cilium_statedb_write_txn_duration_seconds (p99)
[ handle=devices-controller ]
╭────────────────────────────────────────────────────────────────────╮
47.2ms ┤ . │
│ . │
│ . . │
23.9ms ┤ . . │
│ . . │
│ .. . . ... │
0.5ms ┤................................. ..............................│
╰───┬───────────────────────────────┬──────────────────────────────┬─╯
-120min -60min now
# Plot the reconciliation errors for sysctl
cilium> metrics/plot reconciler_errors_current.*sysctl
cilium_reconciler_errors_current
[ module_id=agent.datapath.sysctl ]
╭────────────────────────────────────────────────────────────────────╮
0.0 ┤ │
│ │
│ │
0.0 ┤ │
│ │
│ │
0.0 ┤....................................................................│
╰───┬───────────────────────────────┬──────────────────────────────┬─╯
-120min -60min now
StateDB
Name | Labels | Description
---|---|---
statedb_write_txn_duration_seconds | handle, tables | Duration of the write transaction
statedb_write_txn_acquisition_seconds | handle, tables | How long it took to lock target tables
statedb_table_contention_seconds | handle, table | How long it took to lock a table for writing
statedb_table_objects | table | Number of objects in a table
statedb_table_revision | table | The current revision
statedb_table_delete_trackers | table | Number of delete trackers (e.g. Changes())
statedb_table_graveyard_objects | table | Number of deleted objects in graveyard
statedb_table_graveyard_low_watermark | table | Low watermark revision for deleting objects
statedb_table_graveyard_cleaning_duration_seconds | table | How long it took to GC the graveyard
The label handle is the database handle name (created with (*DB).NewHandle). The default handle is named DB. The labels table and tables (the latter formatted as tableA+tableB) are the StateDB tables which the metric concerns.
Reconciler
Name | Labels | Description
---|---|---
reconciler_count | module_id | Number of reconciliation rounds performed
reconciler_duration_seconds | module_id, op | Histogram of operation durations
reconciler_errors_total | module_id, op | Total number of errors (update/delete)
reconciler_errors_current | module_id | Current number of errors
reconciler_prune_count | module_id | Number of pruning rounds
reconciler_prune_errors_total | module_id | Total number of errors during pruning
reconciler_prune_duration_seconds | module_id | Histogram of pruning durations
The label module_id is the identifier of the Hive module under which the reconciler was registered. op is the operation performed, either update or delete.