Data plane

Lace runs overlay networks over a routed IPv6 underlay, using SRv6 to carry traffic between nodes. Each node acts as a router between the underlay and the overlays it hosts — and that router is implemented entirely in eBPF. This page covers the underlay and encapsulation, the interfaces Lace builds on a node, where the eBPF programs hook in, and how a packet moves through them.

The underlay is the physical network between nodes. Lace requires only L3 (routed) IPv6 reachability between them — no special fabric. Each node owns an SRv6 locator (an IPv6 prefix); every encapsulated packet is addressed into the locator of the node hosting its destination, so the underlay only has to route each node’s locator to that node.

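Today Lace establishes that routing itself, installing explicit routes on every node for the other nodes’ locators. Those routes are direct rather than via an intermediate router, so for now the nodes must also be L2-adjacent to one another, a stricter condition than the L3 reachability the design itself needs.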
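As a rough illustration of the per-node route set described above, the sketch below computes which locator routes one node would need for its peers. The `Node` record, the addresses, and the choice of the peer's underlay address as next hop are assumptions made for the example, not Lace's actual data model or route format.

```python
import ipaddress
from typing import List, NamedTuple, Tuple


class Node(NamedTuple):
    # Hypothetical record; the field names are illustrative only.
    name: str
    underlay_addr: str  # the node's address on the shared segment
    locator: str        # the node's SRv6 locator prefix


def routes_for(local: Node, nodes: List[Node]) -> List[Tuple[str, str]]:
    """Return (locator prefix, next hop) pairs for every node except `local`."""
    routes = []
    for peer in nodes:
        if peer.name == local.name:
            continue  # a node does not route its own locator to a peer
        prefix = ipaddress.IPv6Network(peer.locator)
        # Assumption: a "direct" route uses the peer itself as next hop.
        next_hop = ipaddress.IPv6Address(peer.underlay_addr)
        routes.append((str(prefix), str(next_hop)))
    return routes


nodes = [
    Node("a", "2001:db8::1", "fd00:0:0:1::/64"),
    Node("b", "2001:db8::2", "fd00:0:0:2::/64"),
    Node("c", "2001:db8::3", "fd00:0:0:3::/64"),
]
```

Calling `routes_for(nodes[0], nodes)` yields one route per peer, so a cluster of N nodes needs N-1 locator routes on each node.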

SRv6 (Segment Routing over IPv6) expresses a path as a list of IPv6 addresses called segments, each one an instruction to the node that owns it, carried in an IPv6 routing extension header. Lace uses it as a plain encapsulation with a single segment: the original packet is wrapped in an outer IPv6 header addressed to a SID (segment identifier) on the destination node.

Pod addresses are meaningful only inside their network, so cross-node traffic cannot ride the underlay as-is. What gets wrapped depends on the network type:

  • L3 networks encapsulate the pod’s IP packet.
  • L2 networks encapsulate the whole Ethernet frame, so MAC-level semantics are preserved across nodes.

A SID is an IPv6 address built from three parts:

  • the destination node’s locator prefix — routes the encapsulated packet to that node across the underlay,
  • a function — what the node should do on arrival, namely decapsulate,
  • the network id — which overlay the inner packet belongs to, so the far node delivers it into the right network.

The destination node matches the outer address against its own locator, decapsulates, and hands the inner packet to the local pod. Traffic between two pods on the same node stays local and is never encapsulated.
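The three-part SID can be sketched as bit packing into an IPv6 address. The field widths (16-bit function, 32-bit network id), their order after the locator, and the `END_DECAP` code are assumptions chosen for illustration; the actual bit layout Lace uses is not specified here.

```python
import ipaddress
from typing import Optional, Tuple

# Assumed layout (hypothetical): locator | 16-bit function | 32-bit network id | zero padding
FUNC_BITS = 16
NETID_BITS = 32
END_DECAP = 0x0001  # hypothetical function code meaning "decapsulate"


def _shifts(net: ipaddress.IPv6Network) -> Tuple[int, int]:
    free_bits = 128 - net.prefixlen
    if free_bits < FUNC_BITS + NETID_BITS:
        raise ValueError("locator leaves no room for function and network id")
    func_shift = free_bits - FUNC_BITS
    return func_shift, func_shift - NETID_BITS


def build_sid(locator: str, function: int, network_id: int) -> ipaddress.IPv6Address:
    """Combine the destination node's locator, a function and a network id."""
    net = ipaddress.IPv6Network(locator)
    if not 0 <= function < (1 << FUNC_BITS):
        raise ValueError("function out of range")
    if not 0 <= network_id < (1 << NETID_BITS):
        raise ValueError("network id out of range")
    func_shift, netid_shift = _shifts(net)
    value = int(net.network_address) | (function << func_shift) | (network_id << netid_shift)
    return ipaddress.IPv6Address(value)


def parse_sid(own_locator: str, outer_dst: str) -> Optional[Tuple[int, int]]:
    """Receiver side: match the outer address against the local locator.

    Returns (function, network_id), or None if the address is not ours.
    """
    net = ipaddress.IPv6Network(own_locator)
    addr = ipaddress.IPv6Address(outer_dst)
    if addr not in net:
        return None
    func_shift, netid_shift = _shifts(net)
    value = int(addr)
    function = (value >> func_shift) & ((1 << FUNC_BITS) - 1)
    network_id = (value >> netid_shift) & ((1 << NETID_BITS) - 1)
    return function, network_id
```

The sender calls `build_sid` with the remote node's locator; the receiver calls `parse_sid` with its own locator and uses the recovered network id to pick the overlay to deliver into.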

Each network is realised on a node as a small set of Linux interfaces:

  • VRF — one per network, for both L3 and L2 networks. eBPF forwards all overlay traffic itself, so the kernel’s router never carries pod-to-pod or pod-to-host traffic. The VRF provides isolation in the other direction: a network’s interfaces add connected routes (for L3, the bridge’s link-local and its gateway subnet), and the VRF keeps them in its own table instead of the host’s main table. The host stack therefore cannot route into the overlay on its own, and routes do not leak between overlays or between an overlay and the host.
  • Bridge — one per L3 network, enslaved to that network’s VRF. An L3 network’s prefix is split into a host-local subnet per node; the bridge is the L2 segment for that subnet, forwarding between the pods on this node. It also carries the subnet’s gateway IP and a stable gateway MAC, which pods use as their default gateway. L2 networks have no bridge — their L2 domain spans nodes and is handled in eBPF.
  • Pod veths — one pair per pod. The container end lives in the pod’s namespace; the host end is attached to the network’s bridge (L3) or directly to its VRF (L2). The pod-side MTU is reduced from the underlay MTU to leave room for the SRv6 encapsulation overhead.
  • lc-host / lc-host-peer — a single veth pair for traffic that originates on the host itself (node-local health probes, a host process reaching a pod or ClusterIP). Host routes for pod and service prefixes point out lc-host; the traffic then arrives on lc-host-peer’s ingress, an ingress hook from which the in-kernel redirect to a pod is permitted.
  • Underlay interfaces — the node’s uplinks, identified as the interfaces carrying a default route. The primary uplink carries SRv6 traffic to and from other nodes; eBPF attaches to each managed underlay interface so encapsulated overlay traffic and other north-south ingress are all processed on entry.
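The pod-side MTU reduction mentioned for pod veths can be estimated from standard header sizes. The sketch below assumes the outer headers are a 40-byte IPv6 header plus, optionally, a segment routing header (8 fixed bytes and 16 bytes per segment), and that L2 networks additionally carry a 14-byte untagged inner Ethernet header. Whether Lace actually emits a routing header for its single segment, and the exact value it configures, are not stated here, so treat the numbers as illustrative.

```python
IPV6_HEADER = 40       # fixed outer IPv6 header
SRH_FIXED = 8          # fixed part of the segment routing header
SRH_PER_SEGMENT = 16   # one IPv6 address per segment
ETHERNET_HEADER = 14   # inner Ethernet header carried by L2 networks (no VLAN tag)


def encap_overhead(l2_network: bool, with_srh: bool = True, segments: int = 1) -> int:
    """Bytes added in front of the pod's IP packet by the encapsulation."""
    overhead = IPV6_HEADER
    if with_srh:
        overhead += SRH_FIXED + SRH_PER_SEGMENT * segments
    if l2_network:
        overhead += ETHERNET_HEADER
    return overhead


def pod_mtu(underlay_mtu: int, l2_network: bool, with_srh: bool = True) -> int:
    """Largest pod IP packet that still fits the underlay once encapsulated."""
    mtu = underlay_mtu - encap_overhead(l2_network, with_srh)
    if mtu <= 0:
        raise ValueError("underlay MTU too small for the encapsulation")
    return mtu
```

With a 1500-byte underlay and a one-segment routing header, this gives 1436 bytes for an L3 network and 1422 bytes for an L2 network.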

Every Lace program attaches at the TC ingress hook of its interface — never egress. This is deliberate: the in-kernel redirect that hands a packet straight into a pod’s veth is only permitted from an ingress hook, so the whole pipeline is built around ingress.

That single rule places the programs as follows:

  • Pod host veth, ingress — a pod’s egress. Traffic leaving a pod is processed here, with separate programs for L3 and L2 networks.
  • Underlay uplink, ingress — traffic arriving from other nodes, plus north-south ingress. SRv6 is decapsulated here before the packet continues through the pipeline.
  • lc-host-peer, ingress — host-originated traffic, redirected to the destination pod.

Lace tracks every managed flow. The first packet of a new flow runs the full pipeline, and the decision it reaches is recorded as connection-tracking entries that match the flow in both directions:

  • the forward direction — so the rest of the flow’s packets take the fast path and skip the pipeline,
  • the return direction — so replies are recognised and follow the same path in reverse.

Because return traffic is matched against the entry created on the forward path, the data plane needs neither routing symmetry nor a matching reverse policy for it.
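A minimal sketch of the two-direction bookkeeping, assuming a flow is keyed by network id plus the usual 5-tuple. The `FlowKey`, `Entry`, and `ConnTrack` names are hypothetical, and the sketch ignores address translation, which would change the key that replies arrive with.

```python
from typing import Dict, NamedTuple, Optional


class FlowKey(NamedTuple):
    network_id: int
    src: str
    dst: str
    proto: int
    sport: int
    dport: int

    def reverse(self) -> "FlowKey":
        # The same flow as seen by a reply: endpoints and ports swapped.
        return FlowKey(self.network_id, self.dst, self.src,
                       self.proto, self.dport, self.sport)


class Entry(NamedTuple):
    decision: str   # what the full pipeline decided for this flow
    is_reply: bool  # True for the return-direction entry


class ConnTrack:
    def __init__(self) -> None:
        self._table: Dict[FlowKey, Entry] = {}

    def commit(self, key: FlowKey, decision: str) -> None:
        """Record a new flow: one entry per direction."""
        self._table[key] = Entry(decision, is_reply=False)
        self._table[key.reverse()] = Entry(decision, is_reply=True)

    def lookup(self, key: FlowKey) -> Optional[Entry]:
        """Fast-path lookup; None means the packet starts a new flow."""
        return self._table.get(key)
```

Because `commit` writes the reverse key itself, a reply is recognised by a plain lookup and never has to be re-evaluated by policy.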

The entry programs hand off to a chain of small, self-contained stages, each tail-called from the last. A new flow runs through all of them; an established one is short-circuited at the fast path:

  • Network classification — determine which overlay the packet belongs to and resolve its source and destination context.
  • Fast path — look the flow up in connection tracking; a hit forwards it by the recorded decision and skips everything below.
  • Routing — resolve the destination and the topology between networks to decide where the packet goes.
  • Service routing — translate a service VIP to one of its backing endpoints.
  • Policy — enforce reachability between the source and destination segments.
  • Delivery — carry out the decision: encapsulate in SRv6 toward the node hosting a remote pod, redirect locally to another pod on this node, or hand the packet to the host stack (egress, optionally masqueraded). The flow is committed to connection tracking on the way out.
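The stage chain above can be modelled in user space as a list of functions, each returning either a final verdict or nothing to pass control to the next stage. This is only a sketch of the control flow: the real stages are eBPF programs linked by tail calls, and the `Pipeline` class, the verdict strings, the address-pair flow key, and the choice not to record denied flows are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Packet:
    src: str
    dst: str
    decision: Optional[str] = None
    trace: List[str] = field(default_factory=list)  # stages this packet visited


class Pipeline:
    def __init__(self, local_pods: Set[str], denied: Set[Tuple[str, str]]) -> None:
        self.local_pods = set(local_pods)
        self.denied = set(denied)
        self.conntrack: Dict[Tuple[str, str], str] = {}
        self.stages = [self.classify, self.fast_path, self.routing,
                       self.service_routing, self.policy, self.delivery]

    def run(self, pkt: Packet) -> str:
        for stage in self.stages:
            pkt.trace.append(stage.__name__)
            verdict = stage(pkt)
            if verdict is not None:  # a stage reached a final verdict
                return verdict
        raise RuntimeError("delivery must return a verdict")

    def classify(self, pkt: Packet) -> Optional[str]:
        return None  # overlay lookup omitted in this sketch

    def fast_path(self, pkt: Packet) -> Optional[str]:
        # A hit short-circuits every stage below.
        return self.conntrack.get((pkt.src, pkt.dst))

    def routing(self, pkt: Packet) -> Optional[str]:
        pkt.decision = "local_redirect" if pkt.dst in self.local_pods else "srv6_encap"
        return None

    def service_routing(self, pkt: Packet) -> Optional[str]:
        return None  # VIP translation omitted in this sketch

    def policy(self, pkt: Packet) -> Optional[str]:
        return "deny" if (pkt.src, pkt.dst) in self.denied else None

    def delivery(self, pkt: Packet) -> Optional[str]:
        # Commit both directions on the way out.
        self.conntrack[(pkt.src, pkt.dst)] = pkt.decision
        self.conntrack[(pkt.dst, pkt.src)] = pkt.decision
        return pkt.decision
```

Running two packets of the same flow through `Pipeline.run` shows the difference in their `trace` lists: the first visits all six stages, the second stops at `fast_path`.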

Every final accept or deny decision on a new flow is emitted as a structured trace event. Reading those traces and inspecting connection tracking is covered in Debugging the data plane.