Falco: Runtime Security for the Bramble

Why Runtime Security?

I've got observability coming out of my ears at this point. Prometheus scrapes everything that moves, Loki ingests every log line, Tempo traces requests across services, Alloy shuttles telemetry around, Gatus checks endpoints, Blackbox Exporter probes from the outside. I can tell you exactly how many bytes Garage wrote to disk at 3:47 AM last Tuesday. What I couldn't tell you is whether something inside a container just read /etc/shadow or opened a reverse shell.

Falco fills that gap. It's a CNCF graduated project that uses eBPF to monitor syscalls at the kernel level — every open(), connect(), execve(), dup() across every container on every node. The default ruleset catches the classics: sensitive file reads, unexpected network connections, privilege escalation, container escapes, crypto mining processes. It's the security equivalent of "I don't know what I'm looking for, but I'll know it when I see it."

The Deployment

Architecture

Falco runs as a DaemonSet — one pod per node. Each pod loads an eBPF probe into the kernel and watches syscalls in real time. Events matching rules get forwarded to Falcosidekick, which fans them out to:

Alertmanager (via v2 API) — for warnings and above, integrating with existing Prometheus alerting
ntfy (via webhook) — for critical events only, because I don't need my phone buzzing every time nic-watchdog pings the gateway

The whole stack:

Kernel syscalls → eBPF probe → Falco engine → Rules evaluation
                                                    ↓
                                              Falcosidekick
                                              ↙          ↘
                                    Alertmanager        ntfy (critical only)
                                         ↓
                                    Prometheus/Grafana
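The fan-out is essentially a threshold check on Falco's priority ladder. Here is a minimal Python sketch of that idea. It assumes Falco's eight priority levels and the two minimumpriority thresholds configured further down. `OUTPUTS` and `route()` are hypothetical names for illustration, not Falcosidekick's internals.

```python
# Falco priorities, ordered from most to least severe.
PRIORITIES = [
    "emergency", "alert", "critical", "error",
    "warning", "notice", "informational", "debug",
]
# Rank = position in the list, so a lower rank means more severe.
RANK = {name: rank for rank, name in enumerate(PRIORITIES)}

# Hypothetical routing table mirroring this post's minimumpriority settings.
OUTPUTS = {
    "alertmanager": "warning",
    "ntfy": "critical",
}


def route(priority: str) -> list[str]:
    """Return the outputs whose minimum priority this event meets or exceeds."""
    rank = RANK[priority.lower()]
    return sorted(name for name, minimum in OUTPUTS.items() if rank <= RANK[minimum])


if __name__ == "__main__":
    for p in ("Critical", "Warning", "Notice"):
        print(f"{p:<8} -> {route(p)}")
```

Anything at Notice or below falls through both outputs and only lands in the logs. That is what keeps nic-watchdog's gateway pings off my phone.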

Talos + eBPF: A Love Story

Talos Linux has an immutable rootfs, which rules out Falco's traditional kernel module driver entirely. This is actually fine, because the kernel module approach was always kind of eh to me anyway.

Falco's modern_ebpf driver is the answer. It's compiled directly into the Falco binary, so there's no init container downloading drivers, no kernel header matching dance, no "sorry, we don't have a prebuilt probe for your kernel version." The eBPF probe just loads. Talos ships kernel 6.18.x, which is well above the minimum 5.8 requirement for modern eBPF. Every Pi 4B (Cortex-A72) and Pi 5 (Cortex-A76) handles it fine.

driver:
  kind: modern_ebpf
  loader:
    enabled: false    # No driver loader needed — probe is built into the binary

Two settings. That's it. No drama.
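Here is a quick way to sanity-check a kernel against that 5.8 floor, as a Python sketch. It parses a `uname -r` style release string and compares it as an integer tuple. As I understand Falco's docs, modern_ebpf also needs BTF support, which the kernel exposes at /sys/kernel/btf/vmlinux. Treat that path check as an assumption rather than something verified here. Talos has no shell, so in practice you'd feed the function the KERNEL-VERSION column from `kubectl get nodes -o wide`. The example release strings in the tests are hypothetical.

```python
import platform
import re
from pathlib import Path

# Minimum kernel cited for Falco's modern_ebpf driver.
MIN_KERNEL = (5, 8)
# Where a BTF-enabled kernel exposes its type information (assumed location).
BTF_PATH = Path("/sys/kernel/btf/vmlinux")


def kernel_version(release: str) -> tuple[int, int]:
    """Parse (major, minor) from a `uname -r` style string, e.g. '6.18.1-talos'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def meets_min_kernel(release: str) -> bool:
    """Compare as integer tuples so that 5.10 correctly ranks above 5.8."""
    return kernel_version(release) >= MIN_KERNEL


if __name__ == "__main__":
    # Only meaningful on a Linux host.
    if platform.system() == "Linux":
        release = platform.release()
        print(f"{release}: kernel ok={meets_min_kernel(release)}, BTF present={BTF_PATH.exists()}")
```

Comparing tuples instead of floats matters here. Parsed as a float, 5.10 becomes 5.1 and would wrongly fail the check.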

The GitOps Setup

Standard four-file Flux structure in gitops/infrastructure/falco/:

falco/
├── kustomization.yaml
├── namespace.yaml          # Privileged PSA (needs host-level eBPF access)
├── repository.yaml         # HelmRepository → falcosecurity.github.io/charts
└── release.yaml            # HelmRelease with Falco + Falcosidekick config

Key values:

# Modern eBPF, no loader
driver:
  kind: modern_ebpf
  loader:
    enabled: false

# JSON output for Loki, ISO timestamps
falco:
  json_output: true
  json_include_output_property: true
  json_include_tags_property: true
  time_format_iso_8601: true
  log_syslog: false       # Talos has no syslog
  http_output:
    enabled: true
    url: http://falco-falco-falcosidekick.falco.svc:2801

# Prometheus integration
serviceMonitor:
  create: true
  labels:
    release: monitoring-kube-prometheus-stack

# Falcosidekick sub-chart
falcosidekick:
  enabled: true
  config:
    alertmanager:
      hostport: http://monitoring-kube-prometheus-alertmanager.monitoring.svc:9093
      endpoint: /api/v2/alerts
      minimumpriority: warning
    webhook:
      address: http://ntfy.ntfy.svc:80/falco
      minimumpriority: critical

Resources are sized conservatively for the Pi 4B fleet. Falco requests 50m CPU and 256Mi of memory, with limits of 500m and 512Mi. Falcosidekick requests 20m and 64Mi.
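Those per-pod numbers multiply across the DaemonSet. Here is a back-of-the-envelope Python sketch. It assumes one Falco pod on each of the 16 schedulable nodes and leaves Falcosidekick out, since its replica count isn't stated here.

```python
# Per-pod Falco resources from the HelmRelease (CPU in millicores, memory in MiB).
FALCO_REQUESTS = {"cpu_m": 50, "mem_mi": 256}
FALCO_LIMITS = {"cpu_m": 500, "mem_mi": 512}
# One DaemonSet pod per node Falco currently schedules on.
NODES = 16


def fleet_total(per_pod: dict[str, int], pods: int) -> dict[str, int]:
    """Multiply a per-pod resource spec by the number of pods."""
    return {resource: amount * pods for resource, amount in per_pod.items()}


if __name__ == "__main__":
    print("requests:", fleet_total(FALCO_REQUESTS, NODES))
    print("limits:  ", fleet_total(FALCO_LIMITS, NODES))
```

That works out to 800m CPU and 4 GiB of memory requested cluster-wide, with limits of 8 cores and 8 GiB. Per Pi, it's still just 50m and 256Mi reserved.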

The Debugging Gauntlet

Bug 1: The Service Name

Falcosidekick runs as its own Deployment behind a Service, and Falco's http_output needs to reach that Service. I initially configured:

url: http://falco-falcosidekick.falco.svc:2801

The actual service name:

url: http://falco-falco-falcosidekick.falco.svc:2801

The sub-chart names the service {release}-falcosidekick, and the release here is falco-falco. My best guess at why: when a HelmRelease sets targetNamespace, Flux defaults the Helm release name to {targetNamespace}-{name}, so a HelmRelease called falco targeting the falco namespace becomes falco-falco. Honestly, I've stopped trying to predict Helm naming. The SeaweedFS filer had the exact same class of bug two hours earlier. I'm beginning to think "guess the Helm service name" should be its own drinking game.
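To take some of the guessing out of it, here is a Python sketch of the two naming rules that plausibly stack up here. It rests on two assumptions: the HelmRelease is named falco with targetNamespace: falco, and the sub-chart uses the standard `helm create` fullname helper. The function names are hypothetical.

```python
from typing import Optional


def flux_release_name(name: str, target_namespace: Optional[str] = None) -> str:
    """Flux's default Helm release name, '[targetNamespace-]name', absent spec.releaseName."""
    return f"{target_namespace}-{name}" if target_namespace else name


def helm_fullname(release: str, chart: str) -> str:
    """The conventional `helm create` fullname helper, with no name overrides."""
    name = release if chart in release else f"{release}-{chart}"
    # DNS label names are capped at 63 chars; the helper trims one trailing '-'.
    return name[:63].removesuffix("-")


if __name__ == "__main__":
    release = flux_release_name("falco", target_namespace="falco")
    service = helm_fullname(release, "falcosidekick")
    print(f"http://{service}.falco.svc:2801")
```

Under those assumptions, the original guess of falco-falcosidekick is exactly what a plain release name of falco would produce. Setting spec.releaseName on the HelmRelease pins the name explicitly instead.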

Bug 2: The DaemonSet Timeout

Helm's --wait flag blocks until ALL pods in a release are Ready and up-to-date. When you're rolling a DaemonSet across 16 Raspberry Pi nodes — each one pulling a 40MB container image over the network, starting an eBPF probe, loading rules, and becoming ready — "wait for everything" takes a while. More than 5 minutes. More than 10 minutes.

The first install timed out at 5 minutes (default). Bumped to 10 minutes. Timed out again. The DaemonSet was actually working — 15/16 pods ready, just one slow node still pulling the image — but Helm doesn't care about "almost done."

The fix: disableWait: true and timeout: 15m in the HelmRelease. Helm submits the manifests and returns immediately. The DaemonSet controller handles the rollout at its own pace.

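timeout: 15m              # Per-operation Helm timeout (Flux's default is 5m)
install:
  crds: CreateReplace
  disableWait: true       # Don't block until every DaemonSet pod is Ready
  remediation:
    retries: 3
upgrade:
  crds: CreateReplace
  disableWait: true
  remediation:
    retries: 3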

After clearing Helm release secrets and doing a fresh install with these settings, it went through cleanly.
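The trade-off with disableWait is that Helm no longer tells you when the last Pi has caught up. `kubectl rollout status daemonset/<name> -n falco` covers that from the CLI. The Python sketch below roughly mirrors the conditions it checks, using the real DaemonSet status fields. The DaemonSet name falco-falco in the usage line is an assumption based on the release name.

```python
import json
import subprocess


def daemonset_rolled_out(ds: dict) -> bool:
    """Roughly the checks `kubectl rollout status` applies to a DaemonSet."""
    metadata, status = ds.get("metadata", {}), ds.get("status", {})
    desired = status.get("desiredNumberScheduled", 0)
    return (
        # The controller has seen the latest spec...
        status.get("observedGeneration", 0) >= metadata.get("generation", 0)
        # ...every scheduled pod runs that spec...
        and status.get("updatedNumberScheduled", 0) >= desired
        # ...and every one of them is available.
        and status.get("numberAvailable", 0) >= desired
    )


def fetch_daemonset(name: str, namespace: str) -> dict:
    """Fetch the live object with kubectl (assumes kubectl and a kubeconfig)."""
    result = subprocess.run(
        ["kubectl", "get", "daemonset", name, "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Usage: `daemonset_rolled_out(fetch_daemonset("falco-falco", "falco"))`. In the first-install situation (every pod scheduled, 15 of 16 available), it returns False until the slow node finishes pulling its image.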

Bug 3: Alertmanager 410 Gone

Falcosidekick's Alertmanager integration was returning 410 Gone:

2026/03/15 05:14:28 [ERROR] : AlertManager - unexpected Response (410)
2026/03/15 05:14:28 [ERROR] : AlertManager - 410 Gone

Sidekick defaults to the Alertmanager v1 API, which Alertmanager deprecated in 0.16.0 and removed in 0.27.0. The old endpoints now just answer with a 410. One-line fix:

alertmanager:
  endpoint: /api/v2/alerts

After that: AlertManager - POST OK (200).
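For reference, the v2 endpoint takes a JSON array of alerts. Each alert has a labels map plus optional annotations and timestamps. Here is a Python sketch that builds, but does not send, such a request using only the standard library. The label set is illustrative and not necessarily what Falcosidekick itself sends.

```python
import json
import urllib.request
from datetime import datetime, timezone

# In-cluster Alertmanager address, as configured in the HelmRelease values.
ALERTMANAGER = "http://monitoring-kube-prometheus-alertmanager.monitoring.svc:9093"


def build_alert_request(rule: str, priority: str, output: str) -> urllib.request.Request:
    """Build (but don't send) a POST to the v2 alerts API. Labels are illustrative."""
    alerts = [
        {
            "labels": {"alertname": rule, "priority": priority, "source": "falco"},
            "annotations": {"description": output},
            "startsAt": datetime.now(timezone.utc).isoformat(),
        }
    ]
    return urllib.request.Request(
        f"{ALERTMANAGER}/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

From inside the cluster, passing the request to `urllib.request.urlopen(req, timeout=5)` should return the same 200 Sidekick now gets. Pointing the path at /api/v1/alerts instead should reproduce the 410.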

What Falco Sees

Out of the box, Falco immediately started flagging things on the cluster:

"Contact K8S API Server From Container" — Garage's tokio workers connecting to the K8s API for peer discovery. Expected behavior, Notice priority.

"Redirect STDOUT/STDIN to Network Connection in Container" — nic-watchdog's busybox ping command. It pings the gateway every 15 seconds to check NIC health. Also expected, also Notice.

All of this is legitimate activity that the default rules flag at low priority. None of it will trigger Alertmanager (set to warning+) or ntfy (set to critical only), but it will show up in Loki for forensic analysis. If I wanted to suppress these events, I could add exception lists to the Falco rules. Having them in the log is actually useful for establishing a behavioral baseline, though.
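Building that baseline is mostly counting. Here is a minimal Python sketch that tallies Falco JSON events by rule and priority. It assumes lines pulled out of Loki (for example with logcli), where each event carries the `rule` and `priority` keys from Falco's JSON output. The sample lines are hypothetical.

```python
import json
from collections import Counter

# Hypothetical log lines, shaped like Falco's JSON output.
SAMPLE_LINES = [
    '{"rule": "Contact K8S API Server From Container", "priority": "Notice"}',
    '{"rule": "Redirect STDOUT/STDIN to Network Connection in Container", "priority": "Notice"}',
    '{"rule": "Redirect STDOUT/STDIN to Network Connection in Container", "priority": "Notice"}',
    "a line that is not JSON",
]


def baseline(lines: list[str]) -> Counter:
    """Count Falco JSON events per (rule, priority) pair, skipping anything else."""
    counts: Counter = Counter()
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(event, dict):
            continue
        counts[(event.get("rule"), event.get("priority"))] += 1
    return counts


if __name__ == "__main__":
    for (rule, priority), n in baseline(SAMPLE_LINES).most_common():
        print(f"{n:4d}  {priority:<8}  {rule}")
```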

The test I ran — kubectl run falco-test --image=busybox --rm -it -- cat /etc/shadow — got caught and forwarded to Alertmanager successfully. Falco saw the sensitive file read, classified it as Warning, Sidekick POSTed to Alertmanager v2 API, got a 200. The pipeline works end to end.
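Conceptually, the rule that caught this is a predicate over syscall event fields. Below is a toy Python version. It uses real Falco field names (evt.type, evt.is_open_read, fd.name, container.id) but none of the macros, lists, or exceptions the actual default rule layers on top. The sensitive-file set is an illustrative subset, and the sample events are hypothetical.

```python
# Illustrative subset of files Falco's default rules treat as sensitive.
SENSITIVE_FILES = {"/etc/shadow", "/etc/sudoers"}
# Syscalls that open files.
OPEN_SYSCALLS = {"open", "openat", "openat2"}


def sensitive_read_in_container(event: dict) -> bool:
    """Toy condition: a container opened a sensitive file for reading."""
    return (
        event.get("evt.type") in OPEN_SYSCALLS
        and event.get("evt.is_open_read") is True
        and event.get("fd.name") in SENSITIVE_FILES
        # Falco reports container.id as "host" for processes outside containers.
        and event.get("container.id", "host") != "host"
    )


# Hypothetical events; the container ID is made up.
CAT_SHADOW = {
    "evt.type": "openat",
    "evt.is_open_read": True,
    "fd.name": "/etc/shadow",
    "container.id": "3f2a9c1b0d4e",
    "proc.name": "cat",
}
HOST_READ = {**CAT_SHADOW, "container.id": "host"}
HARMLESS = {**CAT_SHADOW, "fd.name": "/etc/hostname"}

if __name__ == "__main__":
    for label, event in [("cat /etc/shadow", CAT_SHADOW), ("host read", HOST_READ), ("hostname", HARMLESS)]:
        print(f"{label}: {sensitive_read_in_container(event)}")
```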

Coverage

Falco is now running on 16 of 17 nodes:

Nodes	Count	Coverage
Pi 4B workers	9	All covered
Pi 4B control plane	3	All covered
Pi 5 NVMe workers	4	All covered
x86 GPU (velaryon)	1	Not covered (tainted)

Velaryon has platform=x86:NoSchedule and gpu=true:NoSchedule taints. Adding Falco tolerations for it would be trivial, and it's worth doing: the GPU node runs JupyterLab and gaming containers, which probably makes it the node that most deserves security monitoring. Something for the future.
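Velaryon gets skipped because of Kubernetes' taint matching: a pod only schedules on a node if every NoSchedule/NoExecute taint is matched by one of its tolerations. Here is a Python sketch of those matching rules. `CONTROL_PLANE` is a hypothetical stand-in for whatever tolerations the chart already carries, not its actual defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule" or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: Optional[str] = None     # None together with "Exists" matches every key
    operator: str = "Equal"       # "Equal" or "Exists"
    value: str = ""
    effect: Optional[str] = None  # None matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Kubernetes matching rules for one toleration against one taint."""
    if tol.effect is not None and tol.effect != taint.effect:
        return False
    if tol.key is None:
        return tol.operator == "Exists"
    if tol.key != taint.key:
        return False
    return tol.operator == "Exists" or tol.value == taint.value


def schedulable(tolerations: list[Toleration], taints: list[Taint]) -> bool:
    """True if every NoSchedule/NoExecute taint is matched by some toleration."""
    return all(
        any(tolerates(tol, taint) for tol in tolerations)
        for taint in taints
        if taint.effect in ("NoSchedule", "NoExecute")
    )


# velaryon's taints, as described above.
VELARYON = [Taint("platform", "x86", "NoSchedule"), Taint("gpu", "true", "NoSchedule")]
# Hypothetical stand-in for tolerations the chart already carries.
CONTROL_PLANE = Toleration("node-role.kubernetes.io/control-plane", "Exists", effect="NoSchedule")
# The two additions that would let Falco land on velaryon.
VELARYON_TOLERATIONS = [
    Toleration("platform", "Equal", "x86", "NoSchedule"),
    Toleration("gpu", "Equal", "true", "NoSchedule"),
]

if __name__ == "__main__":
    print("today:", schedulable([CONTROL_PLANE], VELARYON))
    print("with additions:", schedulable([CONTROL_PLANE, *VELARYON_TOLERATIONS], VELARYON))
```

In the HelmRelease, this would become two extra entries under the chart's tolerations value. Helm replaces lists rather than merging them, so any tolerations the chart ships by default would need to be carried over alongside the new ones.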

Goldentooth