Prometheus Blackbox Exporter 2

Way back in entry 053, I set up Blackbox Exporter via Ansible on bare metal. That was a different era – before Talos, before FluxCD, before I nuked everything and rebuilt it properly. Time to bring synthetic monitoring back, but this time the Kubernetes way.

Why Blackbox Monitoring?

I've got whitebox monitoring covered: Prometheus scrapes node-exporter, kube-state-metrics, application /metrics endpoints. I can see if pods are running, if CPU is spiking, if memory is tight.

But none of that tells me: can I actually reach my websites?

Enter blackbox monitoring. Instead of asking "is the service running?", we ask "does it work from the outside?" – like an actual user would. Blackbox Exporter makes HTTP requests to URLs and reports whether they succeeded, how long they took, and what status code came back.

The Targets: External GitHub Pages Sites

This isn't about monitoring cluster services (though I could). I want to monitor two external sites:

  • https://goldentooth.net/ – the main site
  • https://clog.goldentooth.net/ – this very journal you're reading

Both are hosted on GitHub Pages. I don't control their infrastructure. GitHub handles the TLS certs, the CDN, all of it. But I still want to know when they're down – partly for awareness, partly so I can feel smug when GitHub has issues instead of wondering if I broke something.

The GitOps Structure

Created a new infrastructure component:

gitops/infrastructure/prometheus-blackbox-exporter/
├── kustomization.yaml
├── release.yaml       # HelmRelease
├── probes.yaml        # Probe CRD (what to monitor)
└── alerts.yaml        # PrometheusRule (when to alert)
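
The kustomization.yaml is just the usual resource list. A minimal sketch, assuming the monitoring namespace and the file names from the tree above:

```yaml
# gitops/infrastructure/prometheus-blackbox-exporter/kustomization.yaml
# Sketch -- namespace and resource names assumed from the directory tree.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: monitoring
resources:
  - release.yaml
  - probes.yaml
  - alerts.yaml
```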

The HelmRelease

Pretty minimal. The key piece is the module configuration – this defines how to probe:

config:
  modules:
    http_2xx:
      prober: http
      timeout: 10s
      http:
        valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
        valid_status_codes: [200, 201, 202, 203, 204, 301, 302]
        method: GET
        follow_redirects: true
        preferred_ip_protocol: ip4

The http_2xx module says: make an HTTP GET, follow redirects, and accept the listed 2xx success codes and 301/302 redirects as success. Simple.
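For context, those values sit inside the HelmRelease in release.yaml. A minimal sketch – the reconcile interval, chart version handling, and the HelmRepository sourceRef name are assumptions, not copied from my actual manifest:

```yaml
# release.yaml -- sketch; interval and sourceRef details are assumptions
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: prometheus-blackbox-exporter
  namespace: monitoring
spec:
  interval: 30m
  chart:
    spec:
      chart: prometheus-blackbox-exporter
      sourceRef:
        kind: HelmRepository
        name: prometheus-community
        namespace: flux-system
  values:
    config:
      modules:
        http_2xx:
          prober: http
          timeout: 10s
          # ...http options as shown above...
```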

The Probe CRD

This is where Prometheus Operator shines. Instead of manually configuring Prometheus with relabeling rules, I just declare what I want:

apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: external-websites
  namespace: monitoring
spec:
  interval: 60s
  module: http_2xx
  prober:
    url: prometheus-blackbox-exporter.monitoring.svc.cluster.local:9115
  targets:
    staticConfig:
      static:
        - https://goldentooth.net/
        - https://clog.goldentooth.net/
      labels:
        environment: external
        probe_type: website

Every 60 seconds, Prometheus will ask Blackbox to probe both URLs. The Operator handles all the plumbing.

Alerting Rules

Since these are GitHub Pages sites, I don't need certificate expiry warnings (GitHub handles that). Just two alerts:

- alert: WebsiteDown
  expr: probe_success == 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Website {{ $labels.instance }} is down"

- alert: WebsiteSlow
  expr: probe_duration_seconds > 10
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Website {{ $labels.instance }} is slow"

The for: 5m clause is the Prometheus equivalent of "X failures before alerting" – with a 60-second probe interval, 5 minutes means roughly 5 consecutive failures. No flapping alerts from transient network blips.
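The debouncing behavior can be sketched with a toy model. This is an illustration of the idea, not Prometheus's actual pending/firing state machine (which tracks wall-clock time since the alert first became active):

```python
def pending_to_firing(samples, for_duration, interval):
    """Toy model of a Prometheus 'for' clause: the alert fires only once
    the expression has been continuously true for >= for_duration seconds.

    samples: list of bools, True = expression true (site down) at that
             evaluation; interval/for_duration in seconds.
    """
    held = 0          # seconds the expression has been continuously true
    fired = []
    for expr_true in samples:
        held = held + interval if expr_true else 0  # any success resets
        fired.append(held >= for_duration)
    return fired

# Four failures, one success, then six failures in a row.
# Only the tail of the final failure run reaches the 5-minute threshold.
result = pending_to_firing([True] * 4 + [False] + [True] * 6,
                           for_duration=300, interval=60)
print(result)
```

A single successful probe resets the clock, which is exactly why a transient blip never pages anyone.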

Also added a meta-alert for when the blackbox exporter itself is broken:

- alert: BlackboxProbeFailed
  expr: up{job="probe"} == 0
  for: 5m

Because what good is monitoring if you don't monitor your monitoring?
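All three alerts live in alerts.yaml as a PrometheusRule. A minimal wrapper sketch (the group name matches what shows up in the Prometheus UI later; the exact metadata is assumed):

```yaml
# alerts.yaml -- sketch; metadata.name is an assumption
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: blackbox-exporter
  namespace: monitoring
spec:
  groups:
    - name: blackbox-exporter
      rules:
        - alert: WebsiteDown
          expr: probe_success == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Website {{ $labels.instance }} is down"
        # ...WebsiteSlow and BlackboxProbeFailed as shown above...
```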

Bonus: Exposing Prometheus UI

While I was in there, I realized I'd never exposed the Prometheus UI itself. Grafana was accessible via LoadBalancer, but Prometheus wasn't. Added the service config:

prometheus:
  service:
    type: LoadBalancer
    annotations:
      metallb.io/address-pool: default
      external-dns.alpha.kubernetes.io/hostname: prometheus.goldentooth.net

Now I can poke around at prometheus.goldentooth.net to see raw metrics, check which alerts are registered, debug scrape targets. Much nicer than port-forwarding every time.

Verification

After Flux reconciled everything:

$ kubectl get pods -n monitoring | grep blackbox
prometheus-blackbox-exporter-xxx   1/1     Running

$ kubectl get probes -n monitoring
NAME                AGE
external-websites   5m

In the Prometheus UI → Status → Rules, the blackbox-exporter group shows up with all three alerts.

Query probe_success and both URLs show 1. Query probe_duration_seconds and GitHub Pages responds in ~200ms. Not bad.