Fixing MetalLB
As mentioned here, I purchased a new router to replace a power-hungry Dell server running OPNsense, and that cost me BGP support. This kills my MetalLB configuration, so I need to switch it to use Layer 2.
This transition represents a fundamental change in how MetalLB operates and requires understanding the trade-offs between BGP and Layer 2 modes.
BGP vs Layer 2 Architecture Comparison
BGP Mode (Previous Configuration)
- Dynamic routing: BGP speakers advertise LoadBalancer IPs to upstream routers
- True load balancing: Multiple nodes can announce the same service IP with ECMP
- Scalability: Router handles load distribution and failover automatically
- Network integration: Works with enterprise routing infrastructure
- Requirements: Router must support BGP (FRR, Quagga, hardware routers)
Layer 2 Mode (New Configuration)
- ARP announcements: Nodes respond to ARP requests for LoadBalancer IPs
- Active/passive failover: Only one node answers ARP for each service IP
- Simpler setup: No routing protocol configuration required
- Limited scalability: All traffic for a service goes through single node
- Requirements: Nodes must be on same Layer 2 network segment
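Both modes hand out addresses from the same IPAddressPool; only the advertisement resource changes. For reference, here is a sketch of what this cluster's pool presumably looks like, assuming the primary name referenced later and the 10.4.11.0 - 10.4.15.254 range quoted in the summary at the end of this post:
cat <<EOF | sudo kubectl apply -f -
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: primary
  namespace: metallb
spec:
  addresses:
    - 10.4.11.0-10.4.15.254
EOF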
Hardware Infrastructure Change
The transition was necessitated by hardware changes:
Previous Setup:
- Dell server: a power-hungry machine (likely a PowerEdge) running OPNsense
- BGP support: FRR (Free Range Routing) plugin provided full BGP implementation
- Power consumption: High power draw from server-class hardware
- Complexity: Full routing stack with BGP, OSPF, and other protocols
New Setup:
- Consumer router: Lower power consumption
- No BGP support: Consumer-grade firmware lacks routing protocol support
- Simplified networking: Standard static routing and NAT
- Cost efficiency: Reduced power costs and hardware complexity
Migration Process
The migration involved several coordinated steps to minimize service disruption:
Step 1: Remove BGP Configuration
That shouldn't be too bad.
I think it's just a matter of deleting the BGP advertisement:
$ sudo kubectl -n metallb delete BGPAdvertisement primary
bgpadvertisement.metallb.io "primary" deleted
This command removes the BGP advertisement configuration, which:
- Stops route announcements: MetalLB speakers stop advertising LoadBalancer IPs via BGP
- Maintains IP allocation: Existing LoadBalancer services keep their assigned IPs
- Connectivity gap: with no advertisement in place (and no BGP peer on the new router anyway), the LoadBalancer IPs are unreachable from outside until Layer 2 mode is configured below
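If you want to double-check the state at this point, something like the following should show no remaining BGP resources while the services keep their assigned external IPs (generic kubectl queries, not taken from the original transcript):
# Confirm the BGP resources are gone
sudo kubectl -n metallb get bgpadvertisements,bgppeers
# Confirm LoadBalancer services still hold their IPs
sudo kubectl get svc -A | grep LoadBalancer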
Step 2: Configure Layer 2 Advertisement
and creating an L2 advertisement:
$ cat tmp.yaml
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: primary
  namespace: metallb
$ sudo kubectl apply -f tmp.yaml
l2advertisement.metallb.io/primary created
L2Advertisement configuration details (expanded form, with the optional fields spelled out):
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: primary
  namespace: metallb
spec:
  ipAddressPools:
    - primary
  # nodeSelectors omitted: when unset, every node is eligible to announce
  interfaces:
    - eth0
Key behaviors in Layer 2 mode:
- ARP responder: Nodes respond to ARP requests for LoadBalancer IPs
- Leader election: One node per service IP elected as ARP responder
- Gratuitous ARP: Leader sends gratuitous ARP to announce IP ownership
- Failover: New leader elected if current leader becomes unavailable
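One way to see which node is currently answering for a given address is to resolve it from another machine on the same segment and compare the MAC against the nodes' eth0 addresses (a sketch, using the httpbin address from the next step):
arping -c 3 10.4.11.1           # the reply carries the announcing node's MAC address
ip neigh | grep "10.4.11.1"     # the cached neighbour entry should show the same MAC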
Step 3: Router Static Route Configuration
After adding the static route to my router, I can see the friendly go-httpbin
response when I navigate to https://10.4.11.1/
Static Route Configuration:
# Router configuration (varies by model)
# Destination: 10.4.11.0/24 (MetalLB IP pool)
# Gateway: 10.4.0.X (any cluster node IP)
# Interface: LAN interface connected to cluster network
Why static routes are necessary:
- IP pool isolation: MetalLB pool (10.4.11.0/24) is separate from the cluster network (10.4.0.0/20)
- Router awareness: Router needs to know how to reach LoadBalancer IPs
- Return path: Ensures bidirectional connectivity for external clients
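On a Linux-based router or a test host, the equivalent route would look roughly like this; <cluster-node-ip> is a placeholder for the 10.4.0.X next hop above, and most consumer routers expose the same thing through a web UI instead:
ip route add 10.4.11.0/24 via <cluster-node-ip>   # placeholder next hop; run as root
ip route show 10.4.11.0/24                        # confirm the route is installed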
Network Topology Changes
Layer 2 Network Requirements
Physical topology:
[Internet] → [Router] → [Switch] → [Cluster Nodes]
                │
                └─ Static route: 10.4.11.0/24 → cluster network
ARP behavior:
- Client request: External client sends packet to LoadBalancer IP
- Router forwarding: Router forwards based on static route to cluster network
- ARP resolution: Router/switch broadcasts ARP request for LoadBalancer IP
- Node response: MetalLB leader node responds with its MAC address
- Traffic delivery: Subsequent packets sent directly to leader node
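This exchange can be watched directly from a cluster node with tcpdump (a rough sketch; tcpdump is part of the debugging toolset installed later in this post):
sudo tcpdump -i eth0 -n -e 'arp host 10.4.11.1'
# Expect a "who-has 10.4.11.1" request from the router followed by an "is-at <MAC>" reply
# from whichever node is currently announcing the address.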
Failover Mechanism
Leader election process:
# Check current leader for a service
kubectl -n metallb logs -l app.kubernetes.io/component=speaker | grep "announcing"
# Example output:
# {"level":"info","ts":"2024-01-15T10:30:00Z","msg":"announcing","ip":"10.4.11.1","node":"bettley"}
Failover sequence:
- Leader failure: Current announcing node becomes unavailable
- Detection: the remaining MetalLB speakers notice the absence via their memberlist gossip (typically within a few seconds; stale client ARP caches can add more delay)
- Election: Remaining speakers elect new leader using deterministic algorithm
- Gratuitous ARP: New leader sends gratuitous ARP to update network caches
- Service restoration: Traffic resumes through new leader node
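To exercise failover without pulling a power cable, one option (a sketch, not part of the original migration) is to delete the speaker pod on the announcing node and watch the announcement move while it restarts:
# Map speaker pods to nodes, then delete the one on the announcing node
sudo kubectl -n metallb get pods -o wide | grep speaker
sudo kubectl -n metallb delete pod <speaker-pod-on-announcing-node>   # placeholder pod name
# Watch for a new announcer and for the replying MAC to change
sudo kubectl -n metallb logs -l app.kubernetes.io/component=speaker | grep announcing
arping -c 10 10.4.11.1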
DNS Infrastructure Migration
I also lost some control over DNS, e.g. the router's DNS server will override all lookups for hellholt.net rather than forwarding requests to my DNS servers.
So I created a new domain, goldentooth.net, to handle this cluster. A couple of tweaks to ExternalDNS and some service definitions and I can verify that ExternalDNS is setting the DNS records correctly, although I don't seem to be able to resolve names just yet.
Domain Migration Impact
Previous Domain: hellholt.net
- Router control: New router overrides DNS resolution
- Local DNS interference: Router's DNS server intercepts queries
- Limited delegation: Consumer router lacks sophisticated DNS forwarding
New Domain: goldentooth.net
- External control: Managed entirely in AWS Route53
- Clean delegation: No local DNS interference
- ExternalDNS compatibility: Full automation support
ExternalDNS Configuration Updates
Domain filter change:
# Previous configuration
args:
- --domain-filter=hellholt.net
# New configuration
args:
- --domain-filter=goldentooth.net
Service annotation updates:
# httpbin service example
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/hostname: httpbin.goldentooth.net
    # Previously: httpbin.hellholt.net
DNS record verification:
# Check Route53 records
aws route53 list-resource-record-sets --hosted-zone-id Z0736727S7ZH91VKK44A
# Verify DNS propagation
dig A httpbin.goldentooth.net
dig TXT httpbin.goldentooth.net # Ownership records
Performance and Operational Considerations
Layer 2 Mode Limitations
Single point of failure:
- Only one node handles traffic for each LoadBalancer IP
- Node failure causes service interruption until failover completes
- No load distribution across multiple nodes
Network broadcast traffic:
- ARP announcements increase broadcast traffic
- Gratuitous ARP during failover events
- Potential impact on large Layer 2 domains
Scalability constraints:
- All service traffic passes through single node
- Node bandwidth becomes bottleneck for high-traffic services
- Limited horizontal scaling compared to BGP mode
Monitoring and Troubleshooting
MetalLB speaker logs:
# Monitor speaker activities
kubectl -n metallb logs -l app.kubernetes.io/component=speaker --tail=50
# Check for leader election events
kubectl -n metallb logs -l app.kubernetes.io/component=speaker | grep -E "(leader|announcing|failover)"
# Verify ARP-related events
kubectl -n metallb logs -l app.kubernetes.io/component=speaker | grep -i arp
Network connectivity testing:
# Test ARP resolution for LoadBalancer IPs
arping -c 3 10.4.11.1
# Check MAC address consistency
arp -a | grep "10.4.11"
# Verify static routes on router
ip route show | grep "10.4.11.0/24"
Future TLS Strategy
I think I still need to get TLS working too, but I've soured on the idea of maintaining a cert per domain name and per service. I think I'll just have a wildcard over goldentooth.net and share that out. Too much aggravation otherwise. That's a problem for another time, though.
Wildcard certificate benefits:
- Simplified management: Single certificate for all subdomains
- Reduced complexity: No per-service certificate automation
- Cost efficiency: One certificate instead of multiple Let's Encrypt certs
- Faster deployment: No certificate provisioning delays for new services
Implementation considerations:
# Wildcard certificate configuration
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: goldentooth-wildcard
  namespace: default
spec:
  secretName: goldentooth-wildcard-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - "*.goldentooth.net"
    - "goldentooth.net"
Configuration Persistence
The Layer 2 configuration is maintained in the gitops repository structure:
MetalLB Helm chart updates:
# values.yaml changes
spec:
  # BGP configuration removed
  # bgpPeers: []
  # bgpAdvertisements: []
  # Layer 2 configuration added
  l2Advertisements:
    - name: primary
      ipAddressPools:
        - primary
This transition demonstrates the flexibility of MetalLB to adapt to different network environments while maintaining service availability. While Layer 2 mode has limitations compared to BGP, it provides a viable solution for simpler network infrastructures and reduces operational complexity in exchange for some scalability constraints.
Post-Implementation Updates and Additional Fixes
After the initial MetalLB L2 migration, several additional issues were discovered and resolved to achieve full operational status.
Network Interface Selection Issues
During verification, a critical issue emerged with "super shaky" primary interface selection on cluster nodes. Some nodes (particularly newer ones like lipps and karstark) had both wired (eth0) and wireless (wlan0) interfaces active, causing:
- Calico confusion: CNI plugin using wireless interfaces for pod networking
- MetalLB routing failures: ARP announcements on wrong interfaces
- Inconsistent connectivity: Services unreachable from certain nodes
Solution implemented:
- Enhanced networking role: Created robust interface detection logic preferring eth0
- Wireless interface management: Automatic detection and disabling of wlan0 on dual-homed nodes
- Systemd persistence: Network configurations and the wireless-disable service survive reboots
- Network debugging tools: Installed comprehensive toolset (arping, tcpdump, mtr, etc.)
Networking role improvements:
# /ansible/roles/goldentooth.setup_networking/tasks/main.yaml (excerpt)
# eth0_exists, wireless_interface_name, and wireless_interface_count are registered
# by earlier tasks in the role (not shown here).
- name: 'Set primary interface to eth0 if available'
  ansible.builtin.set_fact:
    metallb_interface: 'eth0'
  when:
    - 'network.metallb.interface == ""'
    - 'eth0_exists.rc == 0'

- name: 'Disable wireless interface if both eth0 and wireless exist'
  ansible.builtin.shell:
    cmd: "ip link set {{ wireless_interface_name.stdout }} down"
  when:
    - 'wireless_interface_count.stdout | int > 0'
    - 'eth0_exists.rc == 0'
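A quick way to spot-check the result across the cluster, reusing the goldentooth CLI that appears in the verification commands below (a sketch):
goldentooth command all_nodes "ip route show default"                        # default route should go via eth0
goldentooth command all_nodes "ip -br link show wlan0 2>/dev/null || true"   # wlan0 should be DOWN or absent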
DNS Architecture Migration
The L2 migration coincided with a broader DNS restructuring from hellholt.net to goldentooth.net, with hierarchical service domains:
New domain structure:
- Nodes: <node>.nodes.goldentooth.net
- Kubernetes services: <service>.services.k8s.goldentooth.net
- Nomad services: <service>.services.nomad.goldentooth.net
- General services: <service>.services.goldentooth.net
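For example, resolving a node name and a service name under the new hierarchy (using the bettley node from the speaker log example earlier; whether node records are created by ExternalDNS or by hand isn't covered here):
dig +short bettley.nodes.goldentooth.net
dig +short httpbin.services.k8s.goldentooth.net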
ExternalDNS integration:
# Service annotations for automatic DNS management
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/hostname: "argocd.services.k8s.goldentooth.net"
    external-dns.alpha.kubernetes.io/ttl: "60"
Current Operational Status (July 2025)
The MetalLB L2 configuration is now fully operational with the following verified services:
Active LoadBalancer services:
- ArgoCD: argocd.services.k8s.goldentooth.net → 10.4.11.0
- HTTPBin: httpbin.services.k8s.goldentooth.net → 10.4.11.1
Verification commands (updated):
# Check MetalLB speaker status
kubectl -n metallb logs -l app.kubernetes.io/component=speaker --tail=20
# Verify L2 announcements
kubectl -n metallb logs -l app.kubernetes.io/component=speaker | grep "announcing"
# Test connectivity to LoadBalancer IPs
curl -I http://10.4.11.1/ # HTTPBin
curl -I http://10.4.11.0/ # ArgoCD
# Verify DNS resolution
dig argocd.services.k8s.goldentooth.net
dig httpbin.services.k8s.goldentooth.net
# Check interface status on all nodes
goldentooth command all_nodes "ip link show | grep -E '(eth0|wlan)'"
MetalLB configuration summary:
- Mode: Layer 2 (BGP disabled)
- IP Pool: 10.4.11.0 - 10.4.15.254
- Interface: eth0 (consistent across all nodes)
- FRR: Disabled in Helm values for pure L2 operation