Fixing MetalLB
As mentioned here, I purchased a new router to replace a power-hungry Dell server running OPNsense, and that cost me BGP support. This kills my MetalLB configuration, so I need to switch it to use Layer 2.
This transition represents a fundamental change in how MetalLB operates and requires understanding the trade-offs between BGP and Layer 2 modes.
BGP vs Layer 2 Architecture Comparison
BGP Mode (Previous Configuration)
- Dynamic routing: BGP speakers advertise LoadBalancer IPs to upstream routers
- True load balancing: Multiple nodes can announce the same service IP with ECMP
- Scalability: Router handles load distribution and failover automatically
- Network integration: Works with enterprise routing infrastructure
- Requirements: Router must support BGP (FRR, Quagga, hardware routers)
Layer 2 Mode (New Configuration)
- ARP announcements: Nodes respond to ARP requests for LoadBalancer IPs
- Active/passive failover: Only one node answers ARP for each service IP
- Simpler setup: No routing protocol configuration required
- Limited scalability: All traffic for a service goes through single node
- Requirements: Nodes must be on same Layer 2 network segment
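The practical difference comes down to which advertisement resource points at the address pool. A minimal sketch of each, with illustrative resource names rather than this cluster's actual manifests (in BGP mode the peers themselves are defined separately as BGPPeer resources):
# BGP mode: advertise the pool's addresses to upstream BGP peers
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: example-bgp
  namespace: metallb
spec:
  ipAddressPools:
    - primary
---
# Layer 2 mode: answer ARP for the pool's addresses from an elected node
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: example-l2
  namespace: metallb
spec:
  ipAddressPools:
    - primary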
Hardware Infrastructure Change
The transition was necessitated by hardware changes:
Previous Setup:
- Dell server: Power-hungry (likely PowerEdge) running OPNsense
- BGP support: FRR (Free Range Routing) plugin provided full BGP implementation
- Power consumption: High power draw from server-class hardware
- Complexity: Full routing stack with BGP, OSPF, and other protocols
New Setup:
- Consumer router: Lower power consumption
- No BGP support: Consumer-grade firmware lacks routing protocol support
- Simplified networking: Standard static routing and NAT
- Cost efficiency: Reduced power costs and hardware complexity
Migration Process
The migration involved several coordinated steps to minimize service disruption:
Step 1: Remove BGP Configuration
That shouldn't be too bad.
I think it's just a matter of deleting the BGP advertisement:
$ sudo kubectl -n metallb delete BGPAdvertisement primary
bgpadvertisement.metallb.io "primary" deleted
This command removes the BGP advertisement configuration, which:
- Stops route announcements: MetalLB speakers stop advertising LoadBalancer IPs via BGP
- Maintains IP allocation: Existing LoadBalancer services keep their assigned IPs
- Preserves connectivity: Services remain accessible until Layer 2 mode is configured
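It's worth confirming what's left behind before moving on; a quick check, assuming the metallb namespace used throughout this post:
# List any BGP-mode resources still present (peers may remain until FRR is disabled)
kubectl -n metallb get bgpadvertisements,bgppeers
# Confirm the address pool itself survived the deletion
kubectl -n metallb get ipaddresspools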
Step 2: Configure Layer 2 Advertisement
and creating an L2 advertisement:
$ cat tmp.yaml
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: primary
  namespace: metallb
$ sudo kubectl apply -f tmp.yaml
l2advertisement.metallb.io/primary created
L2Advertisement Configuration Details:
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: primary
  namespace: metallb
spec:
  ipAddressPools:
    - primary
  # nodeSelectors could restrict announcements to specific nodes;
  # omitting it allows any node to be elected as the announcer.
  interfaces:
    - eth0
Key behaviors in Layer 2 mode:
- ARP responder: Nodes respond to ARP requests for LoadBalancer IPs
- Leader election: One node per service IP elected as ARP responder
- Gratuitous ARP: Leader sends gratuitous ARP to announce IP ownership
- Failover: New leader elected if current leader becomes unavailable
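To see which node won the election for a particular service, MetalLB typically records a nodeAssigned event on the Service when a speaker starts announcing its IP; a quick check (the httpbin service name here is just an example):
# Look for the "announcing from node ..." event on the service
kubectl describe service httpbin | grep -i -A 1 nodeassigned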
Step 3: Router Static Route Configuration
After adding the static route to my router, I can see the friendly go-httpbin response when I navigate to https://10.4.11.1/
Static Route Configuration:
# Router configuration (varies by model)
# Destination: 10.4.11.0/24 (MetalLB IP pool)
# Gateway: 10.4.0.X (any cluster node IP)
# Interface: LAN interface connected to cluster network
Why static routes are necessary:
- IP pool isolation: The MetalLB pool (10.4.11.0/24) is separate from the cluster network (10.4.0.0/20)
- Router awareness: The router needs to know how to reach the LoadBalancer IPs
- Return path: Ensures bidirectional connectivity for external clients
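On a router (or any Linux box standing in for one) the route amounts to something like the following; the next-hop address is a placeholder for whichever cluster node you choose:
# Send traffic for the MetalLB pool toward the cluster network via a node
ip route add 10.4.11.0/24 via 10.4.0.10
# Confirm which route will be used for a LoadBalancer IP
ip route get 10.4.11.1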
Network Topology Changes
Layer 2 Network Requirements
Physical topology:
[Internet] → [Router] → [Switch] → [Cluster Nodes]
                │
                └─ Static route: 10.4.11.0/24 → cluster network
ARP behavior:
- Client request: External client sends packet to LoadBalancer IP
- Router forwarding: Router forwards based on static route to cluster network
- ARP resolution: Router/switch broadcasts ARP request for LoadBalancer IP
- Node response: MetalLB leader node responds with its MAC address
- Traffic delivery: Subsequent packets sent directly to leader node
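The ARP exchange is easy to watch from any machine on the same segment while hitting the service; a quick capture (interface and IP as used elsewhere in this post):
# Watch who-has/is-at traffic for the LoadBalancer IP
sudo tcpdump -ni eth0 arp and host 10.4.11.1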
Failover Mechanism
Leader election process:
# Check current leader for a service
kubectl -n metallb logs -l app.kubernetes.io/component=speaker | grep "announcing"
# Example output:
# {"level":"info","ts":"2024-01-15T10:30:00Z","msg":"announcing","ip":"10.4.11.1","node":"bettley"}
Failover sequence:
- Leader failure: Current announcing node becomes unavailable
- Detection: MetalLB speakers detect leader absence (typically 10-30 seconds)
- Election: Remaining speakers elect new leader using deterministic algorithm
- Gratuitous ARP: New leader sends gratuitous ARP to update network caches
- Service restoration: Traffic resumes through new leader node
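Failover can also be exercised deliberately by removing the speaker pod on the current leader and watching another node take over; a rough procedure (the pod name below is hypothetical):
# Find which node each speaker runs on
kubectl -n metallb get pods -l app.kubernetes.io/component=speaker -o wide
# Delete the speaker on the announcing node and watch the handoff
kubectl -n metallb delete pod speaker-abcde
kubectl -n metallb logs -f -l app.kubernetes.io/component=speaker | grep announcing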
DNS Infrastructure Migration
I also lost some control over DNS, e.g. the router's DNS server will override all lookups for hellholt.net rather than forwarding requests to my DNS servers.
So I created a new domain, goldentooth.net, to handle this cluster. A couple of tweaks to ExternalDNS and some service definitions and I can verify that ExternalDNS is setting the DNS records correctly, although I don't seem to be able to resolve names just yet.
Domain Migration Impact
Previous Domain: hellholt.net
- Router control: New router overrides DNS resolution
- Local DNS interference: Router's DNS server intercepts queries
- Limited delegation: Consumer router lacks sophisticated DNS forwarding
New Domain: goldentooth.net
- External control: Managed entirely in AWS Route53
- Clean delegation: No local DNS interference
- ExternalDNS compatibility: Full automation support
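A quick way to confirm that the new zone is cleanly delegated, and that the router is no longer in the resolution path, is to query it directly and through a public resolver:
# Check the zone's delegation
dig NS goldentooth.net +short
# Resolve through a public resolver, bypassing the router's DNS entirely
dig @1.1.1.1 A httpbin.goldentooth.net +short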
ExternalDNS Configuration Updates
Domain filter change:
# Previous configuration
args:
  - --domain-filter=hellholt.net

# New configuration
args:
  - --domain-filter=goldentooth.net
Service annotation updates:
# httpbin service example
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/hostname: httpbin.goldentooth.net
    # Previously: httpbin.hellholt.net
DNS record verification:
# Check Route53 records
aws route53 list-resource-record-sets --hosted-zone-id Z0736727S7ZH91VKK44A
# Verify DNS propagation
dig A httpbin.goldentooth.net
dig TXT httpbin.goldentooth.net # Ownership records
Performance and Operational Considerations
Layer 2 Mode Limitations
Single point of failure:
- Only one node handles traffic for each LoadBalancer IP
- Node failure causes service interruption until failover completes
- No load distribution across multiple nodes
Network broadcast traffic:
- ARP announcements increase broadcast traffic
- Gratuitous ARP during failover events
- Potential impact on large Layer 2 domains
Scalability constraints:
- All service traffic passes through single node
- Node bandwidth becomes bottleneck for high-traffic services
- Limited horizontal scaling compared to BGP mode
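One mitigating detail: the announcing node is not necessarily where the pods run. With the default externalTrafficPolicy: Cluster, kube-proxy on the leader forwards traffic on to endpoints anywhere in the cluster (at the cost of an extra hop and source NAT), while Local keeps announcements on nodes that actually host endpoints and preserves client IPs. A sketch of the relevant Service field, with illustrative names and ports:
apiVersion: v1
kind: Service
metadata:
  name: httpbin
spec:
  type: LoadBalancer
  externalTrafficPolicy: Cluster   # or Local to preserve client source IPs
  selector:
    app: httpbin
  ports:
    - port: 80
      targetPort: 8080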
Monitoring and Troubleshooting
MetalLB speaker logs:
# Monitor speaker activities
kubectl -n metallb logs -l app.kubernetes.io/component=speaker --tail=50
# Check for leader election events
kubectl -n metallb logs -l app.kubernetes.io/component=speaker | grep -E "(leader|announcing|failover)"
# Verify ARP announcements
kubectl -n metallb logs -l app.kubernetes.io/component=speaker | grep -i arp
Network connectivity testing:
# Test ARP resolution for LoadBalancer IPs
arping -c 3 10.4.11.1
# Check MAC address consistency
arp -a | grep "10.4.11"
# Verify static routes on router
ip route show | grep "10.4.11.0/24"
Future TLS Strategy
I think I still need to get TLS working too, but I've soured on the idea of maintaining a cert per domain name and per service. I think I'll just have a wildcard over goldentooth.net and share that out. Too much aggravation otherwise. That's a problem for another time, though.
Wildcard certificate benefits:
- Simplified management: Single certificate for all subdomains
- Reduced complexity: No per-service certificate automation
- Cost efficiency: One certificate instead of multiple Let's Encrypt certs
- Faster deployment: No certificate provisioning delays for new services
Implementation considerations:
# Wildcard certificate configuration
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: goldentooth-wildcard
  namespace: default
spec:
  secretName: goldentooth-wildcard-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - "*.goldentooth.net"
    - "goldentooth.net"
Configuration Persistence
The Layer 2 configuration is maintained in the gitops repository structure:
MetalLB Helm chart updates:
# values.yaml changes
spec:
  # BGP configuration removed
  # bgpPeers: []
  # bgpAdvertisements: []
  # Layer 2 configuration added
  l2Advertisements:
    - name: primary
      ipAddressPools:
        - primary
This transition demonstrates the flexibility of MetalLB to adapt to different network environments while maintaining service availability. While Layer 2 mode has limitations compared to BGP, it provides a viable solution for simpler network infrastructures and reduces operational complexity in exchange for some scalability constraints.
Post-Implementation Updates and Additional Fixes
After the initial MetalLB L2 migration, several additional issues were discovered and resolved to achieve full operational status.
Network Interface Selection Issues
During verification, a critical issue emerged with "super shaky" primary interface selection on cluster nodes. Some nodes (particularly newer ones like lipps and karstark) had both wired (eth0) and wireless (wlan0) interfaces active, causing:
- Calico confusion: CNI plugin using wireless interfaces for pod networking
- MetalLB routing failures: ARP announcements on wrong interfaces
- Inconsistent connectivity: Services unreachable from certain nodes
Solution implemented:
- Enhanced networking role: Created robust interface detection logic preferring eth0
- Wireless interface management: Automatic detection and disabling of wlan0 on dual-homed nodes
- SystemD persistence: Network configurations and the wireless-disable service survive reboots
- Network debugging tools: Installed a comprehensive toolset (arping, tcpdump, mtr, etc.)
Networking role improvements:
# /ansible/roles/goldentooth.setup_networking/tasks/main.yaml
- name: 'Set primary interface to eth0 if available'
  ansible.builtin.set_fact:
    metallb_interface: 'eth0'
  when:
    - 'network.metallb.interface == ""'
    - 'eth0_exists.rc == 0'

- name: 'Disable wireless interface if both eth0 and wireless exist'
  ansible.builtin.shell:
    cmd: "ip link set {{ wireless_interface_name.stdout }} down"
  when:
    - 'wireless_interface_count.stdout | int > 0'
    - 'eth0_exists.rc == 0'
DNS Architecture Migration
The L2 migration coincided with a broader DNS restructuring from hellholt.net to goldentooth.net with hierarchical service domains:
New domain structure:
- Nodes: <node>.nodes.goldentooth.net
- Kubernetes services: <service>.services.k8s.goldentooth.net
- Nomad services: <service>.services.nomad.goldentooth.net
- General services: <service>.services.goldentooth.net
ExternalDNS integration:
# Service annotations for automatic DNS management
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/hostname: "argocd.services.k8s.goldentooth.net"
    external-dns.alpha.kubernetes.io/ttl: "60"
Current Operational Status (July 2025)
The MetalLB L2 configuration is now fully operational with the following verified services:
Active LoadBalancer services:
- ArgoCD: argocd.services.k8s.goldentooth.net → 10.4.11.0
- HTTPBin: httpbin.services.k8s.goldentooth.net → 10.4.11.1
Verification commands (updated):
# Check MetalLB speaker status
kubectl -n metallb logs -l app.kubernetes.io/component=speaker --tail=20
# Verify L2 announcements
kubectl -n metallb logs -l app.kubernetes.io/component=speaker | grep "announcing"
# Test connectivity to LoadBalancer IPs
curl -I http://10.4.11.1/ # HTTPBin
curl -I http://10.4.11.0/ # ArgoCD
# Verify DNS resolution
dig argocd.services.k8s.goldentooth.net
dig httpbin.services.k8s.goldentooth.net
# Check interface status on all nodes
goldentooth command all_nodes "ip link show | grep -E '(eth0|wlan)'"
MetalLB configuration summary:
- Mode: Layer 2 (BGP disabled)
- IP Pool: 10.4.11.0 - 10.4.15.254
- Interface: eth0 (consistently across all nodes)
- FRR: Disabled in Helm values for pure L2 operation
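For completeness, the address pool that both the old BGPAdvertisement and the new L2Advertisement reference would look roughly like this, reconstructed from the range above rather than copied from the repo:
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: primary
  namespace: metallb
spec:
  addresses:
    - 10.4.11.0-10.4.15.254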