KubeVirt: Virtual Machines in Kubernetes

I've been wanting to run VMs on the cluster for a while now. Not because I have any immediate need for them, but because KubeVirt is one of those technologies that's just... cool? The idea of managing virtual machines as Kubernetes resources, using the same GitOps workflows, the same kubectl commands – it's elegant in a way that appeals to the part of my brain that got me into infrastructure in the first place.

Plus, having the ability to spin up VMs on demand could be useful for testing OS-level stuff, running workloads that don't containerize well, or just experimenting with things that need a full VM environment.
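
To make the appeal concrete: once KubeVirt is installed, a VM is just another manifest you kubectl apply and track in Git like everything else. A minimal sketch (illustrative, not something I'm deploying yet):

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: testvm
spec:
  runStrategy: Always
  template:
    spec:
      domain:
        memory:
          guest: 1Gi
        devices:
          disks:
            - name: containerdisk
              disk:
                bus: virtio
      volumes:
        - name: containerdisk
          containerDisk:
            image: quay.io/containerdisks/fedora:latest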

The Problem: Stuck Kustomizations

I added KubeVirt to my Flux setup and pushed the changes. After a few minutes, I checked on things:

$ flux get kustomizations -w
NAME                      REVISION              SUSPENDED    READY    MESSAGE
apps                      main@sha1:e98a5de0    False        False    dependency 'flux-system/infrastructure' is not ready
flux-system               main@sha1:e98a5de0    False        True     Applied revision: main@sha1:e98a5de0
httpbin                   main@sha1:e98a5de0    False        True     Applied revision: main@sha1:e98a5de0
infrastructure            main@sha1:e98a5de0    False        Unknown  Reconciliation in progress
kubevirt-cdi              main@sha1:e98a5de0    False        Unknown  Reconciliation in progress
kubevirt-instance         main@sha1:e98a5de0    False        False    dependency 'flux-system/kubevirt-cdi' is not ready
kubevirt-operator         main@sha1:e98a5de0    False        True     Applied revision: main@sha1:e98a5de0

Not great. kubevirt-cdi was stuck in "Reconciliation in progress" with an "Unknown" ready status. This caused a cascade of failures – kubevirt-instance couldn't start because it depends on kubevirt-cdi, and apps couldn't start because it depends on infrastructure.
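
For context, those dependencies come from dependsOn in the Flux Kustomization objects. The kubevirt-instance one looks roughly like this (path and interval illustrative):

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: kubevirt-instance
  namespace: flux-system
spec:
  dependsOn:
    - name: kubevirt-cdi
  interval: 10m
  path: ./gitops/infrastructure/kubevirt/instance
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system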

The Debugging Journey

Time to dig in. First, I checked Flux events to see what was actually happening:

$ flux events
...
3m21s    Warning    HealthCheckFailed    Kustomization/kubevirt-cdi    health check failed after 9m30s: timeout waiting for: [Deployment/cdi/cdi-operator status: 'InProgress']

Ah! So the cdi-operator Deployment in the cdi namespace was stuck. The health check was timing out because the deployment never became healthy.
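
The health check itself comes from the Kustomization spec, where Flux is told to gate readiness on the CDI operator Deployment. Something along these lines (illustrative; wait: true would achieve the same for all applied objects):

spec:
  timeout: 10m
  healthChecks:
    - apiVersion: apps/v1
      kind: Deployment
      name: cdi-operator
      namespace: cdi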

Let's look at the deployment itself:

$ kubectl -n cdi get deployments
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
cdi-operator   0/1     1            0           28m

0/1 ready. Not good. What about the pods?

$ kubectl -n cdi get all
NAME                                READY   STATUS             RESTARTS         AGE
pod/cdi-operator-797944b474-ql7fz   0/1     CrashLoopBackOff   10 (2m22s ago)   29m

CrashLoopBackOff! Now we're getting somewhere. Let me describe the pod to see what's going on:

$ kubectl -n cdi describe pod cdi-operator-797944b474-ql7fz
...
State:           Waiting
  Reason:        CrashLoopBackOff
Last State:      Terminated
  Reason:        Error
  Exit Code:     255
...
Events:
  Warning  BackOff    4m4s (x115 over 29m)  kubelet    Back-off restarting failed container

Exit code 255, constantly restarting. The pod events show it's been failing for 29 minutes. Let's check the logs:

$ kubectl -n cdi logs cdi-operator-797944b474-ql7fz
exec /usr/bin/cdi-operator: exec format error

There it is. "exec format error" – that's the kernel telling me it can't execute the binary because it was compiled for a different CPU architecture.

The pod was scheduled on jast, one of my Raspberry Pi 4B nodes running ARM64. The container image must have been built for AMD64/x86_64, not ARM64.
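
Node architecture is exposed as a standard label, so confirming the mixed-architecture situation is a one-liner:

$ kubectl get nodes -L kubernetes.io/arch

On my cluster, that shows the Raspberry Pi nodes as arm64.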

Understanding the Architecture Mismatch

Of course, container images are architecture-specific unless they're published as multi-arch images with manifest lists. When you pull a multi-arch tag, the container runtime picks the manifest matching your node's architecture. But when a tag points at a single-architecture image, the runtime pulls it regardless of whether it matches, the kernel can't run the binary, and then... exec format error.
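
You can check this without pulling anything: docker manifest inspect shows whether a tag is a manifest list (a manifests array with one platform entry per architecture) or a plain single-architecture image:

$ docker manifest inspect quay.io/kubevirt/cdi-operator:v1.63.1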

KubeVirt CDI does publish ARM64 images, but they're tagged differently. Instead of using manifest lists where the same tag works for all architectures, they use separate tags like v1.63.1-arm64. Not a big deal.
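
The per-architecture tags are easy to spot by listing the repository:

$ skopeo list-tags docker://quay.io/kubevirt/cdi-operator | jq -r '.Tags[]' | grep arm64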

The Solution: Two Problems, Two Fixes

The first problem was the architecture, which I had already attempted to fix by overriding the image tags in my kustomization:

images:
  - name: quay.io/kubevirt/cdi-operator
    newTag: v1.63.1-arm64
  - name: quay.io/kubevirt/cdi-controller
    newTag: v1.63.1-arm64
  # ... etc

But that didn't work. Turns out I'd missed the second problem: my upstream kustomization at gitops/infrastructure/kubevirt/upstream-cdi/kustomization.yaml was still pulling the v1.61.1 manifest:

resources:
  - https://github.com/kubevirt/containerized-data-importer/releases/download/v1.61.1/cdi-operator.yaml

Even with image overrides, that manifest had environment variables hardcoded to v1.61.1 images, which kustomize's images transformer doesn't rewrite (it only touches image: fields in pod specs), and v1.61.1-arm64 tags don't exist in the registry anyway.
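
So the second fix was to bump the upstream manifest to the release matching my overrides:

resources:
  - https://github.com/kubevirt/containerized-data-importer/releases/download/v1.63.1/cdi-operator.yaml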

Committed, pushed, and forced a reconciliation:

$ flux reconcile kustomization kubevirt-cdi --with-source

Does It Work?

A minute later:

$ flux get kustomizations
NAME                      REVISION                 SUSPENDED    READY    MESSAGE
apps                      main@sha1:773fbac5       False        True     Applied revision: main@sha1:773fbac5
flux-system               main@sha1:773fbac5       False        True     Applied revision: main@sha1:773fbac5
httpbin                   main@sha1:773fbac5       False        True     Applied revision: main@sha1:773fbac5
infrastructure            main@sha1:773fbac5       False        True     Applied revision: main@sha1:773fbac5
kubevirt-cdi              main@sha1:773fbac5       False        True     Applied revision: main@sha1:773fbac5
kubevirt-instance         main@sha1:773fbac5       False        True     Applied revision: main@sha1:773fbac5
kubevirt-operator         main@sha1:773fbac5       False        True     Applied revision: main@sha1:773fbac5

All green! Everything reconciled successfully. The cascade of dependencies resolved – kubevirt-cdi became ready, which unblocked kubevirt-instance, which unblocked infrastructure, which unblocked apps.
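
As a final sanity check, the Deployment the health check was gating on:

$ kubectl -n cdi get deployments

1/1 ready this time, which is exactly what Flux was waiting for.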