Disk Cleanup
Once I had the cluster in running shape, I figured it was a good time to set up storage. I'd previously set up ZFS and SeaweedFS, and played with Ceph (with and without Rook), GlusterFS, and BeeGFS. I really liked SeaweedFS, but thought it might be good to work with Longhorn, which seems (for better or worse) to be a good, "conventional" choice.
As mentioned previously, I have Talos installed on twelve Raspberry Pi 4Bs. Eight of them (Erenford, Fenn, Gardener, Harlton, Inchfield, Jast, Karstark, and Lipps) have SSDs attached via USB <-> SATA cables. The one on Harlton isn't working; I'm not sure whether the problem is the SSD or the USB cable, but I haven't checked it out yet. The disks vary in size from 120GB to 1TB.
So I obligingly added some sections like this to my talconfig.yaml:
userVolumes:
  - name: usb
    provisioning:
      diskSelector:
        match: disk.transport == "usb"
      minSize: 100GiB
    filesystem:
      type: xfs
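Applying that is the usual talhelper round trip: regenerate the per-node machine configs and push them with talosctl. Roughly like this; the generated file name depends on the cluster name, so treat the path (and the node) as placeholders:

$ talhelper genconfig
$ talosctl apply-config -n erenford --file clusterconfig/<cluster-name>-erenford.yaml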
I applied the config and checked the disks: no change. I checked dmesg, and Talos was complaining that it couldn't find 100GiB of free space to use. Weird. I lowered the minimum to 1GiB, but it still didn't work. It was then I realized that Talos wouldn't just yeet an existing partition into the abyss; nice. So I used the handy talosctl wipe disk ... --drop-partition command to wipe the disks and drop the existing partitions so that the userVolumes configs could work.
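For the nodes with a plain data partition already on the SSD, that was one command apiece; something like the following, where the node and partition names are just examples (the partition differed from disk to disk):

$ talosctl -n erenford wipe disk sda1 --drop-partition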
This worked everywhere except Inchfield, whose SSD had been repurposed from a Proxmox machine and still carried LVM logical volumes, volume groups, and physical volumes. Talos doesn't include any tools for dealing with LVM, and the wipe disk command wouldn't work with the device mapper volumes sitting on the partition, leading to an unfortunate error:
$ talosctl -n inchfield wipe disk sda3 --drop-partition
1 error occurred:
* inchfield: rpc error: code = FailedPrecondition desc = blockdevice "sda3" is in use by blockdevice "dm-0"
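That dm-0 is a device mapper device backing one of the old LVM logical volumes. The same discoveredvolumes query I use below to verify the cleanup should show those dm-* entries stacked on top of sda3, though I didn't keep the pre-cleanup output:

$ talosctl -n inchfield get discoveredvolumes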
The solution was to create a static pod that contained the appropriate LVM tools and use that to delete the LVM resources.
I ended up with the following:
apiVersion: v1
kind: Pod
metadata:
  name: lvm-cleanup
  namespace: kube-system
spec:
  hostNetwork: true
  hostPID: true
  hostIPC: true
  containers:
    - name: lvm-tools
      image: ubuntu:22.04
      command: ["/bin/bash"]
      args: ["-c", "apt-get update && apt-get install -y lvm2 gdisk util-linux && while true; do sleep 3600; done"]
      securityContext:
        privileged: true
        runAsUser: 0
      volumeMounts:
        - name: dev
          mountPath: /dev
        - name: sys
          mountPath: /sys
        - name: proc
          mountPath: /proc
        - name: run-udev
          mountPath: /run/udev
        - name: run-lvm
          mountPath: /run/lvm
      env:
        - name: LVM_SUPPRESS_FD_WARNINGS
          value: "1"
  volumes:
    - name: dev
      hostPath:
        path: /dev
    - name: sys
      hostPath:
        path: /sys
    - name: proc
      hostPath:
        path: /proc
    - name: run-udev
      hostPath:
        path: /run/udev
    - name: run-lvm
      hostPath:
        path: /run/lvm
  restartPolicy: Never
  tolerations:
    - operator: Exists
  nodeSelector:
    kubernetes.io/hostname: inchfield
and this script:
#!/bin/bash
set -e

echo "Current LVM state:"
echo "--- Volume Groups ---"
vgs || echo "No volume groups found"
echo
echo "--- Logical Volumes ---"
lvs || echo "No logical volumes found"
echo
echo "--- Physical Volumes ---"
pvs || echo "No physical volumes found"
echo

echo "Deactivating all volume groups..."
vgchange -an || echo "No volume groups to deactivate"

echo "Removing logical volumes..."
for lv in $(lvs --noheadings -o lv_path 2>/dev/null || true); do
  echo "Removing logical volume: $lv"
  lvremove -f "$lv" || echo "Failed to remove $lv"
done

echo "Removing volume groups..."
for vg in $(vgs --noheadings -o vg_name 2>/dev/null || true); do
  echo "Removing volume group: $vg"
  vgremove -f "$vg" || echo "Failed to remove $vg"
done

echo "Removing physical volumes..."
for pv in /dev/sda3 /dev/dm-6p3; do
  if pvs "$pv" 2>/dev/null; then
    echo "Removing physical volume: $pv"
    pvremove -f "$pv" || echo "Failed to remove $pv"
  else
    echo "Physical volume $pv not found or already removed"
  fi
done

echo "Wiping USB disk /dev/sda..."
if [ -b /dev/sda ]; then
  sgdisk --zap-all /dev/sda
  echo "USB disk /dev/sda wiped successfully"
else
  echo "USB disk /dev/sda not found"
fi

echo
echo "=== Cleanup completed ==="
echo "Verify results:"
vgs || echo "No volume groups (expected)"
lvs || echo "No logical volumes (expected)"
pvs || echo "No physical volumes (expected)"
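The rough workflow is to apply the pod, give the apt-get install in its startup command a minute or two to finish, copy the script in, and exec it. Via kubectl that looks something like the following; the file names are illustrative:

$ kubectl apply -f lvm-cleanup.yaml
$ kubectl -n kube-system wait pod/lvm-cleanup --for=condition=Ready --timeout=5m
# the container goes Ready before apt-get finishes, so give it a moment
$ kubectl -n kube-system cp lvm-cleanup.sh lvm-cleanup:/tmp/lvm-cleanup.sh
$ kubectl -n kube-system exec lvm-cleanup -- bash /tmp/lvm-cleanup.sh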
That seemed to do it; even without a reboot, the xfs volume appeared.
$ talosctl get discoveredvolumes --nodes inchfield
NODE        NAMESPACE   TYPE               ID          VERSION   TYPE        SIZE     DISCOVERED   LABEL       PARTITIONLABEL
inchfield   runtime     DiscoveredVolume   loop2       1         disk        483 kB   squashfs
inchfield   runtime     DiscoveredVolume   loop3       1         disk        66 MB    squashfs
inchfield   runtime     DiscoveredVolume   mmcblk0     1         disk        128 GB   gpt
inchfield   runtime     DiscoveredVolume   mmcblk0p1   1         partition   105 MB   vfat         EFI         EFI
inchfield   runtime     DiscoveredVolume   mmcblk0p2   1         partition   1.0 MB                            BIOS
inchfield   runtime     DiscoveredVolume   mmcblk0p3   1         partition   2.1 GB   xfs          BOOT        BOOT
inchfield   runtime     DiscoveredVolume   mmcblk0p4   1         partition   1.0 MB   talosmeta                META
inchfield   runtime     DiscoveredVolume   mmcblk0p5   1         partition   105 MB   xfs          STATE       STATE
inchfield   runtime     DiscoveredVolume   mmcblk0p6   1         partition   126 GB   xfs          EPHEMERAL   EPHEMERAL
inchfield   runtime     DiscoveredVolume   sda         1         disk        1.0 TB   gpt
inchfield   runtime     DiscoveredVolume   sda1        1         partition   1.0 TB   xfs                      u-usb
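If I'm reading the Talos user volume docs correctly, that sda1 partition should end up mounted at /var/mnt/usb (the /var/mnt/<name> path is my assumption from the docs, not something I captured from the node), which is where the Longhorn storage can eventually live. A quick sanity check:

$ talosctl -n inchfield mounts | grep usb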