GPU Storage NFS Export
With the cluster expanded to include the Velaryon GPU node, a natural question emerged: how can the Raspberry Pi cluster nodes efficiently exchange data with the GPU node for machine learning workloads and other compute-intensive tasks?
The solution was to leverage Velaryon's secondary 1TB NVMe SSD and expose it to the entire cluster via NFS, creating a high-speed shared storage pool specifically for Pi-GPU data exchange.
The Challenge
Velaryon came with two storage devices:
- Primary NVMe (nvme1n1): Linux system drive
- Secondary NVMe (nvme0n1): 1TB drive with old NTFS partitions from a previous Windows installation
The goal was to repurpose this secondary drive as shared storage while maintaining architectural separation: the GPU node should provide storage services without becoming a structural component of the Pi cluster.
Storage Architecture Decision
Rather than integrating Velaryon into the existing storage ecosystem (ZFS replication, Ceph distributed storage), I opted for a simpler approach:
- Pure ext4: Single partition consuming the entire 1TB drive
- NFS export: Simple, performant network filesystem
- Subnet-wide access: Available to every node in the 10.4.0.0/20 infrastructure subnet
This keeps the GPU node loosely coupled while providing the needed functionality.
Implementation
Drive Preparation
First, I cleared the old NTFS partitions and created a fresh GPT layout:
# Clear existing partition table
sudo wipefs -af /dev/nvme0n1
# Create new GPT partition table and single partition
sudo parted /dev/nvme0n1 mklabel gpt
sudo parted /dev/nvme0n1 mkpart primary ext4 0% 100%
# Format as ext4
sudo mkfs.ext4 -L gpu-storage /dev/nvme0n1p1
The resulting filesystem has UUID 5bc38d5b-a7a4-426e-acdb-e5caf0a809d9 and is mounted persistently at /mnt/gpu-storage.
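Before wiring up automation, it's worth sanity-checking the result. A minimal sketch: blkid confirms the UUID, and the persistent mount can be expressed as an fstab line (the playbook below manages this via ansible.builtin.mount, so the manual edit is illustrative only):
# Confirm the new filesystem's label and UUID
sudo blkid /dev/nvme0n1p1
# Equivalent manual fstab entry (the deployment playbook handles this automatically)
echo 'UUID=5bc38d5b-a7a4-426e-acdb-e5caf0a809d9 /mnt/gpu-storage ext4 defaults 0 2' | sudo tee -a /etc/fstab
sudo mkdir -p /mnt/gpu-storage && sudo mount /mnt/gpu-storage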
NFS Server Configuration
Velaryon was configured as an NFS server with a single export:
# /etc/exports
/mnt/gpu-storage 10.4.0.0/20(rw,sync,no_root_squash,no_subtree_check)
This grants read-write access to the entire infrastructure subnet with synchronous writes for data integrity.
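After editing /etc/exports, the export table has to be reloaded before clients can see the share. A quick check from both sides (assuming the hostname velaryon resolves within the cluster):
# On velaryon: reload the export table and list active exports
sudo exportfs -ra
sudo exportfs -v
# From any Pi node: confirm the export is advertised
showmount -e velaryon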
Ansible Integration
Rather than manually configuring each node, I integrated the GPU storage into the existing Ansible automation:
Inventory Updates (inventory/hosts):
nfs_server:
  hosts:
    allyrion:  # Existing NFS server
    velaryon:  # New GPU storage server
Host Variables (inventory/host_vars/velaryon.yaml):
nfs_exports:
  - "/mnt/gpu-storage 10.4.0.0/20(rw,sync,no_root_squash,no_subtree_check)"
Global Configuration (group_vars/all/vars.yaml):
nfs:
  mounts:
    primary:  # Existing allyrion NFS share
      share: "{{ hostvars[groups['nfs_server'] | first].ipv4_address }}:/mnt/usb1"
      mount: '/mnt/nfs'
      safe_name: 'mnt-nfs'
      type: 'nfs'
      options: {}
    gpu_storage:  # New GPU storage share
      share: "{{ hostvars['velaryon'].ipv4_address }}:/mnt/gpu-storage"
      mount: '/mnt/gpu-storage'
      safe_name: 'mnt-gpu\x2dstorage'  # Systemd unit name escaping
      type: 'nfs'
      options: {}
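The unit templates below reference item.key and item.value, which suggests the goldentooth.setup_nfs_mounts role loops over nfs.mounts with dict2items and writes one pair of unit files per share, named after safe_name. A hypothetical task illustrating that pattern (the actual role may differ):
# Hypothetical sketch of how the role might consume these entries
- name: 'Install systemd mount unit for each NFS share'
  ansible.builtin.template:
    src: 'mount.j2'
    dest: "/etc/systemd/system/{{ item.value.safe_name }}.mount"
  loop: "{{ nfs.mounts | dict2items }}"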
Systemd Automount Configuration
The trickiest part was configuring systemd automount units. Systemd derives unit names from mount paths by turning each / into a dash, so any literal dash in the path must itself be escaped: the mount point /mnt/gpu-storage maps to the unit name mnt-gpu\x2dstorage (where \x2d is the escaped dash).
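Rather than computing the escaping by hand, systemd-escape derives the unit name directly:
# Derive the systemd unit name for a mount path
systemd-escape -p --suffix=mount /mnt/gpu-storage
# => mnt-gpu\x2dstorage.mount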
Mount Unit Template (templates/mount.j2):
[Unit]
Description=Mount {{ item.key }}

[Mount]
What={{ item.value.share }}
Where={{ item.value.mount }}
Type={{ item.value.type }}
{% if item.value.options -%}
Options={{ item.value.options | join(',') }}
{% else -%}
Options=defaults
{% endif %}

[Install]
WantedBy=default.target
Automount Unit Template (templates/automount.j2):
[Unit]
Description=Automount {{ item.key }}
After=remote-fs-pre.target network-online.target network.target
Before=umount.target remote-fs.target

[Automount]
Where={{ item.value.mount }}
TimeoutIdleSec=60

[Install]
WantedBy=default.target
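For reference, here is roughly what the rendered mount unit for the gpu_storage entry looks like once templated, written to /etc/systemd/system/mnt-gpu\x2dstorage.mount (a sketch; <velaryon-ip> stands in for the address resolved from host vars):
[Unit]
Description=Mount gpu_storage

[Mount]
What=<velaryon-ip>:/mnt/gpu-storage
Where=/mnt/gpu-storage
Type=nfs
Options=defaults

[Install]
WantedBy=default.target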
Deployment Playbook
A new playbook setup_gpu_storage.yaml orchestrates the entire deployment:
---
# Setup GPU storage on Velaryon with NFS export
- name: 'Setup Velaryon GPU storage and NFS export'
  hosts: 'velaryon'
  become: true
  tasks:
    - name: 'Ensure GPU storage mount point exists'
      ansible.builtin.file:
        path: '/mnt/gpu-storage'
        state: 'directory'
        owner: 'root'
        group: 'root'
        mode: '0755'

    - name: 'Check if GPU storage is mounted'
      ansible.builtin.command:
        cmd: 'mountpoint -q /mnt/gpu-storage'
      register: gpu_storage_mounted
      failed_when: false
      changed_when: false

    - name: 'Mount GPU storage if not already mounted'
      ansible.builtin.mount:
        src: 'UUID=5bc38d5b-a7a4-426e-acdb-e5caf0a809d9'
        path: '/mnt/gpu-storage'
        fstype: 'ext4'
        opts: 'defaults'
        state: 'mounted'
      when: gpu_storage_mounted.rc != 0

- name: 'Configure NFS exports on Velaryon'
  hosts: 'velaryon'
  become: true
  roles:
    - 'geerlingguy.nfs'

- name: 'Setup NFS mounts on all nodes'
  hosts: 'all'
  become: true
  roles:
    - 'goldentooth.setup_nfs_mounts'
Usage
The GPU storage is now seamlessly integrated into the goldentooth CLI:
# Deploy/update GPU storage configuration
goldentooth setup_gpu_storage
# Enable automount on specific nodes
goldentooth command allyrion 'sudo systemctl enable --now "mnt-gpu\x2dstorage.automount"'
# Verify access (automounts on first access)
goldentooth command cargyll,bettley 'ls /mnt/gpu-storage/'
Results
The implementation provides:
- 1TB shared storage available cluster-wide at /mnt/gpu-storage
- Automatic mounting via systemd automount on directory access
- Full Ansible automation via the goldentooth CLI
- Clean separation between Pi cluster and GPU node architectures
Data written from any node is immediately visible across the cluster, enabling seamless Pi-GPU workflows for machine learning datasets, model artifacts, and computational results.
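A quick end-to-end check of that behavior is to write a file on the GPU node and read it back from a Pi node (a sketch reusing node names from above):
# Write on velaryon's local ext4, read over NFS from a Pi node
goldentooth command velaryon 'echo test | sudo tee /mnt/gpu-storage/hello.txt'
goldentooth command bettley 'cat /mnt/gpu-storage/hello.txt'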