# NFS Mounts
Now that Kubernetes is kinda squared away, I'm going to set up NFS mounts on the cluster nodes.
For the sake of simplicity, I'll just set up the mounts on every node, including the load balancer (which is currently exporting the share).
## Implementation Architecture

### Systemd-Based Mounting

Rather than using traditional `/etc/fstab` entries, I implemented NFS mounting using systemd mount and automount units. This approach provides several advantages:

- Dynamic mounting: Automount units mount filesystems on-demand (demonstrated below)
- Service management: Standard systemd service lifecycle management
- Dependency handling: Proper ordering with network services
- Logging: Integration with the systemd journal for troubleshooting
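To make the on-demand behavior concrete, here's roughly what a session on a node looks like once the units described below are in place (the unit and path names come from this chapter's config; exact output will vary):

```bash
# The automount unit is active, but nothing is mounted yet
systemctl is-active mnt-nfs.automount   # active
mount | grep /mnt/nfs                   # (no output)

# First access to the path triggers the actual NFS mount
ls /mnt/nfs

# Now the mount unit is active and the share is mounted
systemctl is-active mnt-nfs.mount       # active
mount | grep /mnt/nfs                   # 10.4.0.10:/mnt/usb1 on /mnt/nfs type nfs4 (...)
```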
### Global Configuration

The NFS mount configuration is defined in `group_vars/all/vars.yaml`:

```yaml
nfs:
  server: "{{ groups['nfs_server'] | first }}"
  mounts:
    primary:
      share: "{{ hostvars[groups['nfs_server'] | first].ipv4_address }}:/mnt/usb1"
      mount: '/mnt/nfs'
      safe_name: 'mnt-nfs'
      type: 'nfs'
      options: {}
```
This configuration:

- Dynamically determines the NFS server: Uses the first host in the `nfs_server` group (allyrion)
- IP-based addressing: Uses `10.4.0.10:/mnt/usb1` for reliable connectivity
- Standardized mount point: All nodes mount at `/mnt/nfs`
- Safe naming: Provides `mnt-nfs` for systemd unit names (see below)
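The `safe_name` isn't arbitrary: systemd requires a mount unit's file name to be the escaped form of its `Where=` path, and `systemd-escape` will derive (or double-check) that name:

```bash
# systemd mount/automount units must be named after the escaped mount path
systemd-escape --path /mnt/nfs
# mnt-nfs
# ...so the unit files must be mnt-nfs.mount and mnt-nfs.automount
```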
## Systemd Template Implementation

### Mount Unit Template

The mount unit template (`templates/mount.j2`) creates individual systemd mount units:

```
[Unit]
Description=Mount {{ item.key }}

[Mount]
What={{ item.value.share }}
Where={{ item.value.mount }}
Type={{ item.value.type }}
Options={{ item.value.options | join(',') }}

[Install]
WantedBy=default.target
```
This generates a unit file at `/etc/systemd/system/mnt-nfs.mount` with:

- What: `10.4.0.10:/mnt/usb1` (NFS export path)
- Where: `/mnt/nfs` (local mount point)
- Type: `nfs` (filesystem type)
- Options: Default NFS mount options
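On a node, `systemctl cat` should show something close to this rendering (note that the empty `options: {}` dict joins to an empty string, so `Options=` falls back to NFS defaults):

```bash
systemctl cat mnt-nfs.mount
# /etc/systemd/system/mnt-nfs.mount
# [Unit]
# Description=Mount primary
#
# [Mount]
# What=10.4.0.10:/mnt/usb1
# Where=/mnt/nfs
# Type=nfs
# Options=
#
# [Install]
# WantedBy=default.target
```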
### Automount Unit Template

The automount template (`templates/automount.j2`) provides on-demand mounting:

```
[Unit]
Description=Automount {{ item.key }}
After=remote-fs-pre.target network-online.target network.target
Before=umount.target remote-fs.target

[Automount]
Where={{ item.value.mount }}

[Install]
WantedBy=default.target
```
Key features:

- Network dependencies: Waits for network availability before attempting mounts
- Lazy mounting: Only mounts when the path is accessed
- Proper ordering: Correctly sequences with system startup and shutdown (verified below)
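The ordering can be sanity-checked on a node once the unit is installed, e.g.:

```bash
# List the units ordered before the automount (its After= dependencies)
systemctl list-dependencies --after mnt-nfs.automount | grep -E 'network|remote-fs'
# network-online.target
# network.target
# remote-fs-pre.target
```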
## Deployment Process

### Ansible Role Implementation

The `goldentooth.setup_nfs_mounts` role handles the complete deployment:

```yaml
- name: 'Generate mount unit for {{ item.key }}.'
  ansible.builtin.template:
    src: 'mount.j2'
    dest: "/etc/systemd/system/{{ item.value.safe_name }}.mount"
    mode: '0644'
  loop: "{{ nfs.mounts | dict2items }}"
  notify: 'reload systemd'

- name: 'Generate automount unit for {{ item.key }}.'
  ansible.builtin.template:
    src: 'automount.j2'
    dest: "/etc/systemd/system/{{ item.value.safe_name }}.automount"
    mode: '0644'
  loop: "{{ nfs.mounts | dict2items }}"
  notify: 'reload systemd'
```
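After the play runs, each node should end up with a matching pair of unit files (the `reload systemd` handler they notify presumably just runs `systemctl daemon-reload`):

```bash
ls /etc/systemd/system/mnt-nfs.*
# /etc/systemd/system/mnt-nfs.automount  /etc/systemd/system/mnt-nfs.mount
```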
### Service Management

The role ensures the proper service lifecycle:

```yaml
- name: 'Enable and start automount services.'
  ansible.builtin.systemd:
    name: "{{ item.value.safe_name }}.automount"
    enabled: true
    state: started
    daemon_reload: true
  loop: "{{ nfs.mounts | dict2items }}"
```
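On any single node, that task amounts to:

```bash
# Pick up the freshly templated unit files, then enable and start the automount
systemctl daemon-reload
systemctl enable --now mnt-nfs.automount
```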
## Network Integration

### Client Targeting

The NFS mounts are deployed across the entire cluster:

Target Hosts: All cluster nodes (`hosts: 'all'`)

- 12 Raspberry Pi nodes: allyrion, bettley, cargyll, dalt, erenford, fenn, gardener, harlton, inchfield, jast, karstark, lipps
- 1 x86 GPU node: velaryon

Including NFS Server: Even allyrion (the NFS server) mounts its own export, providing:

- Consistent access patterns: The same path (`/mnt/nfs`) on all nodes
- Testing capability: The server can verify export functionality (see the loopback check below)
- Simplified administration: Uniform management across the cluster
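The loopback arrangement is easy to verify from allyrion itself, since the export path and the mount point are both local there (the file name is just an example):

```bash
# Write through the NFS mount, read back from the underlying export
goldentooth command allyrion 'echo loopback > /mnt/nfs/loopback.txt'
goldentooth command allyrion 'cat /mnt/usb1/loopback.txt'
# loopback
```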
### Network Configuration

Infrastructure Network: All communication occurs within the trusted `10.4.0.0/20` CIDR

NFS Protocol: Standard NFSv3/v4 with default options

Firewall: No additional firewall rules needed within the cluster network
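From any client inside that CIDR, `showmount` (part of the standard NFS client tooling) should list the export:

```bash
showmount -e 10.4.0.10
# Export list for 10.4.0.10:
# /mnt/usb1 ...
```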
## Directory Structure and Permissions

### Mount Point Creation

```yaml
- name: 'Ensure mount directories exist.'
  ansible.builtin.file:
    path: "{{ item.value.mount }}"
    state: directory
    mode: '0755'
  loop: "{{ nfs.mounts | dict2items }}"
```
### Shared Directory Usage

The NFS mount serves multiple cluster functions:

Slurm Integration:

```yaml
slurm_nfs_base_path: "{{ nfs.mounts.primary.mount }}/slurm"
```

Common Patterns:

- `/mnt/nfs/slurm/` - HPC job shared storage
- `/mnt/nfs/shared/` - General cluster shared data
- `/mnt/nfs/config/` - Configuration file distribution
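Since the export is an ordinary filesystem, these conventional subdirectories only need to be created once and are then visible cluster-wide; a sketch:

```bash
# Create the shared subdirectories once (visible from every node afterwards)
goldentooth command allyrion 'mkdir -p /mnt/nfs/slurm /mnt/nfs/shared /mnt/nfs/config'
```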
## Command Line Integration

### goldentooth CLI Commands

```bash
# Configure NFS mounts on all nodes
goldentooth setup_nfs_mounts

# Verify mount status
goldentooth command all 'systemctl status mnt-nfs.automount'
goldentooth command all 'df -h /mnt/nfs'

# Test shared storage
goldentooth command allyrion 'echo "test" > /mnt/nfs/test.txt'
goldentooth command bettley 'cat /mnt/nfs/test.txt'
```
## Troubleshooting and Verification

### Service Status Verification

```bash
# Check automount service status
systemctl status mnt-nfs.automount

# Check mount service status (after access)
systemctl status mnt-nfs.mount

# View mount information
mount | grep nfs
df -h /mnt/nfs
```
### Common Issues and Solutions

Network Dependencies: The automount units properly wait for network availability through `After=network-online.target`

Permission Issues: The NFS export uses `no_root_squash`, allowing proper root access from clients

Mount Persistence: Automount units ensure mounts survive reboots and network interruptions
## Security Considerations

### Trust Model

Internal Network Security: Security relies on the trusted cluster network boundary

No User Authentication: Uses IP-based access control rather than user credentials

Root Access: `no_root_squash` on the server allows administrative operations
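The server-side export isn't reproduced in this chapter, but given the details above (export path, cluster CIDR, `no_root_squash`), the line in allyrion's `/etc/exports` presumably looks something like this (options other than `no_root_squash` are assumptions):

```bash
# /etc/exports on allyrion (sketch; actual options may differ)
/mnt/usb1 10.4.0.0/20(rw,sync,no_subtree_check,no_root_squash)
```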
## Future Enhancements
The current implementation could be enhanced with:
- Kerberos authentication for user-based security
- Network policies for additional access control
- Encryption in transit for sensitive data protection
## Integration with Storage Evolution
Note: This NFS mounting system provides the foundation for shared storage. As documented in Chapter 050, the cluster later evolves to include ZFS-based storage with replication, while maintaining compatibility with these NFS mount patterns.
This in itself wasn't too complicated: I created two template files (one for a `.mount` unit, another for a `.automount` unit), fought with the variables for a bit, and it seems to work. The result is robust, cluster-wide shared storage accessible at `/mnt/nfs` on every node.