Introductory Context
This article chronicles a technical troubleshooting session involving NixOS, a distinct Linux distribution known for its unique approach to system management. Unlike traditional systems, NixOS uses a declarative configuration model, meaning the entire system setup (packages, settings, services) is defined in configuration files. When the system is updated, NixOS builds a new, complete version of the system, referred to as a “generation.” This allows users to easily switch between different system versions or roll back to a previous state if an update causes problems.
A critical part of the boot process involves placing essential files, like the operating system kernel and an initial ramdisk (initrd), onto a special, often small, partition designated for booting (commonly the EFI System Partition or ESP). This article details a situation where this boot partition became completely full during a routine system upgrade, preventing the process from completing. It explores why the standard cleanup methods failed and the steps taken to diagnose and manually resolve a less common underlying issue related to leftover boot files.
The Case of the Full Boot Partition: A NixOS Upgrade Saga
Okay, let’s capture that troubleshooting adventure.
It started, as many NixOS adventures do, with a routine command: sudo nixos-rebuild switch --upgrade
. The goal was simple: update the system to the latest available packages and configuration. However, the familiar process ground to a halt, not with a clear build failure, but with a cryptic Python traceback culminating in a stark operating system error.
The Symptom: No Room at the Inn
The terminal presented a wall of text, but the critical lines pointed directly at the problem:
OSError: [Errno 28] No space left on device: '/nix/store/15zmjqqrdhnffii2kykhx1sfgck92xfn-initrd-linux-6.12.23/initrd' -> '/boot/EFI/nixos/15zmjqqrdhnffii2kykhx1sfgck92xfn-initrd-linux-6.12.23-initrd.efi'
Failed to install bootloader
warning: error(s) occurred while switching to the new configuration
The message [Errno 28] No space left on device
was unambiguous. The system couldn’t complete the upgrade because it failed during a crucial step: copying the newly built initial ramdisk (initrd) file to the boot partition, specifically /boot/EFI/nixos/
. This directory is where NixOS, when configured with the common systemd-boot loader, stores the kernel and initrd .efi
files needed to actually boot each system generation. Without this copy, the upgrade couldn’t activate.
Initial Diagnosis: A Full /boot Partition
In the world of NixOS, where rollbacks are a key feature, the system keeps multiple generations available. Each generation typically requires its own kernel and initrd copy in /boot
. Over time, especially on systems with a relatively small /boot
partition (often the EFI System Partition, or ESP), these files can accumulate.
A quick check confirmed this suspicion:
[user@nixos:~/repos/nixos]$ df -h /boot
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 511M 511M 0 100% /boot
The /boot
partition (/dev/sda1
), sized at 511MB, was completely full – 100% usage with zero bytes available. This definitively explained the OSError: [Errno 28]
.
The Standard Fix Fails: A Deeper Mystery
The standard operating procedure in NixOS for a full /boot
partition is to clean up old, unused generations. This not only removes references from the Nix store but crucially also removes the corresponding kernel and initrd files from /boot
. The command for this is sudo nix-collect-garbage -d
.
We ran the command:
[user@nixos:~/repos/nixos]$ sudo nix-collect-garbage -d
removing old generations of profile /nix/var/nix/profiles/system
...
0 store paths deleted, 0.00 MiB freed
Surprisingly, while the command acknowledged removing old generation references, it freed no space in the Nix store, and more importantly, a subsequent check showed /boot
was still completely full:
[user@nixos:~/repos/nixos]$ df -h /boot
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 511M 511M 0 100% /boot
This deviation from the norm indicated a more complex issue. Why wasn’t the standard cleanup working?
Unraveling the Complexity: Orphaned Boot Files
To understand why the cleanup failed, we needed more data:
- How many generations actually existed?
[user@nixos:~/repos/nixos]$ sudo nix-env --list-generations --profile /nix/var/nix/profiles/system 572 2025-04-18 16:58:23 (current)
This was the smoking gun. Only one generation, the current one (572), was registered. The
nix-collect-garbage -d
command had nothing old to delete according to the system profile! - What was actually in
/boot/EFI/nixos/
?[user@nixos:~/repos/nixos]$ ls -lh /boot/EFI/nixos/ total 511M -rwxr-xr-x 1 root root 132M Apr 18 17:25 15z...initrd-linux-6.12.23-initrd.efi -rwxr-xr-x 1 root root 172M Apr 18 17:25 1ah...initrd-linux-6.12.23-initrd.efi -rwxr-xr-x 1 root root 12M Apr 18 17:25 7fl...linux-6.12.23-bzImage.efi -rwxr-xr-x 1 root root 12M Apr 18 17:25 l7v...linux-6.12.22-bzImage.efi -rwxr-xr-x 1 root root 12M Apr 18 17:25 x77...linux-6.12.23-bzImage.efi -rwxr-xr-x 1 root root 172M Apr 18 17:25 zvb...initrd-linux-6.12.22-initrd.efi
Despite having only one generation registered, the boot directory contained multiple kernels (versions 6.12.22 and 6.12.23) and multiple initrds, totalling ~512MB – precisely the size of the partition.
The conclusion was clear: Previous nixos-rebuild
operations, likely when deleting older generations, had failed to remove their corresponding files from /boot/EFI/nixos/
. These files were now orphaned – occupying critical space but not linked to any deletable generation in the system profile. The standard cleanup tools were powerless against them.
The Careful Path vs. The Simple Request
The only way forward was manual intervention. The proposed plan involved careful, step-by-step identification and deletion:
- Examine the systemd-boot configuration files (
/boot/loader/entries/*.conf
). - Identify the exact
*.efi
files required by the current generation (572). - Manually use
sudo rm
to delete only the files in/boot/EFI/nixos/
that were not required by generation 572.
This process, while safe, felt laborious. The user, understandably, pushed back: “Surely you can give me a simplified command to do this.”
A Simpler Command, Mitigated Risk
Automating file deletion in /boot
is inherently risky. A single mistake could render the system unbootable. However, acknowledging the user’s request, a complex, single-line command was constructed. This command attempted to:
- Programmatically extract the required
.efi
filenames from the/boot/loader/entries/*.conf
files. - List all
.efi
files currently present in/boot/EFI/nixos/
. - Compare the two lists and identify the files present but not required (the orphans).
Crucially, risk mitigation was paramount. Instead of providing a command that immediately deleted files, a dry-run version was offered first:
comm -13 \
<(grep -h -E '^\s*(linux|initrd)\s+/EFI/nixos/' /boot/loader/entries/*nixos*.conf 2>/dev/null | sed 's|.*/EFI/nixos/||' | sort -u) \
<(find /boot/EFI/nixos/ -maxdepth 1 -name '*.efi' -printf '%f\n' | sort) \
| sed 's|^|/boot/EFI/nixos/|' \
| xargs -d '\n' --no-run-if-empty echo "Would delete:"
This command performs the entire detection logic but replaces the final rm
action with echo
, simply printing the files it would have deleted.
Execution and Success
The user ran the dry-run command, which correctly identified two files as potential orphans:
Would delete: /boot/EFI/nixos/15zmjqqrdhnffii2kykhx1sfgck92xfn-initrd-linux-6.12.23-initrd.efi /boot/EFI/nixos/x77mzlpif1alj4i1wnwbfla81nqqhzy0-linux-6.12.23-bzImage.efi
Confident in the dry run’s output, the user proceeded to execute the command with the echo
replaced by sudo rm --verbose
:
# --- WARNING: THIS DELETES THE FILES LISTED ABOVE ---
comm -13 \
<(grep -h -E '^\s*(linux|initrd)\s+/EFI/nixos/' /boot/loader/entries/*nixos*.conf 2>/dev/null | sed 's|.*/EFI/nixos/||' | sort -u) \
<(find /boot/EFI/nixos/ -maxdepth 1 -name '*.efi' -printf '%f\n' | sort) \
| sed 's|^|/boot/EFI/nixos/|' \
| xargs -d '\n' --no-run-if-empty sudo rm --verbose
# --- WARNING: THIS DELETES THE FILES LISTED ABOVE ---
removed '/boot/EFI/nixos/15zmjqqrdhnffii2kykhx1sfgck92xfn-initrd-linux-6.12.23-initrd.efi'
removed '/boot/EFI/nixos/x77mzlpif1alj4i1wnwbfla81nqqhzy0-linux-6.12.23-bzImage.efi'
Success! The orphaned files were removed. A final check confirmed the result:
[user@nixos:~/repos/nixos]$ df -h /boot
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 511M 368M 144M 72% /boot
With 144MB of space freed, the original command was retried:
[user@nixos:~/repos/nixos]$ sudo nixos-rebuild switch --upgrade
unpacking 2 channels...
building Nix...
building the system configuration...
... # Command proceeds successfully
The upgrade process continued without the “No space left on device” error, finally completing the task.
Conclusion
This troubleshooting journey highlighted a specific edge case in NixOS boot management where orphaned files can fill the /boot
partition, rendering standard cleanup tools ineffective. Careful diagnosis, involving checking actual partition contents and comparing them against system state (generations) and bootloader configuration, was key to understanding the root cause. While automation offers convenience, the incident underscores the importance of caution and verification, especially via dry runs, when dealing with critical system areas like the boot partition. The path from cryptic error to successful upgrade involved detailed inspection, understanding system internals, and carefully executing targeted manual intervention, ultimately restoring the system to its normally smooth operational state.
AI Analysis
Title/Headline Ideas:
- NixOS Upgrade Blocked: Solving the Full /boot Partition Mystery
- When
nix-collect-garbage
Fails: Debugging Orphaned EFI Files in NixOS /boot - Troubleshooting
OSError: [Errno 28] No space left on device
on NixOS /boot - NixOS Boot Partition Cleanup: A Case of Orphaned Kernels and Initrds
- Manual Intervention Required: Fixing a Stubbornly Full NixOS Boot Partition
Strengths:
- Detailed Process: Provides a clear, step-by-step account of the diagnosis and resolution.
- Authenticity: Includes actual commands used and their terminal output, adding credibility and practical value.
- Specific Problem Focus: Addresses a concrete, real-world technical issue (
/boot
full, standard fix fails). - Highlights Edge Case: Explains a scenario (orphaned boot files) that deviates from standard NixOS behavior/fixes.
- Emphasis on Safety: Demonstrates caution by using a dry-run command before actual deletion.
Weaknesses:
- Assumed Knowledge: Heavily relies on reader familiarity with NixOS concepts (generations, rebuild, store paths), Linux file systems (
/boot
, ESP), and bootloaders (systemd-boot, EFI). - Jargon-Heavy: Uses technical terms without extensive definition, potentially alienating beginners.
- Lack of Broader Context: Doesn’t explain why the files might have become orphaned in the first place.
- Log Format: Reads like a personal technical log, potentially lacking the polished structure or background of a formal tutorial.
AI Opinion:
This article is a valuable technical log detailing the diagnosis and resolution of a specific and non-trivial NixOS issue. Its clarity is high for users already familiar with NixOS and Linux system administration, providing a practical, command-level guide to solving the “full /boot partition with orphaned files” problem. The inclusion of actual commands and the emphasis on a safe dry run significantly increase its usefulness for the target audience. However, its lack of introductory explanations and reliance on jargon make it less accessible to those unfamiliar with the NixOS ecosystem or advanced Linux concepts. Overall, it’s a high-quality piece of troubleshooting documentation for a specific technical niche.