This article is a walk-through for installing Fedora Linux with root on ZFS. It has been tested with:
Fedora 32, kernel-5.7.*, zfs-0.8.3
...
Fedora 39, kernel-6.10.10, zfs-2.2.7
Notes:
1) If running "dnf update" shows that both zfs and the kernel will be updated, it's best to cancel the update and do the zfs update by itself first. Then reboot and continue the update.
dnf update zfs zfs-dkms zfs-dracut
dnf update
2) Sometimes Fedora will introduce a kernel beyond what is currently supported by ZFS. When you see an update that includes a new kernel, check the ZFS on Linux website to make sure your prospective new kernel is supported.
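If you want a quick local check before accepting such an update, you can compare the pending kernel against the kernel range the installed ZFS source claims to support. This is only a sketch and assumes zfs-dkms is installed (its source tree lives under /usr/src):
dnf repoquery --upgrades kernel
grep -E '^Linux-(Minimum|Maximum)' /usr/src/zfs-*/META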
Earlier (and unfortunately far more complex) versions of this document exist:
You can work with real hardware or a virtual machine. Some section names start with [RH] "Real hardware" or [VM] "Virtual machine" - they only apply to those respective cases. Everything else applies to both. If this is your first time, following the virtual machine path is a good way to learn without committing hardware or accidentally reformatting your working system disk.
You'll need a Fedora Linux system that has support for ZFS to follow this guide. After installing Fedora, visit the ZFS on Linux site and follow the instructions.
I suggest creating this system on a removable device and keeping it in a safe place because it's occasionally necessary to rescue root-on-zfs systems.
We will create a root-on-zfs operating system by running commands mostly in the host environment, but some steps have to be taken inside the target, which is done via the "chroot" command. Without additional configuration, many Linux commands won't work inside a chroot. To fix that, we need a special script, "zenter." Some (but not all) Linux distributions provide a command that does this. (Not Fedora...)
Here's the source. Save it in a file "zenter.sh" and proceed. (Or you can download zenter here.)
#!/bin/bash
# zenter - Mount system directories and enter a chroot
target=$1
mount -t proc proc $target/proc
mount -t sysfs sys $target/sys
mount -o bind /dev $target/dev
mount -o bind /dev/pts $target/dev/pts
chroot $target /bin/env -i \
HOME=/root TERM="$TERM" PS1='[\u@chroot \W]\$ ' \
PATH=/bin:/usr/bin:/sbin:/usr/sbin \
/bin/bash --login
echo "Exiting chroot environment..."
umount $target/dev/pts
umount $target/dev/
umount $target/sys/
umount $target/proc/
Install the script to a directory on your PATH:
cp -a zenter.sh /usr/local/sbin/zenter
Installation variables:
VER=34
POOL=Magoo
USER=hugh
PASW=mxyzptlk
NAME="Hugh Sparks"
Define a group of variables from one of the following two sections:
DEVICE=/dev/sda
PART1=1
PART2=2
PART3=3
The device name is only an example: when you add a physical disk, you must identify the new device name and carefully avoid blasting a device that's already part of your operating system.
IMPORTANT: Adding or removing devices can alter all device and partition names after reboot. This is why modern linux distributions avoid using them in places like fstab. We will convert device names to UUIDs as we proceed.
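Before pointing DEVICE at real hardware, it's worth double-checking which disk is which. A couple of harmless read-only commands help; sizes, models, and serial numbers usually identify the new disk unambiguously:
lsblk -o NAME,SIZE,MODEL,SERIAL,MOUNTPOINT
ls -l /dev/disk/by-id/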
DEVICE=/dev/nbd0
PART1=p1
PART2=p2
PART3=p3
IMAGE=/var/lib/libvirt/images/$POOL.qcow2
In the virtual machine case, the device name will always be the same unless you're using nbd devices for some other purpose.
qemu-img create -f qcow2 ${IMAGE} 10G
modprobe nbd
qemu-nbd --connect=/dev/nbd0 ${IMAGE} -f qcow2
If your target disk was ever part of a zfs pool, you need to clear the label before you repartition the device. First list all partitions:
sgdisk -p $DEVICE
For each partition number "n" that has type BF01 "Solaris /usr & Mac ZFS", execute:
zpool labelclear -f ${DEVICE}n
If you suspect the whole disk (no partitions) was part of a zfs array, clear that label using:
zpool labelclear -f ${DEVICE}
This example uses a very simple layout: An EFI partition, a boot partition and a ZFS partition that fills the rest of the disk.
sgdisk -Z $DEVICE
sgdisk -n 1:0:+200MiB -t 1:EF00 -c 1:EFI $DEVICE
sgdisk -n 2:0:+500MiB -t 2:8300 -c 2:Boot $DEVICE
sgdisk -n 3:0:0 -t 3:BF01 -c 3:ZFS $DEVICE
mkfs.fat -F32 ${DEVICE}${PART1}
mkfs.ext4 ${DEVICE}${PART2}
zpool create $POOL -m none ${DEVICE}${PART3} -o ashift=12 -o cachefile=none
This is a very simple layout with no redundancy. For a production system, you would create a mirror, a raidz array, or some combination. These topics are covered on many websites such as ZFS Without Tears.
If for some reason you want to keep using a single-device system, adding the following filesystem property to "zpool create" (note the capital -O) makes ZFS store two copies of every block, which protects against localized corruption but not a failed disk, at the cost of half the space:
-O copies=2
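For reference, a mirrored pool built from two disks might be created like the sketch below; the by-id paths are placeholders for your own devices, and the other options match the single-disk example above:
zpool create -o ashift=12 -o cachefile=none -m none $POOL \
mirror /dev/disk/by-id/ata-DISK_A-part3 /dev/disk/by-id/ata-DISK_B-part3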
zfs set compression=on $POOL
zfs set atime=off $POOL
zpool export $POOL
udevadm trigger --settle
zpool import $POOL -d /dev/disk/by-uuid -o altroot=/target -o cachefile=none
zfs create $POOL/fedora -o xattr=sa -o acltype=posixacl
zfs create $POOL/fedora/var -o exec=off -o setuid=off -o canmount=off
zfs create $POOL/fedora/var/cache
zfs create $POOL/fedora/var/log
zfs create $POOL/fedora/var/spool
zfs create $POOL/fedora/var/lib -o exec=on
zfs create $POOL/fedora/var/tmp -o exec=on
zfs create $POOL/www -o exec=off -o setuid=off
zfs create $POOL/home -o setuid=off
zfs create $POOL/root
The motivation for using multiple datasets is similar to the reason more conventional systems use multiple LVM volumes: each dataset can carry its own properties (note the exec= and setuid= settings above), its own quota, and its own snapshots.
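For example, once the system is running you might cap the size of /home and snapshot it independently of the operating system datasets (the quota value and snapshot name here are arbitrary):
zfs set quota=100G $POOL/home
zfs snapshot $POOL/home@before-experiment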
zfs set mountpoint=/ $POOL/fedora
zfs set mountpoint=/var $POOL/fedora/var
zfs set mountpoint=/var/www $POOL/www
zfs set mountpoint=/home $POOL/home
zfs set mountpoint=/root $POOL/root
The reason for using ZFS mountpoints during installation is to avoid modifying the host system's fstab and to smooth the transition to the chroot environment for the final installation steps.
Later we'll switch to legacy mountpoints. During Fedora updates or upgrades, files sometimes get saved in mountpoint directories before ZFS gets around to mounting the datasets at boot time. This is a catastrophe: ZFS refuses to mount a dataset on a non-empty directory, so the boot either fails outright or the system limps along with stray, shadowed files and bizarre symptoms. Fedora's update scripts know about fstab and make sure things are mounted at the right time. Hence we must accommodate.
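If you ever do hit the non-empty-directory problem on a ZFS-managed mountpoint, the recovery looks roughly like this sketch (run from a rescue environment; the dataset and paths follow this article's example):
zfs mount Magoo/fedora/var/log
# fails with: cannot mount '/var/log': directory is not empty
mkdir /var/log.stray
mv /var/log/* /var/log.stray/
zfs mount Magoo/fedora/var/log
mv /var/log.stray/* /var/log/ && rmdir /var/log.stray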
zfs set com.sun:auto-snapshot=false $POOL/fedora/var/tmp
zfs set com.sun:auto-snapshot=false $POOL/fedora/var/cache
When com.sun:auto-snapshot=false, 3rd party snapshot software is supposed to exclude the dataset. Otherwise all datasets are included in snapshots.
This is an example of a user-created property. ZFS itself doesn't attach any meaning to such properties. They conventionally have "owned" names based on DNS to avoid conflicts.
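You can confirm which properties were set locally on a dataset (user properties included) with zfs get, for example:
zfs get -s local com.sun:auto-snapshot $POOL/fedora/var/tmp
zfs get all -s local $POOL/fedora/var/tmp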
mkdir /target/boot
mount -U `lsblk -nr ${DEVICE}${PART2} -o UUID` /target/boot
rm -rf /target/boot/*
mkdir /target/boot/efi
mount -U `lsblk -nr ${DEVICE}${PART1} -o UUID` /target/boot/efi -o umask=0077,shortname=winnt
rm -rf /target/boot/efi/*
The "rm -f" expressions are there in case you're repeating these instructions on a previously partitioned device where an operating system was installed.
dnf install -y --installroot=/target --releasever=$VER \
@minimal-environment \
kernel kernel-modules kernel-modules-extra \
grub2-efi-x64 shim-x64 mactel-boot
Optional: Add your favorite desktop environment to the list e.g. @cinnamon-desktop.
UPDATE: A few errors/warnings will be reported because some of the grub2 components expect the system to be live. This gets resolved in a later step.
dnf install -y --installroot=/target --releasever=$VER \
http://download.zfsonlinux.org/fedora/zfs-release.fc$VER.noarch.rpm
dnf install -y --installroot=/target --releasever=$VER zfs zfs-dracut
cat > /target/etc/resolv.conf <<-EOF
search csparks.com
nameserver 192.168.1.2
EOF
(Be yourself: substitute your own search domain and nameserver.)
You may object that NetworkManager likes to use a symbolic link here that vectors off into NetworkManager Land. This concept has caused numerous boot failures on most of the systems I manage because of permission problems in the target directory. These can be corrected by hand, but I've had an easier life since I took over this file and used the traditional contents. Your mileage may vary. Someday Fedora will correct the problem. If you're in the mood to find out, don't create this file.
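If you do take the file over, you can also tell NetworkManager to stop managing it. One way, assuming NetworkManager honors drop-in files under /etc/NetworkManager/conf.d (current Fedora builds do), is something like this; the file name is arbitrary:
cat > /target/etc/NetworkManager/conf.d/90-dns-none.conf <<-EOF
[main]
dns=none
EOF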
cat > /target/etc/profile.d/grub2_zpool_fix.sh <<-EOF
export ZPOOL_VDEV_NAME_PATH=YES
EOF
cat > /target/etc/dracut.conf.d/fs.conf <<-EOF
filesystems+=" virtio_blk "
EOF
Keep the spaces around virtio_blk!
cat > /target/etc/default/zfs <<-EOF
ZPOOL_CACHE="none"
ZPOOL_IMPORT_OPTS="-o cachefile=none"
EOF
cat > /target/etc/default/grub <<-EOF
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=Fedora
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT=console
GRUB_DISABLE_RECOVERY=true
GRUB_DISABLE_OS_PROBER=true
GRUB_PRELOAD_MODULES=zfs
GRUB_ENABLE_BLSCFG=false
EOF
We're going to switch to BLS later.
sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /target/etc/selinux/config
chroot /target zgenhostid
chroot /target useradd $USER -c "$NAME" -G wheel
echo "$USER:$PASW" | chpasswd -R /target
systemd-firstboot \
--root=/target \
--locale=C.UTF-8 \
--keymap=us \
--hostname=$POOL \
--setup-machine-id
cat > /target/etc/fstab <<-EOF
UUID=`lsblk -nr ${DEVICE}${PART2} -o UUID` /boot ext4 defaults 0 0
UUID=`lsblk -nr ${DEVICE}${PART1} -o UUID` /boot/efi vfat umask=0077,shortname=winnt 0 2
$POOL/fedora/var/cache /var/cache zfs defaults 0 0
$POOL/fedora/var/lib /var/lib zfs defaults 0 0
$POOL/fedora/var/log /var/log zfs defaults 0 0
$POOL/fedora/var/spool /var/spool zfs defaults 0 0
$POOL/fedora/var/tmp /var/tmp zfs defaults 0 0
$POOL/www /var/www zfs defaults 0 0
$POOL/home /home zfs defaults 0 0
$POOL/root /root zfs defaults 0 0
EOF
zfs set mountpoint=legacy $POOL/fedora/var
zfs set mountpoint=legacy $POOL/www
zfs set mountpoint=legacy $POOL/home
zfs set mountpoint=legacy $POOL/root
zenter /target
mount -a
source /etc/profile.d/grub2_zpool_fix.sh
Running grub2-mkconfig will fail without this definition. It will always be defined after logging into the target, but we're not there yet.
grub2-mkconfig -o /etc/grub2-efi.cfg
grub2-switch-to-blscfg
systemctl disable zfs-import-cache
systemctl enable zfs-import-scan
kver=`rpm -q --last kernel | sed '1q' | sed 's/kernel-//' | sed 's/ .*$//'`
zver=`rpm -q zfs | sed 's/zfs-//' | sed 's/\.fc.*$//' | sed 's/-[0-9]//'`
zver=`echo $zver | sed 's/-rc[0-9]//'`
dkms install -m zfs -v $zver -k $kver
dracut -fv --kver $kver
umount /boot/efi
umount /boot
exit
zpool export $POOL
It works!
qemu-nbd --disconnect /dev/nbd0
If you forget to disconnect the nbd device, the virtual machine won't be able to access the virtual disk.
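A quick sanity check before starting the VM, assuming the standard blockdev utility: once the image is detached, the nbd device should report a size of zero.
blockdev --getsize64 /dev/nbd0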
virt-install \
--name=$POOL \
--os-variant=fedora$VER \
--vcpus=4 \
--memory=32000 \
--boot=uefi \
--disk path=$IMAGE,format=qcow2 \
--import \
--noreboot \
--noautoconsole \
--wait=-1
You only need to do this once. By replacing the disk image file, other configurations can be tested using the same VM.
Use the VirtManager GUI or:
virsh start $POOL
virt-viewer $POOL
Things to do after you've successfully logged in.
timedatectl set-timezone America/Chicago
timedatectl set-ntp true
hostnamectl set-hostname magoo
I detest superstitions, gratuitous complications, obscure writing, and bugs. If you get stuck or if your understanding exceeds mine, please share your thoughts. (I like to hear good news too.)
The cardinal rule when running "dnf update" is to check for the situation where both the kernel and zfs will be updated at the same time. Cancel the update and instead update zfs by itself. Then update the rest and reboot.
dnf update zfs zfs-dkms zfs-dracut
dnf update
If you forget to do this, all is not lost: Run this script to build and install zfs in the new kernel.
If you're rash enough to be booting Fedora on ZFS in a production system, it's almost imperative that you maintain a simple virtual machine in parallel. When you see that updates are available, clone the VM and update that first. If it won't boot, attempt your fixes there. If all else fails, freeze kernel updates on your production system and wait for better times. (See Appendix - Freeze kernel updates.)
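A hedged sketch of the clone-and-test idea using virt-clone (the clone name is arbitrary, and the source VM must be shut off while cloning):
virt-clone --original $POOL --name ${POOL}-test --auto-clone
virsh start ${POOL}-test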
With UEFI motherboards, the only way to "ZFS purity" is to put your EFI partition on a separate device, rather than on a partition of a device that also has all or part of a ZFS pool. It's also possible to do away with the ext4 /boot partition by keeping it in a dataset, but this will put you into contention with the "pool features vs grub supported features" typhoon of uncertainty. (See Grub-compatible pool creation.)
A better way, in my opinion, is to use a small SSD with both EFI and boot partitions. The ZFS pool for the rest of the operating system can be assembled from disks without partitions, "whole disks", which most ZFS pundits recommend. This example doesn't follow that advice because it's intended to be a simplified tutorial.
If you still want to have /boot on ZFS, it's necessary to add the grub2 zfs modules to the efi partition:
dnf install grub2-efi-x64-modules
mkdir -p /target/boot/efi/EFI/fedora/x86_64-efi
cp -a /target/usr/lib/grub/x86_64-efi/zfs* /target/boot/efi/EFI/fedora/x86_64-efi
The zfs.mod file in that collection does not support all possible pool features, but it will work if you find a compromise. Currently, the zfs.mod shipped with Fedora 32 will handle a ZFS pool created with zfs-0.8.4 and the default "compression=on" setting.
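If your ZFS version is 2.1 or later, one way to keep a boot pool within grub's feature set is the pool "compatibility" property, which pins the pool to the feature list shipped in /usr/share/zfs/compatibility.d/grub2. A sketch, with a placeholder pool name and device:
zpool create -o compatibility=grub2 -o ashift=12 -m none bpool /dev/disk/by-id/ata-BOOT_DISK-part2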
By far the best way to fix boot problems is to avoid them by recognizing problematic situations before you reboot after an update.
Before rebooting after an update, check to make sure that a new initramfs was created in the /boot directory. Then check that file to make sure it contains a zfs module:
cd /boot
ls -lrt
The commands above will list the contents of /boot such that the last file listed is the newest. It should be the initramfs file with the current date and most recent kernel version. Example:
...
initramfs-5.13.8-200.fc34.x86_64.img
Now list the contents of the initramfs and check for zfs:
lsinitrd initramfs-5.13.8-200.fc34.x86_64.img | grep zfs.ko
If zfs.ko is present, you are probably good to go for a reboot.
If zfs.ko is not present, run this script to build and install the zfs modules.
You reboot and get the Black Screen of Death.
You'll need a thumb drive or other detachable device that has a linux system and ZFS support. Boot the device.
zpool import -f $POOL -o altroot=/target
zenter /target
mount -a
dnf reinstall zfs-dkms
If you see errors from dkms, you'll probably have to revert to an earlier kernel and/or version of zfs. Such problems are temporary and rare.
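One hedged way to revert from the chroot is to boot the previous kernel entry, either by removing the newest kernel or by changing grub's default (the version string below is hypothetical):
rpm -q --last kernel-core
dnf remove kernel-core-6.10.10-200.fc39
grubby --info=ALL | grep -E '^index|^title'
grubby --set-default-index=1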
First make sure you're running in chroot (zenter) and that the right /boot/efi partition is mounted:
df -h
Next run:
rm -rf /boot/efi/*
dnf reinstall grub2-efi-x64 shim-x64 fwupdate-efi mactel-boot
Edit /etc/default/grub and disable BLS:
...
GRUB_ENABLE_BLSCFG=false
...
Then run:
grub2-mkconfig -o /etc/grub2-efi.cfg
grub2-switch-to-blscfg
rm -f /etc/zfs/zpool.cache
This thing has a way of rising from the dead...
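While you're at it, you can verify that nothing will recreate it; these checks are only a sketch and assume this article's example pool:
systemctl is-enabled zfs-import-cache
zpool get cachefile Magoo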
kver=`rpm -q --last kernel | sed '1q' | sed 's/kernel-//' | sed 's/ .*$//'`
dracut -fv --kver $kver
umount /boot/efi
exit
zpool export $POOL
Visit the ZFS Issue Tracker and see what others discover. If your problem is unique, join up and post a question.
This script builds zfs into the most recently installed kernel, which may not be the running kernel. It also updates initramfs.
#!/bin/bash
# zfsupdate.sh - Build and install zfs modules
# 2020-08-11
# Exit on error
set -o errexit
set -o pipefail
set -o nounset
# Get version number of newest kernel
kver=`rpm -q --last kernel \
| sed '1q' \
| sed 's/kernel-//' \
| sed 's/ .*$//'`
# Get version number of newest zfs
zver=`rpm -q zfs \
| sed 's/zfs-//' \
| sed 's/\.fc.*$//' \
| sed 's/-[0-9]//'`
# Install the new zfs module
dkms install -m zfs -v $zver -k $kver
# Build initrd
dracut -fv --kver $kver
# EOF
If you discover that you can't build the zfs modules for a new kernel, you'll have to use your recovery device and revert. (Or use a virtual machine to find out without blowing yourself up.)
Once you've got your system running again, you can "version lock" the kernel packages. This will allow other Fedora updates to proceed, but hold the kernel at the current version:
dnf versionlock add kernel-`uname -r`
dnf versionlock add kernel-core-`uname -r`
dnf versionlock add kernel-devel-`uname -r`
dnf versionlock add kernel-modules-`uname -r`
dnf versionlock add kernel-modules-extra-`uname -r`
dnf versionlock add kernel-headers-`uname -r`
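If dnf complains that versionlock is an unknown command, the subcommand comes from a DNF plugin; installing it by its command provide should work on current Fedora:
dnf install 'dnf-command(versionlock)'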
When it's safe to allow kernel updates, you can release all locks using the expression:
dnf versionlock clear
If you have locks on other packages and don't want to clear all of them, you can release only the previous kernel locks:
dnf versionlock delete kernel-`uname -r`
dnf versionlock delete kernel-core-`uname -r`
dnf versionlock delete kernel-devel-`uname -r`
dnf versionlock delete kernel-modules-`uname -r`
dnf versionlock delete kernel-modules-extra-`uname -r`
dnf versionlock delete kernel-headers-`uname -r`
The screen is mostly black with plain text. You see:
[ OK ] Started Emergency Shell.
[ OK ] Reached target Emergency Mode.
This is the Black Screen Of Dracut.
You'll be invited to run journalctl which will list the whole boot sequence. Near the end, carefully inspect lines that mention ZFS. There are three common cases:
1) Journal entry looks like this:
systemd[1]: Failed to start Import ZFS pools by cache file.
You are a victim of the Abominable Cache File. The fix is easy. Boot your recovery device, enter the target, and follow the section that deals with getting rid of the cache file in Appendix - Fix boot problems.
2) Journal entry looks like this:
...
Starting Import ZFS pools by device scanning...
cannot import 'Magoo': pool was previously in use from another system.
You probably forgot to export the pool after tampering with it from another system (such as when you previously used the recovery device). You can fix the problem from the emergency shell, using your pool's name:
zpool import -f Magoo -N
zpool export Magoo
reboot
3) If you see messages about not being able to load the zfs modules, that may be normal because it takes several tries during the boot sequence. But if it ends up being unable to load the modules, try this:
modprobe zfs
If that fails, the zfs modules were never built or they were left out of the initramfs. To fix that, go through the entire sequence described in Appendix - Fix boot problems.
If you can execute the modprobe successfully, you should try the next fix:
During boot, it's normal to see a few entries like this in the journal:
dracut-pre-mount[508]: The ZFS modules are not loaded.
dracut-pre-mount[508]: Try running '/sbin/modprobe zfs' as root to load them.
But if the zfs modules aren't loaded by the time dracut wants to mount the root filesystem, the boot will fail. This problem was reported in 2019: ZOL 0.8 Not Loading Modules or ZPools on Boot #8885. I never saw this until I tried to boot a fast flash drive on a slow computer. Since I knew the flash drive worked on other machines, I was surprised to see The Black Screen Of Dracut.
Here's a fix you can apply when your root-on-zfs device is mounted for repair on /target:
mkdir /target/etc/systemd/system/systemd-udev-settle.service.d
cat > /target/etc/systemd/system/systemd-udev-settle.service.d/override.conf <<-EOF
[Service]
ExecStartPre=/usr/bin/sleep 5
EOF
A black screen with an enigmatic prompt:
grub>
This is the Dread Prompt Of Grub.
Navigating this little world merits a separate document Grub Expressions. A nearly-foolproof solution is to run through Appendix - Fix boot problems. Pay particular attention to the step where the entire /boot/efi partition is recreated.
Using a zvol for swapping is problematic (as of 2020-08, zfs 0.8.4). If you feel the urge to try, first read the swap deadlock thread.
Sooner or later, the issues will be fixed. (Maybe now?) Here's how to try it out:
zfs create -V 4G -b 4K \
-o compression=zle \
-o primarycache=metadata \
-o secondarycache=none \
-o logbias=throughput \
-o sync=always \
-o com.sun:auto-snapshot=false \
$POOL/swap
(The -V flag creates a zvol and sets an appropriate refreservation automatically.)
...
/dev/zvol/Magoo/swap none swap defaults 0 0
...
After you're running the target, format the zvol and enable swapping:
mkswap /dev/zvol/Magoo/swap
swapon -av
The fstab entry takes care of enabling swap automatically after reboot.
Don't enable hibernation. It tries to use swap space for the memory image but the dataset is not available early enough in the boot process.