Work-around for booting Fedora 28 on ZFS

This fresh horror came to my attention when I upgraded from Fedora 27 to Fedora 28 on a system booting from a ZFS mirror. When the system rebooted after the upgrade, I saw this on the black screen of death:

error: file '/ROOT/fedora@/boot/vmlinuz-4.16.13-300.fc28.x86_64' not found.
error: you need to load the kernel first.

The problem turned out to be in the grub configuration file:

/boot/efi/EFI/fedora/grub.cfg

If you examine the file, all the linuxefi entries will look something like this:

...
    load_video
    set gfxpayload=keep
    insmod gzio
    insmod part_gpt
    insmod zfs
    if [ x$feature_platform_search_hint = xy ]; then
      search --no-floppy --fs-uuid --set=/dev/sdb2  5f34daa4b2ae5710
    else
      search --no-floppy --fs-uuid --set=/dev/sdb2 5f34daa4b2ae5710
    fi
    linuxefi /ROOT/fedora@/boot/vmlinuz-4.17.3-200.fc28.x86_64 root=ZFS=cool/ROOT/fedora ro rhgb quiet intel_iommu=on
    initrdefi /ROOT/fedora@/boot/initramfs-4.17.3-200.fc28.x86_64.img
...

The problem is in the search expression. The --set parameter value should be the name of a variable, not the first pool device. In this case, it should have been "root":

search --no-floppy --fs-uuid --set=root 5f34daa4b2ae5710
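If you want to check a grub.cfg for this problem without reading the whole file, something like the following sketch may help. It simply looks for search lines whose --set value starts with "/", which is what the broken entries above look like. The function name find_bad_search_lines is made up for this example.

```shell
# Print (with line numbers) every "search" line whose --set value looks
# like a device path instead of a variable name such as "root".
# find_bad_search_lines is a hypothetical helper name.
find_bad_search_lines() {
    # $1: path to a grub.cfg file
    grep -nE -- '^[[:space:]]*search .*--set=/' "$1"
}
```

For example, run "find_bad_search_lines /boot/efi/EFI/fedora/grub.cfg" as root. No output (and a non-zero exit status) means no such lines were found.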

To get going temporarily, reboot and "catch" the boot process at the kernel selection screen. Select the first entry and type "e" to edit the script.

Scroll down and edit the search line in the "else" clause so it reads as shown below. The line in the "if" clause has the same problem, so changing it too does no harm.

search --no-floppy --fs-uuid --set=root 5f34daa4b2ae5710

Obviously, use the UUID number shown in your case.

Now press Ctrl-x to boot with the edited entry. (Ctrl-c would drop you to the grub command line instead.)


Method 1 fix: Specify the grub boot device

Run the command:

zpool status

Note the name of the first device in your root pool. As an example let's assume you used device IDs so the name might be some mess like:

/dev/disk/by-id/wwn-0x5002538d415dc9b5-part2
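If you would rather not pick the device name out by eye, here is a rough sketch that does it for you. It assumes your zpool supports the -P option (print full vdev paths), so that real devices are the only names starting with "/". The function name first_pool_device is made up for this example.

```shell
# Read "zpool status -P" output on stdin and print the first vdev whose
# name is a full path. first_pool_device is a hypothetical helper name.
first_pool_device() {
    awk '$1 ~ /^\// { print $1; exit }'
}
```

With the pool from the example above you would run "zpool status -P cool | first_pool_device". Check the result against the plain zpool status output before trusting it.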

Edit the file:

/etc/default/grub

Add this line (using your device ID):

GRUB_DEVICE_BOOT=/dev/disk/by-id/wwn-0x5002538d415dc9b5-part2
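If you prefer to script the edit, the sketch below is one way to do it. It keeps a backup copy, removes any earlier GRUB_DEVICE_BOOT line and appends the new one, so running it twice does not leave duplicates. The function name set_grub_device_boot and the ".bak" suffix are made up for this example.

```shell
# Set GRUB_DEVICE_BOOT in a grub defaults file, replacing any existing
# setting. set_grub_device_boot is a hypothetical helper name.
set_grub_device_boot() {
    # $1: defaults file (e.g. /etc/default/grub)
    # $2: device path to boot from
    cp -p "$1" "$1.bak"    # keep a backup of the original
    # Copy everything except an old GRUB_DEVICE_BOOT line back into place.
    # grep exits non-zero when nothing is left to copy; that is not an error.
    grep -v '^GRUB_DEVICE_BOOT=' "$1.bak" > "$1" || true
    printf 'GRUB_DEVICE_BOOT=%s\n' "$2" >> "$1"
}
```

As root, you would call it as "set_grub_device_boot /etc/default/grub /dev/disk/by-id/wwn-0x5002538d415dc9b5-part2", substituting your own device ID.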

Now you can run zmogrify. Or, if you are repairing an existing installation, just recreate the grub configuration file:

grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg

Method 2 fix: Fuss with the grub2-tools package

If you do this, you won't have to define GRUB_DEVICE_BOOT as described above. But changing a packaged script like this means you have to keep track of future updates that might change it back.

Edit the file:

/usr/share/grub/grub-mkconfig_lib

Locate the function prepare_grub_to_access_device. It begins like this:

prepare_grub_to_access_device ()
{
  local device=$1 && shift
  if [ "$#" -gt 0 ]; then
    local variable=$1 && shift
  else
    local variable=root
  fi
...

Change the first few lines to read:

prepare_grub_to_access_device ()
{
  local device=$1
  if [ "x${GRUB_ENABLE_BLSCFG}" = "xtrue" ] ; then
    local variable=boot
  else
    local variable=root
  fi
...

Now run grub2-mkconfig again:

grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg

You should be good to go.

When you run "dnf update" in the future, watch for updates to the grub2-tools package. An update can overwrite the edited script, in which case the Method 2 fix has to be applied again.
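One way to notice that an update has undone the Method 2 edit is to record a checksum of the patched script and check it after each update. This is only a sketch: the function names and the location where you keep the checksum file are up to you, and it assumes GNU sha256sum is installed.

```shell
# Record a checksum of the patched script so later changes can be detected.
# Both function names are hypothetical.
record_patch_checksum() {
    # $1: patched file (use an absolute path)
    # $2: file in which to store the checksum
    sha256sum "$1" > "$2"
}

# Succeeds only if the file still matches the recorded checksum.
patch_still_applied() {
    # $1: checksum file written by record_patch_checksum
    sha256sum --check --quiet "$1" >/dev/null 2>&1
}
```

Right after editing, run "record_patch_checksum /usr/share/grub/grub-mkconfig_lib /root/grub-mkconfig_lib.sha256" (the /root path is just an example). After a dnf update, "patch_still_applied /root/grub-mkconfig_lib.sha256 || echo reapply the fix" tells you whether the script was replaced.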