image of the three stooges

Fixing my broken Fedora Linux

I run my blogs on a little web server in my basement. This is great because it means I control exactly how my web server is configured. It is also horrible because I control exactly how my web server is configured.

In simple terms, I screwed up. Some years back I upgraded the server to the then-current Fedora 34 and configured it to perform automatic updates. I thought automatic updates would keep the OS current. I was wrong.

Worse yet, my server wasn’t even running a Fedora 34 kernel: it was running a kernel from Fedora 27. Fixing this took me far longer than it should have.

But I did finally fix it. This post explains a bit of how I resolved the issue, but for those in a rush: follow the instructions in the Fedora documentation, including the ‘optional’ guidance to update the GRUB boot loader. That bit about updating GRUB is not really optional for some releases.

If you are not in a hurry and want some details on how things can go wrong, then read on! The stooges would be proud of me…

Identifying the problem

I have become complacent in recent years, reliant on the kindness of operating system vendors to keep my systems up to date with minimal effort. I was so happy to find that Fedora had a good OS upgrade facility as part of DNF that I even wrote a post about it.

Even better, Fedora had an automation facility for DNF updates called dnf-automatic that I was able to enable. I foolishly thought that my need to manually perform Linux OS updates was in the past.

I should note here that my web server normally runs 'headless': no display, no GUI, no keyboard, no mouse. Everything I do on it is via the command line and a terminal session. I am aware that the various Linux GUIs have user-friendly upgrade and update automation tools, but those are not tools I use.

Several versions behind: updates versus upgrades

But the eagle-eyed would notice that there are two different words in those paragraphs above: upgrades and updates. That’s a problem right there, and one I should have perceived.

By Fedora’s definition, an update installs changes within a given Fedora release, and dnf-automatic does updates only. This means that security patches and the like for Fedora version X are applied, but upgrades to Fedora version X+1 are not.

Upgrades are version increments, and will not be performed by the provided automation. So my server was never upgraded beyond version 34. Worse yet: Fedora only provides updates for the two most current releases, so once you get two releases behind you no longer receive security patches. And Fedora produces a new release about every six months, so falling off the update train happens pretty quickly.
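A quick way to see how far behind a machine has fallen is to read its release number and compare it with the current one. The following is a minimal sketch, not an official tool: the function names are my own, and the latest release number has to be supplied by hand because I am not assuming any particular way of looking it up.

```shell
# Print the VERSION_ID value from an os-release style file.
# Defaults to /etc/os-release, where Fedora records its release number.
fedora_release() {
    file="${1:-/etc/os-release}"
    # The value may or may not be quoted, so strip any double quotes.
    sed -n 's/^VERSION_ID=//p' "$file" | tr -d '"'
}

# Print how many releases the installed version is behind the latest one.
# Both numbers are passed in; the latest release is an input, not detected.
releases_behind() {
    installed="$1"
    latest="$2"
    echo $(( latest - installed ))
}

# Example (hypothetical numbers):
#   releases_behind "$(fedora_release)" 40
```

Anything above 1 means the machine is at or beyond the edge of the two-release support window described above.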

The fact that my system was several Fedora releases behind was the first thing I detected while examining my machine. But that was just the start.

Kernel even further behind

I performed a manual upgrade using DNF after discovering the update versus upgrade problem. DNF makes this pretty simple:

# dnf --refresh upgrade
# dnf system-upgrade download --releasever=<targetVersion> --skip-broken --allowerasing
# dnf system-upgrade reboot

The last step is supposed to complete the upgrade including installing the new OS kernel as part of a reboot cycle. <targetVersion> should not be more than a couple of versions ahead of your currently installed release: so if you are on Fedora 34 and want to go to Fedora 40, you should perform the upgrade steps several times: to Fedora 36, then 38, and finally 40.
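The hop-by-hop approach can be sketched as a small helper that prints the sequence of target versions. This is only an illustration under the assumption of two releases per hop; upgrade_hops is a hypothetical name, not a DNF feature.

```shell
# Print the release numbers to upgrade through, moving at most
# two releases per hop, from the current release to the target.
upgrade_hops() {
    current="$1"
    target="$2"
    hops=""
    while [ "$current" -lt "$target" ]; do
        current=$(( current + 2 ))
        # Do not overshoot the target on the final hop.
        [ "$current" -gt "$target" ] && current="$target"
        hops="${hops:+$hops }$current"
    done
    echo "$hops"
}

# Example:
#   upgrade_hops 34 40
```

Each number printed would be used as the --releasever value for one full download and reboot cycle.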

But I decided to double check the upgrade after the first step because I was no longer trusting my understanding of the process. So after my first upgrade I used another Linux command to check the kernel version:

# uname -r
4.18.19-100.fc27.x86_64
#

The fc27 in the output from uname -r shows that the running kernel was built for Fedora 27: even older than the rest of the installed OS would suggest. But why?
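That check is easy to script. Here is a minimal sketch that pulls the Fedora release number out of a kernel release string; it assumes the string contains a ".fcNN." component like the one above, and kernel_fedora_release is a name I made up for illustration.

```shell
# Print the Fedora release number embedded in a kernel release string.
# Uses the running kernel (uname -r) when no argument is given.
kernel_fedora_release() {
    release="${1:-$(uname -r)}"
    # Print NN from the ".fcNN." component, or nothing if it is absent.
    echo "$release" | sed -n 's/.*\.fc\([0-9][0-9]*\)\..*/\1/p'
}

# Example:
#   kernel_fedora_release 4.18.19-100.fc27.x86_64
```

If the number printed does not match the VERSION_ID in /etc/os-release after an upgrade and reboot, the machine is booting a stale kernel.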

Upgrading GRUB

Image showing oncoming train in a tunnel with the words 'The light at the end of the tunnel...' overlayed

Unravelling the problem

Resolving the mystery of the extremely out-of-date kernel took a lot of sleuthing. What it boiled down to was a change in how GRUB (the Grand Unified Bootloader) worked at around the time Fedora 27, 28, and 29 were being released.

In Fedora 27, GRUB configuration was handled via discrete entries in the /boot/grub2/grub.cfg file. Each OS version that could be selected during the boot process was listed in that file and was used to generate the GRUB boot menu. When a new OS was released a command was run to re-generate that file: grub2-mkconfig.

Somewhere around Fedora version 28 or 29, the mechanism for how GRUB worked was changed. Instead of discrete entries in the grub.cfg file, GRUB was revised to use something called BLS (Boot Loader Specification). This specification was accompanied by a GRUB module called blscfg that scans a folder (e.g. /boot/loader/entries) at boot time and generates the GRUB menu dynamically.
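Each BLS entry is a small text file of key and value pairs such as title, linux, initrd, and options. As a rough sketch, the entries that blscfg would scan can be listed from a running system like this; list_bls_entries is a hypothetical helper, and the default path assumes the boot partition is mounted at /boot.

```shell
# List each BLS entry file in a directory together with the kernel it boots.
list_bls_entries() {
    dir="${1:-/boot/loader/entries}"
    for entry in "$dir"/*.conf; do
        # Skip the unexpanded pattern when the directory has no entries.
        [ -e "$entry" ] || continue
        # The "linux" key holds the kernel path on the boot partition.
        kernel=$(sed -n 's/^linux[[:space:]][[:space:]]*//p' "$entry")
        echo "$(basename "$entry"): $kernel"
    done
}

# Example:
#   list_bls_entries
```

No output at all means there is nothing for GRUB to build a menu from.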

I didn’t piece this all together by myself: I was aided by a couple of references to the problem in old posts in various forums, including this one: https://www.linode.com/community/questions/23249/fedora-linode-stuck-on-older-kernel.

Bad mistake: enable blscfg

I initially thought: “Oh, simple enough: I just need to generate a new grub.cfg using grub2-mkconfig with the correct GRUB_ENABLE_BLSCFG=true setting in /etc/default/grub, and I’ll be good to go!”

So I did this, restarted my machine and… it stopped at the boot loader with the GRUB shell prompt. Nothing booted. I had to drag my wife’s monitor and keyboard down to my server so I could see the console and figure out what had happened. You can imagine how happy Irene was about this, especially since it took me a couple of days to unravel all the problems on the server.

I then had to retrieve ancient knowledge regarding use of GRUB shell to boot when there is no OS version menu. For the record, getting GRUB shell to manually boot a specific kernel looks something like:

grub> set root=(hd0,gpt2)
grub> linux /vmlinuz-6.2.15-100.fc36.x86_64 root=/dev/mapper/VolGroup-lv_root ro rd.md=0 rd.dm=0 rd.lvm.lv=VolGroup/lv_swap rhgb rd.lvm.lv=VolGroup/lv_root rd.luks=0
grub> initrd /initramfs-6.2.15-100.fc36.x86_64.img
grub> boot

Note that almost none of the specifics above will work on your machine. The (hd0,gpt2) value for set root, for example, identifies a specific device and partition, and /vmlinuz-… is a specific kernel file that exists on my machine’s boot partition.

GRUB shell has commands that allow you to interrogate your machine and list directories to get the details you’ll need, but it is rather cryptic, so Google will be your friend here. One reference that might help you get started if you ever find yourself in this situation is this article on the Linux Foundation site regarding rescuing a non-booting GRUB2 configuration.

Why did blscfg fail to generate a GRUB menu?

The problem with GRUB not dynamically generating the menu using BLS took me a while to unravel. I enabled debugging in the updated grub.cfg file which gave some clues. This screenshot shows some of the evidence I had to work with:

So blscfg was reporting read_entry returned error and Entries weren’t found in /loader/entries/, even though the GRUB shell could be used to clearly show the expected entries were there on the file system at the noted location.

I tried a couple of different approaches to fix the problem, none of them successful. For one, I noticed that the file names for the /loader/entries files were prefixed by the incorrect machine-id UUID, so I regenerated them. I also double-checked file permissions and removed any extraneous files from the /loader/entries folder. And I checked grub2-install --version on my machine, confirming it was 2.06.
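The machine-id check can be scripted from a running system. This is a minimal sketch with a made-up function name; it assumes entry files are named with the machine-id followed by a hyphen, and that the folder GRUB sees as /loader/entries is mounted at /boot/loader/entries.

```shell
# Print the names of BLS entry files that do NOT start with the given machine-id.
entries_with_wrong_prefix() {
    machine_id="$1"
    dir="${2:-/boot/loader/entries}"
    for entry in "$dir"/*.conf; do
        # Skip the unexpanded pattern when the directory has no entries.
        [ -e "$entry" ] || continue
        name=$(basename "$entry")
        case "$name" in
            "$machine_id"-*) ;;      # prefix matches, nothing to report
            *) echo "$name" ;;       # prefix differs, report the file
        esac
    done
}

# Example:
#   entries_with_wrong_prefix "$(cat /etc/machine-id)"
```

In my case fixing the prefixes turned out not to be the cure, but it is a cheap thing to rule out.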

Finally the light dawned: the GRUB version shown by the boot loader when it dropped to the shell was 0.99, not even close to 2.06. What if the GRUB boot loader image itself was out of date? Maybe it was so old that it didn’t even work properly with blscfg?

I found these articles about the order in which to run grub2-install versus grub2-mkconfig, and this one about boot failures with out-of-date GRUB boot images, all of which supported my hypothesis. A few notes about these references:

  • pay close attention to the details in the second thread about UEFI versus legacy BIOS, as this affects how the GRUB boot loader is updated.
  • the link to the GRUB 2 boot loader instructions referenced in the second thread is now https://docs.fedoraproject.org/en-US/quick-docs/grub2-bootloader/. It provides a simple shell command for identifying whether your machine has UEFI or legacy BIOS:
#  [ -d /sys/firmware/efi ] && echo UEFI || echo BIOS

My machine has legacy BIOS, so that simplified things a bit.

Updating the GRUB boot image with legacy BIOS was easy. The only challenge was figuring out the correct storage device to update as my machine uses NVMe storage rather than a spinning hard disk which means the device names are a bit obscure.
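The naming difference is that NVMe partitions end in pN while traditional disk partitions end in a bare number. The sketch below derives the whole-disk device from a partition name using string handling only; disk_for_partition is a hypothetical helper, and on a real system something like findmnt -no SOURCE /boot plus lsblk should be used to confirm the answer before writing a boot image anywhere.

```shell
# Print the whole-disk device name for a given partition device name.
disk_for_partition() {
    part="$1"
    case "$part" in
        /dev/nvme*|/dev/mmcblk*)
            # NVMe and MMC partitions end in pN: /dev/nvme0n1p2 -> /dev/nvme0n1
            echo "$part" | sed 's/p[0-9][0-9]*$//' ;;
        *)
            # SATA/SCSI partitions end in a number: /dev/sda2 -> /dev/sda
            echo "$part" | sed 's/[0-9][0-9]*$//' ;;
    esac
}

# Example:
#   disk_for_partition /dev/nvme0n1p2
```

The boot image goes on the disk itself, not on a partition, which is why the partition suffix has to come off.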

On my machine with legacy BIOS and NVMe storage, the command to update the GRUB boot image and regenerate the GRUB configuration was:

# grub2-install /dev/nvme0n1
# grub2-mkconfig -o /boot/grub2/grub.cfg

After updating the GRUB boot image I restarted my machine and the boot menu came up with the correct OS entries without any issue. Subsequent executions of the dnf system-upgrade command sequence successfully updated the kernel and GRUB’s boot menu as well as the OS modules.

Conclusion

First and foremost: with all OSes, and Linux in particular, read the instructions and understand the difference between the vendor’s definitions of update and upgrade.

Secondly, never assume that OS automated maintenance is actually working. Verify that the OS is being kept current including any boot loader mechanism.

Thirdly, remember and follow these steps for manual Fedora Linux upgrades.

  1. upgrade the OS using the dnf --refresh upgrade; dnf system-upgrade download; dnf system-upgrade reboot sequence
  2. check the kernel version with uname -r after upgrade
  3. check the GRUB version with grub2-install --version and update the GRUB2 boot loader with grub2-install <bootDevice> and grub2-mkconfig -o <grub.cfg file> as necessary

And hope that, maybe one day, fully automated OS upgrades will be part of the base OS and actually work as expected. A stooge can wish, amiright?
