I have noticed a strange behaviour on some RHEL 9 servers: GRUB does not boot the latest kernel, even though
it is supposed to do so automatically whenever a new kernel is installed, and
grubby --default-kernel reports that the default kernel is the latest one.
In this example, the server automatically boots kernel 5.14.0-362 although 5.14.0-427 and 5.14.0-503 are also installed.
    [myrhel9host root ~]# uname -a
    Linux myrhel9host 5.14.0-362.13.1.el9_3.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Nov 24 01:57:57 EST 2023 x86_64 x86_64 x86_64 GNU/Linux
    [myrhel9host root ~]# grubby --default-kernel
    /boot/vmlinuz-5.14.0-503.34.1.el9_5.x86_64

This only happens on some servers, although they are all configured the same way and have the same updates installed.
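To find the affected servers without comparing the two outputs by eye, the check above can be scripted. This is only a rough sketch: the function name kernel_matches_default is made up for illustration, and it assumes grubby prints a path of the form /boot/vmlinuz-<version>, as it does in the output above.

```shell
#!/bin/sh
# kernel_matches_default RUNNING_VERSION DEFAULT_PATH
# Succeeds when the running kernel version equals the version embedded in
# the path printed by grubby (assumed form: /boot/vmlinuz-<version>).
kernel_matches_default() {
    # Strip everything up to and including "vmlinuz-" from the path.
    [ "$1" = "${2##*/vmlinuz-}" ]
}

# On a live system (grubby normally needs root):
if command -v grubby >/dev/null 2>&1; then
    running=$(uname -r)
    default=$(grubby --default-kernel 2>/dev/null || true)
    if kernel_matches_default "$running" "$default"; then
        echo "OK: running kernel is the grubby default"
    else
        echo "MISMATCH: running=$running default=$default"
    fi
fi
```

Running this on every host after a kernel update and reboot should list the servers that show the problem.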
As a workaround, I have:
- uninstalled the two oldest kernels (kernel-core packages)
- deleted their GRUB entries in /boot/loader/entries/
- rebuilt the GRUB configuration via grub2-mkconfig -o /boot/grub2/grub.cfg
However, the problem is going to reappear at the next kernel update, so I'd like to find the reason for this bug and - most importantly - a permanent solution for what seems to be a recurrent problem on RHEL 7/8/9 and Fedora servers (see below).
Here is an example of this problem; however, I've checked and on my systems /etc/default/grub has the correct option GRUB_DEFAULT=saved.
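With GRUB_DEFAULT=saved, GRUB boots whatever entry is named by the saved_entry variable in its environment block, so it may be worth looking at that value directly. The sketch below is a diagnostic only, not a confirmed cause; saved_entry_from_env is a hypothetical helper name, and it assumes grub2-editenv list prints lines of the form name=value.

```shell
#!/bin/sh
# Print the value of saved_entry from `grub2-editenv list` output on stdin.
saved_entry_from_env() {
    sed -n 's/^saved_entry=//p'
}

# On a live system: show the saved entry next to the boot entries on disk.
if command -v grub2-editenv >/dev/null 2>&1; then
    echo "saved_entry: $(grub2-editenv list 2>/dev/null | saved_entry_from_env)"
    ls /boot/loader/entries/ 2>/dev/null || true
fi
```

Comparing the output between a server that boots the latest kernel and one that does not might show where they differ.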
This solution from the Red Hat KB says that it might be due to /etc/sysconfig/kernel containing the entry
    DEFAULTKERNEL=kernel-uek
instead of (for RHEL 7 machines)
    DEFAULTKERNEL=kernel
or (as seen on my RHEL 9 machines)
    DEFAULTKERNEL=kernel-core
However, this is not the case on my servers. Also, we have vanilla RHEL (not Oracle Linux) installations, so it's unlikely that this setting is going to point to the Unbreakable Enterprise Kernel.
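For anyone who wants to run the same check across several servers, here is a minimal sketch that prints the relevant settings from /etc/sysconfig/kernel. The helper name sysconfig_value is made up, and the script assumes plain unquoted KEY=value lines. UPDATEDEFAULT is not mentioned in the KB article; it is included here because the same file normally carries it to control whether a newly installed kernel becomes the default.

```shell
#!/bin/sh
# sysconfig_value FILE KEY
# Print the value assigned to KEY in a KEY=value file (last assignment wins).
# Commented-out lines are ignored because they do not start with KEY=.
sysconfig_value() {
    sed -n "s/^$2=//p" "$1" | tail -n 1
}

cfg=/etc/sysconfig/kernel
if [ -r "$cfg" ]; then
    echo "DEFAULTKERNEL=$(sysconfig_value "$cfg" DEFAULTKERNEL)"
    echo "UPDATEDEFAULT=$(sysconfig_value "$cfg" UPDATEDEFAULT)"
else
    echo "$cfg is missing or unreadable"
fi
```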
This Reddit post (thanks to @Chester_Gillon) points to the release notes for RHEL 9.3, which mention new bootloader behaviour that could be the cause of this problem.
To disable the new behaviour, the procedure is to change this line in /etc/default/grub
    GRUB_ENABLE_BLSCFG=true
to
    GRUB_ENABLE_BLSCFG=false
On all servers the option is set to GRUB_ENABLE_BLSCFG=true. However, as said, only some of them do not boot to the latest kernel.
In any case, I've modified the setting GRUB_ENABLE_BLSCFG=true to GRUB_ENABLE_BLSCFG=false on a machine, then installed the new kernel. This did not solve the problem; the server still booted on an old kernel.
Here is the same problem on Fedora, where it was caused by a missing /etc/sysconfig/kernel file. However, the file exists on my systems.
What about GRUB_DEFAULT=0? That used to tell GRUB to boot the first (newest) entry.