
I'm facing a problem that I can't solve by myself, so I decided to ask here for some help.

Introduction:

Since I'm moving to my new house in a few weeks, I decided to give my NAS a little hardware refresh, going from:

  • Mini-ITX case
  • Intel Celeron G4900 (using integrated graphics when needed)
  • 2x4 GB DDR4-2400
  • 4x3 TB WD Red NAS drives set up in ZFS RAID-Z1
  • Asus H310I-PLUS motherboard
  • Corsair Force MP500 120 GB NVMe boot drive

With the help of refurbished hardware for some parts, I changed to this hardware:

  • 2U Server Rack
  • Seasonic Focus 650 W power supply
  • 4x4 GB SK Hynix DDR4 2400 ECC
  • Intel Xeon E5 2683 V4 SR2JT, 2.1GHz, 16 core, LGA2011-3 (refurb)
  • Machinist X99 K9 motherboard
  • Noctua NH-L12S CPU Cooler
  • Nvidia GT 710 low profile (refurb)

First of all, I know it is tempting to blame all the problems described here on the refurbished hardware or the motherboard. However, after running into these problems I tested all those parts from an Ubuntu Live USB with some CPU and memory load / stress tests, and under the live distro everything went fine...

The problem:

At boot, with an HDMI cable plugged into my TV, the system freezes after the screen turns green with white text.

Here is a video recorded after I had already tried changing some settings in the grub.cfg file (removing the quiet option and setting the nomodeset flag):

https://youtu.be/aZlQ-ADaghw

It seems that the only way to recover the system is the reset button, as the system is 100% frozen: no keyboard shortcut works and there is no console...

Logs and investigation:

Here is a boot log I managed to extract with the Live USB:

https://pastebin.com/jqQV2Q1b

In this log, I don't know whether the following lines (divide error) are the actual problem:

nvidiafb: Unable to detect display type...
nov. 29 02:03:41 NomadNas kernel: ...Using default of CRT
nov. 29 02:03:41 NomadNas kernel: nvidiafb: Unable to detect which CRTCNumber...
nov. 29 02:03:41 NomadNas kernel: ...Defaulting to CRTCNumber 0
nov. 29 02:03:41 NomadNas kernel: nvidiafb: Using CRT on CRTC 0
nov. 29 02:03:41 NomadNas kernel: fbcon: NV28 (fb0) is primary device
nov. 29 02:03:41 NomadNas kernel: divide error: 0000 [#1] SMP PTI
nov. 29 02:03:41 NomadNas kernel: CPU: 0 PID: 389 Comm: kworker/0:3 Tainted: P O 5.4.143-1-pve #1
nov. 29 02:03:41 NomadNas kernel: Hardware name: Default string Default string/X99-k9, BIOS 5.11 01/11/2021
nov. 29 02:03:41 NomadNas kernel: Workqueue: events work_for_cpu_fn
nov. 29 02:03:41 NomadNas kernel: RIP: 0010:nvGetClocks+0x186/0x280 [nvidiafb]
nov. 29 02:03:41 NomadNas kernel: Code: 0f 00 00 3d 00 03 00 00 74 73 3d 30 03 00 00 74 6c 41 8b 89 04 05 00 00 0f b6 c5 44 0f b6 c9 c1 e9 10 0f af c2 31 d2 83 e1 0f <41> f7 f1 d3 e8 89 06 48 8b 87 40 11 00 00 8b 88 00 05 00 00 0f b6
nov. 29 02:03:41 NomadNas kernel: RSP: 0018:ffffaecd009dfa80 EFLAGS: 00010246
nov. 29 02:03:41 NomadNas kernel: RAX: 0000000000000000 RBX: ffff95e08d5aa510 RCX: 0000000000000000
nov. 29 02:03:41 NomadNas kernel: RDX: 0000000000000000 RSI: ffffaecd009dfab8 RDI: ffff95e08d5aa418
nov. 29 02:03:41 NomadNas kernel: RBP: ffffaecd009dfa88 R08: ffffaecd009dfabc R09: 0000000000000000
nov. 29 02:03:41 NomadNas kernel: R10: ffff95e08d5aa418 R11: 0000000000062570 R12: 0000000000000020
nov. 29 02:03:41 NomadNas kernel: R13: 0000000000006247 R14: 0000000000000010 R15: 0000000000000068
nov. 29 02:03:41 NomadNas kernel: FS: 0000000000000000(0000) GS:ffff95e09f400000(0000) knlGS:0000000000000000
nov. 29 02:03:41 NomadNas kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
nov. 29 02:03:41 NomadNas kernel: CR2: 00007f323bf6a22d CR3: 00000005c660a005 CR4: 00000000003606f0
nov. 29 02:03:41 NomadNas kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
nov. 29 02:03:41 NomadNas kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
nov. 29 02:03:41 NomadNas kernel: Call Trace:
nov. 29 02:03:41 NomadNas kernel: NVCalcStateExt+0x1c7/0x950 [nvidiafb]
nov. 29 02:03:41 NomadNas kernel: ? _cond_resched+0x19/0x30
nov. 29 02:03:41 NomadNas kernel: ? _cond_resched+0x19/0x30
nov. 29 02:03:41 NomadNas kernel: ? kmem_cache_alloc_trace+0x172/0x240
nov. 29 02:03:41 NomadNas kernel: nvidiafb_set_par+0x49e/0xa40 [nvidiafb]
nov. 29 02:03:41 NomadNas kernel: fbcon_init+0x2ad/0x570
nov. 29 02:03:41 NomadNas kernel: visual_init+0xd5/0x130
nov. 29 02:03:41 NomadNas kernel: do_bind_con_driver+0x1ed/0x2e0
nov. 29 02:03:41 NomadNas kernel: do_take_over_console+0x129/0x1a0
nov. 29 02:03:41 NomadNas kernel: do_fbcon_takeover+0x5c/0xb0
nov. 29 02:03:41 NomadNas kernel: fbcon_fb_registered+0x113/0x120
nov. 29 02:03:41 NomadNas kernel: register_framebuffer+0x230/0x310
nov. 29 02:03:41 NomadNas kernel: nvidiafb_probe.cold.12+0x78e/0x80a [nvidiafb]
nov. 29 02:03:41 NomadNas kernel: local_pci_probe+0x47/0x80
nov. 29 02:03:41 NomadNas kernel: work_for_cpu_fn+0x1a/0x30
nov. 29 02:03:41 NomadNas kernel: process_one_work+0x20f/0x3d0
nov. 29 02:03:41 NomadNas kernel: worker_thread+0x233/0x400
nov. 29 02:03:41 NomadNas kernel: kthread+0x120/0x140
nov. 29 02:03:41 NomadNas kernel: ? process_one_work+0x3d0/0x3d0
nov. 29 02:03:41 NomadNas kernel: ? kthread_park+0x90/0x90
nov. 29 02:03:41 NomadNas kernel: ret_from_fork+0x35/0x40
nov. 29 02:03:41 NomadNas kernel: Modules linked in: snd_hda_codec_hdmi(+) intel_rapl_msr intel_rapl_common uas usb_storage input_leds joydev usbkbd x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek kvm_intel snd_hda_codec_generic ledtrig_audio kvm irqbypass snd_hda_intel crct10dif_pclmul snd_intel_dspcfg crc32_pclmul ghash_clmulni_intel snd_hda_codec aesni_intel snd_hda_core crypto_simd snd_hwdep cryptd glue_helper snd_pcm nvidiafb(+) snd_timer vgastate rapl snd fb_ddc intel_cstate serio_raw pcspkr mxm_wmi i2c_algo_bit soundcore mac_hid zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) coretemp nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 hid_generic usbmouse usbhid hid btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear psmouse ahci xhci_pci r8169 ehci_pci i2c_i801 libahci lpc_ich realtek xhci_hcd ehci_hcd wmi
nov. 29 02:03:41 NomadNas kernel: ---[ end trace 91e53edc0a767313 ]---
nov. 29 02:03:41 NomadNas kernel: RIP: 0010:nvGetClocks+0x186/0x280 [nvidiafb]
nov. 29 02:03:41 NomadNas kernel: Code: 0f 00 00 3d 00 03 00 00 74 73 3d 30 03 00 00 74 6c 41 8b 89 04 05 00 00 0f b6 c5 44 0f b6 c9 c1 e9 10 0f af c2 31 d2 83 e1 0f <41> f7 f1 d3 e8 89 06 48 8b 87 40 11 00 00 8b 88 00 05 00 00 0f b6
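For anyone wanting to pull a similar excerpt from a non-booting install, here is a minimal sketch of one way to isolate the oops from a larger kernel log. It assumes the installed root filesystem is mounted on /mnt from the live USB and that the installed system keeps a persistent journal under /var/log/journal; neither is confirmed above. The helper name extract_oops is hypothetical.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: print only the oops, from the "divide error" line
# through the matching "end trace" marker, reading a log on stdin.
extract_oops() {
    sed -n '/divide error/,/end trace/p'
}

# Assumed usage from the live USB (paths are assumptions, adjust as needed):
#   journalctl -D /mnt/var/log/journal _TRANSPORT=kernel | extract_oops

# Small self-contained demo using lines taken from the log above.
printf '%s\n' \
    'kernel: fbcon: NV28 (fb0) is primary device' \
    'kernel: divide error: 0000 [#1] SMP PTI' \
    'kernel: RIP: 0010:nvGetClocks+0x186/0x280 [nvidiafb]' \
    'kernel: ---[ end trace 91e53edc0a767313 ]---' \
    'kernel: a later, unrelated line' | extract_oops
```

The demo prints only the three lines between the two markers, which makes it easier to see which module (here nvidiafb) appears in the RIP line.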

Questions to the community:

First of all, thanks to anyone spending some time to help me with this one. You are my last hope before I wipe my whole boot drive and start over with a new setup (with everything to set up again: Docker containers, ZFS...).

  • How can I get a minimal working environment (e.g. no NVIDIA drivers loaded; I tried nomodeset and it didn't work) so that I can use the console on the system itself rather than on a live USB distro?
  • Is the freeze really related to this "divide error", given that there are still some log lines after it?
  • Is a fresh install of OMV, losing almost all my setup, my only option? (Yes, I can still back up some .config files, but...)
  • Would a fresh install even have a chance of working? (hardware / OMV incompatibility?)

Many thanks for your help :)

  • Does the system still boot from a live USB? You said your live USB is Ubuntu... what is the installed system running, and what version? Is it the same kernel version as on your live USB? Do you have a different video card available to try, either another NVIDIA or an ATI/AMD/Radeon card? BTW, if the live USB works then there is a very good chance that a new install would work. My guess is that whatever distro you have installed and/or whatever kernel it's running is incompatible with your hardware (probably the video card, maybe the motherboard or the CPU). Commented Nov 29, 2021 at 6:39
  • If you can boot the live USB, it might be worth trying to mount and chroot into the installed system, and then upgrade the kernel. (NOTE: you will need /proc, /dev, and /sys bind-mounted into the chroot in order for grub to work properly. E.g. if you mount the system's rootfs on /mnt in the live USB, run something like for i in proc dev sys ; do mount -o bind /$i /mnt/$i ; done before running chroot /mnt.) If the installed distro is Ubuntu too, maybe upgrade the entire distro as well as the kernel to whatever version your live USB is running. Commented Nov 29, 2021 at 6:45
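The bind-mount and chroot steps from the comment above can be sketched as a small script. This is only an illustration: ROOT_DEV is a hypothetical placeholder for the installed system's root partition (the actual device is not given in this thread), and the run wrapper and DRY_RUN switch are additions so the commands are only printed unless explicitly enabled.

```shell
#!/bin/sh
set -eu

ROOT_DEV="${ROOT_DEV:-/dev/sdX2}"   # hypothetical: replace with the real root partition
TARGET="${TARGET:-/mnt}"            # mount point suggested in the comment
DRY_RUN="${DRY_RUN:-1}"             # 1 = only print the commands, 0 = run them (needs root)

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

enter_chroot() {
    run mount "$ROOT_DEV" "$TARGET"
    # grub and kernel tooling inside the chroot need these pseudo-filesystems
    for i in proc dev sys; do
        run mount -o bind "/$i" "$TARGET/$i"
    done
    run chroot "$TARGET"
}

enter_chroot
```

Running it as-is only lists the five commands; setting DRY_RUN=0 and a real ROOT_DEV as root would actually mount the filesystems and drop into the chroot.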

1 Answer


I finally got it working by blacklisting the nvidiafb module (for the moment) in ./etc/modprobe.d/openmediavault.conf:

blacklist nvidiafb 

I also added:

blacklist sb_edac 

But I don't think this one is the problem, so I will probably remove this line, because the ECC messages shown in the log were just warnings... If I don't follow up in this post, consider this last blacklist line unnecessary :).
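As a rough sketch of the fix, the blacklist line can be added idempotently with a small helper. The function name blacklist_module and the ROOT variable are hypothetical; ROOT defaults to a scratch directory here so the snippet is safe to try, and would be set to the mounted root of the installed system (for example /mnt from a live USB, which is an assumption) to apply it for real.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: append "blacklist <module>" to a modprobe.d file
# unless that exact line is already present.
blacklist_module() {
    conf="$1"
    module="$2"
    mkdir -p "$(dirname "$conf")"
    touch "$conf"
    if ! grep -qxF "blacklist $module" "$conf"; then
        printf 'blacklist %s\n' "$module" >> "$conf"
    fi
}

# ROOT is an assumption: a scratch directory by default, the installed
# system's mounted root when applying the change for real.
ROOT="${ROOT:-$(mktemp -d)}"
blacklist_module "$ROOT/etc/modprobe.d/openmediavault.conf" nvidiafb
cat "$ROOT/etc/modprobe.d/openmediavault.conf"
```

If the module were loaded from the initramfs, regenerating it from inside the installed system (on Debian-based systems, with update-initramfs -u) might also be needed; the answer does not say whether that step was required.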
