“rcu_sched detected stalls on CPUs/tasks” - jiffies - ESXi Ubuntu 16 FileServer Guest


edited Nov 20, 2018 at 20:41


TL;DR: About a week later I lost a CPU core, likely due to overheating from a badly placed heatsink/fan.

If you are using ESXi, I would fire up another OS and check your temps, and/or consider re-seating your CPU heatsink.
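For the "fire up another OS and check your temps" step, here is a minimal Python sketch of reading temperatures from a bare-metal Linux boot. It assumes the kernel exposes the thermal sysfs interface (`/sys/class/thermal/thermal_zone*/temp`, reported in millidegrees Celsius), which not every board does, and VM guests often expose no zones at all. The function names and the 85 °C warning threshold are hypothetical, not something from this post.

```python
"""Minimal sketch: read CPU/board temperatures from Linux thermal sysfs.

Assumes a bare-metal Linux host exposing /sys/class/thermal/thermal_zone*/.
Each zone's 'temp' file holds millidegrees Celsius; 'type' names the sensor.
"""
from pathlib import Path

# Hypothetical warning threshold; check your CPU vendor's specification.
WARN_CELSIUS = 85.0


def read_thermal_zones(base: str = "/sys/class/thermal") -> dict[str, float]:
    """Return {"zone:type": temperature_in_celsius} for every readable zone."""
    temps: dict[str, float] = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            raw = (zone / "temp").read_text().strip()
            kind = (zone / "type").read_text().strip()
        except OSError:
            continue  # zone missing a readable temp/type file: skip it
        try:
            celsius = int(raw) / 1000.0  # millidegrees -> degrees
        except ValueError:
            continue  # unexpected content: skip rather than crash
        temps[f"{zone.name}:{kind}"] = celsius
    return temps


def hot_zones(temps: dict[str, float], limit: float = WARN_CELSIUS) -> list[str]:
    """Return the zones whose temperature is at or above the limit."""
    return [zone for zone, celsius in temps.items() if celsius >= limit]


if __name__ == "__main__":
    readings = read_thermal_zones()
    if not readings:
        print("No thermal zones exposed (common inside VMs).")
    for zone, celsius in readings.items():
        print(f"{zone}: {celsius:.1f} C")
    for zone in hot_zones(readings):
        print(f"WARNING: {zone} is at or above {WARN_CELSIUS} C")
```

Run it as a script on the host itself, not inside a guest. The `sensors` command from lm-sensors is a common alternative that covers more chips.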

This post has gotten a lot of views, and when I had the issue, Google had very little information for me. Please share your experiences in comments or answers!

Timeline:

I get an error about jiffies.

I have to use the power button to reboot.

MDADM array rebuild --> successful.

I get another error the next day.

Reboot/rebuild successful.

Another error!

Rebuild the VM with a different OS version (Ubuntu 14).

Stable for about a week.

A single core in the CPU dies!

Further research into ESXi showed me that ESXi does NOT gather device temps without some sort of advanced hardware addition that I didn't have, possibly because I wasn't using a computer from the "Hardware Compatibility List" (https://communities.vmware.com/thread/547244). If it had, ESXi could likely have throttled my CPU. I now use KVM, which checks all my device temps via normal methods and reacts accordingly. Not only that, but my read/write speed has greatly increased because my hypervisor is now also my file server, whereas before I had to pass the disks through to a FileServer VM since ESXi doesn't support SMB/NFS/MDADM etc. (I'm talking about a 2x to 3x increase in read/write speeds now that my clients talk directly to the hypervisor/file server.)


edited Apr 8, 2018 at 7:51

FreeSoftwareServers


I rebuilt the VM with Ubuntu 14, which is what the RAID was built on and which ran fine for years. I had one freeze the next day, but after a re-sync it has been stable for about a week. I suspect it's a mix of a VMkernel and Ubuntu 16 problem, but I doubt I'll ever know. My suggestion if you get this error: change your kernel/OS version.

Update: a few weeks later, one of the cores on my CPU died. If you see this error, you may want to consider that your CPU is overheating or malfunctioning.


answered Mar 20, 2018 at 2:47

FreeSoftwareServers

