I have a home server running on quite decent hardware (the CPU is a Threadripper 2970WX with 24 cores / 48 threads). It runs a lot of different things: a handful of virtual machines with various software, partly office software, but also servers, bots and many other tools. Occasionally the machine crashes without any warning, which is a problem because this is a production system and several people depend on it. The crashes are rare, and most of the time I was at home and could restart it immediately. Still, I would like to know whether there is a tool that can monitor system health and warn me before a failure, or generate crash reports so that I know exactly what caused the problem and how to fix it. Is there a tool that handles this, preferably one with a GUI? I don't think manually crawling through various log files is a good option for a production system.
The host runs Debian 10 (Buster), as do all of my important virtual machines.