
I have a home server running on fairly capable hardware (the CPU is a Threadripper 2970WX with 24 cores/48 threads). A lot of different things run on it: a handful of virtual machines with various software, partly office software, but also servers, bots and many other tools. Occasionally the machine crashes out of nowhere, which is a problem because this is a production system and several people depend on it. It crashes rarely, and most of the time I have been home and could restart it immediately. Still, I would like to know whether there is a tool that can monitor system health and warn of an impending failure, or generate crash reports so I know exactly what caused the problem and how to fix it. Is there such a tool, preferably with a GUI? I don't think manually crawling through various log files is a good option for a production system.

I am running Debian 10 Buster as my host system and also for all of my important virtual machines.

  • A "production system" hosted at home? Weird, but okay. There are lots of monitoring tools, but not one is gonna find what is crashing your system. You'll always resort to reading logs to find out the reason, be it through a nice GUI or not. Commented Jul 16, 2021 at 16:05


There is a utility called kdump. Once it is configured, whenever the kernel crashes, a snapshot of the system state at the moment of the crash is captured in a crash-dump directory. This is what I have used and can recommend. Once you know what is causing the crash, you can write a script to fix the issue and schedule it as a cron job.
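To check after a reboot whether kdump is actually armed and whether it has captured anything, a small script like the one below could help. It is a minimal sketch, not part of the original answer, and it assumes the Debian `kdump-tools` package with its default dump directory `/var/crash`, where each crash gets its own timestamped subdirectory. The kernel file `/sys/kernel/kexec_crash_loaded` reads `1` once a capture kernel has been loaded; the constant and function names are hypothetical.

```python
"""Minimal sketch: check whether kdump is armed and list captured crash dumps.

Assumptions (not from the answer above): Debian's kdump-tools is installed and
writes dumps to its default directory /var/crash, one timestamped
subdirectory per crash.
"""
from pathlib import Path
from typing import List

# Linux reports "1" here once a crash (capture) kernel has been loaded via kexec.
CRASH_LOADED_FLAG = Path("/sys/kernel/kexec_crash_loaded")
# Assumed default dump location for kdump-tools (hypothetical constant name).
DEFAULT_CRASH_DIR = Path("/var/crash")


def kdump_armed(flag: Path = CRASH_LOADED_FLAG) -> bool:
    """Return True if the kernel reports a loaded crash kernel."""
    try:
        return flag.read_text().strip() == "1"
    except OSError:
        # File missing (e.g. kernel without kexec support) or unreadable.
        return False


def list_crash_dumps(crash_dir: Path = DEFAULT_CRASH_DIR) -> List[Path]:
    """Return crash-dump subdirectories, newest timestamp name first."""
    if not crash_dir.is_dir():
        return []
    # Timestamp-style names (YYYYMMDDHHMM) sort chronologically as strings.
    return sorted((p for p in crash_dir.iterdir() if p.is_dir()), reverse=True)


if __name__ == "__main__":
    print("kdump armed:", kdump_armed())
    for dump in list_crash_dumps():
        # Assumption: a dmesg.* text log sits next to the binary dump file.
        logs = sorted(dump.glob("dmesg.*"))
        print(dump.name, "->", logs[0] if logs else "(no dmesg log)")
```

Running it from cron after each boot and mailing the output would be one way to get notified when a new dump appears; the kernel messages in the saved log are usually the first place to look for the cause.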
