
Quite often, when I reboot, I get the following error message:

kernel: watchdog watchdog0: watchdog did not stop! 

I tried to find out more about watchdog by running man watchdog, but it says there is no manual entry. I ran yum list watchdog and found that it was not installed. However, when I looked in the /dev directory, I actually found two watchdogs:

watchdog and watchdog0

I am curious: do I actually own any watchdogs? And why does the kernel complain that the watchdog did not stop when I reboot?



Most modern PC hardware includes watchdog timer facilities. You can read more about them on Wikipedia: Watchdog Timers. Also, from the Linux kernel docs:

excerpt - https://www.kernel.org/doc/Documentation/watchdog/watchdog-api.txt

A Watchdog Timer (WDT) is a hardware circuit that can reset the computer system in case of a software fault. You probably knew that already.

Usually a userspace daemon will notify the kernel watchdog driver via the /dev/watchdog special device file that userspace is still alive, at regular intervals. When such a notification occurs, the driver will usually tell the hardware watchdog that everything is in order, and that the watchdog should wait for yet another little while to reset the system. If userspace fails (RAM error, kernel bug, whatever), the notifications cease to occur, and the hardware watchdog will reset the system (causing a reboot) after the timeout occurs.

The Linux watchdog API is a rather ad-hoc construction and different drivers implement different, and sometimes incompatible, parts of it. This file is an attempt to document the existing usage and allow future driver writers to use it as a reference.
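The ping protocol described in this excerpt can be sketched in a few lines of Python. This is a minimal illustration, not the watchdog package's code: the function name pet_watchdog and its parameters are hypothetical. It assumes the driver supports the "Magic Close" feature described later in the same kernel document, where writing the character V just before closing the device tells the driver it may stop the timer. If the device is closed without that character (or the kernel was built with nowayout), the timer keeps running, and that is the case in which the kernel's watchdog core logs the "watchdog did not stop!" message from the question.

```python
import os
import time


def pet_watchdog(path: str = "/dev/watchdog", pings: int = 3, interval: float = 1.0) -> None:
    """Open a watchdog device, send a few keepalive pings, then close it cleanly.

    Hypothetical helper for illustration only. Opening a real /dev/watchdog
    arms the hardware timer, so `interval` must stay well below its timeout.
    """
    fd = os.open(path, os.O_WRONLY)  # on a real device, opening starts the timer
    try:
        for _ in range(pings):
            os.write(fd, b"\0")      # any write counts as a keepalive ping
            time.sleep(interval)
        # Magic Close: drivers that support it only disable the timer if
        # 'V' is written just before close(); otherwise it keeps running.
        os.write(fd, b"V")
    finally:
        os.close(fd)
```

Pointing this at a real /dev/watchdog requires root and arms the hardware timer, so a script that stalls between pings will get the machine reset once the driver's timeout expires.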

This SO Q&A, titled Who is refreshing hardware watchdog in Linux?, covers the link between the Linux kernel and the hardware watchdog timer.

What about the watchdog package?

The description in the RPM makes this pretty clear, IMO. The watchdog daemon can either act as a software watchdog or can interact with the hardware implementation.

excerpt from RPM description

The watchdog program can be used as a powerful software watchdog daemon or may be alternately used with a hardware watchdog device such as the IPMI hardware watchdog driver interface to a resident Baseboard Management Controller (BMC). watchdog periodically writes to /dev/watchdog; the interval between writes to /dev/watchdog is configurable through settings in the watchdog sysconfig file.

This configuration file is also used to set the watchdog to be used as a hardware watchdog instead of its default software watchdog operation. In either case, if the device is open but not written to within the configured time period, the watchdog timer expiration will trigger a machine reboot. When operating as a software watchdog, the ability to reboot will depend on the state of the machine and interrupts.

When operating as a hardware watchdog, the machine will experience a hard reset (or whatever action was configured to be taken upon watchdog timer expiration) initiated by the BMC.
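The timing rule in this description (no write within the configured period means a reset) can be modelled without touching any hardware. The Python class below is a hypothetical toy, not part of any package: time is passed in explicitly as seconds, each ping() stands for a write to /dev/watchdog, and the reset is represented only by expired() returning True.

```python
class SimulatedWatchdog:
    """Toy model of a watchdog timer driven by an explicit clock (hypothetical)."""

    def __init__(self, timeout: float, now: float = 0.0) -> None:
        self.timeout = timeout
        # The timer starts counting as soon as the device is opened.
        self.deadline = now + timeout

    def expired(self, now: float) -> bool:
        """True once `timeout` seconds have passed without a ping."""
        return now >= self.deadline

    def ping(self, now: float) -> None:
        """Model a write to /dev/watchdog: push the deadline out by a full timeout."""
        if self.expired(now):
            raise RuntimeError("too late: the machine would already have been reset")
        self.deadline = now + self.timeout
```

For example, with a 60-second timeout, a ping at t=50 moves the deadline to t=110, so expired(100) is False and expired(110) is True.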

  • Thanks, the kernel doc is useful. To clarify, does this mean that the kernel owns a watchdog and that I, the user, do not own one, since I have not installed any? Commented Aug 25, 2014 at 3:01
  • @QuestionOverflow - as I understand it, the system provides the watchdog facility (it's essentially hardware). The kernel therefore owns it and manages this hardware just as it would any other piece of hardware in the system. You, the user, interact with it through the kernel but do not own it in any official capacity; you're simply a consumer of it. WDTs are a built-in protection in case running software ties up the hardware in unforeseen ways. They're a safety mechanism that gives the system the ability to recover. Commented Aug 25, 2014 at 3:09
  • I see. But it seems I am able to interact with it directly if I install watchdog. There seems to be a config file, /etc/watchdog.conf, to alter its behaviour directly. Commented Aug 25, 2014 at 3:23
  • @QuestionOverflow - Take a look at the watchdog description in the RPM. It explains it all. I'll add it to my answer. Commented Aug 25, 2014 at 4:06

As a side note, to search your manuals, you can do:

man -k watchdog 

and any manual page whose name or short description contains the word will be listed in your console. If you expect many results, you may want to pipe the output through less, as in:

man -k watchdog | less 

In our case here, though, you probably won't get much more than 2 or 3 entries.

Among the results is a utility named wdctl, which shows the current status and setup of the watchdog. Here is an example from one of the Jetson boards I'm working with:

$ wdctl
Device:        /dev/watchdog
Identity:      Tegra WDT [version 1]
Timeout:       120 seconds
Pre-timeout:   0 seconds
FLAG           DESCRIPTION               STATUS BOOT-STATUS
KEEPALIVEPING  Keep alive ping reply          0           0
MAGICCLOSE     Supports magic close char      0           0
SETTIMEOUT     Set timeout (in seconds)       0           0

We can see the Timeout entry, which tells you how long the watchdog will wait before forcing a reboot.
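If you want similar information from a script, much of what wdctl prints is also exposed in sysfs. Below is a minimal Python sketch, assuming a kernel built with CONFIG_WATCHDOG_SYSFS; the function name list_watchdogs is hypothetical, and not every driver provides every attribute. Reading these files does not open /dev/watchdog, so it does not arm the timer. It also bears on the question's two device nodes: the first registered watchdog gets both /dev/watchdog0 and the legacy /dev/watchdog, so both names normally refer to the same timer.

```python
from pathlib import Path

# Attributes under /sys/class/watchdog/watchdogN/ (CONFIG_WATCHDOG_SYSFS);
# some drivers omit some of them.
ATTRIBUTES = ("identity", "state", "timeout", "timeleft", "nowayout")


def list_watchdogs(root: str = "/sys/class/watchdog") -> dict[str, dict[str, str]]:
    """Map each watchdog (e.g. 'watchdog0') to the sysfs attributes it exposes."""
    result: dict[str, dict[str, str]] = {}
    base = Path(root)
    if not base.is_dir():
        return result  # no watchdog class directory: nothing registered
    for dev in sorted(base.glob("watchdog*")):
        if not dev.is_dir():
            continue
        info: dict[str, str] = {}
        for name in ATTRIBUTES:
            try:
                info[name] = (dev / name).read_text().strip()
            except OSError:
                pass  # attribute missing or unreadable for this driver
        result[dev.name] = info
    return result


if __name__ == "__main__":
    for name, info in list_watchdogs().items():
        print(name, info)
```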

On newer Linux distributions that use systemd, systemd itself can manage the hardware watchdog. Look at /etc/systemd/system.conf, where you will find a couple of parameters (usually commented out by default):

[Manager]
...
#RuntimeWatchdogSec=0
#ShutdownWatchdogSec=10min
...
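To check what these two settings resolve to, here is a small Python sketch that reads system.conf-style text and converts the values to seconds. Assumptions, all mine: the function names are hypothetical, a key that is absent or commented out is treated as the default shown above (0 and 10min), and parse_timespan understands only a subset of systemd's time-span syntax, so values such as off raise ValueError. RuntimeWatchdogSec is the hardware watchdog timeout that systemd programs and keeps pinging while the system runs (0 disables it); ShutdownWatchdogSec is the timeout used while the machine reboots.

```python
import configparser
import re

# Assumption: only a subset of systemd's time-span units, in seconds.
_UNITS = {
    "us": 1e-6, "usec": 1e-6,
    "ms": 1e-3, "msec": 1e-3,
    "s": 1, "sec": 1, "second": 1, "seconds": 1,
    "m": 60, "min": 60, "minute": 60, "minutes": 60,
    "h": 3600, "hr": 3600, "hour": 3600, "hours": 3600,
    "d": 86400, "day": 86400, "days": 86400,
}
_TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([a-z]*)")

# Used when a key is absent or commented out, as in the file shown above.
DEFAULTS = {"RuntimeWatchdogSec": "0", "ShutdownWatchdogSec": "10min"}


def parse_timespan(value: str) -> float:
    """Convert '30s', '10min' or '1min 30s' to seconds; a bare number means seconds."""
    text = value.strip()
    if not text:
        raise ValueError("empty time span")
    total, pos = 0.0, 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unsupported time span: {value!r}")
        number, unit = match.groups()
        if unit and unit not in _UNITS:
            raise ValueError(f"unsupported unit {unit!r} in {value!r}")
        total += float(number) * (_UNITS[unit] if unit else 1)
        pos = match.end()
    return total


def watchdog_settings(conf_text: str) -> dict[str, float]:
    """Return the effective watchdog timeouts (in seconds) from system.conf text."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep systemd's CamelCase key names
    parser.read_string(conf_text)
    section = parser["Manager"] if parser.has_section("Manager") else {}
    return {key: parse_timespan(section.get(key, default))
            for key, default in DEFAULTS.items()}
```

With both lines commented out, as above, this returns 0.0 for RuntimeWatchdogSec and 600.0 for ShutdownWatchdogSec.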

Note: if you find it easier, you can also use the apropos command-line tool instead of man -k. It does the same thing, without needing the -k option.

apropos watchdog 

Also, by default the keyword (watchdog in this example) is interpreted as a regular expression.
