6

What is the behaviour if I start a long test with smartctl (i.e., sudo smartctl -t long /dev/...), and then I suspend the machine or shut it down and restart it later? Will the test "suspend" too and "continue on" when the machine is started up again?

What my question mainly aims at: I would like to schedule regular SMART tests (e.g., with cron), but I may still suspend or shut down the machine while a test is running, and then I don't know whether the test will continue or is "not finished" and has to be started again. With current 16+ TB HDDs, SMART tests can easily run for more than 20 hours...

The same question holds for external disks: can I detach a USB disk while a SMART test is running, and will the test continue the next time I connect the device?

0

2 Answers

6

Suspending the machine, shutting it down, powering it off, or detaching an external USB disk will abort the test, and it must then be restarted.

Some special drives may retain the test progress, depending on the disk firmware, controller behavior, and power state handling, but I don't think so; most consumer drives will abort the test and need it restarted.

When a smartctl command changes a drive setting, that setting generally persists on the drive, in many cases across a power-off. A running self-test does not: it is interrupted when power is lost and has to be started again once power is restored.

Invoking hdparm with the query option is known to wake up some drives. To query a device without waking it, consider smartctl from smartmontools with the -n standby option, which skips the check when the disk is in standby.

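A short self-test (smartctl -t short) runs tests that have a high probability of detecting device problems and usually finishes within a few minutes, so it is far less likely to be interrupted than a long test.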

Use cron to schedule tests for times when the system is unlikely to be suspended or shut down, and check the logs afterward to confirm that they completed.
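To make the "check the logs afterward" part concrete, here is a rough sketch. The function name classify_selftest is made up for illustration, the status strings are what smartctl -l selftest typically prints for ATA drives (an assumption, other device types word things differently), and the cron line in the comments uses a placeholder device and an assumed install path.

```shell
# Example crontab entry (placeholder device, assumed path): long test Saturdays at 01:00
#   0 1 * * 6  /usr/sbin/smartctl -t long /dev/sdX

# Reads `smartctl -l selftest` output on stdin and classifies the most
# recent log entry (the line numbered "# 1").
classify_selftest() {
    line=$(awk '/^# *1 /{print; exit}')
    case $line in
        *"Completed without error"*) echo completed ;;
        *"in progress"*)             echo running ;;
        *Interrupted*|*Aborted*)     echo interrupted ;;
        "")                          echo none ;;
        *)                           echo other ;;
    esac
}

# Usage (not executed here):
#   sudo smartctl -l selftest /dev/sdX | classify_selftest
```

If this prints interrupted, the test did not finish and has to be started again.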

From @frostschutz's comment and answer:

You can try your luck with selective self-tests.

I switched from long selftests to select,cont tests. It's like the long selftest, but only one slice of disk at a time. So while the long selftest may take well over a day (with an otherwise busy 3TB disk), the selective test can run every night when the server is the least busy and actually finish, without harming performance in the more busy hours.

So basically you would be distributing a monthly long self-test of the entire disk, to a nightly selective test that still covers the entire disk over the course of a month...
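The quoted answer uses select,cont and lets the drive keep track of where it left off. If you would rather compute the spans yourself, smartctl also accepts explicit LBA ranges as select,N-M. A minimal sketch, assuming you split the disk into equal slices; slice_range, the 30-slice split and the day-of-month index are my own illustrative choices, not anything from smartmontools.

```shell
# Print the LBA span "START-END" for slice IDX (0-based) out of SLICES
# on a disk with TOTAL sectors. The last slice absorbs any remainder.
slice_range() {
    total=$1 slices=$2 idx=$3
    size=$(( total / slices ))
    start=$(( idx * size ))
    if [ "$idx" -eq $(( slices - 1 )) ]; then
        end=$(( total - 1 ))
    else
        end=$(( start + size - 1 ))
    fi
    printf '%s-%s\n' "$start" "$end"
}

# Usage (not executed here; /dev/sdX is a placeholder):
#   total=$(sudo blockdev --getsz /dev/sdX)   # size in 512-byte sectors
#   idx=$(( ($(date +%d | sed 's/^0//') - 1) % 30 ))
#   sudo smartctl -t "select,$(slice_range "$total" 30 "$idx")" /dev/sdX
```

Note that blockdev --getsz counts 512-byte sectors, which may not match the drive's LBA count on drives with 4096-byte logical sectors, so check that before relying on the numbers.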

1
  • 2
    Thank you for the very elaborate and precise answer! It confirms what I was afraid of: that I will need to do "magic" (e.g., disable auto-sleep after 60 minutes) if I'm running long tests. C'est la vie. Commented Jun 8 at 8:16
6

Quite aside from the system itself sleeping, if you are running long tests you may need to also consider the drive's own standby/spin-down timer. IME most drives will fail long tests by going into standby - the test does not keep the drive awake.

The easiest way to get around this is to force some form of continuous activity on the drive; typically I will loop a repeating touch every minute, but you might also get away with just a read (e.g. ls) assuming it doesn't hit cache.
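The touch loop could look roughly like this. keep_awake and test_running are hypothetical helper names, the mount point is a placeholder, and the grep string assumes the wording that smartctl -c prints for ATA drives while a test is running.

```shell
# Touch FILE every INTERVAL seconds for as long as the given check
# command (the remaining arguments) keeps succeeding.
keep_awake() {
    file=$1 interval=$2
    shift 2
    while "$@"; do
        touch "$file"
        sleep "$interval"
    done
}

# Hypothetical check: succeeds while the drive reports a self-test in progress.
test_running() {
    smartctl -c "$1" | grep -q 'Self-test routine in progress'
}

# Usage (not executed here; run as root, paths are placeholders):
#   smartctl -t long /dev/sdX
#   keep_awake /mnt/bigdisk/.keepalive 60 test_running /dev/sdX
```

The loop stops on its own once the check fails, so it does not keep the disk spinning after the test has finished or been aborted.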

As you've observed, large modern drives can take many hours to run a long test. While it's possible to edit the standby timer with hdparm, often it does not go high enough!
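For reference, this is how the hdparm man page describes the -S encoding, written out as a small lookup so the ceiling is easy to see. standby_seconds is just an illustrative helper name; whether a given drive honours each value is up to its firmware.

```shell
# Translate an `hdparm -S` value into the standby timeout in seconds,
# following the encoding documented in the hdparm man page.
standby_seconds() {
    v=$1
    if [ "$v" -eq 0 ]; then
        echo disabled                      # 0 turns the standby timer off
    elif [ "$v" -le 240 ]; then
        echo $(( v * 5 ))                  # 1..240: multiples of 5 seconds
    elif [ "$v" -le 251 ]; then
        echo $(( (v - 240) * 1800 ))       # 241..251: units of 30 minutes
    elif [ "$v" -eq 252 ]; then
        echo 1260                          # 21 minutes
    elif [ "$v" -eq 255 ]; then
        echo 1275                          # 21 minutes 15 seconds
    else
        echo vendor-or-reserved            # 253 vendor-defined, 254 reserved
    fi
}

standby_seconds 251
```

The largest fixed timeout is therefore 5.5 hours (value 251), well short of a 20-hour test. The documented alternative is -S 0, which disables the timer entirely, if the drive accepts it.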
