I would like to understand why I keep getting the mails below about the S.M.A.R.T. status of my new NVMe drive.

DMESG

$ dmesg --ctime | grep -i nvm
[Mon Aug 8 10:48:31 2022] nvme nvme0: pci function 0000:3d:00.0
[Mon Aug 8 10:48:31 2022] nvme nvme0: missing or invalid SUBNQN field.
[Mon Aug 8 10:48:31 2022] nvme nvme0: Shutdown timeout set to 8 seconds
[Mon Aug 8 10:48:31 2022] nvme nvme0: 8/0/0 default/read/poll queues
[Mon Aug 8 10:48:31 2022] nvme0n1: p1 p2
[Mon Aug 8 10:48:37 2022] EXT4-fs (nvme0n1p2): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[Mon Aug 8 10:48:37 2022] EXT4-fs (nvme0n1p2): re-mounted. Opts: errors=remount-ro. Quota mode: none.

NVME ERRORS

$ sudo nvme error-log /dev/nvme0
...
Entry[63]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS: The command completed successfully)
phase_tag : 0
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
trtype : The transport type is not indicated or the error is not transport related.
cs : 0
trtype_spec_info: 0
.................
...

Could anyone shed some light on why I am getting new mails like this:

MAIL

# mail
Message 44:
From root@dell-laptop-CENSORED Sun Aug 7 08:13:07 2022
X-Original-To: root
To: root@dell-laptop-CENSORED
Subject: SMART error (ErrorCount) detected on host: dell-inspiron-15
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Date: Sun, 7 Aug 2022 08:12:59 +0200 (CEST)
From: root <root@dell-laptop-CENSORED>

This message was generated by the smartd daemon running on:

host name: dell-inspiron-15
DNS domain: [Empty]

The following warning/error was logged by the smartd daemon:

Device: /dev/nvme0, number of Error Log entries increased from 485 to 486

Device info:
Samsung SSD 970 EVO Plus 2TB, S/N:<CENSORED>, FW:2B2QEXM7, 2.00 TB

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Fri Apr 22 09:53:56 2022 CEST
Another message will be sent in 24 hours if the problem persists.

SMART

# smartctl -a /dev/nvme0n1
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-43-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number: Samsung SSD 970 EVO Plus 2TB
Serial Number: <CENSORED>
Firmware Version: 2B2QEXM7
PCI Vendor/Subsystem ID: 0x144d
IEEE OUI Identifier: 0x002538
Total NVM Capacity: 2,000,398,934,016 [2.00 TB]
Unallocated NVM Capacity: 0
Controller ID: 4
NVMe Version: 1.3
Number of Namespaces: 1
Namespace 1 Size/Capacity: 2,000,398,934,016 [2.00 TB]
Namespace 1 Utilization: 544,784,187,392 [544 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 002538 5221904ad7
Local Time is: Mon Aug 8 11:13:10 2022 CEST
Firmware Updates (0x16): 3 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x03): S/H_per_NS Cmd_Eff_Lg
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 85 Celsius
Critical Comp. Temp. Threshold: 85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.50W       -        -    0  0  0  0        0       0
 1 +     5.90W       -        -    1  1  1  1        0       0
 2 +     3.60W       -        -    2  2  2  2        0       0
 3 -   0.0700W       -        -    3  3  3  3      210    1200
 4 -   0.0050W       -        -    4  4  4  4     2000    8000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 44 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 5,565,230 [2.84 TB]
Data Units Written: 2,658,490 [1.36 TB]
Host Read Commands: 29,877,415
Host Write Commands: 18,211,598
Controller Busy Time: 112
Power Cycles: 240
Power On Hours: 215
Unsafe Shutdowns: 5
Media and Data Integrity Errors: 0
Error Information Log Entries: 502
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 44 Celsius
Temperature Sensor 2: 39 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0        502     0  0x1005  0x4004      -            0     0     -

SYSLOG

# cat /var/log/syslog | grep -i smart | grep -i nvm
Aug 7 16:08:27 dell-inspiron-15 smartd[1001]: Device: /dev/nvme0, opened
Aug 7 16:08:27 dell-inspiron-15 smartd[1001]: Device: /dev/nvme0, Samsung SSD 970 EVO Plus 2TB, S/N:S4J4NM0T201785H, FW:2B2QEXM7, 2.00 TB
Aug 7 16:08:27 dell-inspiron-15 smartd[1001]: Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
Aug 7 16:08:27 dell-inspiron-15 smartd[1001]: Device: /dev/nvme0, state read from /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug 7 16:08:27 dell-inspiron-15 smartd[1001]: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 1 NVMe devices
Aug 7 16:08:28 dell-inspiron-15 smartd[1001]: Device: /dev/nvme0, number of Error Log entries increased from 486 to 487
Aug 7 16:08:28 dell-inspiron-15 smartd[1001]: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug 8 07:17:38 dell-inspiron-15 smartd[973]: Device: /dev/nvme0, opened
Aug 8 07:17:38 dell-inspiron-15 smartd[973]: Device: /dev/nvme0, Samsung SSD 970 EVO Plus 2TB, S/N:S4J4NM0T201785H, FW:2B2QEXM7, 2.00 TB
Aug 8 07:17:38 dell-inspiron-15 smartd[973]: Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
Aug 8 07:17:38 dell-inspiron-15 smartd[973]: Device: /dev/nvme0, state read from /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug 8 07:17:38 dell-inspiron-15 smartd[973]: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 1 NVMe devices
Aug 8 07:17:38 dell-inspiron-15 smartd[973]: Device: /dev/nvme0, number of Error Log entries increased from 487 to 488
Aug 8 07:17:38 dell-inspiron-15 smartd[973]: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug 8 08:21:16 dell-inspiron-15 smartd[973]: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug 8 11:14:00 dell-inspiron-15 smartd[971]: Device: /dev/nvme0, opened
Aug 8 11:14:00 dell-inspiron-15 smartd[971]: Device: /dev/nvme0, Samsung SSD 970 EVO Plus 2TB, S/N:S4J4NM0T201785H, FW:2B2QEXM7, 2.00 TB
Aug 8 11:14:00 dell-inspiron-15 smartd[971]: Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
Aug 8 11:14:00 dell-inspiron-15 smartd[971]: Device: /dev/nvme0, state read from /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug 8 11:14:00 dell-inspiron-15 smartd[971]: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 1 NVMe devices
Aug 8 11:14:00 dell-inspiron-15 smartd[971]: Device: /dev/nvme0, number of Error Log entries increased from 488 to 494
Aug 8 11:14:01 dell-inspiron-15 smartd[971]: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug 8 12:48:40 dell-inspiron-15 smartd[1024]: Device: /dev/nvme0, opened
Aug 8 12:48:40 dell-inspiron-15 smartd[1024]: Device: /dev/nvme0, Samsung SSD 970 EVO Plus 2TB, S/N:S4J4NM0T201785H, FW:2B2QEXM7, 2.00 TB
Aug 8 12:48:40 dell-inspiron-15 smartd[1024]: Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
Aug 8 12:48:40 dell-inspiron-15 smartd[1024]: Device: /dev/nvme0, state read from /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug 8 12:48:40 dell-inspiron-15 smartd[1024]: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 1 NVMe devices
Aug 8 12:48:40 dell-inspiron-15 smartd[1024]: Device: /dev/nvme0, number of Error Log entries increased from 494 to 502
Aug 8 12:48:40 dell-inspiron-15 smartd[1024]: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
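
A rough way to see the pattern in smartd lines like those above is to pull out each counter jump and its size. The Python sketch below is only an illustration: the function name `error_log_jumps` and the `sample` list are hypothetical, and the regular expression assumes the exact message wording shown in this syslog excerpt.

```python
import re

# Matches the smartd message that reports a growing NVMe error log counter.
# The wording is taken from the syslog excerpt; other versions may differ.
INCREASE = re.compile(
    r"Device: (?P<dev>[^,\s]+), number of Error Log entries increased "
    r"from (?P<old>\d+) to (?P<new>\d+)"
)


def error_log_jumps(lines):
    """Yield (device, old, new, delta) for every reported counter increase."""
    for line in lines:
        match = INCREASE.search(line)
        if match:
            old, new = int(match["old"]), int(match["new"])
            yield match["dev"], old, new, new - old


# Two lines copied from the excerpt; only the second one reports an increase.
sample = [
    "Aug 8 11:14:00 dell-inspiron-15 smartd[971]: Device: /dev/nvme0, opened",
    "Aug 8 11:14:00 dell-inspiron-15 smartd[971]: Device: /dev/nvme0, "
    "number of Error Log entries increased from 488 to 494",
]

if __name__ == "__main__":
    for dev, old, new, delta in error_log_jumps(sample):
        print(f"{dev}: {old} -> {new} (+{delta})")
```

Feeding it the full grep output instead of `sample` would list every jump, which makes it easier to compare the increases with the times smartd was started.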
  • I have a 960 Pro, and its error count increases by 1 every day, much like your 970 EVO as far as I can see. Commented Feb 7, 2023 at 19:21
  • In my case the daily error is 0x2002(Invalid Field in Command: A reserved coded value or an unsupported value in a defined field) Commented Feb 7, 2023 at 19:22

1 Answer

I think it is highly likely that this is caused by a bug in smartmontools. Someone participating in the discussion of that bug wrote about the cause of the messages:

The NVMe specification is not very consistent on how to identify what features the controller supports, so in some cases the driver just has to try it and see if it worked.

The log entries are likely harmless driver initiated admin commands (SqId 0) checking if a particular feature is supported. The SSD doesn't need to log an error entry for such commands as it has no impact on media health (which is what SMART is supposed to care about), but it is allowed to save the error if it wants. I personally find these types of errors to be less than useless.

The bug in smartmontools, i.e. the useless messages, has been fixed and is no longer present in release 7.4 (from the changelog: "smartd: No longer issues LOG_CRIT warnings if new entries of NVMe error information log do not indicate device problems."). However, the smartmontools bug report has a follow-up, and it is not clear to me whether that has been resolved yet.
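
To check what a logged entry actually says, the Status value can be decoded by hand. The following is a minimal Python sketch, assuming the Status column printed by smartctl 7.2 is the raw 16-bit field from the error log entry, with the phase tag in bit 0 and the status field in bits 15:1 as laid out in the NVMe base specification. The names `decode_status`, `describe` and `GENERIC_STATUS` are made up for illustration, and the table lists only three of the generic status codes.

```python
from typing import NamedTuple

# A few Generic Command Status values (status code type 0) from the NVMe
# base specification. This table is deliberately incomplete.
GENERIC_STATUS = {
    0x00: "Successful Completion",
    0x01: "Invalid Command Opcode",
    0x02: "Invalid Field in Command",
}


class NvmeStatus(NamedTuple):
    phase_tag: int
    status_code: int
    status_code_type: int
    more: bool
    do_not_retry: bool


def decode_status(raw: int) -> NvmeStatus:
    """Split a raw 16-bit status word (phase tag in bit 0) into its fields."""
    status = raw >> 1  # drop the phase tag, leaving the 15-bit status field
    return NvmeStatus(
        phase_tag=raw & 0x1,
        status_code=status & 0xFF,             # SC, bits 7:0
        status_code_type=(status >> 8) & 0x7,  # SCT, bits 10:8
        more=bool((status >> 13) & 0x1),       # M bit
        do_not_retry=bool((status >> 14) & 0x1),  # DNR bit
    )


def describe(status: NvmeStatus) -> str:
    """Return a label for generic status codes listed in GENERIC_STATUS."""
    if status.status_code_type == 0:
        return GENERIC_STATUS.get(status.status_code, "unlisted generic status")
    return "non-generic status code type"


if __name__ == "__main__":
    decoded = decode_status(0x4004)  # Status value from the smartctl output
    print(decoded, describe(decoded))
```

Under that assumption, the 0x4004 from the question would decode to generic status code 0x02, "Invalid Field in Command", which is consistent with a rejected feature probe on the admin queue and not with a media problem.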

  • Thanks for the info, and welcome to our community! Commented Oct 27, 2024 at 16:41
