Channel: Rockstor Community Forum - Latest topics

SMART monitoring workaround for NVMe SSDs with smartd, including e-mail notification


While checking my Rockstor server (5.0.14), I noticed that SMART is only enabled for my SATA disks (2x 18 TB HDD + 1x 500 GB SSD), but not for my PCIe NVMe SSDs (2x 4 TB).

I stumbled across this thread: SMART service won't turn on - #2 by phillxnet

So I did the same checks on my system:

admin@Kolibri:~> sudo systemctl status smartd.service
● smartd.service - Self Monitoring and Reporting Technology (SMART) Daemon
     Loaded: loaded (/usr/lib/systemd/system/smartd.service; enabled; preset: enabled)
     Active: active (running) since Tue 2024-11-12 10:02:47 CET; 2 months 19 days ago
       Docs: man:smartd(8)
             man:smartd.conf(5)
   Main PID: 878 (smartd)
     Status: "Next check of 0 devices will start at 15:32:47"
      Tasks: 1 (limit: 4915)
        CPU: 546ms
     CGroup: /system.slice/smartd.service
             └─878 /usr/sbin/smartd -n -q never

Nov 12 10:02:47 Kolibri systemd[1]: Starting Self Monitoring and Reporting Technology (SMART) Daemon...
Nov 12 10:02:47 Kolibri (smartd)[878]: smartd.service: Referenced but unset environment variable evaluates to an empty string: smartd_opts
Nov 12 10:02:47 Kolibri smartd[878]: smartd 7.4 2023-08-01 r5530 [x86_64-linux-6.4.0-150600.23.25-default] (SUSE RPM)
Nov 12 10:02:47 Kolibri smartd[878]: Opened configuration file /etc/smartd.conf
Nov 12 10:02:47 Kolibri smartd[878]: Configuration file /etc/smartd.conf parsed but has no entries
Nov 12 10:02:47 Kolibri smartd[878]: Monitoring 0 ATA/SATA, 0 SCSI/SAS and 0 NVMe devices
Nov 12 10:02:47 Kolibri systemd[1]: Started Self Monitoring and Reporting Technology (SMART) Daemon.
admin@Kolibri:~> 

… so I also have the missing-config issue: /etc/smartd.conf has no entries, so smartd monitors 0 devices. (I haven’t touched the SMART settings since the Rockstor installation.)
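Since the log shows a config file without entries, one possible way to fill it is to derive smartd.conf lines from `smartctl --scan`. The following is only a minimal sketch and not necessarily the workaround the thread ends up with: the function name `nvme_scan_to_smartd_conf` and the recipient address are made up for illustration, and it assumes a working local mailer so that the `-m` directive can actually deliver mail. It deliberately handles only the NVMe entries.

```shell
# Hypothetical helper: reads `smartctl --scan` output on stdin and prints
# one smartd.conf line per NVMe device.
#   -a  enables smartd's default set of checks
#   -m  sets the address that warning mails are sent to
nvme_scan_to_smartd_conf() {
    recipient="$1"
    # Keep only NVMe entries, strip the trailing "# ..." comment,
    # then append the monitoring directives.
    grep -- '-d nvme' \
        | sed -e 's/[[:space:]]*#.*$//' -e "s|\$| -a -m ${recipient}|"
}

# Example (prints the lines for review instead of writing the config):
#   sudo smartctl --scan | nvme_scan_to_smartd_conf admin@example.com
```

After reviewing the printed lines, they could be added to /etc/smartd.conf, followed by `sudo systemctl restart smartd.service`. Adding `-M test` to a line makes smartd send a single test mail at startup, which is a quick way to check that notification works.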

When I run smartctl manually, all 5 disks show up as expected:

admin@Kolibri:~> sudo smartctl --scan
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d scsi # /dev/sdb, SCSI device
/dev/sdc -d scsi # /dev/sdc, SCSI device
/dev/nvme0 -d nvme # /dev/nvme0, NVMe device
/dev/nvme1 -d nvme # /dev/nvme1, NVMe device
admin@Kolibri:~> sudo smartctl -a /dev/nvme1
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.4.0-150600.23.25-default] (SUSE RPM)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       SSD_M.2_PCIe4_4TB_InnovationIT_Y
Serial Number:                      H031302309130264
Firmware Version:                   H230829a
PCI Vendor/Subsystem ID:            0x1e4b
IEEE OUI Identifier:                0x000000
Total NVM Capacity:                 4.096.805.658.624 [4,09 TB]
Unallocated NVM Capacity:           0
Controller ID:                      0
NVMe Version:                       2.0
Number of Namespaces:               1
Namespace 1 Size/Capacity:          4.096.805.658.624 [4,09 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            000000 2309130264
Local Time is:                      Fri Jan 31 15:19:03 2025 CET
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x0e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     90 Celsius
Critical Comp. Temp. Threshold:     95 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     6.50W       -        -    0  0  0  0        0       0
 1 +     5.80W       -        -    1  1  1  1        0       0
 2 +     3.60W       -        -    2  2  2  2        0       0
 3 -   0.7460W       -        -    3  3  3  3     5000   10000
 4 -   0.7260W       -        -    4  4  4  4     8000   45000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        39 Celsius
Available Spare:                    100%
Available Spare Threshold:          1%
Percentage Used:                    0%
Data Units Read:                    39.301.209 [20,1 TB]
Data Units Written:                 20.074.168 [10,2 TB]
Host Read Commands:                 253.402.037
Host Write Commands:                247.394.123
Controller Busy Time:               455
Power Cycles:                       68
Power On Hours:                     6.457
Unsafe Shutdowns:                   27
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               39 Celsius
Temperature Sensor 2:               46 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged

Self-test Log (NVMe Log 0x06)
Self-test status: No self-test in progress
No Self-tests Logged

admin@Kolibri:~> 
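As long as smartd is not monitoring the NVMe drives, the manual check above can be scripted. This is a rough sketch under the assumption that the text layout of `smartctl -a` stays as shown in the output above; the function name `nvme_critical_warning_clear` is hypothetical. It only looks at the "Critical Warning" field and is not a replacement for smartd.

```shell
# Hypothetical helper: reads `smartctl -a` text output on stdin and succeeds
# only if a "Critical Warning" line is present and its value is 0x00.
nvme_critical_warning_clear() {
    value=$(awk -F: '/^Critical Warning:/ {
        gsub(/[[:space:]]/, "", $2)   # drop the padding around the value
        print $2
        exit
    }')
    [ "$value" = "0x00" ]
}

# Example:
#   sudo smartctl -a /dev/nvme1 | nvme_critical_warning_clear \
#       || echo "nvme1 reports a critical warning"
```

Parsing the human-readable output is fragile; it is used here only because it matches what the post shows.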

5 posts - 2 participants
