Decoding machine tests in MAAS: a closer look at smartctl-validate
Errors or typos? Topics missing? Hard to read? Let us know!
When managing infrastructure, transparency is paramount. With MAAS, you'll find that the "Tests" tab is your trusty sidekick. As soon as a machine undergoes testing, the Tests log screen reveals a comprehensive list of executed tests. Accompanying each test are a timestamp and the test's result.
Getting smart with
smartctl-validate
One script you'll see quite often is smartctl-validate
. Provided by Canonical, this nifty script uses the smartmontools kit to ensure your disk's integrity. What does a successful run look like? Take a look at the typical output below:
INFO: Verifying SMART support for the following drive: /dev/sda
INFO: Running command: sudo -n smartctl --all /dev/sda
INFO: SMART support is available; continuing...
INFO: Verifying SMART data on /dev/sda
INFO: Running command: sudo -n smartctl --xall /dev/sda
SUCCESS: SMART validation has PASSED for: /dev/sda
--------------------------------------------------------------------------------
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.0-115-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: QEMU HARDDISK
Serial Number: QM00001
Firmware Version: 2.5+
User Capacity: 5,368,709,120 bytes [5.36 GB]
Sector Size: 512 bytes logical/physical
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ATA/ATAPI-7, ATA/ATAPI-5 published, ANSI NCITS 340-2000
Local Time is: Wed Sep 2 22:29:12 2020 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Unavailable
Rd look-ahead is: Unavailable
Write cache is: Enabled
ATA Security is: Unavailable
Wt Cache Reorder: Unavailable
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 288) seconds.
Offline data collection
capabilities: (0x19) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
No General Purpose Logging support.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 54) minutes.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate PO---- 100 100 006 - 0
3 Spin_Up_Time PO---- 100 100 000 - 16
4 Start_Stop_Count -O---- 100 100 020 - 100
5 Reallocated_Sector_Ct PO---- 100 100 036 - 0
9 Power_On_Hours PO---- 100 100 000 - 1
12 Power_Cycle_Count PO---- 100 100 000 - 0
190 Airflow_Temperature_Cel PO---- 069 069 050 - 31 (Min/Max 31/31)
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
Read SMART Log Directory failed: scsi error badly formed scsi parameters
General Purpose Log Directory not supported
SMART Extended Comprehensive Error Log (GP Log 0x03) not supported
SMART Error Log Version: 1
No Errors Logged
SMART Extended Self-test Log (GP Log 0x07) not supported
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
Selective Self-tests/Logging not supported
SCT Commands not supported
Device Statistics (GP/SMART Log 0x04) not supported
SATA Phy Event Counters (GP Log 0x11) not supported
Decoding
smartctl
output: a deep dive
The smartctl
output can be a maze of numbers, acronyms, and technical jargon. Let's decode each section, so you know precisely what you're looking at.
The header and general information
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.0-115-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
This part provides metadata about smartctl
itself, such as the version you're running and the copyright information. It helps in ensuring you're using an updated toolset.
Device Model: QEMU HARDDISK
Serial Number: QM00001
Firmware Version: 2.5+
User Capacity: 5,368,709,120 bytes [5.36 GB]
Sector Size: 512 bytes logical/physical
Here, you see details about the hard disk model, its serial number, firmware, storage capacity, and the size of its data sectors. These give you an overall snapshot of your drive's hardware specifics.
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Unavailable
This section confirms whether SMART capabilities are available and enabled. AAM (Automatic Acoustic Management) and APM (Advanced Power Management) are also mentioned, but they are unavailable in this example.
Local Time is: Wed Sep 2 22:29:12 2020 UTC
ATA Version is: ATA/ATAPI-7, ATA/ATAPI-5 published, ANSI NCITS 340-2000
The timestamp informs you when the test was conducted. The ATA Version gives details about the ATA protocol that your drive supports.
The lengthy section on SMART attributes provides specific metrics about your drive's health. Each attribute—like Raw_Read_Error_Rate
or Reallocated_Sector_Ct
—has its numerical values and flags. These serve as indicators for disk performance or upcoming failures. For example, Reallocated_Sector_Ct
refers to the number of sectors that have been flagged as faulty and reallocated.
No Errors Logged
If there were issues during the SMART data collection or previous tests, they would be listed here.
SCT Commands not supported
The absence or presence of SCT (SMART Command Transport) commands could influence the kinds of tests and operations you can perform on the disk.
Device Statistics (GP/SMART Log 0x04) not supported
SATA Phy Event Counters (GP Log 0x11) not supported
Finally, these lines indicate features that are not supported by the disk. It's useful to know these limitations for advanced troubleshooting.
By diving into each section of the smartctl
output, you gain a comprehensive understanding of your disk's status. These details can be instrumental in both day-to-day management and diagnosing issues before they become critical problems.
Beyond just raw numbers, MAAS equips you with the ability to scrutinise individual logs. Navigate to a machine of interest and head over to the 'Hardware tests' page. There, you'll see a 'Log view' link in the 'Results' column for each test. Clicking this grants you access to detailed outputs, enabling more sophisticated diagnostics.
MAAS ensures you're not flying blind when it comes to your hardware. By employing scripts like smartctl-validate
, you get a robust first line of defence against unexpected hardware failures.