Monitoring Disk Health with smartctl (with examples)

The smartctl command-line tool, part of the smartmontools package, lets you monitor the health of your disk drives through the Self-Monitoring, Analysis, and Reporting Technology (SMART) system built into the drives. SMART reports the current state of a drive, including its temperature, error rates, and wear indicators. By checking this data regularly, you can spot a failing drive early and replace it before any data is lost.

In this article, we will explore the various use cases of the smartctl command and learn how to retrieve information about disk health and perform self-tests. Each use case will include the code, motivation, explanation of the arguments, and example output.

Use Case 1: Display SMART Health Summary

Smartctl allows us to quickly check the overall health of a disk by displaying the drive's own SMART health self-assessment. This single pass/fail verdict gives a high-level view of the drive's condition and helps us decide whether further investigation or maintenance is required.

Code Example:

sudo smartctl --health /dev/sdX 

Motivation:

To quickly assess the health of a disk and identify any potential issues such as an imminent failure or high error rates.

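Explanation:

--health (short form -H) asks the drive for its overall SMART health self-assessment. /dev/sdX is a placeholder for the device to query, for example /dev/sda. sudo is used because reading SMART data requires direct access to the device.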

Example Output:

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
...

The output displays the overall health self-assessment test result, indicating whether the drive has passed or failed the test.
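For scripted checks, the verdict can be pulled out of this output. The following is a minimal sketch rather than anything smartctl provides: health_result is a hypothetical helper name, and it assumes the ATA-style result line shown in the example output (SCSI drives word their health line differently).

```shell
# health_result: hypothetical helper that reads `smartctl --health` output
# on stdin and prints whatever follows "test result:".
# Assumes the ATA-style result line shown in the example output.
health_result() {
    result=$(sed -n 's/^SMART overall-health self-assessment test result: *//p' | head -n 1)
    # Print UNKNOWN when no such line was found, so callers can tell
    # "no verdict in the output" apart from a real verdict.
    echo "${result:-UNKNOWN}"
}

# Typical use (needs root and a real device):
#   sudo smartctl --health /dev/sdX | health_result
```

Anything other than PASSED, including UNKNOWN, is a reason to look at the full report before trusting the drive.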

Use Case 2: Display Device Information

In addition to checking the health status, smartctl can also provide detailed information about the disk drive itself. This includes details such as the manufacturer, model, firmware version, capacity, and supported features.

Code Example:

sudo smartctl --info /dev/sdX 

Motivation:

To gather identifying information about the disk, including its manufacturer, model, serial number, firmware version, and capacity, and to confirm that SMART support is available and enabled.

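Explanation:

--info (short form -i) prints the drive's identity information, such as model, serial number, firmware version, and capacity, along with whether SMART support is available and enabled.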

Example Output:

=== START OF INFORMATION SECTION ===
Model Family:     Samsung Based SSDs
Device Model:     Samsung SSD 850 PRO 512GB
Serial Number:    S2XBNB0J982595E
Firmware Version: EXM01B6Q
...

The output provides detailed information about the disk’s manufacturer, model family, device model, serial number, firmware version, and more.
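When only one of these values is needed, for example to label a drive in an inventory script, it can be extracted by field name. This is a small sketch: info_field is a hypothetical helper, and it assumes each field is printed as "Name: value" at the start of a line, as in the example output.

```shell
# info_field: hypothetical helper that reads `smartctl --info` output on
# stdin and prints the value of the field whose name is given as $1.
# Assumes lines of the form "Field Name:   value".
info_field() {
    awk -v key="$1" '
        index($0, key ":") == 1 {
            # Keep everything after "key:" and trim the padding.
            value = substr($0, length(key) + 2)
            sub(/^[ \t]+/, "", value)
            print value
            exit
        }'
}

# Typical use (needs root and a real device):
#   sudo smartctl --info /dev/sdX | info_field "Serial Number"
```

The helper prints nothing when the field is absent, so the caller can test for an empty result.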

Use Case 3: Perform a Short Self-Test

Smartctl allows us to initiate self-tests on our disk drives. These tests run inside the drive's firmware and check the drive for problems. A short self-test checks the drive's basic functions and reads only a small part of the media, so it typically finishes within a few minutes, which makes it well suited to regular periodic checks.

Code Example:

sudo smartctl --test short /dev/sdX 

Motivation:

To perform a short self-test to quickly verify the integrity of the disk and identify any potential issues that may have arisen since the last test.

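Explanation:

--test short (short form -t short) starts the drive's short self-test. The test runs in the background inside the drive, so the command returns immediately and the drive remains usable while the test runs.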

Example Output:

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
...
Please wait 1 minutes for test to complete.
Test will complete after Sun Aug 1 22:35:38 2021
...

The output confirms that the short self-test has been initiated and provides an estimated completion time. Once that time has passed, the result can be read from the drive's self-test log with sudo smartctl --log=selftest /dev/sdX. While the test is still running, the --capabilities option discussed in the next use case reports its execution status.
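A script that starts a test usually needs to know how long to wait before reading the result. The sketch below takes that number from the "Please wait N minutes" line shown above. selftest_wait_minutes is a hypothetical helper name, and the 2-minute fallback in the usage comment is an arbitrary assumption.

```shell
# selftest_wait_minutes: hypothetical helper that reads the output of
# `smartctl --test short` on stdin and prints the number of minutes
# smartctl asks us to wait (from the "Please wait N minutes" line).
selftest_wait_minutes() {
    sed -n 's/^Please wait \([0-9][0-9]*\) minutes.*/\1/p' | head -n 1
}

# Typical use (needs root and a real device):
#   minutes=$(sudo smartctl --test short /dev/sdX | selftest_wait_minutes)
#   sleep "$(( ${minutes:-2} * 60 ))"   # fall back to 2 minutes if not found
#   sudo smartctl --log=selftest /dev/sdX
```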

Use Case 4: Display Current/Last Self-Test Status and SMART Capabilities

Besides performing self-tests, smartctl can also display current or last self-test status along with various SMART capabilities and features of the drive. This gives us insights into the self-test history and provides information about the disk’s capabilities.

Code Example:

sudo smartctl --capabilities /dev/sdX 

Motivation:

To retrieve the current or last self-test status and obtain detailed information about the SMART capabilities of the disk, including supported self-tests and error logging features.

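Explanation:

--capabilities (short form -c) prints the drive's SMART capabilities: which self-tests and offline data collection features it supports, the recommended polling time for each self-test, and the execution status of the current or most recent self-test.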

Example Output:

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%     14127         3675052
...

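The table shown here is the drive's self-test log, with one row per test: its number, type, status, the percentage of the test that remained when it stopped, the drive's power-on lifetime in hours at that point, and the logical block address (LBA) of the first error, if any. In this example the short test stopped on a read failure at LBA 3675052 with 90% of the test remaining. Note that --capabilities by itself reports only the execution status of the current or last test; the full log table is printed by sudo smartctl --log=selftest /dev/sdX and is also part of the --all report.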
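To act on the most recent test result in a script, the Status column of entry # 1 can be extracted from the self-test log. This is only a sketch: latest_selftest_status is a hypothetical helper, and it assumes the log columns are separated by two or more spaces, as smartctl normally prints them.

```shell
# latest_selftest_status: hypothetical helper that reads a SMART self-test
# log (e.g. from `smartctl --log=selftest`) on stdin and prints the Status
# column of the newest entry, "# 1".
# Assumes columns are separated by runs of two or more spaces.
latest_selftest_status() {
    awk -F '  +' '/^# *1 / { print $3; exit }'
}

# Typical use (needs root and a real device):
#   sudo smartctl --log=selftest /dev/sdX | latest_selftest_status
```

A status other than "Completed without error", such as the read failure in the example, means the drive deserves a closer look.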

Use Case 5: Display Exhaustive SMART Data

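For a deep insight into the disk’s health and performance, smartctl allows us to retrieve all of its SMART data in a single report. This includes the device information and health status from the earlier use cases, plus the attribute table with values such as temperature, error rates, power cycles, and power-on hours, and the drive's error and self-test logs.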

Code Example:

sudo smartctl --all /dev/sdX 

Motivation:

To get an exhaustive view of the disk’s SMART attributes and gain detailed information about the various parameters monitored by the SMART system, including physical and logical sector sizes, error rates, temperature, and more.

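Explanation:

--all (short form -a) prints all SMART information about the drive. For ATA drives this combines the output of --info, --health, --capabilities, and --attributes with the error log and the self-test log.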

Example Output:

=== START OF INFORMATION SECTION ===
Device Model:     Samsung SSD 850 PRO 512GB
Serial Number:    S2XBNB0J982595E
LU WWN Device Id: 5 002538 8b0fec89b
...
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
...

The output includes extensive information about the disk, including device model, serial number, logical unit (LU) WWN device ID, and then displays the SMART overall health self-assessment test result.
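Besides the printed report, smartctl summarizes what it found in its exit status, which is a bitmask documented in the smartctl man page. The sketch below decodes that bitmask. decode_smartctl_status is a hypothetical helper name, and the messages paraphrase the man page.

```shell
# decode_smartctl_status: hypothetical helper that prints one line for each
# bit set in a smartctl exit status (passed as $1). Prints nothing for 0.
decode_smartctl_status() {
    status=$1
    if [ $((status & 1)) -ne 0 ];   then echo "bit 0: command line did not parse"; fi
    if [ $((status & 2)) -ne 0 ];   then echo "bit 1: device open failed or device is in a low-power mode"; fi
    if [ $((status & 4)) -ne 0 ];   then echo "bit 2: a SMART command failed or a checksum error was found"; fi
    if [ $((status & 8)) -ne 0 ];   then echo "bit 3: SMART status check returned DISK FAILING"; fi
    if [ $((status & 16)) -ne 0 ];  then echo "bit 4: prefail attributes are at or below threshold"; fi
    if [ $((status & 32)) -ne 0 ];  then echo "bit 5: attributes were at or below threshold in the past"; fi
    if [ $((status & 64)) -ne 0 ];  then echo "bit 6: the error log contains errors"; fi
    if [ $((status & 128)) -ne 0 ]; then echo "bit 7: the self-test log contains errors"; fi
    return 0
}

# Typical use (needs root and a real device; report.txt is a placeholder):
#   sudo smartctl --all /dev/sdX > report.txt
#   decode_smartctl_status "$?"
```

Checking the exit status is often simpler than parsing the report text, because the bits cover the health verdict, the attribute thresholds, and both logs at once.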

Conclusion

Monitoring the health of your disk drives is crucial for keeping your data storage reliable. Smartctl is a versatile command-line tool for accessing the SMART data of your disks. With the use cases demonstrated in this article, you can check a drive's health, run and review self-tests, diagnose potential issues, and plan a replacement before any critical data loss occurs.