Dealing with HDD bad sectors on a 3ware 9650SE
My home network has two servers running RAID on a 3ware 9650SE-4LPML. One of them, the machine that also runs the MySQL database for this blog, sent the following email from 3DM2:
May 24, 2012 02:02.03PM - Controller 0 ERROR - Drive timeout detected: port=1
The drive only timed out and the array did not degrade, so this does not mean the HDD has failed. I pulled its S.M.A.R.T. data with smartmontools to find out more.
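Normally, with a hardware RAID card, the processor on the card does all the reading and writing to the individual HDDs. As a result, the host CPU sees the RAID array as a single disk drive even without a driver and can boot from it, but it cannot access the physical drives. With 3ware cards (3ware is now part of LSI), however, you can read the S.M.A.R.T. data of a physical drive by adding the option -d 3ware,N to smartctl. N is the port number, and the device file to specify is /dev/twaX, where X is the controller number; /dev/twaX is the form used for 9000-series controllers such as the 9650SE. LSI's own MegaRAID cards use -d megaraid,N instead.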
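If you check several ports regularly, the -d 3ware,N form is easy to wrap in a script. The following is a minimal Python sketch, not part of the original procedure: the helper names smartctl_3ware_argv and query_all_ports are hypothetical, and it assumes smartctl is on the PATH and the script runs as root. It only builds the same command lines used in this article.

```python
from __future__ import annotations

import subprocess


def smartctl_3ware_argv(controller: int, port: int, *options: str) -> list[str]:
    """Build the argv for one physical drive behind a 3ware 9000-series card.

    Same order as the commands in this article: options, /dev/twa<X>, -d 3ware,<N>.
    """
    if controller < 0 or port < 0:
        raise ValueError("controller and port must be non-negative")
    return ["smartctl", *options, f"/dev/twa{controller}", "-d", f"3ware,{port}"]


def query_all_ports(controller: int, ports: range) -> dict[int, str]:
    """Run `smartctl -a` for each port and return its stdout keyed by port.

    check=False on purpose: smartctl's exit status is a bit mask that is
    also non-zero for mere warnings, so it is not treated as fatal here.
    """
    results: dict[int, str] = {}
    for port in ports:
        proc = subprocess.run(
            smartctl_3ware_argv(controller, port, "-a"),
            capture_output=True,
            text=True,
            check=False,
        )
        results[port] = proc.stdout
    return results
```

For example, query_all_ports(0, range(4)) would cover the four ports of a 9650SE-4LPML (assuming all are populated), and smartctl_3ware_argv(0, 1, "--test=long") reproduces the self-test command used in this article.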
The timeout occurred on port 1 this time, so check the drive on port 1.
# smartctl -a /dev/twa0 -d 3ware,1
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.2.0-gentoo] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Deskstar P7K500
Device Model:     Hitachi HDP725050GLA360
Serial Number:    GEA5**********
LU WWN Device Id: 5 000cca 32cd9c149
Firmware Version: GM4OA52A
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Fri May 25 01:18:33 2012 JST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x84) Offline data collection activity
    was suspended by an interrupting command from host.
    Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
    without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 7890) seconds.
Offline data collection capabilities: (0x5b) SMART execute Offline immediate.
    Auto Offline data collection on/off support.
    Suspend Offline collection upon new command.
    Offline surface scan supported.
    Self-test supported.
    No Conveyance Self-test supported.
    Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode.
    Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
    General Purpose Logging supported.
Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 131) minutes.
SCT capabilities: (0x003d) SCT Status supported.
    SCT Error Recovery Control supported.
    SCT Feature Control supported.
    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   133   133   054    Pre-fail  Offline      -       139
  3 Spin_Up_Time            0x0007   120   120   024    Pre-fail  Always       -       330 (Average 307)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       117
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   020    Pre-fail  Offline      -       144
  9 Power_On_Hours          0x0012   095   095   000    Old_age   Always       -       35613
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       117
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       145
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       145
194 Temperature_Celsius     0x0002   176   176   000    Old_age   Always       -       34 (Min/Max 15/52)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
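The SMART overall-health self-assessment test result is PASSED, so the drive does not appear to have any fatal problem. It has been running for a very long time, 35,613 hours, or 4 years 23 days 21 hours, but the problem is that Current_Pending_Sector is 3. This attribute counts unstable sectors that the drive could not read and that are still waiting to be reallocated. To confirm whether these sectors are really bad, run a self-test.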
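Reading these values by eye is fine for one drive, but they are also easy to extract in a script. A minimal Python sketch, assuming the standard 10-column attribute table shown above (the function names are hypothetical): it takes the leading integer of RAW_VALUE for each attribute, and converts Power_On_Hours the same way as in the text, with 365-day years and no leap days.

```python
from __future__ import annotations

import re

_LEADING_INT = re.compile(r"\d+")


def parse_smart_attributes(text: str) -> dict[str, int]:
    """Map ATTRIBUTE_NAME -> leading integer of RAW_VALUE from `smartctl -a` output."""
    attrs: dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows: numeric ID, name, 0x-prefixed FLAG, ..., RAW_VALUE in column 10.
        if len(fields) < 10 or not fields[0].isdigit() or not fields[2].startswith("0x"):
            continue
        # Only the first RAW_VALUE token is used, so "34 (Min/Max 15/52)" -> 34.
        m = _LEADING_INT.match(fields[9])
        if m:
            attrs[fields[1]] = int(m.group())
    return attrs


def hours_to_ydh(hours: int) -> tuple[int, int, int]:
    """Split hours into (years, days, hours) using 365-day years."""
    days, rem_hours = divmod(hours, 24)
    years, rem_days = divmod(days, 365)
    return years, rem_days, rem_hours
```

Fed the output above, parse_smart_attributes gives 3 for Current_Pending_Sector, and hours_to_ydh(35613) gives (4, 23, 21), matching the 4 years 23 days 21 hours quoted in the text.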
# smartctl --test=long /dev/twa0 -d 3ware,1
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.2.0-gentoo] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 131 minutes for test to complete.
Test will complete after Fri May 25 03:40:28 2012
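How long it takes depends on capacity and I/O performance, but it is generally slow (131 minutes here), so go to sleep or go to work and wait until the estimated completion time. When the test finishes, the result is recorded in a log on the drive, so check that.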
# smartctl -l selftest /dev/twa0 -d 3ware,1
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.2.0-gentoo] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       10%     35615         948956112
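The LBA_of_first_error column is the value worth keeping, since it can later be compared with the RAID controller's log. A minimal Python sketch for pulling it out of smartctl -l selftest output; the regex assumes the column layout shown above, and the names are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional

# Matches rows such as:
# "# 1  Extended offline    Completed: read failure       10%     35615         948956112"
_SELFTEST_ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s+(?P<desc_status>.+?)\s+(?P<remaining>\d+)%\s+"
    r"(?P<lifetime>\d+)\s+(?P<lba>-|\d+)\s*$"
)


@dataclass
class SelfTestEntry:
    num: int
    description_and_status: str
    remaining_pct: int
    lifetime_hours: int
    first_error_lba: Optional[int]  # None when the log shows "-"


def parse_selftest_log(text: str) -> list[SelfTestEntry]:
    """Parse the rows of a `smartctl -l selftest` log, newest (# 1) first."""
    entries: list[SelfTestEntry] = []
    for line in text.splitlines():
        m = _SELFTEST_ROW.match(line.strip())
        if not m:
            continue  # headers and other text
        lba = m["lba"]
        entries.append(
            SelfTestEntry(
                num=int(m["num"]),
                description_and_status=m["desc_status"],
                remaining_pct=int(m["remaining"]),
                lifetime_hours=int(m["lifetime"]),
                first_error_lba=None if lba == "-" else int(lba),
            )
        )
    return entries
```

On the single row above it yields one entry with first_error_lba equal to 948956112 and remaining_pct equal to 10.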
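So there does seem to be a problem. According to the Bad block HOWTO for smartmontools, the fix is apparently to deliberately allocate a file over the bad sector so that the sector is never read again. MS-DOS's CHKDSK had a similar feature long ago. The catch is that this drive hangs off a RAID card as a member of a RAID5 array: the filesystem lives on the array rather than on this drive, so a sector on this drive does not map directly to a file.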
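In this situation the RAID card can apparently take care of it. On the 3ware 9650SE, running a verify of the RAID array repairs such sectors automatically: as far as I understand, when a drive returns a read error during the verify, the controller rebuilds the data from the other drives and parity and writes it back, which lets the drive reallocate the bad sector. You can start a verify immediately either from 3DM2, by going to Management → Maintenance and clicking "Verify Unit", or from tw_cli:
# tw_cli
//poseidon> maint verify c0 u0 start
This time, though, the 9650SE was already set to run a scheduled verify every Saturday at 0:00 (the default setting), so that took care of it.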
If the verify reallocates sectors, this is recorded in the log.
You can also check it with tw_cli:
//poseidon> info c0 diag
### CLI Version: x86_64 (64 bit)
### Time Stamp: 22:31.19 26-May-2012
### Host Name: poseidon
### OS Version: Linux 3.2.0-gentoo
### Driver Version: 2.26.02.014
### Controller ID: 0
### Model: 9650SE-4LPML
### Firmware: FE9X 4.10.00.007
### BIOS: BE9X 4.08.00.002
### Serial #: L326***********
### Available Memory: 224MB
==========================================================================
Diagnostic Information on Controller //poseidon/c0 ...
--------------------------------------------------------------------------
(snip)
The entries for the timeout:
E=0204 T=14:55:57 : Port timeout (ext)
  task file written out : cd dh ch cl sn sc ft : 61 68 43 AD 0A 08 08
Send AEN (code, time): 0009h, 05/24/2012 14:55:57
  Drive timeout detected (EC:0x09, SK=0x04, ASC=0x00, ASCQ=0x00, SEV=01, Type=0x71) port=1
  task file read back : st dh ch cl sn sc er : 50 A0 C2 4F 00 00 00
E=0204 T=14:55:57 P=1 : Soft reset drive
E=0207 T=14:55:57 P=1 : ResetDriveWait
E=0207 T=14:55:57 P=1 : ResetDriveWait
E=0207 T=14:55:57 P=1 : ResetDriveWait
E=0207 T=14:55:57 P=1 : ResetDriveWait
E=0207 T=14:55:57 P=1 : ResetDriveWait
E=0207 T=14:55:57 P=1 : ResetDriveWait
E=0207 T=14:55:57 P=1 : ResetDriveWait
E=0207 T=14:55:57 P=1 : ResetDriveWait
E=0207 T=14:55:57 P=1 : ResetDriveWait
E=0207 T=14:55:57 P=1 : ResetDriveWait
E=0207 T=14:55:57 P=1 : ResetDriveWait
E=0207 T=14:55:57 P=1 : ResetDriveWait
E=0207 T=14:55:57 P=1 : ResetDriveWait
E=0207 T=14:55:57 P=1 : ResetDriveWait
E=0207 T=14:55:57 P=1 : ResetDriveWait
E=0204 T=14:55:57 P=1 : Inserting Set UDMA command
E=0204 T=14:55:57 P=1 : Check drive swap, same drive
E=0204 T=14:55:57 P=1 : Check power cycles, initial=117, current=117, port=1
E=0204 T=14:56:02 P=1h: exitCode = 0 Retrying chain
The reallocation entries look like this:
[A:78][A:79][A:7a] [A:7b][A:7c]
E=0202 T=07:33:32 : Data ECC error (int)
  task file written out : cd dh ch cl sn sc ft : 60 78 8F EB 80 80 80
E=0202 T=07:33:32 P=1 : Soft reset drive
E=0207 T=07:33:32 P=1 : ResetDriveWait
E=0207 T=07:33:32 P=1 : ResetDriveWait
E=0202 T=07:33:32 P=1 : Inserting Set UDMA command
E=0202 T=07:33:32 P=1h: Repair LBA 0x388febc6...OK
E=0202 T=07:33:32 P=1h: Repair LBA 0x388febc7...OK
E=0202 T=07:33:32 P=1h: Repair LBA 0x388febc8...OK
Send AEN (code, time): 0023h, 05/26/2012 07:33:32
  Sector repair completed (EC:0x23, SK=0x01, ASC=0x11, ASCQ=0x00, SEV=02, Type=0x71) port=1, LBA=0x388FEBC7
E=0202 T=07:33:32 P=1h: Complete IPRs in error
A-Verifier ERROR, time: 39ba0fb0 (ErrorCode): 202 Start Stripe #: 00711fd7
E=0202 T=07:33:57 : Data ECC error (int)
  task file written out : cd dh ch cl sn sc ft : 60 78 8F EB 80 80 80
E=0202 T=07:33:57 P=1 : Soft reset drive
E=0207 T=07:33:57 P=1 : ResetDriveWait
E=0207 T=07:33:57 P=1 : ResetDriveWait
E=0202 T=07:33:57 P=1 : Inserting Set UDMA command
E=0202 T=07:33:57 P=1h: Repair LBA 0x388febc8...OK
E=0202 T=07:33:57 P=1h: Repair LBA 0x388febc9...OK
E=0202 T=07:33:57 P=1h: Repair LBA 0x388febca...OK
Send AEN (code, time): 0023h, 05/26/2012 07:33:57
  Sector repair completed (EC:0x23, SK=0x01, ASC=0x11, ASCQ=0x00, SEV=02, Type=0x71) port=1, LBA=0x388FEBC9
E=0202 T=07:33:57 P=1h: Complete IPRs in error
A-Verifier ERROR, time: 39ba1941 (ErrorCode): 202 Start Stripe #: 00711fd7
E=0202 T=07:34:17 : Data ECC error (int)
  task file written out : cd dh ch cl sn sc ft : 60 78 8F EB 80 80 80
E=0202 T=07:34:17 P=1 : Soft reset drive
E=0207 T=07:34:17 P=1 : ResetDriveWait
E=0207 T=07:34:17 P=1 : ResetDriveWait
E=0202 T=07:34:17 P=1 : Inserting Set UDMA command
E=0202 T=07:34:17 P=1h: Repair LBA 0x388febca...OK
E=0202 T=07:34:17 P=1h: Repair LBA 0x388febcb...OK
E=0202 T=07:34:17 P=1h: Repair LBA 0x388febcc...OK
Send AEN (code, time): 0023h, 05/26/2012 07:34:17
  Sector repair completed (EC:0x23, SK=0x01, ASC=0x11, ASCQ=0x00, SEV=02, Type=0x71) port=1, LBA=0x388FEBCB
E=0202 T=07:34:17 P=1h: Complete IPRs in error
A-Verifier ERROR, time: 39ba228c (ErrorCode): 202 Start Stripe #: 00711fd7
E=0202 T=07:34:42 : Data ECC error (int)
  task file written out : cd dh ch cl sn sc ft : 60 78 8F EB 80 80 80
E=0202 T=07:34:42 P=1 : Soft reset drive
E=0207 T=07:34:42 P=1 : ResetDriveWait
E=0207 T=07:34:42 P=1 : ResetDriveWait
E=0202 T=07:34:42 P=1 : Inserting Set UDMA command
E=0202 T=07:34:42 P=1h: Repair LBA 0x388febcd...OK
E=0202 T=07:34:42 P=1h: Repair LBA 0x388febce...OK
E=0202 T=07:34:42 P=1h: Repair LBA 0x388febcf...OK
Send AEN (code, time): 0023h, 05/26/2012 07:34:42
  Sector repair completed (EC:0x23, SK=0x01, ASC=0x11, ASCQ=0x00, SEV=02, Type=0x71) port=1, LBA=0x388FEBCE
E=0202 T=07:34:42 P=1h: Complete IPRs in error
A-Verifier ERROR, time: 39ba2c2e (ErrorCode): 202 Start Stripe #: 00711fd7
E=0202 T=07:35:07 : Data ECC error (int)
  task file written out : cd dh ch cl sn sc ft : 60 78 8F EB 80 80 80
E=0202 T=07:35:07 P=1 : Soft reset drive
E=0207 T=07:35:07 P=1 : ResetDriveWait
E=0207 T=07:35:07 P=1 : ResetDriveWait
E=0202 T=07:35:07 P=1 : Inserting Set UDMA command
E=0202 T=07:35:07 P=1h: Repair LBA 0x388febcf...OK
E=0202 T=07:35:07 P=1h: Repair LBA 0x388febd0...OK
E=0202 T=07:35:07 P=1h: Repair LBA 0x388febd1...OK
Send AEN (code, time): 0023h, 05/26/2012 07:35:07
  Sector repair completed (EC:0x23, SK=0x01, ASC=0x11, ASCQ=0x00, SEV=02, Type=0x71) port=1, LBA=0x388FEBD0
E=0202 T=07:35:07 P=1h: Complete IPRs in error
A-Verifier ERROR, time: 39ba358d (ErrorCode): 202 Start Stripe #: 00711fd7
E=0202 T=07:35:32 : Data ECC error (int)
  task file written out : cd dh ch cl sn sc ft : 60 78 8F EB 80 80 80
E=0202 T=07:35:32 P=1 : Soft reset drive
E=0207 T=07:35:32 P=1 : ResetDriveWait
E=0207 T=07:35:32 P=1 : ResetDriveWait
E=0202 T=07:35:32 P=1 : Inserting Set UDMA command
E=0202 T=07:35:32 P=1h: Repair LBA 0x388febd1...OK
E=0202 T=07:35:32 P=1h: Repair LBA 0x388febd2...OK
E=0202 T=07:35:32 P=1h: Repair LBA 0x388febd3...OK
Send AEN (code, time): 0023h, 05/26/2012 07:35:32
  Sector repair completed (EC:0x23, SK=0x01, ASC=0x11, ASCQ=0x00, SEV=02, Type=0x71) port=1, LBA=0x388FEBD2
E=0202 T=07:35:32 P=1h: Complete IPRs in error
A-Verifier ERROR, time: 39ba3f05 (ErrorCode): 202 Start Stripe #: 00711fd7
E=0202 T=07:35:57 : Data ECC error (int)
  task file written out : cd dh ch cl sn sc ft : 60 78 8F EB 80 80 80
E=0202 T=07:35:57 P=1 : Soft reset drive
E=0207 T=07:35:57 P=1 : ResetDriveWait
E=0207 T=07:35:57 P=1 : ResetDriveWait
E=0202 T=07:35:57 P=1 : Inserting Set UDMA command
E=0202 T=07:35:57 P=1h: Repair LBA 0x388febd3...OK
E=0202 T=07:35:57 P=1h: Repair LBA 0x388febd4...OK
E=0202 T=07:35:57 P=1h: Repair LBA 0x388febd5...OK
Send AEN (code, time): 0023h, 05/26/2012 07:35:57
  Sector repair completed (EC:0x23, SK=0x01, ASC=0x11, ASCQ=0x00, SEV=02, Type=0x71) port=1, LBA=0x388FEBD4
E=0202 T=07:35:57 P=1h: Complete IPRs in error
A-Verifier ERROR, time: 39ba488d (ErrorCode): 202 Start Stripe #: 00711fd7
E=0202 T=07:36:27 : Data ECC error (int)
  task file written out : cd dh ch cl sn sc ft : 60 78 8F EB 80 80 80
E=0202 T=07:36:27 P=1 : Soft reset drive
E=0207 T=07:36:27 P=1 : ResetDriveWait
E=0207 T=07:36:27 P=1 : ResetDriveWait
E=0202 T=07:36:27 P=1 : Inserting Set UDMA command
E=0202 T=07:36:27 P=1h: Repair LBA 0x388febd8...OK
E=0202 T=07:36:27 P=1h: Repair LBA 0x388febd9...OK
E=0202 T=07:36:27 P=1h: Repair LBA 0x388febda...OK
Send AEN (code, time): 0023h, 05/26/2012 07:36:27
  Sector repair completed (EC:0x23, SK=0x01, ASC=0x11, ASCQ=0x00, SEV=02, Type=0x71) port=1, LBA=0x388FEBD9
E=0202 T=07:36:27 P=1h: Complete IPRs in error
A-Verifier ERROR, time: 39ba53f3 (ErrorCode): 202 Start Stripe #: 00711fd7
E=0202 T=07:36:52 : Data ECC error (int)
  task file written out : cd dh ch cl sn sc ft : 60 78 8F EB 80 80 80
E=0202 T=07:36:52 P=1 : Soft reset drive
E=0207 T=07:36:52 P=1 : ResetDriveWait
E=0207 T=07:36:52 P=1 : ResetDriveWait
E=0202 T=07:36:52 P=1 : Inserting Set UDMA command
E=0202 T=07:36:52 P=1h: Repair LBA 0x388febd5...OK
E=0202 T=07:36:52 P=1h: Repair LBA 0x388febd6...OK
E=0202 T=07:36:52 P=1h: Repair LBA 0x388febd7...OK
Send AEN (code, time): 0023h, 05/26/2012 07:36:52
  Sector repair completed (EC:0x23, SK=0x01, ASC=0x11, ASCQ=0x00, SEV=02, Type=0x71) port=1, LBA=0x388FEBD6
E=0202 T=07:36:52 P=1h: Complete IPRs in error
A-Verifier ERROR, time: 39ba5d6c (ErrorCode): 202 Start Stripe #: 00711fd7
E=0202 T=07:37:12 : Data ECC error (int)
  task file written out : cd dh ch cl sn sc ft : 60 78 8F EB 80 80 80
E=0202 T=07:37:12 P=1 : Soft reset drive
E=0207 T=07:37:12 P=1 : ResetDriveWait
E=0207 T=07:37:12 P=1 : ResetDriveWait
E=0202 T=07:37:12 P=1 : Inserting Set UDMA command
E=0202 T=07:37:12 P=1h: Repair LBA 0x388febda...OK
E=0202 T=07:37:12 P=1h: Repair LBA 0x388febdb...OK
E=0202 T=07:37:12 P=1h: Repair LBA 0x388febdc...OK
Send AEN (code, time): 0023h, 05/26/2012 07:37:12
  Sector repair completed (EC:0x23, SK=0x01, ASC=0x11, ASCQ=0x00, SEV=02, Type=0x71) port=1, LBA=0x388FEBDB
E=0202 T=07:37:12 P=1h: Complete IPRs in error
A-Verifier ERROR, time: 39ba66db (ErrorCode): 202 Start Stripe #: 00711fd7
E=0202 T=07:37:42 : Data ECC error (int)
  task file written out : cd dh ch cl sn sc ft : 60 78 8F EB 80 80 80
E=0202 T=07:37:42 P=1 : Soft reset drive
E=0207 T=07:37:42 P=1 : ResetDriveWait
E=0207 T=07:37:42 P=1 : ResetDriveWait
E=0202 T=07:37:42 P=1 : Inserting Set UDMA command
E=0202 T=07:37:42 P=1h: Repair LBA 0x388febdf...OK
E=0202 T=07:37:42 P=1h: Repair LBA 0x388febe0...OK
E=0202 T=07:37:42 P=1h: Repair LBA 0x388febe1...OK
Send AEN (code, time): 0023h, 05/26/2012 07:37:42
  Sector repair completed (EC:0x23, SK=0x01, ASC=0x11, ASCQ=0x00, SEV=02, Type=0x71) port=1, LBA=0x388FEBE0
E=0202 T=07:37:42 P=1h: Complete IPRs in error
A-Verifier ERROR, time: 39ba7256 (ErrorCode): 202 Start Stripe #: 00711fd7
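Note that smartctl reports LBAs in decimal while the 3ware diag log uses hex. A minimal Python sketch to check that the LBA from the self-test is among the sectors the controller repaired; the regex is an assumption based only on the "Repair LBA 0x...OK" lines shown above, and the function names are hypothetical.

```python
from __future__ import annotations

import re

# Matches e.g. "Repair LBA 0x388febd0...OK" in `info c0 diag` output.
_REPAIR_RE = re.compile(r"Repair LBA (0x[0-9A-Fa-f]+)\.\.\.OK")


def repaired_lbas(diag_text: str) -> set[int]:
    """Return every LBA the diag log reports as repaired, as integers."""
    # int(..., 16) accepts the "0x" prefix.
    return {int(h, 16) for h in _REPAIR_RE.findall(diag_text)}


def check_self_test_lbas(self_test_lbas: list[int], diag_text: str) -> dict[int, bool]:
    """For each decimal LBA from smartctl, report whether the controller repaired it."""
    repaired = repaired_lbas(diag_text)
    return {lba: lba in repaired for lba in self_test_lbas}
```

With the data in this article, the self-test LBA 948956112 is 0x388FEBD0 and the later failed test's 948956110 is 0x388FEBCE; both appear in the Repair LBA lines above.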
Once the verify has finished, read the S.M.A.R.T. data again with smartmontools.
# smartctl -a /dev/twa0 -d 3ware,1
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.2.0-gentoo] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Deskstar P7K500
Device Model:     Hitachi HDP725050GLA360
Serial Number:    GEA5**********
LU WWN Device Id: 5 000cca 32cd9c149
Firmware Version: GM4OA52A
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Sat May 26 22:40:34 2012 JST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x84) Offline data collection activity
    was suspended by an interrupting command from host.
    Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
    without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 7890) seconds.
Offline data collection capabilities: (0x5b) SMART execute Offline immediate.
    Auto Offline data collection on/off support.
    Suspend Offline collection upon new command.
    Offline surface scan supported.
    Self-test supported.
    No Conveyance Self-test supported.
    Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode.
    Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
    General Purpose Logging supported.
Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 131) minutes.
SCT capabilities: (0x003d) SCT Status supported.
    SCT Error Recovery Control supported.
    SCT Feature Control supported.
    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   054    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0007   120   120   024    Pre-fail  Always       -       330 (Average 307)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       117
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       1
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   020    Pre-fail  Offline      -       144
  9 Power_On_Hours          0x0012   095   095   000    Old_age   Always       -       35658
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       117
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       147
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       147
194 Temperature_Celsius     0x0002   139   139   000    Old_age   Always       -       43 (Min/Max 15/52)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     35650         -
# 2  Extended offline    Completed: read failure       10%     35622         948956110
# 3  Short offline       Completed without error       00%     35620         -
# 4  Extended offline    Completed: read failure       10%     35615         948956112
2 of 2 failed self-tests are outdated by newer successful extended offline self-test # 1

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Current_Pending_Sector is now 0, and instead Reallocated_Sector_Ct and Reallocated_Event_Count are both 1. The latest extended self-test (# 1) also completed without error. That completes the repair.
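The before/after comparison can also be written down as a small check. This is a hedged sketch, not a vendor-documented rule: it assumes the raw values of Current_Pending_Sector and Reallocated_Sector_Ct both count sectors (true of many drives, but raw encodings are vendor-specific), and the function name is hypothetical. With the values in this article, 3 pending sectors were cleared but only 1 was reallocated; presumably the other 2 were rewritten successfully in place, so the drive dropped them from the pending list without remapping them.

```python
from __future__ import annotations


def summarize_repair(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Compare raw SMART values from before and after the verify.

    Assumes both attributes count sectors. Returns how many pending sectors
    were cleared, how many were newly reallocated, how many were cleared
    without a reallocation, and how many are still pending.
    """
    cleared = before["Current_Pending_Sector"] - after["Current_Pending_Sector"]
    reallocated = after["Reallocated_Sector_Ct"] - before["Reallocated_Sector_Ct"]
    return {
        "pending_cleared": cleared,
        "newly_reallocated": reallocated,
        "cleared_without_reallocation": max(cleared - reallocated, 0),
        "still_pending": after["Current_Pending_Sector"],
    }
```

A non-zero still_pending after a verify would suggest that the repair has not taken yet.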
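This machine also stores a fair amount of data (among other things, over 100 million rows of market data in PostgreSQL), so I keep one spare drive on hand. I considered swapping it in, but the array had not degraded, and my son sometimes plays with the spare, so it would need to be checked before going into service. In the end I settled for just reallocating the sectors this time.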