Hardware Canucks


frontier204 September 13, 2009 06:25 AM

HDD fails S.M.A.R.T. short test, but passes long test?
 
Note this is a >1 year old thread that was brought up by a first-post...

Well my plan to wear out that F@H inefficient computer of mine (see second one in my sig) finally paid off. One of the two hard drives, a Seagate 160GB that's easily the loudest hard disk I have used in the last 8 years, is showing errors. (I back up everything on that untrustworthy rig so I lost nothing.) Since it's only a 160GB IDE disk, I'm not going to try to save it, so it's upgrade time for the whole rig (post in "New Builds" to come).
However, it's in a strange state in that the only thing that fails is a S.M.A.R.T. short self test. The long test passes, and I was able to use badblocks to write 0x55, 0xAA, 0xFF, 0x00 to the drive without it saying any of the sectors were bad. Linux's smartctl and SeaTools both give me the same result.
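
For anyone unfamiliar with the write test mentioned above, here is a rough Python sketch of the idea: fill every block with a pattern, read it back, and note any block that does not match. It is only an illustration under assumptions. The function names, the 4096-byte block size and the scratch file are made up for the example, and it runs against a temporary file rather than a real disk. Reads here also go through the OS cache, so it does not exercise the medium the way badblocks does on a raw device.

```python
import os
import tempfile

# The four patterns named in the post.
PATTERNS = (0x55, 0xAA, 0xFF, 0x00)
BLOCK_SIZE = 4096  # assumed block size for this sketch


def pattern_test(path, num_blocks, block_size=BLOCK_SIZE):
    """Write each pattern to every block, read it back, and
    return the sorted block numbers that did not match."""
    bad = set()
    with open(path, "r+b") as f:
        for pattern in PATTERNS:
            expected = bytes([pattern]) * block_size
            # Write pass: fill every block with the current pattern.
            f.seek(0)
            for _ in range(num_blocks):
                f.write(expected)
            f.flush()
            os.fsync(f.fileno())
            # Read pass: compare every block against the pattern.
            f.seek(0)
            for block in range(num_blocks):
                if f.read(block_size) != expected:
                    bad.add(block)
    return sorted(bad)


def demo():
    # A scratch file stands in for the disk, so nothing real is overwritten.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * BLOCK_SIZE * 16)
        path = tmp.name
    try:
        return pattern_test(path, 16)
    finally:
        os.remove(path)


print(demo())  # an empty list means every block read back correctly
```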

Here's the lengthy S.M.A.R.T. log for anyone interested:
Code:

smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:    Seagate Barracuda 7200.7 and 7200.7 Plus family
Device Model:    ST3160021A
Serial Number:    3JS2KVJK
Firmware Version: 3.06
User Capacity:    160,041,885,696 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:  6
ATA Standard is:  ATA/ATAPI-6 T13 1410D revision 2
Local Time is:    Sun Sep 13 08:10:07 2009 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 121) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                ( 430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (  1) minutes.
Extended self-test routine
recommended polling time:        ( 111) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate    0x000f  058  051  006    Pre-fail  Always      -      193498618
  3 Spin_Up_Time            0x0003  097  096  000    Pre-fail  Always      -      0
  4 Start_Stop_Count        0x0032  100  100  020    Old_age  Always      -      7
  5 Reallocated_Sector_Ct  0x0033  100  100  036    Pre-fail  Always      -      0
  7 Seek_Error_Rate        0x000f  087  060  030    Pre-fail  Always      -      586747607
  9 Power_On_Hours          0x0032  086  086  000    Old_age  Always      -      12331
 10 Spin_Retry_Count        0x0013  100  100  097    Pre-fail  Always      -      0
 12 Power_Cycle_Count      0x0032  096  096  020    Old_age  Always      -      4998
194 Temperature_Celsius    0x0022  042  051  000    Old_age  Always      -      42
195 Hardware_ECC_Recovered  0x001a  058  051  000    Old_age  Always      -      193498618
197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      0
198 Offline_Uncorrectable  0x0010  100  100  000    Old_age  Offline      -      0
199 UDMA_CRC_Error_Count    0x003e  200  199  000    Old_age  Always      -      1
200 Multi_Zone_Error_Rate  0x0000  100  253  000    Old_age  Offline      -      0
202 TA_Increase_Count      0x0032  100  253  000    Old_age  Always      -      0

SMART Error Log Version: 1
ATA Error Count: 1
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 occurred at disk power-on lifetime: 12235 hours (509 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 3f 6f 7a e0  Error: ICRC, ABRT at LBA = 0x007a6f3f = 8023871

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 3f 6f 7a e0 00      08:31:48.347  READ DMA EXT
  c8 00 80 bf 6e 7a e3 00      08:31:48.342  READ DMA
  c8 00 80 3f 6e 7a e3 00      08:31:48.324  READ DMA
  25 00 00 3f 6b 7a e0 00      08:31:48.320  READ DMA EXT
  c8 00 80 bf 6a 7a e3 00      08:31:48.316  READ DMA

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline      Completed: read failure      90%    12331        33787178
# 2  Short offline      Completed: read failure      90%    12331        27659034
# 3  Short offline      Completed: read failure      90%    12320        256115518
# 4  Short offline      Completed: read failure      90%    12317        167690272
# 5  Short offline      Completed: read failure      90%    12317        167690272
# 6  Short offline      Completed: read failure      90%    12317        167690272
# 7  Short offline      Completed: read failure      90%    12316        830
# 8  Short offline      Completed: read failure      90%    12316        152971154
# 9  Extended offline    Completed without error      00%    12304        -
#10  Short offline      Completed: read failure      90%    12302        69390739
#11  Short offline      Completed: read failure      90%    12293        271949021
#12  Short offline      Completed: read failure      90%    12284        14449929
#13  Short offline      Completed: read failure      90%    12260        278277010
#14  Short offline      Completed: read failure      90%    12236        212096089
#15  Extended offline    Completed without error      00%    12215        -
#16  Short offline      Completed without error      00%    12212        -
#17  Short offline      Completed without error      00%    12197        -
#18  Short offline      Completed without error      00%    12176        -
#19  Short offline      Completed without error      00%    12164        -
#20  Short offline      Completed without error      00%    12144        -
#21  Short offline      Completed without error      00%    12125        -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
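
As a side note on "Error 1" in the log above, the register dump can be decoded by hand. The sketch below assumes the usual 28-bit ATA layout (low nibble of DH, then CH, CL, SN) and the commonly documented bit names for the error and status registers. The `decode` helper and the two lookup tables are hypothetical names for this example, not part of smartctl.

```python
# Bit masks for the ATA error register (only the bits relevant here).
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}
# Bit masks for the ATA status register.
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF",
               0x10: "DSC", 0x08: "DRQ", 0x01: "ERR"}


def decode(er, st, sn, cl, ch, dh):
    """Return (lba, error flag names, status flag names) for one log entry."""
    # 28-bit LBA: low nibble of DH is the top 4 bits, then CH, CL, SN.
    lba = ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn
    errors = [name for bit, name in ERROR_BITS.items() if er & bit]
    status = [name for bit, name in STATUS_BITS.items() if st & bit]
    return lba, errors, status


# Values from the "registers were" line: ER ST SC SN CL CH DH = 84 51 00 3f 6f 7a e0
lba, errors, status = decode(0x84, 0x51, 0x3F, 0x6F, 0x7A, 0xE0)
print(hex(lba), lba, errors, status)
# 0x7a6f3f 8023871 ['ICRC', 'ABRT'] ['DRDY', 'DSC', 'ERR']
```

ICRC is the interface CRC flag, which matches the raw value of 1 for UDMA_CRC_Error_Count in the attribute table. That kind of error is reported on the transfer between drive and controller. Whether it has anything to do with the short-test failures is not something the log can show.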

Has anyone seen this HDD fail behaviour before? As noted above, this is only an out-of-curiosity thing, since I'm ditching the whole rig.
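
To make the pattern in the self-test log easier to see, here is a small Python sketch that tallies results by test type and collects the first-error LBAs. It uses only a handful of rows copied from the log in this post. The regular expression and the function names are assumptions made for this example and are not guaranteed to match every smartctl version's layout.

```python
import re

# A few rows copied from the self-test log in this post (abbreviated).
LOG = """\
# 1  Short offline      Completed: read failure      90%    12331        33787178
# 2  Short offline      Completed: read failure      90%    12331        27659034
# 7  Short offline      Completed: read failure      90%    12316        830
# 9  Extended offline    Completed without error      00%    12304        -
#14  Short offline      Completed: read failure      90%    12236        212096089
#15  Extended offline    Completed without error      00%    12215        -
#16  Short offline      Completed without error      00%    12212        -
"""

ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s+(?P<test>Short|Extended) offline\s+"
    r"(?P<status>Completed.*?)\s+(?P<remaining>\d+)%\s+"
    r"(?P<hours>\d+)\s+(?P<lba>\d+|-)\s*$"
)


def parse(log):
    """Turn each matching log line into a dict of its columns."""
    return [m.groupdict() for m in map(ROW.match, log.splitlines()) if m]


def summarize(rows):
    """Count passes and failures per test type."""
    counts = {}
    for row in rows:
        outcome = "failed" if "failure" in row["status"] else "passed"
        key = (row["test"], outcome)
        counts[key] = counts.get(key, 0) + 1
    return counts


def first_error_lbas(rows):
    """LBAs reported by failed tests, in log order."""
    return [int(row["lba"]) for row in rows if row["lba"] != "-"]


rows = parse(LOG)
print(summarize(rows))
print(first_error_lbas(rows))
```

On these sample rows every failure is a short test, both extended tests pass, and the reported LBA is different each time.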

Squeetard September 13, 2009 09:01 AM

Showing errors during use or just SMART errors? I've had a bazillion drives toss errors in SMART and never ever have problems. I disable SMART or ignore it if you can't disable it.

frontier204 September 13, 2009 10:34 AM

No errors during use yet; I have to explicitly view the log to see the errors (e.g. OS and BIOS are not yelling at me).

Squeetard September 13, 2009 11:11 AM

Psssh. SMART fails. I have 3 clients whose SMART yells at them every time they boot, even with SMART turned off. Zero problems for years now.

tyreman September 14, 2009 04:03 AM

Disable the SMART stuff in CMOS.
I have seen SMART report supposed problems that never transpired.
I did also check, though, with the appropriate drive test software from the manufacturer of the drive.
Download the drive manufacturer's specific test software.
See what that discloses.
No problems with the manufacturer's software? Then no issues.

enaberif September 14, 2009 09:28 AM

While you CAN ignore SMART warnings, they are there for a reason: they are letting you know something isn't 100% right.

By ignoring these warnings you are taking chances with the data and the drive.

frontier204 September 14, 2009 07:28 PM

@tyreman: The manufacturer's test (SeaTools) gives me the same result. I tried a short test, which failed 2 / 2 times, then a long test, which passed. Note that SMART was NOT tripped on the drive; I only discovered the issue because I set Ubuntu to run short tests on a schedule.
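
On what "tripped" means here: as far as I understand smartctl's WHEN_FAILED column, an attribute counts as failing when its normalized VALUE drops to or below its THRESH, and a threshold of 0 never fails. The sketch below applies that rule to a few rows copied from the attribute table in the first post. The function names are made up for this example, and the drive's own overall-health verdict comes from the drive itself, so treat this as an approximation.

```python
# A few rows from the attribute table in the first post:
# (id, name, value, worst, thresh)
ATTRIBUTES = [
    (1, "Raw_Read_Error_Rate", 58, 51, 6),
    (5, "Reallocated_Sector_Ct", 100, 100, 36),
    (7, "Seek_Error_Rate", 87, 60, 30),
    (10, "Spin_Retry_Count", 100, 100, 97),
    (197, "Current_Pending_Sector", 100, 100, 0),
]


def failing_now(attrs):
    """Names of attributes whose current value is at or below a non-zero threshold."""
    return [name for _id, name, value, _worst, thresh in attrs
            if thresh > 0 and value <= thresh]


def failed_in_past(attrs):
    """Names of attributes whose worst recorded value reached a non-zero threshold."""
    return [name for _id, name, _value, worst, thresh in attrs
            if thresh > 0 and worst <= thresh]


print(failing_now(ATTRIBUTES))     # []
print(failed_in_past(ATTRIBUTES))  # []
```

Both lists come out empty for these rows, which is consistent with the PASSED overall-health result even though the short self-tests report read failures.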

Anyway, I had a 250 GB SATA drive lying around, so I put that in and let the computer rebuild the RAID 1 array on it (resulting in a very strange software RAID with 1 IDE and 1 SATA drive). I didn't want to take chances with a drive containing a Subversion repo, since I read those can self-corrupt on a working disk.

Thanks for the advice / interesting to hear stories of some users tripping SMART and still losing nothing. The last two times my drive tripped a SMART warning (to the point where the BIOS is yelling), I had to reinstall my OS. ...then there was the Fujitsu drive which was eating my OS yet passing the manufacturer's test :doh:

starnsun May 1, 2011 06:25 PM

SMART Short Test Failure
 
Hi,

I have experienced this on my HP Pavilion All-In-One IQ846 Desktop. The log shows it is on my D (backup) drive, and a search of the Web indicates that Hitachi is known to have physical failures of their product, so I assumed it was the hard drive. I am in the process of cloning it, out of the unit, because the 846 went black a few days ago and I have to ship it back for a new motherboard. All the other tests passed. Just thought everyone would like to know they are not alone and it is not one specific drive. And this unit was a replacement for an 816 that had many problems. Good thing I bought the Extra Care Package, which allows for replacement; however, I had to go through h.... :ph34r: to get one. My advice...clone it on another drive after backup and trash the hard drive!

frontier204 May 1, 2011 06:44 PM

Wow, I'm surprised to see this thread pop up again...
I'm replying not because of thread necro (bumping up a >1 year old thread by replying to it), but because you seem to have jumped to conclusions very quickly about your HP machine and Hitachi hard drives. ANY hard drive can fail or last a decade without failing, depending on many factors other than the manufacturer.

higuma May 19, 2011 02:45 PM

Seeing as how this thread is directly related to my issue, I will post here as well since I came upon it through the search.

I recently bought a WD2002FAEX (WD 2TB Caviar Black HDD) and the results that I have gotten are as follows when running tests with WD Data LifeGuard, HD Tune, and HD Sentinel. Should I be worried about my drive potentially dying? It looks like I am unable to retrieve any relevant SMART info at all. The extended tests on all 3 programs passed.

http://img37.imageshack.us/img37/2435/wd01.jpg
http://img837.imageshack.us/img837/7324/wd03.jpg
http://img153.imageshack.us/img153/3686/wd02.jpg


The status keeps alternating between fail and pass on WD Data LifeGuard.

http://img508.imageshack.us/img508/5606/hdtune01.jpg
http://img815.imageshack.us/img815/6549/hdtune02.jpg

HD Tune is not much different

And HD Sentinel is unable to get any readings from SMART.

System specs:
Gigabyte X58A-UD5 Mobo
Core I7 930
Windows 7 Ultimate
Bios set to AHCI mode (same results when set to IDE)

Would these results justify returning / exchanging the drive for a new one?


All times are GMT -7.