Some time ago I installed a Seagate Barracuda 7200.12, 500 GB drive. I
installed Windows 7 and Fedora 12 and everything was fine until I
deleted the Linux partitions, re-partitioned, and installed Fedora 13.
Almost immediately after installing F-13 a warning appeared that the
disk was failing because there were too many bad blocks, apparently
based on a count of blocks that had to be reassigned. However, the disk
continued to operate normally with no disk read or write errors.
After several months of warnings I elected to replace the disk with a
Western Digital 500 GB disk and have had no further warnings.
After replacing the Seagate I tested it using a Hitachi disk test
utility, which performs a 1.5 hour disk test. The result: "No Errors".
The same utility can also check the S.M.A.R.T. disk functions, and that
shows normal operation with no errors.
Depends on which SMART fields you're looking at. When it comes to bad
sectors, there are three fields in particular that you need to look at
manually: (1) Reallocated Sectors Count, (2) Current Pending Sectors
Count, and (3) Offline Uncorrectable Sectors Count. That list is in
order of seriousness.
For #1 Reallocated Sectors, those are sectors that have been replaced by
the drive hardware itself from its spare pool. This is good news: it
means that the drive's own failsafes have done their job, the bad
sectors have been successfully replaced by spare sectors, and your data
is safe.
Having a few of these reallocated sectors is fine, but if you notice
them increasing over time, then it's time to do something.
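For #2 Pending Sectors, this is a bit more serious. It means that some
sectors have been found to be iffy: the drive had trouble reading them
and has flagged them as candidates for replacement, but it hasn't
replaced them yet. That gets decided at the next write of that sector.
If the write succeeds, the sector goes back into service; if it fails,
the sector gets remapped to a spare, which bumps up field #1.
For #3 Uncorrectable Sectors, this means the drive ran into sectors
during its own offline scans that it could not read or correct at all,
so whatever data was in them is likely gone. If the pool of spare
sectors also runs out, the drive can no longer hide bad sectors from
you; hopefully at that point the OS itself will start blocking out bad
sectors, thus reducing the usable capacity of the drive.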
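If you want to pull those three fields out yourself, here is a minimal
sketch in Python. It assumes you have the attribute table printed by
smartctl (from smartmontools, e.g. "smartctl -A /dev/sda"; the tool
choice and the device name are my assumptions, not something from the
original post). The function name and the sample rows are made up for
illustration, and other tools may lay out the table differently.

```python
import re

# SMART attribute IDs for the three sector counts discussed above.
SECTOR_ATTRS = {
    5: "Reallocated Sectors Count",
    197: "Current Pending Sectors Count",
    198: "Offline Uncorrectable Sectors Count",
}


def parse_sector_counts(smartctl_output):
    """Return {attribute_id: raw_value} for IDs 5, 197 and 198.

    Hypothetical helper: expects text in the layout of `smartctl -A`.
    """
    counts = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least
        # 10 columns; the raw value is the 10th column.
        if len(fields) >= 10 and fields[0].isdigit():
            attr_id = int(fields[0])
            if attr_id in SECTOR_ATTRS:
                match = re.match(r"\d+", fields[9])
                if match:
                    counts[attr_id] = int(match.group())
    return counts


# Made-up sample rows, for illustration only (not from the poster's drive).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   098   098   036    Pre-fail  Always       -       87
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

if __name__ == "__main__":
    for attr_id, raw in parse_sector_counts(SAMPLE).items():
        print(f"{SECTOR_ATTRS[attr_id]}: {raw}")
```

The raw value in the last column is the actual sector count; the
VALUE/WORST/THRESH columns are normalized scores and are not the same
thing.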
Now, many SMART reporters don't pay attention to #1 at all; they just
assume that the drive has done its job and everything is fine. But they
don't monitor the drive over time, so they have no idea whether the count
has stayed the same or increased since the last time they checked.
What might be happening here is that Fedora is one of the ones that
monitors disk health over time, and it noticed the counts
increasing. So don't just blindly trust a report from the Hitachi or
Seagate disk tools that everything is fine, as in actual fact the
bad sectors might be increasing over time.
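To illustrate what "monitoring over time" means in practice, here is a
rough Python sketch that compares two snapshots of the raw counts and
reports which ones went up. I don't know how Fedora's monitor actually
does it, so treat this as an assumption about the general idea. The
function name and the numbers are hypothetical; each snapshot is just a
dictionary mapping the SMART attribute ID (5, 197, 198) to its raw count.

```python
def growing_counts(previous, current):
    """Return {attribute_id: increase} for every count that went up.

    Hypothetical helper: both arguments map a SMART attribute ID to its
    raw count. An ID missing from `previous` is treated as a count of 0.
    """
    return {
        attr_id: current[attr_id] - previous.get(attr_id, 0)
        for attr_id in current
        if current[attr_id] > previous.get(attr_id, 0)
    }


if __name__ == "__main__":
    # Made-up snapshots taken some time apart, for illustration only.
    last_month = {5: 80, 197: 2, 198: 0}
    today = {5: 87, 197: 2, 198: 0}
    for attr_id, delta in growing_counts(last_month, today).items():
        print(f"attribute {attr_id} grew by {delta}")
```

A one-off test only ever sees the "today" numbers, which is why it can
report that everything is fine while the trend says otherwise.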
Now the question. Should I assume that the disk is usable based on the
Hitachi tests or should I scrap it based on the Fedora tests?
You can probably keep using it as a non-critical data drive. Just not a
boot drive.
Yousuf Khan