When Good Drives Try To Go Bad
I logged onto my FreeNAS box and found that my boot drive was degraded, so I dug into the problem and attempted to fix it.
I logged into my FreeNAS box to check a few things and maybe run a software update. Then I looked at the notifications and saw something every admin hates to see: the boot pool was degraded.
Since I was still able to log into the web interface and access the volumes, I decided it wasn't a complete and utter failure. I logged onto my server via SSH and ran good old zpool status.
pool: freenas-boot
state: DEGRADED
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://illumos.org/msg/ZFS-8000-8A
scan: scrub repaired 0 in 0 days 00:07:53 with 2 errors on Wed Jul 15 03:52:53 2020
config:
NAME                                          STATE     READ WRITE CKSUM
freenas-boot                                  DEGRADED     2     0     0
  gptid/7d4cfaf2-cf70-11e5-8a07-000c29242dfe  DEGRADED     2     0    60  too many errors
errors: 2 data errors, use '-v' for a list
Okay, only two data errors. That's not too bad. Let's take a look at which files they are with zpool status -v.
pool: freenas-boot
state: DEGRADED
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://illumos.org/msg/ZFS-8000-8A
scan: scrub repaired 0 in 0 days 00:07:53 with 2 errors on Wed Jul 15 03:52:53 2020
config:
NAME                                          STATE     READ WRITE CKSUM
freenas-boot                                  DEGRADED     2     0     0
  gptid/7d4cfaf2-cf70-11e5-8a07-000c29242dfe  DEGRADED     2     0    60  too many errors
errors: Permanent errors have been detected in the following files:
freenas-boot/ROOT/11.3-U2.1@2016-02-09-14:33:33:/usr/local/lib/python2.7/email/message.pyc
freenas-boot/ROOT/11.3-U2.1@2016-02-09-14:33:33:/usr/local/lib/python2.7/uu.pyc
Seems to be two Python library files that I'm not even sure are used for anything. Still, it's not great, since it may mean the flash drive I've been using as a boot drive is failing. Before doing anything else, I exported my FreeNAS configuration file so I could restore it if I needed to reinstall.
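If you'd rather pull just the affected paths out of that output in a script, here is a minimal sketch. It assumes a POSIX shell with awk, and the function name permanent_error_files is my own invention, not part of ZFS. It reads zpool status -v text on stdin and prints whatever follows the "Permanent errors" line.

```shell
# Hypothetical helper: print the paths listed after the "Permanent errors"
# line of `zpool status -v` output read from stdin.
permanent_error_files() {
    awk '
        # The file list starts after this line.
        /errors: Permanent errors have been detected/ { listing = 1; next }
        # The start of another pool report ends the list.
        $1 == "pool:" { listing = 0 }
        # Skip blank lines, strip leading indentation, print the path.
        listing && NF { sub(/^[ \t]+/, ""); print }
    '
}

# Usage on a live system (not run here):
#   zpool status -v freenas-boot | permanent_error_files
```

The output format of zpool status isn't a stable interface, so treat this as a convenience for eyeballing results rather than something to build on.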
I didn't have a spare flash drive handy, I had already backed up the configuration, and the NAS data pool was healthy, so I decided to do something I normally wouldn't recommend: run a system update in the hope that it would replace the busted files and save me from devoting a chunk of time to rebuilding my FreeNAS box.
Of course, if this made the problem worse I was going to need to rebuild the damn thing anyway, but I decided to go for the (potentially) quick option. The good news is that at first glance it seemed to have worked.
pool: freenas-boot
state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://illumos.org/msg/ZFS-8000-9P
scan: scrub repaired 0 in 0 days 00:07:24 with 0 errors on Wed Jul 15 22:21:45 2020
config:
NAME                                          STATE     READ WRITE CKSUM
freenas-boot                                  DEGRADED     0     0     0
  gptid/7d4cfaf2-cf70-11e5-8a07-000c29242dfe  DEGRADED     0     0     0  too many errors
errors: No known data errors
Well, it kind of worked. Now I'm getting a slightly different error. Time to check it with zpool scrub freenas-boot.
root@Thorn:~ # zpool status freenas-boot
pool: freenas-boot
state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://illumos.org/msg/ZFS-8000-9P
scan: scrub repaired 0 in 0 days 00:08:39 with 0 errors on Wed Jul 15 23:24:26 2020
config:
NAME                                          STATE     READ WRITE CKSUM
freenas-boot                                  DEGRADED     0     0     0
  gptid/7d4cfaf2-cf70-11e5-8a07-000c29242dfe  DEGRADED     0     0     0  too many errors
errors: No known data errors
root@Thorn:~ #
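The scan line is the quickest thing to read after a scrub, since it carries the error count. As a rough sketch (again assuming a POSIX shell, with scrub_error_count being a made-up helper name), this pulls that number out of zpool status text on stdin. The pattern is written against the scan lines shown in this post and may need adjusting for other ZFS versions.

```shell
# Hypothetical helper: print the error count from the "scan:" line of
# `zpool status` output read from stdin. Prints nothing if no finished
# scrub is reported.
scrub_error_count() {
    sed -n 's/^ *scan: scrub repaired .* with \([0-9][0-9]*\) errors on .*/\1/p'
}

# Usage on a live system (not run here):
#   zpool status freenas-boot | scrub_error_count
```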
Since the run of zpool scrub didn't seem to identify any file errors I may have gotten lucky with the system update. I decided to run zpool clear freenas-boot to clear the errors and then re-run zpool scrub. A little under eight minutes later it came back clean.
root@Thorn:~ # zpool status freenas-boot
pool: freenas-boot
state: ONLINE
scan: scrub repaired 0 in 0 days 00:07:49 with 0 errors on Wed Jul 15 23:38:26 2020
config:
NAME                                          STATE     READ WRITE CKSUM
freenas-boot                                  ONLINE       0     0     0
  gptid/7d4cfaf2-cf70-11e5-8a07-000c29242dfe  ONLINE       0     0     0
errors: No known data errors
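The clear, scrub, wait, and check routine above could be wrapped up roughly like this. It is a sketch under a few assumptions: a POSIX shell, zpool status reporting "scrub in progress" on its scan line while a scrub runs, and an arbitrary 30-second polling interval. Both function names are hypothetical.

```shell
# Hypothetical helper: succeeds (exit 0) while the status text on stdin
# reports a running scrub.
scrub_in_progress() {
    grep -q 'scrub in progress'
}

# Hypothetical wrapper: clear the error counters on pool $1, start a
# scrub, poll until it finishes, then print the final status.
clear_and_rescrub() {
    pool=$1
    zpool clear "$pool" || return 1
    zpool scrub "$pool" || return 1
    while zpool status "$pool" | scrub_in_progress; do
        sleep 30   # polling interval is an arbitrary choice
    done
    zpool status "$pool"
}

# Usage on a live system (not run here):
#   clear_and_rescrub freenas-boot
```

Clearing the counters only hides the history. It doesn't fix a flaky device, so a clean result here is a reprieve, not a cure.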
Knowing I likely got incredibly lucky, I ordered a new flash drive online. Once it shows up I'll make it a copy of the existing boot drive and see if that works, or I may just do a fresh install for paranoia's sake. Thankfully this is something many other people have done, so there are plenty of guides out there. I'll update this post when I get that working.
I really should get around to getting some sort of external monitoring set up but that's a project for another post.
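In the meantime, the simplest possible check is probably something like the sketch below. It assumes a POSIX shell with awk and relies on zpool list -H -o name,health, which prints one tab-separated "name health" pair per pool. The function name unhealthy_pools is made up, and how you get alerted when it fails is left open.

```shell
# Hypothetical helper: read "name<TAB>health" lines (the format printed by
# `zpool list -H -o name,health`) from stdin and print every pool whose
# health is not ONLINE. Exits 1 if any such pool was found, 0 otherwise.
unhealthy_pools() {
    awk '
        $2 != "ONLINE" { print $1 ": " $2; bad = 1 }
        END { exit (bad ? 1 : 0) }
    '
}

# Usage on a live system (not run here), e.g. from cron:
#   zpool list -H -o name,health | unhealthy_pools || echo "pool needs attention"
```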