My data-tank has a boo-boo, and a nice visit to the computer store

Oh well, I guess it's the life of a server monkey to replace broken components every now and then. The server has been running almost 24/7 for a year now, so it's not much of a shock that some issue turned up.

Luckily I was smart enough to use ZFS as my LVM and RAID platform, as mentioned earlier here: http://nikolaiovesen.com/?p=69. As with most ZFS-related things, the repair turned out to be “nice and easy, ballet dancing and lemon squeezy”. Well, without too much of the ballet or the lemons.

Anyways!

What happened was that one of my WD Green drives (probably the oldest one I have) developed sector issues, and dmesg spewed out errors constantly. I first tried to fix this by scrubbing the pool. Not that it mattered, as I was just fumbling around in the dark, which is usually the way I learn things.

Then I ran a S.M.A.R.T. test on the drive that was reporting failures. This spewed out a lot of information, including the bit I needed: sector errors.
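For anyone following along, here is a rough sketch of how the sector-related bits can be fished out of a S.M.A.R.T. report. It assumes the smartctl tool from smartmontools (the post does not say which tool was used), and /dev/sdX is a placeholder for the suspect drive. The helper name sector_attrs is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: filter `smartctl -A` output (read from stdin) down to
# the attributes that usually point at failing sectors on ATA drives.
sector_attrs() {
    grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'
}

# Usage, as root (replace /dev/sdX with the real device):
#   smartctl -t long /dev/sdX            # start an extended self-test on the drive
#   smartctl -A /dev/sdX | sector_attrs  # show only the sector-related attributes
```

Non-zero raw values on those attributes are a common sign that a drive is on its way out, though the exact attribute set varies between drive models.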

So now all there was to do was get a replacement drive, throw it in and get the array of disks working nicely together again. I figured that if this drive broke now, the others would probably also die within the foreseeable future (a few years). So this time I went for a Western Digital RED drive instead of a WD Green, as the RED series is designed for NAS boxes and should be happier running 24/7 over longer periods.

Before swapping out the broken drive I ran another scrub, a function that checks the drives for checksum errors and such and corrects them (another joyfully convenient feature of ZFS). Once I had made sure there was no data corruption on any of the drives I went to bed (4 am is definitely bedtime on a Sunday… or well, Monday morning).

#performs a scrub of the pool
zpool scrub microserver

Snippet: Doing a scrub on zfs
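A scrub runs in the background, so the result has to be read back from the pool status afterwards. A minimal sketch of that check, assuming the pool name microserver from the snippet above (the helper name pool_has_no_known_errors is made up for this example):

```shell
#!/bin/sh
# Hypothetical helper: succeed only if `zpool status` reports no known data
# errors for the given pool.
pool_has_no_known_errors() {
    zpool status -v "$1" | grep -q 'errors: No known data errors'
}

# Usage:
#   zpool status -v microserver    # full report, including scrub progress and result
#   pool_has_no_known_errors microserver && echo "pool looks clean"
```

The full zpool status -v output is still worth reading by eye, since it also lists per-device read, write and checksum error counters.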

Tuesday morning I went to the local computer store (Digital Impuls) and got hold of a new and shiny WD RED 3 TB drive, as well as some dirt-cheap second-hand DDR2 RAM for a server project at Hackheim (I’ll do a post about this project some time in the future).

While chatting with the clerk at the shop I mentioned Hackheim and got asked about some stuff they had donated to us (two wonky laptops, some hard drives and some power supplies). We also talked about further sponsoring of second-hand equipment that is functional but might need some fixing up, or just stuff customers throw away (because that’s what people seem to do far too much). As I have turned into the webmonkey at Hackheim, I agreed to put their logo and a link to them on our page, and maybe get some posters and stuff up at the hackerspace, in return for wonky but useful computer equipment. I like how enthusiastic the guys were about the place, and I hope to see them visit us. Maybe we’d even get some new members?

Anyways, safely back home from work I got cracking on the disk replacement, after some consultation with the guys at #zfsonlinux on freenode. Unsurprisingly, this was also simple. What I had to do was shut down the server and lose my awesome uptime :(, pull out the broken drive, find the hex bits for my electric screwdriver (manual labor is something you do in Minecraft), replace the drive in the caddy, insert the caddy, boot up, make sure to shut down any services that make use of the pool (Samba and my torrent client) and paste in one little command.

#replacing a device in a zpool
zpool replace -f microserver /dev/disk/by-id/ata-WDC_WD30EZRX-00DC0B0_WD-WMC1T0911124 /dev/disk/by-id/ata-WDC_WD30EFRX-68AX9N0_WD-WCC1T0731752

#simplified version for readability
zpool replace -f [poolname] [device to be replaced] [replacement device]

Snippet: replacing a device in zfs
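The long /dev/disk/by-id/ names in the replace command have to be looked up first. A small sketch of one way to list them, assuming a Linux box where udev populates /dev/disk/by-id and where readlink supports -f (the helper name list_disk_ids is made up for this example):

```shell
#!/bin/sh
# Hypothetical helper: print every whole-disk ata-* link in a by-id directory
# together with the device node it points to, skipping partition links.
list_disk_ids() {
    dir="${1:-/dev/disk/by-id}"
    for link in "$dir"/ata-*; do
        case "$link" in
            *-part[0-9]*) continue ;;   # skip partitions, keep whole disks
        esac
        [ -e "$link" ] || continue      # skip an unmatched glob or dangling link
        printf '%s -> %s\n' "$link" "$(readlink -f "$link")"
    done
}

# Usage:
#   list_disk_ids
```

The serial number at the end of each name is normally printed on the drive's label, which helps to make sure the right disk gets pulled from the caddy.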

After this it's just a matter of waiting while ZFS resilvers the pool, a process where ZFS reconstructs the appropriate data from the other drives in the pool onto the new, empty drive. I’ve currently resilvered 8% of the data and have a pleasant 8+ hours of waiting left.
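Instead of checking by hand every now and then, the waiting can be scripted. This is only a sketch: it assumes that zpool status prints the phrase "resilver in progress" while the rebuild is running, and the helper name wait_for_resilver and the 60 second default interval are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: poll `zpool status` until the pool no longer reports a
# resilver in progress, printing the scan-related lines on every round.
wait_for_resilver() {
    pool="$1"
    interval="${2:-60}"   # seconds between polls
    while zpool status "$pool" | grep -q 'resilver in progress'; do
        zpool status "$pool" | grep -E 'resilver|done' || true
        sleep "$interval"
    done
    # Print the final status once the resilver is no longer running.
    zpool status "$pool"
}

# Usage:
#   wait_for_resilver microserver 300
```

Services that use the pool, such as Samba, can then be started again once the final status shows the pool as healthy.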

Screenshot of the resilvering process
