Dec 10
FreeBSD’s gmirror is a piece of shit
I do a lot of work on Fakku!, which is a weeaboo porn site. I swear the damn thing is cursed: it chews through HDDs at a daunting rate (though half of that, I suspect, is because the old host, FDC, played dominoes with the drives before installing them). Fakku’s main drive (an SSD, on top of that) blew up a month ago. Then a couple of weeks later the replacement drive they gave us also blew up. Naturally I was fucking through with their bullshit and migrated everything over to 100tb.
Instead of sticking with the monolithic server that does everything (8 cores and 16GB of RAM? fml), I decided it would be more cost-effective to buy the shittiest servers they offer (4 cores/8GB) and balance the load between them, then use gmirror to software RAID-1 the drives, because restoring from backups takes bloody ages (separate datacenter, etc).
GUESS I FORGOT GMIRROR WAS A PIECE OF SHIT.
To set up gmirror, you basically just load a kernel module, create an array, then add the drives to the array. It’s kind of nice because you can create an array from an already-in-use drive, then add another drive to the array and it’ll work fine. This is helpful because sysinstall, FreeBSD’s installer, is an archaic piece of shit that only really supports plain vanilla installs on a single drive.
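For the record, the whole dance is only a handful of commands. Here’s a rough sketch that just prints them rather than running anything. The names gm0, ad4 and ad6 are placeholders I made up for illustration, and round-robin is only one of the balance algorithms gmirror offers, so adjust for your own box.

```shell
#!/bin/sh
# Sketch only: prints the gmirror setup commands instead of running them.
# gm0, ad4 and ad6 are hypothetical names, not the real devices.
mirror_setup_plan() {
    name=$1       # name of the mirror, e.g. gm0
    primary=$2    # disk that already holds the system
    secondary=$3  # empty disk to be added to the mirror

    # Load the GEOM mirror kernel module now...
    echo "gmirror load"
    # ...and make the loader pull it in on every boot.
    echo "echo 'geom_mirror_load=\"YES\"' >> /boot/loader.conf"
    # Create the array on top of the disk that is already in use.
    echo "gmirror label -v -b round-robin $name /dev/$primary"
    # Add the second disk; gmirror starts syncing it.
    echo "gmirror insert $name /dev/$secondary"
    # Check how the sync is going.
    echo "gmirror status"
}

mirror_setup_plan gm0 ad4 ad6
```

On a disk that’s already mounted you will most likely also have to point /etc/fstab at the /dev/mirror/gm0 devices before rebooting, so check the Handbook for your release before trusting any of this.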
So I do all that jazz on the two machines and reboot them. One of them comes back up, the other one doesn’t. Fuck. So I read through the documentation, check the hardware specs, etc. The troubleshooting section has an “oh, if it doesn’t work, try compiling the kernel module in statically” tip. Guess what didn’t work. After dicking with the damn thing for a couple of hours I gave up on the second machine and just decided to run it without RAID.
Naturally, you’d assume having a RAID-1 setup would drastically increase your parallel read performance (and given we’re serving pornography, this is a pretty important figure). HAHA WELL YOU’D BE WRONG. Running a RAID-1 setup with gmirror does absolutely nothing to improve read throughput, which is fucking ridiculous.
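If you want to see how gmirror claims to be spreading reads across the disks, the balance algorithm shows up in the output of gmirror list. A small sketch for pulling it out, assuming the array is called gm0 (a made-up name) and that the output has a "Balance:" line like it does on the boxes I’ve seen. No promise that switching algorithms changes anything.

```shell
#!/bin/sh
# Sketch: read `gmirror list` output on stdin and print the balance algorithm.
# Assumes the output contains a line of the form "Balance: <algorithm>".
balance_of() {
    awk -F': ' '/^[[:space:]]*Balance:/ { print $2; exit }'
}

# Usage on a real machine (gm0 is a hypothetical mirror name):
#   gmirror list gm0 | balance_of
# Switching to another algorithm (load, prefer, round-robin or split):
#   gmirror configure -b load gm0
```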
But the fun doesn’t stop there.
A week into running the site on this setup, the RAID’d application server starts crashing randomly. It’s a soft crash: the machine is still running, you can ping it, you can do the first part of an SSH handshake, but anything involving disk I/O (validating SSH keys, requesting masturbation materials, etc) doesn’t work. Hard reboots don’t fix it; the data center tech has to go do something to it to power it back up (and they never tell you fucking anything). And there’s nothing in any fucking log about why it’s crashing. smartctl says the disks are peachy.
Whenever it’s brought back up, gmirror decides “SHIT A DRIVE IS DEGRADED MUST REBUILD THE ARRAY”. I’m not sure what the fuck it’s fucking doing but it’s probably related to the problem. God help you if the power cuts or anything bad happens while it’s rebuilding the array, too. I found the following advice on the FreeBSD mailing lists:
1) turn off gmirror
2) clear gmirror header on both providers
3) run fsck the other drive (not ad6, but the other used on mirror).
4) pray
5) after fsck will end it successfully (it should), create gmirror with the disk you checked
gmirror label gmirror-name /dev/thedisk
6) reboot and start the system. should go well.
7) after system is running and not too much needing disk I/O, do
gmirror insert gmirror-name /dev/ad6
8) pray again, but with much less fear.
9) if gmirror will finish rebuild, all right.
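Spelled out as commands, my reading of those steps looks roughly like the sketch below. It only prints the commands. gm0, ad4 and ad4s1a are names I invented for the example (only ad6 comes from the quoted advice), and using gmirror stop for “turn off gmirror” is my guess at what step 1 means. Steps 4 and 8 are left as an exercise for the reader.

```shell
#!/bin/sh
# Sketch only: prints the recovery commands instead of running them.
# gm0, ad4 and ad4s1a are hypothetical; ad6 is the disk named in the advice.
recovery_plan() {
    name=$1     # mirror name, e.g. gm0
    good=$2     # the disk you trust (the "other drive")
    suspect=$3  # the disk to re-add later (ad6 in the advice)
    fs=$4       # filesystem on the good disk to check

    echo "gmirror stop $name"                       # 1) turn off gmirror (assumed meaning)
    echo "gmirror clear /dev/$good /dev/$suspect"   # 2) clear metadata on both providers
    echo "fsck -y /dev/$fs"                         # 3) fsck the other drive
    echo "gmirror label $name /dev/$good"           # 5) recreate the mirror on the checked disk
    echo "shutdown -r now"                          # 6) reboot
    echo "gmirror insert $name /dev/$suspect"       # 7) re-add ad6 once I/O is quiet
    echo "gmirror status"                           # 9) watch the rebuild finish
}

recovery_plan gm0 ad4 ad6 ad4s1a
```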
Yeah… no thanks.
Fuck gmirror.
Tagged with: fakku, freebsd, fucking bullshit, gmirror, prayer solves all problems
