Linux Software RAID

The Hunter

I'm currently looking at building a low cost home file server that will be used mostly for media storage and a few backups of documents etc (which will be backed up elsewhere as well). In other words it's not critical data. I'm looking for flexibility above all, and some reliability is secondary.

I'm probably going to be starting with 3x 320GB Seagate 7200.10's (for data, with a smaller disk just for the OS), and I'll be expanding storage as I go. For flexibility's sake, I want to have it in one big partition for the data. I have a few questions:
1) Is it worth it to run RAID5? The reliability seems nice, but with only 3 drives, the chance of a drive failure doesn't seem to be significantly greater than it would be running them in RAID5.
2) If I go the raid route, is EVMS or mdadm better? I like the look of the flexibility of EVMS, but in this case I don't know if I need it. Keep in mind I'll be looking to expand the array at some point (from what I've read, mdadm can do this with recent versions, but I don't see much documentation on it)
3) If I go without RAID, I'd use LVM to create one big volume out of the disks. Is this increasing the risk of a failure? And if one disk in a volume group fails, is the data on another disk lost? Also, is the config data for LVM stored on the disks, or somewhere else? I.e. can I take the disks out, put them in another system and keep the volume without much hassle?

Thanks for your help.
 
1) Is it worth it to run RAID5? The reliability seems nice, but with only 3 drives, the chance of a drive failure doesn't seem to be significantly greater than it would be running them in RAID5.
I'd say it is. I don't think you can migrate from raid 0 to raid 5 with LSR, so whatever you choose now you might be stuck with. The performance isn't a big issue for media stuff, because of its sequential nature and low data rate.
2) If I go the raid route, is EVMS or mdadm better? I like the look of the flexibility of EVMS, but in this case I don't know if I need it. Keep in mind I'll be looking to expand the array at some point (from what I've read, mdadm can do this with recent versions, but I don't see much documentation on it)
EVMS is certainly simpler than mdadm. However, I've had rather poor results with EVMS' speed as compared to mdadm. It might just be my setup, but I'm not sure.
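For what it's worth, recent versions of mdadm can grow a raid 5 onto an additional disk. Something like the following is the general shape of it (the device names are placeholders and I'm going from memory, so check the man page before trusting it; the resize step assumes ext3):

# initial 3-disk raid 5 (sdb/sdc/sdd are whatever your data disks actually are)
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
# later: add a fourth disk and reshape the array onto it
mdadm --add /dev/md0 /dev/sde1
mdadm --grow /dev/md0 --raid-devices=4
# watch the reshape, then grow the filesystem once it finishes
cat /proc/mdstat
resize2fs /dev/md0

The reshape takes a long time on big disks, but the array stays usable while it runs.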
3) If I go without RAID, I'd use LVM to create one big volume out of the disks. Is this increasing the risk of a failure? And if one disk in a volume group fails, is the data on another disk lost? Also, is the config data for LVM stored on the disks, or somewhere else? I.e. can I take the disks out, put them in another system and keep the volume without much hassle?
I don't know for sure about LVM, but LSR keeps its metadata on-disk, and deals just fine with moving disks from one kernel version to another, one machine to another, one controller to another, and so forth. You don't even have to get the disks in the right order; if they're mixed up, it re-orders them once at boot and Just Works. LVM may or may not have the same capability.
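For what it's worth, reassembling after a move is normally just a scan of the on-disk metadata; roughly like this (device names are placeholders, and the LVM half assumes LVM2):

mdadm --examine --scan    # list the arrays described by the on-disk superblocks
mdadm --assemble --scan   # assemble whatever it finds
vgscan                    # LVM keeps its metadata on-disk too, so scan for volume groups
vgchange -ay              # activate them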
 
The speeds under EVMS are definitely slower; search around for some benchmarks. It is massively more convenient when you want to change stuff around, though it does add complexity. I haven't really done much with EVMS; I played with it for a few days, but I generally just stick with straight md.

The reliability of the whole system is great. I have had several disk failures that ended happily on a couple of different systems. Like unhappy_mage says, you can move the disks to different ports, controllers, computers, etc. and the raid sets will be autodetected.

Be careful with RAID-5: a common scenario is losing a disk and then finding that some of the other disks have errors that prevent full recovery. Make sure you check the array for errors regularly. This isn't the best link, but it has the command to run. Sorry I couldn't find anything better in my quick search, but it is probably in most Linux raid howtos. Why RAID-5 is bad.
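For reference, the check is just a write to sysfs; md then reads every sector in the array and repairs what it can from parity. Something like this in a monthly cron job is the usual approach (md0 is a placeholder for your array):

echo check > /sys/block/md0/md/sync_action   # start a background scrub of the whole array
cat /proc/mdstat                             # shows progress while the check runs
cat /sys/block/md0/md/mismatch_cnt           # non-zero afterwards means it found inconsistencies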
 
For the home user, this link is highly misleading. They're talking about databases (and database loads!), and most of their discussion is only relevant to hardware several years old. Take, for example, this:
Note that the recently popular IDE/ATA drives do not (TMK) include bad sector remapping in their hardware so garbage is returned that much sooner.
This is currently completely wrong. SATA and ATA drives support remapping sectors just fine, thank you very much. As another example:
A RAID 5 array is resilient to a single disk outage, but I/O performance for the array degrades brutally during the outage.
...
By contrast, RAID 1 writes actually get faster during a disk outage.
Wait... what? When a disk fails, the system gets faster, so it's better?
For example, a disk drive with an MTTF of 200,000 hours can be expected to fail only once every 23 years.
He apparently doesn't understand how this metric is derived. The manufacturer takes a thousand drives and runs them for a week or a month; that's 168k or 672k power-on hours. So if three drives fail during the month-long run, that's roughly a 200k-hour MTTF right there. But the failure curve of drives tends to go up as the drives age. Expecting 23 years of life out of a disk is unreasonable.
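To put rough numbers on it (the sample size and test length here are made up for illustration):

1,000 drives x 672 hours (one month)  = 672,000 drive-hours
3 failures in that window             -> MTTF of about 672,000 / 3 = 224,000 hours
224,000 hours / 8,760 hours per year  = about 25 years per drive, on paper

That division only means something if the failure rate stays flat the whole time, which it doesn't.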

Don't get me wrong; for databases they still have some really good points. Small writes to raid 5 will quickly destroy performance: two reads plus two writes (old data and old parity in, new data and new parity out) instead of one write is a huge hit, and even with a gob of cache you can't avoid that all the time. But for home users, it's not relevant. Uptime isn't a huge concern - if your movie server goes down 5% of the time that's okay, you just have to dig up the DVD. Performance isn't a big issue, either; for the most part you're watching movies at less than 5 megabytes per second (laugh) or listening to music. Both are read-only, sequential, and easy to do with raid of any sort, or individual disks, or almost anything.

In short, businesses hold their systems to a different standard than home users, and that's okay. But don't recommend business-class systems to home users because of the (valid) business concerns presented by DBAs.

I dunno. If you want to use raid 5, it's probably the simplest way of getting redundancy in your data collection. You'll still want separate backups if your data is important, but it'll protect you from a single disk failure. Even if uptime isn't a concern, you should still use some drives as backup volumes.
 
I added that link to give some extra info on why scanning for errors is important. I probably should have introduced it as something a little less negative. Although I would say RAID-1 is the simplest way to get redundancy. But it doesn't work out so well with 3 disks, does it?

The Hunter, please don't take my comments to mean that you shouldn't use raid5. Just choose whatever matches your risk/inconvenience tolerance.


Also, I just noticed your question about using LVM to make a JBOD array and what happens when one fails. You will lose a big chunk of your filesystem. Even though the disks are not striped, the problem you will have is that the filesystem could be fragmented enough that a large number of files have portions stored in the missing area. Having multiple disks that are not part of any array is safer from this standpoint. Redundancy is better, of course.
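If you do end up going the plain LVM route despite that, the setup is short; roughly like this (the group and volume names, device names, and the ext3 choice are all placeholders):

pvcreate /dev/sdb1 /dev/sdc1 /dev/sdd1
vgcreate media /dev/sdb1 /dev/sdc1 /dev/sdd1
lvcreate -l 100%FREE -n storage media
mkfs.ext3 /dev/media/storage
# later, to add a disk:
pvcreate /dev/sde1
vgextend media /dev/sde1
lvextend -l +100%FREE /dev/media/storage
resize2fs /dev/media/storage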
 