RAID setup for SMB file server

Silent.Sin

I'm looking to put together a cheap whitebox NAS server for the office that needs to have 2TB+ capacity. I want to use some form of RAID that is both speedy and has redundancy, so I'm thinking either RAID5 or RAID0+1. This box is also to be used as an NFS datastore for an ESXi server, so individual drive speed as well as having multiple spindles is going to be a major focus. There likely won't be more than 4 VMs running at a time, but some of those VMs will be hosting I/O-intensive db-type applications. I was thinking about using a cheapie CPU+mobo combo and using software RAID on whatever mobo chipset I decided to get, but when I started looking at the options and then user experiences my head started spinning. So here are some of my questions:

1. Should I even bother trying to do this with software RAID on the chipset?
2. If I can use the chipset instead of an add-on card, what platform generally performs better: Intel or AMD? What chipsets specifically? Recommendations on a <$150 mobo?
3. If I need to go add-on card, can I get away with using a cheaper solution that is <$100? (examples). This isn't meant to be enterprise-class hardware with 512MB of onboard cache; I just need it to work reasonably well and with some decent reliability. This will all be backed up off-site, so if the whole array crashes we won't be doomed.
4. Once I get the RAID support figured out, what should my array consist of, i.e. how many disks? If I go with RAID5 I've heard odd numbers of disks work best (3 or 5). Any truth to that?

If any of you guys have actually built something along these lines I'd love to hear what parts you used and how well it worked out. Thanks!
 
Will you be doing software RAID in a non-Windows OS, or will you be using the Windows-only (Fake)RAID drivers?

If you run ESXi server, do the VMs have direct access to the physical disks or is all I/O virtualized? I would give the database VMs their own disks, not shared with the others.

RAID5 on Windows means you need an expensive IOP-based card. The onboard stuff is useful for RAID0, RAID1 and combinations of those, but not RAID5, and especially not with the terrible-quality drivers from AMD/Promise/JMicron/Silicon Image/nVidia/ALi.
 
Sounds like a good candidate for OpenSolaris/FreeBSD/FreeNAS, 8GB of RAM, and RAIDZ
 
Will you be doing software RAID in a non-Windows OS, or will you be using the Windows-only (Fake)RAID drivers?

Unfortunately this will be Windows; the off-site backup client that needs to run ON this server only runs on Windows, so that is the main limiting factor. Otherwise I'd probably try to go for a FreeNAS solution.

If you run ESXi server, do the VMs have direct access to the physical disks or is all I/O virtualized? I would give the database VMs their own disks, not shared with the others.

All I/O is virtualized. There are two lightweight VMs running on the ESXi box itself atm. I would like to bring consolidated data storage to our network, as opposed to only adding space to the ESXi box (or any other individual server), for future expansion purposes; that's why I want to move to using the NFS datastore.

I had also thought about giving any database VMs their own smaller physical disk to use. Would having multiple RAID1 arrays, 1 for each db VM, impact performance in a noticeable way or would I be better off just taking the risk of using a single drive for each?

RAID5 on Windows means you need an expensive IOP-based card. The onboard stuff is useful for RAID0, RAID1 and combinations of those, but not RAID5, and especially not with the terrible-quality drivers from AMD/Promise/JMicron/Silicon Image/nVidia/ALi.

Is this personal preference due to shoddy software or an actual limitation of Windows? I thought at least the server OSes could handle RAID5 themselves, and even old XP with some hackery. I'd imagine I might need more than my grandma's Celeron to run with that, but a beefy CPU is still cheaper than a full-featured RAID card. The more I read about people's speed issues with RAID5 outside of certain OSes, the more I think "losing" more usable space to RAID0+1 with a 2x2 array is the better option here and might wind up cheaper in the long run without the need for a good IOP RAID5 card.
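
Just to put rough numbers on the space trade-off I'm weighing, here's a quick back-of-the-envelope Python sketch (assuming identical 1TB drives; purely illustrative, not from any vendor spec):

DRIVE_TB = 1.0

def raid5_usable(n_drives):
    return (n_drives - 1) * DRIVE_TB      # RAID5 gives up one drive's worth to parity

def raid01_usable(n_drives):
    return (n_drives // 2) * DRIVE_TB     # RAID0+1 mirrors everything, so half is copies

for n in (3, 4, 5):
    print(f"RAID5 with {n} drives: {raid5_usable(n):.0f} TB usable")
print(f"RAID0+1 with 4 drives (2x2): {raid01_usable(4):.0f} TB usable")

So a 2x2 RAID0+1 hits the 2TB target with four drives where RAID5 would get there with three; the extra drive is what I'd be "paying" to skip the RAID5 write/driver headaches.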
 
I had also thought about giving any database VMs their own smaller physical disk to use. Would having multiple RAID1 arrays, 1 for each db VM, impact performance in a noticeable way or would I be better off just taking the risk of using a single drive for each?
Just make sure your database VM isn't hampered by I/O from the other VMs. A dedicated RAID0+1 or RAID1 for the database VM and another RAID1 for the rest makes sense to me. You don't need that much performance on the 'light' VMs, right?
Is this personal preference due to shoddy software or an actual limitation of Windows? I thought at least the server OSes could handle RAID5 themselves, and even old XP with some hackery.
Windows XP has a 'poor man's' RAID5 that does not do write combining. The result is that virtually all writes will be read-modify-write, which is very slow. It could be that newer versions of Windows have better RAID5 support; I'm sure many Windows admins here can inform you about that.

Linux, FreeBSD and OpenSolaris have much more powerful software RAID engines. Using RAID0 or RAID1 should be fine though, as these do not need write combining and can perform well with a simple engine. I managed to get 500MB/s with software 8xRAID0 in XP pretty easily, though I didn't test random access there.
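
To put a rough number on that read-modify-write penalty, a small illustrative Python sketch of the disk operations a naive RAID5 engine pays for a small write versus what a write-combining engine pays for a full-stripe write (operation counts only, not a benchmark):

def partial_stripe_write_ios():
    # read old data chunk + read old parity, then write new data + new parity
    return 2 + 2

def full_stripe_write_ios(n_disks):
    # with write combining a full stripe goes out in one pass:
    # n-1 new data chunks plus one freshly computed parity chunk, no reads needed
    return n_disks

print("small write, no combining:", partial_stripe_write_ios(), "disk ops")
print("full-stripe write on 4 disks:", full_stripe_write_ios(4), "disk ops")

That four-operations-per-small-write behaviour is what makes the XP-style engine feel so slow on anything write-heavy.
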
I'd imagine I might need more than my grandma's Celeron to run with that, but a beefy CPU is still cheaper than a full-featured RAID card.
Actually, high-performance RAID5 engines need a lot of memory bandwidth due to I/O combining and splits, not raw processor speed. The XOR itself is memory-bottlenecked, not CPU-bottlenecked, and even Celerons should get you at least 2GB/s XOR speeds.

So a simple RAID5 driver that performs poorly, like the one in XP or the nVidia MediaShield RAID5 driver which I also tested, will have lower CPU utilization than a modern, advanced RAID5 driver that does proper write combining. The truth is, you want the extra CPU usage, because a CPU idling on I/O bottlenecks is not preferable to a loaded CPU with good I/O throughput.

If you have a *good* software driver; RAID5 can be very fast even on modest hardware. You do need a good interface though; do not use PCI or PCI-X.
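
For what it's worth, the parity itself is nothing more than a bytewise XOR across the data chunks of a stripe, which is why it streams through memory rather than stressing the CPU core. A minimal Python sketch (chunk size and disk count are arbitrary, just for illustration):

import os
from functools import reduce

CHUNK = 128 * 1024                                    # e.g. a 128KiB stripe unit
data_chunks = [os.urandom(CHUNK) for _ in range(3)]   # three data disks' worth

def xor_chunks(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

parity = reduce(xor_chunks, data_chunks)              # what goes on the parity disk

# Lose any one data chunk and it can be rebuilt from the parity plus the rest:
rebuilt = reduce(xor_chunks, [parity] + data_chunks[1:])
assert rebuilt == data_chunks[0]

A real engine does this with wide SIMD loads and stores, so the limit is how fast the chunks can be pulled through the memory bus.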

The more I read about people's speed issues with RAID5 outside of certain OSes, the more I think "losing" more usable space to RAID0+1 with a 2x2 array is the better option here and might wind up cheaper in the long run without the need for a good IOP RAID5 card.
Very true. For database write access, a RAID0+1 would perform better than a RAID5.
But I would also think about dedicating disks/arrays to your I/O-intensive VM.
 
I would use eight 1TB drives for your virtualization system. I would set up all 8 disks as Windows dynamic disks and configure 4 RAID1 arrays in Windows software; this will not slow the drives down much at all. Then set each VM's files to reside on its own array. This way there will be no contention for disk resources and all systems will run at full speed. Windows Disk Manager will be great for managing the arrays. Any benefits that a RAID10 or RAID5 setup would give you would be dwarfed by the benefits of giving each VM a separate array, I can assure you. If your database is very heavy on I/O, then consider giving that VM a RAID10. You can use cheap Silicon Image PCIe 4-port SATA cards for this, as your mobo won't have enough ports.
 
Just make sure your database VM isn't hampered by I/O from the other VMs. A dedicated RAID0+1 or RAID1 for the database VM and another RAID1 for the rest makes sense to me. You don't need that much performance on the 'light' VMs, right?

Correct, I was just worried about the thought of running any light VMs concurrently on RAID1, as well as some future additional VMs, without some sort of striping. I would imagine there is a certain point, even with light load, where you either need a second physical array or more channels in the array in order to maintain usability if they are going to be in use at the same time.

Windows XP has a 'poor man's' RAID5 that does not do write combining. The result is that virtually all writes will be read-modify-write, which is very slow. It could be that newer versions of Windows have better RAID5 support; I'm sure many Windows admins here can inform you about that...

I figured XP RAID was gimped in some fashion; I just hadn't dug deep enough yet. Wish I could use a FreeBSD solution... maybe I still have some time to take persuasive action. Thanks for the explanations on where the bottlenecks will lie.

Very true. For database write access, a RAID0+1 would perform better than a RAID5.
But I would also think about dedicating disks/arrays to your I/O-intensive VM.

I think that settles it then: I'm gonna go for a 2x2 RAID0+1 array for the NFS datastore and a separate drive for the host OS. In all honesty, even the VM I plan on using as the "heavy" server right now won't be going balls-out with IOPS, so I think that should be enough. If I find that sharing the array with the less-utilized VMs becomes an issue, I can always just add a new one to alleviate congestion.

Here's what I just picked out at the 'egg:
Mobo: MSI 785GTM-E45 mATX w/ RAID 0/1/0+1/JBOD
CPU: Athlon II X2 2.8GHz 65W
RAM: Crucial 4GB DDR2-1066
Storage:
-Host: 160GB 7200.12 Barracuda
-RAID: 5x Western Digital Caviar Blue 1TB (1 for emergency)
Case: Enermax Pandora 185 w/ 7x3.5" internal bays + 400W PSU
+ Other Misc parts

Total comes out to be $850 w/ shipping. Not too awful for a 2TB NAS with bays to spare. The only thing I wasn't sure on is the 4GB of RAM. I would think bandwidth > capacity in this situation; could I get away with 2GB to knock a few bucks off, as long as I keep the host OS nice and lean? Darn DDR2 prices keep climbing...
 
The only thing I wasn't sure on is the 4GB of RAM. I would think bandwidth > capacity in this situation; could I get away with 2GB to knock a few bucks off, as long as I keep the host OS nice and lean? Darn DDR2 prices keep climbing...

2GB of RAM will be enough. Use the savings to get a better-quality PSU, since the PSU included with that Enermax case is not of good enough quality to power six drives and that CPU. I recommend this PSU:
$50 - Corsair 400CX 400W PSU

Case wise, I recommend this better looking case:
$50 - Cooler Master Elite 335 RC-335-KKN1-GP ATX Case

Mobo wise, I recommend these mobos instead since they use DDR3 RAM:
$67 - BIOSTAR A785G3 AM3 AMD 785G Micro ATX AMD Motherboard
$80 - BIOSTAR TA790GXB3 AM3 AMD 790GX ATX AMD Motherboard

RAM wise, I recommend this RAM:
$55 - Patriot 2GB DDR3 1333 RAM

Oh and the price of that 160GB drive is pretty bad at $45 shipped considering that for $10 more, you can get this significantly faster drive:
$56 - Western Digital WD5000AAKS 500GB 7200RPM SATA 3.0Gb/s Hard Drive
 
Personally I would go with at least 4GB. It would help the database, since more RAM means more file cache, which can significantly improve performance: often only a part of the database is actively used, and having that part cached can boost overall performance considerably.

RAM quantity upgrades are the best thing you can do to remedy database bottlenecks. You can also start with 2GB and add another 2GB later; in that case I would get 1x2GB, not 2x1GB.
 
1. Should I even bother trying to do this with software RAID on the chipset?

Yes.

2. If I can use the chipset instead of an add-on card, what platform generally performs better: Intel or AMD? What chipsets specifically? Recommendations on a <$150 mobo?

Search the web for some benchmarks of onboard RAID controllers.

4. Once I get the RAID support figured out, what should my array consist of, i.e. how many disks? If I go with RAID5 I've heard odd numbers of disks work best (3 or 5). Any truth to that?

There is no truth in that statement about odd numbers of disks working better in RAID5; most likely they are talking about RAID5 requiring a minimum of 3 disks, or about hot spares. Performance gains are seen when there are more disks in the array, that is certain.

My question is: when you say "This box is also to be used as an NFS datastore for an ESXi server", do you mean your intention is to create a single large NFS share and have your team access it for file transfers as well as host your VM instances?
 
There is no truth in that statement about odd numbers of disks working better in RAID5; most likely they are talking about RAID5 requiring a minimum of 3 disks, or about hot spares. Performance gains are seen when there are more disks in the array, that is certain.

There were various articles/posts I found through Google describing quirks with the stripe size and other such stuff that was over my head, but they kept harping on the fact that the way to "fix" this was to use an odd number of disks and format in a very specific manner based on the # of disks in the array :confused:. This mainly had to do with NV chipsets, but I figured most driver-based solutions would be just as "dumb" and might have the same problems. Example: http://forums.storagereview.com/ind...uperb-write-speeds-with-nforce-onboard-raid5/

My question is: when you say "This box is also to be used as an NFS datastore for an ESXi server", do you mean your intention is to create a single large NFS share and have your team access it for file transfers as well as host your VM instances?

Sorry, I didn't explain the usage too well. What we need is a small amount of networked storage (probably CIFS, since we're a Windows shop) that will be used for document repositories as well as an SVN backup to be run over the weekend. That would require <100GB total, so it's mostly insignificant. I was just going to put that on the host drive, especially since, as Danny pointed out, the price difference between 160GB and 500GB these days is enough to make you just go for the 500, so there's plenty of space there. I don't imagine too much concurrent usage hammering the lone host disk since it will only really be put to the test on the weekends.

The NFS datastore will be a separate array of disks and hold our library of VMs to be used as necessary by the ESXi server and administered by whoever is manning vSphere at the time. These will mostly be demo systems of software products that we can simply power down and up to show off as needed via our VPN (another item on my list of things to tackle... IT work is never done). Most of the products we sell will be deployed in distributed environments, so I will need at least one VM to act as the DB server that the other VMs can hit in order to better mimic and test what might be seen in the real world. There's a good chance we will need other DB VMs to host different flavors of servers. There are some space constraints in our office, which is the main reason I don't want the DB box(es) to be separate physical machines. We have a very limited playground to work with at the moment server-wise, which I'm attempting to turn around by flying under the budget radar so I can put my system-building geek hat on :cool:.

I've already done some preliminary tests with what I could scrape together: an NFS datastore for a single WinXP VM across a 100Mbps network, with the NFS served from a Win2003 box using SFU and sharing the same physical disk as the host OS. It worked out much better than I thought it would, even when the Win2003 host got a little busy; I saw nearly native performance when accessing the WinXP VM's console from my laptop. I was a bit worried by some of the horror stories I had heard about the NFS provided by SFU, but it's been stable as anything for me so far ::knocks on wood::. When I get this new NAS system in place it will come along with some infrastructure overhaul to GigE for our servers, so performance should be even better, especially with the datastore no longer anchored to the host drive.
 
There is no truth in that statement about odd numbers of disks working better in RAID5; most likely they are talking about RAID5 requiring a minimum of 3 disks, or about hot spares. Performance gains are seen when there are more disks in the array, that is certain.
With Windows Vista and Windows 7 creating partitions at a 1024KiB offset, alignment is fixed for RAID0 and RAID0+1 combinations, but not for all RAID5 configurations!

With 3-disk RAID5 and 128KiB stripe, the full stripe block is (3-1) * 128KiB = 256KiB large. This fits nicely with the 1024KiB offset.
However, with 4 disks we get (4-1) * 128KiB = 384KiB. Now you have partial misalignment with a 1024KiB offset.

So there's definitely truth in using odd numbers of disks for RAID5 and even numbers for RAID6. You can correct the alignment yourself of course, but it's only necessary for RAID4/5/6 arrays.
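
If you want to sanity-check that alignment math for other disk counts, here's a small Python sketch assuming the same 128KiB stripe unit and the 1024KiB Vista/Win7 partition offset from the example above:

OFFSET_KIB = 1024        # default Vista/Win7 partition offset
STRIPE_UNIT_KIB = 128    # per-disk stripe unit from the example above

def full_stripe_kib(n_disks, parity_disks=1):
    # RAID5 uses one disk's worth of parity per stripe; pass parity_disks=2 for RAID6
    return (n_disks - parity_disks) * STRIPE_UNIT_KIB

for n in range(3, 9):
    stripe = full_stripe_kib(n)
    status = "aligned" if OFFSET_KIB % stripe == 0 else "misaligned"
    print(f"RAID5 with {n} disks: full stripe {stripe}KiB, offset {status}")

With a 128KiB stripe unit this marks 3 and 5 disks as aligned and 4, 6, 7 and 8 as misaligned, which matches the "odd number of disks" folklore for the common small arrays.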
 
What we need is a small amount of networked storage (probably CIFS, since we're a Windows shop) that will be used for document repositories as well as an SVN backup to be run over the weekend. That would require <100GB total, so it's mostly insignificant. I was just going to put that on the host drive

....

The NFS datastore will be a separate array of disks and hold our library of VMs to be used as necessary by the ESXi server and administered by whoever is manning vSphere at the time.

Sounds like a plan. :D
 