ZFS and ESXi All in One Build

MentatBOB

Hey guys, I've been lurking on the forums for the past month or so while I've been researching ZFS NAS builds for home use and I was hoping I could get some feedback and advice before pulling the trigger and ordering stuff. I've done a lot of reading but I am getting to the point where I have reached the limits of my own personal experience and knowledge.

The main goal for this project is to come up with a scalable and fault-tolerant solution that can be used to consolidate data that I have spread across many different drives on a Win7 desktop PC, which also runs VMware Server for 3 small virtual machines. I do not plan on using encryption, deduplication, or compression, but I can see the possibility of using snapshots eventually. This NAS will be multipurpose, so I already know that I will be struggling to balance cost, capacity, and performance. My budget tops out at $2500 USD for a minimum initial capacity of 8TB usable.

Currently the bulk of my data is in a 4TB CIFS share that I use for storing media and, embarrassingly enough, it consists of 3 USB drives passed through to a virtual machine, concatenated into a single logical volume, and then shared out via Samba. I also have around 700GB of data on 2 other unprotected SATA drives that I would like to move to the NAS as well. I would classify 300GB of this as unstructured data that can be moved to a CIFS share without a problem. The other 400GB is application and game installations that would be moved to an iSCSI LUN or split into 2 separate iSCSI LUNs. If I can get at least 8TB of usable storage, that should give me enough to consolidate everything plus plenty of room to grow.

What I am proposing is an ESXi All-in-One box with a Solaris 11 Express virtual machine running as a NAS appliance. The NAS virtual machine would share out an NFS datastore for ESXi, a few CIFS/NFS shares for media and other unstructured data, and an iSCSI LUN to the Win7 desktop to replace the multiple aging SATA drives.
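Roughly, the exports on the Solaris side would look something like this (the pool and dataset names are just placeholders, not a final layout):

Code:
# Hypothetical layout sketch -- pool/dataset names are placeholders only.

# NFS datastore for ESXi
zfs create tank/vmstore
zfs set sharenfs=on tank/vmstore

# CIFS share for media and other unstructured data (Solaris kernel CIFS)
svcadm enable -r smb/server
zfs create tank/media
zfs set sharesmb=on tank/media

# iSCSI LUN for the Win7 desktop (zvol exported via COMSTAR)
svcadm enable stmf
svcadm enable -r svc:/network/iscsi/target:default
zfs create -V 400G tank/games
sbdadm create-lu /dev/zvol/rdsk/tank/games
stmfadm add-view <lu-guid-from-sbdadm>
itadm create-target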

So far my base build looks like this and comes in just under $2200 USD.
  1. SUPERMICRO MBD-X8SIL-F-O
  2. Intel Xeon X3430
  3. 16G Unbuffered DDR3 1333 Memory (4x4)
  4. Intel SASUC8I PCI-Express x8 SATA / SAS Controller Card
  5. OCZ Agility 2 50GB SATA II MLC SSD (L2ARC for faster reads)
  6. NORCO RPC-4220 4U Rackmount Server Chassis
  7. CORSAIR Builder Series CMPSU-500CX 500W Power Supply
  8. 8 x Samsung F4 HD204UI 2TB drives, or Hitachi 5K3000 2TB drives for ZFS
  9. NETGEAR ProSafe GS108T-200NAS (LACP, Jumbo Frames)
  10. Reusing existing 40GB WD Raptors for the ESXi installation.

From everything that I have been reading, this is a pretty standard configuration and should work well for what I want it to do. The drive configuration, however, is where I am having a difficult time making a decision, and I keep going back and forth between different options.

  1. 1 x 10TB raidz1 vdev (5 x 2TB drives), 8TB usable
  2. 2 x 8TB raidz1 vdevs (4 x 2TB drives each), 12TB usable
  3. 4 x 4TB raidz1 vdevs (4 x 1TB drives each), 12TB usable

Right now I am leaning heavily towards option 2, since it exceeds my minimum capacity requirement and fits in very nicely with the SASUC8I controller and the NORCO RPC-4220 chassis (8 drives fill the SASUC8I and two backplanes of the chassis). As far as performance is concerned, it really comes down to whether I should expect iSCSI to perform well enough to replace the local SATA drives on the Win7 desktop. If that is not a realistic expectation, then performance becomes less of a concern, since anything is going to give me better performance and fault tolerance than what is currently in place.
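For reference, option 2 plus the SSD read cache would be created roughly like this (the device names are placeholders; the real c#t#d# names would come from the actual controllers):

Code:
# Option 2 sketch: two 4-disk raidz1 vdevs striped into one pool,
# plus the 50GB SSD as L2ARC. Device names are placeholders.
zpool create tank \
    raidz1 c1t0d0 c1t1d0 c1t2d0 c1t3d0 \
    raidz1 c1t4d0 c1t5d0 c1t6d0 c1t7d0
zpool add tank cache c2t0d0
zpool status tank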

I ran a couple of quick CDM benchmarks against the 2 SATA drives that I would like to see replaced with iSCSI LUNs. From what I can tell, the performance of these drives is already pretty poor compared to some of the iSCSI benchmarks that I have seen in other threads, which is what gave me the idea in the first place. It would be nice to protect the drives with ZFS and have the ability to extend them as they fill up instead of having to replace them with larger drives.

I apologize for the length of the post, but I wanted to give as much info as possible about what I am attempting to accomplish. Any recommendations or thoughts on fine-tuning this build are greatly appreciated.

Code:
-----------------------------------------------------------------------
CrystalDiskMark 3.0.1 x64 (C) 2007-2010 hiyohiyo
                           Crystal Dew World : http://crystalmark.info/
-----------------------------------------------------------------------
* MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s]

           Sequential Read :    31.562 MB/s
          Sequential Write :    53.298 MB/s
         Random Read 512KB :    20.614 MB/s
        Random Write 512KB :    29.459 MB/s
    Random Read 4KB (QD=1) :     0.389 MB/s [    94.9 IOPS]
   Random Write 4KB (QD=1) :     1.385 MB/s [   338.2 IOPS]
   Random Read 4KB (QD=32) :     0.737 MB/s [   179.9 IOPS]
  Random Write 4KB (QD=32) :     1.395 MB/s [   340.6 IOPS]

  Test : 1000 MB [D: 78.8% (220.1/279.5 GB)] (x1)
  Date : 2011/03/16 22:11:35
    OS : Windows 7  [6.1 Build 7600] (x64)

Code:
-----------------------------------------------------------------------
CrystalDiskMark 3.0.1 x64 (C) 2007-2010 hiyohiyo
                           Crystal Dew World : http://crystalmark.info/
-----------------------------------------------------------------------
* MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s]

           Sequential Read :    46.579 MB/s
          Sequential Write :    45.719 MB/s
         Random Read 512KB :    27.722 MB/s
        Random Write 512KB :    31.348 MB/s
    Random Read 4KB (QD=1) :     0.515 MB/s [   125.7 IOPS]
   Random Write 4KB (QD=1) :     1.028 MB/s [   250.9 IOPS]
   Random Read 4KB (QD=32) :     1.111 MB/s [   271.1 IOPS]
  Random Write 4KB (QD=32) :     0.998 MB/s [   243.6 IOPS]

  Test : 1000 MB [F: 92.0% (274.0/298.0 GB)] (x1)
  Date : 2011/03/16 22:16:23
    OS : Windows 7  [6.1 Build 7600] (x64)
 
What VMs will be running on the hypervisor? Single processor, X3430, 16GB of RAM isn't a lot of hardware. And running a NAS/SAN and VMs on the same machine? With three VMs? I would be hesitant.

Agility 2 L2ARC? Isn't that a bit low end for a cache...and a very small size too.

I wouldn't bother using the 40GB 10K Raptors...replace with SSDs or newer drives. 40GB Raptors are like..5 years old?

I think you're trying to achieve too much with too little.

What level of performance is necessary?

Edit 1: I'm also wondering what the buffer size per port is on the Netgear switch...I think not a whole lot. That might be restrictive at higher levels of performance.
 
Some comments:

I have a similar setup for my home and test server; it should work quite well.
Give the basic storage VM 6-8 GB of RAM and 2 cores, and keep 8-10 GB for the other VMs.

About your WD Raptors for ESXi boot and the local datastore holding the Solaris VM:
no problem.

About your pools:
the optimal setup is three pools:


Pool 1: NFS datastore for your VMs
Should be optimized for speed. It does not need to be very large,
but it should be as fast as possible, so use one mirrored vdev or
multiple striped mirrored vdevs (raid-10).

Best is something like WD Raptors + an SSD cache, or a mirrored SSD-only pool.

Example: HD mirror + cache = 3 drives,
or a mirrored SSD pool.
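For example (device names are placeholders only):

Code:
# Start with one mirror + SSD cache (3 drives total):
zpool create vmpool mirror c1t0d0 c2t0d0
zpool add vmpool cache c3t0d0
# Later, stripe in a second mirror for more speed (raid-10 style):
zpool add vmpool mirror c1t1d0 c2t1d0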


Pool 2: CIFS fileserver
If you do not have multiple concurrent users or need to work with large
files directly from the server, use a pool built from one raidz(1-3) vdev.

Example: 5-6 drives in one raidz(1-2) vdev: 6-10 TB


Pool 3: backup
Use a pool built from one basic, mirrored, or raidz(1-3) vdev.
Best is on a second server/location, or use two backup pools and
rotate them regularly (export the pool, swap the disks, import the other pool).

Or do your backups on your Win7 machine to external USB drives.
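A rough sketch of the replication/rotation idea (pool, dataset, and snapshot names are examples only):

Code:
# Replicate to a backup pool with snapshots:
zfs snapshot -r tank@backup-1
zfs send -R tank@backup-1 | zfs recv -F backup/tank

# Or rotate two backup pools: export cleanly before swapping the disks out.
zpool export backup
# ...swap in the second set of backup disks, then:
zpool import backup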


About iSCSI to your Win7 box:
I would not do it. Use a simple share instead so you have file-based access
to snapshots. If you use iSCSI and want to restore a single file, you have
to clone the entire LUN and reimport it.

Mount the share as a drive letter.
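For example, on the Win7 side (server and share names are only examples):

Code:
rem Map the CIFS share to a drive letter
net use Z: \\nas\media /persistent:yes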


Gea
 
Gea, if you are using Win7, I would assume you're already using the built-in backup/restore tool, which does incremental backups of the Win7 HDs. Also, keep in mind that for backing up Win7, you cannot back up to a network share with the Home edition, only Pro or better :( Using the built-in iSCSI initiator in Win7 lets you make the network storage look like a local disk.
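If you go that route, the Win7 initiator can also be driven from the command line, roughly like this (the portal IP and target IQN below are just placeholders):

Code:
rem Win7 software iSCSI initiator from the command line
iscsicli QAddTargetPortal 192.168.1.50
iscsicli ListTargets
iscsicli QLoginTarget iqn.2010-09.org.example:target0
rem The LUN then shows up as a normal disk in Disk Management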
 
justin2net:
I agree that it sounds like I am trying to do too much with too little; however, everything that I described in my original post currently runs on my Win7 desktop, which is an older Q6700 machine with 8GB of RAM. I figure at this rate anything is an improvement :). Originally I was going to do a bare-metal NAS solution, but then decided that for a little extra money I could expand it to an ESXi All-in-One and offload all the shared storage and VMs from my Win7 desktop.

The VMs that I mentioned are really small; each one has only 512MB of RAM and around 30GB of disk. The most heavily used one is the one that shares out media, which is consumed by 3 HTPCs over wireless and Ethernet-over-power devices.

As far as the ESX boot drives go, I really only need something to land the hypervisor on and give me a large enough local datastore to host my NAS virtual machine. I already have two of the 40GB Raptors hanging around, so I figured I might as well use them. My only concern with them is how old they are and how long until they eventually fail.

My target performance is really a moving target. What I have now is 'good enough' and meets all of my current needs. It is also a house of cards ready to fall over, at which point I'll lose the majority of my data, so if I can get better performance at the same time as protecting and expanding it, then it's a win-win situation for me.

Gea:
Backups? What backups? :D Seriously though, my plan was to back up my Win7 desktop to one of the 2TB USB drives that would be freed up once data is moved off of my current monstrosity and onto the ZFS server. I have also looked into the possibility of using an online backup service for my important data and using the local backup for data that I would want to be able to restore in the event of data loss. The bulk of my data is in my media share and it just isn't realistic to back that up. On the bright side, that data is not unique and can be rebuilt in the event of a catastrophic loss.

I like the idea of having different pools for different data. The thought had crossed my mind to create a performance pool and a capacity pool, but I wasn't sure if it would really make that much of a difference in the long run, since a ZFS pool is typically limited by the speed of its slowest drive. I could cut back on the 2TB drives in my proposed build, which would free up the budget for creating a smaller performance pool.

I'll have to do a bit more research on the iSCSI topic. It sounds like performance won't be an issue from what I have been reading, but I need to decide if I want file-level or block-level access to the NAS storage. The reason I am leaning toward iSCSI is so that I can mount the device as a folder instead of a drive letter. Having the ability to do ZFS snapshots is nice, but the data on these LUNs would be fairly static. Snapshots would be awesome for my home area though, where I store all of my unique data!
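For reference, the folder-mount part should just be an NTFS mount point on the initiator side; a rough sketch (the volume number and path below are made up, so check them in diskpart first):

Code:
rem Mount the iSCSI volume into an empty NTFS folder instead of a drive letter
diskpart
DISKPART> list volume
DISKPART> select volume 3
DISKPART> assign mount="D:\Storage\Games"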

Thanks again guys for the feedback so far! I really appreciate it.
 
What VMs will be running on the hypervisor? Single processor, X3430, 16GB of RAM isn't a lot of hardware. And running a NAS/SAN and VMs on the same machine? With three VMs? I would be hesitant.

Agility 2 L2ARC? Isn't that a bit low end for a cache...and a very small size too.

So two comments: the X3440 is a bit better if you are going to load up a lot of VMs, since it supports Hyper-Threading. You can also look into 8GB registered ECC DIMMs, which run about $150 each, to reach 32GB in four slots. Using registered ECC 4GB DIMMs on the six-slot boards would let you use 24GB (this does not work with UDIMMs).

Second, a 50GB Agility 2 is actually fine for an L2ARC drive.
 
hey MentatBOB, I am looking at this solution for my own home and wondering how this build went for you. any advice and/or problems you had?

-j
 
hey MentatBOB, I am looking at this solution for my own home and wondering how this build went for you. any advice and/or problems you had?

Oddly enough, I just ordered the parts for my build this past Monday and I won't get them until the 18th since I've missed the first two attempted deliveries. I did make some changes based on a bit more research as well as the feedback that I got on this thread. All said and done the cost came in at $2450 so I was still able to hit my target budget of $2500.

Build List
4 x SAMSUNG Spinpoint F3 HD103SJ 1TB 7200 RPM SATA 3.0Gb/s 3.5" Internal Hard Drive -Bare Drive
5 x SAMSUNG Spinpoint F4 HD204UI 2TB 5400 RPM SATA 3.0Gb/s 3.5" Internal Hard Drive -Bare Drive
1 x Corsair Force F40 CSSD-F40GB2 2.5" 40GB SATA II MLC Internal Solid State Drive (SSD)
1 x NORCO RPC-4224 4U Rackmount Server Case with 24 Hot-Swappable SATA/SAS Drive Bays
1 x CORSAIR Enthusiast Series CMPSU-650TX 650W ATX12V / EPS12V
2 x Crucial 8GB (2 x 4GB) 240-Pin DDR3 SDRAM DDR3 1333 (PC3 10600) ECC Unbuffered Server Memory Model CT2KIT51272BA1339
1 x Intel PWLA8391GT 10/ 100/ 1000Mbps PCI PRO/1000 GT Desktop Adapter - OEM
1 x NETGEAR ProSafe GS108T-200NAS 10/100/1000Mbps Gigabit Smart Switch
1 x SUPERMICRO MBD-X8SIL-F-O Xeon X3400 / L3400 / Core i3 series Dual LAN Micro ATX Server Board w/ Remote Management
4 x NORCO C-SFF8087-D SFF-8087 to SFF-8087 Internal Multilane SAS Cable - OEM
1 x NORCO C-SFF8087-4S Discrete to SFF-8087 (Reverse breakout) Cable - OEM
1 x Intel Xeon X3450 Lynnfield 2.66GHz LGA 1156 95W Quad-Core Server Processor BX80605X3450
2 x Intel SASUC8I PCI-Express x8 SATA / SAS (Serial Attached SCSI) Controller Card

NewEgg had a deal on NORCO cases for 10% off, so I chose to get an RPC-4224 instead of the RPC-4220.

I decided to go with two ZFS pools as suggested. The pool for the ESX datastore will be two mirrored vdevs on the 1TB 7200 RPM drives with the 40GB SSD included as L2ARC. I went with a Corsair F40 SSD since OCZ has such a bad reputation on these forums. I could have gotten more performance out of this pool by using more striped vdevs, but I really don't want to use more than 5 drives in it. I plan to replace the SSD with a newer one that can be used for both L2ARC and ZIL, but I am still waiting for the next generation of SSDs to hit the market and settle down a bit. Until then, what I've got should suffice for home use.

The storage pool will be 5 x 2TB drives in a standard raidz1 configuration, nothing special here.

The CMPSU-500CX power supply can only push 34 amps on the 12v rail so I spent a few extra dollars and got the CMPSU-650TX which can push 52 amps on the 12v rail. The CMPSU-500CX would have been fine for my initial build but would not be enough for a fully loaded RPC-4220 let alone RPC-4224.

Last but not least, I upgraded the Intel Xeon X3430 to a X3450 since the price difference was so small.

I will undoubtedly be tweaking this here and there as I gain more hands on experience with the hardware but all in all it should give me a pretty good start.
 
wow, nearly identical to my shopping cart right now, just some hardware swaps: Samsungs --> Hitachis, LGA1156 --> LGA1155, SASUC8I --> IBM M1015, Corsair --> Seasonic X750 (similar supply, different brand, and only 750W because I got the NewEgg sale price last week)

I am very interested in what you come up with. i'm looking to marry a SAN VM and a WHS2011 VM under ESXi.

good luck on monday, can't wait to hear how the build goes!
 
It took me some time, but I got everything in and going. The power supply and one of the 2TB drives were DOA. I picked up a replacement power supply locally and sent the DOA one back for a refund, and the 2TB drive is still out in the RMA process. So for now I've been holding off on building the raidz capacity pool until that gets back.

Everything has been working really well so far, a much-needed improvement over what I previously had. The one thing that is bugging me is that my CIFS reads seem lower than I would expect. My CIFS write speeds are top notch for GigE, but I can't seem to get better than 30 MB/s reads.

CIFS over GigE from Bare-Metal Win7 Desktop Client
Code:
-----------------------------------------------------------------------
CrystalDiskMark 3.0.1 x64 (C) 2007-2010 hiyohiyo
                           Crystal Dew World : [url]http://crystalmark.info/[/url]
-----------------------------------------------------------------------
* MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s]

           Sequential Read :    40.389 MB/s
          Sequential Write :    86.545 MB/s
         Random Read 512KB :    45.502 MB/s
        Random Write 512KB :    74.247 MB/s
    Random Read 4KB (QD=1) :     6.377 MB/s [  1557.0 IOPS]
   Random Write 4KB (QD=1) :     4.725 MB/s [  1153.6 IOPS]
   Random Read 4KB (QD=32) :    69.117 MB/s [ 16874.3 IOPS]
  Random Write 4KB (QD=32) :     5.509 MB/s [  1345.0 IOPS]

  Test : 1000 MB [Z: 0.0% (0.0/1473.7 GB)] (x5)
  Date : 2011/04/26 15:40:04
    OS : Windows 7  SP1 [6.1 Build 7601] (x64)

iSCSI over GigE from Bare-Metal Win7 Desktop Client
Code:
-----------------------------------------------------------------------
CrystalDiskMark 3.0.1 x64 (C) 2007-2010 hiyohiyo
                           Crystal Dew World : http://crystalmark.info/
-----------------------------------------------------------------------
* MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s]

           Sequential Read :   101.439 MB/s
          Sequential Write :    87.630 MB/s
         Random Read 512KB :    87.481 MB/s
        Random Write 512KB :    85.523 MB/s
    Random Read 4KB (QD=1) :     4.754 MB/s [  1160.6 IOPS]
   Random Write 4KB (QD=1) :     3.299 MB/s [   805.5 IOPS]
   Random Read 4KB (QD=32) :    72.609 MB/s [ 17726.9 IOPS]
  Random Write 4KB (QD=32) :     6.286 MB/s [  1534.6 IOPS]

  Test : 1000 MB [F: 58.0% (290.0/500.0 GB)] (x5)
  Date : 2011/04/26 15:13:53
    OS : Windows 7  SP1 [6.1 Build 7601] (x64)

For fun, Virtual Win7 Desktop on the NFS datastore.
Code:
-----------------------------------------------------------------------
CrystalDiskMark 3.0.1 x64 (C) 2007-2010 hiyohiyo
                           Crystal Dew World : http://crystalmark.info/
-----------------------------------------------------------------------
* MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s]

           Sequential Read :   218.978 MB/s
          Sequential Write :   190.893 MB/s
         Random Read 512KB :   207.948 MB/s
        Random Write 512KB :   194.959 MB/s
    Random Read 4KB (QD=1) :     7.395 MB/s [  1805.5 IOPS]
   Random Write 4KB (QD=1) :     4.200 MB/s [  1025.5 IOPS]
   Random Read 4KB (QD=32) :   103.306 MB/s [ 25221.1 IOPS]
  Random Write 4KB (QD=32) :     6.902 MB/s [  1685.1 IOPS]

  Test : 1000 MB [C: 33.0% (10.5/31.9 GB)] (x5)
  Date : 2011/04/26 16:32:05
    OS : Windows 7  [6.1 Build 7600] (x64)
 
I've seen CIFS reads slower too - think it must be something protocol related. On another note, the NFS numbers look bogus - how are you getting around 200MB/sec over gig-e? Or is this all internal to the esxi host? I'm assuming yes since this is an 'all in one' thread :)
 
I've seen CIFS reads slower too - think it must be something protocol related. On another note, the NFS numbers look bogus - how are you getting around 200MB/sec over gig-e? Or is this all internal to the esxi host? I'm assuming yes since this is an 'all in one' thread :)

Yep! That last benchmark was all internal to the ESXi host. Not really useful for most purposes, but it's still fun. The CIFS read thing is irritating because I would expect reads to be faster than writes, and it seems like some people get higher read rates while others don't. What is also interesting is that I get the same read speed on the virtual Win7 desktop client, but write speeds are in the 150MB/s range, which can be attributed to being internal to the ESXi host.
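Before blaming CIFS itself, a quick iperf run between the Win7 box and the NAS VM should at least rule out the network (assuming iperf is available on the Solaris side; the IP below is a placeholder):

Code:
# On the Solaris NAS VM:
iperf -s

# On the Win7 client; -r also tests the reverse direction:
iperf -c 192.168.1.50 -t 30 -r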
 
Sorry if I missed it, but what OS did you go with? I am working on a similar project, and will have to start weighing my options.
 
Actually, the NFS internal thruput is NOT useless! If that is where you are hosting the datastore, any disk reads in a VM (including launching applications) will win bigtime.
 
Sorry if I missed it, but what OS did you go with? I am working on a similar project, and will have to start weighing my options.

I finally decided on Solaris 11 Express with Gea's napp-it web interface. So far it's been a solid choice.

Actually, the NFS internal thruput is NOT useless! If that is where you are hosting the datastore, any disk reads in a VM (including launching applications) will win bigtime.

The internal thruput is very close to native so my virtual machines are happy. This is also the same pool that I use to serve iSCSI from and the limitation there is definitely the GigE connection and not the disk backend.

Code:
        NAME         STATE     READ WRITE CKSUM
        perf-r10-01  ONLINE       0     0     0
          mirror-0   ONLINE       0     0     0
            c10t3d0  ONLINE       0     0     0
            c9t3d0   ONLINE       0     0     0
          mirror-1   ONLINE       0     0     0
            c10t7d0  ONLINE       0     0     0
            c9t7d0   ONLINE       0     0     0
        cache
          c10t5d0    ONLINE       0     0     0

Version 1.03c       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
2011.04.26      16G 63283  98 214173  23 127026  18 51123 100 346658  16  3009   6
 
I assume the disk controllers are passed thru?

Yeah. The two SASUC8I cards are configured for VMDirectPath, and the onboard SAS2008 controller is left for ESXi, which is installed on a USB key. The two disks on the SAS2008 controller are available as local datastores, one of which holds the NAS VM.

I have my two ZFS pools set up so they stripe across controllers and ports. Now that I look at it, I probably should have mirrored c10t3d0 with c9t3d0 and c9t7d0 with c10t7d0 so data would stripe down c10t3d0 and then c9t3d0. At least my pools stripe down controllers and ports instead of keeping all I/O on a single channel.
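For reference, if I ever do re-pair the mirror members, my understanding is it can be done online with attach/detach; a rough sketch of the pattern, using a hypothetical spare disk cXtYd0:

Code:
# Attach the new disk first and let it resilver, then detach the old one,
# so the mirror never drops below two members. cXtYd0 is a hypothetical spare.
zpool attach perf-r10-01 c10t3d0 cXtYd0
zpool status perf-r10-01          # wait for the resilver to complete
zpool detach perf-r10-01 c9t3d0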

If you lay this out on the Norco RPC-4224 case, it looks something like this:
Code:
SASUC8I 0-0 (VMDirectPath)   | c10t3d0 | c10t2d0 | c10t1d0 |   --    |
SASUC8I 0-1 (VMDirectPath)   | c10t7d0 | c10t6d0 | c10t5d0 |   --    |
SASUC8I 1-0 (VMDirectPath)   | c9t3d0  | c9t2d0  |  --     |   --    |
SASUC8I 1-1 (VMDirectPath)   | c9t7d0  | c9t6d0  |  --     |   --    |
   ----                      |   --    |   --    |  --     |   --    |
SAS2008  (ESXi)              | vmhba0  | vmhba33 |  --     |   --    |

When I do my first expansion, I would want to relocate c10t1d0 and c10t5d0 to the first two drive bays on the 5th drive row, which would require that I get one last controller with at least 4 ports on it. I would then grow the NAS with two more 4+1 raidz sets.
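Growing the capacity pool at that point should just be a zpool add per raidz set, something like this (the pool name and device names are placeholders for whatever the new drives enumerate as):

Code:
# Future expansion sketch: stripe an additional raidz1 vdev into the existing pool.
zpool add storpool raidz1 c11t0d0 c11t1d0 c11t2d0 c11t3d0 c11t4d0

Worth double-checking the device list before running it, since a vdev can't be removed from the pool once it has been added.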
 