Stripe size, file system recommendations for RAID 6?

Rakinos

I have 7 new 1TB Hitachis on their way to be used in a RAID 6, and I want to make sure that I use an optimal configuration.

I'm currently using ten 500GB drives in a RAID 6 with a 256K stripe on an Adaptec 51645.
The file system is xfs, configured as shown below:
Code:
tuckie@nibbler:~$ sudo xfs_info /dev/sdd1
meta-data=/dev/sdd1              isize=256    agcount=4, agsize=243736673 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=974946691, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0

hdparm -tT currently returns:
Code:
Timing cached reads:   880 MB in  2.00 seconds = 439.81 MB/sec
 Timing buffered disk reads:   90 MB in  3.02 seconds =  29.80 MB/sec

Which is less than stellar. There are some tips here, but it sounds like they would require a battery backup unit installed (which I don't have, although I do use a standard UPS on the server). Note: the slow speeds may be due to the array being too large, as well as using various revisions of WD RE2s.

I'm going to be storing primarily DVD backups (to be played over the network via xbmc) and backups of files and documents from other computers on the network. Finally, the questions: What stripe size would you recommend for the array? Any (Linux) file system preferences for large partitions with large files? What settings (block size, mount options, etc.) would be best for that partition?
 
I use XFS on my file server, and I run it directly on the block device without partitions (I use LVM to carve up the space).

This is a RAID 6 array of 12x Seagate 1TB drives on an Areca 1280ML; the stripe size is 64KB.

Code:
tera ~ # xfs_info /mnt/storage/
meta-data=/dev/mapper/tera-storage isize=2048   agcount=10, agsize=268435440 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=2441901056, imaxpct=5
         =                       sunit=16     swidth=160 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0

Code:
tera ~ # hdparm -tT /dev/sdb

/dev/sdb:
 Timing cached reads:   9036 MB in  2.00 seconds = 4523.69 MB/sec
 Timing buffered disk reads:  1318 MB in  3.00 seconds = 439.03 MB/sec

I'm not saying this is optimal for performance, but I have no issues streaming 4 HD movies/shows at the same time.

Just my $0.02
 
Tuning sunit and swidth is absolutely critical for decent performance out of XFS on RAID, and you can do this after the fact at mount time. It's also fairly important to make sure all your partition/volume boundaries are aligned to the RAID stripe boundaries, though not nearly as much. You can also tune the readahead to get better sequential performance, at the cost of some memory and possibly slower random reads.
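For example, something along these lines on your current array should be a reasonable starting point (the mount point is just a placeholder, and the sunit/swidth values assume your 256K stripe across the 8 data disks of the 10-drive RAID 6; blockdev values are in 512-byte sectors):
Code:
# mount with explicit stripe alignment: 256K stripe unit = 512 sectors, swidth = 8 * 512 = 4096
sudo mount -t xfs -o sunit=512,swidth=4096 /dev/sdd1 /mnt/raid

# bump readahead on the array device (16384 sectors = 8MB)
sudo blockdev --setra 16384 /dev/sdd
sudo blockdev --getra /dev/sdd   # check the current value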

There's some information at http://hep.kbfi.ee/index.php/IT/KernelTuning, but it's a fairly complicated thing to set up quite right. Still, your performance is pathetic enough that I can't help thinking there's more to it, a driver issue or some such. My untuned software RAID 5 with 4x 500GB posted ~80MB/s reads and 70MB/s writes; with some tuning it gets almost 220MB/s reads and 180MB/s writes. With 10 spindles you should stomp those numbers.
 
I wasn't quite thinking when I gave my initial speeds; I had forgotten I was running on a degraded array :p

Anyway, I'm currently copying files from the old array to the new one, and I just thought I'd post some new speed tests/benchmarks:

Code:
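# sunit/swidth are in 512-byte sectors: 256KB stripe unit / 512B = 512,
# and swidth = 5 data disks (7-drive RAID 6) * 512 = 2560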
sudo mkfs.xfs -b size=4096 -d sunit=512,swidth=2560 -L raid /dev/sdc1
sudo mount -t xfs -o sunit=512,swidth=2560 /dev/sdc1 /mnt/newraid


write speed:
Code:
sudo time sh -c "dd if=/dev/zero of=bigfile bs=8k count=1000000 && sync"
1000000+0 records in
1000000+0 records out
8192000000 bytes (8.2 GB) copied, 39.2657 s, 209 MB/s
0.26user 16.04system 0:41.25elapsed 39%CPU (0avgtext+0avgdata 0maxresident)k
256inputs+16000464outputs (2major+810minor)pagefaults 0swaps

read speed:
Code:
sudo time dd if=bigfile of=/dev/null bs=8k
1000000+0 records in
1000000+0 records out
8192000000 bytes (8.2 GB) copied, 19.027 s, 431 MB/s
0.23user 14.32system 0:19.05elapsed 76%CPU (0avgtext+0avgdata 0maxresident)k
16000152inputs+0outputs (1major+281minor)pagefaults 0swaps

Not a bad bump in speed, if I do say so myself ;) Thanks for the tip about sunit and swidth. I may try some of the tweaks on the Adaptec blog later, but for now this is good enough.
 
Looks good. I would suggest you also add this to your mount options:
Code:
 noatime,nodiratime
Also, there is no need to specify sunit and swidth at mount time unless you want to mount the XFS filesystem with different values than it was created with.

I also noticed you created partitions on the RAID device. If you plan on doing OCE (online capacity expansion) in the future, it's a much better solution to run XFS directly on the raw device with no partitions, or to use LVM; it avoids a lot of headaches when it comes to growing XFS after the OCE.
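Roughly, the LVM route would look something like this (the volume group/LV names and mount point are just examples, and the sunit/swidth match your 7-drive array):
Code:
# put LVM directly on the raw array device, no partition table
sudo pvcreate /dev/sdc
sudo vgcreate raidvg /dev/sdc
sudo lvcreate -l 100%FREE -n storage raidvg
sudo mkfs.xfs -d sunit=512,swidth=2560 /dev/raidvg/storage
sudo mount /dev/raidvg/storage /mnt/newraid

# later, after an OCE has grown the array:
sudo pvresize /dev/sdc
sudo lvextend -l +100%FREE /dev/raidvg/storage
sudo xfs_growfs /mnt/newraid   # grows the mounted filesystem to fill the LV
# (if the OCE changed the number of data disks, remount with an updated swidth)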
 
Thanks for the tip on the mount options, but it looks like nodiratime isn't needed if noatime is specified: http://lwn.net/Articles/245002/ (discovered after a quick Google search to figure out what noatime was).

I'll go ahead and remove the sunit & swidth from fstab; one article I followed said to specify them, so that's what I did :cool:
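Something like this is probably what I'll end up with in fstab (same device and mount point as above):
Code:
/dev/sdc1  /mnt/newraid  xfs  noatime  0  0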

I'm not too concerned about OCE; I decided to abandon the notion of using it when it took over a week of nail-biting waiting for it to finish the first time I did it on my old array (which is normal/quick according to Adaptec). I will probably just copy to another location (or locations), wipe, and rebuild, like I'm kind of doing now.
 
Yeah, Adaptec is a bit slow on OCE; on my Areca it took ~30 hours to add 1 disk, going from 11 to 12.
 