So, having found a pretty good deal on 3x 500GB 7200.10s, I've started setting up the box I'm going to use for my raid5 fileserver and playing around with mdadm using some virtual disks for practice. After getting the basics down, I decided to try something a little more adventurous: migrating an array from 3 'disks' of one size to 3 larger 'disks'. I thought I would share my results.
For the impatient, here are the basic steps:
- Create your starting array with the smaller disks
- Add the larger disks to the array as spares
- Mark the old disks as failed *one at a time*
- Remove the old disks from the array
- Grow the array to encompass new space
- Wonder if this is really smart to try with live data and real disks?
This was all done using Ubuntu 6.10 Server, mdadm 2.4.1 and the 2.6.17.14 kernel, compiled to support raid5 expand (Ubuntu 6.10 doesn't include this by default).
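If you want to check whether your running kernel actually has the reshape option built in, something like this should do it (assuming your kernel exposes /proc/config.gz; the option name is from the 2.6.17 config):
Code:
$ zcat /proc/config.gz | grep RAID5_RESHAPE
CONFIG_MD_RAID5_RESHAPE=y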
Here are the specifics of what I did for the curious:
Create 3 10MB images and 3 20MB images and mount them as loop devices:
Code:
$ dd if=/dev/zero of=img0 bs=1M count=10
$ dd if=/dev/zero of=img1 bs=1M count=10
$ dd if=/dev/zero of=img2 bs=1M count=10
$ dd if=/dev/zero of=img3 bs=1M count=20
$ dd if=/dev/zero of=img4 bs=1M count=20
$ dd if=/dev/zero of=img5 bs=1M count=20
# losetup /dev/loop0 img0
# losetup /dev/loop1 img1
# losetup /dev/loop2 img2
# losetup /dev/loop3 img3
# losetup /dev/loop4 img4
# losetup /dev/loop5 img5
Create a 3 device raid5 array composed of the 10M 'disks':
Code:
# mdadm --create /dev/md0 --auto=yes -l 5 -n 3 /dev/loop0 /dev/loop1 /dev/loop2
# cat /proc/mdstat
Personalities : [raid5] [raid4]
md0 : active raid5 loop2[2] loop1[1] loop0[0]
20352 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
unused devices: <none>
Create a filesystem and mount it - I used xfs and mounted at /testdir. If you were doing this on a real system you'd probably want to unmount for safety before expanding, but for demonstration purposes, I kept it mounted throughout.
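The commands for that step aren't shown above, but it's just the usual mkfs-and-mount dance, roughly this (xfs defaults, mount point as described):
Code:
# mkfs.xfs /dev/md0
# mkdir /testdir
# mount /dev/md0 /testdir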
Now add your larger 'disks' to your array (if doing this for real, you'd probably add them one by one as you remove each smaller disk in the next step):
Code:
# mdadm /dev/md0 --add /dev/loop3 /dev/loop4 /dev/loop5
mdadm: added /dev/loop3
mdadm: added /dev/loop4
mdadm: added /dev/loop5
# cat /proc/mdstat
Personalities : [raid5] [raid4]
md0 : active raid5 loop5[5](S) loop4[4](S) loop3[3](S) loop2[2] loop1[1] loop0[0]
20352 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
unused devices: <none>
Now we mark our smaller 'disks' as failed. IMPORTANT: do this one at a time, and make sure the rebuild finishes syncing (check with # cat /proc/mdstat) before you fail the next disk.
Code:
# mdadm /dev/md0 -f /dev/loop0
mdadm: set /dev/loop0 faulty in /dev/md0
Your /proc/mdstat should look like this:
Code:
# cat /proc/mdstat
Personalities : [raid5] [raid4]
md0 : active raid5 loop5[0] loop4[3](S) loop3[4](S) loop2[2] loop1[1] loop0[5](F)
20352 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
unused devices: <none>
NOT this:
Code:
# cat /proc/mdstat
Personalities : [raid5] [raid4]
md0 : active raid5 loop5[3] loop4[4](S) loop3[5](S) loop2[2] loop1[1] loop0[6](F)
20352 blocks level 5, 64k chunk, algorithm 2 [3/2] [_UU]
[==================>..] recovery = 90.0% (9600/10176) finish=0.0min speed=1600K/sec
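If you'd rather script the swap than babysit it, a rough sketch along these lines covers the fail-and-wait cycle (run as root; it just polls /proc/mdstat for recovery activity, so double-check it against your own setup before trusting it with real data):
Code:
for dev in /dev/loop0 /dev/loop1 /dev/loop2; do
    mdadm /dev/md0 -f $dev          # fail the next small 'disk'
    sleep 5                         # give md a moment to pull in a spare
    while grep -q recovery /proc/mdstat; do
        sleep 10                    # wait for the rebuild to finish
    done
done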
Once you have repeated that process for your three smaller 'disks', remove them from the array:
Code:
# mdadm /dev/md0 -r /dev/loop0 /dev/loop1 /dev/loop2
mdadm: hot removed /dev/loop0
mdadm: hot removed /dev/loop1
mdadm: hot removed /dev/loop2
You should end up with something like this:
Code:
# cat /proc/mdstat
Personalities : [raid5] [raid4]
md0 : active raid5 loop5[0] loop4[1] loop3[2]
20352 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
unused devices: <none>
Now that the array consists only of the larger 'disks', we expand it to use the new space:
Code:
# mdadm /dev/md0 --grow --size=max
# cat /proc/mdstat
Personalities : [raid5] [raid4]
md0 : active raid5 loop5[0] loop4[1] loop3[2]
40832 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
unused devices: <none>
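You can sanity-check the new array size before touching the filesystem, e.g.:
Code:
# mdadm --detail /dev/md0 | grep 'Array Size'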
Expand the filesystem to match:
Code:
# xfs_growfs /testdir/
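Note that xfs_growfs takes the mount point rather than the device. If the array held ext3 instead of xfs, the equivalent step would be something along these lines (online ext3 resize needs kernel and e2fsprogs support, so treat this as a sketch):
Code:
# resize2fs /dev/md0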
And we've migrated a full array to larger disks while still online.

Before:
Code:
# df -h /dev/md0
Filesystem Size Used Avail Use% Mounted on
/dev/md0 15M 72K 15M 1% /testdir
After:
Code:
# df -h /dev/md0
Filesystem Size Used Avail Use% Mounted on
/dev/md0 35M 100K 35M 1% /testdir
So, some questions remain:
This was just using loopback devices as virtual drives; is there something I've overlooked that would make this procedure not work with actual drives?
If it is technically possible to do with live data and drives, is it actually a good idea? It seems to me that, aside from having to wait for each drive to resync as you swap them one by one, there aren't really any more risks here than with a normal raid5 expand.
Slightly off-topic: what are resync times like when a spare disk gets put into use? I've heard online expansion can sometimes take almost a full day; is a resync as long? (If so, this almost becomes prohibitively slow, since you'd be losing 2 or 3 days just swapping the drives in before you even expand the capacity.)
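On that note, md throttles resync bandwidth through a couple of tunables, and the minimum defaults quite low, so raising it may shorten the wait at the cost of more I/O load while the rebuild runs (values are in KB/sec; adjust to taste):
Code:
# cat /proc/sys/dev/raid/speed_limit_min
# echo 50000 > /proc/sys/dev/raid/speed_limit_min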
Has anyone actually tried this or thought about it?