Looking for a tool / filesystem

drizzt81

Hi,

I am looking for a tool or a filesystem (not sure which). I have two distant servers, which are connected to the internet via a slow connection. I have 400GiB of data (files) that I would like to replicate between the two servers. Since all of this data is at one site right now, I can use a Digital Squab Line to transfer the initial data set between the two sites, i.e. mail an HDD with all the source data.

I was planning on using rsync to manage changes within the files that I have. From my limited understanding there is a large drawback in rsync: it only looks at one file at a time. This means that moving a file to a different location at the source will require me to retransmit the complete file to the destination. Or is rsync smarter than I think?

A more efficient approach would be to "track changes" done at the source file system and replicate these to the destination. My question: is there a tool or a filesystem that does this type of thing? Free would be preferred, and Windows compatibility is a bonus. From my understanding Win2K3 R2 does this type of thing with the improved File Replication Service, but buying two copies of Win2K3 R2 is out of the question due to the prohibitive price.
 
DRBD is a Linux project which does exactly this; the new 0.8 version can even have both nodes doing writes at the same time. I have personally never tried it on anything but a private network connecting two boxes via GigE, but it should work for you.
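For reference, a DRBD resource is defined in /etc/drbd.conf; the sketch below is only a rough illustration (hostnames, devices, and addresses are made up), with protocol A (asynchronous) being the mode you would most likely want over a slow WAN link:

    resource r0 {
      protocol A;              # asynchronous replication, tolerant of a slow link
      on server-site-a {
        device    /dev/drbd0;  # block device the local filesystem sits on
        disk      /dev/sdb1;   # underlying partition holding the data
        address   10.0.0.1:7788;
        meta-disk internal;
      }
      on server-site-b {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.1.1:7788;
        meta-disk internal;
      }
    }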
 
I was planning on using rsync to manage changes within the files that I have. From my limited understanding there is a large drawback in rsync: it only looks at one file at a time. This means that moving a file to a different location at the source will require me to retransmit the complete file to the destination. Or is rsync smarter than I think?
It's smarter than that. It looks at MD4 sums of each file, and when those don't match it starts comparing individual chunks of the file. Each chunk gets its "rolling" checksum compared against the remote version, and only the blocks that don't match get copied. You can also use gzip compression over the wire to speed this up, if the network is the bottleneck.
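To make that concrete, an invocation along these lines (paths and hostname are made up) does delta transfers with on-the-wire compression and can resume interrupted files:

    # -a archive mode, -z compress over the wire,
    # --partial keeps partially transferred files so a dropped link can resume
    rsync -avz --partial --progress /data/media/ user@remote-site:/data/media/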

I suggest you use an implementation of RFC2549 on your initial copy, lest you lose packets.

A more efficient approach would be to "track changes" done at the source file system and replicate these to the destination.
What sort of data is this? Would having a repository of past versions be desirable?
 
What sort of data is this? Would having a repository of past versions be desirable?
It's mostly audio and video files. Changes are slow (tag updates here and there) and past versions are not desirable. I guess moving deleted files to a special subfolder would be nice.
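If the rsync route is taken, that last wish is covered by --backup together with --backup-dir: anything deleted (or overwritten) on the source gets moved into a side directory on the destination instead of being removed outright. A rough sketch, with hypothetical paths:

    # deletions on the source are shunted into /data/attic on the destination
    rsync -avz --delete --backup --backup-dir=/data/attic /data/media/ user@remote-site:/data/media/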

DRBD is a Linux project which does exactly this; the new 0.8 version can even have both nodes doing writes at the same time. I have personally never tried it on anything but a private network connecting two boxes via GigE, but it should work for you.

That does look interesting. What I am not sure of is whether it allows me to pre-seed the second mirror and use the first mirror without the second one online.

It's smarter than that. It looks at MD4 sums of each file, and when those don't match it starts comparing individual chunks of the file. Each chunk gets its "rolling" checksum compared against the remote version, and only the blocks that don't match get copied. You can also use gzip compression over the wire to speed this up, if the network is the bottleneck.
I understand that, but how does it know that file A exists on the destination system in a different subfolder?
 