Looking for a tool / filesystem

drizzt81

Hi,

I am looking for a tool or a filesystem (not sure which). I have two distant servers, which are connected to the internet via a slow connection. I have 400GiB of data (files) that I would like to replicate between the two servers. Since all of this data is at one site right now, I can use a Digital Squab Line to transfer the initial data set between the two sites, i.e. mail an HDD with all the source data.

I was planning on using rsync to manage changes within the files that I have. From my limited understanding there is a large drawback in rsync: it only looks at one file at a time. This means that moving a file to a different location at the source will require me to retransmit the complete file to the destination. Or is rsync smarter than I think?

A more efficient approach would be to "track changes" done at the source file system and replicate these to the destination. My question: is there a tool or a filesystem that does this type of thing? Free would be preferred, and Windows compatibility is a bonus. From my understanding Win2K3 R2 does this type of thing with the improved File Replication Service, but buying two copies of Win2K3 R2 is out of the question due to the prohibitive price.
 
DRBD is a Linux project which does exactly this; the new 0.8 version can even have both nodes doing writes at the same time. I have personally never tried it on anything but a private network connecting two boxes via GigE, but it should work for you.
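For reference, a DRBD resource is defined in /etc/drbd.conf; the sketch below is only a rough illustration (hostnames, devices, and addresses are made up), with protocol A (asynchronous) being the mode you would most likely want over a slow WAN link:

    resource r0 {
      protocol A;              # asynchronous replication, tolerant of a slow link
      on server-site-a {
        device    /dev/drbd0;  # block device the local filesystem sits on
        disk      /dev/sdb1;   # underlying partition holding the data
        address   10.0.0.1:7788;
        meta-disk internal;
      }
      on server-site-b {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.1.1:7788;
        meta-disk internal;
      }
    }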
 
I was planning on using rsync to manage changes within the files that I have. From my limited understanding there is a large drawback in rsync: it only looks at one file at a time. This means that moving a file to a different location at the source will require me to retransmit the complete file to the destination. Or is rsync smarter than I think?
It's smarter than that. It looks at MD4 sums of each file, and when those don't match it starts comparing individual chunks of the file. Each chunk gets its "rolling" checksum compared against the remote version, and only the blocks that don't match get copied. You can also use gzip compression over the wire to speed this up, if the network is the bottleneck.
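To make that concrete, an invocation along these lines (paths and hostname are made up) does delta transfers with on-the-wire compression and can resume interrupted files:

    # -a archive mode, -z compress over the wire,
    # --partial keeps partially transferred files so a dropped link can resume
    rsync -avz --partial --progress /data/media/ user@remote-site:/data/media/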

I suggest you use an implementation of RFC2549 on your initial copy, lest you lose packets.

A more efficient approach would be to "track changes" done at the source file system and replicate these to the destination.
What sort of data is this? Would having a repository of past versions be desirable?
 
What sort of data is this? Would having a repository of past versions be desirable?
It's mostly audio and video files. Changes are slow (tag updates here and there) and past versions are not desirable. I guess moving deleted files to a special subfolder would be nice.
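If the rsync route is taken, that last wish is covered by --backup together with --backup-dir: anything deleted (or overwritten) on the source gets moved into a side directory on the destination instead of being removed outright. A rough sketch, with hypothetical paths:

    # deletions on the source are shunted into /data/attic on the destination
    rsync -avz --delete --backup --backup-dir=/data/attic /data/media/ user@remote-site:/data/media/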

DRBD is a Linux project which does exactly this; the new 0.8 version can even have both nodes doing writes at the same time. I have personally never tried it on anything but a private network connecting two boxes via GigE, but it should work for you.

That does look interesting. What I am not sure of is whether it allows me to pre-seed the second mirror and use the first mirror without the second one online.

It's smarter than that. It looks at MD4 sums of each file, and when those don't match it starts comparing individual chunks of the file. Each chunk gets its "rolling" checksum compared against the remote version, and only the blocks that don't match get copied. You can also use gzip compression over the wire to speed this up, if the network is the bottleneck.
I understand that, but how does it know that file A exists on the destination system in a different subfolder?
 