Verify Data Integrity automatically

Movendi

I need a simple system to detect file corruption and bit rot in both original and backup copies. I'm looking for a Windows-based solution that can warn me if file corruption is occurring. I'm aware people are fond of ZFS and its data-scrubbing/integrity abilities, but I'm working with Windows 7 NTFS systems. I'd rather not work with anything that involves Linux, command lines, or anything complicated.

Requirements:

  1. Verification must use at least MD5 or stronger (SHA-1 / bit-for-bit comparison)
  2. Verify file integrity after transferring files, using MD5 or stronger.

  3. Schedule various checksum profiles and monitor for failed checksums
a) Schedule verification of the original files against their original checksums fortnightly
b) Schedule verification of the backups against the original checksums monthly
c) Warn me if a file or directory has failed its checksum.
d) Profiles for monitoring checksums. Something like SyncBack's profile management, but purely for scheduling verification of different directories (profiles). Some folders may require more frequent verification against data corruption (personal photos) than other media (music).

E.g. two profiles:
(Profile 1) - The Original Photos [A] directory gets a checksum/(snapshot?) created. Verified fortnightly for failed checksums.

(Profile 2) - The Backup Photos directory is verified against the Original Photos [A] checksums. Verified monthly for failed checksums.



I'm not familiar with snapshots, but would they be useful for what I'm after?

I'm not sure how to address modifications made to the original files. Any legitimate edit will show up as an integrity mismatch against the stored [A] checksums. How would I handle this elegantly?

What tool(s) do I require? What other options are there that are stress-free and easy to set up and maintain?

Tools I have tried:
Beyond Compare - So far the best tool for comparing file directories, but not so much for validating original file integrity. It doesn't store checksums. It uses MD5 for comparing directories A-B, but it does not detect data corruption within the original directory [A-A]. That's where stored checksums come in handy, and that's where this program is lacking. Otherwise it's an almost-fitting solution for the above requirements.

Teracopy - It's okay for simple file transfers, but I need MD5 (or stronger) verification.

SyncBack - Good program for making backup profiles, BUT it only does (CRC) verification that a backup was successfully transferred. You can't use the program to detect data corruption in the original files or the backups.

FileVerifier++ - Good checksum tool that will flag a file whose checksum has changed, but you have to manually browse to the directory each time you open the program. No profiles or schedules can be made.

ExactFile - Similar to the above, but detecting failed checksums isn't as easy.
 
All modern drives have built-in logic to do "automatic repair" of the sectors they read. In order for this process to take place, two things must happen.

1. All sectors need to be read (for full file integrity to be verified)
2. And (as I have learned on this very forum) an attempt must be made to also write to that sector, which triggers the drive's logic to take the data from the bad sector and move it to a fresh, working sector (if the original sector is no longer within acceptable read tolerance).

As far as I am aware, this is the only way to achieve what you want.

Here is the problem with this: obviously your drive is not reading every single sector 24 hours a day. So you are forced to schedule something that reads everything and verifies it. Outside of checksumming everything, I think that is the only option.
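As an aside, the "read everything" part can be as simple as a small script that reads every file end-to-end; any sector the drive can no longer read then surfaces as an I/O error. A rough sketch only (Python is just my assumption here, and the directory path is made up):

Code:
import os

def full_read(root):
    """Read every file under root end-to-end, forcing the drive to read
    (and internally verify) every sector those files occupy."""
    errors = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    while f.read(1024 * 1024):   # read in 1 MiB chunks
                        pass
            except OSError as exc:               # unreadable sector, etc.
                errors.append((path, exc))
    return errors

for path, exc in full_read(r"D:\Photos"):        # hypothetical directory
    print("READ FAILED:", path, exc)

Note this only touches sectors that belong to files; free space and filesystem metadata are not exercised.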

I'm still reading your post ...

Done reading ...

I'm still left with the same response.

To achieve what you want would require some sort of underlying application that intercepts the Windows API calls and does everything on the fly. I don't think the industry has ever called for such a program, because they rely on backups (which I know is a bad way to handle it; why not be preventative?).

You also have to consider that such a program would add overhead to your hard drive's operation. I don't know how that would affect its lifespan and whether you'd be hastening the time when the drive fails.
 
I need a simple system to detect file corruption and bit rot in both original and backup copies. I'm looking for a Windows-based solution that can warn me if file corruption is occurring. I'm aware people are fond of ZFS and its data-scrubbing/integrity abilities, but I'm working with Windows 7 NTFS systems. I'd rather not work with anything that involves Linux, command lines, or anything complicated.

I would say no real chance with NTFS, as it does not offer checksums of your source data and you cannot trust the filesystem without an offline chkdsk run prior to copying. You are able to do checksums when copying insecure/potentially bad data, with the result that your backup is verified but contains the same bad data.

So the only option on Windows + NTFS is doing checksums on the data with a checksum tool, but this is not realtime and does not cover the initial data, and you have to rerun the tool and a chkdsk prior to any copy if you want to be sure.
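To make concrete what such a checksum tool amounts to, here is a rough sketch only (assuming Python and SHA-256; the size/mtime comparison used to tell a deliberate edit apart from silent corruption is an extra heuristic of mine, not something the tools mentioned above necessarily do):

Code:
import hashlib, json, os

CHUNK = 1024 * 1024

def sha256_of(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    # Record hash, size and mtime for every file under root.
    manifest = {}
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            st = os.stat(path)
            manifest[rel] = {"sha256": sha256_of(path),
                             "size": st.st_size,
                             "mtime": st.st_mtime}
    return manifest

def verify(root, manifest):
    # Flag files whose content changed although size and mtime did not
    # (the typical signature of silent corruption rather than an edit).
    for rel, rec in manifest.items():
        path = os.path.join(root, rel)
        if not os.path.exists(path):
            print("MISSING :", rel)
        elif os.stat(path).st_mtime != rec["mtime"] or os.path.getsize(path) != rec["size"]:
            print("MODIFIED:", rel)   # deliberate change, re-hash and update the manifest
        elif sha256_of(path) != rec["sha256"]:
            print("CORRUPT :", rel)   # content changed but metadata did not

root = r"D:\Photos"                                    # hypothetical directory
# First run:       json.dump(build_manifest(root), open("photos.json", "w"))
# Scheduled runs:  verify(root, json.load(open("photos.json")))

A scheduled task could then run the verify step fortnightly for the originals and monthly against the backups, roughly matching the profile idea in the opening post.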

If you really want to address file corruption/silent data errors you need the newest generation of filesystems, with ZFS as the highlight and with btrfs and ReFS as other options, offering the same basic principles: realtime checksums on metadata and user data together with copy-on-write to provide an always-consistent filesystem.

On Windows you can only use ReFS (not yet really comparable to ZFS), or you can use a storage appliance based on Solaris (where ZFS comes from) or BSD, managed with a Web-UI, so you can use ZFS (as a CIFS share, or via iSCSI that you can use like a local NTFS disk with ZFS security underneath).
 
I would say no real chance with NTFS, as it does not offer checksums of your source data and you cannot trust the filesystem without an offline chkdsk run prior to copying. You are able to do checksums when copying insecure/potentially bad data, with the result that your backup is verified but contains the same bad data.

So the only option on Windows + NTFS is doing checksums on the data with a checksum tool, but this is not realtime and does not cover the initial data, and you have to rerun the tool and a chkdsk prior to any copy if you want to be sure.

If you really want to address file corruption/silent data errors you need the newest generation of filesystems, with ZFS as the highlight and with btrfs and ReFS as other options, offering the same basic principles: realtime checksums on metadata and user data together with copy-on-write to provide an always-consistent filesystem.

On Windows you can only use ReFS (not yet really comparable to ZFS), or you can use a storage appliance based on Solaris (where ZFS comes from) or BSD, managed with a Web-UI, so you can use ZFS (as a CIFS share, or via iSCSI that you can use like a local NTFS disk with ZFS security underneath).

Would a scheduled SpinRite Level 2 pass every month or so not be sufficient in this case? The only downside to a scheduled SpinRite run is that it's not real-time, but as you have noted, ZFS is basically the next step up. I want to be sure I'm posting reliable information.
 
Would a scheduled SpinRite Level 2 pass every month or so not be sufficient in this case?

This may detect corruption (though not all possible corruption), but it is not guaranteed to fix it.
 
I would not be so much concerned with the amount of time or money as with the actual quality of the program you're going to end up with. If you go into this feet first and there's a bug in the code that is supposed to protect your data, you will spend more time and money on this new method than you would using a tried-and-true method, like ZFS or some other proven means of ensuring file integrity.

On top of this, what if Microsoft comes out with some change to the Windows API that starts breaking this program and you have to pay to get that fixed?
 
Would a scheduled SpinRite Level 2 pass every month or so not be sufficient in this case? The only downside to a scheduled SpinRite run is that it's not real-time, but as you have noted, ZFS is basically the next step up. I want to be sure I'm posting reliable information.

If you write data to NTFS you have no assurance that the filesystem is OK at that moment, and no assurance that any data is written correctly or has not been modified for whatever reason, until the time you run a checksum tool. A surface test is not helpful for these sorts of problems. What you need is copy-on-write and end-to-end checksums across the driver, the cabling and the disks.

You either stay with a traditional security level (do backups and hope for the best) or you need a next-generation filesystem (btrfs, ReFS, ZFS or similar approaches like NetApp's). There is nothing in between.
 
Well, you could create Parchive files for the backups, but it seems he wants fully automated data backup and repair.

He could get somebody to write him some software based on this: http://en.wikipedia.org/wiki/Parchive

It doesn't seem like it's worth it.

Or am I getting confused about what he wants?
 
If you write data to NTFS you have no assurance that the filesystem is OK at that moment, and no assurance that any data is written correctly or has not been modified for whatever reason, until the time you run a checksum tool. A surface test is not helpful for these sorts of problems. What you need is copy-on-write and end-to-end checksums across the driver, the cabling and the disks.

You either stay with a traditional security level (do backups and hope for the best) or you need a next-generation filesystem (btrfs, ReFS, ZFS or similar approaches like NetApp's). There is nothing in between.

Agreed. To be clear, SpinRite is not simply a surface scan. It takes control of the SATA controller and writes data, then reads it, writes the data in reverse, and reads it again. It is also not a software package you run within Windows: it runs in its own FreeDOS instance, which eliminates ANY "middle man" from altering the desired action on the drive. If the drive is at the point where it's being told one thing and doing another, that is a much bigger problem. SpinRite also has logic to do its OWN checksumming and verify that each 1 and 0 is "strong enough" to be read properly. Stepping back from ZFS, if you can't afford that kind of protection, SpinRite is a worthwhile product to schedule. But to be clear, I think the ZFS model is the best, and it's the one I prefer.
 
If you really want to address file corruption/silent data errors you need the newest generation of filesystems, with ZFS as the highlight and with btrfs and ReFS as other options, offering the same basic principles: realtime checksums on metadata and user data
Actually, ReFS does not checksum user data by default; it only checksums the metadata, so after a repair your data might still be corrupt. You must manually activate checksumming of user data on ReFS. Why is ReFS not protecting user data by default? My guess is that it is very difficult to get right, and MS knows its ReFS checksumming is not good enough today.

As CERN concluded, adding checksums is not enough to provide data integrity protection; it must be done correctly. For instance, all hard disks dedicate a large amount of their surface to checksums (something like 10% of the capacity), and you still get data corruption on hard disks, despite all those checksums. It must be done right. There are several research papers showing that ZFS does checksumming correctly, but no such research on ReFS or btrfs. I would not trust either of them until we see research.
 
For instance, all hard disks dedicate a large amount of their surface to checksums (something like 10% of the capacity), and you still get data corruption on hard disks.

Because checksums aren't about fixing corrupt data, checksums are only about detecting corrupt data. To prevent corrupt data you also need redundancy via parity or mirroring.

If you run ZFS with a single disk it will get corrupt data like any disk, despite all its checksumming.

For example, I back up my ZFS array to single offline hard drives. I run a simple file integrity checker which computes the SHA-1 hash of each file, and I run it every month or so to verify my backups haven't bit-rotted.

For the verification to pass, every bit of that file has to have been read successfully and has to be identical to when the hash was first calculated; otherwise there is no way the hash would come out the same. If verification fails, I get a fresh copy from my array again.

I would love to see an example where a secure checksum says a file is correct but the file is actually corrupt. The only way that could happen is if the file corrupted in such a way that the new file causes a hash collision with the old file, which is astronomically unlikely.
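For what it's worth, that verify-and-refresh loop is only a few lines. A sketch only, assuming Python; the paths and the dict of previously recorded SHA-1 hashes are placeholders:

Code:
import hashlib, os, shutil

def sha1_of(path):
    h = hashlib.sha1()
    with open(path, "rb") as f:
        while chunk := f.read(1024 * 1024):
            h.update(chunk)
    return h.hexdigest()

def verify_backup(backup_root, array_root, known_good):
    # known_good maps relative path -> SHA-1 recorded when the backup was made.
    for rel, expected in known_good.items():
        backup_path = os.path.join(backup_root, rel)
        if not os.path.exists(backup_path) or sha1_of(backup_path) != expected:
            # Mismatch: the backup copy rotted, so pull a fresh copy
            # from the (checksummed, redundant) array.
            print("restoring", rel)
            shutil.copy2(os.path.join(array_root, rel), backup_path)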
 
Because checksums aren't about fixing corrupt data, checksums are only about detecting corrupt data. To prevent corrupt data you also need redundancy via parity or mirroring.

Well said. Redundancy is paramount. From a probability standpoint, the more redundant the data is, the less likely you are to end up with bad/corrupt data. Here's what I mean (I'm not just stating the obvious): if you have 3 drives that are all instructed to write file A to disk, the controller will tell each of drives 1, 2 and 3, "hey, write file A". The probability that ALL drives will write the data incorrectly, even though they report success, is significantly lower when you have multiple drives. On top of that, if you have checksumming happening and you check the calculated checksum of each file every time you write, you will all but eliminate file corruption. I hope I'm making some sense.
 
For secure data storage, you need all of these:

- An always-consistent filesystem (copy-on-write)
- Realtime and end-to-end checksums on metadata and user data to detect errors during reads
- Paired with redundancy to auto-fix detected errors on access or during scrubbing (a self-healing filesystem)
 
- Paired with redundancy to auto-fix detected errors on access or during scrubbing (a self-healing filesystem)

I love the phrase "self-healing". It sounds like a monster out of LOTR or something. Not downplaying that by any means. But, every time I come across that phrase in IT, it just gives me a feeling of "mwahahah, you cut my hand off, I'll grow it back!" :D.

"It's a ZFS monster! RUN!"
 
Because checksums aren't about fixing corrupt data, checksums are only about detecting corrupt data. To prevent corrupt data you also need redundancy via parity or mirroring.
All hard disks do checksumming and are able to recover from 1-bit errors, because of the error-correcting codes all hard disks have. It is like ECC RAM: able to recover from 1-bit errors and detect 2-bit errors. Hard disks certainly are able to recover from 1-bit errors, and a lot more; maybe they are using Goppa codes, I don't know. My point is, hard disks have checksums/error-correcting codes to detect errors, and still they get errors. Some enterprise hard disks (Fibre Channel, etc.) even have DIF, the newer standard for detecting and correcting data corruption, and those disks also get data corruption, which you can see in the hard disk specifications. Here is an example for such a Fibre Channel disk, which says something like "1 unrecoverable error per 10^16 bits":
http://en.wikipedia.org/wiki/Hard_disk_error_rates_and_handling#ERRORRATESHANDLING

If you run ZFS with a single disk it will get corrupt data like any disk, despite all its checksumming.
Not necessarily. You can use "copies=2" on ZFS for a single disk, which means ZFS stores all data twice (halving the storage capacity). Or you can even specify "copies=3", which means ZFS stores all data three times. So I don't agree with your claim.

I would love to see an example where a secure checksum says a file is correct but the file is actually corrupt. The only way that could happen is if the file corrupted in such a way that the new file causes a hash collision with the old file, which is astronomically unlikely.
Some hash functions are broken, which means you can find a collision in polynomial time. For instance, some time ago some researchers posted a hashed document on a web site prior to the US presidential election, claiming they could foresee which candidate would win. After the election they revealed the hashing procedure so people could check the document, and it named the correct candidate. It turned out that the researchers had exploited a flaw in the hash function: they had crafted documents naming every candidate that all hashed to the same value, and after the election they simply swapped in the document naming the winner.

For instance, by default ZFS uses the fletcher2(?) checksum to detect data corruption, but it turned out that the fast fletcher2 algorithm was unreliable, so you can now choose fletcher4 instead, or the slower but cryptographically strong SHA-256. If you want to find a collision in a "secure" hash function, you need super-polynomial computation time.
 
If you spent the hours you're spending looking at Windows solutions on trying out ZFS instead and learning a few command-line tools (or using Gea's great napp-it for most of it), you would end up with a working, checksummed and self-healing system, as opposed to something that is second-rate for your requirements, if it comes close at all.

There are many great online guides and resources for learning the basics, and if you're at all computer-minded it wouldn't take long. If you care that much about data integrity, what's a little learning?
 
Not necessarily. You can use "copies=2" on ZFS for a single disk, which means ZFS stores all data twice (halving the storage capacity). Or you can even specify "copies=3", which means ZFS stores all data three times. So I don't agree with your claim.

Every time you reply to me you put words in my mouth, and I don't really appreciate it. Either that, or you use your own interpretation of my statement in a way that is simply incorrect. You make assumptions, like assuming I was talking about "copies=2". I was not. I was talking about "copies=1".

My statement was:

"Because checksums aren't about fixing corrupt data, checksums are only about detecting corrupt data. To prevent corrupt data you also need redundancy via parity or mirroring."
"If you run ZFS with a single disk it will get corrupt data like any disk despite all it's checksumming."

My statement is that ZFS with "copies=1" on a single disk will get corrupt data like any disk, despite all its checksumming.

Do you now agree with my statement? Because that was always my original intention. Obviously, if you change my statement and add "copies=2", then you will agree with that one and say you disagree with mine, but that wasn't my statement, so you are disagreeing with a completely different, made-up statement that I never intended. Adding "copies=2" means you are adding a form of "mirroring", which I also mentioned as exactly what you DO need in order to prevent corrupt data.

If you really want to get detailed about it then I should add that you need checksums to detect corruption, and you also need "enough" parity or mirroring to prevent some amount of corrupt data.

Hard drives detect and repair single-bit corruption, as you said. The reason they fall short is that they don't have "enough" parity data. It's no different for ZFS: even with "copies=2", data corruption could still happen if the same bit corrupted in both copies before the corruption could be repaired.

So it's not that hard drives don't prevent corruption and ZFS does. That's not accurate. Both prevent corruption up to their configured level of parity/mirroring, and both can fail if the corruption exceeds the amount of parity/mirroring each is configured for. ZFS just has the ability to be configured with more parity/mirroring than a hard disk can manage on its own.

That's the only point I was trying to get at I guess.
 
Every time you reply to me you put words in my mouth, and I don't really appreciate it. Either that, or you use your own interpretation of my statement in a way that is simply incorrect. You make assumptions, like assuming I was talking about "copies=2". I was not. I was talking about "copies=1".

My statement was:
"If you run ZFS with a single disk it will get corrupt data like any disk despite all it's checksumming."

My statement is that ZFS with "copies=1" on a single disk will get corrupt data like any disk, despite all its checksumming.
Well, I only read what you write; I have no way of reading your mind. If you write "a single disk with ZFS is unsafe", that is what I read. To that I must add, "no, with copies=2 a single disk is not unsafe", which is correct.

If you write "one single disk with ZFS is unsafe, if you are using copies=1", then you are correct. But you can not fault me, for not reading your mind. Maybe you did not know that "copies=2" exist? Maybe you would have answered "oh, I did not know that, thanks!". Instead you wrote "of course I meant with copies=1". How can I know what you mean or not mean? If you write that single disk ZFS is unsafe, then you are wrong. Simple as that. And no, I can not read your mind, and I dont know what you know or not know. Maybe you did not know about copies=2?

Do you now agree with my statement? Because that was always my original intention.
And how would I have known what your original intention was? You sound like my girlfriend, asking me to read her mind and know her intentions. I can't read her mind, let alone yours. I write wrong things, and then I correct them. What is the problem with that? You wouldn't want people to believe that single-disk ZFS is unsafe, would you? It is not even true.

So it's not that hard drives don't prevent corruption and ZFS does. That's not accurate. Both prevent corruption up to their configured level of parity/mirroring, and both can fail if the corruption exceeds the amount of parity/mirroring each is configured for. ZFS just has the ability to be configured with more parity/mirroring than a hard disk can manage on its own.
You have omitted the codes they are using. Goppa codes are very efficient, and if hard disks use a simple error-detection-and-correction code while ZFS uses a better one, then it is clear that ZFS is safer. ECC RAM only corrects one-bit errors, so ECC RAM uses a simpler scheme than ZFS. In fact, even if you discard all redundancy from extra copies, you still get data protection if the code is good enough. For instance, when a satellite transmits data, the data is not transmitted twice so that a faulty block can be replaced with another; instead the data is sent once, with error-correcting codes, not twice or thrice. So it is not just a matter of ZFS holding several copies while hard disks don't; ZFS also uses a better checksum scheme than hard disks, and on top of that ZFS typically has redundancy, storing copies several times.

That's the only point I was trying to get at I guess.
And how am I supposed to know which other implicit statements you are going to get angry about me not knowing?

I understand you get angry when you demand that I read your mind. I suggest you don't demand that of people; it will calm your mind a lot:
"Every time you reply to me you put words in my mouth, and I don't really appreciate it. Either that, or you use your own interpretation of my statement in a way that is simply incorrect. You make assumptions, like assuming I was talking about "copies=2". I was not. I was talking about "copies=1"."
 
Instead you wrote "of course I meant copies=1". How can I know what you mean or don't mean?

Because I am assuming that you are not taking me for a fool.

If I said "ZFS doesn't prevent corrupt data with 1 disk", then isn't it obvious, or at least at a minimal level more *likely* I meant with "copies=1"? If I meant with "copies=2" then my statement would have been false. Why would I have intentionally written a false statement? Please give me more credit that that, that's all that I ask heh.

When I read what someone writes, I assume that the context of their statement is the one that makes the most logical sense. In my case of "ZFS with all its checksums doesn't prevent corrupt data, it can only detect it", the most logical context IMO would be "oh, with copies=1 that would be true, so that must be what he is referring to".

Rather than immediately thinking "I disagree with that statement, because if I make some less logical assumption, he must be talking about using copies=2, and in that case his statement is wrong; ZFS actually can prevent corruption with copies=2".

I think I would find it hard to have an intelligent discussion with someone who always assumed the worst of my statements. It seems like it would take a very long time to get anywhere in the discussion if we had to reiterate every statement in such detail just to make sure it is read in the sense in which it is actually true.

When reading my comments, just assume that I am always referring to the case that actually makes logical sense and in which the statement is actually correct :). Unless no case exists that makes the statement correct, in which case I am wrong, and then feel free to tell me why it's wrong so I can learn.
 
I'd rather not work with anything that involves Linux, command lines, or anything complicated.

Then pay someone who does. You don't want to pay and you don't want to learn and still you want your lunch. How does that work?

When my car breaks, I acknowledge that I know jack shit about it and let someone competent fix it. Why is IT a field where everyone assumes they can do it themselves with minimal effort as soon as they manage to connect their routing blackbox?
 
"I rather not work with anything that has to do with linux, command lines or anything complicated."


Why are people even helping this jackass?
 
Then pay someone who does. You don't want to pay and you don't want to learn and still you want your lunch. How does that work?

When my car breaks, I acknowledge that I know jack shit about it and let someone competent fix it. Why is IT a field where everyone assumes they can do it themselves with minimal effort as soon as they manage to connect their routing blackbox?

Why would I need to if there might be an existing Windows solution that fits my needs? That was the whole purpose of this thread.
 
Why would I need to if there might be an existing Windows solution that fits my needs? That was the whole purpose of this thread.

All you know is hammers so of course every problem looks like a nail to you.
 
Because I am assuming that you are not taking me for a fool.

If I said "ZFS doesn't prevent corrupt data with 1 disk", then isn't it obvious, or at least at a minimal level more *likely* I meant with "copies=1"?
No, it is not obvious. I have no idea how much you know about ZFS. There are lots of people saying that ZFS needs 1 GB of RAM for every TB of disk, and they are wrong; they don't know. Even I don't know everything about ZFS, and I have followed it since the very beginning, reading the ZFS mailing list and talking to the ZFS devs. There is still stuff I don't know.

I have no way of reading your mind or of figuring out how much you know about ZFS. I write a lot of posts here, there are a lot of ZFS voices, and I don't keep track of what they say, how much they know, or which threads they have posted in.

What is your problem? Why do you demand that I know how much ZFS knowledge you have, or else you get angry?

How much do I know about Radon-Nikodym derivatives, or HFT? Quick, tell me!
 
NP. I think this was just a miscommunication. Now that we have sorted that out, I wish you lots of beer in the future! :)
 
It's ridiculous that people jump all over the OP; he's simply asking whether there is a (preferably Windows) solution for what he wants. He did as much research as he could, as cited in his post.

Not everyone knows everything.

It amazes me how hostile hardforum can get sometimes.
 