Why isn't RAID a backup?

Syndacate

Please read, as I'm sure you're getting a "RAID explained" page up with your fingers ready to hit control + V.

Every time RAID is mentioned in the same THREAD as backup, people start flipping shit and pointing to Wikipedia articles and crap like that. I figure that title will get a lot of answers, or at least a lot of views by enraging passersby who see it o_O.

So right: If you do something like RAID1/5/6 - safe from a drive failure - your data is BACKED UP as the parody sections of each of the remaining drives, or in RAID1 as the mirrored drive.

Backup: Just transferring all your crap (or all important crap) to another drive.

So if I run rsync from a script every 30 minutes to another disk (I would never do this), then unless there's a lot of drive writing, one can say that my data is pretty safe. Most would call this simply short incremental backups, but for all intents and purposes it's a low-level software RAID1, the point being to keep the secondary drive loaded with what the first drive has on it.

So what am I missing? The ideas seem very closely related in both their objective as well as the concept of how they do it (excluding RAID5/6/0, etc.).

Then, on top of that:
What if I back up all my data (nightly) to a RAID1 volume...which I'm sure is impossible (divide by 0 impossible) according to some people. That's not a backup because the sentence contains the word "RAID" (I'm not lying, I actually said that first sentence on this forum and somebody linked me to the wikipedia for RAID telling me they're not the same).

So what's the deal, why do so many people flip shit when they're even mentioned in the same thread as one another...?? I'm quite confused if it's some fundamental concept of either that I'm not getting or if it's just trolls being..trolls...

Explanation?

The way I see it, just about all RAIDs are backups in the sense that they provide safety from drive failures on some level or another (except RAID0, that's an oddity, but I'm talking about the rest of the common RAIDs here, 1/5/6).
 
RAID doesn't protect you from accidental file deletion, viruses, etc.
 
Backups provide protection from more than just drive failures (things like user negligence, different kinds of hardware failures, etc). RAID is not a replacement for backups. Naturally it can be used in conjunction with them to give you even more reliability, but on its own it isn't a backup.
 
So a backup is a copy of your data which you can use for recovery in a number of different scenarios.

Technically RAID is a copy of your data but it doesn't meet all the criteria of a backup - it's an up-to-date copy rather than a backup. It doesn't help you if you delete some files you needed. It doesn't help you if you suffer from fire or theft and it doesn't protect you from corruption. All it does is protect you locally from a disk failure.

In most people's eyes RAID does not meet the criteria of a backup. A lot of this is common sense, a lot of this stems from the IT world.
 
What the guys so far have said is the main thrust of the argument.

I agree with you that perhaps people shouldn't be quite so forceful when pointing it out, but it is an important distinction.

There is a difference between what a RAID 1 array does (as in your example) and the act of copying the contents of the array to a different location.
The latter is a backup since the copy does not constitute the primary, user-available storage of the data. But the RAID 1 is not, since any changes made to the data (perhaps corruption, a virus, user error etc) on that array occur on both drives immediately, making the data no more recoverable than it would be on a single disk.

I gave the following answer in another thread about this, which I think explains it fairly well:
The idea of RAID 1 is that every drive in the array is told to store exactly the same thing and changes are made to every drive at the same time.
So if you delete a file from the array volume (intentionally or otherwise), it's deleted on all the drives. That's not a backup, because you can't go and retrieve the file from the other drive(s) any more than you can from the first.
The same is true if the file is corrupted through a virus infection (cue an offline backup) or filesystem problems, or any other reason other than drive failure.

A backup would be a copy of the data held on another drive or array that is available for retrieval regardless of what happens to the primary copy.
For example, copying the data to an external drive and then storing that drive elsewhere.
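
To make that concrete, here's a rough toy model in Python. It is only a sketch: the `Mirror` class, the file name and the dict-based "drives" are all made up for illustration and have nothing to do with how a real controller works. It just shows that a mirror survives a dead drive but faithfully repeats a delete on every member, while a separate copy taken earlier is untouched.

```python
import copy


class Mirror:
    """Toy RAID 1 (hypothetical model): every write and delete hits all member drives."""

    def __init__(self, drive_count=2):
        self.drives = [{} for _ in range(drive_count)]

    def write(self, name, data):
        for drive in self.drives:
            drive[name] = data

    def delete(self, name):
        for drive in self.drives:
            drive.pop(name, None)

    def fail_drive(self, index):
        # A dead drive simply drops out; the survivors still hold everything.
        del self.drives[index]

    def read(self, name):
        return self.drives[0].get(name) if self.drives else None


mirror = Mirror()
mirror.write("photos.tar", "family pictures")

# The backup is a separate copy taken at one point in time.
backup = copy.deepcopy(mirror.drives[0])

mirror.fail_drive(0)
survived_drive_failure = mirror.read("photos.tar")  # served by the remaining drive

mirror.delete("photos.tar")                         # user error hits every remaining drive
after_delete_on_mirror = mirror.read("photos.tar")  # gone from the array
after_delete_in_backup = backup.get("photos.tar")   # still in the separate copy

print(survived_drive_failure, after_delete_on_mirror, after_delete_in_backup)
```

The drive failure is the one case the mirror handles; the delete is the case only the separate copy handles.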
 
So a backup is a copy of your data which you can use for recovery in a number of different scenarios.

Technically RAID is a copy of your data but it doesn't meet all the criteria of a backup - it's an up-to-date copy rather than a backup. It doesn't help you if you delete some files you needed. It doesn't help you if you suffer from fire or theft and it doesn't protect you from corruption. All it does is protect you locally from a disk failure.

In most people's eyes RAID does not meet the criteria of a backup. A lot of this is common sense, a lot of this stems from the IT world.

Bolded line
An up to date copy is referentially transparent to an up to date backup, in fact, a backup is acquired by copying data. This is where I get confused.

I suppose I see sort of what you're saying, but then again, I'm not an IT major. I suppose, as everybody's said, it doesn't stop user negligence, fine, ok, but simple deleted files are easily recovered and, regardless, don't affect the system, so put that aside for a second.

A backup isn't necessarily an "offsite" backup - that's just a characteristic of a backup. That being said, a backup doesn't necessarily protect you from fire or theft either.

As for corruption, I suppose if you're talking about file corruption then yes, backups would be better, and as for FS corruption, it would depend on what caused it, so I see what you're saying there.

----------------------------------------------

I think I see what you guys are saying in that backups simply provide more coverage in terms of user error or file damage. Though I simply see RAID as an instant backup by writing to two drives at the same time. I see the deleting/corruption thing though, but I think for corruption it depends on the type, although I do agree that backups prevent (mostly) that.

It's just weird (and quite annoying) the way everybody flips shit and links to "RAID explained" pages the instant they're mentioned in the same thread, or when one is suggested as a replacement for the other in a certain person's setup.

So it sounds like I'm not as lost as I thought I was...

EDIT:
One of the things I like about RAID (especially RAID1) over backup, though, is that you're constantly interacting with two drives, providing immediate recovery from failure. Meanwhile with backup (i.e. nightly backups) you'd have to wait 24 hours or whatever in between, and the data is unprotected from drive failure in the lapse between backups :(. It's like there's no end-all/be-all solution for it, which makes me want to turn to RAID5/6, but that doesn't play nice with the many OSes that I have to use due to my work.
 
I think part of the problem is that the definition of a "backup" is blurred.

It seems that to people like yourself, a backup is some mechanism by which the reliability of a system is improved, usually by introducing redundancy in components and having data spread and copied between the components in a sort of damage reduction exercise.

I think the definition needs to be more specific. To me, a backup is a secondary system (be it data storage or other forms of "backup" such as a secondary domain controller) that is entirely disconnected from primary, day-to-day usage, but is periodically updated to bring it in line with the primary system, such that if the primary were to fail the backup system could stand in its place.

The purpose of the "entirely disconnected from primary, day-to-day usage" is to prevent damage to the primary system spreading to the backup, since that would render it useless.
In the case of RAID, as we have already explained, the drives in the array all comprise the same primary data volume. If damage to that data (excluding drive failure - which is the purpose of RAID) were to occur, the array would not help you in recovering it. Only a separate copy of the data, made a (hopefully short) time before the damage was done, will be undamaged.
 
RAID is strictly for the case where a drive fails (as far as data protection is concerned). Backups provide better data protection based on specific timestamps. For example, if you have a very critical folder to back up, you may want to be able to restore anything from within the last 7 days and the last 12 months (I do this for a lot of my code). RAID will not save you from an accidental file deletion, virus, or other disaster.

Also, offsite backups are important too. Even a drive that's upstairs as opposed to downstairs in the server room is better than nothing, but true offsite is ideal.
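
For what it's worth, a "last 7 days plus last 12 months" scheme could look something like the sketch below. This is just one possible reading of it, not a description of any particular backup tool: the function name `snapshots_to_keep`, its parameters, and the rule "keep the newest snapshot of each calendar month" are all assumptions made for the example.

```python
from datetime import date, timedelta


def snapshots_to_keep(snapshot_dates, today, daily_days=7, monthly_months=12):
    """Return the set of snapshot dates to keep (hypothetical policy).

    Keeps every snapshot from the last `daily_days` days, plus the newest
    snapshot of each of the last `monthly_months` calendar months.
    """
    keep = set()
    newest_per_month = {}
    for d in snapshot_dates:
        if d > today:
            continue  # ignore anything dated in the future
        if (today - d).days < daily_days:
            keep.add(d)
        months_ago = (today.year - d.year) * 12 + (today.month - d.month)
        if months_ago < monthly_months:
            key = (d.year, d.month)
            if key not in newest_per_month or d > newest_per_month[key]:
                newest_per_month[key] = d
    keep.update(newest_per_month.values())
    return keep


# Example: one snapshot per day for the last 400 days.
today = date(2024, 3, 15)
nightly = [today - timedelta(days=n) for n in range(400)]
kept = snapshots_to_keep(nightly, today)
print(len(kept), "of", len(nightly), "snapshots kept")
```

Everything not in the returned set would be a candidate for pruning, which is what keeps this kind of history from eating unlimited space.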
 
Strictly speaking a backup would be a secondary solution should the first one fail.
Assuming RAID is a backup would be like assuming a quad core cpu has three backup cores for the first core.

So, assuming whatever you're trying to back up is important, you want at least two separate copies of it, one of which should be offline to protect against everything that can destroy data other than drive failure. Really, really, really important data should have extra copies located offsite to protect against natural disasters and also your murder should your S.O. blame you for the loss of the family pictures or something :p
 
Bolded line
An up to date copy is referentially transparent to an up to date backup, in fact, a backup is acquired by copying data. This is where I get confused.

Think of a backup as a point in time copy of data that you keep separate from the main data and may keep for a long period of time.

Companies take their backups from (typically) 7pm onwards. If they have a problem during the next day they can use the backup. If they need a file from 18 months ago, they can use a backup.

A RAID copy isn't really a point in time copy, it's a current copy that's always up to date - the data isn't confirmed until the parity calc has been written (if R5 for example).
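
A quick way to picture the "point in time" part is the sketch below. It's a toy, with a made-up `SnapshotStore` class and plain dicts standing in for file trees, not how any real backup product stores data. The point is only that each snapshot is frozen when it's taken, so you can ask for a file as it was on a given date even after the live copy has changed or lost it.

```python
import bisect
from datetime import date


class SnapshotStore:
    """Toy backup store (hypothetical): keeps dated copies of a file tree (a dict)."""

    def __init__(self):
        self._dates = []      # sorted snapshot dates
        self._snapshots = {}  # date -> copy of the files at that date

    def take(self, when, files):
        if when not in self._snapshots:
            bisect.insort(self._dates, when)
        self._snapshots[when] = dict(files)  # copy, so later changes don't leak in

    def restore(self, name, as_of):
        """Return `name` from the newest snapshot taken on or before `as_of`."""
        i = bisect.bisect_right(self._dates, as_of)
        if i == 0:
            return None  # no snapshot that old
        return self._snapshots[self._dates[i - 1]].get(name)


live = {"report.doc": "v1"}
store = SnapshotStore()
store.take(date(2023, 1, 10), live)

live["report.doc"] = "v2"
store.take(date(2024, 6, 1), live)

del live["report.doc"]  # accidental delete on the live volume
store.take(date(2024, 7, 1), live)

old_version = store.restore("report.doc", date(2023, 6, 1))
last_good = store.restore("report.doc", date(2024, 6, 15))
print(old_version, last_good, live.get("report.doc"))
```

A RAID array only ever has the equivalent of `live`; there is nothing to ask "as of" against.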
 
So right: If you do something like RAID1/5/6 - safe from a drive failure - your data is BACKED UP as the parody sections of each of the remaining drives, or in RAID1 as the mirrored drive.

I like your spelling of "parity"...which is actually highly appropriate in this case.

As is mentioned in other threads including this one, RAID is not a backup. I like to tell my customers to think of it as a downtime prevention method. It will help keep your system running in the event of a single drive failure (or potentially multiple drive failures if they're running RAID 50 or 6). But RAID can complicate data recovery in the event anything else goes wrong. That's why you want an external backup.
 
IMO:
RAID is not a backup by itself. Meaning if you use only it, you are just as vulnerable as you would be with just one large disk (maybe a bit less, depending on which type of RAID you use).

RAID is just as good a backup if you use it as secondary storage. That seems like a no-brainer to me.

I use Raid5 now and prefer it over JBOD because if a drive fails I still have a chance of getting it all back.
I am going for Raid6 on my next NAS, cause that one will be a bit harder to fill up. It will last me longer, and therefore its drives will work longer and will be at more risk of failing at some point.

I don't use it as a secondary storage, as its capacity (read: my budget) doesn't allow that. But I use it for main file storage.
Important stuff gets backed up on workstations, as it uses far, far less space and I can afford that!
 
There are several reasons why some of us stress that having backup is necessary even if you use RAID. RAID is one more thing that can go wrong. If your controller/PSU/cable causes 4 of your 16 drives in a 16 TB RAID6 to be dropped you're going to appreciate having backed up your files. Recovering a deleted file is no longer as easy as it was back in the days of MS-DOS and FAT. The people who use RAID are also likely to have more data to lose if there's a problem.
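
The 4-of-16 case is easy to check with the textbook failure limits for each level. The sketch below assumes the standard definitions (RAID 5 survives one lost drive, RAID 6 two, an n-way RAID 1 mirror n-1, RAID 0 none); the function names are made up for the example and it ignores nested levels and rebuild-time risks.

```python
def tolerated_failures(level, drive_count):
    """Drives an array can lose before data is gone (textbook values, assumed)."""
    if level == "raid0":
        return 0
    if level == "raid1":
        return drive_count - 1  # any one surviving mirror member is enough
    if level == "raid5":
        return 1
    if level == "raid6":
        return 2
    raise ValueError(f"unknown RAID level: {level}")


def array_survives(level, drive_count, drives_lost):
    return drives_lost <= tolerated_failures(level, drive_count)


single_failure = array_survives("raid6", 16, 1)  # the case RAID 6 is built for
cable_fault = array_survives("raid6", 16, 4)     # four drives dropped at once
print(single_failure, cable_fault)
```

One bad cable or controller port counts as several "drive failures" at the same instant, which is exactly what parity can't absorb.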
 
IMO:
RAID is not a backup by itself. Meaning if you use only it, you are just as vulnerable as you would be with just one large disk (maybe a bit less, depending on which type of RAID you use).

RAID is just as good a backup if you use it as secondary storage. That seems like a no-brainer to me.

Wut?

I'm sorry, it's just not. Have you read all the explanation we've been giving? What do you mean by "if you use it as secondary storage"?
 
RAID1/5/6 doesn't protect you from a multilane cable going bad and taking out 4 drives in the process.
 
How about an analogy, albeit a poor one.

Let's say one has a back-up electricity generator at home. This generator does not run in conjunction with the grid electricity; the generator is a back-up to the grid electricity to enable a continuation of services if there were a failure of the grid electricity.

So,
Grid electricity OK: generator remains off but ready for use
Grid electricity failure: generator starts, service continues

Behind where I used to work there was a data centre. This company had a HUGE generator. I would see a person go and test the generator most Monday mornings. The generator wasn't there for regular use; it was there as a back-up.

So, surely computer (data) back-up is similar. The back-up offers the ability to continue (operations / work) in the event of a failure.

That failure might be hardware (hard drive, drive controller) or software (user deletion, virus, file corruption)
It just so happens that RAID offers minimal "protection" for the software based failure situation.
HOWEVER, the RAID based "back-up" provides a level of "back-up" that is VERY up-to-date. The nature of the "back-up" being very up-to-date is what makes RAID both good and bad.
That is, **GOOD** no/little data is lost due to hardware failure (as the data is copied across the array at the time of write), however **BAD** data may be lost due to a software failure (as the "error" is propagated across the entire array at the time of error).
 
RAID is online. A backup is offline. RAID is a copy, and due to its online nature can be modified and corrupted. An offline backup is also a copy, yet due to its offline nature can not be modified other than by environmental factors.
 
Wut?

I'm sorry, it's just not. Have you read all the explanation we've been giving? What do you mean by "if you use it as secondary storage"?

If I have a 1TB data drive and back it up on a 1TB (3x500GB) RAID5 NAS, then my NAS is secondary storage:)

And I agree about the online/offline fact. True backup should be offline:cool:
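
In case the 3x500GB = 1TB figure looks odd: RAID 5 gives up one drive's worth of space to parity. A minimal sketch of the usual capacity arithmetic for equal-sized drives follows; the function name is invented for the example, and it ignores filesystem overhead and the GB vs GiB difference.

```python
def usable_capacity_gb(level, drive_count, drive_size_gb):
    """Usable space for equal-sized drives under the textbook RAID layouts."""
    if level == "raid0":
        return drive_count * drive_size_gb        # striping only, no redundancy
    if level == "raid1":
        return drive_size_gb                      # every drive holds the same data
    if level == "raid5":
        if drive_count < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drive_count - 1) * drive_size_gb  # one drive's worth of parity
    if level == "raid6":
        if drive_count < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return (drive_count - 2) * drive_size_gb  # two drives' worth of parity
    raise ValueError(f"unknown RAID level: {level}")


nas_capacity = usable_capacity_gb("raid5", 3, 500)
print(nas_capacity, "GB usable")
```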
 
RAID is for hardware failure of a drive.

BACKUP is for hardware failure of drive, and a million other things that could happen.

RAID + BACKUP = FTW
 
RAID is for hardware failure of a drive.

BACKUP is for hardware failure of drive, and a million other things that could happen.

RAID + BACKUP = FTW

Of course, RAID often just shifts the single point of failure from the HDD to the RAID controller, mainboard, PSU or other component :p
 
And that ^^^ is why I gave up on RAID dreams back in the Socket A days. For my own personal uses the cost/benefit death spiral of maintaining spare controllers doubles the cost of setup (one is none, two isn't really enough), plus the gamble of having some lifetime-dependent failure smacking more than one drive at a time made it prohibitive. Luckily I smoked a mobo and learned my lesson before I got past the RAID0 level of RAIDlust.

I fell back to depending on redundant spindles both local and remote.
 
Raid doesn't protect you from angry girlfriend deleting content.

Raid doesn't really protect you if you don't follow the 4 basic firearm safety rules and have a ND through all your spindles while cleaning your gun.

Raid doesn't protect you at 4am when you decide to add a new drive, and format..some drive, oh crap what did I just format.

You can apply the motto "two is one, one is none" to a lot of things in life, and it is definitely applicable to data.
 
Of course, RAID often just shifts the single point of failure from the HDD to the RAID controller, mainboard, PSU or other component :p

I think the OP was referring to data integrity, not uptime.
 
Controller dies = RAID array dies.
 
How's this for an answer:

A user today needed a file that she accidentally deleted a month ago, my RAID array had no record of this file.. my Backup did, and I restored the file for her. Thus RAID is not Backup.
 
There's a difference between redundancy and backup, something I think the OP has misunderstood about the concept of RAID. RAID provides redundancy; it doesn't substitute for a backup.
 
I think the main problem is with the terminology used. I think the term "archival backup" or "archival copy" should be used to more accurately reflect what a backup means in practical terms.

As for RAID, well "Redundant Array" is the give away clue - the array itself has an element of redundancy so that it remains available in time of unit failure. The array, and the data stored on the array, are two separate entities and should be treated as such.
 
Archival is long-term offline storage with no focus on quick recovery from these copies.
Backups are for recovery when someone deletes a file by accident, a HDD/RAID array goes south, etc.
RAID allows for resistance against HDD failure. It does not protect against other points of hardware failure and will happily destroy the array if told to do so by software.

Within my company we decided to go for the fully redundant approach and mirror all data across six systems on two continents with backups made at both sites too. No RAID.
 
In what case?
I had a controller die on me. Replaced it with the same model, and the array never knew it was gone!:cool:

Until you slot in a RAID controller of another type or from another manufacturer because you can't get the exact type any more, and you find that the new controller does things slightly differently, leading to a corrupt array. Fun times.
 
Until you slot in a RAID controller of another type or from another manufacturer because you can't get the exact type any more, and you find that the new controller does things slightly differently, leading to a corrupt array. Fun times.

Software RAID ftw ;)
 
What if you are running Raid 1 and get infected with a nasty virus.
Immediately both drives are infected.

I believe that running any kind of Raid by itself with no additional backup actually increases the likelihood that you will lose data compared to a single drive because of added complexity, higher likelihood of accidental formats due to unfamiliarity with raid card utility etc.
 
I like your spelling of "parity"...which is actually highly appropriate in this case.

As is mentioned in other threads including this one, RAID is not a backup.

I stopped reading the moment I saw that in the OP...

You'll realize the difference between redundancy and back-up the moment your controller dies, and the remaining drive(s) (even in a RAID-1 environment) is unreadable on anything but that controller...

Poof, your data's now unreadable...

Controller dies = RAID array dies.

My apologies, someone beat me to it...
 