Maybe I'm dense, but if huge drives in a RAID statistically fail before they can be rebuilt, how do they get built in the first place without errors?
Don't worry about it; it's really paranoid people who worry about drives statistically failing during rebuilds because of their size... by that logic, a person would have a drive failing every week.
I guess it depends though, Ockie. The guy from the article might be off on his numbers, but if you are a business, these probabilities matter because they help determine your maintenance costs over the next year(s). If you start approaching 100% odds of a rebuild failure, it may end up costing you weeks of downtime trying to get the array back online. It is a risk assessment.
In terms of the home, I guess it really is coming to the point where the average somewhat competent but still ignorant user no longer gets to put a half dozen drives in RAID 5 and have an awesome chance of success even if a drive fails down the road. I think we are going to see more and more people who set up RAID 5 arrays and have failures pop up on the forums as HDD density increases.
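To put rough numbers on that rebuild-failure risk, here is a back-of-the-envelope sketch in Python. It assumes the commonly quoted consumer spec of one URE per 10^14 bits, treated as an independent per-bit probability, and that a rebuild reads every surviving drive end to end; real drives and controllers will differ.

```python
import math

# Rough odds that a RAID 5 rebuild hits at least one unrecoverable
# read error (URE). Assumes the commonly quoted consumer spec of one
# URE per 1e14 bits read, applied independently per bit, and that a
# rebuild reads every surviving drive end to end. Illustrative only.

URE_PER_BIT = 1e-14

def rebuild_failure_odds(drive_tb: float, n_drives: int) -> float:
    """P(at least one URE) while reading the n-1 surviving drives."""
    bits_read = drive_tb * 1e12 * 8 * (n_drives - 1)
    return -math.expm1(bits_read * math.log1p(-URE_PER_BIT))

for drive_tb, n in [(0.5, 4), (1.0, 4), (2.0, 6), (2.0, 8)]:
    print(f"{n} x {drive_tb} TB RAID 5: "
          f"{rebuild_failure_odds(drive_tb, n):.0%} chance of a URE "
          f"during rebuild")
```

Under those assumptions a 4 x 1 TB array comes out around a 21% chance and an 8 x 2 TB array around 67%; not literally 100%, but high enough that the business-risk framing above makes sense.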
That's why SAS drives are there; enterprise systems should use SAS, not SATA, to store business-critical data.
If you do use SATA for your data in business, be sure to have a darn good, proven backup system.
I think Kipper may have said a gem without realizing it. Most of the people here seem to realize RAID isn't a backup solution. Correct me if I am wrong, but RAID 5 is meant to let you lose one drive and keep running temporarily in a degraded state. I think the problem is when people rely on RAID 5 rebuilds as the only means of backup. That, coupled with larger drive sizes and the URE rate factored in, equals disaster. I also read that since it takes so long to rebuild an array, you have a far better chance of a second drive failing due to the stress of long periods of high usage. While that seems to make sense, I didn't find any facts to back it up, so if anyone has any, I would be interested.
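No stress-test data here either, but the baseline is easy to estimate: with a memoryless exponential failure model, the odds of a second drive dying outright inside the rebuild window look like the sketch below. The MTBF, rebuild time, and drive count are assumed figures, and the model deliberately ignores any extra stress from the rebuild itself.

```python
import math

# Baseline odds that a *second* drive dies outright during the
# rebuild window, using a memoryless (exponential) failure model.
# The MTBF, rebuild time, and drive count are assumed figures, and
# any extra stress from the rebuild itself is deliberately ignored.

MTBF_HOURS = 750_000      # assumed spec-sheet MTBF per drive
REBUILD_HOURS = 24        # assumed rebuild window for a big array
SURVIVORS = 7             # e.g. an 8-drive RAID 5 minus the dead one

p_one = -math.expm1(-REBUILD_HOURS / MTBF_HOURS)          # per drive
p_any = -math.expm1(SURVIVORS * math.log1p(-p_one))       # any drive

print(f"P(second whole-drive failure during rebuild) ~ {p_any:.3%}")
```

That baseline comes out tiny (hundredths of a percent), so if rebuilds really do kill second drives at a noticeable rate, the stress multiplier rather than the raw MTBF would have to be doing the work.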
The interface of a drive has little to do with how reliable it is. Seagate makes the same physical drive with either a SAS or SATA interface; would you argue that one is more reliable than the other?
You know the answer as well as I do: they're called backups. And RAID is NOT a backup.
Agreed, but if your business depends on uptime, backups are not the end-all be-all here, and some level of RAID (or another uptime/replication mechanism) is necessary.
Running backups is damn important, but that's not what I see as the problem here. If you have a 20 TB RAID 5 array and it goes down, restoring from tape, even at 100 MB/s, will still take 200k seconds = two months! Even if you are restoring from another 20 TB array that can transfer over the network at 10GigE speeds (1 GB/s), that *still* takes 20k seconds = five days! RAID is sufficient to keep such an array functioning while disks are resilvered, if it's used right (single-digit-sized RAID 6 groups are a good rule of thumb), but most businesses can't deal with five days of downtime to get the data back online, let alone two months.
And I'm sure one drive somewhere probably does fail every week...
All things aside, it's a possibility, and one that manufacturers and people in the business seem to care about a lot more than you or I do.
It definitely makes for some interesting theorycrafting, though.
I just don't see what the interface has to do with reliability in the context of the article.
How could you say that my 1TB Seagate ES2 SATA drive is less reliable than my 1TB Seagate ES2 SAS drive? That's completely irrational.
Uhm, yeah, your math is a little off =). 20k seconds is ~6 hours, not 5 days... you forgot there's more than one hour in a day =p.
Whoops, yeah. Divide the numbers by 24 and they're more reasonable. The point stands, though: even with backups there can be significant downtime when an array fails.
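For reference, the corrected arithmetic as a quick sketch (pure unit conversion, same 20 TB figure):

```python
# Corrected restore-time arithmetic for the 20 TB example:
# time = data / throughput, then seconds -> hours and days.

DATA_MB = 20 * 1e6   # 20 TB expressed in MB

for label, mb_per_s in [("tape at 100 MB/s", 100),
                        ("10GigE at 1 GB/s", 1000)]:
    seconds = DATA_MB / mb_per_s
    print(f"{label}: {seconds / 3600:.1f} hours "
          f"({seconds / 86400:.1f} days)")
```

So call it two-plus days from tape and roughly six hours over 10GigE; painful, but not months.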
On the SATA URE issue: the unrecoverable error rate is there because the bits are stored on a hard drive. All drives, regardless of interface, get unrecoverable errors. SAS drives are usually quoted at one error per 10^15 bits, compared to 10^14 for SATA, because of stronger ECC or better firmware that somehow prevents errors. This makes them safer, perhaps, but still not completely reliable.
As for what this article would look like written about the SAS interface: about the same. SAS and SATA both have decent checksums on every block sent over the wire, so corruption on the wire isn't usually a big problem.
The solutions to this problem, IMO, are double parity of some sort and software-based checksums on the OS side. ZFS gives me both and I'm very happy with it.
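To make the "software checksums on the OS side" idea concrete, here is a toy sketch of the principle ZFS uses: checksum every block on write, verify it on read, and fall back to redundancy on a mismatch. This shows the idea only, not ZFS's actual code or on-disk format.

```python
import hashlib

# Toy ZFS-style end-to-end checksumming: store a checksum alongside
# each block on write, verify it on read, and treat a mismatch as a
# known-bad block to be rebuilt from parity or a mirror.

def write_block(data: bytes) -> tuple[bytes, bytes]:
    return data, hashlib.sha256(data).digest()

def read_block(data: bytes, checksum: bytes) -> bytes:
    if hashlib.sha256(data).digest() != checksum:
        raise IOError("checksum mismatch: rebuild from redundancy")
    return data

block, csum = write_block(b"important data")
print(read_block(block, csum))            # clean read passes

corrupted = b"importent data"             # silent bit rot on disk
try:
    read_block(corrupted, csum)
except IOError as exc:
    print("caught:", exc)                 # corruption is *detected*
```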
Right. Even with checksums in place, HDs will always have errors; the key is how the controller handles them. So the bottom line is that with a good controller that does ECC/checksumming, rebuilding an array is entirely possible, and any errors that do happen will be handled.
So in the end, RAID 5 still has a happy home.
Define "good controller"
So all the low-dollar cards and mobo RAID are no good for RAID 5 anymore, maybe? I mean, it seems like a game of chance if you're running a controller with no ECC or some type of system built in to overcome an error.
That article about drive errors was extremely misleading. The unrecoverable read error rate as referenced, 10^14, is a drive manufacturer's spec, which is spindle-based - NOT the total amount of data the RAID system must read (from multiple drives) during a rebuild. So, every 12 terabytes read from any given drive will result in one URE. Your RAID5 system will be just fine.
Sorta like saying condoms are 99% effective and thus you can have sex 99 times before they will fail. It just doesn't work that way.
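That analogy is the right intuition. If you read the 10^14 spec as an independent per-bit probability rather than a countdown timer, the numbers come out like this (a sketch under that independence assumption):

```python
import math

# Reading the 1-per-1e14-bits URE spec as an independent per-bit
# probability rather than a countdown timer: over 12 TB read, the
# *expected* number of UREs is about 1, but the chance of hitting
# at least one is well short of a sure thing.

URE_PER_BIT = 1e-14
bits = 12e12 * 8                          # 12 TB in bits

expected = bits * URE_PER_BIT
p_at_least_one = -math.expm1(bits * math.log1p(-URE_PER_BIT))

print(f"expected UREs in 12 TB read: {expected:.2f}")
print(f"P(at least one URE):         {p_at_least_one:.0%}")
```

So 12 TB of reads gives roughly 62% odds of a URE, not a guaranteed one; both "your RAID 5 is doomed" and "it will be just fine" overstate the math.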
Let's assume "rebuild" means using the same set of drives and replacing the failed one. Simply "recovering" an N-1 RAID 5 is as simple as copying all the files to another set of HDs. That's a once over read of all the space on each of the disks. And you guys are saying if the array is too large, simply reading from all these drives will result in a drive failure? BS.
If so, why the hell wouldn't these arrays fail during normal use? Surely accesses over normal use amount to over 100x the amount during a rebuild.
Bit errors (flipped bits) are irrelevant to RAID 5 because they cannot be detected or corrected. You'll get them in a working array or an N-1 array, so files may become corrupt regardless. RAID 5 protects against a hard drive failure.
But I don't know how sound this guy's numbers are.
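One small illustration of the parity point above: RAID 5 parity is an XOR across the stripe, which can rebuild a block that is known to be missing but cannot say which block silently flipped. A toy sketch, not any real controller's implementation:

```python
# Toy RAID 5 stripe: parity is the XOR of the data blocks. A block
# that is *known* to be missing can be rebuilt, but a silent bit flip
# only shows up as a stripe-wide inconsistency; parity alone cannot
# say which block went bad.

def xor_blocks(*blocks: bytes) -> bytes:
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)

a, b, c = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks(a, b, c)

# The drive holding b dies: rebuild it from the rest of the stripe.
assert xor_blocks(a, c, parity) == b

# A silent flip in c: a scrub sees the stripe no longer XORs to
# zero, but nothing says whether a, b, c, or parity is the bad one.
c_corrupt = b"CCCD"
print("stripe inconsistent:",
      xor_blocks(a, b, c_corrupt, parity) != bytes(4))   # True
```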