OCZ SSDs

jen4950

[H]F Junkie
Joined
Apr 25, 2001
Messages
11,460
I bought into OCZ big time a while ago. Hey - RAID 50 with 12 drives. I'd need big problems before I'd need to worry - like a 33% failure rate before I'd sweat (of course, worst case it's 8% in RAID 50 and 16% in RAID 60, if you're really paranoid).
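Those worst/best-case percentages are easy to sanity-check. A quick Python sketch, assuming the 12 drives are split into 2 parity groups of 6 - the post doesn't spell out the exact span layout, so the group count here is an assumption:

```python
# Back-of-envelope fault tolerance for nested RAID levels.
# ASSUMPTION: 12 drives split into 2 parity groups (spans) of 6;
# the actual span layout isn't stated in the post.

def worst_case_tolerance(total: int, spans: int, parity_per_span: int) -> float:
    """Fraction of drives that can fail even if every failure lands in the
    same span (one more than parity_per_span in a span kills the array)."""
    return parity_per_span / total

def best_case_tolerance(total: int, spans: int, parity_per_span: int) -> float:
    """Fraction of drives that can fail if the failures spread evenly,
    hitting each span no more than parity_per_span times."""
    return spans * parity_per_span / total

# RAID 50 (2 x 6-drive RAID 5): each span survives 1 failure,
# so only 1 of 12 (~8%) is guaranteed safe.
print(f"RAID 50 worst case: {worst_case_tolerance(12, 2, 1):.0%}")
# RAID 60 (2 x 6-drive RAID 6): each span survives 2 failures,
# so 2 of 12 (~16%) guaranteed, up to 4 of 12 (~33%) if they split evenly.
print(f"RAID 60 worst case: {worst_case_tolerance(12, 2, 2):.0%}")
print(f"RAID 60 best case:  {best_case_tolerance(12, 2, 2):.0%}")
```

The gap between worst and best case is why a RAID 60 layout can ride out a bad run of failures that would kill RAID 50 outright.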

I bought 1 x OCZ Vertex 3 Series MAX IOPS on 10/28/2011
I bought 5 x OCZ Vertex 3 Series MAX IOPS on 11/17/2011
I bought 6 x OCZ Vertex 3 Series MAX IOPS on 11/23/2011

I put them in RAID 50 - worked well for a while. Damn fast. No drive got over about 86F.

[Attached image: SSD_Preview_20111129.jpg]


And then I had 2 drives fail.

I bought 2 more, haven't sent the dead drives in yet.

I bought 2 x OCZ Vertex 3 Series MAX IOPS on 3/2/2012

In the past 3 weeks (5/28/2012 to 6/7/2012), 7 of these remaining drives have failed.

I lost my boot drive array about 4 times, and finally dropped the bad drives out of the array. I now have 2 x 4-drive RAID 6 arrays striped in RAID 0 (RAID 60) with a hot spare - and I still triggered the hot spare today.

I'm using (2) Thermaltake chassis to handle the (12) SSDs - well, 9 active right now out of the 14 drives total.

I'm running an LSI MegaRAID 9265-8i and Chenbro CK23601 Expander card. I had 2 of those Chenbro cards too- the first one was DOA. This is all in a Dell Precision T7500 with proper airflow and on a big UPS.

Not too good of a track record with pretty good hardware.

Luckily I have good and current backups (important data was offloaded long ago when I noticed trouble), so I can just rebuild the array and reload the image.

It was good while it lasted. Now I am going to see if I can work my way out of it.

I'd like to hear from other OCZ patrons. I've invested heavily and would like to realize some value from the warranty- but if it's not worth my time and effort, I'll have to re-evaluate my approach.
 
Last edited:
Jen-
9 out of 14 drives failing in a 3-week period would lead me to look elsewhere than the drives themselves. Take the backplanes out of the equation temporarily and try another power cable tree. Have you updated all the drives to at least firmware 2.15? Are the drives operable out of the array after they drop, or are they completely unresponsive?
 
It's 7 out of 12 this week. Is that better? (LOL) And same failure period as before.

All drives are on firmware 2.15.

They drop out during heavy I/O and don't show back up until about 3 reboots, and then you have to rebuild the array with hot spares. They don't show up until after the power cycles - I'm not going to test them off my RAID card and out of the chassis.

I've watched temperatures closely within the chassis - nothing over about 90F. I've rotated connectors and have also used 2 new chassis, so I've tried (4) of the Thermaltake 6-bay units.

At the end of the day- it's the drives.

The kicker is everything runs dandy for 3 months - and then the wheels fall off without any intervention from us.
 
I had some problems with some early V3 MAX IOPS drives. Try something if you can: use the toolbox to try to secure erase one of them. Does it complete, or does it kick back a "Drive Frozen, cannot be erased" error? Unfortunately, with some of the higher-end RAID cards (LSI, Areca), OCZ SSDs seem to have issues in general. We have found that for SSD arrays with MLC drives we have the best success with Intel drives. For the stuff we really care about, we are using 400GB SLC SAS Toshiba MK4001GRZB drives. God-awful expensive, but we haven't been able to wear any out (and there are extremely high writes of over 1TB/day to some we are using as D2D2T interposers).
 
So are these really known for issues? I have two of them in production, one on my HTPC, which hardly gets used so not a huge deal, the other on my workstation. So far so good. *touch wood*
 
I had some problems with some early V3 MAX IOPS drives. Try something if you can: use the toolbox to try to secure erase one of them. Does it complete, or does it kick back a "Drive Frozen, cannot be erased" error? Unfortunately, with some of the higher-end RAID cards (LSI, Areca), OCZ SSDs seem to have issues in general. We have found that for SSD arrays with MLC drives we have the best success with Intel drives. For the stuff we really care about, we are using 400GB SLC SAS Toshiba MK4001GRZB drives. God-awful expensive, but we haven't been able to wear any out (and there are extremely high writes of over 1TB/day to some we are using as D2D2T interposers).

I bought 4 Intel 520 Drives today- so hopefully I'll be able to test the V3 drives (soon) once they are out of commission.
 
So are these really known for issues? I have two of them in production, one on my HTPC, which hardly gets used so not a huge deal, the other on my workstation. So far so good. *touch wood*

I've survived with RAID 60 - buyer beware otherwise!
 
So are these really known for issues? I have two of them in production, one on my HTPC, which hardly gets used so not a huge deal, the other on my workstation. So far so good. *touch wood*

I never had issues in motherboard-based 2-drive R0, but I have had dropouts with 4 and 4+ drive R0 on Areca and LSI HW R6 adapters. We started with V3 MAX IOPS drives and had the most problems with those, but we also had problems with m4s. We went to Intels at the time and the problem was resolved.
 
I can't find it right now, but there's a site out there that uses retail return data from a decent-sized European retailer to trend reliability data for a range of computer components, and OCZ is always 2-3x worse than the next-worst manufacturer and 10-20x worse than the best.

Basically OCZ is garbage. Corsair and Intel have far better reliability numbers; even with Intel's 8MB bug on the 320 line earlier this year, they were still way more reliable than OCZ.
 
Basically OCZ is garbage.

The OP's troubles are not unique, and they're why warnings about OCZ SSDs have existed on this forum and others for years now. Unfortunately, some people end up learning the hard way, or get too tempted by the fire sales and deep rebates (even before the recent SSD price drops) without bothering to question WHY OCZ might be so desperate that they're undercutting everyone else. They've become the de facto "disposable SSD" brand, and provided the company stays in business long enough to honor your warranty on an RMA, it's not the end of the world - just be very diligent about keeping up-to-date image-based backups.

My condolences to the OP for such a hefty investment in this company and if you do get replacements back on any still within warranty I would sell them immediately. For lulz you can try copy/pasting the same opening post on the OCZ forum and see how long it stays up before a mod deletes it. Granted maybe things have changed over there but historically they've nuked/censored posts about drive failures, claiming it is "negativity". Another of many reasons for the continuing exodus.
 
Oh here it is:

http://www.behardware.com/articles/862-7/components-returns-rates-6.html

And apparently it's Corsair, not Crucial, that has the other really high reliability.

Granted, it's returns, not failures, but as the article says at the beginning:

The first question is of course where the stats come from. They’re taken from a large French etailer, whose database we have had direct access to. We were therefore able to extract the stats we wanted directly from source.

Under what conditions is a part declared as defective by this etailer? There are two possible cases: either the technician considers the exchange of information with the client (type of problem, cross testing) sufficient to declare that the product isn’t working, or there’s a question mark over the component and the etailer tests it to check if it’s working or not.

Among the returns that aren’t tested, some of the components announced as having an issue by customers probably aren't actually defective, in spite of the precautions taken by the technician. This is something inherent in the etailing sector and in practice, it’s unlikely that any model or product is more affected by this phenomenon than any other (at least we’re aware of no objective argument that shows this).

Of course, these statistics are limited to the products sold by this particular etailer and the returns made to it. Sometimes returns are made to the manufacturer itself, particularly with storage, but this represents a minority in the first year.

There’s no other way of obtaining more reliable statistics and, while not perfect, at least our system allows us to give you some indication of reliability.
And according to their returns, even with the 8MB bug, Intel SSDs were still returned almost 5x less often than OCZ's average, with much of the OCZ line having 6-15% return rates compared to Crucial's 0.8%. Another point of reference: prior to this cycle and the 8MB bug, Intel's return rates were 0.3% for May 2011 and 0.1% for Nov 2011.

Had you bought that many Intel drives, you'd have had like a 1% chance of having problems, but since you bought that many OCZ drives, you pretty much guaranteed you'd have issues.
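The "pretty much guaranteed" part can be put in rough numbers. A sketch using the ballpark return rates quoted from the behardware.com article, treating drives as independent (which real batches often aren't, so take it as an illustration only):

```python
# Chance of at least one defective drive in a 14-drive purchase, as a
# function of per-drive return rate. The rates below are the ballpark
# figures quoted from the behardware.com article, not vendor numbers.

def p_at_least_one_bad(n_drives: int, return_rate: float) -> float:
    """P(>=1 return), assuming independent, identical per-drive rates."""
    return 1.0 - (1.0 - return_rate) ** n_drives

for label, rate in [("Intel (0.3%)", 0.003),
                    ("Crucial (0.8%)", 0.008),
                    ("OCZ low (6%)", 0.06),
                    ("OCZ high (15%)", 0.15)]:
    print(f"{label:>15}: {p_at_least_one_bad(14, rate):.0%}")
```

Even at the low end of the OCZ range, buying 14 drives makes at least one bad one more likely than not; at the high end it's nearly certain.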

 
@OP

I'm not even trying to give you shit. Fact is, OCZ drives do tend to look all shiny and neat, they're cheap, they seem fast at least in benchmarks, they've got shiny ads like, EVERYWHERE, and they even have "OC" in the name, so they must be for overclockers, right?

But really their products are more like those exploding snake cans of nuts, except instead of a snake, it's shit.

If one person comes to these forums and searches for OCZ reliability, finds this thread, and buys ANYTHING ELSE, then it will be worth it for me. Not for you obviously cause you got fucked in the ass, but for me.
 
Well- I figured with 12 drives in RAID 50 a failure or two would not be that big of a deal. And performance is exceptional when it is working.

Seven bad drives is a different story.

Jumping ship over to Intel drives.
 
Well, everything considered, having 7 drives fail in the exact same manner in roughly the same time frame is pretty statistically significant. I mean, has anything else changed, like an OS auto-update? Anything internal/external?
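To put a rough number on "statistically significant": even granting each drive a pessimistic 10% chance of failing on its own in that window (an assumed figure, not a measured rate), 7+ independent failures out of 12 would be wildly improbable, which is what points at a common cause. A sketch:

```python
from math import comb

# How unlikely would 7+ INDEPENDENT failures out of 12 drives be?
# ASSUMPTION: a deliberately pessimistic 10% per-drive failure chance
# over the window, just to show even that can't explain the cluster.

def p_at_least_k(n: int, k: int, p: float) -> float:
    """P(>= k successes in n independent Bernoulli(p) trials)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

prob = p_at_least_k(12, 7, 0.10)
print(f"P(>=7 of 12 fail independently at 10% each): {prob:.2e}")
```

The result is on the order of one in tens of thousands, so a shared factor (firmware, controller interaction, power) is the better bet than 7 independent drive deaths.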
 
There is a newer firmware, v2.22 I believe. Also, you may want to run the temperature fix that comes on the Linux CD. Basically, they report the wrong temperature, causing RAID cards to drop them.
 
I ran the temp fix; I'll try the new firmware.
 
Well, everything considered, having 7 drives fail in the exact same manner in roughly the same time frame is pretty statistically significant. I mean, has anything else changed, like an OS auto-update? Anything internal/external?

It's been in the same machine, same cabinet with forced air cooling, and same OS.
 
Well, speaking of failures, I think mine is starting to fail too. At least I'm pretty sure it's the drive. Twice yesterday, all my programs just randomly stopped responding. For some reason it was always triggered by saving a Word document to my network. It's still too early to tell if it's the drive, and it hasn't done it again since, but basically I had to do a hard reboot. I could still move the mouse and try to open programs, but it would just sit there thinking forever, and every app I tried to start using would go "not responding". Time to order an Intel, I guess.
 
If you Ctrl-Alt-Del during the wait, does Task Manager come up immediately, or does it pause until the rest of the system responds? Do you see an escalating total error count in SMART? Are you logging into a WS2008 domain? Any errors in Event Viewer on the client machine?
 
Even Task Manager would not come up, and the Start menu did not work - I could see it was trying to load, but it just sat there with the spinning wheel. I'm logging on to a Samba domain, which I think my version is equivalent to NT4, so it's old school. No errors in Event Viewer. I can't seem to retrieve any SMART info in Windows; I'm using a program called DiskCheckup, and it says it's unavailable. I will try booting with a Linux CD to see if it lets me from there.
 
Try CrystalDiskInfo for Windows.
 
Seems my drive does not support SMART. Even in Linux it just says it's unsupported. I could not find any diagnostic info in CrystalDiskInfo, it's basically a benchmark. I ran it and it went fine.

Seems the issue only happens randomly, guess it has to hit the right part of the drive or something.
 
I haven't heard many complaints about their Vertex 4's, but can understand those who are hesitant to try them again.
 
Seems my drive does not support SMART. Even in Linux it just says it's unsupported. I could not find any diagnostic info in CrystalDiskInfo, it's basically a benchmark. I ran it and it went fine.

Seems the issue only happens randomly, guess it has to hit the right part of the drive or something.

CrystalDiskMark is the benchmark; CrystalDiskInfo is the SMART app. I honestly can't think of a SATA SSD that has been released that doesn't support SMART. Are you directly connected to a motherboard port, or are you connected to a RAID HBA? (That can interrupt getting SMART values on some devices.)
 
I haven't heard many complaints about their Vertex 4's, but can understand those who are hesitant to try them again.

They're also what, a few weeks old, max? Why on earth would you buy a new product from a company with such a HORRIBLE track record? "Well, I know that in the past they've put out garbage and shit on customers, but maybe THIS time, with a completely new and untested controller of their own design, things will be great!"

For me, it would take several years of them having BETTER reliability than Intel before I'd even consider buying one of their products. I'll pay more for a product from a company that measures its failure rates in X-in-1000 terms rather than one that measures them in X-in-20.
 
CrystalDiskMark is the benchmark; CrystalDiskInfo is the SMART app. I honestly can't think of a SATA SSD that has been released that doesn't support SMART. Are you directly connected to a motherboard port, or are you connected to a RAID HBA? (That can interrupt getting SMART values on some devices.)

Oh OK, I must have hit the wrong link when I googled CrystalDiskInfo. I found the right one now. Odd that smartctl in Linux did not work, though. But anyway, this is the result:

 
There is nothing glaringly wrong here. There are no hard failures on any of the usual suspects, and soft errors are well within normal. You haven't written all that much to the drive at all. While SMART is in no way perfect - I have seen drives that report perfect SMART stats completely flake out - I would try the following first:
Back up your drive to one or more backup targets (you can never be too safe)
Secure erase the drive
Update the firmware to the latest version
Restore the drive
See if that resolves the problem.
If not, RMA it and try again :)
 