mdadm software raid problems on Debian

starfury6 · May 31, 2006

Hey all,

I have a server running Debian that did have a software RAID 1 array of 2 x 250GB SATA disks on an onboard Silicon Image 3112 controller. I am running the latest 2.6 kernel, for reference. About 3 months ago I had a disk fail (it was marked faulty by mdadm monitor) with ATA I/O errors, and then about 30 minutes later the other disk started to have the same errors. I shut the machine down and was able to recover the data from the drive which hadn't been marked faulty. I figured it wasn't out of the realms of possibility for them both to go wrong, since they were the same drive model from the same manufacturer and had almost identical serial numbers.
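For anyone following along, here is a rough sketch of how a member "marked faulty" shows up in /proc/mdstat, where the md driver appends "(F)" to the failed device. The sample text, array name and device names below are made up for illustration and are not taken from this server.

```python
import re

# Illustrative /proc/mdstat content (hypothetical array and device names).
SAMPLE = """Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0](F)
      293032960 blocks [2/1] [_U]

unused devices: <none>
"""


def failed_members(mdstat_text):
    """Return {array_name: [members flagged (F)]} parsed from mdstat text."""
    result = {}
    for line in mdstat_text.splitlines():
        # Array lines look like: "md0 : active raid1 sdb1[1] sda1[0](F)"
        match = re.match(r"^(md\d+)\s*:\s*(.*)$", line)
        if not match:
            continue
        name, rest = match.groups()
        # A faulty member carries an "(F)" suffix right after its slot number.
        result[name] = re.findall(r"(\S+?)\[\d+\]\(F\)", rest)
    return result


if __name__ == "__main__":
    for array, failed in failed_members(SAMPLE).items():
        print(array, "faulty members:", failed or "none")
```

On a live box the same function could be fed the contents of /proc/mdstat instead of the sample string.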

So, not trusting them, I purchased 2 RAID edition drives (300GB ones this time), one Seagate and one Maxtor, and recreated my array using these new drives. They proved faster and quieter, and were also put in IcyDocks to keep them cool.

Last night the EXACT same thing happened to the drives again. To me this is far too much of a coincidence.

My plan is to fire up the server with the disk marked faulty removed and copy the contents off onto another machine on the network. Then I can play around with things.

I obviously need to find the problem. When the original problem occurred I was using Debian Sarge. I switched to Debian Etch in the hope that newer programs and drivers would solve the problem, but alas they haven't. I can only guess that it's either the motherboard or Debian itself which is having problems.

Anyone else have any suggestions as to the cause and a solution?

Thanks guys. (and girls)

Bones · May 31, 2006

I haven't had any issues with the Silicon Image 3112 and md drivers in any recent kernel. I suspect the hardware is faulty. Try different cables first, then a different controller.

ameoba · May 31, 2006

bad power?

starfury6 · Jun 1, 2006

Not likely the power, it's a Tagan TG480-U22 480W ATX2.01. It's running the following:

Athlon XP 2400+
1GB TwinMOS DDR333
120GB Maxtor ATA133
300GB Maxtor SATA
300GB Seagate SATA
32MB ATI Radeon AGP4 (slow passively cooled model)
and whatever is onboard the motherboard: 2 port SiI3112, IDE etc.
3 x Icy Docks for the HDs.

Shouldn't be the problem.

I think, like you said, the controller might be faulty. I have a friend who has a card he can lend me to try. In the meantime I'm going to stick the drive marked faulty in my Fedora box, get mdadm to --assume-clean and see what happens. I'm sure the disk is fine. I can then get the data off. I think I will not RAID them up again in the server, but mount both drives normally and rsync the data from one to the other every night.
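The nightly copy could look something like the sketch below. It only wraps the standard rsync -a (archive) mode, with --delete and --dry-run as options; the mount points /mnt/disk1 and /mnt/disk2 are hypothetical and would need changing to wherever the two drives actually end up mounted.

```python
import subprocess


def build_rsync_command(source, destination, delete=False, dry_run=False):
    """Build the rsync argument list for a one-way mirror of source into destination."""
    # A trailing slash on the source makes rsync copy the directory's
    # contents rather than the directory itself.
    source = source.rstrip("/") + "/"
    command = ["rsync", "-a"]
    if delete:
        # Also remove files from the destination that no longer exist in the source.
        command.append("--delete")
    if dry_run:
        # Only report what would be transferred; change nothing.
        command.append("--dry-run")
    command += [source, destination]
    return command


def mirror(source, destination, **options):
    """Run the mirror and raise CalledProcessError if rsync exits non-zero."""
    subprocess.run(build_rsync_command(source, destination, **options), check=True)


if __name__ == "__main__":
    # Hypothetical mount points, shown as a dry run so nothing is modified.
    print(" ".join(build_rsync_command("/mnt/disk1", "/mnt/disk2/", dry_run=True)))
```

Run from cron once a night, this gives a copy that is at most a day old. Leaving --delete off means a file removed by mistake from the first drive still exists on the second until it is cleaned up by hand.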

What do you think?

Bones · Jun 1, 2006

The main benefit of RAID 1 is uptime during a drive failure... if that is not a priority, rsync should work fine for you. You may want to keep the drives in separate machines on your network, to reduce the impact of future hardware failure.

starfury6 · Jun 1, 2006

Unfortunately I might have to keep them in the same machine for the short term. The others on the network are powered down at night and I'd like to keep doing so.

Still... I might be forced to if I have continuing problems...

starfury6 · Jun 2, 2006

I can't get the server booted up because the gfx is screwed. I borrowed another card for a minute (the wife wouldn't let me keep it!) and it was fine, so it appears the gfx card is knackered. A few times I had rebooted the server for something and it came up with corrupted gfx; I just put it down to residual images in the frame buffer, and after a few restarts it would go away. Now it's totally screwed.

Knowing how wonderful the PC hardware architecture is, can a GFX card spewing crap out on the PCI bus cause problems for other devices? If so it could explain my previous problems. I put one of the original drives which I thought were toast when it originally occurred in my Fedora box (which I had to reinstall last night, not a good week) and it seems to be fine, suggesting my 2 new ones are okay too.

I'm going to order a new card for the server before I start screwing with it, just in case the GFX card was causing other problems. It was only £16 so it's hardly worth trying to warranty it. (Connect3D ATI Radeon 7000)

Will keep you posted.

ameoba · Jun 2, 2006

starfury6 said:
Knowing how wonderful the PC hardware architecture is, can a GFX card spewing crap out on the PCI bus cause problems for other devices? If so it could explain my previous problems. I put one of the original drives which I thought were toast when it originally occurred in my Fedora box (which I had to reinstall last night, not a good week) and it seems to be fine, suggesting my 2 new ones are okay too.

Once you have one piece of bad hardware, it can potentially break everything. What's scarier is that one piece of bad hardware can potentially fry the rest of the system...
