Ubuntu Alpha Breaking Hardware?

HardOCP News

Engadget is reporting that the latest Ubuntu alpha is killing Intel network cards by corrupting the NVRAM used to store MAC addresses. According to the article, the ISO hasn’t been pulled but a warning has been added.

Well, it looks like the good times that are the Ubuntu alpha testing process hit a bit of a snag recently, as one of the latest kernels apparently had the nasty side effect of irreparably damaging some users' hardware.
 
Well, it's alpha, not even beta, so bugs are to be expected. I wouldn't touch an alpha; betas I sometimes use if they seem to work alright.
 
It's not even just Ubuntu, it's any distro running the 2.6.27-rc1 kernel. So Mandriva 2009.0 RC2, openSUSE 11.1 beta, and Fedora are having issues as well. Here is a comment from one of the Linux Intel NIC driver developers:

"I work on the e1000 team (including the e1000e driver) and here is what we know. A panic in another driver (believed to be the gfx driver but uncertain) which scribbles over the NIC/LOM non-volatile memory (NVM). This is only happening with the 2.6.27-rc kernels on ICHx systems. Since the NIC/LOM NVM is part of the whole BIOS image other things in the system could be affected by this driver panic as well. An update of the system BIOS will restore the NIC/LOM to be operational. We have some patches under test right now that we will be releasing later today to protect the NIC/LOM NVM. That should help narrow down who is scribbling over NVM."

Seems like you're SOL if you run this kernel on an Intel board that is using an Intel NIC. You should be fine if you are using a non-Intel NIC.

I hope they release a patch ASAP.
 
Well, it's good to know that by updating the BIOS everything goes back to normal.

So, again the media exaggerates the whole thing...:rolleyes:
 
You know I wonder if this affects other kernels as well...

Reason I ask is the day before yesterday I ran a routine yum update on one of my offsite dedicated servers, and later that evening the server went completely dead to the world. After troubleshooting by the techs they determined that the machine was segfaulting on reboot and it was the NIC that was causing it. It was an Intel NIC and they replaced it with a Realtek and everything has been fine since.

Kernel version on this box is 2.6.18-92.1.13.el5 (CentOS 5)
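
For comparing a release string like the one above against the 2.6.27 series named earlier in the thread, a rough sketch follows. The helper names `base_version` and `in_reported_series` are made up for illustration, and a matching base version only means the kernel belongs to the same series; it says nothing about whether a given distro build carries the bug or a fix.

```python
import re

# Base version of the release-candidate series named in the thread.
REPORTED_SERIES = (2, 6, 27)


def base_version(release):
    """Return the leading major.minor.patch numbers of a kernel release string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def in_reported_series(release):
    """True if the release string belongs to the 2.6.27 series (hypothetical check)."""
    return base_version(release) == REPORTED_SERIES
```

On a live box the string could come from `platform.release()` in the standard library. The CentOS kernel quoted above parses to 2.6.18, so it falls outside the series the e1000 developer described.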
 
Did you know 40-70% of all Intel NICs in the wild are counterfeit? Did you know NewEgg is also NOT an authorized Intel NIC reseller?

After picking up a simple PCI Intel NIC from good old reliable NewEgg and having the physical RJ-45 connector separate from the card, breaking it, I learned these facts from Intel themselves. An Intel representative told me this is a very real problem and that the end user has ZERO chance of telling the fake from the real: the fakes are made at the same factories, use the same stickers (fancy glowy and shiny ones too), the serial numbers are the same format, and all the silk screenings are the same.

How do they do it? Intel places an order for 50,000 cards and the factory instead makes 75,000... and sells those 25,000 on the black market, all the while billing Intel for about 55-60 thousand, so they don't get too suspicious about costs.

The extra cards, however, made during the 'third shift', use less of the expensive and necessary materials to keep the card together, hence more frequent failures, such as my failed connector. The end result is that Intel eats the cost or gains a fabulously angry customer who feels he is being duped by Intel, not by some shady operator far away.

My point? This reported issue might be amplified by counterfeit cards, who knows, maybe ONLY counterfeit cards are affected, or maybe only genuine cards are...detailed investigating must occur.

NewEgg apologized profusely for the inconvenience and replaced the card to save me the trouble. Intel refused to replace the card as a matter of policy, but bumped me up to high levels of management to brief me on the issue. I was fortunate to speak with someone who was just as vexed by this widespread and growing problem.
 
Boy, why am I not surprised? Oh, right, because I was making the argument to people for protected memory space for buffers and enforced boundaries - which would have prevented this incompetence - in 1996. Back then I asked the question, why can an unrelated driver write to the memory space of MY driver? Answer; because it's easier that way for some drivers. Okay, and you can't give them a specific shared buffer space and then give my NIC a separate buffer space they can't write to? Answer; no, because it's hard work and makes us change old drivers! So fine, why can't boundaries be enforced? Answer; because we don't want to. Adding boundaries is actually fairly simple, though you have to do it per-driver because of the API mess only a crack-addict could try to like.

Linux is the only 'major' operating system that has this particular idiocy, by the way. FreeBSD, NetBSD, OpenBSD, Windows (via HAL), BSD/OS, SCO, and the list goes on - they all have boundaries and separate buffer spaces so you can't do things like this.

Nice to see that people still haven't gone to Kmart and picked up the cluebucket. Was on blue light special back then, too.

Oh, and yes, I've nuked other NICs; this is far from the first time Linux has done this level of stupid. When you're working with the MAC on any relatively modern NIC, you can rewrite your MAC Address and write NVRAM. The EtherExpress family has a bit to prevent this in some chips; they likely aren't setting it after loading the TCP Checksum magic.
 
Dude, that's almost every HardOCP headline, we should all know this by now. :rolleyes:

I'll see your :rolleyes: and raise you two :rolleyes: :rolleyes:


For the record, the news headlines are copied verbatim from their source. As always, I forward your concerns on to the original authors of the story in question. Once there, they will file it appropriately, I'm sure.
 
Thanks, appreciate it! :D
 
...

again with the headlines...

This is a problem with beta kernels, NOT Ubuntu. The problem was first spotted in a pre-release of a SUSE distro, and every distro or custom build that uses the (not yet fully released) 2.6.27 kernel has the potential to have this issue.

Right, so people have never bricked their mobo by flashing beta BIOS images?
Or how about Windows users running software with StarForce, which has been known to damage CD drives?


Software that damages hardware isn't something isolated to Linux (so please stop spreading FUD). It is something that occurs when inadequate specs are provided or specs are violated.
 
Don't complain about the headlines that are posted here completely out of context, go complain to the webpages that post these sensationalistic headlines. :rolleyes:
 
Heh, that made me laugh :)

....because God only knows that a Headline + Added Question Mark + Paragraph Explanation + Paragraph Quote of the original article + Link to actual article = "completely out of context."
 
You know what, you're right. The only thing I can possibly be sore about is the reproduction of sensationalistic headlines here, which again you guys are just reproducing. What alternative is there? Even though they are still the same headlines (which is how news outlets of all types troll for more viewers/hits), it isn't like you are further embellishing them unless deliberately trolling in your commentary (hey, it happens and you guys have a very strong personal voice that isn't above making some fun or taking some shots). I mainly remember times when a story that was proven untrue in the comments (i.e., the BSOD at the NIN concert that wasn't an actual BSOD, it was just a part of the show) was followed up by another article going deeper into the same topic. Maybe you didn't read the comments from the earlier thread, but it wouldn't be the first time it's happened in an example like that.

Either way, apologies for pinning everything on the news hounds. It's just too bad, the stuff you're given to work with. Most news headlines by nature are created expressly to generate controversy and gain hits.
 
Seems things are moving forward.
Intel (the main [if not only in most cases, they are very OSS friendly] contributor to these and other Intel drivers in the kernel source) has pushed a patch that disables writing to the EEPROM, thus allowing debugging of what is actually triggering this to commence.

http://lkml.org/lkml/2008/10/1/368
This patch is meant to prevent all future corruptions of the
e1000e NVM (non volatile memory) after the driver is loaded. The
registers stay locked until the machine is power cycled.

This should allow us to move forward with debugging without
allowing any other bad element or the e1000e driver, to write to
the NVM area unexpectedly.
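
The behaviour described in that patch note (writes blocked once the lock is set, lock cleared only by a power cycle) can be pictured with a toy model. This is only a sketch of the idea, not the e1000e code: the class `LockableNvm` and its methods are invented for illustration, and the real patch works on hardware registers rather than a Python object.

```python
class LockableNvm:
    """Toy model of non-volatile memory with a lock that holds until power cycle."""

    def __init__(self, size):
        self._cells = bytearray(size)  # contents survive a power cycle
        self._locked = False           # lock state does not survive a power cycle

    def lock(self):
        # Stand-in for the driver locking the NVM at load time.
        self._locked = True

    def write(self, offset, value):
        # Any writer, well-behaved or not, is refused while the lock is set.
        if self._locked:
            raise PermissionError("NVM is write-locked until the next power cycle")
        self._cells[offset] = value

    def read(self, offset):
        return self._cells[offset]

    def power_cycle(self):
        # Data is kept because the memory is non-volatile; only the lock resets.
        self._locked = False
```

In this model a stray write after `lock()` raises an error instead of silently corrupting the stored bytes, which is roughly why locking helps narrow down who is doing the scribbling.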
 