So what went wrong with BD?

Stoly
Supreme [H]ardness
Joined: Jul 26, 2005
Messages: 6,713
Is it the architecture?
the shared cache?
the longer pipeline?
the manufacturing process?
the shared FP?
all of the above?
 
Is it the architecture?
the shared cache?
the longer pipeline?
the manufacturing process?
the shared FP?
all of the above?

I see nothing wrong with BD... in fact I believe it's about the best piece of tech to come along. AMD has a champ here, but trolls can't grasp why it's NOT killing SB... just give this tech time and you will see how it will perform... more cores is the future...
 
I see nothing wrong with BD... in fact I believe it's about the best piece of tech to come along. AMD has a champ here, but trolls can't grasp why it's NOT killing SB... just give this tech time and you will see how it will perform... more cores is the future...

AMD HAS A CHAMP? DUDE! Lay off the brown acid! How deluded can a mammal be? Geezus!

BD SUCKS. END OF STORY. :(
 
I see nothing wrong with BD... in fact I believe it's about the best piece of tech to come along. AMD has a champ here, but trolls can't grasp why it's NOT killing SB... just give this tech time and you will see how it will perform... more cores is the future...

Did you even read any reviews? :rolleyes:

I honestly have no idea what happened. Double the transistors, bigger die, more cache, more cores, and this still did not beat the 2600K. Even the 1100T held its ground against it in many areas and went past it in others. Just epic fail. I am sure someone like Anand will do a write-up about what happened.
 
AMD HAS A CHAMP? DUDE! Lay off the brown acid! How deluded can a mammal be? Geezus!

BD SUCKS. END OF STORY. :(

AMD modules are the way of the future, Intel fanboys can't see this; heck, I believe in the future we will have 3 cores sharing a module. Why? The future is highly parallel code, apps, games...

but TROLLS... will be TROLLS... so I can't help you guys see the future. I, for one, am waiting on Interlagos...
 
Did you even read any reviews? :rolleyes:

I honestly have no idea what happened. Double the transistors, bigger die, more cache, more cores, and this still did not beat the 2600K. Even the 1100T held its ground against it in many areas and went past it in others. Just epic fail. I am sure someone like Anand will do a write-up about what happened.

Nah. Dude's got AMDZone blinders on. Maybe he and Baron Matrix can go off somewhere and keep telling each other how great FX is and leave us sane people alone.
 
More expensive to manufacture, performance comparable to the previous generation. Sounds like a big waste of R&D funds for a more expensive product.
 
AMD modules are the way of the future, Intel fanboys can't see this; heck, I believe in the future we will have 3 cores sharing a module. Why? The future is highly parallel code, apps, games...

but TROLLS... will be TROLLS... so I can't help you guys see the future. I, for one, am waiting on Interlagos...

I'm waiting for you to produce a psychiatric evaluation. Baron called and said he's got candy. Now run off...
 
AMD modules are the way of the future, Intel fanboys can't see this; heck, I believe in the future we will have 3 cores sharing a module. Why? The future is highly parallel code, apps, games...

but TROLLS... will be TROLLS... so I can't help you guys see the future. I, for one, am waiting on Interlagos...

I think we can all agree on who's the troll in this thread. I hear we need 8 cores to run today's and tomorrow's games.
 
They should have called modules cores and cores modules. A quad-core, 8-module (or 8-thread, if you will) Bulldozer would have compared more favorably.
 
I'm serious about the question. On paper nothing seems to be wrong with BD, but it looks like the opposite of what it should be: slower, more power-hungry, and less efficient.
 
Wrong... it's actually less expensive to produce.

If they had been more conservative and used a more traditional architecture, they would have gotten away with fewer transistors for the same performance, which would have been dramatically cheaper.

Of course I'm not personally interested in this segment, since I don't like leaf-blower systems anymore.
 
Hmm... where did Kyle put that button... oh, here it is: "PERMANENTLY IGNORE ALL DELUDED IRRATIONAL RABID AMD FANBOIS AND SHILLS." Click. Bye.:)
 
I'm serious about the question. On paper nothing seems to be wrong with BD, but it looks like the opposite of what it should be: slower, more power-hungry, and less efficient.


IPC per core is lower than AMD's K10, just like hyper-threading gives lower IPC per thread. Now, the IPC per module is quite good, and if you were to completely re-write software from the ground up, optimized for parallel threads instead of per-thread IPC, it wouldn't be such a disaster.


Bulldozer = New Coke
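
To make the "optimized for parallel threads" point concrete, here's a toy sketch (POSIX threads; the array size and thread count are arbitrary illustration values) of the throughput-over-IPC style of workload a module design is betting on:

Code:
/* Toy sketch of "optimize for parallel threads instead of per-thread
   IPC": eight workers each sum a slice of an array. POSIX threads;
   the sizes and thread count are arbitrary illustration values. */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define N_THREADS 8
#define N_ITEMS   (1 << 22)

static int data[N_ITEMS];
static long long partial[N_THREADS];

static void *worker(void *arg)
{
    long id = (long)(intptr_t)arg;
    long chunk = N_ITEMS / N_THREADS;
    long long s = 0;
    for (long i = id * chunk; i < (id + 1) * chunk; i++)
        s += data[i];
    partial[id] = s;   /* each worker writes its own slot: no locking */
    return NULL;
}

int main(void)
{
    pthread_t t[N_THREADS];
    long long total = 0;

    for (long i = 0; i < N_ITEMS; i++) data[i] = 1;
    for (long i = 0; i < N_THREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)(intptr_t)i);
    for (int i = 0; i < N_THREADS; i++) {
        pthread_join(t[i], NULL);
        total += partial[i];
    }
    printf("total = %lld\n", total);  /* expect 4194304 */
    return 0;
}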
 
I'm serious about the question. On paper nothing seems to be wrong with BD, but it looks like the opposite of what it should be: slower, more power-hungry, and less efficient.

It's new tech, buddy... think about how the old RISC companies were laughing at the idea of Intel-based workstations in the '90s. Well, guess what? Now supercomputers are built from COTS parts, so RISC is dead... AMD has a champ on their hands. They knew, and I knew, this was about to happen, but everything has a beginning, and this new tech had to be let loose in the wild. With time the module approach will be the ONLY way to go; we will have 20-core desktop products in a few years, and they will cost only $200 while Intel charges $5k for its 10-core server chips...
 
If they had been more conservative and used a more traditional architecture, they would have gotten away with fewer transistors for the same performance, which would have been dramatically cheaper.

Of course I'm not personally interested in this segment, since I don't like leaf-blower systems anymore.

I am sure AMD saw that option too... but they knew that way they could only ever match Intel. What AMD did with BD is create a new tech so radical and so new that NOBODY will dare compete with it. Why?

In a few years there will be 20-core desktop CPUs with 4 cores sharing a BD module... that makes things so cheap for AMD; all the software guys need to do is write apps for it. How is Intel going to compete with 20 cores without charging $10k for one? I can't see how they could do that with OLD CPU tech...
 
The chip falls around 2500K performance. Its main problem is its power draw, and that's largely the fab's fault.

So hopefully, as they fix the process, power usage will come down and performance will stay the same.
 
I am sure AMD saw that option too... but they knew that way they could only ever match Intel. What AMD did with BD is create a new tech so radical and so new that NOBODY will dare compete with it. Why?

In a few years there will be 20-core desktop CPUs with 4 cores sharing a BD module... that makes things so cheap for AMD; all the software guys need to do is write apps for it. How is Intel going to compete with 20 cores without charging $10k for one? I can't see how they could do that with OLD CPU tech...

Because if it takes 20 cores to do what Intel can do with half the amount, that's pretty inefficient if you ask me.
 
Is it the architecture?
the shared cache?
the longer pipeline?
the manufacturing process?
the shared FP?
all of the above?

It's a combination of issues.
  • GloFo hasn't perfected their new process yet, I don't think; I hear through the grapevine that Llano is having problems too.
  • The design is odd and a departure from anything we've seen in this particular market, except maybe Intel with NetBurst. The modules on Bulldozer follow a monolithic design philosophy where each module shares 2MB of L2 cache.
  • The throughput advantage, a slim one at that, cannot be realized with how the Windows 7 task scheduler assigns processes and threads to cores (haphazardly). It is beneficial if two related processes/threads/forks/children occupy the same module and share the same L2, because Bulldozer allows dynamic resource allocation between threads (see the sketch at the end of this post).
  • Some sort of unknown tying-up of resources/thrashing seems to be occurring.
  • The retire logic, branch prediction, prefetch, or unified scheduler seems to have given up some IPC.
  • This processor isn't meant for the desktop by nature. Most applications are still very heavily dependent upon single-threaded performance, something eight cores or sixty-four cores won't really help solve, because sometimes in computing it is very difficult to vectorize or parallelize a problem. More cores = not helping.
  • Performance of BD is erratic at best. Why did it do well in the second pass of the x264 encode when it was being hammered, yet on certain tasks it chokes, and Thuban and Deneb actually sometimes tie or beat it in the metrics?
  • Longer pipeline, and complexity in having to juggle a lot of complicated stuff.
  • I think everyone suspects that circuit power gating isn't working the way it ought to.

The design, while brave and a walk in the right direction, is very short-sighted. I believe the frame of mind was, "We can combine common functionality between two cores and minimize silicon area to free up die real estate so we can go and put more cores on a die."

That's fine and dandy (good to be the leader in technology and pave ahead), but something is seriously messed up with single-threaded performance and even multithreaded performance.

The only thing we can hope for is that, while fixing whatever issues there are with Bulldozer (production, design, or otherwise), they have the kindness to just go ahead and shrink Thuban to a 32nm process in the meantime.
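
For what it's worth, here is a minimal sketch of the module-affinity point above (plain Win32; the assumption that logical CPUs 0 and 1 are the two halves of module 0 is mine, I haven't seen it documented):

Code:
/* Pin two communicating threads onto the same Bulldozer module so they
   share one L2, instead of letting the Win7 scheduler scatter them.
   The mapping of logical CPUs 0,1 to module 0 is an assumption. */
#include <windows.h>
#include <stdio.h>

static DWORD WINAPI worker(LPVOID arg)
{
    /* ...work on data shared with the sibling thread... */
    printf("worker %d running\n", (int)(INT_PTR)arg);
    return 0;
}

int main(void)
{
    HANDLE t[2];
    int i;

    for (i = 0; i < 2; i++) {
        t[i] = CreateThread(NULL, 0, worker, (LPVOID)(INT_PTR)i,
                            CREATE_SUSPENDED, NULL);
        SetThreadAffinityMask(t[i], 0x3); /* mask 0x3 = CPUs 0 and 1 */
        ResumeThread(t[i]);
    }
    WaitForMultipleObjects(2, t, TRUE, INFINITE);
    for (i = 0; i < 2; i++) CloseHandle(t[i]);
    return 0;
}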
 
Because if it takes 20 cores to do what Intel can do with half the amount, that's pretty inefficient if you ask me.

In today's software state? Yes... but the future will be more, simpler cores, each working on highly parallelized apps, just like relentless ants...
 
I am sure AMD saw that option too... but they knew that way they could only ever match Intel. What AMD did with BD is create a new tech so radical and so new that NOBODY will dare compete with it. Why?

In a few years there will be 20-core desktop CPUs with 4 cores sharing a BD module... that makes things so cheap for AMD; all the software guys need to do is write apps for it. How is Intel going to compete with 20 cores without charging $10k for one? I can't see how they could do that with OLD CPU tech...


Yes, so the real question is, how many modules can they stuff into a single package? Could AMD put 16 cores into a single package? I would think so, and the performance would be very compelling, especially for the cost, at least in the IT space.
 
That's fine and dandy (good to be the leader in technology and pave ahead),
Somebody had to do it; if Intel could not, AMD had to. I mean, Apple hardware is dead, as dead as the RISC CPU is, but I bet if it was Apple doing it, nobody would be bitching about it...
 
Yes, so the real question is, how many modules can they stuff into a single package? Could AMD put 16 cores into a single package? I would think so, and the performance would be very compelling, especially for the cost, at least in the IT space.

Well, that's Interlagos for you, but technically it's two 4-module dies glued together... :D

I believe this world is heading for highly parallel computers and AMD is leading the way with its module tech... it's just new tech, with very few apps taking advantage of its power...
 
In today's software state? Yes... but the future will be more, simpler cores, each working on highly parallelized apps, just like relentless ants...

This requires developers to start thinking about designing things in terms that take advantage of this architecture. In its current state, I don't see people moving over to this new way of thinking without it first gaining significant market penetration.

There is simply no reason to buy one of these things right now, so it becomes a chicken-and-egg problem because developers will not waste time writing software for anything other than the hardware platforms the vast majority of their client base uses. I know the use of things like .NET or Java or whatever other JIT/portable code whachamadgiggery can somewhat lessen the impact, but for high-performance computing there will still be a great deal of manual tweaking which will not get done if the weird new architecture runs like a dog regardless.

I guess we'll just look forward to Piledriver and see if that does any better, because I simply don't see BD making much of a difference at the current time.
 
I see nothing wrong with BD... in fact I believe it's about the best piece of tech to come along. AMD has a champ here, but trolls can't grasp why it's NOT killing SB... just give this tech time and you will see how it will perform... more cores is the future...

Is that you AMD_Gamer?

Did you create this account just in case BD sucked?
 
AMD HAS A CHAMP? DUDE! Lay off the brown acid! How deluded can a mammal be? Geezus!

BD SUCKS. END OF STORY. :(

I think he's trolling, but he may be on to something.

AMD screwed the pooch, big time. They tried to introduce a major new architecture AND a new process at the same time. There is a reason why Intel adopted its tick-tock strategy and only does one of the two in any given year.

By all accounts there is nothing wrong with the BD arch itself. It is intentionally designed with a long pipeline, which reduces IPC somewhat but allows higher clocks.

The problem is the 32nm process. Global Foundries' yields and process maturity are likely preventing BD from reaching the clocks that AMD had planned by launch time, because each clock increase requires too much voltage. This is a typical symptom of a process yield issue.

We see this even more in the fact that the low-clocked Opteron 6200 series parts have very good power usage and are competitive with Intel's offerings.

Code:
Model		Cores	Frequency	TDP		Pre-order price
Opteron 6204	4	3.3 GHz		115 Watt	$516.13
Opteron 6212	8	2.8 GHz		115 Watt	$303.17
Opteron 6220	8	3.0 GHz		115 Watt	$588.93
Opteron 6234	12	2.3 GHz		115 Watt	$430.00
Opteron 6238	12	2.5 GHz		115 Watt	$516.13
Opteron 6262 HE	16	1.6 GHz		85 Watt		$588.93
Opteron 6272	16	2.1 GHz		115 Watt	$588.93
Opteron 6274	16	2.2 GHz		115 Watt	$720.17
Opteron 6276	16	2.3 GHz		115 Watt	$881.22
Opteron 6282 SE	16	2.6 GHz		140 Watt	$1135.26

It is also notable that FX seems to use a TON of power when overclocked.

This means three things.

1.) AMD is getting poor yields from an immature process.

2.) The best parts (stable at the lowest voltages) are being binned as Opterons.

3.) Process yields mean that any clock increase comes with an exorbitant voltage and power penalty.


In Q1 2009, when the Phenom II X4 first launched, the highest clock cherry-picked samples could hit on extreme cooling was 4.2GHz. Fast forward less than two years, and by December 2010 it was the norm for people to hit 4.2 with X4s on air.

Seeing that current cherry-picked samples are hitting 8.5GHz with extreme cooling tells us something about where the arch may be by 2013. Add to this that a 10% IPC gain ought to come each year from Piledriver -> Excavator.

Suddenly it makes so much sense why BD has been delayed so many times, and why we heard rumors this summer that AMD engineers were unhappy with the clocks they were getting on test samples.

AMD was hoping that Global Foundries' process would mature more quickly than it did. We don't know how bad it was back in June, or how far it has come since then, but it is clear today that it didn't mature fast enough.

Hopefully if this speeds up though, we'll see a quick ramp-up in CPU speeds to the point where BD is a little more competitive.
 
somebody had to do it, if intel could not, AMD had to, I mean Apple Hardware is dead, as dead as the RISC cpu is, but I bet if it was Apple doing it, nobody would be bitching about it...

RISC design strategy is alive and well.
Just ask anyone with a smartphone. Which is exactly why Apple plans to jump on it and Microsoft has Windows 8 with ARM support out of the gate. ARM is a RISC-based ISA.
 
I guess we'll just look forward to Piledriver and see if that does any better, because I simply don't see BD making much of a difference at the current time.

For desktop? Yes... but that's for noobs; I couldn't care less about desktop use or even gaming (PS3 for that). AMD is going to gain a lot from this new tech in HPC systems, cloud computing, server consolidation, and high-performance clusters... I'll use an AMD APU for personal computing...
 
RISC design strategy is alive and well.
Just ask anyone with a smartphone. Which is exactly why Apple plans to jump on it and Microsoft has Windows 8 with ARM support out of the gate. ARM is a RISC-based ISA.

I was talking about computers, not toys... :D I seriously doubt ARM will be taking on AMD in HPC systems... not in the next 20 years at least, but I see a future of thousands of little ARM cores in a desktop using no more than 10 watts... but that's too far away now... maybe by then Atom will be as small and use as little power as ARM does...
 
I just threw together this chart based on power usage from various tests on the internet, as well as AMD's published figures.

It illustrates exactly the problem AMD is dealing with:

[Chart: power draw per core vs. clock speed for FX and Opteron 6200 parts]


The top data point was from a review at 4.818GHz.

The two clustered together at 4.6 are the [H] tests. I have done some rough calculations to try to compensate for the fact that these are whole system tests.

The rest are AMD's TDP numbers for the full Opteron 6200 and FX ranges divided by core count.

This chart shows very clearly how steeply, almost exponentially, power climbs as clock increases. As the process matures, the curve should become flatter and allow much higher clocks. (There's a crude model of this at the end of this post.)

The one outlier that messes this up seems to be the 4-core Opteron. Maybe they are betting that corporate purchasing isn't going to look into the TDP of these, as they are likely workstation chips, and corporate IT usually only looks carefully at TDP for server farms. That knowledge may have been used when binning the 4-core Opteron.
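
If anyone wants to play with the shape of that curve, here is a crude first-order model (dynamic power taken as P ~ C*V^2*f with an assumed voltage "knee"; every constant below is made up for illustration, none of it is GloFo data):

Code:
/* Crude model: past a process knee, each extra GHz costs extra voltage,
   and power goes as V^2 * f, so it blows up fast. All constants are
   illustrative guesses, not measured values. */
#include <stdio.h>

int main(void)
{
    const double knee_ghz = 3.6;   /* assumed frequency knee                */
    const double v_base   = 1.2;   /* assumed nominal core voltage (volts)  */
    const double v_slope  = 0.25;  /* assumed extra volts per GHz past knee */
    const double k        = 20.0;  /* lumped switching constant (made up)   */

    for (double f = 3.0; f <= 5.01; f += 0.25) {
        double v = v_base + (f > knee_ghz ? (f - knee_ghz) * v_slope : 0.0);
        double p = k * v * v * f;  /* P ~ C * V^2 * f */
        printf("%.2f GHz  %.3f V  %6.1f W\n", f, v, p);
    }
    return 0;
}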
 
Here are my thoughts:

On paper, Bulldozer has 66% of the decoder bandwidth when all cores in a module are loaded, and each core has 66% of the integer throughput. This works out to more like 90% of Thuban performance per core in the real world (and that is what we tend to see in integer-heavy stuff).
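
Rough math on where those 66% figures come from, at least as I read the public specs (one shared 4-wide decoder per module versus 3-wide decode per K10 core, and 2 integer ALU pipes per BD core versus 3 on K10; treat the widths as my assumptions):

Code:
/* Back-of-envelope check of the 66% claims above. Widths are my reading
   of the published Bulldozer/K10 specs, so treat them as assumptions. */
#include <stdio.h>

int main(void)
{
    double bd_decode_per_core = 4.0 / 2.0; /* 4-wide decoder shared by 2 cores */
    double k10_decode         = 3.0;       /* 3-wide decode per K10 core       */
    double bd_alu             = 2.0;       /* integer ALU pipes per BD core    */
    double k10_alu            = 3.0;       /* integer ALU pipes per K10 core   */

    printf("decode per core vs. K10: %.0f%%\n",
           100.0 * bd_decode_per_core / k10_decode);
    printf("ALU pipes vs. K10:       %.0f%%\n",
           100.0 * bd_alu / k10_alu);
    return 0;
}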

In FP-heavy loads (say, Cinebench 3D rendering) the gap widens because a Bulldozer module has somewhere around half the FP assets of two Thuban cores. There is no fix for this.

The reason for the massive power consumption? I believe that fault lies squarely in the cache. With 16MB in big cache blocks (8MB of L2 plus 8MB of L3), this has more cache than any previous AMD desktop CPU, and it also raises AMD's cache-per-core ratio to 2.0MB for the first time ever (Istanbul was the previous high at 1.5MB/core). I have to believe this was necessary to keep the deep pipelines of Bulldozer continuously fed (remember how cache-hungry the Pentium 4 was?). I think AMD saw performance tanking with less cache, so they designed in mountains of it.

Cache uses up a ton of transistors and a ton of power, and I believe this is the cause of the massive die size and massive dynamic power. And since AMD is hitting the power wall, they can't scale the clock speed like they hoped they could with the deep pipeline. So you have features designed to work together that are actually working against each other. Intel learned this the hard way with the P4, and I'm astounded that AMD made a similar mistake.
 
It's a combination of issues.
  • GloFo hasn't perfected their new process yet, I don't think; I hear through the grapevine that Llano is having problems too.
  • The design is odd and a departure from anything we've seen in this particular market, except maybe Intel with NetBurst. The modules on Bulldozer follow a monolithic design philosophy where each module shares 2MB of L2 cache.
  • The throughput advantage, a slim one at that, cannot be realized with how the Windows 7 task scheduler assigns processes and threads to cores (haphazardly). It is beneficial if two related processes/threads/forks/children occupy the same module and share the same L2, because Bulldozer allows dynamic resource allocation between threads.
  • Some sort of unknown tying-up of resources/thrashing seems to be occurring.
  • The retire logic, branch prediction, prefetch, or unified scheduler seems to have given up some IPC.
  • This processor isn't meant for the desktop by nature. Most applications are still very heavily dependent upon single-threaded performance, something eight cores or sixty-four cores won't really help solve, because sometimes in computing it is very difficult to vectorize or parallelize a problem. More cores = not helping.
  • Performance of BD is erratic at best. Why did it do well in the second pass of the x264 encode when it was being hammered, yet on certain tasks it chokes, and Thuban and Deneb actually sometimes tie or beat it in the metrics?
  • Longer pipeline, and complexity in having to juggle a lot of complicated stuff.
  • I think everyone suspects that circuit power gating isn't working the way it ought to.

The design, while brave and a walk in the right direction, is very short-sighted. I believe the frame of mind was, "We can combine common functionality between two cores and minimize silicon area to free up die real estate so we can go and put more cores on a die."

That's fine and dandy (good to be the leader in technology and pave ahead), but something is seriously messed up with single-threaded performance and even multithreaded performance.

The only thing we can hope for is that, while fixing whatever issues there are with Bulldozer (production, design, or otherwise), they have the kindness to just go ahead and shrink Thuban to a 32nm process in the meantime.

Really well put, 100% agree with everything you said, especially the last paragraph. Or they could just remove the GPU from Llano and unlock the multipliers.
 
I also think it's a fabrication process screw-up. Possibly they used some cheap material to save money that hurt the electrical characteristics of the chip, causing the poor power numbers.

Price-wise, at launch it looks like a decent CPU on paper. I think the FX-8100 and FX-6100 will do better in reviews because of their lower price and lower TDP.
 
Bulldozer's design is quite radical, and it just didn't work out in the consumer market. A new architecture plus a new process node at the same time is just asking for disaster... Intel specifically avoids this with their tick-tock system.
 
I think the architecture makes a good server chip now and will make a good desktop chip later. I have plans for a BD-based server once we get to the next stepping or so.

I compare it to the Athlon 64 when it was released. The chip's killer app was not realized until a couple of years after its release, with XP 64-bit.
 