AMD HyperThreading?

CentronMe said:
Are you calling a Pentium 4 a half-assed broken processor design? :p :D ;)


Just because someone has a patent does not mean they are ever going to use it.

Intel and AMD have patent-sharing agreements so they do not have to work around each other's designs and mess up the x86 market.

Ironic, isn't it?
 
why not have a type of hyperthreading that makes use of the extra 32 registers that go used in 32-bit mode?
 
CentronMe said:
Are you calling a Pentium 4 a half-assed broken processor design? :p :D ;)
Some of us can recognize junk when we see it.

Just because someone has a patent does not mean they are ever going to use it.
Not all of us waited for Intel or AMD to invent or patent that hack before we used it, no - we used it long ago though we never sold the hack to our customers as a "feature".

Elios said:
why not have a type of hyperthreading that makes use of the extra 32 registers that go used in 32-bit mode?
The AMD64 in your sig needed no explicit external support to function at maximum efficiency; it outperforms P4s even in their native 32-bit x86 mode.

Nope, it required no programmer performing HyperThreading acrobatics for efficient 32-bit operation; it is all done transparently. Though you can also use it in its native, non-fractional 64-bit mode.
 
nam-ng, the Pentium 4 is actually a really good architecture… well, it's called NetBurst, but the thing that cripples it is the pipeline…
 
Duke3d87 said:
nam-ng, the Pentium 4 is actually a really good architecture… well, it's called NetBurst, but the thing that cripples it is the pipeline…


its original NetBurst design was something to marvel at.... (from what I've heard)... but they fucked it all up to lower its price. (again... just going from what I've heard)
 
Jason711 said:
its original NetBurst design was something to marvel at.... (from what I've heard)... but they fucked it all up to lower its price. (again... just going from what I've heard)

For the most part:
786 was supposed to debut pretty much as we saw Northwood (and motherboards were supposed to support L3; L1 and the trace cache were intended to be larger as well; there was also some talk of fully independent FADD and FMUL, but I don't know how far into design that lasted).
But T-bird was running away with the performance crown, which back then was pretty much determined by MHz, so the business side gutted 786 to meet volume requirements with 180nm fabs.
They stripped off about 1/2 the total cache (in all forms) on the die, and the P4's reputation was born.
Which is really somewhat of a shame, because Northwood, the P4C in particular, was a very competitive chip for several years, and a lot of that is lost on people. There was no performance leader; no matter how much you want to bash the 786 design, it gave no ground to the Athlon XP for a year and a half.
 
Elios said:
why not have a type of hyperthreading that makes use of the extra 32 registers that go used in 32-bit mode?

You mean the 8 new registers?
Or the high DWORD (32 bits) of the existing registers?

The former might be interesting, but the problem is there are no extra FP registers to reproduce the architectural FP, MMX and SSE registers for two threads. (In fact K7 doesn't have enough FP registers to fully reproduce the 128-bit SSE registers, since the FPRF wasn't designed with SSE in mind; K8 had to add 32 new registers to support SSE.) So you'd have to significantly reduce the number of in-flight FP ops to supply both threads with enough architectural registers.
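As a concrete aside on what those "new registers" are: below is a minimal C sketch of my own (GCC inline asm; the value is arbitrary). It builds with gcc -m64 but not with gcc -m32, because r8-r15 simply have no encoding in 32-bit mode, which is exactly why 32-bit code can't put them to use, threaded or not.

Code:
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t out;
    /* Park a value in r8, one of the 8 GPRs (r8-r15) that AMD64 added.
       These registers have no encoding in 32-bit mode, so this only
       assembles in a 64-bit build. */
    __asm__ volatile ("movq $42, %%r8\n\t"
                      "movq %%r8, %0"
                      : "=r"(out)
                      : /* no inputs */
                      : "r8");
    printf("r8 held %llu\n", (unsigned long long)out);
    return 0;
}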
 
Elios said:
why not have a type of hyperthreading that makes use of the extra 32 registers that go used in 32-bit mode?

I assume you mean go unused in 32-bit mode?

That would be kind of stupid considering we do have these things called 64-bit OSes. Would we then not have this crappy implementation of HyperThreading in 32-bit mode only? AMD is pushing 64-bit computing as the way of the future, not HyperThreading.

Everyone get it through your heads: it is not ever going to happen on any AMD-designed CPU as long as they keep producing efficient chips. HT is used on P4s to make up for the inefficiency of the NetBurst design. I admit Northwood and Prescott are good, but only because Intel fixed their mistakes from ages ago, as FreiDOg already said above. But let's face it, we already know that Intel is going back to a 686-based Pentium M design for desktops come 2007. NetBurst just cannot scale any higher and give an improvement worth the cost of cooling the CPUs.
 
Elios said:
Yes, unused >.< typo, sorry
Do you suppose that AMD should include wasted, unused resources in newer designs to come? So as to propagate the P4's customized and near-useless efficiency hack for specific conditions, called HyperThreading?

The only reason HyperThreading exists for programmers is that by its very nature it is a non-general hardware optimization for specific conditions: it cannot be compensated for transparently by internal hardware functions, and outside those specific conditions it actually causes a performance penalty as the norm, plus a greater waste of even more resources.

It is really and truly hilarious when Intel finally optimizes HyperThreading's virtual fractional processors to outperform normal single-processor mode in all operating conditions. Broken becomes the norm and the norm becomes broken.

A processor that's only good as multiple partial fractional processors rather than being just itself.
 
Yes, HT is a hack that the P4 needs to keep its pipeline full.

I'm just thinking about how AMD can get more power out of what it already has, and since
mass use of a full 64-bit OS is 1-2 years away, why not put the extra registers to use in 32-bit mode? Unless I'm wrong and they already get used?
 
AMD will not get any additional performance by implementing HyperThreading. They have tweaked the K8 core about as much as they can, and only scaling will increase performance now.
 
nam-ng said:
Do you suppose that AMD should include wasted, unused resources in newer designs to come? So as to propagate the P4's customized and near-useless efficiency hack for specific conditions, called HyperThreading?

Don't you think a CPU should try to make the most use of the resources it has?
Do you prefer your CPU to sit idle while a fetch is made from system memory?
Do you enjoy waiting around for the next packet to be written to PCI memory?
Do you think it's a good idea for the FPU to twiddle its thumbs because the current thread doesn't need it?
Aren't these good uses of resources while other threads sit blocked waiting for their time slice? (See the sketch below.)
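Here's a toy C sketch of my own (nothing from Intel or AMD; the names are made up). One thread blocks on simulated I/O while another keeps the execution units busy; SMT applies the same trick inside a single core, at pipeline-slot granularity instead of OS-thread granularity.

Code:
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Thread that mostly waits, standing in for memory/PCI/disk latency. */
static void *io_bound(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10; i++)
        usleep(10 * 1000);      /* pretend to wait on a slow device */
    return NULL;
}

/* Thread that does pure computation, filling the otherwise idle time. */
static void *compute_bound(void *arg)
{
    (void)arg;
    volatile unsigned long sum = 0;
    for (unsigned long i = 0; i < 100000000UL; i++)
        sum += i;
    return NULL;
}

int main(void)                  /* build with: gcc -O2 -pthread overlap.c */
{
    pthread_t io, cpu;
    pthread_create(&io, NULL, io_bound, NULL);
    pthread_create(&cpu, NULL, compute_bound, NULL);
    pthread_join(io, NULL);
    pthread_join(cpu, NULL);
    puts("both threads finished");
    return 0;
}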
 
CentronMe said:
Basically correct.

The P4 has very high branch misprediction and a large number of pipelines. As we all know, the Prescott P4 has 31 pipelines, which will allow it to scale clock speed higher, but branch mispredictions go up with this. This is why it also has an increased L2 cache size. The K8 core, on the other hand, only has 12 data pipelines and very low misprediction because of this.

HyperThreading works by using the unused pipelines in the P4, since they can sit there unused a lot of the time. With the K8, all of the pipelines are generally being used all of the time, as the chip is a lot more efficient than the P4. So even if AMD did implement SMT, it would not do anything, because AMD is already using their CPU as efficiently as they can.

Saying "pipelines" you make it sound more parallel, like GPUs. It's more like an assembly line. That's why another word is usually attached, like "pipeline stages". Visually: 1 pipeline, or assembly line, with multiple stages.

It seems subtle, but it's a big difference. It's not "31 pipelines" but instead "31 pipeline stages". This is why 31 in this sense is often referred to as a "deep" or "long" pipeline, not several à la GPUs. Pretty sure I've got this right, not trying to nitpick. In this sense it can be easier to visualize the stalls in your pipeline (or assembly line), as opposed to a massive number of pipelines.
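If you want to feel what those flushed stages cost, here's a rough C sketch from me (a toy, not a proper benchmark; compile with gcc -O1, since higher optimization levels may turn the branch into a branchless cmov and hide the effect). The identical loop runs much slower over random data than over sorted data, because the data-dependent branch keeps getting mispredicted and the pipeline keeps getting flushed.

Code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 1000000

/* A loop whose branch direction depends on the data. */
static long sum_big(const int *a, int n)
{
    long s = 0;
    for (int i = 0; i < n; i++)
        if (a[i] >= 128)        /* taken ~50% of the time on random data */
            s += a[i];
    return s;
}

static int cmp_int(const void *p, const void *q)
{
    return *(const int *)p - *(const int *)q;
}

int main(void)
{
    static int data[N];
    for (int i = 0; i < N; i++)
        data[i] = rand() % 256;

    clock_t t0 = clock();
    long r1 = sum_big(data, N);     /* random order: many mispredictions */
    clock_t t1 = clock();

    qsort(data, N, sizeof data[0], cmp_int);

    clock_t t2 = clock();
    long r2 = sum_big(data, N);     /* sorted: branch is predictable */
    clock_t t3 = clock();

    printf("unsorted: %ld in %.0f ms, sorted: %ld in %.0f ms\n",
           r1, (t1 - t0) * 1000.0 / CLOCKS_PER_SEC,
           r2, (t3 - t2) * 1000.0 / CLOCKS_PER_SEC);
    return 0;
}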
 
I do not remember where I read it, but some guy was working on a feature set that would let any of the processor's registers become any other register, so repeated register commands would not have to make multiple passes to complete; it would change the next register to the needed one, increasing operations completed on every pass. I will try to find the article.
 
texuspete00 said:
Saying "pipelines" you make it sound more parallel, like GPUs. It's more like an assembly line. That's why another word is usually attached, like "pipeline stages". Visually: 1 pipeline, or assembly line, with multiple stages.

It seems subtle, but it's a big difference. It's not "31 pipelines" but instead "31 pipeline stages". This is why 31 in this sense is often referred to as a "deep" or "long" pipeline, not several à la GPUs. Pretty sure I've got this right, not trying to nitpick. In this sense it can be easier to visualize the stalls in your pipeline (or assembly line), as opposed to a massive number of pipelines.

You're correct... it's a 31-stage pipeline, not 31 pipelines.

It's a rather important difference.
 
FreiDOg said:
Don't you think a CPU should try to make the most use of the resources it has?
Is anyone designing hardware stupid enough not to do so? Is anyone designing hardware stupid enough to add the wasted, unused fuck-ups specific to another design, just so the same hack called HyperThreading could become a "supported feature" in Opterons?
Do you prefer your CPU to sit idle while a fetch is made from system memory?
Do you enjoy waiting around for the next packet to be written to PCI memory?
Do you think it's a good idea for the FPU to twiddle its thumbs because the current thread doesn't need it?
Aren't these good uses of resources while other threads sit blocked waiting for their time slice?
Those are problems inherent to classical Intel UMA/Symmetric Processing designs; they are not the same in AMD64/Opteron NUMA/Distributed Processing designs. AMD had no need to import hardware fuck-ups sold to idiots as "features" from Intel.

We didn't sell hacks that correct specific fuck-ups of our designs to our customers as "features"; besides... our customers were never that stupid.
 
nam-ng said:
Is anyone designing hardware stupid enough not to do so? Is anyone designing hardware stupid enough to add the wasted, unused fuck-ups specific to another design, just so the same hack called HyperThreading could become a "supported feature" in Opterons?

Those are problems inherent to classical Intel UMA/Symmetric Processing designs; they are not the same in AMD64/Opteron NUMA/Distributed Processing designs. AMD had no need to import hardware fuck-ups sold to idiots as "features" from Intel.

We didn't sell hacks that correct specific fuck-ups of our designs to our customers as "features"; besides... our customers were never that stupid.
Sounds like a bit of bias in there...

Bottom line, HT improves performance, who gives a damn why?

I don't like Rambus, but RDRAM worked nicely on the P4s. Their new design looks promising.

HT was either in the design from the start, which is my opinion, or added as an afterthought to improve performance, which is what many of you say here. Bottom line, it brings the IPC up on a P4 and closes the gap to the Athlon.
A marketing gimmick? Something like a 33% increase in video encoding is a damn nice marketing gimmick.

I prefer AMD; I own AMD systems, and one P4 laptop. To say the P4 is a better design I consider misleading at best; to say HT is a worthless gimmick is an outright lie. The P4 is one of Intel's biggest architectural screwups in my opinion, but it sold chips, which is what it was supposed to do. After HT appeared, by design or as an afterthought, the P4 became a contender. I suppose you'd rather they did nothing to correct the performance issues than fix them?

The other option is a total redesign, which would leave them in a bad situation for the duration... not a good option either.
 
0ldman said:
Sounds like a bit of bias in there...

Bottom line, HT improves performance, who gives a damn why?

See the original query below?

****************************

AMD HyperThreading?
I am wondering why AMD does not use HyperThreading, or their own version of it, on their processors. Wouldn't implementing some form of this help in multitasking and the other applications that the P4 takes advantage of? Would the shorter pipelines make the AMD perform slower, or is it just that Intel has it so they can't use it?

****************************

The original poster probably expected a lot of know-nothing ignorant dumbshit technical opinions, and maybe some real, actual facts and the whys to go with them.

I haven't much use for ignorant dumbshit technical opinions and had none to give, just some of the facts and whys he/she may have wanted. My facts were not for you, only for he/she who does give a damn about why.
 
nam-ng said:
Is anyone designing hardware stupid enough not to do so? Is anyone designing hardware stupid enough to add the wasted, unused fuck-ups specific to another design, just so the same hack called HyperThreading could become a "supported feature" in Opterons?

Well, Athlon had a real-world IPC of around 2.5 instructions dispatched per clock cycle. Athlon has 5 execution units; doesn't 50% usage seem a little, I don't know, wasteful to you?
If you believe exactly what AMD says, K8 gets about 25% higher IPC, for just over 3.1 instructions dispatched per clock. Better, but don't you think it's wasteful to let ~2 out of 5 execution units idle on average?
What if a hyperthreaded K8 could dispatch just 1 instruction per clock from a second thread?
That's 50% less idleness, just because it's hyperthreaded.
That's 30% more work getting done.
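Spelling that arithmetic out, taking AMD's claimed ~3.1 IPC at face value (so treat the numbers as illustrative): a 5-wide K8 leaves 5 - 3.1 = 1.9 issue slots idle per clock. One extra instruction per clock from a second thread cuts that to 0.9 slots (about 50% less idleness) and lifts throughput from 3.1 to 4.1 instructions per clock; 4.1 / 3.1 ≈ 1.32, roughly 30% more work.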


Those are problems inherent to classical Intel UMA/Symmetric Processing designs; they are not the same in AMD64/Opteron NUMA/Distributed Processing designs. AMD had no need to import hardware fuck-ups sold to idiots as "features" from Intel.

We didn't sell hacks that correct specific fuck-ups of our designs to our customers as "features"; besides... our customers were never that stupid.

No, they're inherent to superscalar pipelined CPUs.
SMP didn't cause it,
and NUMA sure as hell doesn't solve it.

The only thing NUMA does is keep most memory access penalties the same as in a 1P system. Which is very important in 8P, 4P, or even somewhat in 2P systems today, as programs get larger, require more data, and CPU speeds continue to scale faster than system bus speeds.
Just because SMT has the benefit of covering up a bit of the problems with SMP doesn't mean a NUMA architecture shouldn't use it. SMT benefits either; SMT benefits any modern CPU in a multitasking/multithreaded environment.
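For anyone curious what "keeping accesses local" looks like from the software side, here's a hedged sketch using Linux's libnuma (a later convenience library, nothing AMD-specific; link with -lnuma, and it assumes a NUMA-aware kernel with at least one node configured).

Code:
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* numa_available() must be checked first; < 0 means no NUMA support. */
    if (numa_available() < 0) {
        fprintf(stderr, "kernel reports no NUMA support\n");
        return 1;
    }

    size_t len = 64UL * 1024 * 1024;
    /* Place the buffer's pages on node 0: local for node-0 CPUs, a trip
       across the interconnect (e.g. HyperTransport) for everyone else. */
    void *buf = numa_alloc_onnode(len, 0);
    if (buf == NULL)
        return 1;

    memset(buf, 1, len);    /* touch the pages so they really get allocated */
    printf("placed %zu MiB on node 0 (highest node: %d)\n",
           len >> 20, numa_max_node());

    numa_free(buf, len);
    return 0;
}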

Believe it or not, even on Athlon,
there are still cache misses,
I/O devices are still slow,
and programs still have instruction-level dependencies.

Athlon doesn't have these 'fuck-ups', as you call them? Do you still not see there is a substantial benefit from using SMT?
 
FreiDOg said:
Do you still not see there is a substantial benefit from using SMT?

I agree; although some CPUs see greater benefit than others from HT (SMT), it could benefit most to some degree. Whether that benefit is worth the effort and cost is something the manufacturer must decide.

So far there doesn't seem to be a real push, on the AMD side, to institute SMT, but I don't doubt that they've tried it. It could be that it was already tested and the performance boost just wasn't enough to justify further development... we may never know.

I can say that in some circumstances on my 2.8, HT does improve things, and in others it seems to have no effect or slows things down a bit.
Honestly, I have a strong SMP bias, so multi-core CPUs really interest me; SMT doesn't impress me much. But I have found uses for both... I've seen great performance and response on terminal servers with SMP AND SMT together... a heavily multithreaded environment with a lot of time sharing on the CPUs.
 
FreiDOg said:
Well, Athlon had a real-world IPC of around 2.5 instructions dispatched per clock cycle. Athlon has 5 execution units; doesn't 50% usage seem a little, I don't know, wasteful to you?
If you believe exactly what AMD says, K8 gets about 25% higher IPC, for just over 3.1 instructions dispatched per clock. Better, but don't you think it's wasteful to let ~2 out of 5 execution units idle on average?
What if a hyperthreaded K8 could dispatch just 1 instruction per clock from a second thread?
That's 50% less idleness, just because it's hyperthreaded.
That's 30% more work getting done.
It is exactly as it should be, just as mine are the same. What have thoughts and beliefs to do with the facts of how things actually work?

No, they're inherent to superscalar pipelined CPUs.
SMP didn't cause it,
and NUMA sure as hell doesn't solve it.

The only thing NUMA does is keep most memory access penalties the same as in a 1P system. Which is very important in 8P, 4P, or even somewhat in 2P systems today, as programs get larger, require more data, and CPU speeds continue to scale faster than system bus speeds.
Just because SMT has the benefit of covering up a bit of the problems with SMP doesn't mean a NUMA architecture shouldn't use it. SMT benefits either; SMT benefits any modern CPU in a multitasking/multithreaded environment.

Believe it or not, even on Athlon,
there are still cache misses,
I/O devices are still slow,
and programs still have instruction-level dependencies.
You didn't know Simultaneous Multi-Threading (SMT) originated with NUMA/Distributed Processing? The "HyperThreading hack" was derived from that SMT origin. Even "Distributed Multi-Processing" used to be "Simultaneous Multi-Processing".

Athlon doesn't have these 'fuck-ups', as you call them? Do you still not see there is a substantial benefit from using SMT?
Didn't I already say previously above that not everyone waited for Intel or AMD to invent the HyperThreading hack?
>>>>>no - we used it long ago though we never sold the hack to our customers as a "feature"<<<<<

I have not used UMA/Symmetric Processing in any of my hardware for nearly 20 years, only NUMA/Distributed Processing.
 
nam-ng said:
It is exactly as it should be, just as mine are the same. What have thoughts and beliefs to do with the facts of how things actually work?

So you, who spent 2 pages railing against the P4 for being a wasteful and inefficient CPU, think letting 1/3-1/2 of the CPU idle is "exactly as it should be"?
Just for clarification, at what percentage of idle execution units does a CPU go from operating as a well-designed modern CPU should, to wasteful and inefficient?


You didn't know Simultaneous Multi-Threading (SMT) originated with NUMA/Distributed Processing? The "HyperThreading hack" was derived from that SMT origin. Even "Distributed Multi-Processing" used to be "Simultaneous Multi-Processing".

I guess EV8 was NUMA.
Though true NUMA access would only have existed in the local block. To get access outside the local block you still had to use a shared bus; a scalable bus, but a shared one.
Though the first commercially available CPU with SMT was the P4 Xeon.

To be fair though, SMT was started in the academic community (Susan Eggers at UW), pretty much without regard for the underlying architecture. The original simulated system was a single-CPU system, in fact. The bus architecture was not relevant to the idea of SMT. It only "originated" from a NUMA system because Alpha happened to pick it up first.

Really though, any way you look at it (NUMA, SMP, 1P), they all benefit nicely from SMT when the software environment is right for it.

Didn't I already say previously above that not everyone waited for Intel or AMD to invent the HyperThreading hack?
>>>>>no - we used it long ago though we never sold the hack to our customers as a "feature"<<<<<

I have not used UMA/Symmetric Processing in any of my hardware for nearly 20 years, only NUMA/Distributed Processing.

Really?
You were selling SMT-enabled chips years ago?
Do you have a link to any of those systems/chips?
Because the idea of on-chip simultaneous multithreading wasn't published until 1995.
And every resource I've ever seen has said the same thing: EV8 was the first to adopt it, and the P4 Xeon was the first to be sold with it.
 
It is exactly as it should be, just as mine are the same. What have thoughts and beliefs to do with the facts of how things actually work?
Where the hell does this even enter into the discussion? Does this even make sense to anyone?

This guy is talking from both sides of his mouth. In one sentence, benefiting from HT is a failure of the P4's design; in another, it is an ingenious step ahead of its time when his people did it; either way, it is a bad idea for the Athlon, as it is the perfect CPU.

I do believe we are the people holding the "know-nothing ignorant dumbshit technical opinions" he has such disdain for, and as such I'll take my opinion elsewhere. He isn't worth arguing with. You can have him, FreiDOg.

10-1 he has to flame this post as well...
 
0ldman said:
Where the hell does this even enter into the discussion? Does this even make sense to anyone?
That's normal; even my simplified-to-retard-level "jittered blending of adjacent pixels caused blurriness in V5's FSAA" took most ignorant dumbshits more than 6 months. "V5's FSAA = RGMS" took some retards more than 4 years, and even that long a duration of passing time was insufficient to overcome their ignorance and stupidity.

Some are born retarded and half the people in the world are below average.
This guy is talking from both sides of his mouth. In one sentence, benefiting from HT is a failure of the P4's design; in another, it is an ingenious step ahead of its time when his people did it; either way, it is a bad idea for the Athlon, as it is the perfect CPU.

I do believe we are the people holding the "know-nothing ignorant dumbshit technical opinions" he has such disdain for, and as such I'll take my opinion elsewhere. He isn't worth arguing with. You can have him, FreiDOg.

10-1 he has to flame this post as well...
I already know you don't like my facts and don't give a damn how things work.

Ignorance, innuendo, and baseless dumbshit opinions typical of low-lifes are your preference. I am sure these are perfect for arguments with others just like yourself in this thread.

I have no use for them, only facts.
 
nam-ng said:
That's normal; even my simplified-to-retard-level "jittered blending of adjacent pixels caused blurriness in V5's FSAA" took most ignorant dumbshits more than 6 months. "V5's FSAA = RGMS" took some retards more than 4 years, and even that long a duration of passing time was insufficient to overcome their ignorance and stupidity.

Some are born retarded and half the people in the world are below average.

Anyone else know what this drivel has to do with the topic at hand?
 
Some are born retarded and half the people in the world are below average.
Deep comments while trying to look superior... Half the people of the world are below average...

average = a rough middle between the high and low ends of the spectrum

Where else are half of the people of the world supposed to be? :eek:
 
FreiDOg said:
I really doubt we'll see SMT from AMD though. Dual-core chips extend basic workstations to 2 or 4P systems, and servers to 16P; it just doesn't make sense for AMD to go back and make significant architectural changes when they can already put that much processing power, that much parallelism, into a box. If we were going to see SMT, we would have seen it from the outset of K8, or we'll have to wait and see with K9.

*wipes keyboard*

Damn... 16P in one box...
 
CentronMe said:
Well, I am lost and confused. :p

Yeah, really...

Besides, you would think that if this genius was so knowledgeable about CPU architecture, he might be able to put two sentences together.
 
Josh_B said:
*wipes keyboard*

Damn... 16P in one box...
I think what FreiDOg is saying isn't that we won't see dual core, just not HyperThreading. Why have 2 virtual CPUs when you can have true SMP?
 
Originally Posted by Josh B
Yeah, really...

Besides, you would think that if this genius was so knowledgeable about CPU architecture, he might be able to put two sentences together.
Ha ha... I've met dumb-as-a-tree-stump experts with Master's degrees in English before...

Unfortunately, ignorance and stupidity in perfect English is still ignorance and stupidity, and a low-life with a Master's in English is still a low-life.

A lot of them are salesmen, lawyers, journalists <--- "the shits of literature".

Originally Posted by FreiDOg
So you, who spent 2 pages railing against the P4 for being a wasteful and inefficient CPU, think letting 1/3-1/2 of the CPU idle is "exactly as it should be"?
Wow... Superior perfect English techniques?

Just for clarification, at what percentage of idle execution units does a CPU go from operating as a well-designed modern CPU should, to wasteful and inefficient?
According to the above bolded words from your great personal expertise, you already knew 1/3-1/2 of the CPU to be idle; why don't you tell me?

I guess EV8 was NUMA.
EV6 was NUMA and capable of 4 simultaneous data streams for 4 SMT concurrently to QUAD-PROCESSORS.
Though true NUMA access would only have existed in the local block. To get access outside the local block you still had to use a shared bus; a scalable bus, but a shared one.
Though the first commercially available CPU with SMT was the P4 Xeon.

To be fair though, SMT was started in the academic community (Susan Eggers at UW), pretty much without regard for the underlying architecture. The original simulated system was a single-CPU system, in fact. The bus architecture was not relevant to the idea of SMT. It only "originated" from a NUMA system because Alpha happened to pick it up first.

Really though, any way you look at it (NUMA, SMP, 1P), they all benefit nicely from SMT when the software environment is right for it.
You missed the most basic, fundamental point completely: "NON-UNIFORM ACCESS" is everything. What the fabric medium is comprised of has nothing to do with functionality, only relative performance, be it local memory, HyperTransport, PCI-E, or Ethernet... "NON-UNIFORM ACCESS" is what made SMT possible natively in hardware.

Classical Intel UMA can only do "sequential, one-at-a-time data streams"; special hacks are required for multiple simultaneous data streams in SMT.

****************************************

For UMA, the HyperThreading hack is a hack on top of a hack: first, a kinda broken (pathetically inefficient) processor design is required, which leaves unused resources for the hack of 2 virtual fragments of a real processor instead of a whole one, each with its own necessary data stream.

Next, the programmers hacked the 2 data streams into one data chunk, then performed "sequential, one-at-a-time data streams" to both virtual fragments, because UMA is incapable of multiple simultaneous data streams.

That above is the simplified summary of HyperThreading with a UMA architecture.

*****************************************
Really?
You were selling SMT-enabled chips years ago?
We graphics hardware guys have a big thing for parallelism; I myself don't have time to wait for anyone to invent it first... Ever heard of 3Dfx's SLI or nVIDIA's SLI?
Do you have a link to any of those systems/chips?
Link to something of mine? With the huge number of low-life and ignorant dumbshit fanatics on the net? With some of them pretending to be moderators and webmasters? I connect to this forum through a minimum of 3 anonymous proxies after I met a couple of those low-lifes a while back.
Because the idea of on-chip simultaneous multithreading wasn't published until 1995.
And every resource I've ever seen has said the same thing: EV8 was the first to adopt it, and the P4 Xeon was the first to be sold with it.
Yep, I had heard "Multisampling hardware didn't exist before some OpenGL guys invented it in '92" from a dumbshit expert; in Nov '99 I described an 18-year-old "temporal multisampling FSAA" implementation to the graphics experts at the Beyond3D forum.

BTW... the graphics experts at the Beyond3D forum didn't know of "temporal FSAA's" existence either in Nov '99.


P.S. For the mentally challenged experts in this thread, a pictograph of "Non-Uniform Memory Access" below, from MikeC of nvnews.



Anyone ever heard of "crossbar"? Or the simplified mantra for dumbshits --> "simultaneous multiple data streams from multiple sources to multiple destinations"?
 
nam-ng said:
Ha ha... I've met dumb-as-a-tree-stump experts with Master's degrees in English before... <snip>

So what you have been trying to tell us ignorant folk is that there are computers out there with more than one processor? And software written to work on a whole bunch of these computer processors at the same time, kinda like a multi-nozzle garden hose????


WOW whodathunkit!


Back to topic:

I think the only thing that will really push multi-core chips into the performance mainstream market is for gaming software to take advantage of them. Using multiple logical partitions and processors on a mainframe doing heavy database crunching is one thing, but finding a sub-$1000 market is still going to be tough.
 
nam-ng said:
According to the above bolded words from your great personal expertise, you already knew 1/3-1/2 of the CPU to be idle; why don't you tell me?

I think a CPU should always strive to provide the highest level of performance it can.

However, since you said:
A real efficient Processor and truly well designed had not the time to be 2 fractional virtual processors; it is busy enough having sufficient time and internal resources just to be an efficient CPU.

I was merely wondering why you thought 30, 40, even 50% of AMD's available pipeline 'slots' going empty wasn't 'sufficient' to warrant using SMT.

[Just some friendly advice: don't take shots at my spelling or grammar with sentence structure like that. If I'm rambling to the point of complete incoherence, let me know and I'll restate it. It's really a petty attack when done to undermine the merits of a post.]


EV6 was NUMA and capable of 4 simultaneous data streams for 4 SMT concurrently to QUAD-PROCESSORS.

EV6, the 21264, was not SMT. It used the same quad-block design connected over a shared bus that Alpha was fond of, but it was 1 thread per CPU at a time.
EV8, the 21464, was the first CPU to support on-die SMT.

You missed the most basic, fundamental point completely: "NON-UNIFORM ACCESS" is everything. What the fabric medium is comprised of has nothing to do with functionality, only relative performance, be it local memory, HyperTransport, PCI-E, or Ethernet... "NON-UNIFORM ACCESS" is what made SMT possible natively in hardware.

Classical Intel UMA can only do "sequential, one-at-a-time data streams"; special hacks are required for multiple simultaneous data streams in SMT.

****************************************

For UMA, the HyperThreading hack is a hack on top of a hack: first, a kinda broken (pathetically inefficient) processor design is required, which leaves unused resources for the hack of 2 virtual fragments of a real processor instead of a whole one, each with its own necessary data stream.

Next, the programmers hacked the 2 data streams into one data chunk, then performed "sequential, one-at-a-time data streams" to both virtual fragments, because UMA is incapable of multiple simultaneous data streams.

That above is the simplified summary of HyperThreading with a UMA architecture.

That's your argument for calling it a hack?
The external CPU communications are merged into a single stream.
You'll excuse me if I say: no shit, Sherlock.

Of course allowing both threads to access external resources is best; it doesn't mean having a single stream is a bad idea.
The biggest use for SMT, in any architecture, is to allow a second thread to continue executing using on-chip resources while the other thread is blocked and accessing the external resources. That works, and works pretty well, in NUMA, SMP, and 1P UMA systems.

I think you're missing the fundamental point I'm making. SMT makes CPUs, all modern CPUs, faster (in the right software environment). It would therefore be a good feature to see on any CPU in that software environment.

We graphics hardware guys have a big thing for parallelism; I myself don't have time to wait for anyone to invent it first... Ever heard of 3Dfx's SLI or nVIDIA's SLI?

Obviously I've heard of SLI; I was completely unaware it used SMT.
It makes sense there were two threads managed by the driver, one for each card, but I don't see them being executed in parallel with 1 or 2 cards installed.
The only time SMT was used with Voodoo cards was 'single-pass multi-texturing', as far as I can recall.
If they did, that's quite interesting; I'd love to see some technical information on that, but I have yet to turn up anything of use. Got any hints?

Link to something of mine? With the huge number of low-life and ignorant dumbshit fanatics on the net? With some of them pretending to be moderators and webmasters? I connect to this forum through a minimum of 3 anonymous proxies after I met a couple of those low-lifes a while back.

Maybe if you weren't so, I don't know, abrasive on the net, those meetings wouldn't have ended with you strung up on a coat hook by your underwear…
But surely some website somewhere has a tidbit about one of your ground-breaking products with SMT that wouldn't compromise your personal safety…

Yep, I had heard "Multisampling hardware didn't exist before some OpenGL guys invented it in '92" from a dumbshit expert; in Nov '99 I described an 18-year-old "temporal multisampling FSAA" implementation to the graphics experts at the Beyond3D forum.

BTW... the graphics experts at the Beyond3D forum didn't know of "temporal FSAA's" existence either in Nov '99.

You're saying that because some people on the Internet didn't know what they were talking about, SMT existed before it was first proposed in 1995?
Hold on. I've just got to go drive a freight train through the hole in that logic.
 
FreiDOg said:
I think a CPU should always strive to provide the highest level of performance it can.

However, since you said:

I was merely wondering why you thought 30, 40, even 50% of AMD's available pipeline 'slots' going empty wasn't 'sufficient' to warrant using SMT.
It is not my thought, it is yours. You brought them up, you said them, and made them out to be mine... I had seen that superior perfect English technique used previously by a "born to be a glib ignorant stupid fuck" many, many times.

[Just some friendly advice: don't take shots at my spelling or grammar with sentence structure like that. If I'm rambling to the point of complete incoherence, let me know and I'll restate it. It's really a petty attack when done to undermine the merits of a post.]
I never had an English class, never even once read an English grammar textbook. Once upon a time, if I had not needed to read some English technical documentation, I would not have bothered with it. At the time there was no "English for ignorant dumbshits in less than 21 days" book written in my native tongue; my "me Tarzan, you Jane" English learning method isn't optimal for taking shots.

If I read 10 more English-specific textbooks by next week, do you suppose I get to be an English expert, or a wannabe English expert?

That's your argument for calling it a hack?
The external CPU communications are merged into a single stream.
You'll excuse me if I say: no shit, Sherlock.
If you wanted ignorant dumbshit arguments, you talked to the wrong guy. And no, that's not all; I always left a minor but fundamental FACT for the subject-topic expert I conversed with to fill in.

Of course allowing both threads to access external resources is best; it doesn't mean having a single stream is a bad idea.
Not a bad idea? Let me try out the "SMT for dumbshits" version instead...

It's like 2 persons, 2 destinations, and 2 cars <- VS -> 2 persons, 2 destinations, and 1 car: it is still workable if time is not of the essence.

How about 8 persons, 8 destinations, and 8 cars <- VS -> 8 persons, 8 destinations, and 1 car? You would need a miraculous hack for the latter case when time is of the essence.

The biggest use for SMT, in any architecture, is to allow a second thread to continue executing using on-chip resources while the other thread is blocked and accessing the external resources. That works, and works pretty well, in NUMA, SMP, and 1P UMA systems.

I think you're missing the fundamental point I'm making. SMT makes CPUs, all modern CPUs, faster (in the right software environment). It would therefore be a good feature to see on any CPU in that software environment.
I had not missed your fundamental point; I avoided making a comment on it, and skipped others previously, as that often translates to being "abrasive".

Maybe if you weren't so, I don't know, abrasive on the net, those meetings wouldn't have ended with you strung up on a coat hook by your underwear…
There has simply been no lack of low-life and ignorant dumbshit fanatics since time immemorial; have you not heard that the meek shall inherit the earth?

But surely some website somewhere has a tidbit about one of your ground-breaking products with SMT that wouldn't compromise your personal safety…
"Your ground-breaking products"? I'm not a salesman, not a journalist, I don't work in PR, and I don't have the show-off hangups, period.

I'm in this thread because someone asked - AMD HyperThreading?
You're saying that because some people on the Internet didn't know what they were talking about, SMT existed before it was first proposed in 1995?
Hold on. I've just got to go drive a freight train through the hole in that logic.

You can do whatever you please.
 
apHytHiaTe said:
HyperThreading is probably a patented Intel (*cough* marketing term *cough*) technology.


As it's been said, AMD and IBM both have patents on the technology, with AMD's being granted in the early '90s. Besides that anyway, it doesn't really matter who created it, as Intel and AMD have cross-license agreements that cover issues like this. They both have way too many patents on various things to do business without one. This is also why AMD was able to support SSE and Intel was able to support AMD64 and so forth.
 