More Than Two Threads Per Core?

eumyang

hi guys, i want to upgrade for quite a while now but i can't see any 8core intels with 16threads ...
This thread got me curious about something else. For all of the Intel Core and Xeon CPUs that have HT, it is always 2 threads per core, correct? Will Intel increase the number of threads per core, to say 4? (If I am not mistaken, Power7 already has this, up to 8 cores with 4 threads per core, for a total of up to 32 threads :eek:). I don't know, an Intel quad core chip with 4 threads per core (16 total) has a nice ring to it. No, I really don't have a need for this hypothetical chip. Just curious is all. :D
 
I do see Intel increasing the performance of the virtual cores by increasing the number of execution ports in each core and increasing the functionality of those ports. I believe this is one benefit of widening the execution engine.

http://www.anandtech.com/show/6355/intels-haswell-architecture/8

However in my opinion it will be a long time before Intel goes more than 2 threads per core.
 
As my benchmark showed, 2 threads per core are basically useless. They have negligible benefits. Any application with proper scaling, that assumes all threads are mapped to real cores, runs basically slightly faster than at half speed.

You saw what problems Bulldozer had when the OS didn't map threads to a free core block first. It assumed any core is the same as another, and applications that used FP tanked. It's similar for a properly made multithreaded application. What, you didn't expect HT, and your application runs at half speed? What, you expected 2-threaded HT, but didn't think that Intel would sink even lower, so your application runs at half speed even after you scaled it for HT? Well, stuff like that would happen to competent SW developers/researchers.
 
As my benchmark showed, 2 threads per core are basically useless. They have negligible benefits. Any application with proper scaling, that assumes all threads are mapped to real cores, runs basically slightly faster than at half speed.

You were already told many times that your benchmark was flawed and that HT does improve performance in many (but not all) situations.
 
As it becomes more "standard" for all applications to use 8+ threads, I could see intel going this route at some point in the future.
 
The problem with increasing the number of threads per core is the consequent increase in temperature. An Intel i5 is a far cooler chip than an i7 purely because of HT. More threads will require an even more powerful FPU to handle more virtual cores at lower temps. Maybe in the future, when it becomes possible to handle temps better with HT, Intel will implement a 4-threads-per-core chip. The same thing happened with the Pentium 4 HT in its time: they stopped implementing HT because of the high temps until it became more feasible to reduce the problem in the first i7.

Also, to the guy who said 2 threads per core is useless: yes bud, Intel went totally wrong and made a useless i7 family with HT, with a negligible increase in performance... Bud, I need to try the herb you are smoking...
 
As my benchmark showed, 2 threads per core are basically useless. They have negligible benefits. Any application with proper scaling, that assumes all threads are mapped to real cores, runs basically slightly faster than at half speed.

You saw what problems Bulldozer had when the OS didn't map threads to a free core block first. It assumed any core is the same as another, and applications that used FP tanked. It's similar for a properly made multithreaded application. What, you didn't expect HT, and your application runs at half speed? What, you expected 2-threaded HT, but didn't think that Intel would sink even lower, so your application runs at half speed even after you scaled it for HT? Well, stuff like that would happen to competent SW developers/researchers.

Your benchmark was flawed, as already stated. I have a prime number program that I wrote from the ground up specifically to take advantage of multiple CPU cores as well as HT. It uses absolutely no locks, as each thread only writes to a specific part of the data set. A speed increase of about 20-25% is easily seen when HT is programmed for correctly.
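The lock-free layout described in this post can be sketched roughly as follows. This is a minimal illustration in Python, not the poster's actual program: the names `is_prime` and `mark_primes` and the trial-division test are assumptions made for the example. Each worker writes only to its own slice of a shared `bytearray`, so no locks are needed.

```python
import threading


def is_prime(n):
    """Trial division; a hypothetical helper chosen for brevity."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def mark_primes(limit, num_threads):
    """Return a bytearray where flags[n] == 1 iff n is prime, for 0 <= n < limit."""
    flags = bytearray(limit)  # shared output, zero-initialised
    chunk = (limit + num_threads - 1) // num_threads  # ceiling division

    def worker(start, stop):
        # This thread writes only to flags[start:stop]; ranges never overlap,
        # so no synchronisation is required.
        for n in range(start, stop):
            if is_prime(n):
                flags[n] = 1

    threads = []
    for t in range(num_threads):
        start = t * chunk
        stop = min(start + chunk, limit)
        th = threading.Thread(target=worker, args=(start, stop))
        threads.append(th)
        th.start()
    for th in threads:
        th.join()
    return flags
```

Note that in CPython the global interpreter lock keeps these threads from running in parallel, so the sketch shows only the disjoint-partition idea and will not reproduce the 20-25% HT gain mentioned above.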

As it becomes more "standard" for all applications to use 8+ threads, I could see intel going this route at some point in the future.

Many applications can't do this.

A lot of programs can't.. but I want a CPU that has 2^32 - 1 threads.. even better if it had that many physical cores. Purely to see how my program would run on it since it is coded to support that many threads. 2^64 - 1 thread support would be a simple 5 minute change to the code.
 
Many applications can't do this.

Currently this is very true; there will always be programs that don't require multiple threads. I was referring to many years in the future. It will be a long while before the average computer user needs more than 4 cores/threads. But one day that time will come. I have no clue when that will be, though. But I'll be waiting :D
 
Whether HT is of any use depends heavily on the workload. It can change performance by anywhere from -10% to maybe +30%, although a decrease is very unlikely. HT basically lets a core do useful work while its other thread waits for data from RAM, a wait that would otherwise stall the core completely. Hence the highest benefit is achieved on loads with a huge memory footprint, with memory spread over multiple NUMA nodes connected by high-latency paths. I would expect that doubling the number of threads per core yields less than 10% increase, but only for highly threaded workloads.
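To make the stall-hiding argument concrete, here is a toy model in Python. It is an illustrative assumption, not a measurement and not anything from the posts: each hardware thread is taken to need the execution units for a fraction `1 - stall_fraction` of the time, stalls are assumed to overlap perfectly, and the core saturates at 100% busy. The function name `smt_speedup` is hypothetical.

```python
def smt_speedup(stall_fraction, threads_per_core):
    """Throughput of one core with N hardware threads, relative to 1 thread.

    Toy model (assumption): a thread keeps the core busy for a fraction
    (1 - stall_fraction) of the time and waits on memory otherwise.
    Cache contention and resource partitioning are ignored.
    """
    if not 0.0 <= stall_fraction < 1.0:
        raise ValueError("stall_fraction must be in [0, 1)")
    if threads_per_core < 1:
        raise ValueError("threads_per_core must be at least 1")
    busy_one = 1.0 - stall_fraction
    # The core cannot be more than 100% busy, however many threads it runs.
    busy_n = min(1.0, threads_per_core * busy_one)
    return busy_n / busy_one
```

Under these assumptions, a thread that stalls 20% of the time gives `smt_speedup(0.2, 2)` of about 1.25, and going to 4 threads adds nothing because the core is already saturated. Only workloads that stall most of the time keep gaining from extra threads per core, and because the model ignores contention it cannot show the negative cases at all.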
 
I would expect that doubling the number of threads per core yields less than 10% increase, but only for highly threaded workloads.

Without widening the execution engine (adding more execution ports ...) there will be little to gain from adding additional virtual cores. However with additional over provisioning of each real core there is performance to gain.
 
Currently this is very true; there will always be programs that don't require multiple threads. I was referring to many years in the future. It will be a long while before the average computer user needs more than 4 cores/threads. But one day that time will come. I have no clue when that will be, though. But I'll be waiting :D

Well, I think that will come someday, but it is still several years off. Even today's games have a hard time using that many cores/threads in a well-optimized way. Normally a game takes 1 thread/core per activity and rarely gets past 3 cores/threads, for example 1 for sound, 1 for video, 1 for physics. After that point it becomes harder and harder to add support for more cores/threads. So unless there is a major change in compilers, this is going to continue to play out for some time to come.
 
Increasing the number of threads per core may not give much improvement. There are only so many execution units, and partitioning speculative execution and other resources between more threads gives diminishing returns, or may even hurt performance. However, under certain workloads (lots of threads, each with low average use of execution resources due to memory stalls or whatever) it may help tremendously (see: GPGPU). There is no reason why Intel wouldn't increase threads/core if it would help in general cases.

As my benchmark showed, 2 threads per core are basically useless.
Read up on why SPEC is a good standard benchmark and maybe you can improve your Java benchmark, or at least relate what cases in generally available software your benchmark is modeling. ;)
 
You were already told many times that your benchmark was flawed and that HT does improve performance in many (but not all) situations.
Let's say it this way: Do you have something better that you can use? YES: Use it and show results. NO: From my experience in the industry there is a simple rule: if it's the only available stuff and nobody is willing to create something better in his spare time, it must suffice.

It's not like developers are obliged to make inefficient applications which would benefit from HT the most.

Your benchmark was flawed, as already stated. I have a prime number program that I wrote from the ground up specifically to take advantage of multiple CPU cores as well as HT. It uses absolutely no locks, as each thread only writes to a specific part of the data set. A speed increase of about 20-25% is easily seen when HT is programmed for correctly.
From what I've seen, your program does quite a lot of memory thrashing and is bandwidth limited. Did you write your own implementation of the binary array to remove unpredictable behavior?

A lot of programs can't.. but I want a CPU that has 2^32 - 1 threads.. even better if it had that many physical cores. Purely to see how my program would run on it since it is coded to support that many threads. 2^64 - 1 thread support would be a simple 5 minute change to the code.
Probably poorly. When I ran a real application, 30 cores with 110 GB/s were fine and 60 were still reasonable, though it felt like things were starting to clog; 150 threads were out of the question because they hit the bandwidth limit.
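A rough back-of-the-envelope check of that bandwidth ceiling, as a Python sketch. It assumes the 110 GB/s figure is the total memory bandwidth and stays fixed as the thread count grows, which the post does not actually state; the function name `bandwidth_per_thread` is hypothetical.

```python
def bandwidth_per_thread(total_gb_per_s, threads):
    """Even share of memory bandwidth per thread, in GB/s.

    Assumption: bandwidth is split evenly and the total does not change
    with the number of threads.
    """
    if threads <= 0:
        raise ValueError("threads must be positive")
    return total_gb_per_s / threads


# Thread counts taken from the post; 110 GB/s assumed constant for all three.
for n in (30, 60, 150):
    print(f"{n:>3} threads: {bandwidth_per_thread(110.0, n):.2f} GB/s each")
```

Under that assumption each thread's share drops from roughly 3.7 GB/s at 30 threads to about 0.7 GB/s at 150, which fits the description of the system clogging as more threads are added.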

Read up on why SPEC is a good standard benchmark and maybe you can improve your Java benchmark, or at least relate what cases in generally available software your benchmark is modeling. ;)
Well, my benchmark tests the behavior I want; it almost exactly mimics the behavior of a certain application.

I hate money, thus learning "let's ask 200$ for our benchmark" is not my forte. ~_^

(There is one flaw: it can have a small problem with some timers, but since APIC/HPET is OS stuff, I can't do anything about that. The problem would appear only occasionally, though it looks like it does show up in the results. I thought it would normalize itself if I ran it for long enough, but considering the critical case is 250x longer, running the benchmark for 5 minutes and keeping only the obviously valid runs isn't feasible. It would no longer be a simple lightweight benchmark.)
 