Intel multi-core and multithreading

In the Anandtech review of the new dual-core Extreme Edition, he suggests that some applications are multithreaded and some are single-threaded (the performance implications of this are obvious). He suggests, for instance, that web browsers and media players are single-threaded. However, in Windows Task Manager, when I tick "Threads" in the Select Columns option, it shows 14 threads for Internet Explorer and 15-25 for Windows Media Player. Just about every app is multithreaded according to Task Manager. What gives, anyone?
 
Just because something is multithreaded doesn't mean it was created with distributed processing or SMP in mind. A lot of threads simply do things like control the UI, while the processing work is still done in a single thread.
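To make that concrete, here's a minimal sketch (modern C++, all names made up) of an app where two threads exist but only one does the heavy lifting. Task Manager would count both of them:

#include <chrono>
#include <cstdio>
#include <thread>

void worker() {
    // All the real processing happens here, serially, on one core.
    long long sum = 0;
    for (long long i = 0; i < 500000000; ++i) sum += i;
    std::printf("work done: %lld\n", sum);
}

void ui_loop() {
    // Stand-in for a message pump: mostly asleep, uses almost no CPU.
    for (int i = 0; i < 5; ++i)
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
}

int main() {
    std::thread ui(ui_loop);   // shows up as an extra thread in Task Manager...
    std::thread work(worker);  // ...but only this one keeps a core busy
    ui.join();
    work.join();
}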

There are folks with much more multithreaded coding experience than I who could probably explain it better.

Don't forget that sometimes, threads sit idle, waiting for data.
 
That's right... while technically any app that uses more than one thread is 'multithreaded', the discussion with regard to multi-core/Hyper-Threading is about having the actual workload of the application multithreaded.
Usually applications do all the work in just one thread, and any extra threads are just 'helper' threads that simply lie idle until some kind of event occurs.
Heck, when you boot Windows, you already have hundreds of threads open, even though the PC isn't doing anything yet...

For web browsers and media players it's not very useful to even consider multithreading, because they are so light on today's CPUs that it's not worth the trouble.
You need something with a big workload, like 3d rendering... then you can think about using more than one thread to process the workload. 3d rendering happens to be one of those tasks that is very suitable for executing in parallel. Think of the huge renderfarms at Pixar etc. A dual core processor could do the same thing, just on a much smaller scale.
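As a rough illustration (a toy sketch in modern C++, not how a real renderer is written), splitting a frame's rows between two threads works because each pixel can be shaded independently of every other:

#include <cmath>
#include <cstdio>
#include <thread>
#include <vector>

// Hypothetical per-pixel workload; every pixel is independent.
float shade(int x, int y) {
    float v = 0.0f;
    for (int i = 0; i < 100; ++i)
        v += std::sin(x * 0.01f + i) * std::cos(y * 0.01f);
    return v;
}

int main() {
    const int W = 640, H = 480, CORES = 2;  // 2 = "dual core"
    std::vector<float> image(W * H);

    std::vector<std::thread> pool;
    for (int t = 0; t < CORES; ++t)
        pool.emplace_back([&, t] {
            const int band = H / CORES;     // give each core its own slice of rows
            for (int y = t * band; y < (t + 1) * band; ++y)
                for (int x = 0; x < W; ++x)
                    image[y * W + x] = shade(x, y);
        });
    for (auto& th : pool) th.join();
    std::printf("rendered %dx%d using %d threads\n", W, H, CORES);
}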
 
I will inject my experience. Do not trust anything Anand has to say about Intel and ATi, and don't trust Tom Pabst for anything to do with AMD. These two people are severe !!!!!!s and I will never read their reviews. They have proven themselves wrong on more than one occasion. Don't ask me to point out where now. I don't feel like getting into it. I don't like wasting much time on them. Tom has even gone so far as to pull his staff's reviews, edit them and put his own name on them. When he is proven wrong he pulls the review.

To be honest I am MUCH happier coming to this website and reading Kyle's reviews. That's what brought me to this forum to begin with. The most amazing HSF reviews I have ever seen were on this site. I still want to know what his special copy of UT2K3 is though.
 
Umm

Sorry to burst your bubble, bud, but Anandtech is probably the best review site there is. That's acknowledged hands down by numerous people and sources. We all still love our Hard OCP, though :)

However, you are correct about Tom's Hardware; stay away from that place.
 
I'd agree that Anandtech is a great review site, but this kind of oversimplification is really annoying. I know that every article can't explain everything, but I wish there was a way to find a happier middle ground.

Web browsers are very naturally multithreaded. You want one thread waiting for (or processing) events in the UI and repainting. Then, you want a couple of threads for actually making connections and downloading things. The W3C limits you to two connections to the same server at the same time, so more than two either means you're not compliant or you're working against more than one site concurrently. Then, you'll want to consider adding threads in addition to your core UI thread to play animated GIFs, media content, and so on.

Does the browser support multiple open windows or tabs? Background downloading? More threads might be justified for those features.
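Here's a toy sketch of that download-pool idea (modern C++; the two-connection cap and the fake "download" are stand-ins for real network code):

#include <chrono>
#include <cstdio>
#include <semaphore>
#include <thread>
#include <vector>

// At most two concurrent "connections" to the same server,
// mirroring the two-connection guideline mentioned above.
std::counting_semaphore<2> per_host_limit(2);

void download(int id) {
    per_host_limit.acquire();  // block if two downloads are already running
    std::printf("thread %d: downloading\n", id);
    std::this_thread::sleep_for(std::chrono::milliseconds(200));  // fake network I/O
    per_host_limit.release();
}

int main() {
    std::vector<std::thread> pool;
    for (int i = 0; i < 6; ++i) pool.emplace_back(download, i);
    for (auto& t : pool) t.join();
}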

The decision to use a thread sometimes doesn't have as much to do with workload as it does with the best way to design the application. If you have one thread waiting on some event by blocking, that thread is spoken for. Making a thread poll ends up burning more CPU compared to creating a thread which sleeps most of the time.
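A quick sketch of that difference (modern C++; the polling loop is deliberately naive): the blocking thread sleeps in the kernel and costs nearly nothing, while the polling thread keeps a core busy doing nothing useful:

#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>

std::mutex m;
std::condition_variable cv;
bool ready = false;

void blocking_waiter() {
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [] { return ready; });  // sleeps in the kernel, ~0% CPU
    std::printf("blocking waiter woke up\n");
}

void polling_waiter() {
    while (true) {                        // spins, burning a whole core
        std::lock_guard<std::mutex> lock(m);
        if (ready) break;
    }
    std::printf("polling waiter noticed\n");
}

int main() {
    std::thread a(blocking_waiter), b(polling_waiter);
    std::this_thread::sleep_for(std::chrono::seconds(1));
    { std::lock_guard<std::mutex> lock(m); ready = true; }
    cv.notify_all();
    a.join(); b.join();
}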

Scali said:
You need something with a big workload, like 3d rendering... then you can think about using more than one thread to process the workload. 3d rendering happens to be one of those tasks that is very suitable for executing in parallel. Think of the huge renderfarms at Pixar etc. A dual core processor could do the same thing, just on a much smaller scale.

This is an interesting topic. I don't think that having multiple threads for 3d animation rendering or video compression is natural just because it's a "big workload". Maybe you have a thread taking care of the actual disk I/O, but putting more than one thread onto the actual rendering work doesn't seem very natural to me.

Why? Because the work might be serialized. If you have a thread that's creating one frame from scratch, all by itself, then you're probably in good shape. Create two threads and have them fight over memory bandwidth to create two frames at the same time, though, and at some point, a surprisingly low point, the limit is going to be memory bandwidth.
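You can see that effect with a toy benchmark (modern C++; buffer sizes and results will vary by machine): two threads streaming through big buffers at once rarely finish in half the time of one:

#include <chrono>
#include <cstdio>
#include <numeric>
#include <thread>
#include <vector>

// Streams through a big buffer; this is bandwidth-bound, not compute-bound.
long long sweep(const std::vector<int>& buf) {
    return std::accumulate(buf.begin(), buf.end(), 0LL);
}

double time_threads(int n, const std::vector<std::vector<int>>& bufs) {
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    std::vector<long long> out(n);
    for (int i = 0; i < n; ++i)
        pool.emplace_back([&, i] { out[i] = sweep(bufs[i]); });
    for (auto& t : pool) t.join();
    return std::chrono::duration<double>(std::chrono::steady_clock::now() - start).count();
}

int main() {
    // Two buffers of 128 MB each, so the working set dwarfs any cache.
    std::vector<std::vector<int>> bufs(2, std::vector<int>(32 << 20, 1));
    std::printf("1 thread:  %.3fs\n", time_threads(1, bufs));
    std::printf("2 threads: %.3fs\n", time_threads(2, bufs));  // far from 2x if bandwidth-bound
}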

For video encoding, there's the issue of keyframes. If a thread is taking a source frame and compressing it, it's using information from the previous frame to do the compression. This frame is described by the bits that have changed since the previous frame. You can't get to work on the next frame until this frame is done... until you hit a keyframe, which is all new content by itself, without referencing the previous frame.
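In toy form (modern C++, nothing like a real codec), the dependency looks like this; the loop can't be parallelized because each iteration needs the previous frame:

#include <cstdio>
#include <vector>

// Toy "delta encoder": frame n can only be encoded once frame n-1 is known,
// so this loop is inherently sequential within a keyframe interval.
using Frame = std::vector<int>;

Frame delta_encode(const Frame& prev, const Frame& cur) {
    Frame d(cur.size());
    for (size_t i = 0; i < cur.size(); ++i)
        d[i] = cur[i] - prev[i];                  // only the changes get stored
    return d;
}

int main() {
    std::vector<Frame> frames(10, Frame(4, 0));
    for (size_t n = 0; n < frames.size(); ++n) frames[n][0] = int(n);  // fake content

    std::vector<Frame> encoded(frames.size());
    encoded[0] = frames[0];                       // keyframe: stored as-is
    for (size_t n = 1; n < frames.size(); ++n)    // each step depends on the previous frame
        encoded[n] = delta_encode(frames[n - 1], frames[n]);
    std::printf("encoded %zu frames serially\n", encoded.size());
}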

Could you write a video encoder that worked one thread against frames starting at a key frame, and then worked another set of frames elsewhere in the stream starting at the next key frame? Probably, but then you have a bit of a problem using memory until you can assemble the clips again. And you want another thread for that, right? So then you might be heading towards the CPU-to-memory bandwidth limitation again.
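A sketch of that chunk-per-keyframe scheme (modern C++; the chunk contents are fake, the point is the structure): each thread encodes one independent chunk, and the chunks get stitched back together in order at the end:

#include <cstdio>
#include <thread>
#include <vector>

int main() {
    const int FRAMES = 120, KEYFRAME_INTERVAL = 30;
    // One chunk per keyframe interval; each chunk is independent of the others.
    std::vector<std::vector<int>> chunks(FRAMES / KEYFRAME_INTERVAL);

    std::vector<std::thread> pool;
    for (size_t c = 0; c < chunks.size(); ++c)
        pool.emplace_back([&, c] {
            for (int f = 0; f < KEYFRAME_INTERVAL; ++f)
                chunks[c].push_back(int(c) * KEYFRAME_INTERVAL + f);  // "encode" one frame
        });
    for (auto& t : pool) t.join();

    std::vector<int> stream;                      // reassemble in keyframe order
    for (auto& c : chunks) stream.insert(stream.end(), c.begin(), c.end());
    std::printf("assembled %zu frames from %zu parallel chunks\n",
                stream.size(), chunks.size());
}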

I'm afraid that dual-core processors are going to end up with the same memory bandwidth limitations that dual-processor motherboards already have.

Companies that use render farms end up winning the battle because so much CPU goes into rendering that they don't care about assembling the stream later on. They're building movies that are more than 100 minutes long, not 35-second videos of their friends crashing their skateboards into the sides of police cars.
 
mikeblas said:
This is an interesting topic. I don't think that having multiple threads for 3d animation rendering or video compression is natural just because it's a "big workload". Maybe you have a thread taking care of the actual disk I/O, but putting more than one thread onto the actual rendering work doesn't seem very natural to me.

I think you misunderstood. I didn't mean that big workloads automatically have multiple threads... I meant that it makes sense to see where you can split the workload across multiple threads/cores when you're developing an app with a big workload. For a small workload it's not worth the trouble. If something is done in only a few ms anyway, it's not worth investing a lot of time in redesigning it for multiple cores... chances are that the overhead will be about as large as the workload anyway... besides, you're not going to notice the difference, even if you could get it efficient.
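A toy measurement of that overhead argument (modern C++; exact numbers depend on the OS and hardware): for a job that takes microseconds, spawning and joining a thread can cost as much as the job itself:

#include <chrono>
#include <cstdio>
#include <thread>

volatile long long sink = 0;  // keeps the compiler from optimizing the job away
void tiny_job() { for (int i = 0; i < 1000; ++i) sink = sink + i; }

int main() {
    using Clock = std::chrono::steady_clock;

    auto t0 = Clock::now();
    tiny_job();                                   // done inline
    double inline_us = std::chrono::duration<double, std::micro>(Clock::now() - t0).count();

    t0 = Clock::now();
    std::thread t(tiny_job);                      // same job, plus thread create/join overhead
    t.join();
    double thread_us = std::chrono::duration<double, std::micro>(Clock::now() - t0).count();

    std::printf("inline: %.1f us, spawn+join: %.1f us\n", inline_us, thread_us);
}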

mikeblas said:
I'm afraid that dual-core processors are going to end up with the same memory bandwidth limitations that dual-processor motherboards already have.

I don't think it will be that bad... 3d rendering, for example, is not all that memory-intensive. You will generally spend most time in the rasterization/shading parts, while you only need memory to look up geometry and textures.
I suppose video encoding will need a bit more bandwidth, but mainly because the CPUs have gotten so fast these days... video encoding used to be lots of grunt and not all that much memory bandwidth. It's just a result of memory not keeping up with CPU developments, I suppose.

Of course, things that were already memory-limited will only be more memory-limited... But not everything is memory-limited, that's what I mean :)

mikeblas said:
Companies that use render farms end up winning the battle because so much CPU goes into rendering that they don't care about assembling the stream later on. They're building movies that are more than 100 minutes long, not 35-second videos of their friends crashing their skateboards into the sides of police cars.

I think it is true by definition that a lot more CPU goes into rendering than reassembling the stream. Besides, reassembling the stream can be really simple. It just depends on how you split up the workload in the first place.
Look at 3d-accelerators... There are basically three ways in which they split the workload today... First there are the pixel pipelines. Since each pixel is completely independent of every other pixel in a triangle rasterizer, they can all be rendered at the same time.
Then there is multi-GPU, which can be done in two ways... You can split up the screen into as many parts as you have GPUs, where each part is again independent of every other part... Or you can split up the frames. Let every nth frame be rendered by every nth GPU. This is usually the best way... Firstly you can be sure that no redundant work has to be done, because no geometry will span two or more GPUs... Secondly, the workload is normally balanced in the best way, because the difference between two successive frames is usually minor.
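In sketch form (modern C++ threads standing in for GPUs, purely illustrative), alternate-frame rendering is just a round-robin assignment:

#include <cstdio>
#include <thread>
#include <vector>

// Worker t handles frames t, t+N, t+2N... Successive frames are similar,
// so the load stays balanced across workers.
int main() {
    const int FRAMES = 12, WORKERS = 2;           // 2 workers = "dual GPU" (or dual core)
    std::vector<int> rendered_by(FRAMES);

    std::vector<std::thread> pool;
    for (int t = 0; t < WORKERS; ++t)
        pool.emplace_back([&, t] {
            for (int f = t; f < FRAMES; f += WORKERS)
                rendered_by[f] = t;               // "render" frame f on worker t
        });
    for (auto& th : pool) th.join();

    for (int f = 0; f < FRAMES; ++f)
        std::printf("frame %2d -> worker %d\n", f, rendered_by[f]);
}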

Anyway, the point is that multiprocessing doesn't have to be perfect. As long as your second CPU can at least do *something*, you get gain. In fact, in a lot of situations multiprocessing can't be perfect... in 3d rendering it surely isn't... not even with the 3d accelerators themselves. But even though they're less efficient, they're still much faster than any single solution could ever be... Reminds me of my skinned shadow volumes on CPU vs shaders... even though the shaders have to do three times the workload, because they can't store skinned vertices and have to reskin for every triangle, they're still much faster than a CPU, which can do it in a far more efficient way.
 