Dual SMP Linux clients on a quad

SmokeRngs
[H]ard|DCer of the Month - April 2008
Joined Aug 9, 2001
Bill and I were having a discussion about trying this.

Does anyone here have a Linux quad SMP machine that is not a production machine and would want to do a little test for us?

Since we know the SMP client does not scale 100% with twice as many cores (dual to quad), we were wondering if there would be a PPD difference when running a Linux VM on top of another Linux install. Set up the SMP client on the host OS as normal, then set up a second SMP client under the Linux install inside the VM.

I know there is a performance hit in doing so, but we were wondering whether that hit is more or less than the hit from the poor scaling of dual to quad cores.

If someone here has a box that would be ideal for this test, I would appreciate it if you would volunteer. If you have no experience with VMware under Linux, I could probably help you a bit, as I did that install on two SUSE 10.2 machines this weekend and have them running just fine.

I only ask because I only have a dual core. This wouldn't actually be necessary if someone wants to donate a Q6600 G0 to me. At that point I would do all the testing myself. Of course, for my time and effort spent in doing it, I would naturally expect to keep the quad core for my own uses. Although I guess I could trade my E6400 for it. :D

Seriously, I would like to know if someone would be interested in helping out on this project.

 
http://www.hardforum.com/showthread.php?t=1217862

I wanted to do 2xSMP+Windows, but no one reminded me and I forgot. (Short-term memory loss sucks. :()
Oh, yeah- and I had to sell off the other quad for financial reasons.

I'll give it a run next weekend, if I remember. I don't have time during the week, but I'm still eager to try it. Just waiting for an order of tubing to come in from Petra's, but I can probably throw in my 9500 or Ninja for a few days.
 
Since the Linux SMP client seems to be much more efficient, I wanted to avoid Windows entirely: a normal Linux install with the SMP client running on it, as well as a Linux install under VMware also running the SMP client.

It looks like you want to run two VM images on a Windows host machine. There is a rather large performance hit when doing that, which I'm hoping will not be present, or at least be smaller, with a nix/nix setup.

I apologize if I am misunderstanding your original post.

 
A virtual OS inside VMware should perform identically regardless of the host OS; otherwise the implementation of Vanderpool/Pacifica is flawed.

I also think your idea of running *Nix-native PLUS *Nix-VM is a little premature: the host OS will see all four cores and will try to occupy all four of them. There will be nothing left over for the VMware client, creating a conflict between the native environment and the VMware environment.
 
I thought Zim did some tests with this. I believe he got a slight boost in PPD running 3 clients on a dual quad box with linux, but not much of one. It's in some older post around here...
 
Well, I don't have the time tonight to mess with it, but if I have a chance tomorrow, I'll install another client under my Ubuntu VM and see what happens. I haven't looked into VMware a whole lot, but you might be able to give that VM priority on two of the cores, which would probably take care of the problem to a certain extent.

I realize the client on the host OS will still see 4 cores, but I'm hoping the VM will be able to take over most of the priority of two of them. As I said, it's all theory at this point. If I had a quad, it wouldn't be theory as I would have already tried it. I just don't have a quad to work with and I can't afford one anytime soon.

My manw[H]oring plan hasn't worked out lately.

 
I thought Zim did some tests with this. I believe he got a slight boost in PPD running 3 clients on a dual quad box with linux, but not much of one. It's in some older post around here...

I think it all depends on being able to bind certain clients to certain cores. I have not looked into this yet. I'm still a Linux noob. I know I have the ability to set up what I'm proposing, just not the hardware to do it on.

 
I do this with Windows and have no problems, so I assume if you do this in Ubuntu it would be even more awesome. I prefer this setup to using two instances of VMware/Ubuntu on both machines because both of my computers have remained responsive and usable, and not all of their RAM is tied up in virtual machines. On my setups, the VMware process is assigned to Cores 2-3 while the Windows SMP processes are set to Cores 0-3. There is no way to assign the Windows instances to specific cores, because after each work unit each process is regenerated, so affinity settings are reset. I get 100% CPU utilization with this method. There's about a 15% hit in the Windows clients, but it's a worthy tradeoff to have usable computers.

[attached screenshot: aldamonsmpwindowsubuntu.gif]
 
I run one machine with 2 Ubuntu64 virtual installations, each running SMP. I set the affinity in Windows Vista (it's the host, since I also game and do other stuff, hence no Ubuntu for me). I average 3400 PPD, which seems a bit low compared to others, but I just realized I may have disabled VT in the BIOS or something, since it's not as fast as aldamon's. aldamon, I agree there is a performance hit with 2 VMs running, since I have just 2 GB and it's swapping like mad each time I start or stop both VMs, so I'll try your way.

I will try a different experiment tonight by stopping 1 VM and running a Windows SMP client on the host to see if this is more efficient.

I also ran a console client to feed off stray cycles, for an average of 150 PPD.

 
aldamon, I agree there is a performance hit with 2 VM running since I have just 2 Gb and it's swapping like mad each time I start or stop both VM so I'll try your way.

Yeah, I think 4GB would be optimal for two VMware setups. Both instances could have 768 MB comfortably. Doing that with 2GB would be pushing it if you expect to use the machine.




 
I run one machine with 2 Ubuntu64 virtual installations, each running SMP. I set the affinity in Windows Vista (it's the host, since I also game and do other stuff, hence no Ubuntu for me). I average 3400 PPD, which seems a bit low compared to others, but I just realized I may have disabled VT in the BIOS or something, since it's not as fast as aldamon's. aldamon, I agree there is a performance hit with 2 VMs running, since I have just 2 GB and it's swapping like mad each time I start or stop both VMs, so I'll try your way.

I will try a different experiment tonight by stopping 1 VM and running a Windows SMP client on the host to see if this is more efficient.

I also ran a console client to feed off stray cycles, for an average of 150 PPD.


Update on this? Very interested in the results.
 
Update on this? Very interested in the results.

Not yet, sorry... I got a call from a friend pulling his hair out over a new router installation, so I went to help him. I will try tonight :) However, I changed the affinity of the 2 instances to use cores 1-3 and 2-4 to see if it runs faster, since Windows might not count the processors the way we think.

 
Xilikon, I didn't realize that you were on Vista. I think switching to Windows SMP + Ubuntu SMP will give you a more responsive machine, but I don't think it will increase PPD. I think Vista is the reason why your PPD is slightly lower than my file server's, even though yours is 100 MHz faster. I'm on XP.

 
There is no way to assign the Windows instances to specific cores, because after each work unit each process is regenerated, so affinity settings are reset.
Have you tried setting affinity with a third-party app like SMP Seesaw?
 
I think it all depends on being able to set affinity to certain clients to certain cores. I have not looked into this yet. I'm still a Linux noob. I just know what I'm proposing I have the ability to setup, just not the hardware to do it.


I use:

# folding client 1
cd /fah1
taskset -c 0,2,4,6 ./fah5 -verbosity 9 -forceasm &

# folding client 2
cd /fah2
taskset -c 1,3,5,7 ./fah5 -verbosity 9 -forceasm &

Find your CPU numbers in /proc/cpuinfo.
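A quick way to sanity-check the core numbering and confirm a pin took effect (standard procfs plus the taskset tool from util-linux; the PID used below is just the current shell, standing in for a folding client's PID):

```shell
# List the logical CPU numbers the kernel reports
grep '^processor' /proc/cpuinfo | awk '{print $3}'

# Query the CPU affinity of an already-running process
# (here the current shell; substitute a folding client's PID)
taskset -cp $$
```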
 
However, I changed the affinity of the 2 instances to use cores 1-3 and 2-4 to see if it runs faster, since Windows might not count the processors the way we think.


I realized I had a typo in my first post. Ubuntu has Cores 2-3, the second set of cores (as labeled in Affinity), on my rigs, and Windows SMP has access to Cores 0-3, all 4 cores. Sorry if this caused any confusion.





 
I understand the underlying premise in this thread, but instead of configuring two clients in virtual machines, wouldn't it be simpler to run two clients on a single OS to recover stray cycles?
 
I understand the underlying premise in this thread, but instead of configuring two clients in virtual machines, wouldn't it be simpler to run two clients on a single OS to recover stray cycles?

It doesn't work. That's why we're doing this.


 
It doesn't work. That's why we're doing this.
Since configuring 2 SMP clients on my system (finally), FahSpy has reported an increase in PPD over one client. The reported increase is corroborated by the reduced time per frame when divided by two (assuming the same WU type).

Maybe configuring a virtual machine is superior in overall efficiency, but there are drawbacks, as some have posted, such as the added overhead and complexity. I wouldn't bother with a machine that has fewer than 8 cores unless the gains are very significant, on the order of hundreds of PPD. There are other ways to optimize efficiency.
 
Maybe configuring a virtual machine is superior in overall efficiency, but there are drawbacks, as some have posted, such as the added overhead and complexity. I wouldn't bother with a machine that has fewer than 8 cores unless the gains are very significant, on the order of hundreds of PPD. There are other ways to optimize efficiency.

I can't understand how you could read this thread and come to that conclusion. This method isolates an instance of SMP on each set of cores on Kentsfield and prevents them from communicating over the FSB. That is the source of Kentsfield's inefficiency. Once we have real quad cores, this won't be necessary.

My two quads used to produce ~5,900 PPD on 2653. Now they're producing over 8,100 PPD. Don't you agree that an increase of over 1,000 PPD on each machine is significant and worth "bothering" for? I have added the production of an overclocked C2D for free.


 
My original premise for starting this thread was finding a way to maximize the efficiency of folding-only quad core boxen. Unless you have a lot of RAM to go around and no everyday use for the machine, it's not an efficient way to use a production box. However, for a folding-only box, the gains could be significant. Basically, any additional PPD (which also means more work units done) on a folding-only box is a significant gain.

I realize running VMs can have a negative impact on everyday performance. Trust me, I'm running a WinXP VM on a SUSE 10.2 PIII 800 box with 512 MB of RAM. However, that box is a file server and not much more, so the performance disadvantages for everyday use don't come into play.

I know there are people out there running C2Q boxen just for folding, and if this helps them, then it's good. Also, core scaling is not 100% whether the quad is a true quad or not. On the same WU, a quad core at the same speed as a dual core does not get a 100% increase in output over the dual. What I have proposed would take care of that scaling inefficiency by running two SMP clients, with each client thinking it is on a dual core machine. Thus, you would get the full advantage of a quad core rig minus the inefficiency of running the VMs. I was trying to find out whether the efficiency loss from the VMs is worth it. I know there is a performance hit running a Linux VM on a Windows host. I wanted to determine whether that hit is lessened by running Linux VMs on a Linux host.

 
I can't understand how you could read this thread and come to that conclusion. This method isolates an instance of SMP on each set of cores on Kentsfield and prevents them from communicating over the FSB. That is the source of Kentsfield's inefficiency. Once we have real quad cores, this won't be necessary.

My two quads used to produce ~5,900 PPD on 2653. Now they're producing over 8,100 PPD. Don't you agree that an increase of over 1,000 PPD on each machine is significant and worth "bothering" for?
Yes, of course, but I wasn't aware that the gains were 'that' significant. The increase on my machine with two clients is much more modest, but I'm running 2 dual core Opterons, not Kentsfields. Opterons don't have the inherent FSB limitation and thus, scale better. No need to get upset over a misunderstanding.
 
Ok, another question.
I noticed that going from dual core to single core to run the GPU client on the other one only slows the SMP client by around 35-40%.
Would running 4 SMP clients on a quad core overload the memory bus?
If not, then running four SMP clients on a quad core could well double the PPD of running a single client.

Schedule something like "SMP Seesaw" to run every hour to keep the clients locked to their own cores and see how it works.
Stanford won't like it because you'll take longer to return work, but you may get double PPD per box.
But as long as you get the work back before the first deadline, then you won't be harming the science much.

Luck ........... :D
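That hourly re-pin could also be done without SMP Seesaw, with a small cron-driven script. This is only a sketch under assumptions: the /fah1 and /fah2 install paths match the earlier taskset examples, and matching on those paths with pgrep -f actually finds the folding processes; adjust the patterns and core lists for your own setup.

```shell
#!/bin/sh
# repin.sh -- re-apply CPU affinity to folding processes.
# Schedule hourly from cron, e.g.:  0 * * * * /root/repin.sh
# Anything started out of /fah1 gets cores 0-1; /fah2 gets cores 2-3.
for pid in $(pgrep -f '/fah1'); do
    taskset -cp 0,1 "$pid" >/dev/null 2>&1
done
for pid in $(pgrep -f '/fah2'); do
    taskset -cp 2,3 "$pid" >/dev/null 2>&1
done
```

Because affinity is reset every time the client regenerates its work process, re-applying it on a schedule is the simplest way to keep each client fenced onto its own pair of cores.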
 
Ok, another question. I noticed that going from dual core to single core to run the GPU client on the other one only slows the SMP client by around 35-40%. Would running 4 SMP clients on a quad core overload the memory bus? If not, then running four SMP clients on a quad core could well double the PPD of running a single client.

Schedule something like "SMP Seesaw" to run every hour to keep the clients locked to their own cores and see how it works. Stanford won't like it because you'll take longer to return work, but you may get double PPD per box. But as long as you get the work back before the first deadline, then you won't be harming the science much.
Four clients on one quad? And make it before the first deadline?? Can a Kentsfield accomplish that kind of workload?
 
Yes, of course, but I wasn't aware that the gains were 'that' significant. The increase on my machine with two clients is much more modest, but I'm running 2 dual core Opterons, not Kentsfields. Opterons don't have the inherent FSB limitation and thus, scale better. No need to get upset over a misunderstanding.

This thread is about Intel quads. I'm not upset. I'm just not sure why you're posting in this thread when you don't have the hardware we're talking about. Your results have nothing in common with what we're discussing.


 
This thread is about Intel quads. I'm not upset. I'm just not sure why you're posting in this thread when you don't have the hardware we're talking about.
I have a dual Clovertown system that hasn't been put online yet and I'm looking for the optimum configuration of clients and OS. Otherwise I wouldn't have read beyond the first post.

Your results have nothing in common with what we're discussing.
Maybe not, but that doesn't make it entirely OT because the question was posed in a comparative context, in the quest to seek further knowledge of the subject.
 
Because I have a dual Clovertown system that hasn't been put online yet and I'm looking for the optimum configuration of clients and OS. Otherwise I wouldn't have read beyond the first post.

Dual Clovertown is dual quads, right? Your best bet would be to use 4 instances of VMware/Ubuntu to isolate each set of cores with one instance of SMP. If you max out the RAM, this shouldn't be a problem.

http://icrontic.com/articles/quad_core_folding_guide

For a dual socket, dual Quad core machine, there is the possibility of creating four VMs with two cores assigned to each.

You'd have monster PPD there even without overclocking.
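For anyone heading down the multi-VM route, the per-VM core count and memory are set in each virtual machine's .vmx file. A minimal fragment might look like the following (numvcpus and memsize are the stock VMware Workstation-era keys; the values mirror the 2-cores / 768 MB sizing discussed above, and host-side pinning of each vmware-vmx process would still be done with taskset, just like the native clients):

```
numvcpus = "2"
memsize = "768"
```

With four such VMs on a dual Clovertown, each SMP client only ever sees its own two cores.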

 
Dual Clovertown is dual quads, right? Your best bet would be to use 4 instances of VMware/Ubuntu to isolate each set of cores with one instance of SMP. If you max out the RAM, this shouldn't be a problem.

http://icrontic.com/articles/quad_core_folding_guide

For a dual socket, dual Quad core machine, there is the possibility of creating four VMs with two cores assigned to each.

You'd have monster PPD there even without overclocking.
Not really, because they're only 2 GHz (E5310 OC). I'm not expecting to double my production, just a nice evening out of my current production. If I can later upgrade to Harpertowns, yes, that would be much better.

Thanks for the link.
 
Don't be modest. That's a great machine and greater if you get 4 VMwares going.


 