Hyperthreading Reality

Mayhem33
[H]ard|Gawd
Joined: Oct 12, 2004
Messages: 1,204
I did a search for this topic here and found a lot of hits, but was not sure which thread to add on to, so I started another.

I am not sure if everyone has seen this.

OC-AMD
Site Moderator


Posted: Sun Nov 28, 2004 8:40 pm Post subject: Hyperthreading or no Hyperthreading... please read.
________________________________________
Folks, for many, the HT (HyperThreading) issue is one they just don't understand. While Stanford allows you to run more than one instance of the Folding@Home client on your computer, you are not really helping them if the number of clients you are running is not equal to the number of REAL CPUs you actually have. While running two instances of the F@H client will return two work units more quickly than doing one at a time, that is not what is important. Returning one work unit 70% faster and starting the next generation of tests on the work unit is what Stanford wants. Speed in working through each protein generation is very, very important. Sometimes quality is better than quantity; quality in this case is work being returned more quickly.

While it is well known that an HT CPU can run 2 work units because the OS treats it like two CPUs, and you can gain a possible 15-30 percent increase in points for that computer, it also means that each work unit is returned more slowly. In simple terms, if the project has 300 generations needed to test a theory on a protein model, and running two instances at once delays the return of work by 1 day each time, you end up with a 300-day delay. That translates to about a 10-month delay in examining your data for the final result.

The bottom line is simply this: run 1 instance for each CPU you have. An HT CPU is not two CPUs; it is one. Let's work to advance the science and spend a little less time worrying about the number of points you get.

This is a direct quote from Dr. Vijay Pande:
1) If you care primarily about points, running 2 procs on HT is still the best bet. We are grateful for all contributions and if people choose to run 2 procs on HT, our approach is that all contributions are welcome.

2) If you care about the science foremost and are interested in our recommendations, then do not run 2 procs on HT, but please just run one process. That won't be best for points, but is best for the science.

3) If your machine cannot make the deadlines, then one should run the timeless WUs.

I hope that clears these issues up and thanks to all for their contributions.

Thanks for listening and please do the right thing.
Larry

(copied from http://www.em-dc.com/; info gathered from the folding-community.org forums)



Just something I saw a few days ago; I forgot to post it and then found it again.
Here is the link to the post there. post
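To put the arithmetic in the quoted post into concrete form, here is a minimal sketch; the 300 generations and the 1-day delay per generation are the post's illustrative numbers, not real F@H figures:

Code:
# Sketch of the serial-generation argument from the quoted post.
# Assumed numbers (from the post, for illustration only): 300 dependent
# generations, and a 1-day delay in each generation's return caused by
# running two clients on one HT CPU.

GENERATIONS = 300          # generations needed for one protein model
DELAY_PER_GEN_DAYS = 1     # extra return time per generation

total_delay_days = GENERATIONS * DELAY_PER_GEN_DAYS
print(f"cumulative delay: {total_delay_days} days "
      f"(~{total_delay_days / 30:.0f} months)")
# -> cumulative delay: 300 days (~10 months)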
 
That doesn't make any sense, though.

Say there's a 70% delay by using HT. 10 time units for one instance on a dedicated HT machine vs 17 time units for one instance on a split HT machine.

However, let's examine what really happens:
dedicated HT: 1 work unit / 10 time units
split HT: 2 work units / 17 time units -- clearly more efficient (1.176 work units / 10 time units).

Stanford has to wait one day more for unit #1, but they get unit #2 immediately with it.

Looks to me like this is a case of someone who simply doesn't understand how the computer is really processing his data. Naturally I've greatly simplified the numbers here, but unless HT somehow reduces the efficiency of a system, you come out ahead. Even running two instances on a standard single-CPU system should be no measurably less efficient than running one.
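A quick check of that throughput math, using the same toy numbers (the 10 and 17 time units are assumptions for illustration, not measurements):

Code:
# Throughput comparison with the toy numbers from this post: one client
# finishes a WU in 10 time units; with two clients, each WU takes 17
# time units (the assumed 70% per-WU slowdown), but two finish together.

single_time = 10.0                    # time units per WU, one client
split_time = 17.0                     # time units per WU, two clients

single_throughput = 1 / single_time   # 0.100 WU per time unit
split_throughput = 2 / split_time     # 0.118 WU per time unit

print(f"one client : {single_throughput:.3f} WU per time unit")
print(f"two clients: {split_throughput:.3f} WU per time unit "
      f"({split_throughput / single_throughput:.1%} of single)")
# Two clients win on throughput (~1.18x), but each individual WU
# arrives 7 time units later -- which is exactly the latency that
# the project says it cares about.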
 
I am not exactly sure either. I think that by downloading only one WU, the other WU would be assigned to another box; hence the one-day delay. That is all I can figure it could be. It's hard to say what kind of box gets that WU next, though; it could take more than that one day for the next box to chew it up.
 
Mayhem33 said:
<snip>

For a supposedly smart bunch of people, Stanford had better take a look at the math. Would they rather I ran three 500 MHz machines and took the full two months they allow for a WU return, or would they rather I gave them two WUs back, complete, in 3 days?

If finished units returned fast are of such great importance, which I have to assume they are, then we go back to my original argument from another thread: even making a work unit with a 2-month deadline was not a good idea. Way too many are and will be lost due to people turning off machines and producing errors.

If what you are posting is indeed what they think, then I seriously question how much “science” Stanford is really accomplishing.

This doesn’t even qualify as “math”; it’s mere arithmetic. How hard is that?
 
I realize it is hard to grasp, but this project works in a "serial mode" ...
Every work unit is dependent on the work unit previous to it.
Running two units at the same time slows the overall progress...

When each of you who thinks the logic is incorrect also truly believes you know more than the guy running the project, that seems a little weird, doesn't it?

I've already busted their chops over this issue.
It's up to them to fix the problem, but one should be aware of what they want.
It's up to you to decide what is more important, what they want you to do, or your points.
 
I don't have any machines with Hyperthreading, so I don't know if this is a good idea or not, but what about running one instance set at 100% usage doing normal WUs and another instance set at, say, 10% usage doing timeless WUs? Would that provide a situation where Hyperthreading can still be taken advantage of, slowly trickling out a timeless WU while also running a normal WU that would be completed in almost the same time as usual?


Just a random idea...
 
LPerry said:
I realize it is hard to grasp, but this project works in a "serial mode" ...
Every work unit is dependent on the work unit previous to it.
Running two units at the same time slows the overall progress...

When each of you who thinks the logic is incorrect also truly believes you know more than the guy running the project, that seems a little weird, doesn't it?
Also makes zero sense. If it's all serial, what good is DC? You can't parallelize a serial project. But, since DC is by nature massively parallelized, the serial argument falls flat.

For your latter point, that's why I clarified. Having a PhD does not confer the magical title of "expert" in any and every field. The guy is probably a genius in cancer research. He probably doesn't have a clue what L2 cache is. Is that a knock against him? Not at all. I'm just saying he's probably not qualified to discourse on what is and is not good about how computers work.

//edit:
I should point out that the best evidence against the "serialization" claim is that no objection is made to true SMP machines. There is no fundamental difference between HT and SMP that would make SMP more efficient while making HT less efficient. It's just not true.
 
Let me add here that if I were in it for the points I’d be nuts. The point system as it stands is the biggest debacle ever dreamed up by anyone. It’s a total loss as far as incentive goes.

Case in point (no pun intended)

I currently stand at 815780 “Points” my total returned Work Units are 27796.

I was just passed by another folder with 818698 “Points” but only 6331 total Work Units.

Roughly one fifth the work for more points.

Where is the parity in that?

I totally agree with lomn75, there is NO serialization here at all.

The only thing I can think of that would allow me to believe VJ’s statements is this:

He truly believes everyone is running the equivalent of their “test” machine; he has no clue how much processing power is in some people’s hands.

Nor does he appreciate it; to him it’s an entitlement.
 
I currently stand at 815780 “Points” my total returned Work Units are 27796.

I was just passed by another folder with 818698 “Points” but only 6331 total Work Units.

Roughly one fifth the work for more points.

Where is the parity in that?
Some work units take more processing time than others, thus they are worth more points. Why is that not fair? Sure it's hard to make the point system totally fair in terms of what points each WU earns you, but some are simply worth more.
 
mattg2k4 said:
Some work units take more processing time than others, thus they are worth more points. Why is that not fair? Sure it's hard to make the point system totally fair in terms of what points each WU earns you, but some are simply worth more.

Let's not go down that road...

As to HT it's a moot point for me as I'm an AMD folder. It just seems like an ill-considered comment for VJ to have even made. While perhaps the science of project #1 is delayed, the science of project #2 is advanced, no?

Anyhoo, the right thing is Folding. HT or no, fold on.
 
It's not a true serialization, but some steps can only be made once certain ones are completed, thus running one instance on each machine. If the data to fold is released in batches, and batch 2 needs data from parts of batch 1 to be truly effective, what happens if parts of batch 1 get delayed by people running two instances on a machine? It means that batch 2 can't be properly processed, and in some cases not even released. So, even if mathematically it may be better to run two instances on a PC with hyperthreading, logically, by project status, it may be better to let one instance run and hog all the power for itself. It's like with any other program: if program B needs parts of program A to be finished, why the hell should they be running at the same time? For some reason I'm thinking Vijay is right on this one, mainly because he has access to a lot more data than the rest of us do.
 
Shadowchild said:
It's not a true serialization, but some steps can only be made once certain ones are completed, thus running one instance on each machine.
Except, as I've already noted, this doesn't mesh with VJ's assertion that true SMP machines are acceptable but HT machines are not.

If he claimed that any multi-instance machine was less efficient, he *might* have a point. A point that could be addressed with half-decent programming, but a point.

However, given a claimed distinction over a complete non-issue (remember, via the WinXP HAL, HT and SMP look identical to processes), this is an incorrect premise. If hyperthreading gives a raw performance boost (which it does), then two processes collectively crunch more data than one. Furthermore, since it's known that multiple processes on an SMP rig do boost WU throughput, it follows that the same applies to HT.

The thing I really want people to learn here is that a few letters in front of your name doesn't make you right all the time. PhDs are experts in their one or two fields, but past that, they're ordinary people. Cancer researchers are not, as a rule, qualified to analyze how microprocessors work. This guy is a chemist, not an electrical engineer or computer scientist.
 
I've spent nearly 4 years working in close contact with these people. I have busted them on several occasions when I knew they were wrong. Others outside this project and I (most of us also have degrees in science, math, and engineering) have closely examined why he said this about HT machines running two instances. In no way do we just take their word for it. You who disagree with this because it doesn't seem logical are correct: on the surface it doesn't seem logical. The problem is that your understanding of how some of these work units are processed and looked at is flawed. It is difficult to explain the problem in simple terms, but here it is: in order to gather good information in a short period of time, things need to be returned as quickly as possible so the next series can start. It's a race of sorts. Processes running in "series" are compared against "parallel" processes in other "series" to verify and confirm the information gathered. If information can't be verified until you turn something in, the parallel series can't move forward. Speed, not quantity, becomes all-important. No one cares about two work units that arrive at the party late. While their information will be used at a later date, they didn't help make the next step.

Now if you're incapable of understanding what I just said, then there isn't any explanation in the world that will make you understand this subtle problem they have. If you can't understand that a rapid return of a smaller quantity is more valuable to them than a slower return of a greater quantity, it is hopeless even having a discussion. Remember, you aren't the only one folding these proteins. My 2 GHz duals running two instances of the client are more valuable to them than your 3 GHz HT running two instances. I'm getting more work done than you are, by far. You are returning silver when they want gold.

While many parts of this project run in parallel, pieces of it have to be serial.
Vijay would rather have ONE work unit returned TODAY than TWO units returned TOMORROW. As stupid as that may seem to you, it makes perfect sense if you understand what is really going on here. You might scoff at a degree someone has, but generally it shows who is really ignorant. Using your logic, one has to have a degree in electrical engineering in order to use a computer and write a math calculation that you don't understand.

lomn75 said:
<snip>
 
LPerry said:
<snip>


I understand what you are saying, and I'm not trying to start anything here, I just have one problem with what they are saying exactly. Why have long deadlines for the proteins if they want them done quicker? To me that defeats the purpose of getting them back as quickly as possible.

My example for this would be a couple of my systems. My main rig, an AMD running at 2.5 GHz, can chew through proteins rather quickly. My PIII 700 has yet to miss a deadline, but folds much, much slower. I recently changed it over to timeless WUs, so it's a moot point now. But any hyperthreading P4 running two instances would still get proteins done and sent back quicker than the PIII can with only one instance. I'm not talking about points or anything here.

I guess the benchmarking of the processors could point the slower machines to proteins that don't need to be turned back in really quickly. If that is the case, then I won't say anything about this again. If that isn't the case, though, then what they are saying doesn't really make a lot of sense.

Also, I don't use any hyperthreading machines for folding. I've actually never even touched one. I am just trying to hopefully get a couple of questions resolved.
 
LPerry said:
While many parts of this project run in parallel, pieces of it have to be serial.
Vijay would rather have ONE work unit returned TODAY than TWO units returned TOMORROW. As stupid as that may seem to you, it makes perfect sense if you understand what is really going on here. You might scoff at a degree someone has, but generally it shows who is really ignorant. Using your logic, one has to have a degree in electrical engineering in order to use a computer and write a math calculation that you don't understand.
Also nonsense. If that's the case, then he shouldn't use DC in the first place. He'll get results much faster from a single Cray.
Obviously, overall data throughput is the real concern here.

It comes down to 2 things:
1) SMP is inherently flawed for this implementation, which translates to a DC flaw.
2) VJ's original assertion is wrong.

Finally, my logic does not imply any of that engineering / math-equation nonsense. I'm saying a physical chemist is not qualified to say that SMP is better than a single processor while HT is somehow worse, particularly since that's not true.

But hey, if you want me to admit I'm wrong and then shut up, here's all I'm asking. Explain -- even in general terms -- my last statement:
SMP is better than a single processor while HT is somehow worse
That's what VJ's assertion is, and that's what I'm taking issue with.
 
My “farm” is a total of 36 GHz. Only one machine is HT; most of the rest are AMD duallys, all over 2 GHz per CPU. It has never taken my HT machine over 3 days to return 2 completed work units that have a due date of 2 months.

The stated goal is to return as many work units as fast as possible. If I get two units back in 3 days when VJ has allowed 4 total months of processing time, I fail to see where he has an issue.

IF serialization is the goal, then Stanford should analyze the return stats, pick the fastest machines by name and/or individual producer, and feed them only the projects they need back NOW. That part isn’t brain surgery; it’s a simple database analysis program. They already track the stats; now analyze them and put the higher-producing machines to proper use.

If my one HT machine and my Duallys are screwing up Stanford’s program then by all means let VJ come here and tell us that. Given a logical reason I’ll be glad to make changes.

We are, after all, only one of the top producers, and yet we end up with this information “second hand” (no offense meant, Larry). After much frustration in the “main” DC forum I gave up; my blessings to those who stick it out.

I don’t give a damn about the points, as I stated previously. I do care a ton about the science, and therein lies my problem: does anyone at Stanford understand the science?

If gross amounts of work are required on a faster basis (which I totally understand) then why in the name of God send out calculations that have a 2-month return time?

VJ, please, I invite you to come make your point in a valid way that we non-degreed people, who provide you with massive amounts of free computing power, can understand.
 
If there are subtleties, let's try to be subtle enough to figure them out, shall we?

I don't think HT compares with SMP very well. Sure, the OS and applications may not be able to distinguish them, but the fact remains that a real SMP system is two CPUs. What HT does, for Folding, is increase overall efficiency by slowing down individual tasks and doing more of them. It is pointless to compare a 3.6 GHz HT system with SMP anything; rather, the comparison should be against what the actual output is.

If there is a slight boost on the 3.6 GHz P4 with HT enabled, let's say certain WUs can be completed as if there were two 2.0 GHz systems running. Those could be in one SMP box or in two separate systems. The start and return times will be indistinguishable between the 3.6 GHz HT system and the two 2.0 GHz systems. It doesn't perform like a single 4.0 GHz system, because the resulting return times for a WU are longer with hyper-threading enabled, not shorter.

I myself am a scientist, but I call myself that as much from a skeptical view of the world as because of whichever degrees I may have. So yes, I see that VJ's statement flies in the face of what seems logical. I don't have the insight that VJ has, hence I am MORE skeptical rather than less because of that. What I get from the first post in this thread is that two 2.0 GHz systems are better than one 3.6 GHz system.

Unless, of course, there is some reason the results from a WU performed on a 2.0 GHz system are more accurate or precise than the results from an effective 2.0 GHz thread on an HT-enabled 3.6 GHz CPU. If the results are identical, then it comes down solely to a timing and scheduling issue.

If indeed a 3.6 GHz processor is better than two 2.0 GHz processors, then why even allow Folding to run on ANY slow systems? From a program-wide standpoint the serialization would be much more effective if they only allowed 3.0 GHz and higher systems to fold. Why the heck am I borging 1.1 GHz PIIIs if they are not as good for the science? Maybe I should turn off the slowest 10% of my systems? Will that improve the science?

The folks at Stanford may have analysed this effect to reach the stated conclusion, or it may be based on someone's perception of how HT slows down the processing of individual work units. Given that it does indeed fly in the face of logic, without a hint of analysis or data it will remain an opinion in my eyes. So obviously no additional hammering at the points mentioned in prior posts will change that.
 
Chugiak said:
<snip>


Bottom line is that if you have a computer with Hyperthreading, it is fast enough that it will probably finish 2 units faster than a lot of systems out there, so really, no harm, no foul. Let people do what they want.
 
Explain -- even in general terms -- my last statement:
SMP is better than a single processor while HT is somehow worse
The problem is the F@H clients aren't SMP-aware, so they don't know if you have 2 physical CPUs or, in the case of HT, a physical and a logical CPU. To the client they're both the same, but they're not. If you run 1 client on a board that has 2 or more CPUs, it doesn't matter how many CPUs you have; F@H is going to process frames as if there were only 1 CPU, because it isn't SMP-aware. That's why it's necessary to run 1 client per CPU on a board that has more than 1 physical CPU.

A CPU with HT tricks the OS into thinking you have 2 CPUs, but you don't, and 1 CPU can't do the work of 2 CPUs. HT tricks the F@H client into thinking the machine has 2 CPUs as well, so it doesn't end up using 100% of the CPU cycles.

This whole HT thing got started when people noticed Task Manager was showing only 50% of cycles being used by the F@H client. That's not accurate, though; TM is giving a false reading. In reality your HT CPU is using about 70% of the available cycles. People found that by running a second client they could recover and use the other 30% (or 50%, if you go by what TM is reporting). TM would then show 100% being used, and this made people happy.

On a dual (physical) CPU board with 1 client:
Physical CPU_0 = 100% cycles used.
Physical CPU_1 = 0% cycles used.

On a dual (physical) CPU board with 2 clients:
Physical CPU_0 = 100% cycles used.
Physical CPU_1 = 100% cycles used.

On a single HT CPU board with 1 client:
Physical CPU_0 = 70% cycles used.
Logical CPU_1 = 0% cycles used.

On a single HT CPU board with 2 clients:
Physical CPU_0 = 50% cycles used.
Logical CPU_1 = 50% cycles used.

There is an advantage to running 2 clients on an HT-enabled CPU and board, but there is clearly a big difference between 2 physical CPUs and 1 that's pretending to be 2.
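To see what those figures imply for actual output, here is a rough model; the 15-30% combined gain and the even split of cycles between the two clients are this thread's estimates, not measurements:

Code:
# Rough model of the HT trade-off using estimates quoted in this thread:
# a second client raises the box's total output by 15-30%, but the two
# WUs now split the CPU, so each individual WU progresses more slowly.

BASE = 1.0                      # throughput of 1 client on an HT CPU
for gain in (0.15, 0.30):
    total = BASE * (1 + gain)   # combined throughput with 2 clients
    per_wu = total / 2          # rate at which each WU progresses
    slowdown = BASE / per_wu    # how much longer each WU takes
    print(f"gain {gain:.0%}: each WU takes {slowdown:.2f}x as long, "
          f"box returns {total:.2f}x the work")
# gain 15%: each WU takes 1.74x as long, box returns 1.15x the work
# gain 30%: each WU takes 1.54x as long, box returns 1.30x the work

Note that the 15% case lands right around the "70% slower per WU" figure from the first post in this thread.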

I hope this helps clear things up a little bit, but somehow I doubt it will. People seem to be polarized on this issue.

ChelseaOilman
 
I don't really care if you admit anything... ;)
The fact that you can't even understand it in simple terms means it is a waste of time discussing it with you.
You are trying to have a discussion about something you don't really know anything about, obviously.
So, do what you want... they will fix the issue as I have told them they must.
It is their responsibility to make the software do what they want and not what the user thinks is best.
lomn75 said:
<snip>
 
Hito Bahadur said:
Let people do what they want.
That's exactly what Vijay is doing. ;)
This is a direct quote from Dr. Vijay Pande:
1) If you care primarily about points, running 2 procs on HT is still the best bet. We are grateful for all contributions and if people choose to run 2 procs on HT, our approach is that all contributions are welcome.

2) If you care about the science foremost and are interested in our recommendations, then do not run 2 procs on HT, but please just run one process. That won't be best for points, but is best for the science.

3) If your machine cannot make the deadlines, then one should run the timeless WUs.

Nothing in his statement says you are required to do anything you don't want to.

ChelseaOilman
 
If a 2.4 GHz CPU is the slowest Hyperthreaded one and HT gives a 10% increase in production, then the worst case scenario is a big slow Gromacs WU.
Numbers:
Points = 288.
Speed = 2.4 x 1.1 x 0.5 = 1.32 GHz effective per instance.
Factor = 1.7 points per hour per GHz for a slow Gromacs.
Then it will take 288 / 1.32 / 1.7 / 24 = 5.35 days.
Call it 5 1/3 days, as opposed to 3 days with no Hyperthreading, folding 24/7.
This work unit will have a deadline of around 54 days.
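Checking that arithmetic in code (the 288 points, the 1.7 points/hr/GHz factor, and the 10% HT gain are this post's assumed numbers, not official figures):

Code:
# Worst-case turnaround for a slow Gromacs WU on the slowest HT CPU,
# using this post's assumptions: 288 points, 1.7 points per hour per
# GHz, a 10% production gain from HT, and two clients splitting the CPU.

points = 288
factor = 1.7                       # points per hour per GHz
cpu_ghz = 2.4

ht_speed = cpu_ghz * 1.10 * 0.5    # one of two clients on an HT CPU
no_ht_speed = cpu_ghz              # single client, HT disabled

for label, speed in [("with HT, 2 clients", ht_speed),
                     ("no HT, 1 client   ", no_ht_speed)]:
    days = points / speed / factor / 24
    print(f"{label}: {days:.2f} days per WU")
# with HT, 2 clients: 5.35 days per WU
# no HT, 1 client   : 2.94 days per WU

Either way the WU is back well inside its roughly 54-day deadline, which is this poster's point.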

Luck ............ :D
 
Chelsea: I understand what you're saying. HT and SMP aren't the same tech, but the point remains that 2 processes in HT use more processor potential than a single process. Therefore, a gain is realized.

LPerry: it's very simple: if you want your opinion respected, back it up. Otherwise, we're left with a lot of unsupported claims that don't mesh (DC trades speed for volume, but VJ wants to trade volume for speed?).
Any DC app wants volume. Hyperthreading, at its absolute worst, is volume-neutral and is likely volume-positive. Therefore, use HT.
 
To the rest who don't get it... I understand.
One of the problems with the project right now is that it has evolved into more than it was when it started. There are problems with assignments going to machines that shouldn't be getting something that is needed back quickly. I had suggested that they add a flag for those who run their systems 24/7 and have CPUs of 2 GHz and higher. This would, in my opinion, get enough people asking for the work they want back fast, and they would get a better turnaround between generations. The deadlines and such are not realistic for what they would like, and they know that. Again, I have pointed out to them that simply putting a shorter deadline on those returns would help solve a few issues people have. What they are trying to do is avoid alienating the people who donate their computers and their time, which they know equals money. Trying to juggle everything to perfection is impossible.

Guys, I'm only the messenger who happens to understand both the hardware and the software, and also the physics/chemistry of this process. I think most of us realize that these guys are working on cutting edge stuff and doing things that have never been done. They also produce papers and travel the world talking about what they are doing. In other words, what they are doing has some peer review and isn't done in a vacuum.

No matter whether you understand all this junk or not, keep on folding. That's the bottom line. They will eventually get things fixed, and then there will be something else. That's just the way complex things work. ;)
 
lomn75 said:
Chelsea: I understand what you're saying. HT and SMP aren't the same tech, but the point remains that 2 processes in HT use more processor potential than a single process. Therefore, a gain is realized.
I don't think I said otherwise. :confused:

What Vijay is saying is that he would prefer donators to give up the 30% gain in order to get WUs back faster.

Of course he realizes this will net you less points if you comply, so he understands if you don't.

ChelseaOilman
 
lomn75 said:
Any DC app wants volume. Hyperthreading, at its absolute worst, is volume-neutral and is likely volume-positive. Therefore, use HT.
This is why you don't understand... in this case your statement is wrong.

Also, I think that in most cases even those who don't always agree with me will admit that I have already proven myself to the DC community at large; I certainly don't have to spend time proving anything to you. Who are you?

Stay in school and study some Logic because your knowledge is sorely lacking.
 
LPerry said:
<snip>

Right now I’m trying very hard NOT to take that as one very inflammatory statement.

I know to whom it was directed, but in fact it’s sort of a slap in the face to the entire team.
 
LPerry said:
<snip>

Larry, your credentials regarding EM-III are peerless. I think it's great, and I have supported your site a couple of times through PayPal, which, by the way, anyone here can do by going to The Weatherman's site and clicking on the PayPal button. Now...

SHAME ON YOU!

If you cannot be civil please do not post.

As to supporting a position in a debate/discussion/argument, your credentials mean squat. So do those of anyone who cannot back up their position with something other than "It's too complicated for you to understand."

Now, should we continue this in a civil manner, or should this thread be locked?
 
I think I'll just leave it to you guys to debate, but I'll leave you with this quote from the guy running this project as he is more of a diplomat than I am:
This is a direct quote from Dr. Vijay Pande:
1) If you care primarily about points, running 2 procs on HT is still the best bet. We are grateful for all contributions and if people choose to run 2 procs on HT, our approach is that all contributions are welcome.

2) If you care about the science foremost and are interested in our recommendations, then do not run 2 procs on HT, but please just run one process. That won't be best for points, but is best for the science.

3) If your machine cannot make the deadlines, then one should run the timeless WUs.

I hope that clears these issues up and thanks to all for their contributions.
 
Ok, now I have to go completely to the other side of emotion and laugh like hell.

VJ is about as diplomatic as Kim Jong-Il. It was his original “diplomacy” that about broke this team in half a while back.

I’ll continue to do my best to produce what I have been because I really believe in the project. That is all I can do. It’s all any of us can do.

Fold on everyone :)
 
So basically, for every 100 WUs we turn in now he wants 80 or so instead?
I know that I am very far from understanding all the science that goes on behind the scenes here, but sometimes you don't have to dig deep down into the question in order to find out what the real answer is.

Let's say that no one ever put 2 clients on an HT machine, and I mean EVER. How many fewer WUs would we have today? I have no idea, but I'm sure it's a fairly significant amount. Is VJ saying that all these extra WUs have slowed down the process of researching these proteins and whatever else they do with our WUs? I'm no mathematician, but in this case I think that VJ may be wrong. Many of you are talking about *quick* research, turning in WUs faster, etc., but I'm pretty sure that F@H falls into the category of 'long-term research projects'.

OK, I guess we can venture into the short term as well. I know that you also would like a little bit of work done on the deadlines for some WUs, as well as hoping to assign certain WUs to faster computers, but with something like 160,000+ processors running F@H, are you really ever waiting for days on one specific client to return one specific WU in order to move ahead in the research? I kind of doubt it.

I guess I am still failing to understand how less is more.
 
Well I for one am in this for the points (should I be ashamed or something?), so I guess I’ll just continue as I did before. Though I can see how running one instance instead of two might make sense on something like a laptop or a desktop where it isn’t going to be run all the time; you would have a better chance of making the deadline.
 
Hi -

Not here to attack/defend anyone, just so you know, hehe

A dual CPU system will always outperform a HT system.
A dual core on a single chip system will probably be outperformed by dual CPU systems, most of the time. In turn, it will probably outperform HT by a fair bit.

HT is a trick. It is a GOOD trick, and I sure like it on my laptop when I'm doing a whole bunch of work. I run dual clients (two services) on that machine. I do it because I feel like it, and because the machine can run out of work if I'm unable to have a good network connection.

A lot of my rigs are AMD as well, but for the purpose of this discussion it's not relevant ;)

The overhead of HT will slow down the progress of each individual WU. This is because without HT you'd turn around a single WU very quickly and then do the next one, which means you usually end up churning through individual WUs quicker.

With HT you do two in 'fake' parallel. Each WU completes more slowly, but as you are processing two at once, when both finish you've done two ;) -- the time that takes is a little longer, though points-wise you can do a little better.

As for Vijay being diplomatic, tbh I'm not interested in that discussion - he does what he feels is best, and as far as I am concerned, anyone folding does that because of their own reasons, in the way they feel is best for them.

The way the Folding project is being run means it's better to turn one WU over quicker rather than returning two WUs a bit slower. Since a single WU comes back quicker without HT, that is the approach Vijay prefers. That's the simplest explanation :cool:

To put things into perspective - I'm the admin of the support site, and I don't always agree with everyone either, nor do I expect that to happen anytime soon. My opinion is that you should do what YOU feel works best for you (and your team) - I doubt anyone, including Vijay, will argue with that :D

Fold on

P.S. As for being ashamed to be in it for the points - 'course not, it works for a lot of ppl, including myself :)
 
ChelseaOilman said:
...
On a single HT CPU board with 1 client:
Physical CPU_0 = 70% cycles used.
Logical CPU_1 = 0% cycles used.

On a single HT CPU board with 2 clients:
Physical CPU_0 = 50% cycles used.
Logical CPU_1 = 50% cycles used.

There is an advantage to running 2 clients on an HT enabled CPU and board...
I think though that you have forgotten to reflect that advantage in your synopsis. To do that you would have needed to add a column for amount of work returned, which would need a relevant time reference as well as relation to raw CPU speed to begin with, but I digress.

You know, there is another option that VJ might be suggesting which hasn't been discussed, and that is to disable HT on your rig in order to allow F@H to use 100% of the one REAL CPU. Sure this would be a bit of a point sacrifice, and also a bit of a lifestyle sacrifice. I do believe that in the end it is all up to you, though, and that according to his quote, VJ has accepted that.

That said, it is indeed hard to understand how my main rig, an HT 3.2, which is doing frames on the exact same protein with less than a 20 sec/frame difference (according to EMIII <--- thanks LPerry) from my HTPC, an AMD @ 1.8, is being "less effective" at doing the research that is asked of it.

I trust LPerry's comments that, with more complete knowledge of the target turnaround times, the quantities of results needed for verification, and the research sequencing, we could probably understand this better, and I would hope that VJ et al. would do a better job of publishing the reasoning and/or fixing the client, as has been suggested. I also respect anyone's right to question the veracity of such statements until they are able to reconcile them to themselves.

I have to admit that this one has given me a bit of a question as to what to do myself. I AM in it for the science, but dangit, that sharp guy and I are in a fight for the #19 spot for the time being. That, and I really only have open access to one of the 4 HT-enabled PCs that I have folding for me.
 
There is one clear answer and it is amazing it has not been brought up yet.

Let us have an SMP-enabled client and all dually and HT computers would be very, very happy. We have been waiting on one for a couple of years now. Want to see fast results? Try 4+ GHz of dually AMD power crunching on the same work unit.

That's all I have to say about that.
 
This may be a silly question, but I thought work units that were needed back by a certain time were assigned to clients with higher PFs. So if an HT machine running two instances doesn't have a high enough PF, then those WUs won't get assigned to that machine.

If Vijay has very time-sensitive work units, why doesn't he just have them assigned only to, say, machines with a 0.97 PF or higher?
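As an illustration of that idea, here is a minimal sketch of PF-gated assignment; the 0.97 cutoff, the field names, and the client list are all hypothetical, not the real Stanford assignment-server logic:

Code:
# Hypothetical sketch of the PF-gated assignment idea from this post.
# Assumption: PF (performance fraction) is a per-client history score
# where values near 1.0 mean fast, reliable returns.

CRITICAL_PF = 0.97   # only fast clients get deadline-critical WUs

clients = [
    {"name": "ht-p4-two-instances", "pf": 0.91},
    {"name": "amd-dually",          "pf": 0.99},
]
work_units = [
    {"id": 101, "deadline_critical": True},
    {"id": 102, "deadline_critical": False},
]

def eligible(client, wu):
    """Deadline-critical WUs require a high historical PF."""
    return (not wu["deadline_critical"]) or client["pf"] >= CRITICAL_PF

for wu in work_units:
    names = [c["name"] for c in clients if eligible(c, wu)]
    print(f"WU {wu['id']} -> {names}")
# WU 101 -> ['amd-dually']
# WU 102 -> ['ht-p4-two-instances', 'amd-dually']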
 