Phenom performance killer: CnQ

pxc · Jul 2, 2008

AMD's Phenom X4 9950, 9350e and 9150e: Lower Prices, Voltage Tricks and Strange Behavior

As if things couldn't get worse...

Code:

SYSMark 2007 Preview      CnQ/EIST   CnQ/EIST   Performance Increase from
Overall Score             Enabled    Disabled   Disabling CnQ/EIST
AMD Phenom X4 9350e         101        113          11.8%
AMD Athlon X2 6400+         121        123           1.7%
Intel Core 2 Quad Q9450     153        156           2.0%

Disabling CnQ increased my SYSMark scores by around 12% and cut my Photoshop CS3 render times by about 40% (58.7 s with CnQ enabled, 35.2 s with CnQ disabled); re-enabling CnQ brought the slowdown back. Gary ran similar numbers using PCMark Vantage and found a 5% difference. AMD originally blamed the problem on SYSMark introducing unrealistic pauses into its benchmark (so-called "think" times, periods during which the system is waiting for user input), but since we found the same issue in other benchmarks (PCMark Vantage and our Photoshop test), we believe this is more than just a SYSMark issue.
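As a quick sanity check on the percentages, here is a minimal Python sketch that recomputes them from the scores and times quoted above; `pct_increase` is just an illustrative helper name. The 9350e row rounds to 11.9% rather than the table's 11.8%, presumably a rounding or truncation difference in the source table, and the Photoshop times work out to roughly a 40% reduction in render time (the CnQ-on run takes about 1.67x as long).

```python
def pct_increase(before: float, after: float) -> float:
    """Percent change going from `before` to `after` (positive = increase)."""
    return (after - before) / before * 100.0


# SYSMark 2007 Preview overall scores from the table: (CnQ/EIST on, CnQ/EIST off)
sysmark = {
    "AMD Phenom X4 9350e": (101, 113),
    "AMD Athlon X2 6400+": (121, 123),
    "Intel Core 2 Quad Q9450": (153, 156),
}

for cpu, (enabled, disabled) in sysmark.items():
    print(f"{cpu}: {pct_increase(enabled, disabled):+.1f}%")

# Photoshop CS3 render times in seconds (lower is better).
ps_cnq_on, ps_cnq_off = 58.7, 35.2
reduction = -pct_increase(ps_cnq_on, ps_cnq_off)  # percent less time with CnQ off
print(f"Photoshop render time reduction: {reduction:.1f}%")
print(f"CnQ-on run takes {ps_cnq_on / ps_cnq_off:.2f}x as long")
```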

Read page 4 for more CnQ flakiness.

vista_blista · Jul 2, 2008

Probably more like operator error. Gary Key is clueless. Vista isn't a very stable platform for switching to lower P-states on either AMD or Intel. AMD probably lengthened the transition intervals, or changed clock timing settings, for stability. This is probably a simple BIOS issue that depends on the individual mobo. But he still insists on testing with Vista SP1 and overclocking... and subsequently blowing up... cheap IGP motherboards.

pxc · Jul 2, 2008

vista_blista said:
Probably more like operator error. Gary Key is clueless.

Maybe he is, but this stuff is pretty mature and should be foolproof, aside from possibly a bad BIOS, and that's not exactly the reviewer's fault. Anyway, the first-person "I" on that page implies it was Anand doing the testing, since he is the one with direct AMD contacts.

Vista isn't a very stable platform for changing lower p-states for either AMD or Intel.

Well, at least for Intel, Vista and XP work perfectly fine with EIST/SpeedStep. All of my systems except one Athlon X2 running XP have power management enabled. I haven't had any problems with Athlon X2 or Intel CPUs and power management in Vista, and I've never had problems with Intel and XP power management.

I do think there may be something to it. At least one review that I saw (not AT) had a "phenomenonal" increase in performance between the 9850 and 9950 CPUs that could point to the same problem.

I guess we'll see more in the next couple of days.

vista_blista · Jul 2, 2008

I'm not familiar with CnQ or EIST in the BIOS. I use software solutions for both my Intel and AMD CPUs. Something like CrystalCPUID or RMClock isn't set-and-forget. I'm talking about the most power savings possible, laptop or desktop. If you check the RMClock forums, there seems to be a problem with newer CPUs (65 nm and smaller), especially running Vista x64. I don't have any problem whatsoever with XP Pro, and can undervolt by a substantial margin. My C2D ThinkPad in particular has a weird throttling issue with EIST that isn't apparent until you view it in RMClock's monitoring. I haven't tested it fully, though.

pxc · Jul 2, 2008

Who runs RMClock? It's never worked right when I tested it over the years.

The other 99.999%+ of the population running modern CPUs get power savings that are enabled by default in the OS.

Tom128 · Jul 2, 2008

pxc said:
Who runs RMClock? It's never worked right when I tested it over the years.

*raises hand*

It has kept my laptop running cool for a while now (undervolting). Have never had an issue with it.

pxc · Jul 2, 2008

I used Centrino Hardware Control back when it was still called that. I might use NHC (Notebook Hardware Control, as it's now called) if I overclock my laptop. CHC was much better than RMClock.

thekernel · Jul 2, 2008

vista_blista said:
Probably more like operator error. Gary Key is clueless. Vista isn't a very stable platform for switching to lower P-states on either AMD or Intel. AMD probably lengthened the transition intervals, or changed clock timing settings, for stability. This is probably a simple BIOS issue that depends on the individual mobo. But he still insists on testing with Vista SP1 and overclocking... and subsequently blowing up... cheap IGP motherboards.

You're kidding, right? You're going to fault the reviewer for using the latest OS? How about faulting AMD and the mobo manufacturers for not building a stable platform under the latest OS, which ships on pretty much all new PCs?

Intel certainly doesn't seem to have a problem with stability under Vista SP1, why should AMD get a free pass?

dandragonrage · Jul 3, 2008

I run RMClock on my home desktop, my laptop and my work computer and it works perfectly. I use it to undervolt at lower multiplier settings and I also use it to control the clock changing delay.

FreiDOg · Jul 6, 2008

Our minds then wandered over to what we saw when we looked at the AMD Power Meter. Since Windows Vista takes it upon itself to move threads between cores in fairly stupid ways, during the Photoshop test we saw what looked like threads bouncing around between cores or cycling through them in rapid succession. Whatever was actually being done, the result was that one processor would ramp up to full power (1GHz up to 2GHz) and then drop back down as the next CPU came up to speed.

We talked about how it's possible that threads moving between these different cores, needing to wake the next one up rather than running on an already at speed core, could possibly impact performance. As the Phenom is the only CPU architecture we currently have access to with individual PLLs per core (Intel's CPUs must run all cores at the same frequency), the CnQ issues could be related to that.
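The migration theory in the quote can be put into rough numbers. Below is a toy Python model, not AMD's actual CnQ algorithm: a core that has been idle sits at a low clock and needs some ramp-up time before it reaches full speed, and a busy thread either stays on one core or hops to a fresh idle core after every slice. The 1 GHz and 2 GHz clocks come from the quote; `effective_ghz`, `slice_ms` and `ramp_ms` are hypothetical, and the example values are made-up placeholders, not measured Phenom figures.

```python
def effective_ghz(total_ms: float, slice_ms: float, ramp_ms: float,
                  f_low: float = 1.0, f_high: float = 2.0,
                  migrate: bool = True) -> float:
    """Average clock (GHz) seen by one busy thread over total_ms.

    Toy model (hypothetical, not AMD's real CnQ logic):
    - an idle core sits at f_low and needs ramp_ms of load before it runs at f_high;
    - migrate=True: the thread moves to a different, idle core every slice_ms,
      so it pays the ramp-up cost again after every hop;
    - migrate=False: the thread stays on one core and pays the ramp-up once.
    """
    if total_ms <= 0 or slice_ms <= 0 or ramp_ms < 0:
        raise ValueError("total_ms and slice_ms must be positive, ramp_ms non-negative")
    clock_time = 0.0  # sum of (clock * time), in GHz*ms
    elapsed = 0.0
    while elapsed < total_ms:
        # How long the thread stays on the current core before the next hop.
        stint = min(slice_ms, total_ms - elapsed) if migrate else total_ms - elapsed
        low = min(ramp_ms, stint)   # time spent on a core that hasn't ramped up yet
        high = stint - low          # remaining time at full clock
        clock_time += low * f_low + high * f_high
        elapsed += stint
    return clock_time / total_ms


# Placeholder numbers: 1 s of work, 10 ms on each core, 5 ms to ramp up.
print(effective_ghz(1000, 10, 5, migrate=True))   # -> 1.5 (hops to a cold core each slice)
print(effective_ghz(1000, 10, 5, migrate=False))  # -> 1.995 (stays on one core)
```

With these invented numbers, a thread that keeps landing on cold cores averages 1.5 GHz versus nearly 2 GHz when it stays put; the real penalty would depend on Phenom's actual ramp latency and on how often Vista migrates the thread, neither of which this sketch knows.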

On the surface, that would seem to be a pretty reasonable explanation for the results.
Until Phenom, the OS never had to deal with the possibility that cores were running at different speeds in the same system (at least Windows hasn't).
It should come as no surprise, then, that when a thread returns from a blocked or dormant state, no preference is given to which core it is reactivated on. In fact, I would be surprised if it weren't scheduled on an idle core; it would be counterproductive to schedule it on an active core.

Seems like the CPU and OS need to communicate to set up the active core(s) (any cores currently in use), a ready core (no active threads but at the same power state as the active cores) and idle cores (low power state, not ready for immediate execution). Otherwise, AMD would have to implement some hardware trigger to shift a core into its highest power state immediately when a new thread is scheduled on it.
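The active/ready/idle idea can be sketched as a small core-selection policy. This is a hypothetical Python illustration of the proposal, not how Vista's scheduler or AMD's processor driver actually work; `CoreState`, `Core` and `schedule_wakeup` are invented names. A waking thread goes to the ready core first, then to an idle core, and only shares an active core as a last resort; whenever the ready core gets used up, one idle core is promoted to keep a spare warm.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class CoreState(Enum):
    ACTIVE = "active"  # running threads at the high power state
    READY = "ready"    # no threads, but held at the same power state as ACTIVE cores
    IDLE = "idle"      # low power state, must ramp up before running at full speed


@dataclass
class Core:
    core_id: int
    state: CoreState


# Lower rank = preferred destination for a thread that is waking up.
_PREFERENCE = {CoreState.READY: 0, CoreState.IDLE: 1, CoreState.ACTIVE: 2}


def schedule_wakeup(cores: list[Core]) -> Core:
    """Pick a core for a waking thread and keep one spare core warm (hypothetical policy)."""
    chosen = min(cores, key=lambda c: _PREFERENCE[c.state])  # first best match wins ties
    chosen.state = CoreState.ACTIVE
    # If no warm spare is left, promote one idle core to READY (this costs idle power).
    if not any(c.state is CoreState.READY for c in cores):
        for core in cores:
            if core.state is CoreState.IDLE:
                core.state = CoreState.READY
                break
    return chosen


cores = [Core(0, CoreState.ACTIVE), Core(1, CoreState.READY),
         Core(2, CoreState.IDLE), Core(3, CoreState.IDLE)]
first = schedule_wakeup(cores)
print(first.core_id, [c.state.value for c in cores])  # 1 ['active', 'active', 'ready', 'idle']
```

The obvious trade-off is that the ready core burns power while doing nothing, which is exactly what CnQ is trying to avoid; the alternative in the post, a hardware trigger that jumps a core to its top power state as soon as a thread lands on it, would not pay that idle cost.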

aop · Jul 6, 2008

I tried CnQ once on a PC with an A64 X2 5000+ BE CPU. With CnQ enabled, it didn't raise clocks above 2GHz even under heavy load, no matter what I did. So I had no choice but to disable CnQ on that PC :/

RamonGTP · Jul 6, 2008

Vista has been around for over a year now. I would hope that it's a stable platform to test on at this point. Seems like more of an AMD issue than a Vista issue to me.

dderidex · Jul 6, 2008

FreiDOg said:
On the surface, that would seem to be a pretty reasonable explanation for the results.
Until Phenom, the OS never had to deal with the possibility that cores were running at different speeds in the same system (at least Windows hasn't).
It should come as no surprise, then, that when a thread returns from a blocked or dormant state, no preference is given to which core it is reactivated on. In fact, I would be surprised if it weren't scheduled on an idle core; it would be counterproductive to schedule it on an active core.

What? No - that can't be right, surely?

I mean, granted, 'multiple cores on a single socket' are definitely new. But Windows versions back to NT have had to deal with multiple CORES themselves - on different sockets, granted. And Windows 2000 supported the same ACPI for power management that is still used as the standard in Vista - the standard version being able to support up to 2 processors, and 'server' edition up to 4.

This HAS to have come up before.

aop said:
I tried CnQ once on a PC with an A64 X2 5000+ BE CPU. With CnQ enabled, it didn't raise clocks above 2GHz even under heavy load, no matter what I did. So I had no choice but to disable CnQ on that PC :/

*shrugs*

I have overclocking issues on an old Socket 939 system with CnQ enabled, true, but... the power savings and noise difference are simply worth it. I don't do hardcore number crunching, just gaming, so the best overclock out there is hardly more effective at improving my 'gameplay experience' (nudge, nudge, wink, wink) than spending $50 more on the next level up of graphics card.

Lost! · Jul 7, 2008

This is an AMD issue. I think it has to do with 780G and HyperTransport 3 on the 65W Phenoms.

It was rumoured before the 780G release that it only ran HyperTransport 1 with the Phenom 9100e. Whether anyone has tested that remains to be seen (I haven't found anything to lend it credence), but it has been a rumour, and it might well account for these drops in performance.

790FX is free of these problems, I reckon. Now if ONLY they had tested the 9350 on that too...

FreiDOg · Jul 7, 2008

dderidex said:
What? No - that can't be right, surely?

I mean, granted, 'multiple cores on a single socket' are definitely new. But Windows versions back to NT have had to deal with multiple CORES themselves - on different sockets, granted. And Windows 2000 supported the same ACPI for power management that is still used as the standard in Vista - the standard version being able to support up to 2 processors, and 'server' edition up to 4.

This HAS to have come up before.

Neither the X2 nor any of Intel's multi-core offerings has ever permitted cores on the same CPU to run at different speeds; Phenom does.
For example,
Core 0 is running a task at full load, so it's at full power (2 GHz); cores 1, 2, and 3 are pretty much idle, running at low power states.
When the main thread on core 0 is blocked or removed, it's more likely that Windows (or any OS) would move a low-priority thread to core 0 when one asks for a time slice, and then bring the primary thread back on an idle core.
Traditionally this hasn't been a problem, since CPU time slices and disk read times are generally shorter than the C'nQ check interval. So even when the primary thread is returned to a different core, it's most likely coming back to a core running in a high power state, since all cores were in that high power state when it became blocked.

With Phenom, only the core the main thread is running on when it becomes blocked is necessarily in a high power state. But according to Anandtech, Windows was not typically returning that thread to the same core, and so it was being re-started much of the time on a core in a lower power state.
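The difference between a shared clock and per-core clocks in this example can be captured in a few lines. This is a deliberately simplified Python model under stated assumptions, not a description of the real CnQ governor: the blocked thread was the only busy thread, it was on core 0 at full speed, and the clock is only lowered if the thread stays blocked for at least one full check interval. `resume_clock_ghz` and its parameters are hypothetical, and the 5 ms block and 100 ms check interval below are placeholder values.

```python
def resume_clock_ghz(per_core_clocks: bool, same_core: bool,
                     blocked_ms: float, check_interval_ms: float,
                     f_low: float = 1.0, f_high: float = 2.0) -> float:
    """Clock of the core a blocked thread wakes up on (simplified, hypothetical model).

    Assumptions: before blocking, the thread was the only busy thread and ran on
    core 0 at f_high; all other cores were idle. The clock only drops to f_low if
    the thread stays blocked for at least one full check interval.
    """
    clock_dropped = blocked_ms >= check_interval_ms
    if not per_core_clocks or same_core:
        # Shared clock (X2, Core 2): every core follows the busy core's power state.
        # Per-core clocks but same core: the thread finds core 0 where it left it.
        return f_low if clock_dropped else f_high
    # Per-core clocks (Phenom) and a different core: that core was idle all along.
    return f_low


# A short block (e.g. a disk read) well under the placeholder check interval:
for per_core in (False, True):
    for same in (True, False):
        ghz = resume_clock_ghz(per_core, same, blocked_ms=5, check_interval_ms=100)
        print(f"per-core clocks={per_core}, same core={same}: {ghz} GHz")
```

Only the last combination, per-core clocks with the thread resumed on a different core, hands a briefly blocked thread a slow core, which is the Phenom situation described just above.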

A Windows environment has dealt with scheduling in SMP and NUMA systems for years now, but it has never had to deal with instances where cores within the same system were running at different clock speeds. That throws a whole new wrench into scheduling for the OS - there are now significant advantages to running certain threads on certain cores.

I'm not blaming Microsoft over AMD. It does seem that AMD has some holes in their C'nQ right now, and they have made the OS's job much more difficult than it perhaps needed to be.
But calling it a bug in C'nQ is, I think, inaccurate, because C'nQ is doing exactly what it is supposed to; the OS, however, is not currently able to schedule threads properly on Phenom because it doesn't have all the information it should when making that decision.

dderidex · Jul 7, 2008

FreiDOg said:
A Windows environment has dealt with scheduling in SMP and NUMA systems for years now, but it has never had to deal with instances where cores within the same system were running at different clock speeds. That throws a whole new wrench into scheduling for the OS - there are now significant advantages to running certain threads on certain cores.

But in an SMP system, that is possible, no? The separate CPUs aren't going to be all running at the same clocks. So if one of these old-school systems had 4 physical processors in it, it has 4 cores (of course: one each) - potentially all at different power states. Doesn't matter if they are all on the same die or not.

Ergo, Windows must have dealt with this issue before.

RamonGTP · Jul 7, 2008

Even if this is a software and not a hardware issue, unless I'm mistaken, Microsoft isn't the one who wrote the "AMD Processor Driver".

FreiDOg · Jul 7, 2008

dderidex said:
But in an SMP system, that is possible, no? The separate CPUs aren't going to be all running at the same clocks. So if one of these old-school systems had 4 physical processors in it, it has 4 cores (of course: one each) - potentially all at different power states. Doesn't matter if they are all on the same die or not.

Ergo, Windows must have dealt with this issue before.

Everyone from CPU manufacturers to Microsoft to system vendors required those systems to have matched CPUs. You could probably put two different CPUs into a Windows environment, but no one would ever recommend you do so; it would be far too unpredictable for 'serious' (read: business) use.
The problem historically has been maintaining the integrity of shared data and the consistency of transfers across a shared bus without predictable timings.
AMD can do it because they hide access to the core behind the system request interface. They have written the hardware to handle that asymmetric access.

There have been a few platforms that properly supported different CPU speeds (Tru64 UNIX on DEC Alpha CPUs comes to mind), but they are few and far between and rarely sell for less than six figures.
That's not to say you can't plug two different CPUs into a Windows system and have it work; it's just that it was considered insane to try, and it was a circumstance Microsoft never coded for.

And let's just say none of that was true and you could mix and match CPU speeds as you like; those CPUs have always run at a fixed speed. Phenom changes that: the OS won't necessarily know what power state a given core is in unless it queries the CPU when scheduling a thread. It can make a guess based on previous scheduling, but changing power states is largely up to the hardware or BIOS layer, not software. You're still going from a predefined environment where relative performance is known at boot time to one where performance is very much in flux.

dderidex · Jul 8, 2008

Nonono - I was referring to power save states.

Phenom is hardly the first CPU that would run a core at less than 100% clock speed. In fact, Intel CPUs back to the original Pentium III could support multiple clock speeds as a result of power steps.

pxc · Jul 9, 2008

lostcircuits.com weighs in with more detail: http://www.lostcircuits.com/cpu/amd_phenom9350/16.shtml

What a mess.

Lost! said:
This is an AMD issue. I think it has to do with 780G and HyperTransport 3 on the 65W Phenoms.

It seems to affect the 9950 too:

from the lostcircuits 9950/9350e review link above said:
The CnQ Conundrum

Over the course of testing various iterations of Phenom we came across a number of results that were inconsistent to the point where they no longer made any sense. One example was the Phenom 9600 (B2 revision) outperforming the Phenom 9900 in select benchmarks when clocked to the same speed, albeit with a lower NB frequency. Benchmarks affected were VirtualDub/DivX 6.7, F.E.A.R., the Crysis CPU benchmark and others that sometimes, for no obvious reason, started to give erratic results.

Well, that's one good thing about the differently broken B2 9600 I have.

cannondale06 · Jul 9, 2008

the more I read about Phenom the less I want it

pxc · Jul 11, 2008

Oh boy, I just did testing on my own B2 9600. CnQ does not work right with that chip either.

I was doing some SuperPi tests and noticed it was running very slow, even for a Phenom. This is with Vista 32-bit (no SP1) and the TLB fix is disabled, but power management is on. So I opened CPU-Z and I could see the CPU alternating from minimum speed to maximum speed on *every refresh* during the SuperPi run. I disabled power management and the score improved dramatically. CPU-Z showed the CPU running at full speed, of course. My results for SuperPi Mod XS 1.5 1M:

Phenom 9600 @ 2.3GHz CnQ on: 40s
Phenom 9600 @ 2.3GHz CnQ off: 32s

It's repeatable when I enable and disable power management without rebooting.
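The two SuperPi times also allow a rough estimate of how much of the CnQ-on run was spent at the low clock. This Python sketch assumes SuperPi's run time scales inversely with clock speed and that the CPU only ever sits at full speed or at a single low P-state; the low-to-full ratio of 0.5 (about 1.15 GHz on a 2.3 GHz part) is an assumption chosen to mirror the 1 GHz/2 GHz behaviour described for the 9350e, not a measured value for this 9600. `time_at_low_clock` is a hypothetical helper.

```python
def time_at_low_clock(t_on: float, t_off: float, low_ratio: float) -> float:
    """Seconds of the CnQ-on run spent at the low P-state (rough estimate).

    Assumes run time scales inversely with clock speed and the CPU is only ever at
    full speed or at low_ratio * full speed. The same work gets done in both runs:
        t_low + t_high             = t_on
        t_low * low_ratio + t_high = t_off
    which gives t_low = (t_on - t_off) / (1 - low_ratio).
    """
    if not 0.0 < low_ratio < 1.0:
        raise ValueError("low_ratio must be between 0 and 1")
    if not low_ratio * t_on <= t_off <= t_on:
        raise ValueError("timings are inconsistent with this model")
    return (t_on - t_off) / (1.0 - low_ratio)


t_on, t_off = 40.0, 32.0  # SuperPi Mod XS 1.5 1M, Phenom 9600 @ 2.3 GHz
t_low = time_at_low_clock(t_on, t_off, low_ratio=0.5)  # 0.5 is an assumed P-state ratio
print(f"CnQ-on run takes {t_on / t_off:.2f}x as long")
print(f"~{t_low:.0f} s of {t_on:.0f} s ({t_low / t_on:.0%}) at the low clock")
```

Under those assumptions, roughly 16 of the 40 seconds would have been spent at the low clock, in the same ballpark as CPU-Z catching the chip at minimum speed on alternating refreshes; memory-bound phases of SuperPi or a different minimum P-state would shift the real figure.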

The power difference isn't deathly critical (to me) at idle, but geez. I hope this is something that AMD can fix with a microcode or CPU driver update.
