Async compute gives a 30% increase in performance. Maxwell doesn't support async.

Wow, seems like this benchmark is the only glimmer of hope for AMD product users.
Meanwhile, enjoying Mad Max and MGS V in full glory.
Yeah but if those were DX12 games the 290X would be like 30% faster... :rolleyes:
I've accepted the fact that no matter what's in my rig for the next year, I'm going to want a 16nm GPU regardless.
 
I think that's pretty much everyone's assessment:

Not worth buying anything until Pascal/Arctic Islands.
 
Yeah but if those were DX12 games the 290X would be like 30% faster... :rolleyes:
I've accepted the fact that no matter what's in my rig for the next year, I'm going to want a 16nm GPU regardless.
That is like saying if grandma had balls she would be your grandpa.
Here and now, there are eleventy billion DX11 games still to release, with the 980 Ti stomping the Fury X into the floor. When Pascal hits we will get that as well, or if AMD is still alive by then and has a superior card, we'll get that.

For the two DirectX 12 games releasing this year, I have no interest. The only DX12-compatible game that will define my next purchase for now is DX:MD. Even then, there has to be a graphical feature worth buying an AMD card for (something I would actually miss in terms of graphics fidelity).

I have yet to see a screenshot comparing the DX11 vs DX12 render and the feature I'd miss by using the DX11 code path in the AotS bench.
 
The funny thing is that the AMD fans are counting on the consoles' ACE usage translating over to the PC. But given AMD's market share and financial resources, and with UE4 being so popular (and pro-NVIDIA), the chances of anything being optimized for AMD desktop GPUs vs NVIDIA are nearly zero unless it's a sponsored game like AoS. The excuses from the peanut gallery will be pretty funny once we see NVIDIA pull ahead in meaningful DX12 titles.
 
The funny thing is that the AMD fans are counting on the consoles' ACE usage translating over to the PC. But given AMD's market share and financial resources, and with UE4 being so popular (and pro-NVIDIA), the chances of anything being optimized for AMD desktop GPUs vs NVIDIA are nearly zero unless it's a sponsored game like AoS. The excuses from the peanut gallery will be pretty funny once we see NVIDIA pull ahead in meaningful DX12 titles.
I don't think it's likely that developers are going to skew PC ports for NVIDIA just based on market share. The entire point of DX12 is a lower level API and from all indications, the console code is much closer to DX12 than DX11. So presumably, since the optimizations are already done for ACE in the console titles (as both consoles are using GCN-powered APUs) it would be very little work - comparatively - to keep those optimizations in place. I think this is what AMD has been betting on.

Obviously, how much of a difference this makes in the AMD vs NVIDIA performance competition will come down to the game itself, what the rendering engine is doing, how optimized the engine is in the first place, and what further optimizations are done for DX12 and NVIDIA, but I'm all for anything that gives PC gaming a speed boost.
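
For anyone wondering what "lower level" actually looks like in practice, here's a minimal sketch (assuming you already have an ID3D12Device, error handling omitted) of how a DX12 engine creates a dedicated compute queue alongside the normal graphics queue. That separate compute queue is the API-side hook that console-style ACE usage would map onto; whether the GPU actually overlaps the work is up to the hardware and driver.

// Minimal sketch: one direct (graphics) queue plus one compute queue in D3D12.
// Assumes an existing ID3D12Device*; real code would check every HRESULT.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void CreateQueues(ID3D12Device* device,
                  ComPtr<ID3D12CommandQueue>& graphicsQueue,
                  ComPtr<ID3D12CommandQueue>& computeQueue)
{
    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;      // graphics + compute + copy
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&graphicsQueue));

    D3D12_COMMAND_QUEUE_DESC cmpDesc = {};
    cmpDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;     // compute + copy only
    device->CreateCommandQueue(&cmpDesc, IID_PPV_ARGS(&computeQueue));

    // Command lists can now be submitted to each queue independently; whether
    // they actually execute concurrently on the GPU is exactly what this
    // thread is arguing about.
}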
 
It is capable of it and its fast at it too, its all dependent on how the code is written.

From what I've seen so far it can do some async compute, but it takes a hit when trying to do it in tandem with other graphics work. Have you found something different? Is that all in the coding, or a limitation of NVIDIA's implementation?

If the former, let's say the Oxide guys retool their code for NVIDIA GPUs to give optimal performance when using their specific hardware for async compute + graphics, then have them do the same for AMD hardware.

AFTER those optimizations, what do you expect will perform better? Which is the better design for the types of mixed graphics/compute workloads we are expected to see? That is the question I want answered definitively. They may still want to tweak things for that 80% of the market that NVIDIA's discrete GPU shipments represent, but they ought to design their game to take advantage of the best practices and techniques for getting the job done, and that might not be the current NVIDIA implementation. If so, then NVIDIA needs to alter their designs going forward.
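
Just to make "tweak things per vendor" concrete: gating the concurrent-queue path could be as simple as a DXGI adapter check like the hypothetical sketch below (vendor IDs are the standard PCI ones, first adapter only, no error handling). The real question above is which path they should be designing for in the first place.

// Hypothetical vendor gate, purely to illustrate the idea; not Oxide's code.
// PCI vendor IDs: 0x10DE = NVIDIA, 0x1002 = AMD.
#include <dxgi1_4.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

bool PreferConcurrentComputePath()
{
    ComPtr<IDXGIFactory4> factory;
    CreateDXGIFactory1(IID_PPV_ARGS(&factory));

    ComPtr<IDXGIAdapter1> adapter;
    factory->EnumAdapters1(0, &adapter);    // first adapter only, for brevity

    DXGI_ADAPTER_DESC1 desc = {};
    adapter->GetDesc1(&desc);

    // Only turn on the graphics+compute overlap path where it is known to help;
    // everything else falls back to a single-queue submission path.
    return desc.VendorId == 0x1002;
}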
 
I'm thinking it's on both sides, both Oxide and nV, but without more information beyond what Oxide stated about their specific issue, it's us looking through a window at what is going on; in this case Oxide gave us a peephole with a concrete wall on the other side. There seem to be limitations when going past 31+1 compute and graphics queues on nV's Maxwell 2 cards, where the queue isn't being refilled as it's being used.
 
From what I've seen so far it can do some async compute, but it takes a hit when trying to do it in tandem with other graphics work. Have you found something different? Is that all in the coding, or a limitation of NVIDIA's implementation?

If the former, let's say the Oxide guys retool their code for NVIDIA GPUs to give optimal performance when using their specific hardware for async compute + graphics, then have them do the same for AMD hardware.

AFTER those optimizations, what do you expect will perform better? Which is the better design for the types of mixed graphics/compute workloads we are expected to see? That is the question I want answered definitively. They may still want to tweak things for that 80% of the market that NVIDIA's discrete GPU shipments represent, but they ought to design their game to take advantage of the best practices and techniques for getting the job done, and that might not be the current NVIDIA implementation. If so, then NVIDIA needs to alter their designs going forward.
Jesus, give it up already.

Nvidya can do async only with software (which turns out to run pretty shit). Maxwell physically lacks an asynchronous compute capable controller and they can't do anything about it.
 
Jesus, give it up already.

Nvidya can do async only with software (which turns out to run pretty shit). Maxwell physically lacks an asynchronous compute capable controller and they can't do anything about it.


If the CPU is involved in helping the driver shuffle the async code (breaking the work down into instructions, analyzing the workload, and then sending it back to the GPU to process in a certain order), the latency would go through the roof. It's not easy to hide latency between the CPU and a discrete GPU; the round trip itself will introduce latency into a pipeline like that.
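
For what it's worth, DX12 at least lets a cross-queue dependency be resolved GPU-side with a fence, so the CPU only records the signal/wait up front and never sits in the middle of the loop. A rough sketch (hypothetical helper, no error handling):

// Sketch: GPU-side synchronization between a compute queue and a graphics queue.
// The CPU records Signal/Wait once; there is no CPU round trip per dependency.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void SubmitWithGpuSideSync(ID3D12Device* device,
                           ID3D12CommandQueue* computeQueue,
                           ID3D12CommandQueue* graphicsQueue,
                           ID3D12CommandList* const* computeLists, UINT numCompute,
                           ID3D12CommandList* const* graphicsLists, UINT numGraphics)
{
    ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));

    computeQueue->ExecuteCommandLists(numCompute, computeLists);
    computeQueue->Signal(fence.Get(), 1);   // fence hits 1 when the compute work is done

    graphicsQueue->Wait(fence.Get(), 1);    // GPU-side wait; the CPU is not involved
    graphicsQueue->ExecuteCommandLists(numGraphics, graphicsLists);
}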
 
So what is AMD's excuse for being slow in everything else compared to nVidia...
 
So what is AMD's excuse for being slow in everything else compared to nVidia...
Don't you get it, they were playing the long game. They've spent the last 3+ years building towards DX12! Go read Hallock's posts. They're geniuses.
 
I have yet to see a screenshot comparing the DX11 vs DX12 render and the feature I'd miss by using the DX11 code path in the AotS bench.


That tells me you have no idea what DX12 is about.

It isn't about new effects; it is about a more efficient way to handle graphics, efficient mainly on the CPU side, and asynchronous compute is one of the easiest ways to increase efficiency on the GPU side too.

The "effects" it allows are really "better performance with the same hardware": say, High lighting settings at the same performance where DX11 would only let you use Medium, that kind of stuff.
 
If the CPU is involved in helping the driver shuffle the async code (breaking the work down into instructions, analyzing the workload, and then sending it back to the GPU to process in a certain order), the latency would go through the roof. It's not easy to hide latency between the CPU and a discrete GPU; the round trip itself will introduce latency into a pipeline like that.

I missed a couple of days' worth of news, but... so take the CPU out of the equation because it's just too slow to communicate with...

I'm just looking at what information we have seen so far...


Whatever redirecting NVIDIA MAY be doing GPU-side only works with a queue size of 31 or less, per their hardware limit. IF this information cannot go out of order, then some shuffling is also happening between this (small) queue and the main queue. As soon as there is too much information for the (small) queue (which would act as a fast buffer in this case), nothing can be shuffled anymore and everything is processed sequentially until there is space in the queue again.

On top of this the GPU may have to constantly be switching context depending on the type of activity being requested (which may be fine, unless the queue is constantly being exceeded on different contexts).

Hmm...
 
So why did nVidia decide not to support Async compute?

Only they know, but DX11 works sequentially so it makes sense, and maybe they were too far into their design cycle when they realized they were stuck. I'd also guess it may contribute to their power savings in idle states. It might even be why we see those crazy power fluctuations when the GPU is active.
 
So why did nVidia decide not to support Async compute?
Most hardware designs are planned out years in advance, so async compute was probably not believed to be a critical feature for the near future, which, if you look at the marketplace today, appears to be technically correct.
 
So I don't feel like going through 13 pages. Can someone give me the Cliff's Notes on how this affects 980 Ti users?
 
It doesn't affect you at all currently but might make AMD cards a bit more competitive against NVIDIA cards in DX12 games eventually.
 
I don't think it's likely that developers are going to skew PC ports for NVIDIA just based on market share. The entire point of DX12 is a lower level API and from all indications, the console code is much closer to DX12 than DX11. So presumably, since the optimizations are already done for ACE in the console titles (as both consoles are using GCN-powered APUs) it would be very little work - comparatively - to keep those optimizations in place. I think this is what AMD has been betting on.

Obviously, how much of a difference this makes in the AMD vs NVIDIA performance competition will come down to the game itself, what the rendering engine is doing, how optimized the engine is in the first place, and what further optimizations are done for DX12 and NVIDIA, but I'm all for anything that gives PC gaming a speed boost.

I've noticed that people with multiple machines with varying brands of hardware like us, are more even-headed and logical in these threads.
 
They've been boasting "Full DX12 Support" including Async for a while.
They will attempt to use lawyer-speak to escape this situation. Probably explains why they remain silent. I can't think of anything they could possibly say that will help them in this situation unless they post some magical benchmarks that prove Oxide and everyone else wrong...
 
So I don't feel like going through 13 pages. Can someone give me the Cliff's Notes on how this affects 980 Ti users?

In DX12 the R9-290x is about equal to the GTX980-Ti if Asynchronous Shaders are used.

https://www.youtube.com/watch?v=v3dUhep0rBs

That video explains how it works.

Last week I was looking at getting the Gigabyte GTX980-Ti G1 Gaming to complete my main machine and move the R9-290x to my LAN PC. With this latest news I'm now considering the Sapphire R9-Fury. The issue is that most of the games I play right now, and the ones I'm looking forward to (Arkham Knight), will play better on the GTX980-Ti, and it comes with MGSV, which I was going to buy anyway and which helps justify the $100+ price difference between the Fury and the GTX980-Ti.

The problem is, if the best one could hope for in DX12 performance from the GTX980-Ti is roughly equal to the card I already have, then it might be a nice upgrade today, but not one for the long haul.

With the Fury I get somewhat faster performance than my current card overclocked to R9-390x clocks, but once DX12 is more fully utilized the Fury should hang around for the long haul and start pulling away from the 390x; considering the costly investment either of these cards represents, the longer it can stay in my system the better.

This all hinges on the widespread use of Async Shaders. I was ready to jump this week on the Gigabyte GTX980-Ti, but now I think I'll wait until prices drop. If I see the Fury drop to $500 or $450 I'll go that route; if the GTX980-Ti dips to $550+MGSV I may go that route instead.

I don't think I'm alone in this quandary.
 
It is too early to draw those kinds of comparisons, as they will vary game to game; not every DX12 load will look the same (heck, I would say DX12 loads will be decidedly more varied than DX11 ones).

There is a "possibility" for the 290x to increase its performance that much, but that is all theory at the moment.

Honestly, stick with your card in peace, no matter if NVIDIA or AMD; wait until a new DX12 game that interests you is announced or published, and make your decision after seeing actual performance.
 
It is too early to draw those kinds of comparisons, as they will vary game to game; not every DX12 load will look the same (heck, I would say DX12 loads will be decidedly more varied than DX11 ones).

There is a "possibility" for the 290x to increase its performance that much, but that is all theory at the moment.

Honestly, stick with your card in peace, no matter if NVIDIA or AMD; wait until a new DX12 game that interests you is announced or published, and make your decision after seeing actual performance.

It's not just theory; people have been talking about possible increases in AMD performance from W10 and then DX12 since the spring.

We are now seeing the manifestation of those increases with this benchmark.
 
Yeah, if there were an AMD card I had my eye on, it'd be the Sapphire Fury (vanilla). My hope is that that kind of cooler 'overdesign' continues in Arctic Islands; it's really a by-product of HBM reducing PCB size requirements haha. And for a top-level flagship card to handle 3D loads as quietly as it does? Yes please. Also, in terms of multi-monitor, it seems AMD finally addressed the issue of high power consumption (14W on the Fury vs 50+ on the 290/290x): https://www.techpowerup.com/reviews/Sapphire/R9_Fury_Tri-X_OC/28.html

E.g. the PowerColor 390 had an even more absurd 74W for multi-monitor, so it does appear this is only a 290/390-era problem. And yeah, this matters, as my older GTX580 had some issues with downclocking, especially with a 120Hz panel as my main.

Last time I used an AMD card, I was a regular visitor of Rage3D. Curious how many users here remember that site.
 
Nv working on a gameworks unreal engine dx12 benchmark? Amd performance tanks? You heard it here first.

Note: I have no source, link, etc. to back this up. I am just speculating.
 
Well, the more this story unfolds, the more what AMD is saying seems to be getting validated:

The results showed that Maxwell cards processed up to 31 command queues without any latency increase, the time taken increasing further with every 31 additional queues. Many users over at Beyond3D and elsewhere - including this guy - took that as a strong indication that NVIDIA cards were, in fact, capable of executing asynchronous graphics/compute.

But AMD see the results as proof that they're not. Here's the full statement from Tungler:

"The results produced by the benchmark do, in fact, illustrate that Maxwell is not capable of asynchronously executing graphics and compute. If you look at the Maxwell async compute results, you will see that the bar heights are the result of adding graphics and compute together. This indicates that the workloads are being done serially, not asynchronously. Compare that to the AMD results, where the async compute results show graphics and compute being processed simultaneously with no noticeable rise in overall frame latency.

"If Maxwell supported asynchronous compute, their results would look like the GCN results. Remember that asynchronous compute isn’t whether or not a GPU can do compute and graphics across a long workload, it’s whether or not the GPU can perform these workloads simultaneously without affecting the frame latency. MDolenc’s benchmark clearly shows that only GCN can do this."


http://www.pcgamesn.com/amd-respond-to-nvidia-dx12-async-controversy-maxwell-is-not-capable-of-asynchronously-executing-graphics-and-compute
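
A toy model of what that interpretation implies for the numbers (purely illustrative, made-up times, not taken from MDolenc's benchmark code): if graphics and compute run serially, the measured time is roughly their sum; if they overlap, it is closer to the max, and the 31-step staircase would come from the compute kernels being consumed 31 at a time.

// Toy model of the serial-vs-async interpretation above; not the real benchmark.
#include <algorithm>
#include <cmath>
#include <cstdio>

int main()
{
    const double tGfx = 10.0;      // assumed graphics time per frame, ms
    const double tComp = 2.0;      // assumed time per batch of compute kernels, ms
    const int kernels = 128;       // number of compute kernels submitted
    const int queueDepth = 31;     // claimed Maxwell 2 compute queue depth

    const double batches = std::ceil(double(kernels) / queueDepth);

    const double serial = tGfx + tComp * batches;               // bars add up
    const double overlapped = std::max(tGfx, tComp * batches);  // compute hides behind graphics

    std::printf("serial: %.1f ms, overlapped: %.1f ms\n", serial, overlapped);
    return 0;
}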
 
We already knew this last week.
Seems all the dinosaurs (tech news sites) are only just catching up haha.
 
I missed a couple of days' worth of news, but... so take the CPU out of the equation because it's just too slow to communicate with...

I'm just looking at what information we have seen so far...


Whatever redirecting NVIDIA MAY be doing GPU-side only works with a queue size of 31 or less, per their hardware limit. IF this information cannot go out of order, then some shuffling is also happening between this (small) queue and the main queue. As soon as there is too much information for the (small) queue (which would act as a fast buffer in this case), nothing can be shuffled anymore and everything is processed sequentially until there is space in the queue again.

On top of this the GPU may have to constantly be switching context depending on the type of activity being requested (which may be fine, unless the queue is constantly being exceeded on different contexts).

Hmm...

Yeah, that is a possibility. Too many variables to really see what is going on, and the only way to cut down the variables is to get more data from different sources.
 
It's fascinating to see so many statements from AMD personnel regarding their competitor's products. That's actually kind of rare for the industry.
 
It's not just theory; people have been talking about possible increases in AMD performance from W10 and then DX12 since the spring.

We are now seeing the manifestation of those increases with this benchmark.

Getting to the point of matching a 980 Ti is very much in theory at the moment; remember that Ashes is pre-beta, so we can't count that chicken until it hatches.
 