Async compute gives a 30% increase in performance. Maxwell doesn't support async.

Wow, seems like this benchmark is the only glimmer of hope for AMD product users.
Meanwhile, enjoying Mad Max and MGS V in full glory.
Yeah but if those were DX12 games the 290X would be like 30% faster... :rolleyes:
I've accepted the fact that no matter what's in my rig for the next year, I'm going to want a 16nm GPU regardless.
 
I think that's pretty much everyone's assessment:

Not worth buying anything until Pascal/Arctic Islands.
 
Yeah but if those were DX12 games the 290X would be like 30% faster... :rolleyes:
I've accepted the fact that no matter what's in my rig for the next year, I'm going to want a 16nm GPU regardless.
That is like saying if grandma had balls she would be your grandpa.
Here and now, there are eleventy billion DX11 games still to release, with the 980 Ti stomping the Fury X into the floor. When Pascal hits we will get that as well, or if AMD is still alive by then and has a superior card, we'll get that.

For the two DirectX 12 games releasing this year, I have no interest. The only DX12-compatible game that will define my next purchase for now is DX:MD. Even then, there has to be a graphical feature worth buying an AMD card for (something I would actually miss in terms of graphics fidelity).

I have yet to see a screenshot comparing the DX11 vs DX12 render and the feature I'd miss by using the DX11 code path in the AotS bench.
 
The funny thing is that the AMD fans are counting on the consoles' ACE usage translating over to the PC. But given AMD's market share and financial resources, and with UE4 being so popular (and pro-NVIDIA), the chances of anything being optimized for AMD desktop GPUs vs NVIDIA are nearly zero unless it's a sponsored game like AoS. The excuses from the peanut gallery will be pretty funny once we see NVIDIA pull ahead in meaningful DX12 titles.
 
The funny thing is that the AMD fans are counting on the consoles' ACE usage translating over to the PC. But given AMD's market share and financial resources, and with UE4 being so popular (and pro-NVIDIA), the chances of anything being optimized for AMD desktop GPUs vs NVIDIA are nearly zero unless it's a sponsored game like AoS. The excuses from the peanut gallery will be pretty funny once we see NVIDIA pull ahead in meaningful DX12 titles.
I don't think it's likely that developers are going to skew PC ports for NVIDIA just based on market share. The entire point of DX12 is a lower level API and from all indications, the console code is much closer to DX12 than DX11. So presumably, since the optimizations are already done for ACE in the console titles (as both consoles are using GCN-powered APUs) it would be very little work - comparatively - to keep those optimizations in place. I think this is what AMD has been betting on.

Obviously, how much of a difference this makes in the AMD vs NVIDIA performance competition will come down to the game itself, what the rendering engine is doing, how optimized the engine is in the first place, and what further optimizations are done for DX12 and NVIDIA, but I'm all for anything that gives PC gaming a speed boost.
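
For anyone wondering what "lower level" actually looks like in practice, here's a minimal sketch (assuming you already have an ID3D12Device, error handling omitted) of how a DX12 engine creates a dedicated compute queue alongside the normal graphics queue. That separate compute queue is the API-side hook that console-style ACE usage would map onto; whether the GPU actually overlaps the work is up to the hardware and driver.

// Minimal sketch: one direct (graphics) queue plus one compute queue in D3D12.
// Assumes an existing ID3D12Device*; real code would check every HRESULT.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void CreateQueues(ID3D12Device* device,
                  ComPtr<ID3D12CommandQueue>& graphicsQueue,
                  ComPtr<ID3D12CommandQueue>& computeQueue)
{
    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;      // graphics + compute + copy
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&graphicsQueue));

    D3D12_COMMAND_QUEUE_DESC cmpDesc = {};
    cmpDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;     // compute + copy only
    device->CreateCommandQueue(&cmpDesc, IID_PPV_ARGS(&computeQueue));

    // Command lists can now be submitted to each queue independently; whether
    // they actually execute concurrently on the GPU is exactly what this
    // thread is arguing about.
}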
 
It is capable of it and its fast at it too, its all dependent on how the code is written.

From what I've seen so far it can do some async compute, but it takes a hit when trying to do it in tandem with other graphics work. Have you found something different? Is that all in the coding, or a limitation of NVIDIA's implementation?

If the former, let's say the Oxide guys retool their code for NVIDIA GPUs to give optimal performance when using their specific hardware for async compute + graphics, then have them do the same for AMD hardware.

AFTER those optimizations, what do you expect will perform better? Which is the better design for the types of mixed graphics/compute workloads we are expected to see? That is the question I want answered definitively. They may still want to tweak things for that 80% of the market that NVIDIA's discrete GPU shipments represent, but they ought to design their game to take advantage of the best practices and techniques for getting the job done, and that might not be the current NVIDIA implementation. If so, then NVIDIA needs to alter their designs going forward.
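
Just to make "tweak things per vendor" concrete: gating the concurrent-queue path could be as simple as a DXGI adapter check like the hypothetical sketch below (vendor IDs are the standard PCI ones, first adapter only, no error handling). The real question above is which path they should be designing for in the first place.

// Hypothetical vendor gate, purely to illustrate the idea; not Oxide's code.
// PCI vendor IDs: 0x10DE = NVIDIA, 0x1002 = AMD.
#include <dxgi1_4.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

bool PreferConcurrentComputePath()
{
    ComPtr<IDXGIFactory4> factory;
    CreateDXGIFactory1(IID_PPV_ARGS(&factory));

    ComPtr<IDXGIAdapter1> adapter;
    factory->EnumAdapters1(0, &adapter);    // first adapter only, for brevity

    DXGI_ADAPTER_DESC1 desc = {};
    adapter->GetDesc1(&desc);

    // Only turn on the graphics+compute overlap path where it is known to help;
    // everything else falls back to a single-queue submission path.
    return desc.VendorId == 0x1002;
}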
 
I'm thinking it's on both sides, both Oxide and nV, but without more information beyond what Oxide stated about their specific issue, it's us looking through a window at what is going on; in this case Oxide gave us a peephole with a concrete wall on the other side. There seem to be limitations when going past 31+1 compute and graphics queues on nV's Maxwell 2 cards, where the queue isn't being refilled as it's being used.
 
From what I've seen so far it can do some async compute, but it takes a hit when trying to do it in tandem with other graphics work. Have you found something different? Is that all in the coding, or a limitation of NVIDIA's implementation?

If the former, let's say the Oxide guys retool their code for NVIDIA GPUs to give optimal performance when using their specific hardware for async compute + graphics, then have them do the same for AMD hardware.

AFTER those optimizations, what do you expect will perform better? Which is the better design for the types of mixed graphics/compute workloads we are expected to see? That is the question I want answered definitively. They may still want to tweak things for that 80% of the market that NVIDIA's discrete GPU shipments represent, but they ought to design their game to take advantage of the best practices and techniques for getting the job done, and that might not be the current NVIDIA implementation. If so, then NVIDIA needs to alter their designs going forward.
Jesus, give it up already.

Nvidya can do async only with software (which turns out to run pretty shit). Maxwell physically lacks an asynchronous compute capable controller and they can't do anything about it.
 
Jesus, give it up already.

Nvidya can do async only with software (which turns out to run pretty shit). Maxwell physically lacks an asynchronous compute capable controller and they can't do anything about it.


If the CPU is involved in helping the driver shuffle the async code (breaking the work down into instructions, analyzing the workload, and then sending it back to the GPU to process in a certain order), the latency would go through the roof. It's not easy to hide latency between the CPU and a discrete GPU; the round trip itself will introduce latency into a pipeline like that.
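
For what it's worth, DX12 at least lets a cross-queue dependency be resolved GPU-side with a fence, so the CPU only records the signal/wait up front and never sits in the middle of the loop. A rough sketch (hypothetical helper, no error handling):

// Sketch: GPU-side synchronization between a compute queue and a graphics queue.
// The CPU records Signal/Wait once; there is no CPU round trip per dependency.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void SubmitWithGpuSideSync(ID3D12Device* device,
                           ID3D12CommandQueue* computeQueue,
                           ID3D12CommandQueue* graphicsQueue,
                           ID3D12CommandList* const* computeLists, UINT numCompute,
                           ID3D12CommandList* const* graphicsLists, UINT numGraphics)
{
    ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));

    computeQueue->ExecuteCommandLists(numCompute, computeLists);
    computeQueue->Signal(fence.Get(), 1);   // fence hits 1 when the compute work is done

    graphicsQueue->Wait(fence.Get(), 1);    // GPU-side wait; the CPU is not involved
    graphicsQueue->ExecuteCommandLists(numGraphics, graphicsLists);
}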
 
So what is AMD's excuse for being slow in everything else compared to nVidia...
 
So what is AMD's excuse for being slow in everything else compared to nVidia...
Don't you get it, they were playing the long game. They've spent the last 3+ years building towards DX12! Go read Hallock's posts. They're geniuses.
 
I have yet to see a screenshot comparing the DX11 vs DX12 render and the feature I'd miss by using the DX11 code path in the AotS bench.


That tells me you have no idea what DX12 is about.

It isn't about new effects; it is about a more efficient way to handle graphics, efficient mainly on the CPU side, and asynchronous compute is one of the easiest ways to increase efficiency on the GPU side too.

The "effects" it allows are really "better performance with the same hardware": say, High lighting settings at the same performance where DX11 would only let you use Medium, that kind of stuff.
 
If the CPU is involved in helping the driver shuffle the async code (breaking the work down into instructions, analyzing the workload, and then sending it back to the GPU to process in a certain order), the latency would go through the roof. It's not easy to hide latency between the CPU and a discrete GPU; the round trip itself will introduce latency into a pipeline like that.

I missed a couple of days' worth of news, but... so take the CPU out of the equation because it's just too slow to communicate with...

I'm just looking at what information we have seen so far...


Whatever redirecting NVIDIA MAY be doing GPU-side only works with a queue size of 31 or less, per their hardware limit. IF this information cannot go out of order, then some shuffling is also happening between this (small) queue and the main queue. As soon as there is too much information for the (small) queue (which would act as a fast buffer in this case), nothing can be shuffled anymore and everything is processed sequentially until there is space in the queue again.

On top of this the GPU may have to constantly be switching context depending on the type of activity being requested (which may be fine, unless the queue is constantly being exceeded on different contexts).

Hmm...
 
So why did nVidia decide not to support Async compute?

Only they know, but DX11 works sequentially so it makes sense, and maybe they were too far into their design cycle when they realized they were stuck. I'd also guess it may contribute to their power savings in idle states. It might even be why we see those crazy power fluctuations when the GPU is active.
 
So why did nVidia decide not to support Async compute?
Most hardware designs are planned out years in advance, so async compute was probably not believed to be a critical feature for the near future, which, if you look at the marketplace today, appears to be technically correct.
 
So I don't feel like going through 13 pages. Can someone give me the Cliff's Notes on how this affects 980 Ti users?
 
It doesn't affect you at all currently but might make AMD cards a bit more competitive against NVIDIA cards in DX12 games eventually.
 
I don't think it's likely that developers are going to skew PC ports for NVIDIA just based on market share. The entire point of DX12 is a lower level API and from all indications, the console code is much closer to DX12 than DX11. So presumably, since the optimizations are already done for ACE in the console titles (as both consoles are using GCN-powered APUs) it would be very little work - comparatively - to keep those optimizations in place. I think this is what AMD has been betting on.

Obviously, how much of a difference this makes in the AMD vs NVIDIA performance competition will come down to the game itself, what the rendering engine is doing, how optimized the engine is in the first place, and what further optimizations are done for DX12 and NVIDIA, but I'm all for anything that gives PC gaming a speed boost.

I've noticed that people with multiple machines with varying brands of hardware like us, are more even-headed and logical in these threads.
 
They've been boasting "Full DX12 Support" including Async for a while.
They will attempt to use lawyer-speak to escape this situation. Probably explains why they remain silent. I can't think of anything they could possibly say that will help them in this situation unless they post some magical benchmarks that prove Oxide and everyone else wrong...
 
So I don't feel like going through 13 pages. Can someone give me the Cliff's Notes on how this affects 980 Ti users?

In DX12 the R9-290x is about equal to the GTX980-Ti if Asynchronous Shaders are used.

https://www.youtube.com/watch?v=v3dUhep0rBs

That video explains how it works.

Last week I was looking at getting the Gigabyte GTX980-Ti G1 Gaming to complete my main machine and move the R9-290x to my LAN PC. With this latest news I'm now considering the Sapphire R9-Fury. The issue is that most of the games I play right now, and the ones I'm looking forward to (Arkham Knight), will play better on the GTX980-Ti, and it comes with MGSV, which I was going to buy anyway and which helps justify the $100+ price difference between the Fury and the GTX980-Ti.

The problem is, if the best one could hope for in DX12 performance from the GTX980-Ti is roughly equal to the card I already have, then it might be a nice upgrade today, but not one for the long haul.

With the Fury I get somewhat faster performance than my current card overclocked to R9-390x clocks, but once DX12 is more fully utilized the Fury should hang around for the long haul and start pulling away from the 390x; considering the costly investment either of these cards represents, the longer it can stay in my system the better.

This all hinges on the widespread use of Async Shaders. I was ready to jump this week on the Gigabyte GTX980-Ti, but now I think I'll wait until prices drop. If I see the Fury drop to $500 or $450 I'll go that route; if the GTX980-Ti dips to $550+MGSV I may go that route instead.

I don't think I'm alone in this quandary.
 
It is too early to draw those kinds of comparisons, as they will vary game to game; not every DX12 load will look the same (heck, I would say DX12 loads will be decidedly more varied than DX11 ones).

There is a "possibility" for the 290x to increase its performance that much, but that is all theory at the moment.

Honestly, stick with your card in peace, no matter if NVIDIA or AMD; wait until a new DX12 game that interests you is announced or published, and make your decision after seeing actual performance.
 
It is too early to draw those kinds of comparisons, as they will vary game to game; not every DX12 load will look the same (heck, I would say DX12 loads will be decidedly more varied than DX11 ones).

There is a "possibility" for the 290x to increase its performance that much, but that is all theory at the moment.

Honestly, stick with your card in peace, no matter if NVIDIA or AMD; wait until a new DX12 game that interests you is announced or published, and make your decision after seeing actual performance.

It's not just theory; people have been talking about possible increases in AMD performance from W10 and then DX12 since the spring.

We are now seeing the manifestation of those increases with this benchmark.
 
Yeah, if there were an AMD card I had my eye on, it'd be the Sapphire Fury (vanilla). My hope is that that kind of cooler 'overdesign' continues in Arctic Islands; it's really a by-product of HBM reducing PCB size requirements haha. And for a top-level flagship card to handle 3D loads as quietly as it does? Yes please. Also, in terms of multi-monitor, it seems AMD finally addressed the issue of high power consumption (14W on the Fury vs 50+ on the 290/290x): https://www.techpowerup.com/reviews/Sapphire/R9_Fury_Tri-X_OC/28.html

E.g. the PowerColor 390 had an even more absurd 74W for multi-monitor, so it does appear this is only a 290/390-era problem. And yeah, this matters, as my older GTX580 had some issues with downclocking, especially with a 120Hz panel as my main.

Last time I used an AMD card, I was a regular visitor of Rage3D. Curious how many users here remember that site.
 
Nv working on a gameworks unreal engine dx12 benchmark? Amd performance tanks? You heard it here first.

Note: I have no source, link, etc. to back this up. I am just speculating.
 
Well, the more this story unfolds, the more what AMD is saying seems to be getting validated:

The results showed that Maxwell cards processed up to 31 command queues without any latency increase, the time taken increasing further with every 31 additional queues. Many users over at Beyond3D and elsewhere - including this guy - took that as a strong indication that NVIDIA cards were, in fact, capable of executing asynchronous graphics/compute.

But AMD see the results as proof that they're not. Here's the full statement from Tungler:

"The results produced by the benchmark do, in fact, illustrate that Maxwell is not capable of asynchronously executing graphics and compute. If you look at the Maxwell async compute results, you will see that the bar heights are the result of adding graphics and compute together. This indicates that the workloads are being done serially, not asynchronously. Compare that to the AMD results, where the async compute results show graphics and compute being processed simultaneously with no noticeable rise in overall frame latency.

"If Maxwell supported asynchronous compute, their results would look like the GCN results. Remember that asynchronous compute isn’t whether or not a GPU can do compute and graphics across a long workload, it’s whether or not the GPU can perform these workloads simultaneously without affecting the frame latency. MDolenc’s benchmark clearly shows that only GCN can do this."


http://www.pcgamesn.com/amd-respond-to-nvidia-dx12-async-controversy-maxwell-is-not-capable-of-asynchronously-executing-graphics-and-compute
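
A toy model of what that interpretation implies for the numbers (purely illustrative, made-up times, not taken from MDolenc's benchmark code): if graphics and compute run serially, the measured time is roughly their sum; if they overlap, it is closer to the max, and the 31-step staircase would come from the compute kernels being consumed 31 at a time.

// Toy model of the serial-vs-async interpretation above; not the real benchmark.
#include <algorithm>
#include <cmath>
#include <cstdio>

int main()
{
    const double tGfx = 10.0;      // assumed graphics time per frame, ms
    const double tComp = 2.0;      // assumed time per batch of compute kernels, ms
    const int kernels = 128;       // number of compute kernels submitted
    const int queueDepth = 31;     // claimed Maxwell 2 compute queue depth

    const double batches = std::ceil(double(kernels) / queueDepth);

    const double serial = tGfx + tComp * batches;               // bars add up
    const double overlapped = std::max(tGfx, tComp * batches);  // compute hides behind graphics

    std::printf("serial: %.1f ms, overlapped: %.1f ms\n", serial, overlapped);
    return 0;
}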
 
We already knew this last week.
Seems all the dinosaurs (tech news sites) are only just catching up haha.
 
I missed a couple of days' worth of news, but... so take the CPU out of the equation because it's just too slow to communicate with...

I'm just looking at what information we have seen so far...


Whatever redirecting NVIDIA MAY be doing GPU-side only works with a queue size of 31 or less, per their hardware limit. IF this information cannot go out of order, then some shuffling is also happening between this (small) queue and the main queue. As soon as there is too much information for the (small) queue (which would act as a fast buffer in this case), nothing can be shuffled anymore and everything is processed sequentially until there is space in the queue again.

On top of this the GPU may have to constantly be switching context depending on the type of activity being requested (which may be fine, unless the queue is constantly being exceeded on different contexts).

Hmm...

Yeah, that is a possibility. Too many variables to really see what is going on, and the only way to cut down the variables is to get more data from different sources.
 
It's fascinating to see so many statements from AMD personnel regarding their competitor's products. That's actually kind of rare for the industry.
 
It's not just theory; people have been talking about possible increases in AMD performance from W10 and then DX12 since the spring.

We are now seeing the manifestation of those increases with this benchmark.

Getting to the point of matching a 980 Ti is very much in theory at the moment; remember that Ashes is pre-beta, so we can't count that chicken until it hatches.
 