Vega GPU announced with some details, HPC so far

Anarchist4000 · Dec 15, 2016

razor1 said:
I don't think anyone has talked about the things you have been speculating on yet. I haven't heard anything about those things direct from AMD. Please link, All of these things, no one at B3D has been talking about either and many of them are in the game industry, but they don't talk about it? Why is that? Cause yeah, not going to happen.

Strange, they formed an entire foundation on the topic with a handful of major tech companies. Have alluded to the concept in just about every press briefing I've seen recently, although not revealing implementation details like core counts and exact configuration. Even their CEO has mentioned the concept and there are ample slides about HSA and ROCm. I think all the major tech news sites covered it so feel free to find your own links if you missed it. Considering they've been pushing the merits of pairing CPUs (scalar cores) with GPUs (SIMD cores), filed a handful of patents related to the implementations, published papers on the concept, it seems reasonable that an engineer would have actually examined the implications of the design on GCN. Especially if it's a feature they seem to feel has significant performance advantages. There has been plenty of talk at B3D about the papers and concepts. Just wondering if you realized what everyone was actually talking about? I suppose some could be under various NDAs, but the flexible scalar and HSA/ROCm concept is widely distributed and published.

https://www.extremetech.com/extreme...esident-phil-rogers-leaves-company-for-nvidia
Couldn't find a direct link to that slide in the background. But "Task Parallel Runtimes" and "Nested Data Parallel programs" in the 2012-2020 timeframe seen to fit into the design window of Vega. Yes I realize a lot of that is about using a CPU with a GPU, but surely someone considered the benfit of doing the same thing with the scalar and SIMD within the GPU. The current implementation already supports a number of the features, just not quite in parallel.

razor1 said:
Yes you are predicting, and no I haven't seen anything accurate about them yet. Your Fiji predictions and async predictions alike. You are just guessing blind with a dart board.

Like the HWS you say don't exist or aren't intelligent? Async making zero difference, yet there are ample benchmarks showing it helps both IHVs, excluding Maxwell? Games won't use async, yet obviously we have a lot of reviews showing it and it's a topic of discussion?

The only one that is still up in the air that comes to mind is the high-end Polaris design. There's mention of Polaris 12 and 10XT2, but an actual high end Polaris on par with what I was speculating at doesn't currently exist. It might, but we haven't seen it yet. That segment doesn't actually have a product. So what am I guessing at and missing here? Considering the capabilities already in GCN, the general implementation isn't much of a stretch.

razor1 said:
FP16 has not supported in any API (well sm 6.0 yeah, but no engine supports that yet) so what do you think the time frame will be once Vega comes out and if its has full speed FP16 support? Magic 2 or 3 or 4 years? And Dev's will support it when AMD has 25% or lower marketshare? Use your head man. Speculating is all fine if you start using business tactics that would influence the need for those changes.

So you're saying PS4 Pro doesn't actually exist? Or at least have an updated dev kit? I thought Nintendo Switch and Tegra had the FP16 support with the mobile sector? I'd expect Scorpio to have some launch titles, surely those won't be 2, 3, or 4 years out.

AMD has a pretty substantial marketshare between PC and console. The majority of gaming in fact. If that Intel deal pans out I'd expect the number to go up substantially. Seeing AMD near 85-90% in a couple years wouldn't be surprising. Who knows if that's significant enough for developers to target?

razor1 · Dec 15, 2016

Anarchist4000 said:
Strange, they formed an entire foundation on the topic with a handful of major tech companies. Have alluded to the concept in just about every press briefing I've seen recently, although not revealing implementation details like core counts and exact configuration. Even their CEO has mentioned the concept and there are ample slides about HSA and ROCm. I think all the major tech news sites covered it so feel free to find your own links if you missed it. Considering they've been pushing the merits of pairing CPUs (scalar cores) with GPUs (SIMD cores), filed a handful of patents related to the implementations, published papers on the concept, it seems reasonable that an engineer would have actually examined the implications of the design on GCN. Especially if it's a feature they seem to feel has significant performance advantages. There has been plenty of talk at B3D about the papers and concepts. Just wondering if you realized what everyone was actually talking about? I suppose some could be under various NDAs, but the flexible scalar and HSA/ROCm concept is widely distributed and published.

The talks at B3D were theoretical talks because you started posting about it lol. yes I read those posts, and none of them have anything concrete outside of it might be there, but no one knows lol. Something you started off. Great to start off a rumor with hypothetical right?

HSA HASN'T Taken off yet! How many years has it been and still nothing?

https://www.extremetech.com/extreme...esident-phil-rogers-leaves-company-for-nvidia
Couldn't find a direct link to that slide in the background. But "Task Parallel Runtimes" and "Nested Data Parallel programs" in the 2012-2020 timeframe seen to fit into the design window of Vega. Yes I realize a lot of that is about using a CPU with a GPU, but surely someone considered the benfit of doing the same thing with the scalar and SIMD within the GPU. The current implementation already supports a number of the features, just not quite in parallel.

And you are hypothosizing based on an article and a person that pretty much states HSA is bad in the near term? What did I just state HSA HASN'T taken off yet?

Like the HWS you say don't exist or aren't intelligent? Async making zero difference, yet there are ample benchmarks showing it helps both IHVs, excluding Maxwell? Games won't use async, yet obviously we have a lot of reviews showing it and it's a topic of discussion?

I didn't say HWS didn't exist, I stated they aren't intelligent, because they are not, the programmer has to tell them what to do for them to work, they need explicit programming for them to work.

Async makes no difference in the landscape of the GPU world, where is your theory of 30% and greater performance increases? Shit lucky to find 10% at this point, and its been a year.

What me to link all that?

The only one that is still up in the air that comes to mind is the high-end Polaris design. There's mention of Polaris 12 and 10XT2, but an actual high end Polaris on par with what I was speculating at doesn't currently exist. It might, but we haven't seen it yet. That segment doesn't actually have a product. So what am I guessing at and missing here? Considering the capabilities already in GCN, the general implementation isn't much of a stretch.

So you're saying PS4 Pro doesn't actually exist? Or at least have an updated dev kit? I thought Nintendo Switch and Tegra had the FP16 support with the mobile sector? I'd expect Scorpio to have some launch titles, surely those won't be 2, 3, or 4 years out.

AMD has a pretty substantial marketshare between PC and console. The majority of gaming in fact. If that Intel deal pans out I'd expect the number to go up substantially. Seeing AMD near 85-90% in a couple years wouldn't be surprising. Who knows if that's significant enough for developers to target?

Do you know Sony is shipping their Dev Kits? Do you know how programming is different on Sony vs, other consoles vs PC? Yeah look those up.

Do you know the ramifications of having two separate development lines and more for game developers? Think of what they will choose and why they will do it, business side is very important here.

Anarchist4000 · Dec 15, 2016

razor1 said:
Do you know Sony is shipping their Dev Kits? Do you know how programming is different on Sony vs, other consoles vs PC? Yeah look those up.

I didn't realize floating point math differed by platform.

razor1 said:
Do you know the ramifications of having two separate development lines and more for game developers? Think of what they will choose and why they will do it, business side is very important here.

Probably focus on consoles as that's where most of their revenue resides.

razor1 said:
I didn't say HWS didn't exist, I stated they aren't intelligent, because they are not, the programmer has to tell them what to do for them to work, they need explicit programming for them to work.

Async makes no difference in the landscape of the GPU world, where is your theory of 30% and greater performance increases? Shit lucky to find 10% at this point, and its been a year.

What me to link all that?

Not a whole lot of games featuring DX12/async an excessively high tessellation rates. Might be able to force 64x tess on Forza with a Fiji and get it.

razor1 · Dec 15, 2016

Anarchist4000 said:
I didn't realize floating point math differed by platform.

We are specifically talking about FP16 shaders aren't we? What consoles have that right now (at decent enough performance)?

Probably focus on consoles as that's where most of their revenue resides.

Err development costs dude not end results, end results will be the same cause they can still develop without F16 shaders that will still work on PS4 pro.

Not a whole lot of games featuring DX12/async an excessively high tessellation rates. Might be able to force 64x tess on Forza with a Fiji and get it.

yeah have you tried tessellation on shader units? That would dump performance down to a crawl. There was a reason why the hull shader was made for DX11 as a fixed function unit vs DX10 where there was no hull shader and the geometry shader (which is the shader array) was doing tessellation.

Ieldra · Dec 15, 2016

Anarchist4000 said:
Not a whole lot of games featuring DX12/async an excessively high tessellation rates. Might be able to force 64x tess on Forza with a Fiji and get it.

Somewhat ironic that the only reasonable situation in which you can extract those mythical 30% (wasn't it something like 45% in the slides?) is when the shaders are bottlenecked to a crawl.

I sincerely hope someone makes a Don't Starve style survival game with simulated ecosystem and dynamic relationships between various entities on different size scales (units, tribes, species etc). Then they could genuinely say they've made interesting use of async compute. Plus it would be really cool.

razor1 · Dec 15, 2016

Ieldra said:
Somewhat ironic that the only reasonable situation in which you can extract those mythical 30% (wasn't it something like 45% in the slides?) is when the shaders are bottlenecked to a crawl.

I sincerely hope someone makes a Don't Starve style survival game with simulated ecosystem and dynamic relationships between various entities on different size scales (units, tribes, species etc). Then they could genuinely say they've made interesting use of async compute. Plus it would be really cool.

If shaders are bottlenecked then they are going being underutilized then yeah that is when you can get a higher increase from async but that would just be bad programming to begin with, you should never underutilized your shaders and then get the performance back by doing more work.

JustReason · Dec 16, 2016

razor1 said:
The talks at B3D were theoretical talks because you started posting about it lol. yes I read those posts, and none of them have anything concrete outside of it might be there, but no one knows lol. Something you started off. Great to start off a rumor with hypothetical right?

HSA HASN'T Taken off yet! How many years has it been and still nothing?

And you are hypothosizing based on an article and a person that pretty much states HSA is bad in the near term? What did I just state HSA HASN'T taken off yet?

I didn't say HWS didn't exist, I stated they aren't intelligent, because they are not, the programmer has to tell them what to do for them to work, they need explicit programming for them to work.

Async makes no difference in the landscape of the GPU world, where is your theory of 30% and greater performance increases? Shit lucky to find 10% at this point, and its been a year.

What me to link all that?

Do you know Sony is shipping their Dev Kits? Do you know how programming is different on Sony vs, other consoles vs PC? Yeah look those up.

Do you know the ramifications of having two separate development lines and more for game developers? Think of what they will choose and why they will do it, business side is very important here.

The problem here between you two and your arguments is perspective. In all the time we have debated, what I gather is that you know Nvidia quite well but not so much AMD. You keep trying to apply Nvidia technique to AMD which is not ever really gonna work well. Reminds me of when you and Ieldra kept referring to CUDA techniques when speaking of async as proof Maxwell could do it, when in reality it can not, and Cuda rarely ever translates into games. Also In all cases where something is viable and possible, debating the end outcome is not proof that it isn't, just that the effort isn't there. Intel has paid to keep software in their court. Nvidia made Gameworks to set their hardware to expected utilization. So using the "no one has used it" approach doesn't really speak to the fundamentals of possibilities.

Actually if you look at recent news articles in spans of months over the last few years, you can clearly see a shift in the overview of AMD. We have seen far more positives about AMD and its "ideas" lately than usual. I wouldn't be too surprised if we actually see more AMD utilization in software as it seems some of the stranglehold Intel and Nvidia has had on the industry is beginning to wane.

razor1 · Dec 16, 2016

JustReason said:
The problem here between you two and your arguments is perspective. In all the time we have debated, what I gather is that you know Nvidia quite well but not so much AMD. You keep trying to apply Nvidia technique to AMD which is not ever really gonna work well. Reminds me of when you and Ieldra kept referring to CUDA techniques when speaking of async as proof Maxwell could do it, when in reality it can not, and Cuda rarely ever translates into games. Also In all cases where something is viable and possible, debating the end outcome is not proof that it isn't, just that the effort isn't there. Intel has paid to keep software in their court. Nvidia made Gameworks to set their hardware to expected utilization. So using the "no one has used it" approach doesn't really speak to the fundamentals of possibilities.

Actually if you look at recent news articles in spans of months over the last few years, you can clearly see a shift in the overview of AMD. We have seen far more positives about AMD and its "ideas" lately than usual. I wouldn't be too surprised if we actually see more AMD utilization in software as it seems some of the stranglehold Intel and Nvidia has had on the industry is beginning to wane.

Dude, might want to check your facts again, the reason why HSA hasn't taken off, is because Cuda implementation of features and extensions are greater for certain libraries. Which have huge performance implications like 2 fold implications. And this is something that HSA hasn't implemented in over years for what ever reasons, could be hardware I don't know. So come again? You need to read your shit up man.

AMD hasn't really been pushing HSA the past year or so, not only that, they haven't even talked about in conference calls, because its kinda stalled temporarily.

This is the problem, you and few others think they know what you guys are talking about, but the fact is, you don't know shit about what you post. And anything remotely technical, has nothing to do with AMD and nV, but, you guys take angles on, which have no fuckin merit. Its like the other tread, math doesn't lie yet you guys argued for 3 pages about it then started calling people names, even after I stated do the damn math to figure things out. Lazy people never learn man, and from the way you post I can see you never learn cause this is the same damn conversation I had with you close to a year ago when HSA and Boltzmann were first announced. So go home read a damn book, and come back when you learned something useful for the conversation.

Did you read up about Bloztmann initiative how it fails on 75% of optimized CUDA code? Yeah, kinda breaks down with software already done and up and running, which would have been the best bet for that to take off.

Its all ass backwards guys, what did I say about automated code translation when they announced it last year? Its freakin hard to do? It never translates well with optimized code? Why did I say this, cause I have experience in automated code translation? yeah I remember you self proclaimed expert of HPC and HSA because you read articles in the past, read something more then press releases......do the freakin programming or talk to a person that has done it......it will help your posts immensely.

Yes I will take your word for something Just Reason, when you can show me you have experience in doing something related to what you post, outside of that, nope, you are just a tech lover that is in way over his head in these types of conversations.

Ieldra · Dec 16, 2016

JustReason said:
You keep trying to apply Nvidia technique to AMD which is not ever really gonna work well. Reminds me of when you and Ieldra kept referring to CUDA techniques when speaking of async as proof Maxwell could do it, when in reality it can not, and Cuda rarely ever translates into games.

.

I had forgotten completely... Weren't you supposed to correct my warped (mis)understanding of async compute? Why didn't you reply in the async thread?

Like you say, I 'apply Nvidia techniques 5o AMD', so would be nice for someone with a real understanding of the topic to have an input in the thread.

JustReason · Dec 16, 2016

razor1 said:
Dude, might want to check your facts again, the reason why HSA hasn't taken off, is because Cuda implementation of features and extensions are greater for certain libraries. Which have huge performance implications like 2 fold implications. And this is something that HSA hasn't implemented in over years for what ever reasons, could be hardware I don't know. So come again? You need to read your shit up man.

AMD hasn't really been pushing HSA the past year or so, not only that, they haven't even talked about in conference calls, because its kinda stalled temporarily.

This is the problem, you and few others think they know what you guys are talking about, but the fact is, you don't know shit about what you post. And anything remotely technical, has nothing to do with AMD and nV, but, you guys take angles on, which have no fuckin merit. Its like the other tread, math doesn't lie yet you guys argued for 3 pages about it then started calling people names, even after I stated do the damn math to figure things out. Lazy people never learn man, and from the way you post I can see you never learn cause this is the same damn conversation I had with you close to a year ago when HSA and Boltzmann were first announced. So go home read a damn book, and come back when you learned something useful for the conversation.

Did you read up about Bloztmann initiative how it fails on 75% of optimized CUDA code? Yeah, kinda breaks down with software already done and up and running, which would have been the best bet for that to take off.

Its all ass backwards guys, what did I say about automated code translation when they announced it last year? Its freakin hard to do? It never translates well with optimized code? Why did I say this, cause I have experience in automated code translation? yeah I remember you self proclaimed expert of HPC and HSA because you read articles in the past, read something more then press releases......do the freakin programming or talk to a person that has done it......it will help your posts immensely.

Yes I will take your word for something Just Reason, when you can show me you have experience in doing something related to what you post, outside of that, nope, you are just a tech lover that is in way over his head in these types of conversations.

first you really need to relax. Second, bringing up other stories that have no relevance to this discussion, have no relevance. You missed the point again like you do every time.

point is you seem to have an intimate knowledge of Nvidia and I have not doubted it, but your knowledge in AMD is no where near as comfortable. You speak to coding which seems relevant to Nvidia and based entirely on it as well, but fall far short of making true claims to AMD and how it could be utilized. Anarchist is talking of AMD and possibilities based on AMD architecture and IP not what is being done now with current software/coding. You two are debating different perspectives that have no real direct comparison.

You are taking way too much of these debates personally. I am not questioning your knowledge with Nvidia coding and architectural analysis, although you do seem to mix and cross professional and consumer coding techniques. Just because it is possible doesn't mean it can or will be done. That applies to Anarchist as well but he seems to point that out at least. But your constant accusations and labeling of myself and others as being stupid/ignorant/morons just because you can't see what we see, whether it is because you refuse to acknowledge or just honestly can't grasp it because of your in depth practices this far, is crossing the line and respectfully ask you refrain from such further.

razor1 · Dec 16, 2016

JustReason said:
first you really need to relax. Second, bringing up other stories that have no relevance to this discussion, have no relevance. You missed the point again like you do every time.

point is you seem to have an intimate knowledge of Nvidia and I have not doubted it, but your knowledge in AMD is no where near as comfortable. You speak to coding which seems relevant to Nvidia and based entirely on it as well, but fall far short of making true claims to AMD and how it could be utilized. Anarchist is talking of AMD and possibilities based on AMD architecture and IP not what is being done now with current software/coding. You two are debating different perspectives that have no real direct comparison.

Point is I didn't miss anything, yeah I told him exactly why it won't happen, the example he used were talked about and for them to work they need to unroll the shader in real time and that requires a huge amount of register space and cache which WILL NOT be aviable in GPU's in the short term, this data cannot be taken off die no matter what. This is what you fail to understand, before you start saying I don't know AMD hardware, it just doesn't work right now because the GPU's right now don't have enough die space allocated for such a thing and they won't in the near future because well, there are more pressing matters that GPU's have to tackle first, like more performance for graphics related tasks.

You are taking way too much of these debates personally. I am not questioning your knowledge with Nvidia coding and architectural analysis, although you do seem to mix and cross professional and consumer coding techniques. Just because it is possible doesn't mean it can or will be done. That applies to Anarchist as well but he seems to point that out at least. But your constant accusations and labeling of myself and others as being stupid/ignorant/morons just because you can't see what we see, whether it is because you refuse to acknowledge or just honestly can't grasp it because of your in depth practices this far, is crossing the line and respectfully ask you refrain from such further.

This has nothing to do with one IHV or another, it has everything to do with why GPU's have evolved in a certain way and why things don't change just because it looks like it can.

Simple FACT flexibility with code and architecture to provide a flexible programming (coding) environment always takes up transistors. There is no way around that. And when the cost of implementing such a feature in hardware (transistors) becomes valid from a performance and cost perspective it will be done. Simple, we haven't reached that point yet, that is why we still have fixed function units that do things.

Now I don't know how long you have been following the adoption of GPU's and how they have evolved, but just go read some things about why programmable shaders became the norm, why AMD switched over to Scalar architectures, and when they did their die size blew up. Simple because prior to that, when using VLIW (which nV used too before G80) took up less die space, less complex from an architectural point of view. For AMD it was a better way to go because they can sustain their primary goal, Graphics performance.

Of course you didn't read what was written over at B3D but these were the same things stated to him over there.

You can try to use this "I might know one architecture and one programming model over another" but that isn't the case, I know both and I know both well, I might be more comfortable with one over another (when it comes to optimization) but that doesn't mean I don't know the features and complexities of the other.

Ieldra · Dec 16, 2016

JustReason said:
first you really need to relax. Second, bringing up other stories that have no relevance to this discussion, have no relevance. You missed the point again like you do every time.

point is you seem to have an intimate knowledge of Nvidia and I have not doubted it, but your knowledge in AMD is no where near as comfortable. You speak to coding which seems relevant to Nvidia and based entirely on it as well, but fall far short of making true claims to AMD and how it could be utilized. Anarchist is talking of AMD and possibilities based on AMD architecture and IP not what is being done now with current software/coding. You two are debating different perspectives that have no real direct comparison.

You are taking way too much of these debates personally. I am not questioning your knowledge with Nvidia coding and architectural analysis, although you do seem to mix and cross professional and consumer coding techniques. Just because it is possible doesn't mean it can or will be done. That applies to Anarchist as well but he seems to point that out at least. But your constant accusations and labeling of myself and others as being stupid/ignorant/morons just because you can't see what we see, whether it is because you refuse to acknowledge or just honestly can't grasp it because of your in depth practices this far, is crossing the line and respectfully ask you refrain from such further.

Cool, still waiting for your input here on this thread though

https://hardforum.com/threads/demystifying-asynchronous-compute-v1-0.1909504/page-3#post-1042515129

Anarchist4000 · Dec 16, 2016

Ieldra said:
Somewhat ironic that the only reasonable situation in which you can extract those mythical 30% (wasn't it something like 45% in the slides?) is when the shaders are bottlenecked to a crawl.

Situations where there is enough other work to actually overcome or shift a bottleneck?

razor1 said:
If shaders are bottlenecked then they are going being underutilized then yeah that is when you can get a higher increase from async but that would just be bad programming to begin with, you should never underutilized your shaders and then get the performance back by doing more work

Will be interesting to see how much of a geometry bottleneck these games with 100% compute loads generate. At least in Sebbbi's case he stated their game is entirely compute based.

razor1 said:
AMD hasn't really been pushing HSA the past year or so, not only that, they haven't even talked about in conference calls, because its kinda stalled temporarily.

They just pushed it at these past couple events. It was nearly all they talked about. Only difference is they re-branded HSA to ROCm now.

razor1 said:
Point is I didn't miss anything, yeah I told him exactly why it won't happen, the example he used were talked about and for them to work they need to unroll the shader in real time and that requires a huge amount of register space and cache which WILL NOT be aviable in GPU's in the short term, this data cannot be taken off die no matter what. This is what you fail to understand, before you start saying I don't know AMD hardware, it just doesn't work right now because the GPU's right now don't have enough die space allocated for such a thing and they won't in the near future because well, there are more pressing matters that GPU's have to tackle first, like more performance for graphics related tasks.

You've missed the point in almost every argument you've entered. You missed the async benefits, the HWS existing, point of using HBM, high speed interconnects, AMD losing more marketshare, GPU scheduling, and the list goes on. It's hardly worth keeping track of anymore.

Where exactly did anyone say it will take a huge amount of register space and cache? It will take more, but no worse than what would be incurred by adding more SIMDs. Maybe an added 1-4KB in cache per CU. The Fiji scalar had 16KB per CU. So a Fiji style design would require 64-256KB of added cache to pull it off. Less cache if the existing scalar was replaced or reduced. Big difference is the scalar now using vector registers for most of the data. Unrolling a vector in hardware is extremely simple and DSPs do it all the time. Copy buffer, loop through it. The current GCN scalar does the same thing. The real question is if the design benefits more from 1 or 4 scalars. The rest of the model already exists along with most of what I proposed.

razor1 said:
Of course you didn't read what was written over at B3D but these were the same things stated to him over there.

Guess I completely missed where that happened. The programming types, as well as the related benchmarks, all seemed to corroborate it. Few guys that misunderstood what was being discussed. I'm guessing this is yet another thing you failed to understand? Some sort of confirmation basis maybe? I've seen no reason to believe AMD won't do what they've been saying this entire time. If they add FP support to the current scalar they're a quarter of the way to what I proposed.

razor1 said:
We are specifically talking about FP16 shaders aren't we? What consoles have that right now (at decent enough performance)?

PS4 Pro? At least if the guy in charge of designing it told the truth.

Ieldra · Dec 16, 2016

Anarchist4000 said:
Situations where there is enough other work to actually overcome or shift a bottleneck?

I mean, sure, you can spin it that way but there is also the possibility of lifting the bottleneck threshold and not having to overcome a bottleneck at all.

razor1 · Dec 16, 2016

Anarchist4000 said:
Situations where there is enough other work to actually overcome or shift a bottleneck?

If designed right and programmed right, you should be trying to avoid bottlenecks as much as possible, I don't even see anything remotely useful for getting so bogged down by a bottleneck for doing something like that.

Will be interesting to see how much of a geometry bottleneck these games with 100% compute loads generate. At least in Sebbbi's case he stated their game is entirely compute based.

How many games will be coming out like that? Got to understand most programmers, game dev's aren't thinking across those lines yet, it takes time to go that far. And for Sebbbi his is a console programmer, and yeah they can't push geometry, look where the limiting factor is for AMD's current consoles.........

They just pushed it at these past couple events. It was nearly all they talked about. Only difference is they re-branded HSA to ROCm now.

Can you list me the features they have added? Can you tell me what has changed from their programming models? I can, no changes lol. Thats why I have stated they aren't pushing forward, its stalled.

You've missed the point in almost every argument you've entered. You missed the async benefits, the HWS existing, point of using HBM, high speed interconnects, AMD losing more marketshare, GPU scheduling, and the list goes on. It's hardly worth keeping track of anymore.

I didn't miss the point, please go back and link what you think I stated, I can tell you , I did not state HWS did not exsist, already told you want i staetd about them, point of using HBM, what I stated, we haven't seen any real world benefit from it yet, which link something that shows what I stated was wrong, don't even know why you throw high speed interconnects there, cause what I stated, was nV is ahead with their tech, which link me to something that shows where I was wrong, AMD losing marketshare, they did loose marketshare last quarter, so where do you go from there? GPU scheduling, you don't know or remember what you were saying about that, because you attached that to async performance, which those two things don't go along with each other, nor does hardware for that, so, come again? yes the list does go on, Cause I CAN AND NOW WILL LINK all of the crap you posted later today ok? Fair enough. Don't try to make shit up here Anarchist, I don't play around with people that make shit up. YOU know that I caught you making shit up before too remember?

Where exactly did anyone say it will take a huge amount of register space and cache? It will take more, but no worse than what would be incurred by adding more SIMDs. Maybe an added 1-4KB in cache per CU. The Fiji scalar had 16KB per CU. So a Fiji style design would require 64-256KB of added cache to pull it off. Less cache if the existing scalar was replaced or reduced. Big difference is the scalar now using vector registers for most of the data. Unrolling a vector in hardware is extremely simple and DSPs do it all the time. Copy buffer, loop through it. The current GCN scalar does the same thing. The real question is if the design benefits more from 1 or 4 scalars. The rest of the model already exists along with most of what I proposed.

You don't remember EXT3d, go look up your posts and his response and mine too. Hypothesizing is great, reality is total different, future I can see some of the things you are saying can happen, but as a whole its a fart in the wind.

Guess I completely missed where that happened. The programming types, as well as the related benchmarks, all seemed to corroborate it. Few guys that misunderstood what was being discussed. I'm guessing this is yet another thing you failed to understand? Some sort of confirmation basis maybe? I've seen no reason to believe AMD won't do what they've been saying this entire time. If they add FP support to the current scalar they're a quarter of the way to what I proposed.

You seem to miss many things man, sorry but I don't.

PS4 Pro? At least if the guy in charge of designing it told the truth.

So what current consoles have that? Ya do you remember maintaining backward compatibility with the two next gen consoles was kinda of a top priority with them so if you look at the performance deltas for FP16 and 32 on them, the benefits are meaningless.
So for current consoles and and PS4 pro what would the best way to develop a game on all of them at the same time?

Yeah. Business what would the business needs be to start game dev with FP16 shaders now when the PS4 pro came out when and how are they going to make the same game on all consoles? And what is the FP16 performance of the PS4 vs its FP32 performance, and when using mixed precision what will the ramifications be on that?

Ieldra · Dec 16, 2016

100% compute loads ? Am I misinterpreting or do you actually mean 100% compute loads? Are they writing compute shaders for the assembly and rasterizer stages ?

razor1 · Dec 16, 2016

Ieldra said:
100% compute loads ? Am I misinterpreting or do you actually mean 100% compute loads? Are they writing compute shaders for the assembly and rasterizer stages ?

It can be done, but I don't think its going to happen, Sebbbi's comments were more hypothetical, not yes we are going to do it for our next game. Anyways I just directly pmed Sebbbi, so lets see what he says.

I can't see that happening, I've been working with around 40% compute shaders right now, and maybe go up to 60% in the next 2 years, until Volta and Navi come out?

Ieldra · Dec 16, 2016

razor1 said:
It can be done, but I don't think its going to happen, Sebbbi's comments were more hypothetical, not yes we are going to do it for our next game. Anyways I just directly pmed Sebbbi, so lets see what he says.

I can't see that happening, I've been working with around 40% compute shaders right now, and maybe go up to 60% in the next 2 years, until Volta and Navi come out?

I'm sure it can be done, just wondering whether or not it's worth the effort. From a purely academic point of view it's super cool

razor1 · Dec 16, 2016

It really isn't worth being done, because the fidelity and performance standpoint vs cost for certain features, its not worth using compute shaders. And that is the reason why the fixed function parts are still there in GPU's, once those hurdles are overcome, GPU's won't have fixed function units anymore.

We will find out soon enough what Sebbbi really meant.

In the mean time this was Sebbbi's last qoute on the matter of FP16/FP32 and compute usage % of over all programming tasks

sebbbi said:
Agreed. Shifts in ALU:TMU:BW allow/force developers to change their algorithm/data design. However in modern games and even in modern GPGPU, FP32 ALU performance isn't the biggest limiting factor. New faster GPUs keep roughly the same ALU:TMU:BW balance as old ones. People are used to thinking that 2x more flops = roughly 2x faster. This mostly holds as everything else is also increased by roughly 2x. But if you increase only the flops (and not bandwidth and the count of CUs / SMs or their other functionality like samplers) the result is nowhere close to 2x faster in general case.

But neural networks are a bit of special case. Some algorithms in this field do benefit from additional ALU performance. But only time will tell how much 4xUINT helps over 2xUINT/2xFP16. All I am saying that don't blindly look at the marketing 8/16 bit ALU flops.

Now how the hell did you, Anarchist, take that, to mean what you stated?

What he is saying is game developers are going to change the % of usage over time. It doesn't happen all at once.

So now if Sebbbi clarifies his response to my PM in the same manner, are you going to sit here and make shit up still on what you thought what he ment? Or should I start posting all of the crap you just made up what you think I stated in the past, and then we will get a full picture of the drivel you have been saying?

Well just stop making shit up and no one will ever bother you, I don't care if you are speculating, they are great discussions and I enjoy them, but everything and the kitchen sink doesn't happen over one generation.

razor1 · Dec 16, 2016

Ok got the response from him, the game is 100% compute, but he also stated the tech isn't ready for AAA games (not generic enough). So does that give you, Anarchist, a better understanding, its not going to happen in any AAA games in the near feature (1-4) years.

Anarchist4000 · Dec 16, 2016

razor1 said:
So what current consoles have that? Ya do you remember maintaining backward compatibility with the two next gen consoles was kinda of a top priority with them so if you look at the performance deltas for FP16 and 32 on them, the benefits are meaningless.
So for current consoles and and PS4 pro what would the best way to develop a game on all of them at the same time?

PS4 Pro, just like the last couple times I've answered this question. Backwards compatibility just means it can run on the older PS4, which doesn't apply for VR titles or cases where separate paths were written for each. They just need to maintain acceptable, and ideally identical performance between versions. I doubt PS4 is taking much advantage of the ID buffer and checkerboard rendering either.

razor1 said:
You don't remember EXT3d, go look up your posts and his response and mine too. Hypothesizing is great, reality is total different, future I can see some of the things you are saying can happen, but as a whole its a fart in the wind.

He checked with an AMD employee and confirmed what I said after some debate. Certain undisclosed metrics are tracked and HWS signals ACEs to dispatch according to some undisclosed logic. At the rate AMD is going, I'm expecting a neural network for work dispatch on Vega at this point. Zen uses it for prefetching, might as well do Vega.

razor1 said:
Can you list me the features they have added? Can you tell me what has changed from their programming models? I can, no changes lol. Thats why I have stated they aren't pushing forward, its stalled.

http://www.anandtech.com/show/10831/amd-sc16-rocm-13-released-boltzmann-realized
Just the standard non-experimental clang/llvm support, OpenCL, FP16, virtualization, math libraries, etc. The CUDA translation is a small ongoing part of a much larger project. Likely a bunch of undisclosed features related to Zen/Vega that anandtech didn't include. Going off some posts by AMD engineers the whole Vega backend and Zen APU integration work is floating around. Obviously a lot of that is still under NDA(especially when the article was written) until the products release.

razor1 said:
How many games will be coming out like that? Got to understand most programmers, game dev's aren't thinking across those lines yet, it takes time to go that far. And for Sebbbi his is a console programmer, and yeah they can't push geometry, look where the limiting factor is for AMD's current consoles.........

He seems to be programming for much more than just consoles. One of his tweets was about making a benchmark to profile load/store performance on AMD, Intel, and Nvidia hardware. Switch aside, two of those aren't heavily vested in consoles. The big use for 100% compute would probably be raytracing. There could be some games using that, but more likely professional tools and simulations. Given appropriate upcoming hardware software rasterization could be possible. It could also become far more programmable and run directly on the CUs. One big hitch is that it's somewhat serial and requires a fair amount of interpolation that doesn't map well to SIMDs. FPGAs, scalars, and some interpolators showing up might make it far more practical.

razor1 said:
If designed right and programmed right, you should be trying to avoid bottlenecks as much as possible, I don't even see anything remotely useful for getting so bogged down by a bottleneck for doing something like that.

Sure, but some hardware features are cheap. Even in one aspect isn't a bottleneck optimizing it shouldn't hurt.

Ieldra said:
I'm sure it can be done, just wondering whether or not it's worth the effort. From a purely academic point of view it's super cool

Like I mentioned above, raytracing would be an obvious application of it. We've seen a few professional renderings use it, but not quite games.

razor1 said:
Now how the hell did you, Anarchist, take that, to mean what you stated?

I didn't, I took a direct quote(can't recall where) from him that the project he was working on was 100% compute based.

razor1 said:
Ok got the response from him, the game is 100% compute, but he also stated the tech isn't ready for AAA games (not generic enough). So does that give you, Anarchist, a better understanding, its not going to happen in any AAA games in the near feature (1-4) years.

So you're saying I'm right again?

That's only one example, I'm sure there are more. We've seen renderings use it, probably a handful of games that are bordering on tech demos. AAA titles not so much, but there could be some interesting ones out there. Not sure the exact timeframe on his game, but I could see some heavily compute biased work showing up. Any dev that goes crazy with multi-engine for physics, audio, pathfinding, etc will likely push that compute percentage up significantly.

razor1 · Dec 16, 2016

Anarchist4000 said:
PS4 Pro, just like the last couple times I've answered this question. Backwards compatibility just means it can run on the older PS4, which doesn't apply for VR titles or cases where separate paths were written for each. They just need to maintain acceptable, and ideally identical performance between versions. I doubt PS4 is taking much advantage of the ID buffer and checkerboard rendering either.

Oh god you aren't understanding a single thing about how different hardware paths work and what it takes to implant them and test them and deploy them, just ask any of the DX12 game developers that have deployed games recently, you don't even need to ask, just look at the shit storm that has happened.

He checked with an AMD employee and confirmed what I said after some debate. Certain undisclosed metrics are tracked and HWS signals ACEs to dispatch according to some undisclosed logic. At the rate AMD is going, I'm expecting a neural network for work dispatch on Vega at this point. Zen uses it for prefetching, might as well do Vega.

Yeah that is normal I'm not arguing they aren't going to be doing AI on their GPU's, I'm saying unrolling a shader in the chip is not going to be done till at least Navi.

Zen guess what prefetching is on Zen, it ain't AI, I think you should read up on branch predictions and perceptrons, cause that ain't a true neural net lol. Please take this stuff out of you head, and don't try to parallelize prefetching with this, cause AMD is using "neural net" as a buzz word. You seem fond about AMD marketing and their buzz words, just like Async compute.

http://www.anandtech.com/show/10831/amd-sc16-rocm-13-released-boltzmann-realized
Just the standard non-experimental clang/llvm support, OpenCL, FP16, virtualization, math libraries, etc. The CUDA translation is a small ongoing part of a much larger project. Likely a bunch of undisclosed features related to Zen/Vega that anandtech didn't include. Going off some posts by AMD engineers the whole Vega backend and Zen APU integration work is floating around. Obviously a lot of that is still under NDA(especially when the article was written) until the products release.

That is all fine and good, I'm not arguing that things aren't going on behind the scenes t o get their new hardware working with what they have already, but the feature that make them behind aren't being taken care off

And this goes to what about people actually using the stuff man, where is the money the incentive for them to switch over from what they have been doing in the past, don't take the business out of it cause the business needs drive the industry forward, once someone is already there.

He seems to be programming for much more than just consoles. One of his tweets was about making a benchmark to profile load/store performance on AMD, Intel, and Nvidia hardware. Switch aside, two of those aren't heavily vested in consoles. The big use for 100% compute would probably be raytracing. There could be some games using that, but more likely professional tools and simulations. Given appropriate upcoming hardware software rasterization could be possible. It could also become far more programmable and run directly on the CUs. One big hitch is that it's somewhat serial and requires a fair amount of interpolation that doesn't map well to SIMDs. FPGAs, scalars, and some interpolators showing up might make it far more practical.

I'm not going to get into it but I'm talking to him so tell ya what I know more then you about it, but not going to say it here, cause its a pm.

Sure, but some hardware features are cheap. Even in one aspect isn't a bottleneck optimizing it shouldn't hurt.

If you say so when you don't know shit about game programming.... Stop diminishing the job of good programmers with this nonsense.

Like I mentioned above, raytracing would be an obvious application of it. We've seen a few professional renderings use it, but not quite games.

About the only thing I can agree with you in your entire post, and in games though, yes it does help. Again you don't know that because you don't know what games are doing what behind the scenes.

I didn't, I took a direct quote(can't recall where) from him that the project he was working on was 100% compute based.

And I just found out more details on it. Not only that I'm still taking to him about it, in depth. And you just took it for face value and took it out of context for everyone, mind you why did he say can't be used for AAA games, then you get why I stated what I did. I don't muddle around things when I already know the answer, but to answer your round about mediocrity of understanding what someone says and generalizing it for everyone, I asked him.

So you're saying I'm right again?

You took his statements out of context, you are incorrect, it can't be used for a AAA game right now, pretty much it can't be used for studios outside of his because its not multiplatform (its not even that, its from the sound of it, it can't be used outside of specific consoles right now? still finding more about it but probably, he can't talk to much more about it, NDA), And that comes down to what I was saying before, the business needs of doing something on multiple platforms with different paths, and the time it takes to do it, it will not happen as quickly as you think. How the hell can you twist something around that is point blank, he stated it won't be used anytime soon for AAA games, its not usable in its current form, once it becomes usable then it takes 3 to 5 years for a game to be made on it or with it!

That's only one example, I'm sure there are more. We've seen renderings use it, probably a handful of games that are bordering on tech demos. AAA titles not so much, but there could be some interesting ones out there. Not sure the exact timeframe on his game, but I could see some heavily compute biased work showing up. Any dev that goes crazy with multi-engine for physics, audio, pathfinding, etc will likely push that compute percentage up significantly.

Yes you don't know, yet you expect something like this to happen around Vega in abundance, that is just BS.

pendragon1 · Dec 16, 2016

how many threads need to turn into multi-quote-cluster-fucks? you guys should take your pissing contest private...

razor1 · Dec 16, 2016

pendragon1 said:
how many threads need to turn into multi-quote-cluster-fucks? you guys should take your pissing contest private...

This isn't a pissing match, he has made way too much stuff up (well hypothesized) to be anything remotely realistic in the near term with Vega and what AMD is doing and where the game industry is going.

Now if there is a problem with that. this is still on topic because the title of this tread is Vega, HPC......

pendragon1 · Dec 16, 2016

razor1 said:
This isn't a pissing match, he has made way too much stuff up (well hypothesized) to be anything remotely realistic in the near term with Vega and what AMD is doing and where the game industry is going.

Now if there is a problem with that. this is still on topic because the title of this tread is Vega, HPC......

ok but we all don't need to or want to see the page long mutli-quote bs. every little thing gets turned into this for no reason. "youre wrong. heres a bit of proof. now stfu." is all that's really needed. it they keep going physically ignore it. yeah I know trust me I'm trying....

razor1 · Dec 16, 2016

its hard to ignore a person that is obvious smart and capable of understanding things but has no clue on why things don't work in the real world with hypotheticals when it comes to business lol. But I understand what you are saying.

CommanderFrank · Dec 17, 2016

Address the issue at hand, not each other. Personal attacks on members are not tolerated and will result in administrative actions.

Vega GPU announced with some details, HPC so far

[H]ard|Gawd

[H]F Junkie

[H]ard|Gawd

[H]F Junkie

I Promise to RTFM

[H]F Junkie

razor1 is my Lover

[H]F Junkie

I Promise to RTFM

razor1 is my Lover

[H]F Junkie

I Promise to RTFM

[H]ard|Gawd

I Promise to RTFM

[H]F Junkie

I Promise to RTFM

[H]F Junkie

I Promise to RTFM

[H]F Junkie

[H]F Junkie

[H]ard|Gawd

[H]F Junkie

Extremely [H]

[H]F Junkie

Extremely [H]

[H]F Junkie

Cat Can't Scratch It