ATI Physics

rincewind said:
Forgive me if I'm being an idiot, but doesn't the current PhysX have to feed data back to the CPU over the PCI bus? 2Tb/s internal bandwidth is all well and good, but the PCI bus only offers ~132MB/s (some of which is already eaten up). PCI-E x16 gives a more respectable 6.4GB/s, and the bus itself is capable of double that (PCI-E mobos normally have 32 lanes afaik). Am I missing something, or does PhysX have a major Achilles heel right there? The data can be moved internally at lightspeed, but the most it can afford to transmit is a meagre 4 megs per frame (to maintain 30FPS).

While I'm not familiar with the PhysX API, you'd only need to send back a few floating point numbers per object in most cases: 3 for position, 3 for orientation. For boxes tumbling about, you'd have to have about 175,000 boxes to saturate the PCI bandwidth at that rate. There simply isn't a lot of data that needs to be sent back to the CPU in most cases. Of course, PCIe would be less limiting, no?
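
As a rough sanity check of that figure (assuming 6 floats of 4 bytes per rigid body, ~132 MB/s of usable PCI bandwidth and 30 readbacks per second - assumptions for illustration, not PhysX-specific numbers):

#include <cstdio>

// Back-of-envelope check: how many objects per frame before classic PCI saturates?
int main() {
    const double pciBytesPerSec = 132.0e6;  // 32-bit/33MHz PCI, best case
    const double bytesPerObject = 6 * 4;    // 3 floats position + 3 orientation
    const double framesPerSec   = 30.0;
    std::printf("objects per frame before PCI saturates: %.0f\n",
                pciBytesPerSec / (framesPerSec * bytesPerObject));
    // prints roughly 183,000 -- the same ballpark as the figure above
    return 0;
}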

Emulation?
That won't cut it.
For gameplay physics to operate, collision data has to be fed back to the CPU.
The problem is that once data enters the pixel pipelines, it's out of reach of the CPU.
You can't "emulate" a new GPU structure that is bi-directional...
The pipelines are a one-way street...

No they're not. The pipelines generate data which is sent to a location in the GPU's memory. From there, it's not that hard to copy data from the GPU, over the AGP bus, and into system memory to get the results you need. Yes, it stalls a bit when you try to do something like that on D3D, but ATi is developing their physics API such that they're able to talk and interact VERY directly with the GPU, and so the stalling that would normally occur can be easily avoided.
 
Trimlock said:
I'm not trying to pretend to know what ATi or Nvidia have in store, but they wouldn't exactly put development time into it if it didn't produce ;)

Ever heard of marketing bullshit before? ;) *L*

Terra...
 
Majin said:
OWNED!
Nice counter points Terra.

But I have to take this a different direction
you're SOOOO WRONG
P4 > A64
Xenos > Cell
X1800 > X1900XTX
nVidia AF > ATI AF
Green > Blue
Horse > Dog
69 > 68
At > The

I think you know your role!


Majin - Shaking my head at Terra for 6+ Months!

*LOL*

Terra - Nice humor ;) :D
 
BBA said:
Think about what you said. Then explain how a PCI-connected card is going to have higher bandwidth back to the CPU than a PCI-E-connected card does. The video card has a great advantage here as well, now that you bring it up. :D

No it doesn't:
One of the biggest characteristics of the PPU is its memory architecture. It has a 128-bit interface to external memory, but no internal cache memory.

“We don’t have cache memory hierarchy of any kind. This is very important because traditional cache is not suitable for physics," says Hedge.

The PPU has no structure like a CPU cache that is kept synchronized with external memory set-associatively and updated automatically, because physics simulation has little data locality. They say a cache hierarchy is more trouble than it’s worth.

“In a CPU and GPU, data has locality. But not in physics, as it has to do random access to many objects. The data structures are totally different,” says Nadeem Mohammad, who moved from a GPU vendor to AGEIA.

Still, the PPU has large internal memory of its own. It has various internal memories instead of a cache, organized around explicit, programmable transfers between internal and external memory.

The patent describes memories such as the dual-bank Inter-Engine Memory (IEM) connected to the VPUs, a multi-purpose Scratch Pad Memory (SPM), and a DME Instruction Memory (DIM) that handles instruction queuing. Hedge suggested that the memories described in the patent exist in the actual implementation, saying they are “probably included” in the PPU.

Among those memories, the IEM is used in a way that looks like a traditional data cache. According to the patent, the DME explicitly loads the data set required by the processing units into the IEM. Unlike a cache, the IEM allows low-latency access and can apparently implement a large number of I/O ports. As a result, it can achieve huge internal memory bandwidth.

“One of the important factors in a physics architecture is that it requires huge on-chip memory bandwidth. Our PPU has 2Tb (Tera-bit)/sec of on-chip memory bandwidth," says Hedge.

In short, removing complicated cache control lets the PPU have L2-cache-sized internal memory with L1-cache latency and huge bandwidth, which according to them is what physics algorithms need.
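
To make the "explicit and programmable transfer" idea concrete, here is an illustrative double-buffered scratchpad loop in C++. The Body layout and the DmaLoad/DmaStore/Integrate helpers are hypothetical, not AGEIA's actual hardware interface - it just shows software-managed on-chip banks standing in for a cache:

#include <cstddef>

// Illustrative only: software-managed transfers between external memory and two
// on-chip banks, with the next batch fetched while the current one is computed.
struct Body { float pos[3], vel[3]; };

void DmaLoad(Body* onChip, const Body* external, std::size_t count);   // hypothetical
void DmaStore(Body* external, const Body* onChip, std::size_t count);  // hypothetical
void Integrate(Body* batch, std::size_t count, float dt);              // hypothetical

void ProcessAll(Body* external, std::size_t total, std::size_t batch, float dt)
{
    static Body bankA[1024], bankB[1024];   // two "IEM"-like banks (batch <= 1024)
    Body* compute = bankA;
    Body* transfer = bankB;

    DmaLoad(compute, external, batch);              // prime the first bank
    for (std::size_t i = 0; i < total; i += batch) {
        if (i + batch < total)                      // fetch the next batch while
            DmaLoad(transfer, external + i + batch, batch);   // this one computes
        Integrate(compute, batch, dt);
        DmaStore(external + i, compute, batch);
        Body* tmp = compute; compute = transfer; transfer = tmp;   // swap banks
    }
}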

BTW... I found the "source" of your claims:
http://www.firingsquad.com/hardware/ati_physics/page2.asp

As for performance of the PPU vs. the GPU, the issue is one of efficiency and processing power. We think that even if Ageia’s PPU has 100% processing efficiency and ATI’s GPU has 80% efficiency, if the PPU only has 100 Gflops of processing power, and we have 375 in our GPU, we’ll still have a higher performing solution (those numbers are just examples, not proven stats, but I think you can understand how we’re looking at this).

Your numbers aren't even facts...come on :rolleyes:

The 'internal to the GPU/PPU' memory bandwidth means nothing,

That's a LIE, read my first quote!

it's the processing that counts as well as getting the data back to the CPU and video card.

That's why a reprogrammed X1800 is able to process video compression/rendering in seconds compared to the P4 doing it in almost an hour. (You can find that article link here if you look.) All it takes is reprogramming the shaders, and ATi is already making the drivers to do it.

Comparing a CPU to a GPU is just as stupid as comparing a GPU to a PPU...
Let's recap:

Your numbers are NOT facts...
GFlops mean nothing in terms of actual performance.
The PPU is built with an internal bandwidth of 2 Tb/s... to keep track of those darn physics objects that don't follow a fixed path...

Terra - So far... your GPU PR isn't doing well..
 
BBA said:
The thing you are not recognizing is the GPU does not emulate... it reprograms the algorithms and does it in hardware.

By your logic, a GPU emulates video rendering as well...

Chew on that a while.

No, they are adding more POST-PROCESSING effects in the GPU...

Terra - Chew on that one...
 
Moving data off the video card is a relatively simple process. Right now it's as simple as taking a screenshot. If the drivers were updated to expect data to be sent back to system memory frequently, any delays could likely be avoided. Information could then be sent out just as fast as it is received.

As far as comparing GFLOPS goes, the numbers could be kind of close, assuming the ATI number is only counting pixel processing power. With the 3 ALUs on the 1900s they could probably tear through a fair amount of data.

The key thing with physics is doing everything in parallel, which is how video cards prefer to do things already. Collision is where things can get tricky, because logic comes into play and you start comparing against a bunch of stuff. Implementing a truly realistic system could be kind of rough.

Using an R2VB-style approach, most basic acceleration, velocity, and position calculations could be done on the GPU, then the required data dumped off for the CPU to process. For the next frame the CPU would feed back data based on what collisions occurred. Doing comparisons on a CPU is much easier than running FP-based math on it non-stop.

Keep in mind that for most games you don't need 100% accurate results. You just need close approximations, so even a rough result would be beneficial. The key to good collision has always been intelligently narrowing down what needs to be tested against.
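
A minimal sketch of that frame split, assuming hypothetical helpers for the GPU-side work (UploadCorrections, IntegrateMotionOnGpu and ReadBackStates are placeholders, not a real API):

#include <vector>

// Illustrative only: the GPU integrates motion in parallel, the CPU does the
// branchy collision logic on the readback and feeds corrections into next frame.
struct ObjectState { float pos[3]; float vel[3]; };

void UploadCorrections(const std::vector<ObjectState>& objects);  // CPU -> GPU
void IntegrateMotionOnGpu(float dt);                              // R2VB-style pass
void ReadBackStates(std::vector<ObjectState>& objects);           // GPU -> CPU
void ResolveCollisionsOnCpu(std::vector<ObjectState>& objects);   // branchy logic

void SimulateFrame(std::vector<ObjectState>& objects, float dt)
{
    UploadCorrections(objects);       // feed last frame's collision results back in
    IntegrateMotionOnGpu(dt);         // GPU integrates velocity/position in parallel
    ReadBackStates(objects);          // only a few floats per object cross the bus
    ResolveCollisionsOnCpu(objects);  // CPU handles the comparisons and corrections
}

The point being that only a small state struct per object crosses the bus each way, while the heavy parallel math stays on the card.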
 
Terra said:
No it doesn't:


BTW...i found the "source" of your claims:

So...you finally decided to read what I posted above.


Comparing a CPU to a GPU is just as stupid as comparing a GPU to a PPU...
Let's recap:

Your numbers are NOT facts...
GFlops mean nothing in terms of actual performance.
The PPU is built with an internal bandwidth of 2 Tb/s... to keep track of those darn physics objects that don't follow a fixed path...

And you tell me: just what is the GPU's internal bandwidth? All you seem to quote is the video card's memory bandwidth, and considering the PPU has low-speed memory, I just don't think you're comparing apples to apples here. The internal PPU bandwidth could be lower than the GPU's internal bandwidth... but since those numbers haven't been posted, I still say you're going on BS marketing crap.
 
BBA said:
So...you finally decided to read what I posted above.

I didn't need to read anything I hadn't read before to know you were waaay off.
But let me hear again:
Why are you using numbers that are not facts as "arguments"?

And you tell me: just what is the GPU's internal bandwidth? All you seem to quote is the video card's memory bandwidth, and considering the PPU has low-speed memory, I just don't think you're comparing apples to apples here. The internal PPU bandwidth could be lower than the GPU's internal bandwidth...

And then reality sets in:
http://www.pcper.com/article.php?aid=225&type=expert

"So how does AGEIA’s first generation PPU (physics processing unit) address these requirements and keep the future open for even more physic processing performance? First, AGEIA has massive amounts of internal bandwidth on the chip. They claim to have nearly two terabits per second (2 Tbits/s) of internal memory bandwidth to work with, many times more than even the fastest processors or GPUs available today. This addresses the needs of our first component of a physic processing system, scale, to a T. Detecting and resolving the collisions of a large number of moving rigid bodies requires this kind of bandwidth to implement the geometric math necessary for the calculations."

but since those numbers haven't been posted, I still say you're going on BS marketing crap.


Pot, Kettle, Black...
(I will refrain from calling you a liar, you could have been misinformed)

I am not the one who's been feeding a lot of bogus information.
I am not the one who tried the same approach in the PPU forum... just to desert my own thread when the facts contradicted the claims...

Terra - In fact I am STILL wondering what this post is doing in the ATI subforum.. I will now report the post as misplaced... no more "hiding" for you...
 
Cypher19 said:
No they're not. The pipelines generate data which is sent to a location in the GPU's memory. From there, it's not that hard to copy data from the GPU, over the AGP bus, and into system memory to get the results you need. Yes, it stalls a bit when you try to do something like that on D3D, but ATi is developing their physics API such that they're able to talk and interact VERY directly with the GPU, and so the stalling that would normally occur can be easily avoided.

http://www.pcper.com/article.php?aid=225&type=expert

The lack of a real write-back method on the GPU is also going to hurt it in the world of physics processing for sure. Since pixel shaders are read-only devices, they can not write back results that would change the state of other objects in the "world", a necessary feature for a solid physics engine on all four counts.

This tidbit is interesting too:
Another interesting issue that AGEIA brought up is that since the Havok FX API, and any API that attempts to run physics code on a GPU, has to map their own code to a Direct3D API using Shader Models then as shader models change, code will be affected. This means that the Havok FX engine will be affected very dramatically every time Microsoft makes changes to D3D and NVIDIA and ATI makes changes in their hardware for D3D changes (ala DX10 for Vista). This might create an unstable development platform for designers that they may wish to avoid and stick with a static API like the one AGEIA has on their PhysX PPU.

And let me go on the record and say that I have NO affiliation with AGEIA... whatsoever...
Other than I will be buying a PPU when it's released.

Terra - As I suspect that "argument" will surface soon....
 
Let's not forget that how well the API is programmed can make a huge difference to how each method performs.

I agree with Terra that the GPU will not be a true physics processor, but a sort of after-effect generator. It may have a cool truck explosion, but let's see it handle the player first shooting a crate which collapses, then shooting the pieces multiple times while it's still falling apart.
 
Oh, my, yes. Listen to a hardware enthusiast, and not a graphics programmer.

Here's a summary of how that pixel pipeline data can be obtained:

Step 1) Create default pool render target surface. Using CreateRenderTarget: http://msdn.microsoft.com/library/en-us/directx9_c/IDirect3DDevice9__CreateRenderTarget.asp
(aside: a default pool surface refers to an allocated piece of memory on the video card itself)
Step 2) Create systemmem pool render target surface using CreateRenderTarget: http://msdn.microsoft.com/library/en-us/directx9_c/IDirect3DDevice9__CreateRenderTarget.asp
(aside pt. 2: a systemmem pool surface is a piece of memory that exists in the main system memory which the CPU can easily access)
Step 3) Create pixel shader. Your choice, you can probably use the ID3DXEffect classes or something, or just the CreatePixelShader function.
Step 4) Apply pixel shader and constants. SetPixelShaderConstant, blah blah blah...
Step 5) Render to default pool render target. Draw*Primitive: http://msdn.microsoft.com/library/en-us/directx9_c/IDirect3DDevice9__DrawPrimitive.asp
Step 6) Call GetRenderTargetData with the default and systemmem RT's as parameters. http://msdn.microsoft.com/library/en-us/directx9_c/IDirect3DDevice9__GetRenderTarGetData.asp
Step 7) Lock the systemmem surface. http://msdn.microsoft.com/library/en-us/directx9_c/IDirect3DSurface9__LockRect.asp
Step 8) Get the data from the returned array.
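
Rough C++ version of those steps (a minimal D3D9 sketch: the device, the compiled pixel shader and the full-screen quad setup are assumed to exist, error checking is omitted, and CreateOffscreenPlainSurface is used for the system-memory destination since that is what GetRenderTargetData expects to copy into):

#include <d3d9.h>

// Minimal sketch of the readback path above. 'device' and 'ps' already exist;
// W and H are the render target dimensions.
void RunPassAndReadBack(IDirect3DDevice9* device, IDirect3DPixelShader9* ps,
                        UINT W, UINT H, const float* constants, UINT vec4Count)
{
    IDirect3DSurface9* rtDefault = NULL;   // step 1: video-memory render target
    IDirect3DSurface9* sysmemCopy = NULL;  // step 2: lockable system-memory copy

    device->CreateRenderTarget(W, H, D3DFMT_A32B32G32R32F,
                               D3DMULTISAMPLE_NONE, 0, FALSE, &rtDefault, NULL);
    device->CreateOffscreenPlainSurface(W, H, D3DFMT_A32B32G32R32F,
                                        D3DPOOL_SYSTEMMEM, &sysmemCopy, NULL);

    // Steps 3-5: bind the shader and constants, draw into the default-pool target.
    device->SetRenderTarget(0, rtDefault);
    device->SetPixelShader(ps);
    device->SetPixelShaderConstantF(0, constants, vec4Count);
    device->BeginScene();
    device->DrawPrimitive(D3DPT_TRIANGLESTRIP, 0, 2);   // full-screen quad assumed
    device->EndScene();

    // Step 6: copy the results from video memory into the system-memory surface.
    device->GetRenderTargetData(rtDefault, sysmemCopy);

    // Steps 7-8: lock the system-memory surface and read the floats on the CPU.
    D3DLOCKED_RECT lr;
    sysmemCopy->LockRect(&lr, NULL, D3DLOCK_READONLY);
    const float* results = (const float*)lr.pBits;   // 4 floats per texel
    (void)results;                                   // ...use the data here...
    sysmemCopy->UnlockRect();

    sysmemCopy->Release();
    rtDefault->Release();
}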


This tidbit is wrong too:
Another interesting issue that AGEIA brought up is that since the Havok FX API, and any API that attempts to run physics code on a GPU, has to map their own code to a Direct3D API using Shader Models then as shader models change, code will be affected. This means that the Havok FX engine will be affected very dramatically every time Microsoft makes changes to D3D and NVIDIA and ATI makes changes in their hardware for D3D changes (ala DX10 for Vista). This might create an unstable development platform for designers that they may wish to avoid and stick with a static API like the one AGEIA has on their PhysX PPU.

Graphics cards maintain a certain degree of backwards compatibility. I can write an SM2.0 shader, and have it compiled as an SM2.0 shader, and run it on a GeForce 6 or 7 series card (i.e. SM3) just fine. The same thing will occur with the Havok FX stuff, and it is under ZERO danger of being "unstable" due to updates to Direct3D.

Now, any questions?
 
Havok and partners have stated (I remember hearing it in the videos, coming from the rep) that they don't intend for them to be able to affect gameplay. It may be possible, but it's probably a waste of resources that are better suited for other tasks.
 
BBA said:
Exactly my point.

..looks like Terra has been owned once again.

Assuming there is no caching of meshes or anything.

I can't imagine they would've suggested using the PCI bus unless they knew it wouldn't be a bottleneck. The PCI-E spec has been out for a while, and if it were an issue, they would've simply waited for the right mobos to become available.

I don't know... that's my $0.02
 
Cypher19 said:
Oh, my, yes. Listen to a hardware enthusiast, and not a graphics programmer.

Here's a summary of how that pixel pipeline data can be obtained:

Step 1) Create default pool render target surface. Using CreateRenderTarget: http://msdn.microsoft.com/library/en-us/directx9_c/IDirect3DDevice9__CreateRenderTarget.asp
(aside: a default pool surface refers to an allocated piece of memory on the video card itself)
Step 2) Create systemmem pool render target surface using CreateRenderTarget: http://msdn.microsoft.com/library/en-us/directx9_c/IDirect3DDevice9__CreateRenderTarget.asp
(aside pt. 2: a systemmem pool surface is a piece of memory that exists in the main system memory which the CPU can easily access)
Step 3) Create pixel shader. Your choice, you can probably use the ID3DXEffect classes or something, or just the CreatePixelShader function.
Step 4) Apply pixel shader and constants. SetPixelShaderConstant, blah blah blah...
Step 5) Render to default pool render target. Draw*Primitive: http://msdn.microsoft.com/library/en-us/directx9_c/IDirect3DDevice9__DrawPrimitive.asp
Step 6) Call GetRenderTargetData with the default and systemmem RT's as parameters. http://msdn.microsoft.com/library/en-us/directx9_c/IDirect3DDevice9__GetRenderTarGetData.asp
Step 7) Lock the systemmem surface. http://msdn.microsoft.com/library/en-us/directx9_c/IDirect3DSurface9__LockRect.asp
Step 8) Get the data from the returned array.

I am no programmer either, but I do see a LOAD of overhead in what you just described...
But would you call Gabe Newell a programmer?

It’s like you can have a bunch of different things bouncing around so long as they don’t actually touch anything that matters. If you don’t actually have to read the data out – if your AI system ever needed to know about whether or not one of those objects had collided with something else it would run slower by running on the GPU than having it run on the main CPU. So physics that matter is different than physics that makes pretty pictures.





Graphics cards maintain a certain degree of backwards compatibility. I can write an SM2.0 shader, and have it compiled as an SM2.0 shader, and run it on a GeForce 6 or 7 series card (i.e. SM3) just fine. The same thing will occur with the Havok FX stuff, and it is under ZERO danger of being "unstable" due to updates to Direct3D.

Now, any questions?

What does "certain degree of backwards compatibility" mean? :confused:

Either it is or it isn't?

Terra...
 
I am no programmer either, but I do see a LOAD of overhead in what you just described...
But would you call Gabe Newell a programmer?

There's a fair bit of overhead involved with it, yes. But I'd say it can be primarily attributed to the D3D layer sitting between the application and the graphics card. Because ATi's physics system will likely interact fairly directly with the drivers and hardware, it can circumvent the D3D API and get results that are a lot closer to theoretical numbers in more than a few areas, including AGP transfer rates and FLOPS. If you use the D3D API, it will undoubtedly be slow, though, and I believe that's what Gabe is addressing, and also how I think Havok FX works. But mind you, I'm not entirely sure how ATi's physics will work either; it may be pretty much the same stuff there as well (e.g. using render-to-vertex-buffer and such). I was merely saying and demonstrating that it IS possible to get data back from the GPU. Practicality is a separate issue, though.

What does "certain degree of backwards compatibility" mean?

Well, a 7800GTX doesn't have separate, specialized hardware paths for SM1.x, SM2.0, SM2.x and SM3.0. Most of the time, the driver will bring 'old' shaders up to the right compile target and get them working.
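
For example (a minimal sketch - the shader file name and entry point are made up): bytecode produced for the ps_2_0 target is still valid input for CreatePixelShader on an SM3-class card:

#include <d3d9.h>
#include <d3dx9.h>

// Compile against the ps_2_0 target; the same bytecode runs on GeForce 6/7 or
// X1000-class (SM3) hardware. "physics.psh" and "main" are made-up names.
IDirect3DPixelShader9* LoadSm2Shader(IDirect3DDevice9* device)
{
    ID3DXBuffer* code = NULL;
    ID3DXBuffer* errors = NULL;
    IDirect3DPixelShader9* ps = NULL;

    D3DXCompileShaderFromFile("physics.psh", NULL, NULL, "main", "ps_2_0",
                              0, &code, &errors, NULL);
    if (code) {
        device->CreatePixelShader((const DWORD*)code->GetBufferPointer(), &ps);
        code->Release();
    }
    if (errors) errors->Release();
    return ps;
}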
 
Cypher19 - wouldn't you need to be running some sort of interpreter to allow the rendertarget thingy data to be seen as physics data?

Also how much memory overhead would that take? Storing physics data in that form would be pretty memory intensive, and storing data in that form for an entire level (as would be required for interactive physics) would surely be difficult.

I know for a fact Ageia's PPU doesn't need much storage at all because it simply computes vectors and tensors and stores the results - and while the transients are pretty large they are dealt with very quickly and so never need to be called to and from memory.

There's no dynamic results handling on a GPU AFAIK, so any intermediate results that couldn't be resolved in one shader cycle (fluid simulations are one situation I can think of that would need a few passes) would need to be stored externally to the GPU - not good performance-wise, surely?

Another issue - how efficient can ATi's physics model be? Pixel shaders can't talk to each other as they operate, so results from one calculation will have no effect on another, unless said calculations are performed repeatedly in light of each other's results - this adds a HUGE amount of memory usage into the balance, does it not?

I'm curious, because I know very little about hardware calls and APIs, but I know a lot about fluid mechanics and what you need to be able to accurately model continuums etc. In fact I've seen several attempts at what you might call 'hardcore' fluids modelling on GPUs, but these attempts tend to perform either erratically due to lack of communicative parallelism (the effect I described before) or slowly because of the necessity of storing transient data - hence my questions above.
 
MrNasty said:
Cypher19 - wouldn't you need to be running some sort of interpreter to allow the rendertarget thingy data to be seen as physics data?

Not really, the returned data would be basically an array of floats that you can use any way you want on the CPU.

Also how much memory overhead would that take? Storing physics data in that form would be pretty memory intensive, and storing data in that form for an entire level (as would be required for interactive physics) would surely be difficult.

It would be, but I couldn't say how great the memory usage would be. It probably wouldn't be that different than any other physics system, but I haven't made a physics system myself, so I couldn't say.

There's no dynamic results handling on a GPU AFAIK, so any intermediate results that couldn't be resolved in one shader cycle (fluid simulations are one situation I can think of that would need a few passes) would need to be stored externally to the GPU - not good performance-wise, surely?

Nah, there are lots of graphical effects that need multi-pass approaches. One could ping-pong between render targets, so that you render some data to one RT, and use that data in another pass to a separate RT.
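
Roughly like this in D3D9 (a minimal sketch: both textures are assumed to have been created with D3DUSAGE_RENDERTARGET, and the shader setup and full-screen quad draw are elided):

#include <d3d9.h>

// Render-target ping-ponging: each pass reads the previous pass's output.
void MultiPass(IDirect3DDevice9* device, IDirect3DTexture9* texA,
               IDirect3DTexture9* texB, int passes)
{
    IDirect3DTexture9* src = texA;
    IDirect3DTexture9* dst = texB;

    for (int i = 0; i < passes; ++i) {
        IDirect3DSurface9* rt = NULL;
        dst->GetSurfaceLevel(0, &rt);
        device->SetRenderTarget(0, rt);   // write this pass into 'dst'
        device->SetTexture(0, src);       // read the previous pass from 'src'
        // ... draw a full-screen quad with the simulation shader here ...
        rt->Release();
        IDirect3DTexture9* tmp = src; src = dst; dst = tmp;   // swap for next pass
    }
}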

Another issue - how efficient can ATi's physics model be? Pixel shaders can't talk to each other as they operate, so results from one calculation will have no effect on another, unless said calculations are performed repeatedly in light of each other's results - this adds a HUGE amount of memory usage into the balance, does it not?

Getting physics simulation running on a pixel shader isn't trivial, and it's not something I can comment on, really. I'm not sure exactly how to calculate physics quickly, let alone how to do it in a pixel shader. I just dived into the thread because Terra was being a bit of a loon.
 
Cypher19 said:
I just dived into the thread because Terra was being a bit of a loon.

Me?
Are you saying that I am the one 100% wrong in this thread?
That all others are 100% correct... by using fake numbers, GFlops compared across platforms/technologies, and claims of a 3-to-1 physics performance advantage in favour of the GPU over the PPU, while totally ignoring the architectural differences like internal bandwidth etc.?

Terra - Puuuuuurllleeease.... :D
 
Ever heard of marketing bullshit before? *L*

Marketing bullshit doesn't mean it doesn't work. When you market a product as being able to do something, it should be able to do it; how fast and easily it does it is where the bullshit comes into play. If they market something that doesn't work or wasn't added in, I believe you could call that false advertising. So I think both companies have found a way to give us real-time physics acceleration at a speed they are happy with, something they are putting money into marketing.

If it is marketing bullshit and it ends up being less capable at processing physics than our CPUs after they pumped it up this much, that would be somewhat of a disappointment, agree?
 
Trimlock said:
Marketing bullshit doesn't mean it doesn't work. When you market a product as being able to do something, it should be able to do it; how fast and easily it does it is where the bullshit comes into play. If they market something that doesn't work or wasn't added in, I believe you could call that false advertising. So I think both companies have found a way to give us real-time physics acceleration at a speed they are happy with, something they are putting money into marketing.

If it is marketing bullshit and it ends up being less capable at processing physics than our CPUs after they pumped it up this much, that would be somewhat of a disappointment, agree?

I just watched ATI's video of "physics".
I find it funny that at the end of the video the Mandelbrot ends up EXTREMELY pixelated, while the CPU version still looks okay.
And their R2VB demo gives 22 FPS at 10,000 objects, 15 FPS at 20,000 objects, 10 FPS at 22,000 objects, and 8 FPS at 30,000 objects.

I believe AGEIA has something like 32,000 objects listed as within the specs.
(The ability of the PhysX PPU)
I could be mistaken though...

But as someone else pointed out, the biggest "headache" for NVIDIA/ATI is that the PPU is PCI.
Care to wager how many empty PCI slots are out there compared to PCI-E? ;)
It means that current users can buy a PPU at ~$200-250 with no need for a mobo upgrade, as the PPU prolongs the lifespan of a gaming PC.
Hence they talk about "physics" with their products too, but don't talk too much about the fact that it's effects physics, not gameplay physics... hence I call it "marketing BS".

Steal the "spotlight" from AGEIA via PR, not actual performance.
And I still have to see NVIDIA/ATI do anything remotely like e.g. the new CellFactor video.

Terra - Until they step up to the plate, that will be my point of view on their solutions...
 
Interesting stuff Cypher :)

Just to show how different PPU architecture is compared to GPU architecture here are brief rundowns on each:

PPU

And the best examples of an x1900's architecture

X1900 Overview
X1900 Quad shader unit

Right, now let's have a look at the differences:

1. Raw vector power (the most important aspect of modelling physics, speaking as a fluids engineering student having already done a computational fluid dynamics course - wasn't fun, CFX 10.0 users are you with me? :p )

PPU: 16 units capable of 6 floating point executions per clock
16 dedicated result registers per unit (all at 32-bit precision)
Total vector ops per clock: 96

GPU: 48 units capable of 1 floating point execution per clock
general memory ring register (precision unknown - ATI quotes 128 bit precision)
Total vector ops per clock: 48 max
Assuming you aren't using them for anything else, like actual pixel shaders for example.

2. Integer-wise

PPU: 16 units capable of 1 integer operation per clock (in addition to the vector calcs)
8 dedicated result registers per unit (in addition to the 16 for vector ops)
Total integer ops: 16

GPU: 48 units capable of 1 integer operation per clock - exclusive of vector ops
general memory ring register
Total integer ops: 48 max, if no vector ops are being done

3. Other differences

PPU: Inter-addressing bus, fully inter-parallel 4-way associativity, no equivalent instruction pipelines (as units can address each other), 1 Dedicated instruction controller per unit, 1 indirect dispatch processor.

GPU: One-way addressing bus, no inter-parallel associativity, 48 instruction pipelines, 1 execution branch processor per unit (can only control branch instructions), 1 direct dispatch processor (this is where the equivalent functions of the PPU's instruction controller occur on a GPU).

Makes for interesting reading, and while, yes, a GPU's shader unit and a PPU are very similar, they are each much more suited to their task than the other...

For those of you who like quoting GFLOPS, bear in mind that ATI/NV quote these with the calculation potential of the rest of the GPU included (i.e. the bits that do jack for physics processing), hence any comparison with Ageia's PPU is already thrown out the window.

ATI's solution is going to be slow compared to Ageia's, by raw numbers and not even mentioning the architectural execution differences. This analysis also seems to cast doubt on the "but I can run physics AND graphics on 1 GPU" claims made by both ATI and Nvidia - would you like to see your graphics performance cut in half and get 25% of the performance of Ageia's solution (clock for clock)?

All that remains to be seen is how much games use the PPU and whether ATI's (and nvid's) solution will be adequate for the amount of physics utilised.

It's not about raw power.

It's about utilisation and interactivity - and those remain to be seen from either camp at this time.

You can all clap now :D
 
MrNasty said:
Interesting stuff Cypher :)

Just to show how different PPU architecture is compared to GPU architecture here are brief rundowns on each:

PPU

And the best examples of an x1900's architecture

X1900 Overview
X1900 Quad shader unit

Right, now let's have a look at the differences:

1. Raw vector power (the most important aspect of modelling physics, speaking as a fluids engineering student having already done a computational fluid dynamics course - wasn't fun, CFX 10.0 users are you with me? :p )

PPU: 16 units capable of 6 floating point executions per clock
16 dedicated result registers per unit (all at 32-bit precision)
Total vector ops per clock: 96

GPU: 48 units capable of 1 floating point execution per clock
general memory ring register (precision unknown - ATI quotes 128 bit precision)
Total vector ops per clock: 48 max
Assuming you aren't using them for anything else, like actual pixel shaders for example.

You need to take that a step further:

GPU 650 MHz clock
PPU 100 MHz clock

So, let's say a 6.5:1 clock ratio for an XTX:PPU

that gives:
GPU: 48 vector ops/clock x 6.5 clocks = 312 vector ops / 1 PPU clock
PPU: 96 vector ops/clock x 1 clock = 96 vector ops / 1 PPU clock.

I'll stop there for now. I am sure you can all extrapolate how that's close to a 3:1 advantage for the ATi GPU. Now... the nvidia GPU wouldn't be much different from the ATi, but nvidia isn't talking about it as much yet.
 
Terra said:
And let me go on the record and say that I have NO affiliations with AGEIA...what so ever...
Other than I will be buying a PPU when it's released.

Terra - As I might suspect that "argument" will surface soon....

Why wait...buy one right now. (It's been available since 3/15 unless they have all been sold already)



Greetings,

We at AGEIA are excited that you signed up at QuakeCon 2005 expressing interest in our PhysX technology.

PhysX is here now. PC games will never be the same again! This technology will bring exciting new ways to frag your opponent.

We have released 250 AGEIA PhysX cards to Vance Research, a logistics company that handles early card distribution to Modders. They will be sold to extreme modders that can create content and buzz prior to retail availability. These cards will contain two modifiable game-lets based on Unreal Engine 3 and Artificial Reality Engine.

The Unreal Engine 3 game-let is the Hangar of Doom demo that we showed at QuakeCon. The Artificial Reality game-let is a multi-player LAN game that uses physics in game play in ways that have never been seen before. AAA games will roll out after retail availability.

AGEIA is planning a contest with a large cash prize for best game-mod. Any content created now will be eligible.

ASUS and BFG manufactured cards are available through Vance Research today @ www.vanceresearch.com . You will need to sign up as a new user. Anticipate two weeks for delivery. Non-disclosure agreement required prior to receipt.

Content can only be publicly shared when cards are available through retail.

Thank you for your continued interest in AGEIA and your contribution to the PC gaming world.

AGEIA Modder Relations
 
BTW Terra, you mentioned how many spare PCI vs PCI-E slots... right now I have no spare PCI slots, but I have one spare PCI-E X8-16 slot and one spare PCI-E X1 slot (one more X1 slot is covered by the X1900XTX heatsink).

Basically, PCI-E slots are more available as spare slots than PCI slots already, simply because after you install a sound card and a tuner/SCSI/whatever, that's how it ends up.
 
BBA said:
You need to take that a step further:

GPU 650 MHz clock
PPU 100 MHz clock

So, let's say a 6.5:1 clock ratio for an XTX:PPU

that gives:
GPU: 48 vector ops/clock x 6.5 clocks = 312 vector ops / 1 PPU clock
PPU: 96 vector ops/clock x 1 clock = 96 vector ops / 1 PPU clock.

I'll stop there for now. I am sure you can all extrapolate how that's close to a 3:1 advantage for the ATi GPU. Now... the nvidia GPU wouldn't be much different from the ATi, but nvidia isn't talking about it as much yet.
So you want to dedicate the entire thing to physics processing, and lose out on all the graphical features it gives you then? I think a lot of the games coming out using Ageia's PhysX (like UT2007) are using those vertex and pixel shaders quite extensively. I would guess that would leave little headroom for the physics calculations. That is probably WHY Havok FX isn't intended to be used for gameplay-affecting physics: at the game's lower requirements it probably couldn't even run, and it will still have to dynamically adjust how much of a load it puts on the GPU in case the game wants to throw some more effects on.
 
BBA said:
You need to take that a step further:

GPU 650 MHz clock
PPU 100 MHz clock

Where the *beeeeeeeeeeeep* did you get the idea that the PPU core is 100MHz? :confused:

So, let's say a 6.5:1 clock ratio for an XTX:PPU

Not in this world?

Terra - WTF :confused:
 
BBA said:
Why wait...buy one right now. (It's been available since 3/15 unless they have all been sold already)

Thanks for nothing (again)

Terra - Fucking PREORDER!!!....
 
BBA said:
BTW Terra, you mentioned how many spare PCI vs PCI-E slots...right now I have no spare PCI slots, but I have one spare PCI-E X8-16 and 1 spare PCI-E X1 slots (one more X1 slot is covered by the X1900XTX heatsink)

Basically, PCI-E slots are more available as spare slots than PCI slots already, simply because after you install a sound card and a tuner/scsi/whatever, thats how it ends up.

*cough*
You are assuming that the majority of motherboards are PCI-E...
The way ATI/NVIDIA want it to be.
And again, this does not apply in this world.

Terra - Do you have any affiliation with ATI?!
 
Xipher said:
So you want to dedicate the entire thing to physics processing, and lose out on all the graphical features it gives you then?


Yes, that's one plan ATi is making..

That's what ATi is letting you do: you will have one video card for normal 3D, the other for dedicated physics.

Basically, you have today's X1900; it gets outdated and you buy a newer model for Vista or whatever, and the X1900 becomes dedicated solely to physics.

Since I upgrade every time a new card comes out, this gives the X1900 a longer life of use.
 
Terra said:
Where the *beeeeeeeeeeeep* did you get the idea that the PPU core is 100MHz? :confused:



Not in this world?

Terra - WTF :confused:


I got it from the Ageia owner/tech/marketing person at Quakecon.
 
BBA said:
Yes, that's one plan ATi is making..

That's what ATi is letting you do: you will have one video card for normal 3D, the other for dedicated physics.

Basically, you have today's X1900; it gets outdated and you buy a newer model for Vista or whatever, and the X1900 becomes dedicated solely to physics.

Eh, seems like a waste of money to me.
 
Xipher said:
Eh, seems like a waste of money to me.


Why? What else will you do with the X1900 after you upgrade? Sell it? I tend to keep older video cards myself; this makes nice use of one.
 
Terra said:
*cough*
You are assuming that the majority of motherboards are PCI-E...
The way ATI/NVIDIA want it to be.
And again, this does not apply in this world.

Terra - Do you have any affiliation with ATI?!

Get real man...who in their right mind will buy a PPU if they are not already using a top of the line video card on a PCI-E system board?

You should stop working for Ageia and see what comes out in real life before taking your hard stand.
 
Terra said:
Thanks for nothing (again)

Terra - Fucking PREORDER!!!....


I'm pretty sure I posted this the day it was sent to me... if you missed out, well, who cares anyway. :D
 
Sell it. If you have that kind of cash, why aren't you running SLI or CrossFire? That second card is in the way. Also, if ATi is the only one backing this, then I couldn't care less (Nvidia fan, simply due to decent support for Linux).
 
Xipher said:
Sell it. If you have that kind of cash, why aren't you running SLI or CrossFire? That second card is in the way. Also, if ATi is the only one backing this, then I couldn't care less (Nvidia fan, simply due to decent support for Linux).


nvidia is going to support it as well, but only in SLi mode and it will share physics with graphics.
 
BBA said:
You need to take that a step further:

GPU 650 MHz clock
PPU 100 MHz clock

Ageia has been keeping its clock speed details under HEAVY wraps - I know, I have a copy of the patent and even that doesn't have full specs, and in a lot of interviews Ageia have said they don't want to give the competition a heads-up.

Let's examine what you've spouted anyway:

Power consumption is directly related to fab technique and clock speed.

A 125 million transistor processor dissipating 25 watts. 130nm.

25/125 = 0.2 W/million transistors

A 384 million transistor processor dissipating nearly 100 watts. 90nm.

100/384 = 0.26 W/million transistors

OK, so let's see here. Based on the fab technique decreasing the power consumption linearly w.r.t. clock speed (which is almost right - it should reduce by the square but never gets there due to inefficiencies):

90/130 * 0.2 W/million transistors = 0.14

So, ATI's clock is 650MHz, which means that Ageia's PPU clock must be close to:

0.14 * (650/0.26) = 350 MHz

And this is an approximate calculation - but sorry BBA, your BS is called: 25 watts doesn't come from nowhere.

So, let's look at your comparison now:

3.5 * 96 = 336

6.5 * 48 = 312

And that, as I mentioned earlier but you so carefully ignored, doesn't take into account that in physics any vector calculations performed usually have to be fed back into each other, something that the dispatch processor will have to handle, and as it's already handling the instructions it will be pretty tied up, I imagine.

Oh, and who'd you get that number off? A booth babe? :D
 
BBA said:
nvidia is going to support it as well, but only in SLi mode and it will share physics with graphics.

Oh look, another fallacy: nVidia's solutions will no more "share physics with graphics" than ATI's will.

nVidia have also jumped on the hype bandwagon (along with ATI) in saying that the most interesting application of HavokFX will be on a 1 GPU system.

The only difference between their dual-card physics system and ATI's is that with nVid's you have to use 2 identical cards, whereas ATI's (like crossfire) allows you to use any two cards. (ATI gets my vote here :) )

I speculate that ATI is consistently denying that you will need to run the cards in "Crossfire" mode because they want people to associate Crossfire with graphics; in actual fact two graphics cards operating in this physics/graphics mode will be indistinguishable from how they work in Crossfire, just sans dongle.

nVidia has been far more forthcoming with details than ATI has about their "unnamed, unknown, undeveloped" physics API.
 