Video card folding

hehe...I bet the first thread on this topic was over two years ago....now it's possible, huh?
Somebody go read the article for me...tomshardware is blocked at my firewall.
(Yes, I blocked it myself. ;) )
 
For Relic

Audio supercomputer hidden in your graphics card?

By Wolfgang Gruener, Senior Editor

September 2, 2004 - 13:59 EST

Cambridge (MA) - Nvidia's graphics cards may have much more to offer than simply drawing pixels on the screen: A startup company has found a way to translate audio signals into graphics, run them through the graphics card and overcome a common issue of limited audio effect processing performance in computers.

It is not unusual for professional music artists to run into performance barriers even with the most powerful computers available today. Multi-track recording is still a challenging and sometimes frustrating task. James Cann of BionicFX in Massachusetts, however, noticed that audio processing does not have to happen only on the CPU. His Audio Video Exchange technology (AVEX) converts digital audio into graphics data and then performs effect calculations using the 3D architecture of Nvidia GPUs. Compared to the roughly six GFlops of a typical CPU, Nvidia's chips can reach more than 40 GFlops, according to Cann.


"This technology allows music hobbyists and professional artists to run studio quality audio effects at high sample rates on their desktop computer," he said. Cann's invention is purely software-based and is not capable substituting a sound chip. The approach exploits the video card 3D chip, which usually is idle when users are working with multi-track recording software. "It's a great resource to use as a coprocessor," Cann said. "AVEX is designed to reduce the CPU load by moving the processing to the video card for certain types of audio effects when making music." Cann said that the technology is purely targeted at music enthusiasts and at this time brings no advantages for applications such as gaming.

But if Cann is right, audio effect processing might be just a starting point for how a GPU could be used in other applications. He believes that several other types of software could be greatly enhanced in the same way, such as genomics or SETI. "The GPU has some numeric precision issues that need to be worked out for scientific applications to be possible, but the thought of performing the computations on a resource theoretically capable of 50 or more GFlops, instead of the five GFlops of a CPU, is exciting," he said.

So far, Cann cannot get as much performance out of the GPU as he would like. "Right now, getting the data back from the video card is very slow, so the overall performance isn't even close to the theoretical max of the card. I am hoping that the PCI Express architecture will resolve this. This will mean more instances of effects running at higher sample rates," he said.

Still, there is a significant performance boost and a reduced CPU load for people using applications such as Cubase, Ableton Live, and other VST-compatible hosts. Cann's first commercial application will be BionicReverb, which is expected to go into a public and free beta in October. The final version is scheduled to be released at the Winter NAMM Conference in January 2005.

BionicReverb is an impulse response reverberation effect that runs as a plug-in inside VST-compatible multi-track recording software. The audio effect is generated by combining an impulse response file with digital audio. Impulse response files are created by firing a starter pistol inside a location, such as Carnegie Hall, and recording the echoing sound waves. Combining the two files through mathematical convolution is a CPU-intensive process, and that load is reduced by moving the expensive calculations onto the GPU. Amateur and professional guitarists, singers, pianists, and other musicians will be able to create performances in their home or studio that sound as if they were recorded in famous locations around the world, according to Cann.

At this time, Cann plans to only support Nvidia graphics cards. "When I started, ATI had a problem with floating point data. I have heard they have resolved it, but I won't have time to purchase and research their newest cards until after this is released," he said.

Pricing has not been announced yet, but Cann says he will make his technology available for "far less" than the cost of professional studio DSP solutions, which can run into the high five-figure range. He estimates the price will be somewhere between $200 and $800.
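
For anyone wondering what the "mathematical convolution" in the article actually is, here is a minimal CPU-side sketch, purely my own illustration (not BionicFX code; real reverb plug-ins use FFT-based partitioned convolution rather than this brute-force loop):

// Straight (non-FFT) convolution of a dry recording with an impulse response.
// dry is the recorded audio, ir is the impulse response (e.g. the Carnegie Hall
// recording), both mono float buffers.
#include <vector>
#include <cstddef>

std::vector<float> convolve(const std::vector<float>& dry, const std::vector<float>& ir)
{
    // Output is as long as both inputs combined, minus one sample.
    std::vector<float> wet(dry.size() + ir.size() - 1, 0.0f);
    for (std::size_t n = 0; n < wet.size(); ++n) {
        float acc = 0.0f;
        // wet[n] = sum over k of dry[n - k] * ir[k]
        for (std::size_t k = 0; k < ir.size(); ++k)
            if (n >= k && n - k < dry.size())
                acc += dry[n - k] * ir[k];
        wet[n] = acc;
    }
    return wet;
}

Every output sample is a weighted sum over the whole impulse response, so a few seconds of IR at 44.1 kHz means hundreds of thousands of multiply-adds per sample, which is exactly the kind of massively parallel arithmetic the article wants to push onto the GPU.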
 
relic said:
hehe...I bet the first thread on this topic was over two years ago....now it's possible, huh?
Somebody go read the article for me...tomshardware is blocked at my firewall.
(Yes, I blocked it myself. ;) )
Some kind of personal vendetta?
:rolleyes: :D
 
There is little love between this site and Tom's. But, info on possible increases in folding is all good.
 
Postalgeist said:
There is little love between this site and Tom's. But, info on possible increases in folding is all good.
Yeah, I know all about the rift....but I didn't know that it ran that deep with relic.
;)
 
AtomicMoose said:
Yeah, I know all about the rift....but I didn't know that it ran that deep with relic.
;)

I block THW at every firewall I install. Just doing my part.
Nothing to do with the [H]...unless you consider that GMTA.
 
So you guys are very lucky to have me around, since this exact topic is what I've been working on for months.

This is only possible with GPUs that support true branching (DirectX 9.0 compliant). Before, GPUs would have to precompute the number of iterations in a loop, and would essentially end up with repetitive, unrolled code after compilation from Cg or a higher-level shader language. With the pixel shader limit now at 65k instructions, and true branching available, the only things still missing are 64-bit IEEE 754/854 floating point and integer operations such as bit shifting, bit masking, and a host of other operations.

In Cg, things like double are reserved keywords, implying a future implementation, but nVidia and ATi won't jump to 64-bit fp any time soon, which sucks :(

Now, if one could write a double precision library using existing R32G32B32A32 128-bit values (storing part of each number in each color channel, and utilizing the fast vector ops on GPUs), even with massive overhead, you could then implement something like SETI@Home or, the best, Folding@Home.
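
To give a flavor of what such a double precision library does, here's a quick CPU-side sketch of the usual "double-single" trick. This is my own toy code, not anything that runs on the card; on the GPU you'd express the same arithmetic in Cg over two texture channels:

// Each emulated double is an unevaluated sum of two floats (hi + lo), e.g. two
// channels of one texel. Shown: addition only, using Knuth's two-sum so the
// rounding error of the high parts isn't thrown away. Mul/div need similar care.
// Note: compile without -ffast-math, or the error terms get optimized away.
#include <cstdio>

struct dsfloat { float hi, lo; };   // value represented = hi + lo

dsfloat ds_add(dsfloat a, dsfloat b)
{
    // Two-sum of the high words: s is the rounded sum, e the exact rounding error.
    float s = a.hi + b.hi;
    float v = s - a.hi;
    float e = (a.hi - (s - v)) + (b.hi - v);

    e += a.lo + b.lo;               // fold in the low words
    dsfloat r;                      // renormalize back into (hi, lo) form
    r.hi = s + e;
    r.lo = e - (r.hi - s);
    return r;
}

int main()
{
    // 1 + 1e-8 is invisible to a single float, but survives in the low word here.
    dsfloat one  = { 1.0f, 0.0f };
    dsfloat tiny = { 1e-8f, 0.0f };
    dsfloat sum  = ds_add(one, tiny);
    printf("plain float:   %.12f\n", 1.0f + 1e-8f);
    printf("double-single: hi=%.12f lo=%.3e\n", sum.hi, sum.lo);
    return 0;
}

It's not real IEEE fp64: you get roughly 48 bits of mantissa and keep the fp32 exponent range, but it's a big step up from bare fp32.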

So, in short, GPUs aren't precise enough with 24- and 32-bit floats, they don't have an easy way to create 64-bit doubles, and they're severely limited regardless, since integer and bitwise operations aren't implemented either.

But, if something does work, usually it's faster than the CPU; sometimes 10x faster!

For more info, check out http://www.gpgpu.org

If I make any progress, I'll be sure to keep interested parties notified.
 
No one would want to do that anyway…would cut the crap out of frame rates ;)
 
Whitespace said:
So you guys are very lucky to have me around, since this exact topic is what I've been working on for months.

This is only possible with GPUs that support true branching (DirectX 9.0 compliant). Before, GPUs would have to precompute the number of iterations in a loop, and would essentially end up with repetitive, unrolled code after compilation from Cg or a higher-level shader language. With the pixel shader limit now at 65k instructions, and true branching available, the only things still missing are 64-bit IEEE 754/854 floating point and integer operations such as bit shifting, bit masking, and a host of other operations.

In Cg, things like double are reserved keywords, implying a future implementation, but nVidia and ATi won't jump to 64-bit fp any time soon, which sucks :(

Now, if one could write a double precision library using existing R32G32B32A32 128-bit values (storing part of each number in each color channel, and utilizing the fast vector ops on GPUs), even with massive overhead, you could then implement something like SETI@Home or, the best, Folding@Home.

So, in short, GPUs aren't precise enough with 24- and 32-bit floats, they don't have an easy way to create 64-bit doubles, and they're severely limited regardless, since integer and bitwise operations aren't implemented either.

But, if something does work, usually it's faster than the CPU; sometimes 10x faster!

For more info, check out http://www.gpgpu.org

If I make any progress, I'll be sure to keep interested parties notified.

damn, i understood maybe 5% of that post.

funny thing is, a couple of days ago i came up w/ this idea too...what if idle VidCard cycles ==> F@H ?? :p

well, i'll just let the experts handle it...hopefully it'll work out in the end!

 
Whitespace said:
So, in short, GPUs aren't precise enough with 24 and 32 bit floats, and they don't have an easy way to create 64 bit doubles, and are severely limited regardless since integer and bitwise operations aren't implemented either.

Sounds like no matter how you attack it, you just don't have the proper tools yet. Will the GPU market drive new implementations of higher-precision floating point ops, or the operations required for doubles, or is it a long shot? I guess what I'm asking is: "Is it needed for future GPU performance, or is this all just 'what if'?"
 
relic said:
Sounds like no matter how you attack it, you just don't have the proper tools yet. Will the GPU market drive new implementations of higher-precision floating point ops, or the operations required for doubles, or is it a long shot? I guess what I'm asking is: "Is it needed for future GPU performance, or is this all just 'what if'?"
Well, nVidia definitely understands that GPGPU is a huge topic of interest to scientific computing (10x the performance for the same cost, or less), and I heard from an inside source that the next generation will please the GPGPU people greatly, but as far as implementing doubles, I'd give it 3 years. :(

nVidia got flak for being slower 'cause they implemented fp32 while ATi had fp24, and the stuff you can do with floating-point textures is so pimpingly cool, and it's not even in games yet. Also, the visual difference for these neat effects wouldn't be much, since the algorithms are tuned for speed, and that means making it look good enough at 30+ fps, not accurate (for example, dynamic waves in the ocean, or the diffraction, dispersion, and chromatic aberration of light through glass; they look awesome, but aren't physically accurate).

So, in short :p, I doubt there'll be fp64 featured in a consumer GPU any time soon. That doesn't mean we can't emulate fp64, it'll just require effort (which I'm doing, albeit slowly).

My goal is to run FAH on a video card (why I started doing research), so you guys should be happy that FAH is my goal (and not cracking passwords or something stupid) :p :D
 
English translation "We're not going to see it in Hardware/Firmware anytime soon, but we might be able to make it happen in software."

Is that about right? It's got to be a hell of a challenge.
 
Would you be able to run multiple folding clients at once? One for the GPU and others for the CPUs?

btw, Whitespace, what do you do for a living if you don't mind me asking? very interesting stuff
:)
 
Another thing that's different now is the PCI Express interface. One of the problems before was getting the data back from the GPU. Now, we have a high-bandwidth two-way connection. Plus all the big words Whitespace is using... :D
 
Anyone thought about starting this topic in the Stanford forum, possibly getting those guys to eyeball it?
 
relic said:
English translation "We're not going to see it in Hardware/Firmware anytime soon, but we might be able to make it happen in software."

Is that about right? It's got to be a hell of a challenge.
Absolutely correct. Unless nVidia or ATi decide to directly target scientific computing on GPUs (which will eventually happen, since they'd make bling bling and take over cluster computing for many projects), they have no reason to make consumer video card GPUs fp64 for now.
ChingChang said:
Would you be able to run multiple folding clients at once? One for the GPU and others for the CPUs?

btw, Whitespace, what do you do for a living if you don't mind me asking? very interesting stuff
:)
For GPGPU stuff, the CPU sets up the DirectX/OpenGL side and you use textures to store the data to be computed. So the CPU is just feeding data to the GPU, plus perhaps doing some pre/post-processing of the data. So, of course, you could run another FAH client on the CPU in addition to the GPGPU fold.
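
If anyone wants to see how little actual graphics there is in the CPU side of this, here's a stripped-down toy of that round trip in plain OpenGL/GLUT. It's my own sketch, not real GPGPU code: no fragment program is bound, so the "computation" is just a texture copy, and a real kernel would also want floating-point textures.

// Toy version of the CPU-side plumbing: pack data into a texture, draw one quad
// so the GPU touches every element, then read the result back. Real GPGPU code
// would bind a Cg/GLSL program and use a floating-point texture format. Names
// and sizes are made up for illustration.
// Build (assumption): g++ gpgpu_sketch.cpp -o gpgpu_sketch -lglut -lGL
#include <GL/glut.h>
#include <cstdio>
#include <cstdlib>

const int N = 4;                        // tiny 4x4 "data set" so the printout stays small
unsigned char input[N * N * 4];

void display()
{
    // 1. Upload the data to the card as a texture.
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, N, N, 0, GL_RGBA, GL_UNSIGNED_BYTE, input);
    glEnable(GL_TEXTURE_2D);

    // 2. "Run the kernel": draw one quad covering the viewport, so one fragment is
    //    shaded per data element. A bound fragment program is where the math would go.
    glViewport(0, 0, N, N);
    glMatrixMode(GL_PROJECTION); glLoadIdentity();
    glOrtho(0, 1, 0, 1, -1, 1);
    glMatrixMode(GL_MODELVIEW); glLoadIdentity();
    glBegin(GL_QUADS);
      glTexCoord2f(0, 0); glVertex2f(0, 0);
      glTexCoord2f(1, 0); glVertex2f(1, 0);
      glTexCoord2f(1, 1); glVertex2f(1, 1);
      glTexCoord2f(0, 1); glVertex2f(0, 1);
    glEnd();
    glFinish();

    // 3. Read the result back to the CPU: the slow step everyone is griping about.
    unsigned char output[N * N * 4];
    glReadPixels(0, 0, N, N, GL_RGBA, GL_UNSIGNED_BYTE, output);
    for (int i = 0; i < N * N; ++i)
        printf("%3d%s", output[i * 4], (i % N == N - 1) ? "\n" : " ");
    exit(0);                            // one pass is all we wanted
}

int main(int argc, char** argv)
{
    for (int i = 0; i < N * N * 4; ++i)
        input[i] = (unsigned char)i;    // dummy data
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE);
    glutInitWindowSize(N, N);
    glutCreateWindow("gpgpu sketch");
    glutDisplayFunc(display);
    glutMainLoop();
    return 0;
}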

I'm a math major (completed) at SUNY Stony Brook. I was comp sci, but math is more interesting. But there are always these CS projects that pull me back. :D I'm lucky, because there are people here at Stony Brook doing top-of-the-line GPGPU work. In fact, they made a super cluster of GPUs to do airborne pathogen simulations (anthrax in Times Square) for the (damned) Dept. of Homeland Security. So I have access to people who KNOW the ins and outs of GPUs much better than I do, which r0x0rz.

Postalgeist said:
Another thing that's different now is the PCI Express interface. One of the problems before was getting the data back from the GPU. Now, we have a high-bandwidth two-way connection. Plus all the big words Whitespace is using... :D
Yes, a HUGE problem currently is that the bandwidth sucks so much that you are really limited (even more so than already!) in which problems in scientific computing get a speedup. If bandwidth weren't an issue, there'd be a lot more 10x speedup gains and a lot more research in the area. PCI Express will help, but it's still limited. Also, with nVidia's SLI, the PCI Express architecture is still limited, since I believe the x16 bandwidth is shared between all the PCIe x16 slots on the mobo, which is like having two AGP 8x slots, which are bandwidth limited. Something like that.
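Just to put ballpark numbers on it (my own back-of-the-envelope, not a benchmark; real AGP readback rates varied a lot by driver and chipset):

// Rough time to pull a 1024x1024 RGBA fp32 result (16 MB) back to the CPU.
// AGP readback in practice was often only on the order of 100-200 MB/s, while
// PCI Express x16 is about 4 GB/s per direction on paper. Ballpark only.
#include <cstdio>

int main()
{
    const double result_mb = 1024.0 * 1024.0 * 4 * 4 / (1024.0 * 1024.0); // 16 MB
    const char*  bus[]  = { "AGP readback (~150 MB/s, typical)",
                            "PCIe x16     (~4000 MB/s, theoretical)" };
    const double rate[] = { 150.0, 4000.0 };

    for (int i = 0; i < 2; ++i)
        printf("%-40s %6.1f ms for a 16 MB readback\n", bus[i], result_mb / rate[i] * 1000.0);
    return 0;
}

Even the PCIe number is the on-paper figure, but the order-of-magnitude gap is the point.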
CIWS said:
Anyone thought about starting this topic in the Stanford forum, possibly get those guys to eyeball it ?
Every script kiddie thinks about this idea; it is not new. And since the limitations are so great, it is quite hard (borderline impossible) to do currently. Those guys hear a lot about this, and believe me, if IBM or someone could benefit from an easy little program, they'd have done it already.

Now, I am in no way an expert in GPGPU, but from the year or so I've been actively researching and fiddling around, I've learned a lot here and there. Eventually it'll happen. If we're lucky, soon.
 
As an example of how absolutely new this stuff is, nVidia just recently (in the last month) put full support for the 6800's new features into their Cg programming language, and last week put support into their FX Composer shader creation application. So yeah, this stuff is JUST coming out!

I guess, for those of you who know about computer programming, GPUs are not Turing-complete. This means they aren't general enough to program anything a CPU can do (an OS, for example), nor should they be for now.

The power of a GPU comes from being hardware-designed to pimp as many vertices and pixels as they can. If you can represent your data as pixels, and then take advantage of the massive parallelism of a GPU, and you don't have a lot of data to process per iteration, then and only then will you see speedups, assuming the limitations of the GPU architecture don't hold you back.

Even matrix multiplication, something one would assume to be blazingly fast on an architecture that is vector/matrix based, is slower than on a CPU (for bandwidth reasons, though).

But we're getting there :)
 
"The power of a GPU comes from being hardware-designed to PIMP as many vertices and pixels they can."

Whitespace, thanks for telling me this. Now I know why my Radeon is always stealing my Cadillac Eldorado. :D
 
Hmm, it appears that the Pande Group is already working on a port. Vijay says it here around the middle.

I guess I don't have to write anything now :D
 
Here's a link to Dr. Vijay Pande's lecture at Xerox PARC about folding.

He mentions F@H on GPUs around 35:00 into it, and says later on that they have it working, but are dead even with CPUs. Great update on the future of F@H and distributed computing in general.

Also of GREAT note is 45:00 in, where he describes how F@H is helping find info about p53, the "tumor suppressor"; more than 50% of all cancers involve a mutation in this molecule. Brought tears to my eyes that we may be able, eventually, to prevent >50% of all cancers in one shot.

Good info so you know what the hell your computer is actually doing, and what we're up against computationally.
 