NF200 "True" 3-Way SLI Preliminary Results @ [H]

FrgMstr

NF200 "True" 3-Way SLI Preliminary Results - We take a look at NVIDIA's answer to "true" 3-Way SLI on current Intel X58 chipset equipped motherboards. We all know SLI on X58 was a good thing, but do gamers need to pay for the NF200?


Today we have published a new video covering some of the discussion surrounding the NF200 chipset used on the ASUS P6T6 WS "Workstation" motherboard that we have previously reviewed here. The use of the NF200 chipset in addition to the Intel X58 chipset has been touted as "true" 3-Way SLI, or 3-way CrossFireX for that matter, but what does it mean to the gamer?
 
thanks, you just made me just spend over $800 bucks on a Dell 3007..
I was hesitant until I saw it in your video...
 
So... with the current generation the difference is small to none. With over 100 fps, 1 more or less doesn't matter. As was to be expected :p I wonder at what point 16/16/16 will start to show a difference over 16/8/8.

And is it just me reading the graphs wrong, or is the simple 16/8/8 (blue) actually faster than 16/16/16 with the NF200?
 
So... with the current generation the difference is small to none. With over 100 fps, 1 more or less doesn't matter. As was to be expected :p I wonder at what point 16/16/16 will start to show a difference over 16/8/8.

And is it just me reading the graphs wrong, or is the simple 16/8/8 (blue) actually faster than 16/16/16 with the NF200?

Green bar is X58+NF200 @ 16-16-16. It is slower than lone X58 @16-8-8.
 
So... with the current generation the difference is small to none. With over 100 fps, 1 more or less doesn't matter. As was to be expected :p I wonder at what point 16/16/16 will start to show a difference over 16/8/8.

And is it just me reading the graphs wrong, or is the simple 16/8/8 (blue) actually faster than 16/16/16 with the NF200?

The 16/8/8 solution is indeed faster. As Kyle mentioned in the video, they think the NF200 chipset adds some latency.
 
Actually, in cases where there are large GPU-to-GPU transfers and the extra bandwidth between GPUs is used, 16/16/16 is faster. But the SLI profiles ensure that most transfers that can be eliminated are eliminated, so there is little use for that extra bandwidth most of the time.

16/16/16 would probably be faster in a game that has no SLI profile but a lot of GPU-to-GPU transfers, but once a profile is defined the bandwidth between GPUs is not as useful.

This is exactly the same issue faced by the 4870X2 and its sideport. ATI said they don't see a need to turn it on because they end up profiling all games and disabling transfers to improve performance anyway, but if they ever didn't profile a game and just applied AFR mode by default, the sideport would come in handy for speeding up transfers so they don't bottleneck your multi-GPU scaling.
 
I'm glad I bought a "pedestrian" evga X58 board.
Truly the NF200 is not ready for prime time.

Good viewing and I'm sure a ton of work, thanks.;)
 
Can't watch the vid at work, but the chart tells the story. Not a large difference, but doesn't look like the NF200 is worth it for tri-SLI. Could be other applications for the PCIe lanes, though.
 
This doesn't surprise me at all.

Look at the Tesla line of compute GPUs, specifically the S1070: they are putting 2 GPUs onto a x8 PCIe bus and not seeing much in the way of slowdown in computing. I don't think the GPUs can fully utilize the bandwidth of the bus, whether it's 1.1 or 2.0.

I also think that Kyle hit the nail on the head when he explained that a lot of the info is going across the SLI connector.

I think that the NF200 does have more bandwidth, but #1 it adds latency (that's why it showed slower) and #2 there is nothing to utilize that much bandwidth in today's computers.

Either way it doesn't matter to me. I don't think tri-SLI is really needed even on a 30" for MOST gamers; I think it's still about being able to "play Crysis", and I need that third slot to take care of the real PC bottleneck: hard drives :)


edit: Big thanks to kyle and staff for another kickass review! thanks for "keepin it real"
 
I'm just curious why the mother of all games - in terms of pushing the threshold - wasn't used in the benchmarking? And by that I'm referring to Crysis?

Granted the initial results do not look promising but each game benchmarks differently.

Tri 16x sadly appears to be a gimmick (for now).
 
L O L
The P6T6 WS is still limited to a single x16 link from the NF200 to the X58 chip (and thus to the CPU and everything else, including the first GPU, which sits on the TRUE native x16 direct from the X58), so the NF200 can't help. And as you/Kyle mentioned, latencies go up with anything in the way.

I'm just curious why the mother of all games - in terms of pushing the threshold - wasn't used in the benchmarking? And by that I'm referring to Crysis?
Probably because FC2 is stupid easy to benchmark with.
And tri-SLI isn't shown to be a gimmick here; it's shown that 16/8/8 is enough bandwidth for it in FC2.
 
Tri 16x sadly appears to be a gimmick (for now).

I personally never saw a need for "true" 3-way SLI proven even on PCIe Gen1. Now we are on Gen2, where an x16 slot is actually x32 in PCIe Gen1 bandwidth terms. So in Gen1 measurements we are seeing X58 chipsets with x32-x16-x16. That is a LOT of bandwidth. I am going to guess it will be years, if not longer, before we see it utilized by any gaming application.
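To put rough numbers on the Gen1 vs. Gen2 comparison above, here is a small sketch of the arithmetic. It assumes the commonly quoted figures of 2.5 GT/s per lane for Gen1 and 5 GT/s for Gen2, both with 8b/10b encoding; the function names are made up for illustration, and the results are theoretical per-direction peaks, not what a game would actually see.

```python
# Raw signalling rate per lane, in GT/s, for each PCIe generation.
GT_PER_S = {1: 2.5, 2: 5.0}

def lane_mb_per_s(gen):
    """Usable MB/s per lane, per direction, after 8b/10b encoding."""
    bits_per_s = GT_PER_S[gen] * 1e9 * 8 / 10   # 8 payload bits per 10 on the wire
    return bits_per_s / 8 / 1e6                 # bits -> bytes -> MB

def link_gb_per_s(gen, lanes):
    """Theoretical GB/s per direction for a link of the given width."""
    return lane_mb_per_s(gen) * lanes / 1000

def gen1_equivalent_lanes(gen, lanes):
    """How many Gen1 lanes would carry the same bandwidth."""
    return int(lanes * lane_mb_per_s(gen) / lane_mb_per_s(1))

for width in (16, 8):
    print(f"Gen2 x{width}: {link_gb_per_s(2, width)} GB/s per direction, "
          f"same as Gen1 x{gen1_equivalent_lanes(2, width)}")
```

On these assumptions a Gen2 x16 slot matches a Gen1 x32, and a Gen2 x8 slot matches a Gen1 x16, which is where the x32-x16-x16 figure comes from.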
 
Nice video, Kyle. I was wondering if you or anyone else who has tested that board has tried putting 6 gfx cards in to see if it would actually run all 6, without the SLI bridges, because it sure as heck would be an interesting board for people who are insane F@H folders.
 
ASUS can't really claim "true" 3-way. "True" 3-way in my book is full speed to the CPU. Right now all the chip does is add an extra step the CPU has to go through to talk to the other cards.
 
Interesting conclusion. Good pre-review, though I personally hate all these videos... text is much easier to skim through in less time.
 
Interesting conclusion. Good pre-review, though I personally hate all these videos... text is much easier to skim through in less time.


booooo...

Come on, they post it in text as well; you don't have to watch the videos. You can't say that the videos are unoriginal. Personally I'm sick of reading text, but that's just because I'm a visual learner.
 
Nice video, Kyle. I was wondering if you or anyone else who has tested that board has tried putting 6 gfx cards in to see if it would actually run all 6, without the SLI bridges, because it sure as heck would be an interesting board for people who are insane F@H folders.

There is no reason that I know of offhand why it would not do that.

Interesting conclusion. Good pre-review, though I personally hate all these videos... text is much easier to skim through in less time.


Takes you 4 extra minutes to absorb it, for me a couple extra days to prepare it.
 
I rather prefer the videos for this, I was much more of a video imagery kind of learner than text text text, but it's something personal I guess. Thanks for posting this :)
 
ASUS can't really claim "true" 3-way. "True" 3-way in my book is full speed to the CPU. Right now all the chip does is add an extra step the CPU has to go through to talk to the other cards.

Well every chipset before X58 was like that. 780i used nForce 200 for the 2 PCIe 2.0 slots and the 570 mcp for the final PCIe 1.1 x16 slot and 790i used the 570 mcp to add the final PCIe 1.1 slot.
 
Somewhat off-topic...where can I get Kyle's wallpaper? I'd love 1920 x 1200 if poss. PM plz ;-) Thanks in advance guys! Like the new P6T6! Great review (as usual :) )
 
“Man-O-Man good stuff!” If you yourself haven’t decided which board to use, what are we to think? Is there something you're not telling us? I went through the UD5, it pissed me off, so I replaced it with the P6TD and left everything on auto because I need it to work (afraid to chance it). I was wondering why nobody has done a serious review of the Smackover? I mean, in most reviews you guys are only using three sticks of RAM, and not everybody is content to let video dominate their PC. I think with the right cooling and hardware it could be more than it seems, or at least help figure out the difference Northbridge placement makes.
Once again great review and thanks for the video.

Regards,
Mario
 
Good stuff Kyle, thanks. I figured the NF200 chip had to add some latency but it's good to see both standalone X58 and the NF200 + X58 combo perform so well. I bought the P6T6 board for reasons other than tri-SLI as I'm not a big gamer so I still don't have any regrets yet.
 
Considering that going from 8x to 16x makes very little difference in performance, and considering the overhead, these results could be said to have been expected. Plus it's not like there is any game out that needs 3 GPUs anyway.
 
This NF200 stuff, I donno, as far as I can tell all they've done is taken all the pci-e lanes from the chipset available and OC'd them to some speed which gives the same throughput as 3X 16X pcie 2.0 lanes. But again, thats as far as I know, no documentation into it, just Nvidia's word that "it helps".

Huh? It works just like nForce 100 did with the 7900GX2.

http://www.anandtech.com/video/showdoc.aspx?i=2769&p=2

This chipset-agnostic implementation works by incorporating a PCIe switch which acts as a bridge between the system's x16 interface and the two GPUs. Because of the way PCIe works, the operating system is able to see the two graphics cards as if they were independent parts. You can think of this as being similar to connecting a USB hub to a single USB port in order to plug in multiple devices. Only in this case, the devices and switch are all in one neat little package.

The PCIe switch itself is a 48 lane device, capable of routing each of the three x16 connections to any one of the other two depending on its intended destination. On their 7900 GX2, NVIDIA takes full advantage of this, but for the 7950 GX2, only 8 lanes are routed from the switch to each GPU. The end result is that what the chipset would have had to manage, NVIDIA's 7950 GX2 moves on board.
 
BAH Where's the article?! I need to be able to read text and graphs so I can skim through the stuff I already know.

Good info though, Kyle. Thanks!
 
I don't mind the video. Besides, everything you need to know is in that one supplemental graph.
 
Am I the only freakin' one that can't view the damn video?
 
I've read the specs up on the nf200 (kinda) and it still connects to the NB (x58) using a x16 path.

By design the only improved speed is between two of the three cards.

Allow me to illustrate:

ghetto_x58_nf200.JPG



As you can see, Card 2 and 3 communicate at full x16 speeds, but both have to squeeze through a single x16 channel to communicate with card 1. According to an article on tomshardware this is how most manufacturers will use the nf200 chips.

I could be wrong in this case, but if this is the case then it's not a _TRUE_ 16/16/16, it's pseudo.

This could potentially be causing the (<1%) loss in performance outlined in the video.
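A back-of-the-envelope sketch of the shared-uplink point above. This is only a toy model under stated assumptions: 8 GB/s per direction for a PCIe 2.0 x16 link, an even split of the uplink when every card behind the switch talks to the host at once, and no allowance for switch latency or protocol overhead. The function name and parameters are hypothetical.

```python
X16_GEN2_GB_S = 8.0  # assumed theoretical peak per direction for PCIe 2.0 x16

def host_bandwidth_per_card(cards_behind_switch,
                            uplink_gb_s=X16_GEN2_GB_S,
                            slot_gb_s=X16_GEN2_GB_S):
    """Worst-case GB/s each card gets to the chipset when all cards
    behind the switch use the single uplink at the same time."""
    fair_share = uplink_gb_s / cards_behind_switch
    # A card can never exceed its own slot width, however idle the uplink is.
    return min(slot_gb_s, fair_share)

# Two cards hanging off the switch, both streaming to/from the host:
print(host_bandwidth_per_card(2))  # each gets the equivalent of a Gen2 x8 link
# One card active on its own still sees the full x16 uplink:
print(host_bandwidth_per_card(1))
```

In this model the two cards behind the switch are no better off than on a plain 16/8/8 board whenever both talk to the host at once; the extra width only helps traffic that stays between the two of them.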
 
So they kludged together a platform that doesn't even have true 16/16/16 PCIe... nice one!

What is the point then?

What about running SLI on slots 2/3 versus slots 1/2?

And.. no PCI slots.... bleh.. stuck with onboard audio.
 
Thanks for the info Kyle. Just the kind of information we needed.

The good news is that PCIe 2.0 x8 is plenty, even for 3-way SLI. I take from this that other boards might fit my needs adequately for the foreseeable future.

Interesting how the 2 x16 slots are multiplexed into a single x16 to interface to the x58. Seems lame. That obviously contributes to the small performance drop.

I may still go with the P6T6 WS board, simply because I want the ability to run 2 way SLI, and at the same time use my x8 raid adapter. This is where having the extra lanes is a good thing. I've been having trouble finding a board with the necessary slots available in the correct physical locations (so that it all fits) while letting me use 2 smaller x1 cards (5 pcie slots needed). I still wish the x58 had been designed with more lanes, so that mobo manufacturers would have the freedom to design a board with 6 pcie slots.

If the first 4 pcie slots are the nf200 which all multiplex down to x16 slots on the x58, why would those slots be the recommended slots for single gpu and 2 way crossfire?

The performance drop was pretty small, maxing at 3%. I'm not too crazy about the raid card having to run thru a multiplexing setup tho. That could possibly impact overall OS performance.

Thanks again [H] :)
 
And.. no PCI slots.... bleh.. stuck with onboard audio.
PCI is going the way of ISA. No big loss for me as I have PCI-E LAN and Audio. I've been waiting a long time for a legacy free board, glad this is now available.
 
Well every chipset before X58 was like that. 780i used nForce 200 for the 2 PCIe 2.0 slots and the 570 mcp for the final PCIe 1.1 x16 slot and 790i used the 570 mcp to add the final PCIe 1.1 slot.

Hence no board using X58 should use the term "true".
 
Hence no board using X58 should use the term "true".

Then you get into semantics, just like we had with "true" quad core cpus. Either way, with this solution, every card has 16 PCIe 2.0 lanes for Tri-SLI.

I've read the specs up on the nf200 (kinda) and it still connects to the NB (x58) using a x16 path.

By design the only improved speed is between two of the three cards.

Allow me to illustrate:

ghetto_x58_nf200.JPG



As you can see, Card 2 and 3 communicate at full x16 speeds, but both have to squeeze through a single x16 channel to communicate with card 1. According to an article on tomshardware this is how most manufacturers will use the nf200 chips.

I could be wrong in this case, but if this is the case then it's not a _TRUE_ 16/16/16, it's pseudo.

This could potentially be causing the (>1%) loss in performance outlined in the video.

Your diagram is a little off, NF200 goes to slots 1 and 3 (cards 1 and 2) and X58 goes to slot 5 (card 3).

nforce_200_3_slot.png


nVidia has also stated that you can use 2 NF200 chips to provide 4 full-bandwidth slots, although I haven't seen a board configured like this. Maybe EVGA will surprise us with the FTW version.

nforce_200_4_slot.png
 
nVidia has also stated that you can use 2 NF200 chips to provide 4 full-bandwidth slots, although I haven't seen a board configured like this. Maybe EVGA will surprise us with the FTW version.
Reason #24 that enthusiast level boards need to start using the EATX format. :D Would love a board with 8 to 10 PCI-E slots. 4 x16 for video with the rest running in x4 or x8. :D
 
what kind of powersupply is needed to run all that O_O

got a small geothermal plant nearby?
 
There are two advantages of the NF200 promoted by NVIDIA. One is peer-to-peer writes, which is the "Broadcast" function; the other is "PW Short". The PCIe bus has now implemented native peer-to-peer writes (as of PCIe v2.0), which eliminates the need for the "Broadcast" functionality of the NF200.

This only leaves the "PW Short" function. This function reduces the communications over the FSB (QPI on X58) while reducing latencies for the data sent between the CPU(s) and multiple GPU(s). From the data we have seen so far (Kyle's comparison and the excellent SLI scaling seen with the X58 in other reviews), that would mean there is no bottleneck in the CPU(s) <-> GPU(s) communications. This is due to Nehalem and the X58's QPI interface with a bandwidth of 25.6 GB/s. Until video cards arrive that push the x16 PCIe 2.0 bus limits, we will probably never see any advantages from a "PW Short" function.
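For anyone who wants to check the QPI figure, here is a quick sketch of where 25.6 GB/s comes from and how it compares with a single x16 slot. The assumptions are mine: QPI running at 6.4 GT/s with 2 bytes of payload per transfer in each direction, and 0.5 GB/s per lane per direction for PCIe 2.0. The function names are just for illustration, and these are theoretical peaks only.

```python
def qpi_gb_per_s(gt_per_s, bytes_per_transfer=2, directions=2):
    """Aggregate QPI bandwidth in GB/s (both directions by default)."""
    return gt_per_s * bytes_per_transfer * directions

def pcie2_gb_per_s(lanes, directions=2):
    """Aggregate PCIe 2.0 bandwidth in GB/s, assuming 0.5 GB/s per lane per direction."""
    return 0.5 * lanes * directions

qpi = qpi_gb_per_s(6.4)      # the 6.4 GT/s link speed assumed above
x16 = pcie2_gb_per_s(16)     # one PCIe 2.0 x16 slot, both directions
print(f"QPI: {qpi} GB/s, PCIe 2.0 x16: {x16} GB/s, "
      f"one slot uses {x16 / qpi:.1%} of QPI at peak")
```

Note that a slower QPI link (for example 4.8 GT/s) would give a lower total with the same formula, so the headroom depends on which link speed the CPU actually runs.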
 