Benchmarking the Benchmarks @ [H]

As a long-time reader, I (and probably most of your audience) already know how you evaluate video cards from reading many reviews, so who is this article for?

The plethora of users, both old and new, in the 3870 X2 review thread who wanted to know why [H] got the results they did when AT got the results they did. After explaining the reasons ad nauseam, this article was released, as Kyle promised in the other thread.

People continually said there was no difference between the two methods, or failed to grasp that the difference between the two was significant.

No one is going to play the game exactly the way these guys play it every single time. Kyle can't say that his playing method is how everyone does it. Just because he believes the average person who buys a $500 3D card must only use it with a 30-inch screen doesn't mean that's how he has to test it. The reality is that not everyone has the same reasons for buying a 3D card, the same way of using it... or the same games they intend to use it with.

There are people who just play WoW. Would his test method really show them which card is best for them? What about someone who plays Supreme Commander but doesn't play online at all, because they prefer skirmishes and only play against two AIs at a time? Is this testing method going to help those who play in that manner? Or someone who only buys flight sims: which 3D card is best for Flight Sim X with the expansion pack? How is this testing method really going to help those readers?

I think you have completely missed the whole point of video card reviews, or at least the methodology here. I still play StarCraft; am I upset that his reviews don't cover that? Nope. The whole purpose is to tax the card with a demanding game. Crysis, being the current (or supposed) king of the GPU heavy hitters, is what they tested with. The reason, if you read their explanation, is that they have a limited amount of time and focus on gameplay through the entire game, so their options are limited when it comes to playing multiple games. They focus on one or two of the more demanding titles. However, I will add that I wish they had put 4 GB and a quad core in their test bed. That seems to be just about standard now, or will soon become standard, among gamers.
 
As a long-time reader, I (and probably most of your audience) already know how you evaluate video cards from reading many reviews, so who is this article for?

Is that a rhetorical question? :D Everyone who isn't a long-time reader, obviously.
 
However, I will add that I wish they had put 4 GB and a quad core in their test bed. That seems to be just about standard now, or will soon become standard, among gamers.

QFT. That's my system to a T... even though I'm running Vista 32-bit and it only addresses 3 GB... :mad:
 
Your "apples to apples" comparisons are mostly informative.
When you evaluate a graphics card, I would love to have some reference point to compare it to when different classes are compared.
For instance, on this graph, even though it is what I like to see, I would have loved to see the performance of an 8800 GT at the same settings, to see if I really should invest in any of the three evaluated cards.
I'd like to see the same graphs with an inferior model at the same settings, so we can compare the results and decide whether it is worth the extra money to get a GTS over a GT.

Also, not all of us game at those resolutions. Your data reveals how well those cards run at those resolutions, but it is not necessarily an indication that their high performance at those levels justifies buying them as opposed to an 'inferior' model.
You advocated that your aim was to inform the consumer, so we can 'put our money where our mouth is', but there is still some information missing.
Is there any need to go as low as 640x480 to evaluate the performance of a graphics card? Sure, in a few instances there might be a use for such a testing methodology, but who games at those resolutions these days.... other than in Crysis? :D

Your "highest playable settings" graphs are not without value, but I think it is not helping me to know what a card can do with 2XAA in one resolution, when the others are performing at 4xAA at another given resolution, like this one, especially since the resolutions you are testing video cards at are way higher than the ones I use (1280x1024, 1600x1200).
Especially, since the results are not always easy to decipher or extrapolate. Extrapolate? No, we shouldn't resort to that. You might as well use canned benchmarks, if in the end you have to extrapolate to get the answers you are looking for.
 
Great read. I'm glad that you guys are sticking by your decision. I am still very interested in seeing benchmarks from other sites, just because it is interesting to compare GPUs (from an apples-to-apples POV). But whenever I plan a purchase, the only place I read is the "Highest Playable Settings" section of your rev... err... evaluation. Keep up the good work!
 
You forgot this:
"A surprisingly successful FPS on the PC, Call of Duty 4 also lacks any sort of in-game benchmark so we benchmark the cut scene at the beginning of the first mission."

Great add; see my edited post.
 
I've agreed with and always appreciated the [H] method of video card testing, more so now that Kyle has spelled out the considerable effort they take evaluating the cards. They definitely deserve a coupla cold ones hoisted in their honor next time Crysis is fired up. I've ignored timedemos and 3DMark scores for years now, ever since I started seeing canned scores go higher on suspicious new drivers than they did when overclocking the card on slightly older drivers, and I've used FRAPS instead to get my scores for comparison. I did this when I had my suspicions about "cheats" back when 3DMark 2001 was at the height of its popularity and everyone was using the scores for bragging rights. What I see here is the [H]ard truth and an [H]onest opinion, every time. Keep up the good work boys, and thanks a bunch. ;)
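For readers who have never used FRAPS this way, here is a minimal sketch of what "getting your scores for comparison" can look like once a run is logged. It assumes FRAPS was left in benchmark mode and wrote a per-second FPS log as a simple one-value-per-line CSV; the exact file name and column layout vary by FRAPS version, so treat the format here as a hypothetical example rather than the tool's documented output.

# Minimal sketch: summarize a FRAPS-style per-second FPS log (Python).
# Assumes one FPS value per line; real logs may add a header or extra columns.
import csv
import sys

def summarize_fps(path):
    fps = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue
            try:
                fps.append(float(row[0]))
            except ValueError:
                continue  # skip headers or malformed lines
    if not fps:
        raise ValueError("no FPS samples found in " + path)
    return {
        "samples": len(fps),
        "min": min(fps),
        "avg": sum(fps) / len(fps),
        "max": max(fps),
    }

if __name__ == "__main__":
    # e.g. python fps_summary.py "crysis fps.csv"
    print(summarize_fps(sys.argv[1]))

The point of logging a whole play-through this way, rather than trusting a single timedemo score, is exactly the one made above: the minimum frame rate and the shape of the run matter as much as the average.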
 
This methodology isn't good enough. It isn't good enough at all.

2. You cannot deny canned results. They will compare to previous/similar hardware reasonably enough.

You can, and easily. The underlying thesis of the article is that canned benchmarks are gamed. Driver optimizations completely rule out any legitimacy such benchmarks might offer. Instead, what you are evaluating with timedemos is one card maker's ability to cater to the needs of a given timedemo.

I look at both the canned results and the "gameplay" evaluations offered here. I believe the information offered here to be of uniformly higher utility to consumers, especially if you are an enthusiast looking for maximum visual fidelity from any given setup.
 
"To measure performance when playing The Witcher we ran FRAPS during the game's first major cutscene at the start of play." - Anand Lal Shimpi, Anandtech - 01/28/08


Anyone who took the time to run through the Trade or Temple Quarter in "The Witcher" would immediately notice the horrid hitching Crossfire causes; it doesn't matter if it's on one board, as with the X2, or on two separate cards. I've tried it with 2xAA enabled and with no AA @ 1680x. Cards tested were HD 3850 256 MB cards @ 760|2038 on the 790X platform with an Agena 9600, and an HD 3870 X2 plugged into the rig in my sig, a P35 plus an overclocked Q6600. Both systems had the advantage of 4 GB of RAM and were running Vista x64 with all patches plus the SP1 RC applied. Disabling Crossfire via the normal method for the 3850s, or via Catalyst AI for the X2, makes the frequent hitching vanish in the Trade and Temple Quarters.

Taking time to play at least some of the most demanding areas in a game ftw.
 
When you evaluate a graphics card, I would love to have some reference point to compare it to when different classes are compared.
For instance, on this graph, even though it is what I like to see, I would have loved to see the performance of an 8800 GT at the same settings, to see if I really should invest in any of the three evaluated cards.
I'd like to see the same graphs with an inferior model at the same settings, so we can compare the results and decide whether it is worth the extra money to get a GTS over a GT.

Comparing differently classed video cards really isn't that useful. Most people shop to a price point, so comparing cards at the same price point is useful. A graph of a GTS with a GTX and a 3870 X2 will just let you see either the GTS get slaughtered or the high-end cards top out, CPU-limited.

Comparing differently classed cards at identical resolutions is where canned benchmarks shine, especially within the same product lineup. If you want to see how a GT, GTS, GTX, and Ultra stack up, then there you go. If you want to compare between companies, then RWT (real-world testing) seems more useful. Generally, people looking to buy are shopping to a price point, and companies try to position their product lines so that they only have one product at a given price point. That leaves most consumers choosing between the two companies' offerings at that price point, which makes RWT more useful to a potential buyer.

Also, [H] tends to do a good job of picking the comparisons; I believe that when the GT came out there was a comparison with a GTX.
 
Still no answer to my question. I'm getting annoyed now by how it's just being ignored.

Initially, with their X2 review, [H] published that the reason their results are different is that they use real-world benchmarks (A) while other sites use "canned benchmarks" (B).

In this article [H] has shown that results from methods A and B are different.

However, they have *not* shown why their runs with method B are still giving totally different results than other sites results with method B.

I can't believe how many people are jumping onboard and saying "great review" and saying how this answers the question when, in fact, this has failed to answer any question that anyone has asked. Has everyone all of a sudden forgotten the original issue at hand?
 
Still no answer to my question. I'm getting annoyed now by how it's just being ignored.

Initially, with their X2 review, [H] published that the reason their results are different is that they use real-world benchmarks (A) while other sites use "canned benchmarks" (B).

In this article [H] has shown that results from methods A and B are different.

However, they have *not* shown why their runs with method B are still giving totally different results than other sites results with method B.

I can't believe how many people are jumping onboard and saying "great review" and saying how this answers the question when, in fact, this has failed to answer any question that anyone has asked. Has everyone all of a sudden forgotten the original issue at hand?

I'm not sure what you're trying to get at, since it's quite obvious that the settings [H] considered to be playable are NOT the ones used by other sites. Other sites usually used "high" settings and claimed they were playable. This article showed how those settings are not usable at all, since with real-world gameplay tests the results are dramatically different (taking into consideration the differences in resolution, obviously, since everything else is the same).

And using Anandtech again as an example:

http://www.anandtech.com/video/showdoc.aspx?i=3209&p=5

Now look at what [H] got with the X2 at those settings:

http://enthusiast.hardocp.com/article.html?art=MTQ2MSw2LCxoZW50aHVzaWFzdA==

Anandtech even claims that @ 1920x1200, the X2 offers more performance than what [H] shows @ 1600x1200. This is the definitive answer to those that question real-world gameplay numbers, when compared to built-in benchmarks.
 
I'm not sure what you're trying to get at, since it's quite obvious that the settings [H] considered to be playable are NOT the ones used by other sites. Other sites usually used "high" settings and claimed they were playable. This article showed how those settings are not usable at all, since with real-world gameplay tests the results are dramatically different (taking into consideration the differences in resolution, obviously, since everything else is the same).

And using Anandtech again as an example:

http://www.anandtech.com/video/showdoc.aspx?i=3209&p=5

Now look at what [H] got with the X2 at those settings:

http://enthusiast.hardocp.com/article.html?art=MTQ2MSw2LCxoZW50aHVzaWFzdA==

Anandtech even claims that @ 1920x1200, the X2 offers more performance than what [H] shows @ 1600x1200. This is the definitive answer to those that question real-world gameplay numbers, when compared to built-in benchmarks.

Who cares about the absolute framerates right now? I'm not disputing that the in-game and built-in benchmark framerates will be different; I'm disputing the *relative* performance between the X2, Ultra, and GTX according to [H] vs. the relative performance between the cards according to other sites.

While it may be true that the settings are different and that can vary the gap a bit, the settings alone don't turn the X2 beating the Ultra by large margins (other reviews) into the X2 totally losing to the GTX ([H] review). That kind of drastic difference is coming from something else.

How can you just swallow that reasoning as sufficient? Don't you want to see some proof that this "drastically-different-results-depending-on-settings" phenomenon is real?
 
Still no answer to my question. I'm getting annoyed now by how it's just being ignored.

Initially, with their X2 review, [H] published that the reason their results are different is that they use real-world benchmarks (A) while other sites use "canned benchmarks" (B).

In this article [H] has shown that results from methods A and B are different.

However, they have *not* shown why their runs with method B are still giving totally different results than other sites results with method B.

I can't believe how many people are jumping onboard and saying "great review" and saying how this answers the question when, in fact, this has failed to answer any question that anyone has asked. Has everyone all of a sudden forgotten the original issue at hand?

What other sites?
Links please, not some "I saw this on a site, can't remember where"...
 
I would also like to add this, since I'm surprised that this topic is still very much alive.

Built-in benchmarks are NOT being dismissed as completely worthless. They ARE useful as a measure of comparison. However, and that is the whole point of the article, they do NOT provide an accurate "idea" of the performance of a given graphics card across the board. Real-World Gameplay numbers do. Real-World gameplay numbers give you a very close estimate of the performance you'll get at home, with a similar system. Built-in benchmarks only show you how well that graphics card handles the pre-defined set of situations in that benchmark.

I already said this in a previous post and I'll say it again, for those who question [H]'s methodology. Go ahead and buy your $400-500 graphics card based on timedemos and cutscenes. That's a very expensive trial and error, because when you confirm that your results are not in line with the ones provided by sites that only use built-in benchmarks (as those who bought the HD 2900 XT realized, for example), you'll value [H]'s methodology (and the other sites that use it too) much more.
 
thanks for the article....... it's always interesting to see how things are done...... personally i don't depend on one site's opinion before i make a decision on anything; I trust [h]'s methods as well as others' to make an informed decision on my purchase..... i like to know how a product is going to perform in both lab testing and real-world testing...... heck, even in my search for a good quiet case fan i am hitting reviews on several sites and getting recommendations from people in forums that actually use this stuff. good article, and the debate that comes with it has been a good read also.... on both sides of the field...... it's interesting to see how many people have such strong opinions on this matter, and the occasional toss of the turd has been amusing as well...... my only opinion on this subject is...... to the people that hate this method, then don't use it..... and frankly, if anyone on either side of this little skirmish has better ideas than the two put forth, please, i wanna see your tech site, i am sure your hardware reviews are top notch as well!! nah just kiddin, y'all know more than i do.... :p
 
I would also like to add this, since I'm surprised that this topic is still very much alive.

Built-in benchmarks are NOT being dismissed as completely worthless. They ARE useful as a measure of comparison. However, and that is the whole point of the article, they do NOT provide an accurate "idea" of the performance of a given graphics card across the board. Real-World Gameplay numbers do. Real-World gameplay numbers give you a very close estimate of the performance you'll get at home, with a similar system. Built-in benchmarks only show you how well that graphics card handles the pre-defined set of situations in that benchmark.

I already said this in a previous post and I'll say it again, for those who question [H]'s methodology. Go ahead and buy your $400-500 graphics card based on timedemos and cutscenes. That's a very expensive trial and error, because when you confirm that your results are not in line with the ones provided by sites that only use built-in benchmarks (as those who bought the HD 2900 XT realized, for example), you'll value [H]'s methodology (and the other sites that use it too) much more.

FYI, I'm not questioning the methodology. I'll rephrase one more time for those who still don't get the problem here.

----[H] shows GTX beating x2. Most other sites show the x2 beating the Ultra.

----[H] says the difference is because they tested in real world gameplay and not using canned benchmarks like other sites.

**The Kicker**
----[H] tests with a canned benchmark and still comes out with the GTX on top of the x2. (?)

Having normalized the test to be on par with what other sites are doing, they should now see the x2 beating the Ultra, if their theory about in-game performance being totally different than canned benchmark performance is true. They didn't.. they had the SAME result of the GTX beating the x2.

So now we have a problem. [H] tested *JUST* like other sites did, and *still* got totally different results than them for which card is faster. Why?
 
While it may be true that the settings are different and that can vary the gap a bit, the settings don't turn the X2 beating the Ultra by large margins (other reviews) into the X2 totally losing to the GTX ([H] review). That kind of drastic difference is coming from something else.

How can you just swallow that reasoning as sufficient? Don't you want to see some proof that this "drastically-different-results-depending-on-settings" phenomenon is real?

I already did. [H] and bit-tech used real-world gameplay numbers and got similar results. Other sites used timedemos and cutscenes and saw their results inflated. [H] doesn't need to show us every timedemo of every game they tested to prove this point, since every other "timedemo" site has shown us this for a while now.

And the X2 never "totally lost" against the GTX. In the Crysis example, it just couldn't handle high shaders as well as the GTX, but the performance difference, with comparable settings, is not that big. 5 fps is hardly "totally losing".

The GTX's advantage with high shaders is actually easily explained, since it's been proven over and over (in games like Oblivion, which is also shader-intensive) that the Stream Processors in R600 and its derivatives are not as efficient as NVIDIA's. Remember that even a full-blown HD 2900 XT, with 320 Stream Processors and a 512-bit memory interface, could barely keep up with an 8800 GTS 320/640, with 96 Stream Processors and a 320-bit memory interface.
 
FYI, I'm not questioning the methodology. I'll rephrase one more time for those who still don't get the problem here.

----[H] shows GTX beating x2. Most other sites show the x2 beating the Ultra.

----[H] says the difference is because they tested in real world gameplay and not using canned benchmarks like other sites.

**The Kicker**
----[H] tests with a canned benchmark and still comes out with the GTX on top of the x2. (?)

Having normalized the test to be on par with what other sites are doing, they should now see the x2 beating the Ultra, if their theory about in-game performance being totally different than canned benchmark performance is true. They didn't.. they had the SAME result of the GTX beating the x2.

So now we have a problem. [H] tested *JUST* like other sites did, and *still* got totally different results than them for which card is faster. Why?

Because you are still missing the settings [H] used. Look at these:

http://techreport.com/articles.x/13967/9

Take the 8800 GTS 512 numbers as GTX numbers. Where exactly does the X2 beat anything "except" @ 1680x1050?
And even then, the difference between the X2 and the GTS 512 is minimal, which is right in line with the usual 3-5 fps error margin that ALL benchmarks (canned or real-world) suffer from.

Now look at [H]'s numbers (all medium settings in the timedemo), where the X2 averages 45 fps. Now bump all settings to high, as most other sites used, and in the timedemo you'll lose around 15-20 fps, again with a 3-5 fps error margin, which puts it in line with Tech Report's results @ 1680x1050, which show an average of 30 fps for the X2.

You can't just look at the graphs and draw conclusions from the numbers alone. That's EXACTLY what [H]'s playable settings are important for, and those were what Kyle used to compare both real-world gameplay numbers and timedemos.
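To make the arithmetic behind that reconciliation explicit, here is a tiny sketch using only the figures quoted above (the 45 fps medium-settings average, the claimed 15-20 fps cost of moving to high settings, and the 3-5 fps error margin); these are the thread's numbers, not new measurements.

# Do [H]'s medium-settings timedemo numbers, adjusted for high settings,
# land within the claimed error margin of Tech Report's 30 fps average?
h_medium_avg = 45.0             # [H]'s X2 timedemo average, all medium settings
high_cost_range = (15.0, 20.0)  # claimed fps lost when switching to high
error_margin = 5.0              # upper end of the claimed 3-5 fps margin
techreport_avg = 30.0           # Tech Report's X2 average @ 1680x1050, high

projected = [h_medium_avg - cost for cost in high_cost_range]  # [30.0, 25.0]
compatible = any(abs(p - techreport_avg) <= error_margin for p in projected)
print(projected, compatible)  # [30.0, 25.0] True -> the two results can coexist

So the claim is not that the numbers are identical, only that once settings and error margin are accounted for they are not contradictory.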
 
I already did. [H] and bit-tech used real-world gameplay numbers and got similar results. Other sites used timedemos and cutscenes and saw their results inflated. [H] doesn't need to show us every timedemo of every game they tested to prove this point, since every other "timedemo" site has shown us this for a while now.

And the X2 never "totally lost" against the GTX. In the Crysis example, it just couldn't handle high shaders as well as the GTX, but the performance difference, with comparable settings, is not that big. 5 fps is hardly "totally losing".

The GTX's advantage with high shaders is actually easily explained, since it's been proven over and over (in games like Oblivion, which is also shader-intensive) that the Stream Processors in R600 and its derivatives are not as efficient as NVIDIA's. Remember that even a full-blown HD 2900 XT, with 320 Stream Processors and a 512-bit memory interface, could barely keep up with an 8800 GTS 320/640, with 96 Stream Processors and a 320-bit memory interface.

*again*, the difference between medium and high settings doesn't turn an Ultra killer into a card beaten by a GTX, so that argument is out the window.

The other thing to point out is that you made a fatal flaw here... performance is *ALWAYS* about relative percentages. 5 fps is proportionately smaller at 100 fps than it is at 20 or 30. 5 fps here is very large because we're talking about framerates from 18 fps to 40 fps... that's roughly 28% down to about 12%.
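A quick sketch of that relative-percentage point, using the frame rates mentioned in this thread as example inputs (nothing here is a new measurement):

# Express a 5 fps gap as a percentage of different baseline frame rates.
def relative_gap(gap_fps, baseline_fps):
    return 100.0 * gap_fps / baseline_fps

for baseline in (18, 40, 100):
    print(f"5 fps at {baseline} fps = {relative_gap(5, baseline):.1f}% of the baseline")
# 5 fps at 18 fps = 27.8% of the baseline
# 5 fps at 40 fps = 12.5% of the baseline
# 5 fps at 100 fps = 5.0% of the baseline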
 
Ok, maybe not many sites showed the x2 beating the Ultra in Crysis itself, but in other games they did.

This is not in line with what [H] showed for CoD4 and UT3. These would probably be better apps to normalize with since their Crysis results may not have been drastically different from what others gathered.

I guess what I'd like to know is what results [H] would get if they were to benchmark the cutscene Anand did (not that this is a good way to benchmark, but it would make the two reviews comparable and give us a reference between them) and see if the GTX still comes out on top, or if they get results like Anand did, with the x2 smashing the GTX.

Same goes for their use of vCTF-flyby for UT3. If [H] takes my challenge (purely for confirming accuracy), they could run the numbers on either of these and should theoretically get the same results that Anand did.

This would tell us that under the same circumstances, [H] is getting the same relative performance that other sites are showing with their canned benchmarks, and that the in-game numbers [H] has in their review stand strong as the more accurate approach. If, on the other hand, their numbers still show the GTX beating the x2, something is drastically wrong.

I'm definitely not an ATI fanboy, I just want to make sure we're not fooling ourselves here.
 
*again*, the difference between medium and high settings doesn't turn an Ultra killer into a card beaten by a GTX, so that argument is out the window.

The other thing to point out is that you made a fatal flaw here... performance is *ALWAYS* about relative percentages. 5 fps is proportionately smaller at 100 fps than it is at 20 or 30. 5 fps here is very large because we're talking about framerates from 18 fps to 40 fps... that's roughly 28% down to about 12%.

It does, for the very same reason I explained in the post you quoted. The Stream Processors are more efficient in NVIDIA's G80 and its derivatives than in R600, so G80 is much more powerful in shader-intensive games.
Now why did the X2 beat the Ultra on the "timedemo" sites? Maybe you should ask them, since the discrepancies you mention only come from those sites.

It's not a fatal flaw at all, since I took the numbers right off [H]'s article and the difference is 5 fps. Percentages make no difference in this case, since the numbers are enough to tell you that 5 fps is, in no way, "totally losing" against a GTX.
 
Real-World gameplay numbers give you a very close estimate of the performance you'll get at home, with a similar system.

This is true, but the key point here is WITH A SIMILAR SYSTEM. But if my CPU is half as fast, or a similar speed but a quad core, then I'm going to have significantly different results. If I'm running on XP-32 w/ DX9 instead of Vista-64 w/ DX10, then I'm going to have significantly different results.

And most importantly, if I have a different definition of "playable" settings than Kyle, his numbers aren't going to be much more useful than canned benchmarks.

This is especially true for Crysis, because I played through on "High" settings with an 8800 GTS 320. It wasn't always as smooth as I would have wanted, but it was worth more to me to see the game on high. It was playable to me!

This is the point where Kyle's methodology becomes considerably less superior than he believes it to be. If he did real world testing, but did it for all play settings, that would be very useful. If he did it for dual-core and quad-core processors of varying speeds from AMD and Intel, that would be very useful. If he did it with 2GB and 4 GB of ram, that would be useful.

But since he doesn't, my system will differ from his system. And this will make it necessary to extrapolate from his data. And this makes his testing of little more value than canned results.
 
This is true, but the key point here is WITH A SIMILAR SYSTEM. But if my CPU is half as fast, or a similar speed but a quad core, then I'm going to have significantly different results. If I'm running on XP-32 w/ DX9 instead of Vista-64 w/ DX10, then I'm going to have significantly different results.

And most importantly, if I have a different definition of "playable" settings than Kyle, his numbers aren't going to be much more useful than canned benchmarks.

This is especially true for Crysis, because I played through on "High" settings with an 8800 GTS 320. It wasn't always as smooth as I would have wanted, but it was worth more to me to see the game on high. It was playable to me!

This is the point where Kyle's methodology becomes considerably less superior than he believes it to be. If he did real world testing, but did it for all play settings, that would be very useful. If he did it for dual-core and quad-core processors of varying speeds from AMD and Intel, that would be very useful. If he did it with 2GB and 4 GB of ram, that would be useful.

But since he doesn't, my system will differ from his system. And this will make it necessary to extrapolate from his data. And this makes his testing of little more value than canned results.

Well, then you should question all benchmarks, since the possibility of you having a system similar to every other site's, at home, is very remote.
When evaluating a graphics card, sites tend to use the most powerful CPU they can (among other high-performance components), so that nothing other than the graphics card itself bottlenecks the system.

If you can, in fact, get a similar system, your results won't be too far off, but will NEVER be exactly the same.
If you don't get a similar system, but do have the same graphics card, you need to extrapolate much more and find out if the game you want to play uses too much CPU or too much RAM, etc., so that you can pinpoint what is crippling your performance, other than the graphics card itself.
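As a rough illustration of the kind of sanity check being described here (pinpointing whether something other than the graphics card is crippling performance), the snippet below encodes the common rule of thumb that a GPU-bound game loses frame rate as resolution rises while a CPU-bound one barely moves. The 10% threshold is an arbitrary assumption for this sketch, not anything [H] or the other sites publish.

# Rule-of-thumb bottleneck check from two FRAPS-style averages.
# If raising the resolution barely changes FPS, the GPU is probably not
# the limiting factor. The 10% tolerance is an arbitrary illustrative value.
def likely_bottleneck(fps_low_res, fps_high_res, tolerance=0.10):
    if fps_high_res >= fps_low_res * (1.0 - tolerance):
        return "CPU/system bound (FPS nearly flat across resolutions)"
    return "GPU bound (FPS drops noticeably at the higher resolution)"

print(likely_bottleneck(62.0, 60.0))  # nearly flat -> CPU/system bound
print(likely_bottleneck(62.0, 38.0))  # large drop  -> GPU bound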
 
Well, then you should question all benchmarks, since the possibility of you having a system similar to every other site's, at home, is very remote.
When evaluating a graphics card, sites tend to use the most powerful CPU they can (among other high-performance components), so that nothing other than the graphics card itself bottlenecks the system.

If you can, in fact, get a similar system, your results won't be too far off, but will NEVER be exactly the same.
If you don't get a similar system, but do have the same graphics card, you need to extrapolate much more and find out if the game you want to play uses too much CPU or too much RAM, etc., so that you can pinpoint what is crippling your performance, other than the graphics card itself.

None of the sites in question use a system that will alter results drastically or bottleneck the cards... the review sites know better. That's why I'm curious if something else is at fault here. I just have a hard time believing that benchmarking a rendered cutscene is SO different from in-game play that it turns a card labeled an Ultra killer into a card beaten by the GTX by at least 10%. That's the problem here. There are two totally different realities at play, and I'm unwilling to believe it's *all* in the benchmark method.

I'd really have respect here if these sites would stop this standoff and truthfully try to figure out what's going on. If Kyle told me that he tried Anand's method and got similar results to Anand, but his in-game results were totally different, I'd be happy. But without that, I have a hard time just believing it, because [H] doesn't want to double-check.
 
Well, then you should question all benchmarks, since the possibility of you having a system similar to every other site's, at home, is very remote.
When evaluating a graphics card, sites tend to use the most powerful CPU they can (among other high-performance components), so that nothing other than the graphics card itself bottlenecks the system.

If you can, in fact, get a similar system, your results won't be too far off, but will NEVER be exactly the same.
If you don't get a similar system, but do have the same graphics card, you need to extrapolate much more and find out if the game you want to play uses too much CPU or too much RAM, etc., so that you can pinpoint what is crippling your performance, other than the graphics card itself.


And this is the very reason why I will never take any one site as gospel. As with anything, if shopping around isn't part of your mentality, then perhaps Wally World is your one-stop shop for everything.
 
Well, then you should question all benchmarks, since the possibility of you having a system similar to every other site's, at home, is very remote.
When evaluating a graphics card, sites tend to use the most powerful CPU they can (among other high-performance components), so that nothing other than the graphics card itself bottlenecks the system.

If you can, in fact, get a similar system, your results won't be too far off, but will NEVER be exactly the same.
If you don't get a similar system, but do have the same graphics card, you need to extrapolate much more and find out if the game you want to play uses too much CPU or too much RAM, etc., so that you can pinpoint what is crippling your performance, other than the graphics card itself.

I agree. But Kyle says: "We think our job is to explain the level of gaming experience you should expect when you purchase the video card we have evaluated." My point is that that experience is going to be shaped by other factors in the system. His review technique is very, very good for a SYSTEM evaluation, but it doesn't really tell me what to expect if my system and settings differ significantly from his.

If that is the case, I have to use his numbers as a relative performance guide. Which is what I can do with canned benchmarks anyway.
 
And this is the very reason why I will never take any one site as gospel. As with anything, if shopping around isn't part of your mentality, then perhaps Wally World is your one-stop shop for everything.



dude if they sold pc parts i am all over it man.... where else can you get games, pizza, beer, and prescription drugs and still have enough dough left for a couple 16.9oz cans of Red Bull... if they sold pc parts worth a crap it would be a gold mine for them.... do your research, then when you're ready for some beer you could pick your parts up as well. one stop shopping kicks ass. don't ever bring wally world into a tech fight; eventually they are going to be the supreme rulers of the universe and they are going to do it with everyday low prices!
 
dude if they sold pc parts i am all over it man.... where else can you get games, pizza, beer, and prescription drugs and still have enough dough left for a couple 16.9oz cans of Red Bull... if they sold pc parts worth a crap it would be a gold mine for them.... do your research, then when you're ready for some beer you could pick your parts up as well. one stop shopping kicks ass. don't ever bring wally world into a tech fight; eventually they are going to be the supreme rulers of the universe and they are going to do it with everyday low prices!


Wally World Electronics...Your one stop shopping for all your computer needs.:)
 
I agree. But Kyle says: "We think our job is to explain the level of gaming experience you should expect when you purchase the video card we have evaluated." My point is that that experience is going to be shaped by other factors in the system. His review technique is very, very good for a SYSTEM evaluation, but it doesn't really tell me what to expect if my system and settings differ significantly from his.

If that is the case, I have to use his numbers as a relative performance guide. Which is what I can do with canned benchmarks anyway.

Then please do tell me what the "timedemo" sites offer to solve this for you, because none of them uses the same system as you.
The point is, with all the differences in components in mind, real-world gameplay, used by [H] and a couple more sites, is MUCH more accurate than just running timedemos.
 
And this is the very reason why I will never take any one site as gospel. As with anything, if shopping around isn't part of your mentality, then perhaps Wally World is your one-stop shop for everything.

It never is gospel, be it real-world or timedemo benchmarks. But, and that's the point of this article, real-world gameplay numbers are much more accurate than timedemos.
 
Wally World Electronics...Your one stop shopping for all your computer needs.:)

maybe we should petition wal-mart to start selling [h] recommended pc parts lmao then when you check out your receipt will consist of


diapers
paper towels
intel q6600
doritos
milk
Ultra x2 750w psu ! lol j/k
razors

dude i can already feel the sweetness

btw the diapers are for those long nights of playin games....er... or for the kid? i guess it Depends :D:p
 
None of the sites in question use a system that will alter results drastically or bottleneck the cards... the review sites know better. That's why I'm curious if something else is at fault here. I just have a hard time believing that benchmarking a rendered cutscene is SO different from in-game play that it turns a card labeled an Ultra killer into a card beaten by the GTX by at least 10%. That's the problem here. There are two totally different realities at play, and I'm unwilling to believe it's *all* in the benchmark method.

I'd really have respect here if these sites would stop this standoff and truthfully try to figure out what's going on. If Kyle told me that he tried Anand's method and got similar results to Anand, but his in-game results were totally different, I'd be happy. But without that, I have a hard time just believing it, because [H] doesn't want to double-check.

It's not just the benchmark method... but it is the most important factor, because [H] mainly uses playable settings, which you and others are comparing to the "high" settings used by other sites, and that can't be done directly.

[H] showed their version of timedemos and also their usual real-world gameplay numbers. Now let's see the timedemo sites do their real-world gameplay numbers.
Bit-tech uses the same method as [H] and their numbers are similar, taking into account the resolution and settings used. I believe Driver-Heaven uses the same method too, and the results in the games that both DH and [H] used are also similar, again taking into account the differences in resolution and settings used.
So, let's see the exclusively timedemo sites provide their real-world gameplay numbers. The best way would actually be for a 3rd party entity to join the staff of these sites and provide a system with an array of graphics cards to test. That way no one could say they are making stuff up.

Unfortunately, I don't think the timedemo sites will answer this challenge, because real-world gameplay is hard work and some already "confirmed" that they are not willing to do it.
 
The best way would actually be for a 3rd party entity to join the staff of these sites and provide a system with an array of graphics cards to test. That way no one could say they are making stuff up.

i'll do it! send me a bunch of free hardware and software and i will take time out of each day to enjoy....er benchmark each and every video card.... heck free video cards and software would be good..... i can use my po' man's rig to test on..... :)
 
Anandtech has changed their own benchmark methods.

UT3 when testing x2
http://www.anandtech.com/video/showdoc.aspx?i=3209&p=9

UT3 when testing a new system
http://www.anandtech.com/systems/showdoc.aspx?i=3223&p=7

They added bots and admittedly decreased reproducibility to make the benchmark more like real world gameplay.

I think it's a good start for Anandtech. :)

I think so too. As flawed as real-world gaming might be from a scientific standpoint, it is far more realistic in determining actual performance than timedemos. The issue really isn't so much that the timedemo method is wrong as that there is more tampering from outside forces to change the results in favor of their products.
 
As many other people have brought up, I too want to see [H] run the benchies again with a quad running at nice high speeds.

My Q6600 is running at 3.68 GHz... how about something like that?

The others are right... get rid of the system as a bottleneck!
 
As many other people have brought up, I too want to see [H] run the benchies again with a quad running at nice high speeds.

My Q6600 is running at 3.68 GHz... how about something like that?

The others are right... get rid of the system as a bottleneck!

How about asking the other sites to provide real-world gameplay numbers? Why must [H] do everything for everyone?
 
Honestly, was it really necessary to take 6 pages to explain that real-world gameplay shows a more realistic gameplay experience with video cards than "canned" timedemos? We've known this since the days of Quake 1.

This is a really confusing article because it makes it sound like ALL canned timedemos are a bad thing. This is obviously not true, because you can get quite realistic results from the built-in "canned" gameplay timedemos that we've seen time and time again in the Quake series, among others.

HardOCP should be more specific and say that fly-through/fly-by timedemos are the ones that don't work well: for instance the Crysis GPU timedemo, the Unreal flythrough benchmark, and other flyby ones we've seen in the past.
 