Tuesday, October 23, 2007

on FutureMark's "Vantage" benchmark.

Recently I read through a breakdown of the FutureMark-produced Vantage benchmark suite. Now, I don't like FutureMark to begin with. Their benchmarks have had multiple stability and performance issues in the past. Take the most recent 3DMark, 3DMark 2006: the benchmark tested Shader Model 3.0 support under Microsoft Windows operating systems. However, on ATi's Shader Model 3.0 hardware, the 3DMark 2006 tests would crash during looped testing. Thus, stress-testing ATi Shader Model 3.0 hardware with 3DMark 2006's loop method required disabling the Shader Model 3.0 tests.

FutureMark's products have generally been found to be less than accurate at their stated tasks of comparing computer hardware and benchmarking overall performance. Rather, the benchmarks are mostly useful for comparing identical hardware at different clock rates in order to get a handle on the performance difference. This is because FutureMark's products are Synthetic Benchmarks.

HardOCP has gone into the subject of why Synthetic Benchmarks are bad to begin with, and like me, they pursue benchmarking that tries to capture how applications actually perform under real usage conditions. Thing is, in my own testing under real-usage conditions with Intel Core2 and AMD Socket AM2 processors, I can't find that much difference between identically clocked hardware. In a post made in April I pointed out the following:

Something else popped into my head a while back, and I figure now is as good a time as any to address it. Someone sent me an email pointing out that Intel and AMD processors were not sold at equivalent clock rates.

This is true: AMD Athlon64 processors have clock rates of 1.9ghz, 2.0ghz, 2.1ghz, 2.2ghz, and so on up.

Intel Conroe processors have clock rates of 1.86ghz, 2.13ghz, 2.4ghz, and 2.66ghz.

Part of this is due to the antiquated front side bus design that the Conroe processors use. The other part is that it prevents direct comparisons of Conroe processors to Athlon64 X2 processors.

Say what? Think about it... aside from the 2.4ghz entry, every other Conroe-based processor isn't at an "even" stepping. If you normalize the clock rates in order to compare against an AMD Athlon64, you have to either underclock or overclock one of the processors.

Cute trick, isn't it?
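To make the quoted point concrete, here is a minimal sketch, my own illustration rather than anything from FutureMark: it takes the clock steppings quoted above and shows how far each Conroe step sits from the nearest Athlon64 stepping, i.e. how much one chip would have to be over- or underclocked for a direct comparison.

```python
# Clock steppings as quoted in the post above; the nearest-step
# comparison itself is my own illustration.
conroe_ghz = [1.86, 2.13, 2.40, 2.66]
athlon64_ghz = [1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8]

for c in conroe_ghz:
    # Find the closest "even" Athlon64 stepping to this Conroe clock.
    nearest = min(athlon64_ghz, key=lambda a: abs(a - c))
    # Percentage the AMD chip's clock differs from the Intel chip's;
    # closing that gap means overclocking or underclocking one of them.
    delta_pct = (c - nearest) / nearest * 100
    print(f"Conroe {c:.2f}ghz vs Athlon64 {nearest:.1f}ghz: "
          f"{delta_pct:+.1f}% clock change needed")
```

Only the 2.4ghz entry lines up at 0.0%; every other pairing forces a clock adjustment on one side before the comparison is apples-to-apples.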


In my own personal tests, I've never found a performance situation where a Core2 at 2.66ghz drastically outperforms a 2.6ghz Socket AM2, or approaches a 2.8ghz Socket AM2 on a regular basis.

But when the overview pitted a 2.66ghz Core2 against a 3.0ghz Socket AM2, that's exactly what they got: the 2.66ghz chip outperforming or matching the 3.0ghz Socket AM2 in every test. That's where this part comes in.

I find the results questionable to begin with. In my own experience with Conroe and Socket AM2 processors, I've found that they are clock-for-clock equals. That the Core2 consistently posted better scores than the faster-clocked AMD chip indicates a severe problem in the benchmark.
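As a back-of-the-envelope check, the clock-for-clock reasoning can be sketched as follows. The scores here are made-up illustrative numbers, not real benchmark results; the point is only the arithmetic implied when a lower-clocked chip merely ties a higher-clocked one.

```python
def per_ghz(score: float, clock_ghz: float) -> float:
    """Normalize a benchmark score by clock rate (score per ghz)."""
    return score / clock_ghz

# Hypothetical scenario matching the text: a 2.66ghz part posting the
# same raw score as a 3.0ghz part (6000 is an arbitrary placeholder).
core2_per_ghz = per_ghz(score=6000, clock_ghz=2.66)
am2_per_ghz = per_ghz(score=6000, clock_ghz=3.0)

# If the two architectures really are clock-for-clock equals, a tie at
# unequal clocks means the benchmark is crediting one side extra work
# per clock cycle.
advantage = (core2_per_ghz / am2_per_ghz - 1) * 100
print(f"Implied per-clock advantage: {advantage:.1f}%")
```

A tie at 2.66ghz versus 3.0ghz implies roughly a 13% per-clock advantage, which is exactly the kind of gap my own clock-for-clock testing never shows, hence my suspicion of the benchmark rather than the hardware.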

A comment made during the article indicated that the Vantage benchmark was specifically programmed not to contradict the results of previous entries in the PCMark Suite. Cue something that smells like a rat again. I thought the point of a benchmark was to find out factual performance, not to confirm results that were already known. So the benchmark suite admits that it is not actually looking for factual performance; it's looking for performance that fits already-known results. I've already referenced behavior like this before, back in the Supplemental about Linux Development, when I quoted from Hogan's Heroes.


Continuing on, I then stated the following:
As to why the benchmark is like it is, and why the program is biased in favor of Intel's products, I am not likely to find out on my own. Vista is a dead-on-arrival OS. That Microsoft had to relax its restrictions on XP sales, then abandon the restrictions altogether, indicates exactly how badly Vista is doing. When major OEMs openly refer to Vista as "Windows ME II", and one makes an off-the-record comment that it should have been named "Microsoft Bobista", that should be an indication that OEMs are not moving Vista systems.

So, aside from its inaccurate reporting, PCMark Vantage went after the wrong market. It is not going to be valuable to the reporting press, not in any sense. First, hardware reviewers are going to have to get around Windows Hardware Activation; after a string of having to call Microsoft 8 times in a row for my own GPU testing on a Vista Ultimate install, Vista was already a scratch for reviews to begin with.
The Windows ME II or Windows ME 2 comments are well known by now. The jokes about Vista following in the path of Microsoft Bob are not quite as well known, probably because an official partner of Microsoft's would be risking their job if their name were attached to the line, even if they were the president or CEO of the vendor.

I have had another editorial on the back burner covering why Microsoft Windows Vista is rapidly pushing out the independent hardware-review enthusiast. I wish I were joking about having to call Microsoft 8 times in a row to re-activate Vista Ultimate, but I did. I finally abandoned my Vista testing because it simply wasn't worth my time to sit on the phone reactivating Vista every time I swapped a graphics card out.

I wasn't finished with the initial comment though.
As far as I see it, Futuremark simply doesn't get it. DirectX is a dead API. Several game developers are having to rapidly rethink their development strategy and move to OpenGL rendering to cover all platforms, while at the same time maintaining Shader Model 4 (what Microsoft calls DX10) support. Had Futuremark done their benchmarks in OpenGL, this benchmark would have been valid for Vista releases and NT5 releases, and could easily have been ported to Linux kernel platforms. Futuremark didn't, so the benchmark is worthless from the start, even ignoring all the other factors.

What makes this worse, from my point of view, is that FutureMark has created OpenGL-based tests before, for mobile platforms. But for desktop gaming, where OpenGL is the only reliable cross-platform graphics API available? Nothing.

From my point of view, the use of DirectX 10 to provide Shader Model 4.0 support falls in with the already pre-determined benchmark results. FutureMark is not interested in creating reliable cross-platform absolute performance benchmarks. Over the years of the FutureMark benchmarks, such behavior has become more obvious. FutureMark bows to the highest bidder, and makes sure that their benchmarks show what advertisers want the benchmarks to show.

Most of the time when I do these breakdowns, such as with Sony or Intel, I typically advise the company on how to repair the damage done by the product or action. I don't think I can with FutureMark. All I can do is hope that other gamers and PC users figure out that FutureMark is again being dishonest and sneaky with their benchmarks, and that Vantage is not a viable choice as a benchmark.


HardOCP entries on Synthetic Benchmarks: #1 / #2 / #3
