Sunday, August 29, 2010

ATi Catalyst 10.8 / Unigine: Performance Scaling on Higher End Hardware Part 4

While mucking about with Unigine on the little Asus F3Ka laptop, I went back to trying to sort out the stability issues of the 5770s when paired with the Phenom II platform. After witnessing Unigine's inability to scale at all, I decided to show how the engine scales against a more powerful processor. The easiest way to do this would be to drop the multiplier on the processor and benchmark Unigine as the processor speed is stepped back up. With the motherboards I have on hand, though, that wasn't quite working. The Intel D975XBX v1 motherboard, for example, was sold as a Crossfire-capable gaming motherboard, but its BIOS is complete and utter junk, and the processor multiplier is not exposed within a BIOS made by Intel. The DFI LanParty Jr X58-T3H6 does expose the multiplier, but the Windows 7 64-bit operating system wouldn't recognize it. I'd set a really low multiplier, boot up, check CPUZ, and it would be at the stock 20x multiplier. The Asus M4N82-D I had behaved exactly the same way: the multiplier could be changed in the BIOS, but the operating system didn't care what it was set to. Surprisingly, Nvidia's revamped nTune software package also refused to allow changing the multiplier in real time, although I could change everything else.

That just left me with the semi-broken Phenom II system. It's still not exactly what I would call stable, as every time Windows 7 is shut down the entire system blue-screens and reports a recovery from a serious error on the reboot. However, with AMD's OverDrive system software I can manipulate the processor's multipliers in real time. So once again I wind up with a series of benchmarks that are not applicable to real-world performance. The processor is still a Quad-Core Phenom II Processor, and it's still backed by 8 gigs of RAM.

Because of the previous benchmarks we know that enabling Tessellation carries a huge performance hit under the OpenGL API. We also know from the previous benchmarks that Crossfire support for OpenGL is either broken, or missing in action. So, for each multiplier setting I ran 3 different benchmarks:

  • OpenGL: tessellation off
  • DirectX: tessellation off / crossfire off
  • DirectX: tessellation on / crossfire on

This allows us to continue to compare the OpenGL and DirectX APIs, while showing the benefit that a multi-GPU rig can have under certain circumstances.
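For anyone wanting to follow along, the test plan can be sketched in a few lines of Python. This is only an illustration: the 200mhz reference clock is my inference from the figures in this post (x4 gives 800mhz, x6 gives 1.2ghz), and the names `core_clock_mhz` and `test_matrix` are made up for the example.

```python
# Reference clock implied by this post's figures (x4 -> 800mhz, x6 -> 1.2ghz).
# This is an inference, not a value read from the hardware.
BASE_CLOCK_MHZ = 200

# Multipliers stepped through in this post.
MULTIPLIERS = [4, 6, 8, 10, 12, 14, 16, 17]

# The three runs planned for each multiplier setting.
CONFIGS = [
    "OpenGL: tessellation off",
    "DirectX: tessellation off / crossfire off",
    "DirectX: tessellation on / crossfire on",
]


def core_clock_mhz(multiplier, base_clock_mhz=BASE_CLOCK_MHZ):
    """Core clock is simply multiplier x reference clock."""
    return multiplier * base_clock_mhz


def test_matrix():
    """Every (multiplier, clock in MHz, configuration) combination."""
    return [(m, core_clock_mhz(m), cfg) for m in MULTIPLIERS for cfg in CONFIGS]


if __name__ == "__main__":
    for m, mhz, cfg in test_matrix():
        print(f"x{m:<2} {mhz:>4} MHz  {cfg}")
```

That is 8 multipliers times 3 configurations, so 24 runs if every combination is benchmarked.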

x4: 800mhz



What you might not know, and probably won't even care about, is that 800mhz is the average operating speed of most AMD processors under Windows NT5, Windows NT6, and most Linux Distributions. These operating systems largely support power-saving schemes, and unless you put the system into a high-performance mode the processor will default to a low operating speed under most computing tasks.

That being said, I did find an interface bug with the OverDrive UI. I have the multiplier set at x4, but the UI does not indicate this change, which is why CPUZ is open to confirm that a x4 multiplier is set.

OpenGL:



DirectX 11: Tess Off / CFX Off



DirectX 11: Tess On / CFX On



The results are odd to say the least. At this processor speed the OpenGL renderer turned in a score of nearly half that of its DirectX counterpart.

Adding a second GPU appears to do pretty much nothing for the average frame-rate at first, until you remember that Tessellation was turned on. As was demonstrated in the earlier benchmarks, simply enabling tessellation under DirectX could result in a 50% loss of performance.

x6: 1.2ghz



Bumping the processor speed up to 1.2ghz with a 6x multiplier sees the UI bug on OverDrive still hanging around. So what did this do to our Game's performance?

OpenGL




DirectX: Tess Off / CFX Off



DirectX: Tess On / CFX On



Here we can see that the gap between OpenGL and DirectX is closing up. Interestingly, even with Tessellation turned on, the Crossfire setup is already pulling away and delivering a playable experience. So let's turn up the processing speed once again.

x8: 1.6ghz



Interestingly, AMD's OverDrive utility is now showing the multiplier that is set, just as CPUZ does.

OpenGL:



DirectX Tess off / CFX Off:



DirectX Tess On / CFX On:



At this clock-speed we see the OpenGL API close the gap on the DX11 API, as both are now within a few points of each other. We also see the Crossfire configuration extend its lead, even with Tessellation active.

What we also see is that the overall performance isn't making the same leaps and bounds that it was before. There is practically no difference between the single GPU 5770 at 1.2ghz and 1.6ghz, and the average frames per second spread between 1.6ghz and 800mhz is only 2 frames. Doubling the clock-speed, at least in this benchmark, only netted a realistic 2 extra frames per second. That being said, the minimum frame-rate did go up significantly, from just under 8 frames to around 13 frames.

There was also a sharp difference between the 800mhz Crossfire Configuration and the 1.2ghz Crossfire configuration. There's not as much difference between the 1.2ghz Crossfire Configuration and the 1.6ghz Crossfire configuration, with the maximum number of frames barely moving.
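To put a rough number on how much a clock bump actually buys, here is a small sketch. The function name `scaling_efficiency` is my own invention, and the only figures plugged in are the approximate single card minimum frame-rates quoted above (just under 8 at 800mhz, around 13 at 1.6ghz), rounded to 8 and 13. Treat the result as a ballpark, not a measurement.

```python
def scaling_efficiency(clock_lo, clock_hi, fps_lo, fps_hi):
    """Fraction of a clock increase that shows up as a frame-rate increase.

    1.0 means frame-rate grew exactly as fast as the clock,
    0.0 means the extra clock speed bought nothing.
    """
    clock_gain = clock_hi / clock_lo - 1  # e.g. 1.0 for a doubling
    fps_gain = fps_hi / fps_lo - 1
    return fps_gain / clock_gain


if __name__ == "__main__":
    # Minimum frame-rate, 800mhz -> 1.6ghz, rounded from the text above.
    print(f"{scaling_efficiency(800, 1600, 8, 13):.3f}")
```

With those rounded figures the minimum frame-rate picks up a bit over 60% of the clock doubling. Feeding in the average frame-rates instead would give a far smaller number, which is the whole point of the paragraphs above.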

So, let's bump the clocks up again.

x10: 2ghz



Now we've gotten to the same clock-speed as the Turion64 in the Asus F3Ka. Unlike the Turion64, the Phenom II has two more processing cores, loads more cache, a much faster memory bus, and a much faster system bus. It's also coupled with a much more powerful Graphics Processor.

OpenGL



DirectX: Tess Off / CFX Off




DirectX: Tess On / CFX On



An additional 400mhz sees the Single Card OpenGL performance sustain a higher average frame-rate than the Single Card DirectX performance figures, while having both a lower minimum frame-rate and a lower maximum frame-rate.

The Single Card DirectX Performance is largely unchanged from the 1.6ghz speed, gaining only a few frames on the minimum side, but barely pushing the average frames per second or the maximum frames.

Crossfire Performance is still increasing, but again, not by much. The immediate conclusion to make is that at this point Unigine Heaven 2.1 is not being limited by the processor, but is instead being held back by the graphics cards.

Which becomes pretty evident as the speed is bumped again:

x12: 2.4ghz



OpenGL



DirectX: Tess Off / CFX Off



DirectX: Tess On / CFX On



At 3 times the base clock speed, the only API that has shown any significant improvement is the OpenGL API, which once again manages a higher average frame-rate than its DX counterpart, while still having a much lower minimum frame-rate and a lower maximum frame-rate.

The Single Card DirectX performance has doubled its minimum frame-rate, and the maximum number of frames has also increased. The average number of frames has only gone up by about 2 frames per second.

The Crossfire DirectX performance has improved dramatically from the 800mhz base, but it hasn't improved so much from the 2ghz clock speed.

So, onwards to 2.8ghz:

x14: 2.8ghz




OpenGL:



DirectX Tess Off / CFX Off



DirectX Tess On / CFX On



I really could have left this series of benchmarks out. Absolutely nothing has changed in terms of performance.

x16: 3.2ghz / x17: 3.4ghz



At 3.2ghz this system is now just 200mhz off the figures shown in the first posting on Catalyst 10.8. So we'll also include the x17 single card figures, since I haven't done them on this system yet.

OpenGL 3.2ghz:



DirectX Tess Off / CFX Off 3.2ghz



OpenGL 3.4ghz



DirectX Tess Off / CFX Off 3.4ghz



DirectX Tess On / CFX On



As you probably expected, performance still did not change. With 4 times the raw processing speed, the Single Card DirectX performance basically went absolutely nowhere.
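If you want to spot this kind of plateau without eyeballing screenshots, something like the following would do. It is a sketch under my own assumptions: the helper name `plateau_clock` and the 5% threshold are arbitrary choices, and the sample numbers in the demo are made-up placeholders, not the results from the runs above.

```python
def plateau_clock(samples, min_gain=0.05):
    """Return the clock after which no step improves fps by min_gain or more.

    samples is a list of (clock_mhz, average_fps) pairs.
    Returns None if performance is still climbing at the last step.
    """
    samples = sorted(samples)
    # Fractional fps gain for each step from one clock to the next.
    gains = [
        (hi_fps - lo_fps) / lo_fps
        for (_, lo_fps), (_, hi_fps) in zip(samples, samples[1:])
    ]
    for i in range(len(gains)):
        if all(g < min_gain for g in gains[i:]):
            return samples[i][0]
    return None


if __name__ == "__main__":
    # Placeholder numbers for illustration only, NOT measured results.
    demo = [(800, 10.0), (1200, 14.0), (1600, 14.5), (2000, 14.6)]
    print(plateau_clock(demo))
```

Once the function reports a plateau clock, anything above it is telling you about the graphics card (or the driver, or the engine), not the processor.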

Unigine, at least in the benchmarks available for download, is a questionable software product. I'm left wondering just how much of the performance difference in OpenGL is down to AMD's drivers, and how much is down to Unigine's product. I'm also left wondering why Crossfire stopped scaling as well. Is it an engine issue or is it a driver issue?

As I said, the real-world benefit of this performance scaling is pretty much nil. There's no commercial game on the market that uses the Unigine Engine. Hopefully, when those games arrive, they'll show much better performance scaling across hardware than the benchmarks.
