NVIDIA’s GeForce GF100 Under the Microscope

by Michael "SKYMTL" Hoenig     |     January 17, 2010

Improved ROP and Texture Performance


For the better part of this article we have been talking about what NVIDIA has done to increase performance in DX11 and attain that ever elusive “geometric realism”. Still, it is important to remember that ROP and texture unit performance also play a huge role in past, present and future games. Several very popular current games such as Crysis and Left 4 Dead take advantage of highly detailed textures, while ROP performance is critical for performance scaling when anti-aliasing is enabled.


ROP Performance Details

Even though the GF100 architecture does feature significantly more ROPs than the GT200 (48 versus 32), NVIDIA did much more with these units than just add more of them to the die. Each of the six groups of eight ROPs is serviced by a single dedicated 64-bit memory controller for increased efficiency, but unlike other architectures the ROPs don’t have a dedicated cache. Rather, they make use of the shared 768KB of on-die L2 cache and can each output a 32-bit integer pixel per clock, an FP16 pixel over two clocks, or an FP32 pixel over four clocks. In plain English, this means the ROPs are far more flexible than those found on the GT200 architecture.
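To make the per-format ROP output rates above a little more concrete, here is a rough back-of-the-envelope sketch in Python. The ROP count, group layout, memory controller width and clocks-per-pixel figures come straight from the paragraph above. The function names and the 600MHz clock in the usage line are purely hypothetical placeholders, since NVIDIA has not been quoted on clock speeds here.

```python
# Figures from the article text: six groups of eight ROPs, one 64-bit
# memory controller per group, and per-format output rates.
ROP_GROUPS = 6
ROPS_PER_GROUP = 8
MEM_CONTROLLER_BITS = 64

# Clocks a single ROP needs to output one pixel of each format.
CLOCKS_PER_PIXEL = {"int32": 1, "fp16": 2, "fp32": 4}


def total_rops():
    """Total ROP count across all groups."""
    return ROP_GROUPS * ROPS_PER_GROUP


def bus_width_bits():
    """Aggregate memory interface width implied by one controller per group."""
    return ROP_GROUPS * MEM_CONTROLLER_BITS


def pixels_per_clock(fmt):
    """Theoretical peak pixels output per clock for a given pixel format."""
    return total_rops() / CLOCKS_PER_PIXEL[fmt]


def fill_rate_gpixels(fmt, clock_mhz):
    """Theoretical peak fill rate in Gpixels/s at an ASSUMED clock speed."""
    return pixels_per_clock(fmt) * clock_mhz / 1000.0


if __name__ == "__main__":
    print("ROPs:", total_rops(), "| bus width:", bus_width_bits(), "bits")
    for fmt in CLOCKS_PER_PIXEL:
        # 600 MHz is a made-up clock used only to illustrate the arithmetic.
        print(fmt, pixels_per_clock(fmt), "px/clk,",
              fill_rate_gpixels(fmt, 600), "Gpixels/s at a hypothetical 600MHz")
```

These are theoretical peaks only; real-world throughput will also depend on memory bandwidth and how well the shared L2 cache is utilized.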


This new and improved ROP layout and design means a drastic increase in AA performance as you can see in the slide above. Where the GT200 architecture experienced a 60% drop in performance when going from 4x to 8x AA, the GF100 shows a mere 24% falloff. This minimal drop can also be chalked up to improved framebuffer efficiency.

With the GF100, it seems that we can expect to play games with extreme image quality (IQ) settings enabled without having to worry about framerates tanking.
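To put the 4x-to-8x AA percentages quoted above in concrete terms, the short Python sketch below applies each drop to the same starting framerate. Only the 60% and 24% figures come from NVIDIA's slide. The 50 FPS starting point and the function names are hypothetical and chosen purely to illustrate the arithmetic.

```python
# Performance drop (in percent) when moving from 4x to 8x AA, per the article.
AA_DROP_PERCENT = {"GT200": 60, "GF100": 24}


def fps_at_8x(fps_at_4x, drop_percent):
    """Framerate left at 8x AA after the stated percentage drop from 4x AA."""
    return fps_at_4x * (100 - drop_percent) / 100


def retained_ratio(drop_a, drop_b):
    """How many times more of its 4x AA performance chip B retains than chip A."""
    return (100 - drop_b) / (100 - drop_a)


if __name__ == "__main__":
    hypothetical_fps = 50  # made-up 4x AA baseline, same for both chips
    for gpu, drop in AA_DROP_PERCENT.items():
        print(gpu, "8x AA framerate:", fps_at_8x(hypothetical_fps, drop))
    print("GF100 retains",
          round(retained_ratio(AA_DROP_PERCENT["GT200"], AA_DROP_PERCENT["GF100"]), 2),
          "times as much of its 4x AA performance as the GT200")
```

In other words, starting from an identical 4x AA framerate, the GF100 would keep 76% of its performance at 8x AA against 40% for the GT200. This says nothing about absolute framerates, which the slide does not provide.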


Texture Unit Performance Details

At the beginning of this article we mentioned that the GF100 architecture actually has fewer texture units than the GT200 (64 versus 80). Taken at face value this does seem concerning, but there’s more to these GF100 texture units than first meets the eye.

First of all, let’s refresh our memory about the GT200 texture unit layout and its specifications. Basically, the older architecture had multiple SMs sharing one texture unit, which caused a data bottleneck when more than one SM made a request at the same time. In addition, the speed of the texture units was directly tied to the core clock. The GT200’s texture units still performed quite well, but they went about their job inefficiently.

With the GF100 architecture on the other hand, each SM has its own texture unit so multiple SMs don’t have to compete for the same texture cache. In addition, these new units run asynchronously to the core clock speed and are actually designed to run significantly faster than the core itself. This gives the GF100 a fair amount of scalability in the way it handles textures. The GF100’s texture units also include full support for DirectX 11’s BC6H and BC7 texture compression formats, which are supposed to reduce the memory footprint of HDR textures and render targets.


What NVIDIA has set out to accomplish is to use fewer texture units while increasing per-unit performance in texture-heavy situations. As such, even with fewer texture units, the GF100 is able to run circles around the GT200 in terms of high-level texture performance, which translates into a 60% increase in texture-only framerates for certain games.
 
 
 
