Say Hello to the GF104
When it came to shrinking the GF100 that powers the GTX 480, GTX 470 and GTX 465, NVIDIA took a closer look at the architecture and decided to change a number of areas. As we speculated in our original article, scaling the chip down while keeping the original GF100 layout would have forced certain sacrifices. The main issue with the GF100 is that its texture unit count drops off sharply as you start eliminating Streaming Multiprocessors. Had linear scaling been kept, NVIDIA could have ended up with lower-end GPUs of 320 or fewer cores but only 40 or fewer texture units. Like it or not, texture performance is still one of the cornerstones of modern games, and had that trend continued, NVIDIA may have found it very hard to compete with the HD 5000 series.
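To make the scaling problem concrete, here is a small Python sketch that trims a GF100 one SM at a time. It assumes, as described in this article, that each GF100 SM carries 32 CUDA cores and four texture units, and that cut-down parts are built purely by disabling whole SMs. The helper name `gf100_cut` is purely illustrative.

```python
# Per-SM figures for the GF100, as described in this article.
GF100_CORES_PER_SM = 32
GF100_TMUS_PER_SM = 4


def gf100_cut(target_cores: int) -> tuple[int, int]:
    """Return (SM count, texture units) for a hypothetical cut-down GF100.

    Assumes the part is built only by removing whole SMs, so cores and
    texture units shrink in lockstep.
    """
    if target_cores % GF100_CORES_PER_SM:
        raise ValueError("core count must be a multiple of 32 on GF100")
    sms = target_cores // GF100_CORES_PER_SM
    return sms, sms * GF100_TMUS_PER_SM


for cores in (480, 448, 352, 320, 256):
    sms, tmus = gf100_cut(cores)
    print(f"{cores} cores -> {sms} SMs, {tmus} texture units")
```

The 320-core row reproduces the 40 texture units mentioned above; every further SM removed costs another four.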
One of the primary reasons behind the GF104's design was the need to lower the Fermi architecture's heat output and power consumption by producing a more compact core. Not only is such a core easier and less expensive to produce, it also allows NVIDIA to attack certain price points which ATI may have left vacant.
The differences between the GF100 and GF104 layouts start with the Streaming Multiprocessor which houses the CUDA cores, Texture Units, Polymorph Engine, Warp Schedulers, Load / Store units, SFUs and their associated cache hierarchies. Let’s start at the top and make our way down.
Instead of pairing each Warp Scheduler with its own single Dispatch Unit as on the GF100, the GF104 uses two Dispatch Units per scheduler (a 2:1 ratio), and the number of Special Function Units per SM has doubled. As a result, transcendental instruction throughput has increased over the GF100 even though the number of concurrent threads remains the same. Otherwise, the Instruction Cache and Register File sizes are unchanged from the GF100.
The main changes to the SM come in the number of CUDA cores and texture units each one houses. Instead of the usual 32 cores per SM, the GF104 uses a structure with 48 cores along with 16 load / store units and 8 Special Function Units. This in and of itself is quite an eye opener, but the real difference lies in the texture units: the GF100 has four per SM, while NVIDIA equipped the GF104 with eight TMUs per SM. Doubling the texture units while adding only 50% more cores should lead to a substantial increase in texture performance, which should particularly benefit older DX10 and DX9 games.
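The per-SM changes are easiest to see side by side. The sketch below encodes the GF100 and GF104 SM figures given in this article and derives a cores-per-texture-unit ratio; the GF100's four SFUs are inferred from the GF104's eight being described as a doubling. The `SMConfig` class is a hypothetical illustration, not an NVIDIA data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMConfig:
    """Execution resources inside one Streaming Multiprocessor."""
    name: str
    cuda_cores: int
    sfus: int           # Special Function Units (transcendental math)
    texture_units: int

    def cores_per_tmu(self) -> float:
        # A lower ratio means more texturing capability relative to
        # shader throughput.
        return self.cuda_cores / self.texture_units


# Figures as given in this article; GF100's 4 SFUs follow from the
# GF104 figure of 8 being described as a doubling.
GF100_SM = SMConfig("GF100", cuda_cores=32, sfus=4, texture_units=4)
GF104_SM = SMConfig("GF104", cuda_cores=48, sfus=8, texture_units=8)

for sm in (GF100_SM, GF104_SM):
    print(f"{sm.name}: {sm.cuda_cores} cores, {sm.sfus} SFUs, "
          f"{sm.texture_units} TMUs, {sm.cores_per_tmu():.0f} cores per TMU")
```

The GF100 SM works out to eight cores sharing each texture unit, while the GF104 SM drops that to six.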
Much like the GF100, the GF104 groups four Streaming Multiprocessors (SMs) and their associated PolyMorph Engines into each GPC along with a shared Raster Engine. The only differences are the SM changes mentioned above, but the result is a GPC with significantly more horsepower than the GF100's could put forth.
Above is a picture of a full GF104 core, and we're hoping you are paying very close attention to its layout and the number of cores it houses within its two GPCs. In total, there are 384 cores, 64 texture units, 32 ROPs, 512KB of L2 cache and four 64-bit memory controllers. It is quite evident that even though the GF104's SM structure got a face lift, NVIDIA kept the arrangement of the ROPs, L2 cache and memory controllers unchanged when making the transition from the GF100, simply using four 64-bit controllers instead of the GF100's six.
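The headline numbers follow directly from the building blocks. The sketch below multiplies them out, assuming two GPCs of four SMs each and modelling the back end as four identical memory partitions. The per-partition figures of eight ROPs and 128KB of L2 are simply the article's totals divided by the four controllers; the constant and function names are illustrative only.

```python
# Front end: GPCs -> SMs -> per-SM resources (GF104 figures from this article).
GPCS = 2
SMS_PER_GPC = 4
CORES_PER_SM = 48
TMUS_PER_SM = 8

# Back end: each 64-bit memory controller is modelled as one partition
# carrying its own slice of ROPs and L2 (32 ROPs / 4 and 512KB / 4).
MEMORY_CONTROLLERS = 4
BITS_PER_CONTROLLER = 64
ROPS_PER_CONTROLLER = 8
L2_KB_PER_CONTROLLER = 128


def gf104_totals() -> dict[str, int]:
    """Multiply the per-unit figures out to whole-chip totals."""
    sms = GPCS * SMS_PER_GPC
    return {
        "sms": sms,
        "cores_per_gpc": SMS_PER_GPC * CORES_PER_SM,
        "cuda_cores": sms * CORES_PER_SM,
        "texture_units": sms * TMUS_PER_SM,
        "rops": MEMORY_CONTROLLERS * ROPS_PER_CONTROLLER,
        "l2_kb": MEMORY_CONTROLLERS * L2_KB_PER_CONTROLLER,
        "bus_width_bits": MEMORY_CONTROLLERS * BITS_PER_CONTROLLER,
    }


for key, value in gf104_totals().items():
    print(f"{key}: {value}")
```

Run as-is, it reproduces the 384 cores, 64 texture units, 32 ROPs and 512KB of L2, along with a 256-bit aggregate memory bus. The 192 cores per GPC compare with 128 (4 × 32) for a four-SM GF100 GPC.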
To us, it looks like NVIDIA took some of the lessons it learned from the GF100 and applied them to a core that is far more adaptable to the sub-$250 market. Not only is the GF104 much more compact than the higher-end silicon (1.95 billion transistors versus the GF100's 3 billion), it is supposedly quite a bit more efficient as well. The one thing that could hold it back is that it has a maximum of eight PolyMorph Engines, which are essential for DX11 performance. For comparison, achieving 384 cores with the GF100's 32-core SMs would require 12 SMs, and therefore 12 PolyMorph Engines. Will these eight or fewer engines have a negative impact on the GF104's DX11 performance in future applications? Only time will tell, but for the time being this looks like the perfect graphics processor for the current mid-range market.
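Because each SM carries exactly one PolyMorph Engine, the engine count for a given shader budget is just the number of SMs needed to host those cores. A minimal sketch using the per-SM core counts stated in this article; `polymorph_engines` is a hypothetical helper name.

```python
import math

# CUDA cores per SM, as stated in this article.
CORES_PER_SM = {"GF100": 32, "GF104": 48}


def polymorph_engines(chip: str, cuda_cores: int) -> int:
    """One PolyMorph Engine per SM, so engines == SMs needed for the cores
    (rounded up to whole SMs)."""
    return math.ceil(cuda_cores / CORES_PER_SM[chip])


for chip in CORES_PER_SM:
    print(chip, polymorph_engines(chip, 384))
```

At the same 384-core budget, the GF104 therefore has a third fewer geometry units than a GF100-style design would, which is the trade-off raised above.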