

NVIDIA GeForce GTX 470 Review

Author: Michael "SKYMTL" Hoenig
Date: March 26, 2010
Product Name: NVIDIA GeForce GTX 470
 

In-Depth GF100 Architecture Analysis (Core Layout)


The first stop on this whirlwind tour of the GeForce GF100 is an in-depth look at what makes the GPU tick: its core layout, and how NVIDIA designed it to be the fastest graphics core on the planet.

Many people incorrectly believed that the Fermi architecture was designed primarily for GPU computing applications and that very little thought was given to its graphics processing capabilities. This couldn't be further from the truth: the computing and graphics capabilities were determined in parallel, and the result is a brand new architecture tailor-made to live in a DX11 environment. Basically, NVIDIA needed to apply what it had learned from past generations (G80 and GT200) to the GF100.


What you are looking at above is the heart and soul of any GF100 card: the core layout. While we will go into each section in more detail below, the overall view shows that the main functions are broken up into four distinct groups called Graphics Processing Clusters (GPCs), which are in turn broken down into individual Streaming Multiprocessors (SMs), raster engines and so on. To keep things simple, think of it this way: in its highest-end form, a GF100 has four GPCs, each equipped with four SMs, for a total of 16 SMs. Each SM contains 32 CUDA cores (called shader processors in past generations), for 512 cores in total. However, the current GTX 480 and GTX 470 make do with slightly fewer cores (480 and 448 respectively), while we are told a 512-core version will arrive in the near future.
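The GPC / SM / core hierarchy reduces to simple multiplication. The following is a minimal Python sketch, assuming only the figures quoted in this article (four GPCs, four SMs per GPC, 32 CUDA cores per SM); the constant and function names are hypothetical, chosen for illustration. It also works backwards from the GTX 480 and GTX 470 core counts to the number of SMs each card has enabled.

```python
# Figures quoted in the article for a full GF100; constant and function
# names are hypothetical, chosen only for this illustration.
GPCS = 4            # Graphics Processing Clusters
SMS_PER_GPC = 4     # Streaming Multiprocessors per GPC
CORES_PER_SM = 32   # CUDA cores per SM


def total_cores(gpcs: int = GPCS, sms_per_gpc: int = SMS_PER_GPC,
                cores_per_sm: int = CORES_PER_SM) -> int:
    """CUDA cores on a chip with every GPC and SM enabled."""
    return gpcs * sms_per_gpc * cores_per_sm


def enabled_sms(core_count: int, cores_per_sm: int = CORES_PER_SM) -> int:
    """Active SMs implied by a card's advertised CUDA core count."""
    if core_count % cores_per_sm:
        raise ValueError("core count must be a multiple of cores per SM")
    return core_count // cores_per_sm


if __name__ == "__main__":
    print("Full GF100:", total_cores(), "CUDA cores")
    for card, cores in (("GTX 480", 480), ("GTX 470", 448)):
        print(f"{card}: {enabled_sms(cores)} of {GPCS * SMS_PER_GPC} SMs enabled")
```

Run as written, it reports 512 cores for the full chip, 15 active SMs for the GTX 480 and 14 for the GTX 470. In other words, the shipping cards have one and two SMs disabled respectively.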

On the periphery of the die sit the GigaThread Engine and the memory controllers. The GigaThread Engine performs the somewhat thankless duty of reading the CPU's commands over the host interface and then fetching data from the system's main memory. That data is then copied into the graphics card's own framebuffer before being passed along to the designated engine within the core. Meanwhile, the GF100 incorporates six 64-bit GDDR5 memory controllers for a combined 384-bit interface. The substantial bandwidth of a 384-bit GDDR5 interface should provide very fast access to the card's onboard memory and help eliminate the memory bottlenecks seen in past generations.
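The bus width follows directly from the controller count, and peak theoretical bandwidth is the bus width in bytes multiplied by the effective per-pin data rate. The sketch below assumes that standard formula; the data rate is left as a parameter because this section does not quote memory clocks, and the 4.0 Gbps value in the example is a hypothetical round number, not a GF100 or GTX 470 specification. Function names are illustrative only.

```python
# Controller figures are from the article; the per-pin data rate is an
# input because no memory clock is quoted in this section.
CONTROLLERS = 6            # 64-bit GDDR5 memory controllers on GF100
BITS_PER_CONTROLLER = 64


def bus_width_bits(controllers: int = CONTROLLERS,
                   bits_per_controller: int = BITS_PER_CONTROLLER) -> int:
    """Total memory interface width in bits."""
    return controllers * bits_per_controller


def peak_bandwidth_gb_s(bus_bits: int, data_rate_gbps: float) -> float:
    """Peak theoretical bandwidth in GB/s.

    Bytes moved per transfer (bus_bits / 8) times the effective data
    rate per pin in Gbit/s. Real-world throughput will be lower.
    """
    return bus_bits / 8 * data_rate_gbps


if __name__ == "__main__":
    width = bus_width_bits()
    hypothetical_rate = 4.0  # Gbit/s per pin; illustrative only, NOT a GF100 spec
    print(f"{width}-bit bus at {hypothetical_rate} Gbps/pin -> "
          f"{peak_bandwidth_gb_s(width, hypothetical_rate):.0f} GB/s")
```

Because bandwidth scales linearly with width, a 384-bit bus delivers 1.5 times the bandwidth of a 256-bit bus at the same data rate.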


Each Streaming Multiprocessor holds 32 CUDA cores along with 16 load / store units, which allow source and destination addresses to be calculated for 16 threads per clock. Above these sit the Warp Schedulers and their associated dispatch units, which issue groups of 32 concurrent threads (called warps) to the cores.
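To make the warp grouping concrete: threads are bundled into warps of 32, and with 16 load / store units an SM can compute addresses for half a warp per clock, so one warp's memory instruction needs two clocks of address generation. The Python sketch below illustrates only those two figures; the helper names are hypothetical, and it is not a model of NVIDIA's actual scheduling logic.

```python
import math

WARP_SIZE = 32     # threads per warp
LD_ST_UNITS = 16   # load / store units per SM (figure from the article)


def warp_and_lane(thread_id: int, warp_size: int = WARP_SIZE) -> tuple[int, int]:
    """Warp index and lane (position within the warp) for a linear thread index."""
    return thread_id // warp_size, thread_id % warp_size


def warps_needed(threads: int, warp_size: int = WARP_SIZE) -> int:
    """Warps required to cover a group of threads; a partial warp still occupies a full one."""
    return math.ceil(threads / warp_size)


def ldst_clocks_per_warp(warp_size: int = WARP_SIZE, units: int = LD_ST_UNITS) -> int:
    """Clocks to generate addresses for one warp at one thread per unit per clock."""
    return math.ceil(warp_size / units)


if __name__ == "__main__":
    print("thread 70 ->", warp_and_lane(70))
    print("100 threads ->", warps_needed(100), "warps")
    print("LD/ST clocks per warp:", ldst_clocks_per_warp())
```

Here thread 70 lands in warp 2, lane 6; 100 threads need four warps (the last one only partly filled); and a warp's memory instruction spends two clocks on address generation.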

Finally, closer to the bottom of the SM are the shared memory / L1 cache, the Polymorph Engine and the four texture units. In total, the maximum number of texture units in this architecture is 64, which may come as a surprise considering the outgoing GT200 architecture supported up to 80 TMUs. However, NVIDIA has implemented a number of improvements in the way the architecture handles textures, which we will go into in a later section. Suffice it to say that the texture units are now integrated into each SM rather than having multiple SMs address a common texture cache.
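Because the texture units now live inside each SM, the texture unit count scales with the number of enabled SMs. This is a minimal sketch, assuming four units per SM as described above and SM counts derived from the core figures quoted earlier in this article (480 / 32 and 448 / 32); the names are hypothetical. It makes no attempt to model the per-unit efficiency improvements that are meant to offset the lower raw count.

```python
TEX_PER_SM = 4        # texture units per GF100 SM (from the article)
GT200_MAX_TMUS = 80   # GT200 maximum quoted in the article


def texture_units(sms: int, tex_per_sm: int = TEX_PER_SM) -> int:
    """Texture units available with a given number of enabled SMs."""
    return sms * tex_per_sm


if __name__ == "__main__":
    # SM counts: 16 for a full chip; 15 and 14 derived from 480 / 32 and 448 / 32 cores.
    for name, sms in (("Full GF100", 16), ("GTX 480", 15), ("GTX 470", 14)):
        tmus = texture_units(sms)
        print(f"{name}: {tmus} texture units ({tmus - GT200_MAX_TMUS:+d} vs. GT200)")
```

This yields 64 texture units for a full chip, 60 for the GTX 480 and 56 for the GTX 470, all below GT200's 80.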


Independent of the SM structure are six dedicated partitions of eight ROP units each, for a total of 48 ROPs compared with the 32 units in the GT200 architecture. Also unlike the GT200 layout, the ROPs do not connect directly to the memory bus; instead, they interface with the shared L2 cache, which provides a quick interface for data storage.
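The ROP total is again simple multiplication. Here is a short sketch using the article's figures, with hypothetical names, showing the raw increase over GT200. It counts units only and says nothing about per-clock throughput or the effect of routing ROP traffic through the L2 cache.

```python
ROP_PARTITIONS = 6       # dedicated ROP partitions on GF100 (from the article)
ROPS_PER_PARTITION = 8
GT200_ROPS = 32          # GT200 figure quoted in the article


def rop_count(partitions: int = ROP_PARTITIONS,
              per_partition: int = ROPS_PER_PARTITION) -> int:
    """Total raster operation units across all partitions."""
    return partitions * per_partition


if __name__ == "__main__":
    rops = rop_count()
    print(f"GF100: {rops} ROPs, {rops / GT200_ROPS:.1f}x GT200's {GT200_ROPS}")
```

The result is 48 ROPs, 1.5 times GT200's count.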
 
 
 
