|by Michael "SKYMTL" Hoenig | April 2, 2009|
The GT200-series Architecture
The GT200 series represents Nvidia’s first brand-new architecture since the G80 launched all the way back in November of 2006. In human years that may not seem like a long time, but in computer years it is an eternity.
Even though these new cards are still considered graphics cards, the GT200 architecture has been built from the ground up to serve emerging applications that rely on parallel processing. These applications are specifically designed to take advantage of the massive potential that comes with the inherently parallel nature of a graphics card’s floating point vector processors. To make this possible, Nvidia has released CUDA, which we will be talking about in the next section.
On the graphics processing side of things, the GT200 series are second-generation DX10 chips which, unlike some ATI cards, do not support DX10.1, though Nvidia still promises that they will open a whole new realm in graphics capabilities. Nvidia’s mantra in the graphics processing arena is to move us away from the photo-realism of the last generation of graphics cards into something they call Dynamic Realism. For Nvidia, Dynamic Realism means that not only is the character rendered in photo-real definition, but said character also interacts realistically with a photo-real environment.
To accomplish all of this, Nvidia knew that it needed a serious amount of horsepower, and to this end has released what is effectively the largest, most complex GPU to date, with 1.4 billion transistors. To put this into perspective, the original G80 core had about 681 million transistors. Let’s take a look at how this all fits together.
Here we have a basic die shot of the GT200 core which shows the layout of the different areas. There are four sets of processor cores, one clustered into each of the four corners, which have separate texture units and shared frame buffers. The processor core areas hold the individual Texture Processing Clusters (or TPCs) along with their local memory. This layout is used for both parallel computing and graphics rendering, so to put things into better context, let’s have a look at what one of these TPCs looks like.
Each individual TPC consists of 24 stream (or thread) processors which are broken into three groups of eight. When you combine eight SPs plus shared memory into one unit, you get what Nvidia calls a Streaming Multiprocessor (SM). Basically, a GTX 280 / 285 will have ten texture processing clusters, each with 24 stream processors, for a grand total of 240 processors. On the other hand, a GTX 260 has two clusters disabled, which brings its total to 192 processor “cores”. Got all of that? I hope so, since we are now moving on to the different ways in which this architecture can be used.
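For anyone who wants to double-check the math, here is a minimal sketch of the core-count arithmetic described above. The constants come straight from the figures in this section (eight SPs per Streaming Multiprocessor, three of those per TPC, ten TPCs on a full chip); the function and constant names are made up purely for illustration and are not anything Nvidia ships.

```cpp
#include <cassert>

// Figures quoted in the text above.
constexpr int kSpsPerSm   = 8;   // stream processors per Streaming Multiprocessor
constexpr int kSmsPerTpc  = 3;   // Streaming Multiprocessors per TPC
constexpr int kMaxTpcs    = 10;  // TPCs on a fully enabled GT200 core

// Hypothetical helper: total stream processors for a given number of
// enabled Texture Processing Clusters.
int total_stream_processors(int active_tpcs) {
    assert(active_tpcs >= 0 && active_tpcs <= kMaxTpcs);
    return active_tpcs * kSmsPerTpc * kSpsPerSm;
}
```

Calling `total_stream_processors(10)` gives the 240 processors of a GTX 280 / 285, while `total_stream_processors(8)` gives the 192 of a GTX 260 with its two clusters disabled.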
At the top of the architecture shot above is the hardware-level thread scheduler that manages which threads are sent to the texture processing clusters. You will also see that each “node” has its own texture cache which is used to combine memory accesses for more efficient and higher bandwidth memory read/write operations. The “atomic” nodes work in conjunction with the texture cache to speed up memory access when the GT200 is being used for parallel processing. Basically, atomic refers to the ability to perform atomic read-modify-write operations to memory. In this mode all 240 processors can be used for high-level calculations such as a Folding@Home client or video transcoding.
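To make the idea of an atomic read-modify-write a little more concrete, here is a rough CPU-side analogy in standard C++. It is not GPU code and says nothing about how the GT200 implements atomics in hardware; it simply shows several threads bumping one shared counter, where each increment reads, modifies and writes the value as one indivisible step. The function name and parameters are invented for this sketch. On the GPU side, CUDA exposes the same kind of operation through functions such as `atomicAdd`.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: launch `num_threads` workers that each add 1 to a
// shared counter `increments_per_thread` times, then return the final value.
int count_with_atomics(int num_threads, int increments_per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                // Atomic read-modify-write: no other thread can slip in
                // between the read and the write of this increment.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }

    // Wait for every worker to finish before reading the result.
    for (auto& worker : workers) {
        worker.join();
    }
    return counter.load();
}
```

Because every increment is atomic, no updates are lost no matter how the threads interleave, so the result always equals `num_threads * increments_per_thread`. With a plain `int` and no synchronization, that guarantee disappears.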
This architecture is primarily used for graphics processing, and when it is being used as such, dedicated shader thread dispatch logic controls the flow of data to the processor cores as well as the setup and raster units. Other than that and the lack of atomic processing, the layout is pretty much identical to the parallel computing architecture. Overall, Nvidia claims that this is an extremely efficient architecture which should usher in a new dawn of innovative games and applications.