NVIDIA GeForce GTX TITAN; GK110’s Opening Act

Author: SKYMTL
Date: February 18, 2013
Product Name: GeForce GTX Titan

GK110 Bares All & Adds Double Precision


With Kepler maturing in a number of product spaces, NVIDIA has gradually refined its manufacturing process, increasing yields and allowing the GK110 to become a bona fide option for the GeForce lineup. Since Kepler was designed with both gaming and HPC environments in mind, porting it required very few sacrifices, and large blocks of advanced HPC-oriented features have been carried over en masse.


GK110 is by far the largest and most complex GPU NVIDIA has ever built. It is a 7.1 billion transistor monster with a die that measures 551mm², which dwarfs the 294mm² GK104 core and even outsizes the GF110’s 521mm². However, as we’ve already mentioned, this gigantic footprint hasn’t translated into out-of-control temperatures or power consumption like it did with GF100. Rather, NVIDIA has kept these variables on a short leash.


From a high level architectural standpoint, the GK110 core is just a supersized GK104 with a whole lot of cores and an additional GPC. Indeed, all Kepler GPUs share the same basic elements which fit together into a cohesive design. The real differences here lie at the SMX level which retains many of the Tesla-centric elements for optimized compute performance.

In its GeForce Titan guise this core incorporates 14 SMX blocks (a fully enabled GK110 houses 15, so one has been disabled, likely to increase yields), each of which holds 192 CUDA cores and 16 texture units for a total of 2688 cores and 224 TMUs. These are split into five GPCs, each of which contains its own Raster Engine. Even though GK104 uses a pair of SMXs per engine while GK110 assigns up to three, there shouldn’t be any additional overhead since the central processing stages are more than fast enough to ensure the Raster Engines don’t fall behind in their scheduled tasks and bottleneck performance.
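The totals quoted above fall straight out of the per-SMX figures. As a rough illustration only, here is that arithmetic in Python; the constant and function names are our own shorthand and don't correspond to any NVIDIA tool or API.

```python
# Per-SMX resources as quoted in the text (names are illustrative only).
CORES_PER_SMX = 192
TMUS_PER_SMX = 16


def smx_totals(smx_count):
    """Return (CUDA cores, texture units) for a given number of active SMX blocks."""
    return smx_count * CORES_PER_SMX, smx_count * TMUS_PER_SMX


# GeForce GTX Titan ships with 14 of GK110's 15 SMX blocks enabled.
titan_cores, titan_tmus = smx_totals(14)   # 2688 cores, 224 TMUs
full_cores, full_tmus = smx_totals(15)     # 2880 cores, 240 TMUs

if __name__ == "__main__":
    print("Titan:", titan_cores, "cores,", titan_tmus, "TMUs")
    print("Full GK110:", full_cores, "cores,", full_tmus, "TMUs")
```

Put another way, the single disabled SMX costs Titan 192 cores and 16 texture units relative to a fully enabled GK110.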

As with all of NVIDIA’s architectures dating back to Fermi, the memory controller, ROP structure and L2 cache are joined at the hip, leading to six 64-bit memory controllers, each paired with eight ROPs and 256KB of L2 cache. For more detail about the Kepler architecture, make sure to read our architectural analysis posted in our GTX 680 review.
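Because those three resources scale together, the back-end totals can be worked out from the controller count alone. The short Python sketch below does just that using the figures from the paragraph above; the names are ours and purely illustrative.

```python
# Back-end resources attached to each memory controller, per the text.
BUS_BITS_PER_CONTROLLER = 64
ROPS_PER_CONTROLLER = 8
L2_KB_PER_CONTROLLER = 256


def backend_totals(controllers):
    """Return (bus width in bits, ROP count, L2 cache in KB) for N memory controllers."""
    return (
        controllers * BUS_BITS_PER_CONTROLLER,
        controllers * ROPS_PER_CONTROLLER,
        controllers * L2_KB_PER_CONTROLLER,
    )


# Six controllers, as described for GK110 in its Titan configuration.
bus_bits, rops, l2_kb = backend_totals(6)   # 384-bit, 48 ROPs, 1536KB

if __name__ == "__main__":
    print(bus_bits, "bit bus,", rops, "ROPs,", l2_kb, "KB L2")
```

That works out to a 384-bit memory interface, 48 ROPs and 1.5MB of L2 cache.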


The largest changes in the GK110 reside in the way it handles compute data. While the SMX layout still includes the PolyMorph Engine’s fixed function stages, 64KB of shared memory, data cache and its associated texture units, the CUDA core layout has been drastically changed. It still houses 192 single precision cores backed up by 32 load/store units and 32 special function units which are able to process 32 parallel threads, but these have been augmented with 64 FP64 Double Precision units.

While the GK104 core did feature Double Precision support, it only included eight FP64 units per 192-core SMX, so FP64 operations ran at just 1/24th the single precision rate. With TITAN, NVIDIA has increased this to 1/3, allowing for 896 concurrent FP64 threads to be processed within a single GK110 GPU. In addition, when working in FP64 mode, TITAN will disable Boost and operate at dynamically lower clock speeds.
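The 1/24 and 1/3 ratios, along with the 896 figure, come directly from the unit counts per SMX. Here is a minimal Python sketch of that math, assuming one operation per unit per clock; the function names are ours and it models nothing beyond the counts quoted in the text.

```python
from fractions import Fraction

SP_CORES_PER_SMX = 192  # single precision CUDA cores in every Kepler SMX


def fp64_ratio(fp64_units_per_smx):
    """FP64 throughput relative to single precision, assuming one op per unit per clock."""
    return Fraction(fp64_units_per_smx, SP_CORES_PER_SMX)


def fp64_units_total(smx_count, fp64_units_per_smx):
    """Total FP64 units across all active SMX blocks."""
    return smx_count * fp64_units_per_smx


gk104_ratio = fp64_ratio(8)                # Fraction(1, 24)
gk110_ratio = fp64_ratio(64)               # Fraction(1, 3)
titan_fp64_units = fp64_units_total(14, 64)  # 896

if __name__ == "__main__":
    print("GK104 FP64:SP =", gk104_ratio)
    print("GK110 FP64:SP =", gk110_ratio)
    print("Titan FP64 units =", titan_fp64_units)
```

Keep in mind these are per-clock ratios. Since Titan drops to lower clocks in FP64 mode, real-world throughput won't scale quite as neatly.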


At face value, the inclusion of full Double Precision functionality may not seem like a major selling point for enthusiasts and, truth be told, it isn’t. Games and even applications like Folding@Home simply don’t use the double precision floating point format. Rather, granting access to a $999 FP64 powerhouse makes CUDA development much more accessible since full DP compliance no longer requires a $3000 Tesla K20 or $4500 K20X card. NVIDIA is hoping this will lead to something of a renaissance for CUDA programming and will open up this stage to a whole new beginner-focused market.

Since gamers won’t want to run their card in its 896-core Double Precision mode, NVIDIA has granted easy on/off control over it. Simply change the mode within NVIDIA’s Control Panel to GeForce Titan and you’re off to the races, though at slightly lower clock speeds than if the card were running under full 3D mode.
 
 
 
