NVIDIA GeForce GTX TITAN; GK110's Opening Act

Author: SKYMTL
Date: February 18, 2013
Product Name: GeForce GTX Titan

GK110 Bares All & Adds Double Precision


With Kepler maturing in a number of product spaces, NVIDIA has gradually perfected their manufacturing process, increasing yields and allowing the GK110 to become a bona fide option for the GeForce lineup. Since Kepler was designed with both gaming and HPC environments in mind, porting it to GeForce required very few sacrifices, and large blocks of advanced HPC-oriented features have been carried over en masse.


GK110 is by far the largest and most complex GPU NVIDIA has ever built. It is a 7.1 billion transistor monster with a die that measures 551mm², which veritably dwarfs the 294mm² GK104 core and even outsizes the GF110's 521mm². However, as we've already mentioned, this gigantic footprint hasn't necessarily translated into out of control temperatures or power consumption like it did with GF100. Rather, NVIDIA has kept these variables on a short leash.


From a high level architectural standpoint, the GK110 core is just a supersized GK104 with a whole lot more cores and an additional GPC. Indeed, all Kepler GPUs share the same basic elements, which fit together into a cohesive design. The real differences here lie at the SMX level, which retains many of the Tesla-centric elements for optimized compute performance.

In its GeForce Titan guise this core incorporates 14 SMX blocks (a fully enabled GK110 houses 15, so one has been disabled, likely to increase yields), each of which holds 192 CUDA cores and 16 texture units for a total of 2688 cores and 224 TMUs. These are split into five GPCs, each of which contains its own Raster Engine. GK104 uses a pair of SMXs per engine while GK110 assigns up to three, but there shouldn't be any additional overhead since the central processing stages are more than fast enough to ensure the Raster Engines don't fall behind in their scheduled tasks and bottleneck performance.
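The core and texture unit totals above follow directly from the per-SMX figures. As a quick sanity check, here is a minimal Python sketch of that arithmetic; the constant and function names are our own illustrative choices, and the numbers are simply the ones quoted in this article.

```python
# Per-SMX figures quoted in the article for GK110.
ACTIVE_SMX = 14        # GeForce GTX Titan; a fully enabled GK110 has 15
CORES_PER_SMX = 192
TMUS_PER_SMX = 16


def smx_totals(smx_count, cores_per_smx=CORES_PER_SMX, tmus_per_smx=TMUS_PER_SMX):
    """Return (CUDA cores, texture units) for a given number of SMX blocks."""
    return smx_count * cores_per_smx, smx_count * tmus_per_smx


titan_cores, titan_tmus = smx_totals(ACTIVE_SMX)
full_cores, full_tmus = smx_totals(15)

print(f"GTX Titan (14 SMX): {titan_cores} cores, {titan_tmus} TMUs")
print(f"Full GK110 (15 SMX): {full_cores} cores, {full_tmus} TMUs")
```

Running the same function with 15 SMX blocks shows what the single disabled block costs: 192 cores and 16 texture units.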

As with all of NVIDIA's architectures dating back to Fermi, the memory controller, ROP structure and L2 cache are joined at the hip, leading to six 64-bit memory controllers which are each paired up with eight ROPs and 256KB of L2 cache. For more detail about the Kepler architecture, make sure to read our architectural analysis posted in our GTX 680 review.
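Because each memory controller brings its own ROPs and L2 slice, the card-wide totals are just multiples of the controller count. The short Python sketch below works them out from the per-controller figures quoted above; the names are illustrative only.

```python
# Per-controller figures quoted in the article for GK110.
MEMORY_CONTROLLERS = 6
BUS_BITS_PER_CONTROLLER = 64
ROPS_PER_CONTROLLER = 8
L2_KB_PER_CONTROLLER = 256


def memory_subsystem_totals(controllers=MEMORY_CONTROLLERS):
    """Scale the per-controller resources up to the whole GPU."""
    return {
        "bus_width_bits": controllers * BUS_BITS_PER_CONTROLLER,
        "rops": controllers * ROPS_PER_CONTROLLER,
        "l2_cache_kb": controllers * L2_KB_PER_CONTROLLER,
    }


totals = memory_subsystem_totals()
for name, value in totals.items():
    print(f"{name}: {value}")
```

With six controllers this works out to a 384-bit memory interface, 48 ROPs and 1536KB (1.5MB) of L2 cache.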


The largest changes in the GK110 reside in the way it handles compute data. While the SMX layout still includes the PolyMorph Engine's fixed function stages, 64KB of shared memory, data cache and its associated texture units, the CUDA core layout has been drastically changed. It still houses 192 single precision cores backed up by 32 load/store units and 32 special function units which are able to process 32 parallel threads, but these have been augmented with 64 FP64 Double Precision units.

While the GK104 core did feature Double Precision support, it only included eight FP64 units per 192-core SMX, so FP64 operations per clock ran at just 1/24th the single precision rate. With TITAN, NVIDIA has increased this to 1/3, allowing for 896 concurrent threads to be processed within a single GK110 GPU. In addition, when working in FP64 mode, TITAN will disable Boost and operate at dynamically lower clock speeds.
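The 1/24 and 1/3 ratios, along with the 896 figure, can be reproduced from the unit counts alone. The Python sketch below does so using the standard library's Fraction type; the names are our own, and it only counts units per clock, so it says nothing about real-world throughput once the lower FP64 clock speeds are factored in.

```python
from fractions import Fraction

# Unit counts quoted in the article.
SP_CORES_PER_SMX = 192
GK104_FP64_PER_SMX = 8
GK110_FP64_PER_SMX = 64
TITAN_ACTIVE_SMX = 14


def fp64_ratio(fp64_units_per_smx, sp_cores_per_smx=SP_CORES_PER_SMX):
    """FP64 units as a fraction of single precision cores in one SMX."""
    return Fraction(fp64_units_per_smx, sp_cores_per_smx)


gk104_ratio = fp64_ratio(GK104_FP64_PER_SMX)
gk110_ratio = fp64_ratio(GK110_FP64_PER_SMX)
titan_fp64_units = GK110_FP64_PER_SMX * TITAN_ACTIVE_SMX

print(f"GK104 FP64:FP32 ratio = {gk104_ratio}")
print(f"GK110 FP64:FP32 ratio = {gk110_ratio}")
print(f"FP64 units on GTX Titan = {titan_fp64_units}")
```

Per clock, that makes GK110's Double Precision rate eight times that of GK104 on an SMX-for-SMX basis.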


At face value, the inclusion of full Double Precision functionality may not seem like a major selling point for enthusiasts and, truth be told, it isn't. Games and even applications like Folding@Home simply don't use the double precision floating point format. Rather, granting access to a $999 FP64 powerhouse makes CUDA development much more accessible since full DP compliance no longer requires a $3000 Tesla K20 or $4500 K20X card. NVIDIA is hoping this will lead to something of a renaissance for CUDA programming and will open up this stage to a whole new beginner-focused market.

Since gamers won't want to run their card in its 896-core Double Precision mode, NVIDIA has granted easy on/off control over it. Simply change the mode within NVIDIA's Control Panel to GeForce Titan and you're off to the races, though at slightly lower clock speeds than if the card were running under full 3D mode.
 
 
 
