GK110 Bares All & Adds Double Precision
With Kepler maturing in a number of product spaces, NVIDIA has gradually refined their manufacturing process, increasing yields and allowing the GK110 to become a bona fide option for the GeForce lineup. Since Kepler was designed with both gaming and HPC environments in mind, porting it required very few sacrifices, and large blocks of advanced HPC-oriented features have been carried over en masse.
GK110 is by far the largest and most complex GPU NVIDIA has ever built. It is a 7.1 billion transistor monster with a die that measures 551mm², which dwarfs the 294mm² GK104 core and even outsizes the GF110’s 521mm². However, as we’ve already mentioned, this gigantic footprint hasn’t translated into out-of-control temperatures or power consumption like it did with GF100. Rather, NVIDIA has kept these variables on a short leash.
From a high-level architectural standpoint, the GK110 core is just a supersized GK104 with a whole lot more cores and an additional GPC. Indeed, all Kepler GPUs share the same basic elements which fit together into a cohesive design. The real differences here lie at the SMX level, which retains many of the Tesla-centric elements for optimized compute performance.
In its GeForce Titan guise this core incorporates 14 SMX blocks (a fully enabled GK110 houses 15, so one has been disabled, likely to increase yields), each of which holds 192 CUDA cores and 16 texture units for a total of 2688 cores and 224 TMUs. These are split into five GPCs, each of which contains its own Raster Engine. Even though GK104 uses only a pair of SMXs per Raster Engine while GK110 feeds up to three into each, there shouldn’t be any additional overhead since the central processing stages are more than fast enough to ensure the Raster Engines don’t fall behind in their scheduled tasks and bottleneck performance.
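For readers who want to check the math, the unit counts above fall straight out of the per-SMX figures. The short Python sketch below is only our own illustration: the helper name `smx_totals` and the constants are hypothetical, and the numbers are the ones quoted in this article rather than anything read back from hardware.

```python
# Illustrative arithmetic only; all figures come from the article text.
CORES_PER_SMX = 192  # single precision CUDA cores per SMX
TMUS_PER_SMX = 16    # texture units per SMX


def smx_totals(active_smx):
    """Return total CUDA cores and TMUs for a given number of active SMX blocks."""
    return {
        "cuda_cores": active_smx * CORES_PER_SMX,
        "tmus": active_smx * TMUS_PER_SMX,
    }


titan = smx_totals(14)       # GeForce Titan: one SMX disabled
full_gk110 = smx_totals(15)  # fully enabled GK110

print("Titan:", titan)
print("Full GK110:", full_gk110)
```

Running it reproduces the 2688 cores and 224 TMUs listed above, and shows that the disabled SMX costs Titan 192 cores and 16 texture units relative to a fully enabled die.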
As with all of NVIDIA’s architectures dating back to Fermi, the memory controller, ROP structure and L2 cache are joined at the hip, leading to six 64-bit memory controllers, each of which is paired with eight ROPs and 256KB of L2 cache. For more detail about the Kepler architecture, make sure to read the architectural analysis in our GTX 680 review.
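Because these three resources scale together, the back-end totals can be derived from the controller count alone. The following Python sketch is a rough illustration under that assumption; the function name `backend_totals` is our own invention and the per-partition figures are simply the ones quoted above.

```python
# Illustrative arithmetic only; per-partition figures come from the article text.
BUS_BITS_PER_CONTROLLER = 64   # each memory controller is 64 bits wide
ROPS_PER_CONTROLLER = 8        # ROPs paired with each controller
L2_KB_PER_CONTROLLER = 256     # L2 cache paired with each controller


def backend_totals(controllers):
    """Return aggregate bus width, ROP count and L2 size for N memory controllers."""
    return {
        "bus_width_bits": controllers * BUS_BITS_PER_CONTROLLER,
        "rops": controllers * ROPS_PER_CONTROLLER,
        "l2_kb": controllers * L2_KB_PER_CONTROLLER,
    }


gk110_backend = backend_totals(6)
print(gk110_backend)
```

With six controllers this works out to a 384-bit memory interface, 48 ROPs and 1536KB (1.5MB) of L2 cache.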
The largest changes in the GK110 reside in the way it handles compute data. While the SMX layout still includes the PolyMorph Engine’s fixed function stages, 64KB of shared memory, data cache and its associated texture units, the CUDA core layout has been drastically changed. It still houses 192 single precision cores backed up by 32 load/store units and 32 special function units which are able to process 32 parallel threads, but these have been augmented with 64 FP64 Double Precision units.
While the GK104 core did feature Double Precision support, it only included eight FP64 units per 192-core SMX, leading to FP64 operations per clock which ran at just 1/24th the SP data rate. With TITAN, NVIDIA has increased this to 1/3, allowing for 896 concurrent FP64 threads to be processed within a single GK110 GPU. The trade-off is that when working in FP64 mode, TITAN will disable Boost and operate at dynamically lower clock speeds.
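The ratios quoted here are simply the number of FP64 units divided by the number of single precision cores in each SMX. The Python sketch below spells that out; the variable names are hypothetical, the unit counts are taken from the text, and clock speed differences between the two modes are deliberately left out.

```python
from fractions import Fraction

# Illustrative arithmetic only; unit counts come from the article text.
SP_CORES_PER_SMX = 192
GK104_FP64_PER_SMX = 8
GK110_FP64_PER_SMX = 64
TITAN_ACTIVE_SMX = 14

# Per-clock FP64 rate relative to the single precision rate.
gk104_ratio = Fraction(GK104_FP64_PER_SMX, SP_CORES_PER_SMX)
gk110_ratio = Fraction(GK110_FP64_PER_SMX, SP_CORES_PER_SMX)

# Total FP64 units available across Titan's active SMX blocks.
titan_fp64_units = TITAN_ACTIVE_SMX * GK110_FP64_PER_SMX

print("GK104 FP64:SP ratio =", gk104_ratio)
print("GK110 FP64:SP ratio =", gk110_ratio)
print("Titan FP64 units =", titan_fp64_units)
```

On a per-SMX, per-clock basis this puts GK110 at eight times the FP64 rate of GK104, and the 896 figure is just 14 SMX blocks multiplied by 64 units each.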
At face value, the inclusion of full Double Precision functionality may not seem like a major selling point for enthusiasts and truth be told, it isn’t. Games and even applications like Folding@Home simply don’t use the double precision floating point format. Rather, granting access to a $999 FP64 powerhouse makes CUDA development much more accessible since full DP performance no longer requires a $3000 Tesla K20 or $4500 K20X card. NVIDIA is hoping this will lead to something of a renaissance for CUDA programming and will open it up to a whole new market of beginner developers.
Since gamers won’t want to run their card in its 896-core Double Precision mode, NVIDIA has granted easy on/off control over it. Simply change the mode within NVIDIA’s Control Panel to GeForce Titan and you’re off to the races, though at slightly lower clock speeds than if the card were running under full 3D mode.