The GCN 1.2 Architecture…On Testosterone
The GCN 1.2 Architecture…On Testosterone
AMD’s GCN architecture may seem to have more lives than disco, the fact of the matter is that a brand new ground-up redesign hasn’t been needed. The same basic principles that made it so potent years ago still hold true today though, as DX12 begins rolling out, some of its higher level feature sets may begin to show their age in the next 18 to 24 months.
As was mentioned in the introduction, Fiji is essentially an upscaled GCN 1.2 part which is very similar to Tonga in its capabilities. This means it shares a lot in common with the GCN 1.1-based and Hawaii / Grenada cores, as with has additional efficiencies built into several key processing stages. From our standpoint there isn’t quite enough to call this a new completely version of Graphics Core Next since it still uses the 1.1 revision as a primary foundation. However, many of AMD’s design choices for Fiji are throughput-focused so the core can physically keep up with the ultra high bandwidth HBM interface.
In a core layout diagram there’s very little to distinguish Fiji from its predecessors. Like Hawaii it utilizes a quartet of Shader engines which hold the respective geometry processors, rasterizers, ROPs (within the Render Back-Ends) and Compute Units. Meanwhile, at a lower level the Compute Units still hold 64 cores and a quartet of texture units along with their associated caching blocks so not much has changed from a physical perspective there either. But don’t stop there and assume AMD simply took Hawaii, added in a bit of GCN 1.2 goodness and called it a day.
One of the first things AMD did to create Fiji was expand the Compute Unit count within each Shader Engine. In comparison to Grenada / Hawaii, there are five additional CU’s per engine which equates 325 additional cores, 20 more TMUs and of course more on-die caching capability.
By far the largest change here has to be the addition of HBM and the inner workings that were necessary to insure the high bandwidth memory interface was fed with enough information to justify its presence. To accomplish that, AMD doubled the L2 cache from 1MB to 2MB and instituted advanced lossless color compression algorithms for frame buffer reads and writes. Naturally, there’s also the eight 512-bit memory controllers that can process a veritable torrent of information towards the quartet of 1GB HBM modules.
All of this has contributed to a relatively large transistor count increase from Hawaii’s 6.2 billion to 8.9 billion on Fiji. That equates a die size of 596 mm² for the core and 1011 mm² when the HBM interposer is taken into account as well. Ironically, you can see that Hawaii’s legacy is still firmly ingrained into the die since there’s space for TrueAudio DSPs, a technology which AMD seems to have abandoned.
In order to keep power consumption to a minimum despite the massive die, AMD turned to a highly tuned 28nm manufacturing process. This allowed them to boost transistor count by 44% over Hawaii yet Fiji’s die size has only increased by a little over 35%. The maturity of 28nm has also contributed towards high transistor efficiency and lower overall power consumption. As a matter of fact, several insiders have told us AMD reduced their TDP forecasts numerous times over the course of designing this new core.
Eagle eyed readers may have noticed that despite increasing the number of SIMD cores and Texture Units, the number of Render Back-Ends containing the ROPs and other tertiary output functions have remained untouched. There are still four of them per Shader Engine which is the exact same number on Hawaii. At first glance that’s a bit concerning since overloading these key architectural elements could result in a bottleneck within some games.
According to AMD, any potential hindrance caused by a stagnant number of ROPs has been mitigated by a number of core modifications. The ROPs are able to process 16 bits per clock which is the exact same rate as Hawaii’s theoretical limit but they can now run a full speed due to the implementation of HBM and the aforementioned color compression routines. There are some other additions as well which enhance the ROPs’ processing abilities As a result, Fiji’s claimed fill rate performance is more than double that of Hawaii.
AMD has also modified a few elements within those Compute Engines. There are new data processing instructions which allow for parallel sharing between SIMD lanes, new 16-bit floating point and integer instructions and double the amount of effective cache per CU.
Another area that doesn’t have any physical changes is the geometry processing stage but once again the revisions have been done below the skin. There are still four processors containing the assemblers and tesselllators but each unit has received a significant speedup, thus boosting tessellation performance among other things.
Last but not least AMD has enhanced the throughput of their Asynchronous Compute Engines and further worked upon prioritizing data across their shared Crossbar. This should help Fiji in higher level DX12 scenarios.
|Latest Reviews in Video Cards|