The Polaris Architecture; Much More Than GCN 1.4.
There has been a lot said about AMD’s Polaris architecture and how it moves Graphics Core Next into the next generation. Make no mistake about it though: this so-called fourth generation design doesn’t change any of GCN’s fundamentals. It simply updates some elements in order to bolster performance in DX11, DX12, OpenGL and Vulkan workloads. More importantly, it also takes a massive step forward in terms of overall efficiency due to the use of Samsung’s 14nm FinFET manufacturing process.
One of the main reasons why we are seeing such massive inter-generational performance increases with the newest architectures from AMD and NVIDIA is their respective use of those selfsame 14nm and 16nm manufacturing processes. While a decade ago we were used to seeing a relatively quick (and yearly) cadence from one node to the next and reaping the benefits which came alongside such a technological rollover, things ground to a complete halt in 2011. Back then, 28nm was introduced but due to the inherent challenges with moving beyond that point, we’ve been stuck with that process for the better part of five years.
None of this is to say that 28nm was inefficient. While it started life as a pretty hot-running node due to increased transistor density over 40nm, current 28nm GPU cores are relative power misers. This is because engineers have found innovative ways of squeezing every last drop of performance from 28nm while also decreasing heat output and increasing efficiency.
14nm on the other hand represents something of a quantum leap forward for AMD since they are able to utilize all the lessons learned from their 28nm architectures and enhance them on this new node.
The 4th generation GCN architecture will initially find its way into two separate core designs: Polaris 10 and Polaris 11. For the purposes of this review we’ll be looking at the larger and more capable Polaris 10 part. In this iteration Samsung’s 14nm FinFET process has been harnessed to create a core which packs 5.66 billion transistors into an extremely compact die area. For comparison’s sake, the Hawaii-based cards like AMD’s R9 390X and R9 390 are both rough analogs for Polaris 10 from a performance perspective but this new core is actually smaller than the one used on the Bonaire-based HD7790.
So what does 5.66 billion transistors packed into a die size of 243.3mm² get you? A core design that actually looks a fair amount like a slightly cut down Hawaii Pro but one that is infinitely more capable. One thing to note is that this is a fully enabled version of Polaris 10 and there won’t be any other “unlocked” parts with higher performance being derived from this particular core.
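To put those two figures into perspective, here is a quick back-of-the-envelope sketch of the transistor density they imply. It uses only the transistor count and die size quoted above; the function name is our own and purely illustrative.

```python
# Figures quoted in the text above.
TRANSISTORS = 5.66e9     # transistor count of Polaris 10
DIE_AREA_MM2 = 243.3     # die size in square millimetres


def transistor_density(transistors: float, area_mm2: float) -> float:
    """Return density in millions of transistors per square millimetre."""
    return transistors / area_mm2 / 1e6


density = transistor_density(TRANSISTORS, DIE_AREA_MM2)
print(f"{density:.1f} million transistors per mm^2")
```

The result works out to a little over 23 million transistors per square millimetre.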
From a high level perspective Polaris 10 houses four dedicated geometry processors, each of which houses nine Compute Units for a total of 36 CUs. These are allowed to function as a holistic entity through the use of a dedicated Graphics Command Processor. From a strictly visual perspective there really isn’t anything to differentiate this new architecture from previous iterations of Graphics Core Next. However, not all performance uplifts are achieved through drastic design changes. Rather, with Polaris things do change in a big way once you drill down into the finer-grain improvements AMD has been able to build into this core, all of which contribute to a fundamental shift towards higher rendering efficiency.
First and foremost among these changes is an improvement in the way the Geometry Processor within each block handles workloads. Here there’s been a significant generational uplift through the use of a more efficient communications string, so each of the Compute Units is more fully utilized rather than sitting idle at some points.
Polaris’ caching hierarchy has also seen some pretty drastic changes to its layout. It utilizes 2MB of L2 cache, essentially doubling up on what Hawaii offered, alongside several other instruction caches scattered throughout the die. Actual throughput has also been boosted and in some cases even doubled. This is particularly important since enhanced caching efficiency will take some stress off the chip’s 256-bit memory interface, which is spread over eight 32-bit memory controllers.
The Render Back-Ends, on the other hand, haven’t seen many changes. While there have been some minor ROP throughput increases, this is one area that could prove to be a bottleneck for Polaris. Instead of the sixteen RBEs found on Hawaii-class cards, there are only eight here.
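For those who want to see how the memory interface adds up, the sketch below multiplies out the controller layout described above and then estimates peak bandwidth. Note that the per-pin data rate is an assumed example value that we picked for illustration; it is not a figure taken from this section.

```python
# Controller layout described in the text above.
MEMORY_CONTROLLERS = 8
BITS_PER_CONTROLLER = 32

# ASSUMPTION: example per-pin data rate in Gbps, chosen for illustration only.
ASSUMED_GBPS_PER_PIN = 8.0


def bus_width_bits(controllers: int, bits_each: int) -> int:
    """Total memory interface width in bits."""
    return controllers * bits_each


def peak_bandwidth_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    """Theoretical peak bandwidth in GB/s (8 bits per byte)."""
    return bus_bits * gbps_per_pin / 8


width = bus_width_bits(MEMORY_CONTROLLERS, BITS_PER_CONTROLLER)
bandwidth = peak_bandwidth_gbs(width, ASSUMED_GBPS_PER_PIN)
print(f"{width}-bit interface, {bandwidth:.0f} GB/s at the assumed data rate")
```

The bus width follows directly from the controller count, while the bandwidth figure scales linearly with whatever memory speed is actually fitted to the card.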
From a broad scale perspective you’ll also notice AMD has eliminated the TrueAudio fixed function block, freeing up die space for additional computational resources. This functionality is now done by the shaders themselves but we’ll get into that a bit further below. In addition, there’s a new display controller with native support for HDMI 2.0 and DisplayPort 1.4 along with a heavily updated multimedia block. Last but not least, like many other GCN-based architectures, Polaris allows for driver-based firmware upgrades. This is a function directly derived from consoles which could grant AMD the ability to “evolve” Polaris as new functionality is needed.
Many of Polaris’ changes have been done at the individual Compute Unit level. As with other GCN-based parts, these CUs include 64 Shaders / Vector Units broken into four banks of 16, four 64KB register caches, a quartet of texture units, load / store functionality and a dedicated 16KB cache block.
Intrinsic shader efficiency boosts were a priority for AMD this time around and they’ve supposedly accomplished exactly that with an enhanced instruction prefetch algorithm. This improves efficiency by reducing pipeline stalls and makes instruction caching much more streamlined. It can also be quite beneficial in single threaded performance situations, such as DX11 workloads where functions can’t be broken off into multiple threads like they can within DX12 and Vulkan.
From a pure comparative perspective, the various improvements built into the CUs lead to a 15% clock per clock improvement over the R9 390. That’s even before the higher frequencies granted by the shift to 14nm get factored into the equation.
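The per-CU resources listed above, combined with the four geometry processors and nine Compute Units apiece mentioned earlier, can be tallied up in a few lines. This is only a sketch of the arithmetic using the counts given in this article; the constant and function names are our own.

```python
# Counts taken from the architecture description in this article.
GEOMETRY_PROCESSORS = 4
CUS_PER_GEOMETRY_PROCESSOR = 9
SHADER_BANKS_PER_CU = 4
SHADERS_PER_BANK = 16
REGISTER_CACHES_PER_CU = 4
REGISTER_CACHE_KB = 64
TEXTURE_UNITS_PER_CU = 4


def polaris10_totals() -> dict:
    """Multiply out the per-CU resources across the whole core."""
    cus = GEOMETRY_PROCESSORS * CUS_PER_GEOMETRY_PROCESSOR
    shaders_per_cu = SHADER_BANKS_PER_CU * SHADERS_PER_BANK
    return {
        "compute_units": cus,
        "shaders_per_cu": shaders_per_cu,
        "total_shaders": cus * shaders_per_cu,
        "register_kb_per_cu": REGISTER_CACHES_PER_CU * REGISTER_CACHE_KB,
        "total_texture_units": cus * TEXTURE_UNITS_PER_CU,
    }


totals = polaris10_totals()
for name, value in totals.items():
    print(f"{name}: {value}")
```

That gives 36 CUs with 64 shaders each, for 2304 shaders and 144 texture units across the full core.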
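To illustrate how a clock per clock gain stacks with higher frequencies, here is a rough sketch for a single Compute Unit. The 15% figure comes from the text, but both clock speeds are hypothetical placeholders that we chose for illustration and are not specifications from this review. It also ignores differences in CU count, memory bandwidth and everything else that affects real-world results.

```python
PER_CLOCK_GAIN = 0.15  # clock per clock improvement quoted in the text

# ASSUMPTION: hypothetical clock speeds in MHz, for illustration only.
OLD_CLOCK_MHZ = 1000.0
NEW_CLOCK_MHZ = 1200.0


def relative_cu_throughput(per_clock_gain: float,
                           old_clock: float,
                           new_clock: float) -> float:
    """Per-CU throughput of the new design relative to the old one (1.0 = equal)."""
    return (1.0 + per_clock_gain) * (new_clock / old_clock)


ratio = relative_cu_throughput(PER_CLOCK_GAIN, OLD_CLOCK_MHZ, NEW_CLOCK_MHZ)
print(f"Each CU delivers roughly {ratio:.2f}x the throughput under these assumptions")
```

The point is simply that the two gains multiply rather than add, so even a modest frequency bump compounds the architectural improvement.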
Another addition here is what AMD calls Shader Intrinsic Functions. These are directly derived from AMD’s experience within the console market and while they may not have much to do with the Polaris architecture per se, SIF could have a drastic impact on the future of Radeon GPUs. These extensions are essentially carried over from the consoles through an API library within GPUOpen, can easily be ported to the PC space and can grant developers improved performance on an architecture they are already familiar with.