AMD Carrizo APU Preview; Efficiency Forward
The mobile processor market has seen its fair share of competition of late with the launch of Intel’s Broadwell architecture and AMD’s own Kaveri FX, Mullins and Beema APUs. Now, AMD has slowly begun releasing details about their next-generation parts, codenamed Carrizo.
Unlike Kaveri, Richland and Trinity, Carrizo represents a significant departure from the way AMD used to design APUs. In many ways it is a ground-up redesign that is supposed to boost overall APU efficiency while also helping AMD’s bottom line by leveraging existing technologies to achieve design goals normally set aside for drastic manufacturing process shrinks. Carrizo and the improvements built into its silicon can be considered the first phase in AMD’s goal of achieving a 25x improvement in energy efficiency by 2020.
While there are only minimal details about what makes Carrizo tick (we’ll get into a few of those below), we do know these new APUs will be available in two different segments. The standard performance-oriented processors are the subject of this article and will effectively replace Kaveri in the second half of this year, while the Carrizo-L series will take over from Beema in low voltage scenarios. Both will share the same socket infrastructure even though Carrizo-L utilizes an updated version of the Puma architecture while Carrizo packs new Excavator x86 processing cores. Meanwhile, Mullins will be left to fend for itself in the ultra low voltage segment.
AMD certainly wants to talk big about efficiency improvements but actually achieving their goals isn’t easy. In order to maximize yields and improve time to market, Carrizo is based on the same 28nm bulk HKMG manufacturing process as its Kaveri predecessor. With Intel already utilizing 22nm and 14nm technology for Haswell and Broadwell respectively, achieving reasonable performance per watt improvements on an older node posed some unique challenges for AMD. 28nm may be cost effective, but without a node advance AMD needed to take some innovative approaches on the power management front.
Along with the integration of the Excavator CPU architecture and next-generation Radeon cores (with hardware-accelerated H.265 capabilities) for improved performance over Kaveri, Carrizo also takes a page from Puma+ and will integrate the Fusion Controller Hub’s southbridge functionality directly onto the silicon. Not only does this single-chip, SoC-like integration allow for more efficient communication, but it also opens the door to more aggressive power management due to the placement of various I/O functions on-die.
Despite the addition of more functions into the core silicon via the FCH and the hardware necessary for full HSA 1.0 compliance, Carrizo initially targets the same 12W to 35W segment as Kaveri. Supposedly, there’s a “sweet spot” around the 15W TDP mark where products will see an optimal blend of performance and power consumption, which points towards a focus on lower wattage parts rather than battling Intel’s i7 series.
One of the primary ways that AMD improved on-die efficiency while also optimizing their bill of materials costs was to modify their manufacturing approach. Their older CPUs and APUs used what’s called a tapered metal stack design, which is great for high performance computing since information pathways are built for the throughput needed for high clock speeds. However, by moving to a more general purpose GPU-oriented stack, AMD was able to achieve a higher density design for the Excavator cores.
With the high density library design there has been a notable improvement of the circuitry’s power efficiency while also cutting down on each core’s footprint. Meanwhile, on-die communication improvements have enhanced IPC by 5% and cut power by roughly 40% when compared against Kaveri’s Steamroller cores.
Other than the obvious performance per watt benefits brought about by the innovative new design, AMD was also able to pack 29% more transistors into Carrizo (for a total of 3.1 billion) while retaining the same die area as Kaveri. Remember, all of this was achieved on the same 28nm manufacturing process, which makes it all that much more impressive.
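As a rough back-of-the-envelope check, the quoted figures can be combined in a few lines of Python. This is only a sketch: it assumes the 5% IPC gain and the roughly 40% power reduction apply at the same clock speed and at the same time, which AMD has not confirmed, and the function names are purely illustrative.

```python
# Figures quoted in the article (AMD's claims, not independent measurements).
IPC_GAIN = 0.05               # +5% instructions per clock vs. Steamroller
POWER_CUT = 0.40              # roughly 40% lower power
TRANSISTOR_GAIN = 0.29        # 29% more transistors in the same die area
CARRIZO_TRANSISTORS = 3.1e9   # total transistor count quoted for Carrizo


def perf_per_watt_ratio(ipc_gain: float, power_cut: float) -> float:
    """Relative performance per watt, ASSUMING equal clocks and that both
    improvements hold simultaneously (an assumption, not an AMD statement)."""
    return (1.0 + ipc_gain) / (1.0 - power_cut)


def implied_baseline(total: float, gain: float) -> float:
    """Transistor count the previous part must have had for `gain` to hold."""
    return total / (1.0 + gain)


if __name__ == "__main__":
    ratio = perf_per_watt_ratio(IPC_GAIN, POWER_CUT)
    baseline = implied_baseline(CARRIZO_TRANSISTORS, TRANSISTOR_GAIN)
    print(f"Best-case CPU perf/W ratio: {ratio:.2f}x")
    print(f"Implied Kaveri transistor count: {baseline / 1e9:.2f} billion")
```

Under those assumptions the CPU cores would land at about 1.75x the performance per watt of Steamroller, and the 29% figure implies roughly 2.4 billion transistors for Kaveri. Treat both as best-case arithmetic rather than benchmark results.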
Drilling a bit further into that high density cell design versus the older high performance approach we can see that various function points have seen a large-scale reduction in size. However, these minimizations weren’t accompanied by any cut-down functionality since Carrizo retains the previous generation’s capabilities and even improves upon them in many instances. All in all it looks like AMD has been able to accomplish an internal die shrink without opting for a corresponding process node shrink.
Another interesting addition to this new architecture is a voltage adaptation feature. In the past, the typical voltage fluctuations occurring within a processor were handled by applying additional voltage to compensate for the droops. This resulted in power being wasted on regulation rather than going towards maximizing clock speeds.
In order to work around this situation AMD added circuits that monitor the voltage supplied to the CPU and GPU in real time, opening up the possibility of real-time voltage regulation. Voltage Adaptation operates at an average voltage, implements a predictive algorithm and then proactively reduces frequencies for a minuscule amount of time to compensate.
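To see why briefly trimming frequency can beat a permanent voltage margin, consider the toy comparison below. It is not AMD's algorithm: it simply uses the textbook CMOS relationship that dynamic power scales with voltage squared times frequency, and every number in it (margin size, how often droops occur, how deep the frequency cut is) is a hypothetical placeholder.

```python
def dynamic_power(voltage: float, frequency: float) -> float:
    """Relative dynamic power from the textbook proportionality P ~ V^2 * f."""
    return voltage ** 2 * frequency


def guardband_power(v_needed: float, droop_margin: float, freq: float) -> float:
    """Older approach: always run with extra voltage to ride out droops."""
    return dynamic_power(v_needed + droop_margin, freq)


def adaptive_power_and_freq(v_needed: float, freq: float,
                            droop_fraction: float, freq_cut: float):
    """Adaptive approach: no permanent margin; for the fraction of time a
    droop is predicted, frequency is lowered by `freq_cut` instead.
    Simplification: the droop's own effect on voltage is ignored."""
    slowed = freq * (1.0 - freq_cut)
    avg_freq = (1.0 - droop_fraction) * freq + droop_fraction * slowed
    avg_power = ((1.0 - droop_fraction) * dynamic_power(v_needed, freq)
                 + droop_fraction * dynamic_power(v_needed, slowed))
    return avg_power, avg_freq


if __name__ == "__main__":
    # Hypothetical inputs: 5% voltage margin, droops 1% of the time,
    # 5% frequency reduction while a droop is being handled.
    fixed = guardband_power(1.0, 0.05, 1.0)
    power, freq = adaptive_power_and_freq(1.0, 1.0, 0.01, 0.05)
    print(f"Guardband power (relative): {fixed:.4f}")
    print(f"Adaptive power (relative):  {power:.4f}")
    print(f"Adaptive average frequency: {freq:.4f}")
```

Because the voltage term is squared, even a small permanent margin costs more power than an occasional, short-lived frequency dip, which is the intuition behind the feature as AMD describes it.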
As a result of this adaptive voltage tuning, Carrizo has been able to achieve some impressive power savings for both CPU and GPU operations, particularly at higher frequencies. However, the CPU behaves very differently from the graphics processing stages since many of these improvements had already been rolled out to some extent on previous versions of AMD’s Graphics Core Next architecture. The next generation of GCN will move things to the next level and will actually back-weight its power savings towards higher frequency operations in an effort to optimize performance at its most-used speed bins.
In order to implement their predictive algorithms within Carrizo’s Excavator architecture, each individual processing module will contain ten Adaptive Voltage and Frequency Scaling sensors. In short, these AVFS modules have been engineered with hundreds of sensing paths to pick up on voltage and frequency currents throughout the silicon, in addition to the usual temperature and power readings. Considering previous architectures only included basic temperature and heat sensors, this is actually the first time the CPU industry has seen higher levels of functionality built into a monitoring package.
This implementation is supposed to improve energy efficiency within the Excavator core by a good 10-15% over Kaveri’s Steamroller architecture.
When combined together, Excavator’s high density library and Adaptive Voltage and Frequency Scaling features dramatically improve overall performance per watt. According to AMD, these advances should also lead to improved frequencies across the product range since there is more thermal headroom and the advanced algorithms will be able to wring maximum speeds out of each clock cycle.
These numerous design changes do improve overall efficiency but in some situations they act like a double-edged sword. Carrizo will span the 12W to 35W categories but the lion’s share of its architectural benefits are clustered (on the x86 processing side that is) in the sub-15W per core pair range. While this will certainly have positive cumulative effects upon lower wattage parts, high performance variants in the 30W and above range may end up struggling to differentiate themselves from the previous generation in some benchmarks. Luckily, there are other features hidden within Carrizo that haven’t been detailed yet which will likely have a more pronounced cumulative impact upon performance.
Carrizo’s Excavator modules may have received the lion’s share of space within this quick preview but that doesn’t mean the graphics side of the equation has been left high and dry. While the CPU has received design improvements that have moved its manufacturing hierarchy closer to what’s found on graphics cores, AMD has utilized CPU-like transistor tuning capabilities to achieve better GPU performance. Essentially, leakage has been reduced by 18%.
As a result of these advances, the GCN compute units within Carrizo are able to achieve frequencies that are roughly 10% faster than their predecessors when operating at comparable power levels. On the flip side of that equation, when operating at identical clock speeds the graphics core within Carrizo is roughly 20% more efficient than the one in Kaveri.
Other than raw performance benefits, this evolutionary approach also allows for more graphics cores to be activated within a given APU. For example in sub-20W TDP Kaveri parts, a pair of SIMD arrays was disabled to achieve a lower power envelope but Carrizo can have all eight arrays enabled.
Up until this point we have been talking a lot about efficiency when the APU is in its high performance states but there are additional low-usage power savings as well. This is particularly important for the mobile market since devices spend the vast majority of their time idling, which drains battery power if left unchecked. In order to better manage power AMD has implemented an S0i3 power state which replaces the standard S3 standby mode with one that is completely hardware controlled.
Intel has been using S0i3 since Atom was first introduced and in a nutshell it allows the onboard manager to power down unnecessary components on the fly. Since it is hardware-based, AMD has been able to achieve sub-50mW power consumption numbers while providing a nearly seamless transition between S0i3 and more active states.
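To put the sub-50mW figure in context, a quick calculation shows how long a notebook battery would last if the APU were the only thing drawing power. The 45Wh battery capacity is a hypothetical value chosen for illustration, and the result ignores memory, display, wireless and every other component, so it is an upper bound rather than a realistic standby estimate.

```python
def standby_hours(battery_wh: float, draw_mw: float) -> float:
    """Hours a battery of `battery_wh` watt-hours lasts at a constant draw
    of `draw_mw` milliwatts (no other loads, no conversion losses)."""
    if draw_mw <= 0:
        raise ValueError("draw_mw must be positive")
    return battery_wh * 1000.0 / draw_mw


if __name__ == "__main__":
    # 45 Wh is an assumed battery size; 50 mW is the ceiling AMD quotes.
    hours = standby_hours(45.0, 50.0)
    print(f"APU-only standby time: {hours:.0f} hours ({hours / 24:.1f} days)")
```

At exactly 50mW the assumed battery would last 900 hours, so in this state the APU itself stops being a meaningful contributor to idle drain.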
Thus far, the Carrizo launch has been treated unlike any other. Instead of going through an official “launch” with pipeline availability following soon after, AMD has been content to release tiny morsels of information to whet their buyers’ appetites. While this does provide a certain amount of excitement about what’s approaching sometime before the second half of 2015, there’s still a lot we simply don’t know about these new APUs and how they function.
At this point it looks like Carrizo will be an extremely important step forward for AMD’s general computing division since it incorporates several evolutionary and even some potentially revolutionary technology advances. Its Excavator x86 cores alone should provide sufficient IPC and performance per watt uplifts to make some waves in the mobile CPU market. Meanwhile, AMD’s continued leadership in graphics architectures may finally be leveraged properly as their Heterogeneous System Architecture has been fully implemented. What will this all mean for their success in the highly challenging mobile processor market? Only time will tell.
Something that wasn’t extensively discussed during our calls with AMD is how their former 2015 feature strategy has been realized in Carrizo. While the adaptive voltage and performance-aware energy optimizations have been covered in this article, key items like inter-frame power gating of next generation graphics cores have been kept under wraps for the time being.
All in all Carrizo is looking very promising from an efficiency standpoint but creating a successful architecture requires much more than on-paper hypothetical power benchmarks. We’re hoping AMD can finally achieve some sales success in their target markets and continue the positive trend Kaveri set not all that long ago. We’ll just have to wait a bit longer to find out how they fared as Carrizo’s official launch slowly approaches.