A Deeper Architectural Dive
Intel’s approach to processor releases has been one of slow but steady progression in what is affectionately called the “Tick / Tock” plan. It is quite straightforward: when transitioning to a new manufacturing process (the “Tick”, in this case Ivy Bridge on the 22nm process), Intel reuses the same basic architectural foundation as the previous generation, which should in theory simplify the transition. The “Tock”, meanwhile, occurs when a new design is released on an already proven process node, so a large number of problems can potentially be avoided. In short, every Tick marks a die shrink while every Tock represents a new architecture.
To put this into context, the original Nehalem processors represented a new design for Intel on the 45nm manufacturing process. In 2010, Intel used that same Nehalem core to create Westmere on the 32nm node. Westmere represented the last gasp of Nehalem and in 2011, Intel’s new Sandy Bridge architecture was introduced. By this same trend, the successor to the Sandy Bridge design will be released in 2013 on the 22nm process and will be called Haswell.
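As a quick illustration of the cadence described above, the short Python sketch below labels each generation by checking whether the process node shrank relative to its predecessor. The generation names and nodes come from the text, except for Sandy Bridge's 32nm node, which is filled in here as an assumption since this section does not state it. The helper name classify_cadence is purely illustrative, and Nehalem gets no label because the list has no predecessor to compare it against.

```python
# Generations in release order, paired with their process node in nanometres.
# Sandy Bridge's 32nm value is an assumption added for this sketch.
GENERATIONS = [
    ("Nehalem", 45),
    ("Westmere", 32),
    ("Sandy Bridge", 32),
    ("Ivy Bridge", 22),
    ("Haswell", 22),
]


def classify_cadence(generations):
    """Label every generation after the first as a Tick or a Tock.

    A smaller node than the predecessor means a die shrink (Tick);
    an unchanged node means a new design on a known process (Tock).
    """
    labels = {}
    for (_, prev_nm), (name, nm) in zip(generations, generations[1:]):
        labels[name] = "Tick" if nm < prev_nm else "Tock"
    return labels


if __name__ == "__main__":
    for name, label in classify_cadence(GENERATIONS).items():
        print(f"{name}: {label}")
```

Run as a script, this lists Westmere and Ivy Bridge as Ticks and Sandy Bridge and Haswell as Tocks, matching the pattern laid out above.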
At its most basic, this approach allows Intel to retain a yearly trend of updates, keeping their lineup fresh and ahead of the competition. However, while the Sandy Bridge architecture isn’t quite yet showing its age, Intel’s steady progression towards new product generations and manufacturing processes remains in place. So say hello to Ivy Bridge.
Don’t think of Ivy Bridge as a whole new architecture since it is considered by Intel to be the next step in Sandy Bridge’s architectural evolution. The main differentiating feature of this generation is the use of an advanced Tri-Gate 22nm fabrication process which has allowed for more transistors within the same die area and roughly the same power constraints as Sandy Bridge. For end users this means higher clock speeds, better overall performance and higher end graphics capabilities, while for Intel it will eventually lead to lower wafer costs and potentially higher profits.
Many will want Ivy Bridge to bring about significant performance changes but for the most part, Intel has left the core processing stages alone. Naturally, the higher clock speeds allowed by a move to 22nm and some instruction changes will bring about some improvements but Intel’s main focus has been upon bringing their onboard graphics controllers up to current generation expectations. As a result, a massive amount of the core’s die space is taken up by the Processor Graphics stages as an additional four Execution Units have been added.
As we have been alluding to for most of this article, the improvements built into Ivy Bridge go beyond the new 22nm manufacturing process. Of most interest to notebook users, it incorporates additional power saving features like Power Aware Interrupt Routing, which steers interrupts towards cores that are already awake so that idle cores can remain in their low power states. Power Gating for the memory and graphics core has also been included in order to reduce consumption when the core is in lower power states. OEMs now have the option to implement DDR3L Low Voltage modules as well, which should appeal to designers of Ultrabooks and other small form factor portables. These just scratch the surface of what Ivy Bridge is capable of on the power front and, from our conversations with several OEMs, a 20% decrease in clock-for-clock power consumption has been realized on these new mobile processors versus those of the previous generation.
Naturally, there are several other features on Ivy Bridge mobile processors like compatibility with PCI-E 3.0 discrete graphics cards, the continued support of AVX extensions and additional security enhancements for peace of mind.
When compared to the units contained within Sandy Bridge processors there have been plenty of architectural changes to Intel’s integrated graphics cores this time around. The Processor Graphics Unit is now broken up into three distinct graphics processing stages: the Global Assets containing the fixed function stages along with the Geometry engines, the Slice Common with its Rasterizer, L3 cache setup and pixel back ends and finally the main Slice unit which houses the Execution Units, L1 cache and other rendering pipeline necessities. Separate units have also been included for the Media CODECs and necessary display output features.
As with Sandy Bridge’s architecture, the Execution Units still do the lion’s share of heavy lifting in this core design. Much like NVIDIA’s cores or AMD’s shaders, they are responsible for the day to day multistage processing for both graphics and compute workloads. However, Intel has now added support for Compute Shaders so high levels of parallelism are now possible and shared local memory has been added to increase the performance of Direct Compute applications. As necessitated by the addition of DX11, the architecture also supports Shader Model 5.0.
Speaking of the switch to DX11 compatibility, it has necessitated the modification of the primary rendering stages. A dedicated tessellation unit as well as a pair of programmable stages (the Hull Shader and Domain Shader) have been thrown into the equation. In order to further aid DX11 performance, the architecture now supports BC6H/7 compressed texture formats as well.
While Intel have made plenty of sizeable microarchitectural enhancements to the graphics processor, what’s really interesting is that they have given the IGP its own L3 cache. While the Last Level Cache (LLC) is still shared between CPU and IGP, this small cache has been integrated into the graphics core and slightly reduces the need for the IGP to use the power-hungry 256-bit ring bus interface that connects all the elements of the chip. This change, along with the lower GPU frequency and voltage and of course the switch to the 22nm process, has allowed Intel to double the GPU’s performance per watt.
Not all of the architectural improvements are apparent from the on-paper specifications of Ivy Bridge’s Processor Graphics, but one change certainly is: the HD4000 series now includes 16 Execution Units, up from the 12 within Sandy Bridge’s higher end layout. Combined with the other on-die changes, this results in a twofold performance improvement in certain cases. The HD2500 maintains the six EUs of the previous generation but with the wide range of on-die changes it should still offer a performance bump of between 10% and 20% in certain graphics intensive workloads. Quick Sync video transcoding and other GPGPU intensive tasks will also see a significant across the board improvement with these new PGUs, regardless of the clock speed differences.
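To put those Execution Unit figures in perspective, the rough arithmetic below separates what the extra EUs could account for from what must come from elsewhere. It is only a back-of-the-envelope sketch: it assumes performance scales linearly with EU count, which real workloads rarely do, and the function names are our own. The EU counts and the twofold and 10% uplift figures are taken from the paragraph above.

```python
def eu_ratio(new_eus, old_eus):
    """Ratio of Execution Unit counts between two generations."""
    return new_eus / old_eus


def implied_per_eu_gain(overall_gain, new_eus, old_eus):
    """Gain each EU would need to deliver, assuming linear EU scaling."""
    return overall_gain / eu_ratio(new_eus, old_eus)


if __name__ == "__main__":
    # HD4000: 16 EUs versus 12 in Sandy Bridge's higher end layout.
    print(f"HD4000 EU ratio: {eu_ratio(16, 12):.2f}x")
    print(f"Per-EU gain for a 2x result: {implied_per_eu_gain(2.0, 16, 12):.2f}x")
    # HD2500: six EUs in both generations, so any gain is per-EU.
    print(f"HD2500 per-EU gain at +10%: {implied_per_eu_gain(1.10, 6, 6):.2f}x")
```

Under that simplification, the jump from 12 to 16 EUs accounts for roughly a 1.33x increase, so the best-case twofold result implies about 1.5x coming from the architectural and caching changes rather than from the EU count itself.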
The new HD graphics architecture isn’t completely focused upon offering a preset specification layout either. It is able to easily scale upwards or downwards, creating a nearly infinite list of derivatives. We likely won’t see any of these offshoots in this generation but expect higher performance from an expanded layout when Haswell hits sometime in 2013.
The HD Graphics on Ivy Bridge can dynamically adjust its frequency, automatically increasing the clock speed of the graphics controller when higher loads are detected. Much like the Turbo Boost technology on the CPU itself, this acts as a way to conserve power when high speeds aren’t needed and yet allows for on-demand performance in demanding situations. And as you will see in our IGP gaming benchmarks section, the HD Graphics 4000 needs every bit of extra performance to compete with the Llano APUs.
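For readers curious how load-based frequency scaling works in principle, here is a deliberately simplified Python sketch. It is not Intel's algorithm, which also weighs thermal and power headroom that this toy model ignores. The function name, thresholds, step size and the 500MHz and 1000MHz clocks are all hypothetical placeholders chosen for illustration, not Ivy Bridge specifications.

```python
def next_clock(current_mhz, load, base_mhz, max_mhz,
               step_mhz=50, high=0.85, low=0.40):
    """Pick the next graphics clock from the current load (0.0 to 1.0).

    Heavy load steps the clock up towards max_mhz, light load steps it
    back down towards base_mhz, and anything in between holds steady.
    All thresholds and step sizes are illustrative assumptions.
    """
    if load >= high:
        return min(current_mhz + step_mhz, max_mhz)
    if load <= low:
        return max(current_mhz - step_mhz, base_mhz)
    return current_mhz


if __name__ == "__main__":
    clock = 500  # hypothetical base clock in MHz
    for load in (0.95, 0.95, 0.60, 0.10):
        clock = next_clock(clock, load, base_mhz=500, max_mhz=1000)
        print(f"load {load:.0%} -> {clock}MHz")
```

The clock climbs only while the load stays high and falls back once demand drops, which is the power-saving behaviour described above.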