by MAC | November 3, 2008
Microarchitecture Dissected #2

Although Nehalem has been highly touted as one of the most significant architectural overhauls ever, it still has deep roots in the original P6 microarchitecture that debuted in the Pentium Pro in 1995. There is also no denying that Nehalem was built on a Penryn base. From there, however, Intel's engineers added significant performance-oriented features, such as an integrated memory controller, a completely new system interconnect, and a multi-level shared cache. As you will see, they have also focused a great deal on the chip's power efficiency. Let's examine some of these advancements:
For the Nehalem architecture, Intel has forgone the legacy front-side bus in favour of the QuickPath Interconnect (QPI), a high-speed, low-latency point-to-point link integrated onto the processor itself. From a technical standpoint, each QPI link is bi-directional and 20 bits wide in each direction. The result? A very fast interconnect that improves overall bandwidth while reducing latency. This interface is used to access the distributed shared memory, to let cores communicate with each other, and to connect to the X58 northbridge, now known as the I/O Hub (IOH). Consumer-oriented Nehalem models will have a single QPI link, while the workstation/server processors will have up to four. With its faster 6.4 gigatransfers per second (GT/s) QPI link, the 965 Extreme Edition benefits from a theoretical bandwidth of 25.6GB/s, double the 12.8GB/s offered by the 1600MHz front-side bus of the X48 Express chipset. Interestingly, this is also equivalent to the peak bandwidth of Nehalem's triple-channel DDR3-1066 memory. The lesser 920 and 940 models feature a 4.8GT/s QPI interface with 19.2GB/s of bandwidth.

Some of you may be wondering why a replacement for the front-side bus was needed. The short answer is that the conventional shared-bus topology was bandwidth-starved and did not scale. The front-side bus is a decade-old concept that was never meant for the multi-core era. In pre-Nehalem designs, there are significant communication bottlenecks between the processor and the chipset, as well as among the cores in one processor. For example, if one core wanted to communicate with a core on another die, or access that core's L2 cache, the data had to travel over the slow FSB, causing a bottleneck and a performance hit. There is no denying that current multi-core processors perform very well, but they rely on their large caches to offset the front-side bus bottleneck.
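The bandwidth figures quoted above follow from simple arithmetic. The sketch below reproduces them, assuming (as is commonly quoted) that 16 of the 20 lanes in each direction carry data, so each transfer moves 2 bytes per direction, and that the front-side bus is a single 64-bit (8-byte) shared bus. These are theoretical peaks, not measured throughput.

```python
def qpi_bandwidth_gbs(gt_per_s: float, payload_bytes: int = 2, directions: int = 2) -> float:
    """Peak QPI bandwidth in GB/s.

    Assumes 2 data bytes per transfer in each direction (16 data bits out of
    20 lanes) and counts both directions, as the quoted 25.6GB/s figure does.
    """
    return gt_per_s * payload_bytes * directions


def fsb_bandwidth_gbs(mt_per_s: float, bus_bytes: int = 8) -> float:
    """Peak front-side bus bandwidth in GB/s: one shared 64-bit bus."""
    return mt_per_s * bus_bytes / 1000


print(qpi_bandwidth_gbs(6.4))   # 965 Extreme Edition: 25.6
print(qpi_bandwidth_gbs(4.8))   # 920 / 940: 19.2
print(fsb_bandwidth_gbs(1600))  # 1600MHz FSB: 12.8
```

The factor of two between QPI at 6.4GT/s and the 1600MHz FSB comes from QPI counting both directions simultaneously, whereas the shared bus can only carry traffic one way at a time.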
Following AMD's lead, Intel has finally integrated the memory controller into the processor itself. Because the memory is directly connected to the processor, latency is significantly lower and bandwidth is much higher. Current Core i7 processors feature a triple-channel memory interface, and each channel can support one or two DDR3 modules. This means that memory modules should be installed in sets of three, not in pairs as has been the norm since dual-channel memory architectures were first introduced back in 2003. It also means that most Core i7 motherboards will ship with three or six memory slots, although you will see the occasional four-slot design, like Intel's DX58SO "Smackover" motherboard. With this new design, Intel claims up to a 3.4x increase in memory bandwidth over Penryn, as well as 40% lower memory latency. We definitely look forward to testing those claims.
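To see why the third channel matters, here is a rough peak-bandwidth calculation. It assumes the standard 64-bit (8-byte) width per DDR3 channel and uses DDR3-1066's actual data rate of 1066.67 MT/s; real-world throughput will be lower than these theoretical peaks.

```python
DDR3_1066_MTS = 3200 / 3  # DDR3-1066 actually runs at ~1066.67 MT/s


def ddr_bandwidth_gbs(mt_per_s: float, channels: int, bus_bytes: int = 8) -> float:
    """Peak DRAM bandwidth in GB/s: data rate x 8-byte channel width x channels."""
    return mt_per_s * bus_bytes * channels / 1000


print(round(ddr_bandwidth_gbs(DDR3_1066_MTS, 2), 1))  # dual channel: 17.1
print(round(ddr_bandwidth_gbs(DDR3_1066_MTS, 3), 1))  # triple channel: 25.6
```

The triple-channel result matches the 25.6GB/s of a 6.4GT/s QPI link, and it is only reached when all three channels are populated, which is why modules should go in sets of three.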
Building upon Penryn's implementation of SSE4.1, which focused on video encoding, image and video editing, 3D game physics and similar workloads, Nehalem adds seven new instructions, collectively known as SSE4.2. They fall into two groups, the String and Text New Instructions (STTNI) and the Application Targeted Accelerators (ATA), which target faster XML parsing, faster search and pattern matching, and more specialized functions such as CRC32 checksums and population counts (POPCNT). Keep in mind that with Penryn, the SSE4 instructions were responsible for the most significant performance increases, so we definitely look forward to seeing what Intel has accomplished with these latest instructions.
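As an illustration of what two of the ATA instructions compute, the sketch below is a plain-software reference for CRC-32C (the Castagnoli polynomial used by the SSE4.2 CRC32 instruction, 0x82F63B78 in bit-reversed form) and for a population count. Python cannot issue these instructions directly; the point is that the bit-by-bit inner loop below is the kind of work the hardware instruction handles for up to 8 bytes at a time. The initial and final inversion follow the usual CRC-32C convention and are applied in software around the instruction.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial, bit-reversed


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C, the checksum the SSE4.2 CRC32 instruction accelerates."""
    crc ^= 0xFFFFFFFF  # standard initial inversion
    for byte in data:
        crc ^= byte
        for _ in range(8):  # hardware folds this whole loop into one instruction
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF  # standard final inversion


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer, what POPCNT returns."""
    return bin(x).count("1")


print(hex(crc32c(b"123456789")))  # 0xe3069283, the standard CRC-32C check value
print(popcount(0b1011))           # 3
```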
Nehalem also brings Hyper-Threading (HT) back from the dead. With HT enabled, a processor with four physical cores is seen by the operating system as having eight logical cores. Without HT, a core works through one thread at a time, and its execution units can sit idle whenever that thread stalls, for example while waiting on memory. An HT-enabled core keeps two threads in flight simultaneously, so it can issue work from one thread while the other is stalled. While Hyper-Threading did not perform particularly well on the Pentium 4, Nehalem's architecture was designed to remove many of those processing bottlenecks. Depending on the workload, and on how effectively an application is multi-threaded, the performance increase could be 20% or higher.
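The toy model below sketches why simultaneous multithreading helps when threads stall. It is a deliberately simplified, hypothetical model, not Nehalem's pipeline: one issue slot per cycle (a real Nehalem core can issue several micro-ops per cycle), and each thread is a list of operations where the value is how many extra cycles the thread must wait after issuing it (0 for a plain ALU op, larger for something like a cache miss).

```python
from typing import List

# Hypothetical thread model: each entry is the stall (in cycles) that follows
# issuing that operation. 0 = ordinary op, >0 = e.g. a memory access miss.
Thread = List[int]


def run_sequential(threads: List[Thread]) -> int:
    """No SMT: threads run back to back, so every stall leaves the core idle."""
    return sum(1 + stall for t in threads for stall in t)


def run_smt(threads: List[Thread]) -> int:
    """Toy SMT: all threads share one issue slot per cycle, picked round-robin."""
    n = len(threads)
    pos = [0] * n    # index of each thread's next operation
    ready = [0] * n  # earliest cycle at which each thread may issue again
    finish = 0       # cycle at which the last result becomes available
    last = -1        # thread that issued most recently
    cycle = 0
    while any(p < len(t) for p, t in zip(pos, threads)):
        for k in range(1, n + 1):  # try threads in round-robin order
            i = (last + k) % n
            if pos[i] < len(threads[i]) and ready[i] <= cycle:
                stall = threads[i][pos[i]]
                pos[i] += 1
                ready[i] = cycle + 1 + stall
                finish = max(finish, ready[i])
                last = i
                break  # only one issue slot per cycle
        cycle += 1
    return finish


stalling = [[0, 3, 0, 3], [0, 3, 0, 3]]
print(run_sequential(stalling), run_smt(stalling))  # 20 13
```

With two stall-heavy threads the toy model finishes in 13 cycles instead of 20, because one thread's work fills the other's stall cycles. With threads that never stall, both functions return the same count, which mirrors the article's point that the benefit depends on the workload. The exaggerated gain here is an artifact of the toy model, not a Nehalem measurement.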
Nehalem's Power Control Unit (PCU) is an extremely innovative power management feature: an on-chip microcontroller that, with the help of numerous integrated power sensors, actively manages the power and performance of the entire processor. The PCU can dynamically adjust the voltage and frequency of the CPU cores to lower power consumption, or to provide a performance boost in conjunction with the new Turbo Mode feature. Also, thanks to a development known as Power Gates, idle cores can be completely shut off and placed in a C6 sleep state while the other cores continue working. This is noteworthy because C6 had previously been featured only on mobile processors.
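Why scaling voltage and frequency together is so effective can be seen from the standard CMOS approximation for dynamic power, P ≈ C·V²·f. This is a general rule of thumb rather than an Intel figure, and the sketch assumes capacitance and switching activity stay constant so they cancel out. It also ignores leakage power, which is what Power Gates address by cutting power to idle cores entirely.

```python
def relative_dynamic_power(v_scale: float, f_scale: float) -> float:
    """Dynamic power relative to a baseline, using P ~ C * V^2 * f.

    v_scale and f_scale are the new voltage and frequency as fractions of the
    baseline values. Capacitance and activity are assumed unchanged, so they
    cancel out of the ratio. Leakage power is not modelled.
    """
    return v_scale ** 2 * f_scale


# Lowering both voltage and frequency by 10%:
print(round(relative_dynamic_power(0.9, 0.9), 3))  # 0.729, i.e. ~27% less dynamic power
```

Because voltage enters squared, a modest voltage drop saves far more power than the matching frequency drop costs in performance, which is the lever the PCU pulls.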
For the first time ever, Intel has included a feature that automatically overclocks a processor based on workload demand. Basically, all Core i7 processors come with two additional speed bins, which is to say that they have two higher multipliers that they can use under certain conditions. For example, if you are running a single-threaded application, the PCU will down-clock or shut down three cores, freeing up power and lowering heat output while "overclocking" the one core that is in use. If an application is multi-threaded and the cores are not running too hot, the PCU will overclock all the cores by one speed bin. The only limits on Turbo Mode are power and thermal headroom, so keeping your processor cool should be an even greater priority with the Core i7 series than it ever was.

Taken as a whole, these new performance and energy-saving features are what truly distinguish Nehalem as a veritable next-generation microarchitecture. They are little elements that some users may never know exist, but which ultimately deliver a superior computing experience. We look forward to testing and examining each and every one of these new capabilities. If you are truly interested in knowing everything there is to know about this new microarchitecture, we highly recommend that you watch the Intel Developer Forum 2008 presentation by Steve Pawlowski, the Digital Enterprise Group chief technology officer and general manager for Architecture and Planning for Intel Corporation.
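The Turbo Mode rules described above can be sketched as a small decision function. This is a simplified, hypothetical policy that follows the article's description (two extra bins for one active core, one extra bin for several, none without headroom); the real PCU weighs measured power, current and temperature. It assumes Nehalem's 133.33MHz base clock and uses the Core i7-920's stock multiplier of 20 (2.66GHz) as the example.

```python
BCLK_MHZ = 400 / 3  # Nehalem base clock, ~133.33 MHz


def turbo_frequency_mhz(base_multiplier: int, active_cores: int, has_headroom: bool) -> float:
    """Hypothetical Turbo Mode policy based on the rules described in the text.

    - no power/thermal headroom -> stay at the base multiplier
    - one active core           -> two extra speed bins
    - several active cores      -> one extra speed bin
    """
    if not has_headroom:
        bins = 0
    elif active_cores == 1:
        bins = 2
    else:
        bins = 1
    return (base_multiplier + bins) * BCLK_MHZ


# Core i7-920: 20 x 133.33 MHz = 2.66 GHz stock
for cores, headroom in [(1, True), (4, True), (4, False)]:
    print(cores, headroom, round(turbo_frequency_mhz(20, cores, headroom)))
```

Under these assumptions a single busy core runs at roughly 2933MHz, all four cores at 2800MHz, and a processor without headroom stays at its stock ~2667MHz.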