Clock-per-clock: Deneb vs. Zambezi
One of the questions that we have seen asked over and over during the last few years is how much faster Bulldozer will be, clock-per-clock, than Phenom II. Given the radically new microarchitecture it was pretty much impossible for anyone to give a credible estimate, but today we are going to attempt to answer that very question.
Here is the duel: FX-8150 "Zambezi" versus Phenom II X4 980 "Deneb". To ensure that both chips were competing on a roughly equal playing field, we set identical frequencies and timings for both processors, and we also disabled the FX-8150's extra cores. However, we allowed the FX-8150 two advantages in the form of its faster 2200MHz northbridge frequency (vs. 2000MHz for Phenom II) and 2600MHz HyperTransport Link (vs. 2000MHz for Phenom II).
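For readers who want to run this kind of comparison on their own numbers, here is a minimal sketch of how a clock-per-clock difference can be expressed once both chips run at identical frequencies. The function name `per_clock_delta` and its parameters are our own illustrative choices, and any scores you feed it are your own measurements, not results from this review.

```python
def per_clock_delta(new_score, old_score, higher_is_better=True):
    """Percentage change of the new chip vs. the old one at identical clocks.

    Hypothetical helper for illustration only. A negative result means the
    new chip is slower per clock. For timed benchmarks (seconds, lower is
    better) pass higher_is_better=False so the ratio is inverted.
    """
    if new_score <= 0 or old_score <= 0:
        raise ValueError("scores must be positive")
    if higher_is_better:
        ratio = new_score / old_score
    else:
        # A shorter time is a better result, so flip the ratio.
        ratio = old_score / new_score
    return (ratio - 1.0) * 100.0
```

Because both processors are locked to the same frequency, the ratio of the scores is already a per-clock ratio and no further division by clock speed is needed.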
(*EDIT*: Check out the bottom of the page for updated and more accurate clock-per-clock results) Now you should take these results with a grain of salt. Unlike with Intel's chips, where it is easy to disable cores from within the BIOS, there is no such luxury with Bulldozer at this time. Therefore, we had to limit the FX-8150's number of cores from within the OS. This fact, and perhaps some peculiarities in how Windows 7 assigns workloads to the Bulldozer microarchitecture, might have caused exaggerated results. We will be better able to gauge C-P-C performance once we get our hands on a true four-core Zambezi chip.
Now looking at the above table will cause just about anyone who's been even casually awaiting Bulldozer to ask: what the hell happened? Nearly across the board we experienced a serious decline in performance. The only exception to this was WinRAR 4.0.1, which is probably making use of one of the new instruction sets.
It's impossible for us to tell whether the issue is the inherent performance-killing effect of the additional pipeline stages, whether the integer cores are waiting for access to resources and thus creating extra latency, or whether the much higher cache latencies are the main cause of this situation. Assuming our numbers are indeed correct, something has pretty clearly caused overall clock-per-clock performance to dive off a cliff.
Yes, Zambezi might take the lead in some specialized software that takes advantage of its AES, AVX, FMA4, and XOP instructions, but those are few and far between in the consumer software realm at the moment. As you will see in the coming pages, at full strength the FX-8150 can deliver some impressive multi-threaded performance, but single-threaded and lightly threaded performance has actually gotten worse despite the huge clock speed advantage that Zambezi brings over Phenom II.
As mentioned above, since we had serious doubts about the validity of our previous clock-per-clock results due to Windows 7's wonky (vis-à-vis Bulldozer anyways) scheduler, we decided to try another approach. Instead of telling the OS to simply ignore the other four cores, we decided to try manually setting processor affinity from within Task Manager. Every time we opened a program, we set the processor affinity to cores 0, 2, 4 and 6 (which provide optimal performance according to AMD). This allowed us to diminish the negative impact the OS was having on our C-P-C tests. We had to remove WinRAR and DiRT 3 since we couldn't prevent them from using all eight cores.
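To make the core selection concrete, here is a small sketch of how a set of core numbers such as 0, 2, 4 and 6 maps to the bitmask that operating systems typically use to represent processor affinity. This is only an illustration: we set affinity by hand for each program, and the helper name `affinity_mask` is hypothetical.

```python
def affinity_mask(cores):
    """Build an affinity bitmask with one bit set per selected core.

    Hypothetical helper for illustration. Core 0 is the lowest bit,
    core 1 the next bit, and so on.
    """
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core numbers must be non-negative")
        mask |= 1 << core
    return mask


# Cores 0, 2, 4 and 6: every other core of the eight.
mask = affinity_mask([0, 2, 4, 6])
print(hex(mask))  # prints 0x55
```

On Windows, a hexadecimal mask like this can be passed to the command prompt's `start /affinity` option (for example `start /affinity 55 program.exe`, where `program.exe` is a placeholder), which may save some clicking if you want to repeat this kind of test.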
As you can see, our new approach made a sizeable difference in some instances. Having said that, many of the comments we made above still ring true with regard to Zambezi's performance shortcomings compared to the venerable Deneb core. Zambezi's performance is inconsistent, never impressive in lightly threaded workloads, but also sometimes lagging badly in highly multi-threaded programs.