The Zen Architecture: An Overview
While AMD has done a great job of detailing the Zen architecture, this review is a perfect place to go over some of its highlights as they pertain to Ryzen processors. Remember, Zen will be around in some shape or form for at least the next half decade, so it was imperative that AMD design it not only to excel in current workloads but also to position its feature set to deal with future tasks as well.
The path to Zen has always followed four primary goals: the architecture needed significantly improved single-threaded performance, simultaneous multithreading was an absolute necessity, it needed to boast great efficiency, and module scalability had to be improved.
For the enthusiast market, the Ryzen 7 processors will hold a preeminent flagship place for the foreseeable future. They are built on an advanced 14nm FinFET manufacturing process and pack a total of 4.8 billion transistors. Unlike past AMD CPU designs, Ryzen builds a lot of I/O capability onto the die alongside its eight physical cores, in an effort to streamline platform design and speed up critical communication pipelines.
In addition to 16 native PCIe Gen3 lanes, there are four more lanes dedicated to NVMe or SATA storage solutions and four USB 3.1 Gen1 ports. All of these will play a key role in future motherboard designs.
The scalability aspect of AMD's goals was achieved by creating a very simple modularized building block called the CPU Complex, or CCX. Each of these CCXs contains four cores which use simultaneous multithreading to process up to eight concurrent threads in parallel, along with 64KB of L1 cache and 512KB of core-specific L2 cache per core, plus 8MB of L3 cache which can be shared across all four cores.
These CPU Complexes can be used individually as a simple, high-efficiency four-core, eight-thread part or combined to make larger, more capable processors for higher-end markets. Meanwhile, individual cores within each CCX can be disabled without impacting the remaining cores. For example, Ryzen 7 keeps two of these modules intact, while Ryzen 5 also uses two CCXs but disables a pair of cores to create a 6-core, 12-thread CPU. The possibilities really are endless.
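The SKU math above is easy to sanity-check. A minimal sketch (the class and field names are our own, not AMD terminology beyond "CCX"; it also assumes, as with the shipping Ryzen 5 parts, that disabling cores leaves each CCX's full 8MB L3 slice enabled):

```python
# Illustrative model of Zen's CCX building block.
from dataclasses import dataclass

@dataclass
class CCX:
    active_cores: int = 4          # up to 4 physical cores per CCX
    l2_per_core_kb: int = 512      # core-specific L2
    l3_shared_mb: int = 8          # L3 shared across the CCX

def sku(ccxs):
    """Tally the headline specs for a processor built from a list of CCXs."""
    cores = sum(c.active_cores for c in ccxs)
    return {
        "cores": cores,
        "threads": cores * 2,      # SMT: two threads per core
        "l2_kb": sum(c.active_cores * c.l2_per_core_kb for c in ccxs),
        "l3_mb": sum(c.l3_shared_mb for c in ccxs),
    }

ryzen7 = sku([CCX(), CCX()])                # two fully enabled CCXs
ryzen5 = sku([CCX(active_cores=3)] * 2)     # one core disabled per CCX

print(ryzen7)  # {'cores': 8, 'threads': 16, 'l2_kb': 4096, 'l3_mb': 16}
print(ryzen5)  # {'cores': 6, 'threads': 12, 'l2_kb': 3072, 'l3_mb': 16}
```

Note how the shared L3 total stays at 16MB even on the cut-down part, since it belongs to the CCX rather than to any individual core.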
Tying the CCXs together is AMD's new Infinity Fabric, essentially a high-speed interconnect that's meant to facilitate on-die communication and aid the integration of other components like onboard graphics or audio solutions.
In order to address the performance end of the equation, AMD focused on the way Zen executes its workloads. Not only has the instruction scheduler been significantly expanded, but its resource pool has also been augmented. In plain English, this means the scheduler can send information to the execution units at a much quicker pace than in previous designs.
There has been a lot of talk about machine intelligence and deep learning in the last year as scientists all over the world attempt to build computer networks that can think for themselves. For its part, AMD has taken some of those highbrow concepts and built an artificial neural network (albeit a simple one that isn't Terminator-level smart) inside the Zen microarchitecture.
Called Neural Net Prediction, it builds a model of the decisions driven by software code execution, anticipates future needs, pre-loads instructions, and then chooses the best path through the CPU for a given workload. As such, a Zen-based processor could get faster over time as it "learns" your usage habits.
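AMD hasn't published the internals of Neural Net Prediction, but the textbook way to put a simple neural network inside a branch predictor is the perceptron scheme, where a small weight vector "learns" which bits of recent branch history correlate with a branch's outcome. A toy sketch of that idea (purely illustrative; the table size, history length, and threshold here are arbitrary choices of ours, not Zen's):

```python
# Toy perceptron branch predictor, shown only to illustrate how a
# predictor can "learn" from branch history. Not AMD's actual design.
HIST = 8          # global history length (bits of recent outcomes)
THRESHOLD = 15    # stop training once confidence exceeds this

class PerceptronPredictor:
    def __init__(self, table_size=64):
        # one weight vector (bias + one weight per history bit) per entry
        self.table = [[0] * (HIST + 1) for _ in range(table_size)]
        self.history = [1] * HIST          # +1 = taken, -1 = not taken

    def predict(self, pc):
        w = self.table[pc % len(self.table)]
        y = w[0] + sum(wi * h for wi, h in zip(w[1:], self.history))
        return y >= 0, y                   # (predicted taken?, confidence)

    def update(self, pc, taken):
        pred, y = self.predict(pc)
        w = self.table[pc % len(self.table)]
        t = 1 if taken else -1
        # train only on mispredictions or low-confidence predictions
        if pred != taken or abs(y) <= THRESHOLD:
            w[0] += t
            for i, h in enumerate(self.history):
                w[i + 1] += t * h
        self.history = self.history[1:] + [t]

p = PerceptronPredictor()
# A branch that strictly alternates taken/not-taken is quickly learned:
for _ in range(100):
    for outcome in (True, False):
        p.update(0x400123, outcome)
```

After training, the predictor's weights encode the alternating pattern, so the next prediction (following a not-taken outcome) comes back "taken" with high confidence.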
Every modern processor has some form of prefetch algorithm built into its design, but AMD is hoping to take this to the next level with Smart Prefetch. This is an effort to boost execution stream performance so data can be fed through the core at a faster, more efficient pace. Smart Prefetch is supposed to anticipate the locations of an application's future data accesses, using a sophisticated algorithm that learns its data access patterns and models its responses in parallel. It then prefetches vital data into local cache so it's ready for immediate use.
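To make the idea concrete, the simplest form of this kind of pattern learning is a stride prefetcher: watch the gaps between successive addresses, and once the same gap repeats, fetch ahead. This is a generic textbook scheme, not a description of AMD's actual heuristics, which remain unpublished:

```python
# Minimal stride-detecting prefetcher, shown only to illustrate the
# kind of access-pattern learning Smart Prefetch describes.
class StridePrefetcher:
    def __init__(self):
        self.last_addr = None
        self.stride = 0
        self.confidence = 0

    def access(self, addr):
        """Observe a demand access; return an address to prefetch, or None."""
        prefetch = None
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride == self.stride:
                self.confidence = min(self.confidence + 1, 3)
            else:                          # pattern broke: start relearning
                self.stride = stride
                self.confidence = 0
            if self.confidence >= 2:       # pattern confirmed: fetch ahead
                prefetch = addr + self.stride
        self.last_addr = addr
        return prefetch

pf = StridePrefetcher()
# An array walked in 64-byte steps is recognized after a few accesses:
for addr in range(0x1000, 0x1400, 64):
    hint = pf.access(addr)
```

The first few accesses return no hint while the stride is being learned; from then on each access yields the next address in the stream, so the data is already in cache when the core asks for it.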
As we already mentioned in the CCX description above, there have been some pretty major revisions to the cache hierarchy. When combined with the new prefetcher, these changes put lower-level cache nearer to the core, netting up to 5X greater cache bandwidth into a core.