An Architectural Deep Dive: Geometry Processing
One of the main design goals for the Cayman series of cards was to increase overall performance in a number of key areas. AMD’s original plan was to have this generation of products manufactured on TSMC’s 32nm process, but that didn’t turn out quite as expected. Upon tape-out, it became clear that 32nm wouldn’t deliver the expected performance or economic benefits, so the decision was made to stick with the already-mature 40nm process. This meant porting the original designs over to an existing process, which caused some delays, particularly at the upper end of the spectrum where power consumption and thermals became concerns. As a result, the products we have all come to know as Cayman XT and Cayman Pro (the HD 6970 and HD 6950 respectively) are the 40nm “clones” of the originally planned Ibiza cards.
So does the Cayman series of cards feature an all-new architecture? Yes and no. The Barts products borrowed quite a bit of their design and core features from the Cypress series, but with Cayman, AMD charted a different course. From a high-level architectural standpoint, very little has changed in terms of the overall core layout, but nearly all of the “building blocks” have either seen a significant facelift or have had their functionality refined. To cover all of these changes, we will start with some of the individual items that make up this microarchitecture.
Geometry Processing to the Next Level
In both the Cypress and Barts cores, there is a single unified graphics engine that is accessed through the main Command Processor. Cayman, on the other hand, uses a true “dual engine” architecture which splits the fixed function stages into a pair of identical engines. Not only does this setup allow dispatch calls to be issued more efficiently throughout the core, but it also allows two primitives to be processed per clock and doubles the number of tessellators and geometry / vertex assemblers. The dual rasterizers also allow for up to 32 pixels per clock to be processed through the two fixed function stages.
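To put the "two primitives per clock" figure into perspective, here is a rough back-of-the-envelope sketch of peak primitive throughput. The function name is our own, the one-primitive-per-engine-per-clock figure is inferred from the paragraph above, and the 880 MHz clock is an illustrative value rather than a number taken from this article. These are theoretical peaks, not measured results.

```python
def peak_primitive_rate(engines, clock_mhz, prims_per_engine_per_clock=1):
    """Theoretical peak primitives per second for a given number of graphics engines.

    Assumes each engine retires a fixed number of primitives every clock,
    which real workloads will rarely sustain.
    """
    clock_hz = clock_mhz * 1_000_000
    return engines * prims_per_engine_per_clock * clock_hz


# Illustrative clock only (assumption): the same 880 MHz is used for both
# cases so that the engine count is the only thing that changes.
single_engine = peak_primitive_rate(engines=1, clock_mhz=880)
dual_engine = peak_primitive_rate(engines=2, clock_mhz=880)

print(f"single engine: {single_engine / 1e9:.2f} billion primitives/s")
print(f"dual engine:   {dual_engine / 1e9:.2f} billion primitives/s")
```

At identical clocks the dual engine layout simply doubles the theoretical ceiling; how much of that shows up in games depends on how geometry-bound the workload actually is.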
The tessellators themselves have been upgraded once again to what AMD calls an “eighth generation” design. They support off-chip buffering, which allows geometry data from tessellation workloads to be stored in DRAM if the on-chip cache becomes saturated. Other minor improvements have been made throughout the architecture to the way tessellation is processed, and together these lead to a near threefold increase in high-level geometry performance over the Cypress series.
The additional geometry processing horsepower available through the new fixed function pipeline is significant when compared to the outgoing Cypress series. The dual tessellators and their ability to defer certain workloads allow for improved and much more consistent performance across all tessellation levels, rather than gains concentrated at lower levels as was the case with the Barts series.
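The off-chip buffering idea can be pictured with a small toy model. This is purely a conceptual sketch and not a description of how AMD's hardware is implemented: the `SpillBuffer` class, its capacity parameter and the "on-chip" / "dram" labels are all hypothetical names made up for illustration.

```python
from collections import deque


class SpillBuffer:
    """Toy FIFO that overflows to a second queue once the fast store is full."""

    def __init__(self, on_chip_capacity):
        self.on_chip_capacity = on_chip_capacity
        self.on_chip = deque()   # stands in for the on-chip cache
        self.off_chip = deque()  # stands in for DRAM

    def push(self, item):
        # Spill when the on-chip store is full, or when earlier items have
        # already spilled (so that ordering is preserved).
        if self.off_chip or len(self.on_chip) >= self.on_chip_capacity:
            self.off_chip.append(item)
            return "dram"
        self.on_chip.append(item)
        return "on-chip"

    def pop(self):
        if not self.on_chip:
            raise IndexError("pop from empty SpillBuffer")
        item = self.on_chip.popleft()
        # Refill the freed on-chip slot from the spilled data, if any.
        if self.off_chip:
            self.on_chip.append(self.off_chip.popleft())
        return item


buf = SpillBuffer(on_chip_capacity=2)
placements = [buf.push(i) for i in range(5)]
drained = [buf.pop() for _ in range(5)]
print(placements)
print(drained)
```

The point of the model is that a producer never has to stall just because the fast storage is full: the overflow goes to the slower, larger pool and is pulled back in as space frees up, which is the behaviour the paragraph above attributes to the new tessellators.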