Under the Hood: Trinity’s Architecture
As we have mentioned previously, the basic layout of Trinity isn’t that far removed from the first generation of APUs, but there are some significant changes. In its highest-end A10 configuration, Trinity includes two Piledriver compute modules, each with two cores and 2MB of L2 cache for quick data access. Lower-end derivatives include only a single module with two cores and, in some cases, only 512KB of L2 cache per core, which reduces power consumption even further.
The two 64-bit memory controllers have also been updated to support a broader range of P-states. That isn’t particularly important for most desktop systems, but the on-the-fly memory frequency scaling could be beneficial in some low-power STB configurations. There’s still built-in support for 1.5V DIMMs and, as with Llano, the controllers support up to 64GB in dual-channel mode on desktop systems.
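To put the two 64-bit controllers in perspective, here is a minimal sketch of the usual peak-bandwidth arithmetic (channels × bus width × transfer rate). The DDR3 speed grades in the loop are assumptions chosen for illustration; this article does not state which memory speeds Trinity supports, and the function name is hypothetical.

```python
def peak_bandwidth_gbs(channels: int, bus_width_bits: int, mega_transfers_per_s: float) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes)."""
    # Each transfer moves bus_width_bits / 8 bytes on every channel.
    bytes_per_transfer = channels * bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9


# Assumed DDR3 speed grades (MT/s); not taken from the article.
for speed in (1333, 1600, 1866):
    bandwidth = peak_bandwidth_gbs(channels=2, bus_width_bits=64, mega_transfers_per_s=speed)
    print(f"DDR3-{speed}, dual channel: {bandwidth:.1f} GB/s")
```

These are theoretical ceilings only. Real-world throughput is lower, and the CPU cores and the GPU share whatever bandwidth is available.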
The major changes in the Trinity architecture are buried within the new Unified Northbridge, which represents AMD’s first attempt at creating an all-in-one communication solution for its present and future APUs. Within it, a dedicated PCI-E link replaces the HyperTransport protocol for communication with the chip’s main I/O devices, APU power management can be regulated on the fly, and memory controller requests can be shared effectively between the processing stages and the GPU.
The links between each section of the APU follow in the footsteps of the previous generation, but AMD has refined certain interconnects with the goal of speeding up information transfers. The AMD Fusion Compute Link is still considered to be a medium-bandwidth connection which manages the complex interaction between the onboard GPU, the CPU’s cache and the system memory. This time around, AMD has refined the interconnect so that the GPU has direct access to a coherent memory space, while the CPU can now directly access the GPU’s dedicated framebuffer if needed. This is one of the primary reasons why Trinity’s theoretical compute throughput has jumped from 572 GFLOPS to 736 GFLOPS.
The Radeon Memory Bus, on the other hand, is the all-important link between the onboard graphics coprocessor and the primary on-chip memory controller. Rather than acting like a traffic cop (a la Fusion Compute Link) which tries to direct the flow of information, this memory bus is all about giving the GPU unhindered, high-bandwidth access to the system’s memory controllers.
In previous generations of AMD IGPs, before Llano came around, the Northbridge’s graphics processor had to jump through a series of hoops before gaining access to onboard memory, which is partially why 128MB of “SidePort” memory was sometimes added. The APU’s single-chip, all-in-one design eliminates many of these potential bottlenecks.
The graphics core within Trinity should look familiar since we last saw this layout back in the HD 6000 days. Instead of using the newer GCN Southern Islands architecture, Trinity’s SIMD engines rely on the slightly older Northern Islands design with its VLIW4 instruction set. The only exception is the new Video Codec Engine, which acts as a one-stop shop for hardware encoding via the GPU’s compute engine and provides a highly parallel, scalable pipeline for many high definition tasks. It can also provide additional benefits for transcoding and output tasks.
There are a number of differentiating factors between Trinity’s “HD 7000” series IGPs and those which graced Llano. First and foremost, the previous generation APUs housed an updated Redwood core (code-named Sumo) that used a VLIW5 design which would be considered outdated and inefficient by today’s standards. While dynamic power gating has been retained in the VLIW4 design housed within these new APUs, it has been updated to support lower idle power consumption and lower engine speeds. VLIW4 also brings with it a number of rendering pipeline improvements which should allow these new APUs to pull significantly ahead of their predecessors.
Unfortunately, in order to maintain a clear generational differentiation, AMD’s spin doctors have given these graphics cores an HD 7000 series name even though they contain an older architectural design. Nonetheless, they still come equipped with some serious graphics processing muscle. In an A10 APU, there will be 384 stream processing units (or cores), 24 texture units and 8 ROPs, mirroring the layout of an HD 66xx desktop part.
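For readers wondering how the 572 and 736 GFLOPS figures quoted earlier in this section relate to the hardware, the sketch below reproduces the usual peak-throughput arithmetic. Only the 384 stream processors and the two-module, four-core layout come from this article. The clock speeds, Llano’s stream processor count and the per-clock FLOP counts are assumptions based on commonly cited A10-5800K and A8-3850 specifications, and the function names are hypothetical.

```python
def gpu_gflops(stream_processors: int, clock_mhz: float, flops_per_sp_per_clock: int = 2) -> float:
    """Peak single-precision GPU throughput; one multiply-add counts as 2 FLOPs."""
    return stream_processors * flops_per_sp_per_clock * clock_mhz / 1000


def cpu_gflops(cores: int, clock_ghz: float, flops_per_core_per_clock: int = 8) -> float:
    """Peak single-precision CPU throughput; 8 FLOPs per core per clock is an assumption."""
    return cores * flops_per_core_per_clock * clock_ghz


# Trinity A10: 384 stream processors and 4 cores are from the article;
# the 800 MHz GPU clock and 3.8 GHz CPU clock are assumed.
trinity_total = gpu_gflops(384, 800) + cpu_gflops(4, 3.8)

# Llano: 400 stream processors at 600 MHz and 4 cores at 2.9 GHz are all assumed.
llano_total = gpu_gflops(400, 600) + cpu_gflops(4, 2.9)

print(f"Trinity (assumed clocks): {trinity_total:.1f} GFLOPS")
print(f"Llano (assumed clocks):   {llano_total:.1f} GFLOPS")
```

Under these assumptions the totals land at roughly 736 and 573 GFLOPS, which suggests the quoted figures are combined CPU and GPU theoretical peaks rather than measured performance.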