
AMD Mullins & Beema Mobile APUs Preview

SKYMTL

HardwareCanuck Review Editor
Staff member
Joined
Feb 26, 2007
Messages
12,840
Location
Montreal
With the mobile market expanding at a rapid pace, the blurring of lines between different segments has become commonplace. In order to cope with these changing conditions, AMD has been evolving their platforms accordingly. Last year we saw the Richland APUs make their way into a number of successful mobile platforms while the next generation APUs, code named Kaveri, showed up in the desktop space.

Low power versions of Kaveri for the notebook and ultra portable markets, which would compete against Intel’s Haswell offerings, have been conspicuous by their absence, so AMD’s focus is now upon the low power and mainstream segments. This is where two new products will be introduced, code named Beema and Mullins.

AMD-BEEMA-MULLINS-1.PNG

These new APUs are actually part of an inter-generational lineup of broadly targeted architectures that date back to 2011, when AMD’s Heterogeneous System Architecture approach was still in its infancy. Back then, Brazos’ (along with its 2.0 stepping) Bobcat architecture and the associated Desna, Zacate and Ontario processors brought integrated CPU / GPU devices to the low power mobile market. Brazos was a success on some levels but ultimately proved itself to be a proof of concept rather than an outstanding seller since only a handful of design wins were ever associated with it.

In many ways Temash and Kabini represented a giant step forward for the APU scene when they were introduced last year. By successfully combining additional HSA features with GCN graphics, all of the I/O functions and four cores, AMD created a Jaguar-based design that proved to be a serious contender. In modified form Jaguar even went on to feature prominently in the PlayStation 4 and Xbox One, becoming an overnight hit in consoles, but in notebooks and tablets it ultimately failed to win much market share from Intel.

AMD is hoping Mullins and Beema, their third generation low power and mainstream APUs, will finally make some major inroads within their intended niches. With mobile versions of Kaveri now delayed past the first quarter of this year, they’re AMD’s best hope for making a dent in Intel’s dominant position. The focus this time around is to refine the Jaguar architecture to the point where it delivers enhanced performance per watt despite remaining on the 28nm manufacturing process. The result is a “new” microarchitecture called Puma+.

AMD-BEEMA-MULLINS-2.PNG

Sitting at the top of AMD’s mainstream offerings are the new Beema A-series APUs which effectively replace the Kabini processors of yesteryear. The A6-6310 sits atop the product stack and improves upon the outgoing A6-5200 by operating at higher maximum CPU and GPU frequencies while supporting bandwidth-enhancing DDR3L-1866 DRAM. Meanwhile, since the core architecture hasn’t changed, the GCN-based HD 8000-series graphics haven’t been modified (though they have been rebranded for clarity’s sake) and still receive 128 processing cores backstopped by eight TMUs and a quartet of ROPs.

The A4-6210 boasts roughly the same specifications and four cores but hits a lower cost through reduced frequencies and utilizing a DDR3L-1600 memory interface. TDP for this part will remain at 15W, much like the APUs it will replace.

You will notice both of these APUs operate at substantially higher frequencies than their outgoing compatriots while operating at a lower or similar 15W TDP. We’ll detail how this was accomplished in the Architecture section but to see an approximate 30% speedup without negatively impacting power needs is extremely impressive. This will become a recurring theme since improving performance per watt was one of the primary goals of AMD’s engineers when creating Puma+ for mainstream slim and light notebooks.

AMD-BEEMA-MULLINS-3.PNG

The low cost mainstream segment has received a facelift as well with new E-Series APUs. The E2-6110 is a quad core processor that boosts CPU and GPU speeds by about 20% over its predecessor, the E2-3800, while maintaining the same power requirements. Meanwhile, AMD’s E1-6010 is an interesting dual core combination that operates at 1.35GHz and 350MHz on the CPU and GPU respectively, or about 50MHz lower than the E1-2500, but its TDP is an incredible 10W. Both of these APUs once again receive rebranded GCN graphics cores with the R2 series designation.

For the time being the Beema A-series and E-series are being set up primarily as competition against Intel’s mobile Haswell-U Pentium CPUs and the Bay Trail M powered Pentium N and Celeron N SKUs. This puts them in line to become prime candidates for convertible tablets, mainstream notebooks and ultra portables. However, later this year AMD will find themselves competing against Intel’s 14nm Broadwell architecture, which may pose an issue for APUs that are based on an older 28nm manufacturing process, despite their advances in performance per watt.

AMD-BEEMA-MULLINS-5.PNG

Mullins is the current runt of AMD’s APU litter but that doesn’t mean it isn’t fully capable of delivering a relatively high degree of performance. Aimed directly at the low power market, it competes against Intel’s Core i5 / i3 Haswell “Y” series and their latest Atoms powered by Bay Trail T, and is designed for tablets and lower end ultra portables. The only minor hiccup may be AMD’s naming scheme, which adds the odd “Micro” moniker while spanning a broad A10 to E1 product designation range.

As a direct replacement for Temash, Mullins seems to succeed beyond everyone’s wildest dreams since APUs in this segment arguably have the most to gain from AMD’s new architectural refinements. It offers an astonishing 30% to 60% more performance while lowering TDP to between 4W and 4.5W. Take the A10 Micro-6700T and A4 Micro-6400T, both of which provide substantial frequency benefits over their predecessors, support DDR3L-1333 memory and only require about 4.5W of power. Their sustained power envelope is actually closer to 3W, making them prime candidates for fanless tablets and set top boxes.

AMD’s E-series designation makes a comeback here as well with the E1-6200T. This APU is actually a replacement for the A4-1200, though it will effectively outcompete the A4-1250 in the majority of applications. Expect to see it used in entry-level, very basic small form factor notebooks and low end Windows-based tablets.

AMD-BEEMA-MULLINS-4.PNG

In order to properly support these new APUs, AMD has assembled a robust backbone of ISV partners with supporting software features. Many of these are carry-overs from the Elite Mobility feature set that was pioneered and offered for free with Richland and Trinity. Basically, everything seen here is built to take advantage of AMD’s GPU compute algorithms for accelerated performance. For example, Quick Stream is a quality of service technology that manages internet bandwidth to prioritize high bandwidth tasks so they are properly buffered ahead of secondary requests.
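The prioritization idea behind a quality of service feature like Quick Stream can be sketched in a few lines. This is only a generic priority-queue illustration; the class name, priority numbers and packet labels are all hypothetical, and nothing here reflects how Quick Stream is actually implemented.

```python
import heapq
import itertools


class PriorityLink:
    """Toy packet scheduler: lower priority number is sent first, FIFO within a class."""

    def __init__(self):
        self._heap = []
        # Monotonic counter keeps packets of equal priority in arrival order.
        self._order = itertools.count()

    def enqueue(self, priority, packet):
        heapq.heappush(self._heap, (priority, next(self._order), packet))

    def drain(self, budget):
        """Send up to `budget` packets, most important class first."""
        sent = []
        while self._heap and len(sent) < budget:
            _, _, packet = heapq.heappop(self._heap)
            sent.append(packet)
        return sent


link = PriorityLink()
link.enqueue(1, "update-1")   # background download (hypothetical label)
link.enqueue(0, "video-1")    # streaming video (hypothetical label)
link.enqueue(1, "update-2")
link.enqueue(0, "video-2")
print(link.drain(3))
```

When the link only has room for three packets, both video packets go out before any of the background traffic, which is the buffering behaviour the paragraph above describes.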

One interesting addition is a partnership with Bluestacks which offers a virtualized Android environment that sits atop Windows. This allows for file sharing between Android and Windows along with a number of other interesting benefits.
 

The Puma+ Architecture; Refining Jaguar



As an evolutionary architecture, Puma+ seems to have everything it takes to become a success. While Jaguar ushered in a new core design that introduced an integrated controller hub and onboard GCN graphics, AMD’s latest steps have distilled those design benchmarks into a more refined product. As a result, Puma+’s primary goals were to reduce power, offer better platform security and enhance performance without adding to TDP. That’s more easily said than done but those targets were achieved in a variety of ways.

AMD-BEEMA-MULLINS-6.PNG

From a core design perspective there’s actually quite a bit to distinguish Puma+ from past Jaguar iterations, though it is functionally identical to allow for broader platform compatibility with existing notebook designs. However, the amount of block movement has been substantial despite the fact that the majority of die space is still used for GPU and CPU cores. For example, the VCE encoder is located close to the display block to enable a high speed low latency interface between the two which results in power savings through reduced computational overhead. The same goes for the proximity between the GPU cores and the video decode unit they accelerate.

One of the major additions this time around is the integration of a dedicated Platform Security Processor or PSP. Based on an ARM Cortex-A5 design, it grants Beema and Mullins a comprehensive, hardware-level security framework, which is a great addition considering the prevalence of vulnerabilities like Heartbleed in today’s computational ecosystem. We’ll dive a bit deeper into this substantial step forward later in this article.

In the memory subsystem there is a new voltage mode logic design which allows for higher native bandwidth while using the same power envelope as lower performance modules. Due to this, certain Puma+ based APUs can offer expanded memory speeds without exceeding a given TDP target. This is a key addition for low power fanless tablets.

AMD-BEEMA-MULLINS-7.PNG

While the high level CPU architecture has remained largely untouched, there has been a major push towards power optimization by streamlining the way Puma+ handles processing requests. The core enhancements largely center on scheduling, which has been sped up in most workloads, and there’s now a clearer path through various computational branches.

AMD hasn’t described all the improvements built into Puma+ but they have been able to balance performance per watt to the point of reducing leakage by roughly 19% on the CPU cores alone. This has allowed both Mullins and Beema to reach significantly higher frequencies than their predecessors.

AMD-BEEMA-MULLINS-8.PNG

The GCN-based graphics cores have also been reworked through the use of a more refined 28nm manufacturing process and several other minor evolutionary steps. However, there really wasn’t much to address here since GCN’s communication pathways have already been tuned for a balance of power and performance. Nonetheless, AMD’s engineers have been able to squeeze a 38% leakage reduction out of this existing architecture.

One major feature set that’s missing from Puma+ is AMD’s expanded HSA implementation from Kaveri. Like Jaguar before it, Puma+ doesn’t support technologies like Heterogeneous Queuing (hQ) or heterogeneous Uniform Memory Access (hUMA). We’re expecting those to be rolled into next generation mainstream and low power parts.

Puma+’s changes are actually quite minimal from a microarchitecture standpoint simply because many of its performance and frequency improvements have been derived through finely-tuned power management routines. That’s what we’ll deal with on the next page.
 

Power Management Becomes System Aware



Eking a bit more performance out of an existing core design without boosting TDP is a top priority for AMD. This shouldn’t come as any surprise considering they can’t really hope to match Intel from a manufacturing process standpoint. Without many drastic changes to Jaguar’s core architecture, most of Puma+’s improvements are derived from a number of innovative upgrades to the APU’s power management routines.

The first step in what ended up being a multi-dimensional approach to addressing thermal characteristics and battery life was for the APU to become aware of its environment rather than just its own internal temperatures. This has led to the development of a broad system-aware approach to managing clock frequencies in relation to the platform’s overall characteristics.

AMD-BEEMA-MULLINS-10.PNG

When a system is installed into a standard notebook chassis, managing temperatures is relatively easy since heat can be effectively dispersed by using a larger heatsink and spreading any excess buildup over the external skin. Those characteristics change in a big way when moving to the thin and light market where Beema and Mullins ply their wares. Without vertical space for a large heatsink and sometimes a lack of an internal fan, the APU’s heat is transferred almost directly to the device’s chassis.

In the past, an APU’s maximum T-Junction or the temperature at which it began to throttle was set based upon internal parameters and typically topped out at 100°C. Now in the low power space the limiter isn’t the silicon but rather the platform itself. For example, if enough power is fed into an APU for it to hit the silicon’s 100°C, that heat has to go somewhere and would result in extremely hot exterior case temperatures when used in an ultra portable or tablet.

For sustained operation in these confined environments, AMD has determined a silicon junction temperature of about 60°C will result in manageable external temperatures. Therefore chassis sensors and a number of predictive algorithms were instituted to create Skin Temperature Aware Power Management. STAPM monitors the external chassis temperatures in relation to the APU’s current and upcoming thermal output in an effort to balance performance and user comfort. As you can imagine, this has a drastic impact upon frequency algorithms.

AMD-BEEMA-MULLINS-11.PNG

Since it takes a while for the device’s skin to actually reach high temperatures, there is an opportunity for the APU to flex some of its clock speed headroom in temporary bursts. This is where AMD’s new boost strategy gets factored into the equation, and many of Mullins and Beema’s improvements over the previous generation are derived from this formula.

When skin temperature is cooler, more power is pumped into the APU for a limited amount of time (typically a maximum of 10 minutes of sustained load) and CPU / GPU clock speeds are increased significantly. Once the chassis hits its limit the APU dials down to a lower power state but still delivers more than adequate performance.
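The boost-then-throttle behaviour described above can be illustrated with a toy simulation. This is a minimal sketch and not AMD’s STAPM algorithm: it assumes a simple first-order thermal model for the chassis, and every constant (power levels, skin limit, degrees per watt, time constant) is a made-up value chosen only to make the shape of the curve visible.

```python
def simulate_stapm(seconds, boost_w=8.0, sustained_w=4.5, skin_limit_c=40.0,
                   ambient_c=25.0, c_per_watt=3.0, tau_s=300.0):
    """Toy model: return (power, skin temperature) for each second of constant load.

    All constants are illustrative assumptions, not AMD figures.
    """
    skin_c = ambient_c
    trace = []
    for _ in range(seconds):
        # Boost while the chassis is below its comfort limit, otherwise fall back.
        power_w = boost_w if skin_c < skin_limit_c else sustained_w
        # First-order lag: skin temperature drifts toward ambient + power * resistance.
        target_c = ambient_c + power_w * c_per_watt
        skin_c += (target_c - skin_c) / tau_s
        trace.append((power_w, skin_c))
    return trace


trace = simulate_stapm(20 * 60)
first_throttle_s = next(i for i, (power_w, _) in enumerate(trace) if power_w < 8.0)
print(f"boost held for about {first_throttle_s} s before the first step down")
```

With these invented constants the model holds its boost state for a little under five minutes before the skin limit forces the first step down, after which it hovers around the limit. A real controller would presumably add prediction and hysteresis rather than switching on a bare threshold as this sketch does.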

Some of you may be wondering how this will factor into real-world performance and that’s a bit hard to determine. Due to the short nature of most benchmarks (read: 3DMark, PCMark, BenchmarkCL, etc.), we will likely see excellent results but actual usage scenarios may be affected differently. AMD’s STAPM approach seems to be well tailored for most highly demanding everyday tasks like photo editing, webpage loading or YouTube video buffering which require relatively short bursts of application-specific bandwidth. However, we can’t help feeling that it may pose an issue for gaming, which imposes a constant high load scenario rather than requiring a short, limited adrenaline boost.

AMD-BEEMA-MULLINS-12.PNG

With STAPM enabled, it may sound like battery life would be reduced but the true situation is the exact opposite. Through AMD’s energy aware boost feature, the APU doesn’t waste any excess power by accelerating secondary tasks. Rather, battery life is actually improved since tasks are done quicker, which decreases core runtime and allows the APU to enter an idle state sooner than it normally would.

Higher frequencies do require more voltage, thus expending additional energy but the additional battery life is also attained through powering down other components like RAM and I/O sections. This balances out the amount of additional power the APU will need for its expanded boost envelope.
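The trade-off in the last two paragraphs comes down to simple energy arithmetic: energy is power multiplied by time, so a faster, hungrier burst can still cost less overall if the chip then spends the remainder of the window idle. The sketch below works through that with hypothetical wattages and durations; none of the numbers come from AMD.

```python
def task_energy_j(active_w, active_s, idle_w, window_s):
    """Energy in joules over a fixed window: active phase, then idle for the remainder."""
    if active_s > window_s:
        raise ValueError("task does not finish inside the window")
    return active_w * active_s + idle_w * (window_s - active_s)


# Hypothetical figures: the same job over a 10 second window.
baseline = task_energy_j(active_w=3.0, active_s=10.0, idle_w=0.5, window_s=10.0)
boosted = task_energy_j(active_w=4.0, active_s=6.0, idle_w=0.5, window_s=10.0)
too_hot = task_energy_j(active_w=6.0, active_s=6.0, idle_w=0.5, window_s=10.0)

print(baseline, boosted, too_hot)
```

In this example the boosted run uses 26 J against 30 J for the baseline, while the third case (38 J) shows that boosting only pays off when the extra power is modest relative to the time saved, which is why the boost has to be energy aware rather than unconditional.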

AMD-BEEMA-MULLINS-13.PNG

Intelligent Boost Control is another feature AMD has rolled into Mullins and Beema. At a basic level it uses heuristics to monitor workloads and determine whether or not higher frequencies would be needed to complete a specific request faster. This means if an application doesn’t require extreme CPU or GPU speeds, battery life and internal TDP targets are maintained even if there is thermal and power headroom. Once again this could have a dramatic effect upon battery life since power can be conserved if an application doesn’t necessarily demand the APU’s full attention.
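As a rough illustration of that decision, the sketch below only grants a boost when a workload looks like it would actually benefit from higher clocks. AMD has not published the heuristics involved, so the two metrics, the threshold values and every name here are assumptions made purely for the example.

```python
from dataclasses import dataclass


@dataclass
class WorkloadSample:
    busy_fraction: float   # share of the interval the cores were busy (0..1)
    stall_fraction: float  # share of busy time spent waiting on memory or I/O (0..1)


def should_boost(sample, has_thermal_headroom, min_busy=0.7, max_stall=0.4):
    """Hypothetical heuristic: boost only if the work is frequency sensitive."""
    if not has_thermal_headroom:
        return False
    # A mostly idle or mostly stalled workload gains little from higher clocks,
    # so the extra power is better left unspent even when headroom exists.
    return sample.busy_fraction >= min_busy and sample.stall_fraction <= max_stall


print(should_boost(WorkloadSample(0.95, 0.10), has_thermal_headroom=True))
print(should_boost(WorkloadSample(0.95, 0.80), has_thermal_headroom=True))
```

The first sample is compute bound and gets its boost; the second is just as busy but spends most of its time stalled, so the sketch declines to raise clocks despite the available headroom.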

AMD-BEEMA-MULLINS-14.PNG

All of the technologies listed above have been distilled down into a basic Battery Boost / Energy Aware Boost formula. Through it, AMD has been able to address both serial and parallel processing requests with a lower power envelope, less battery drain and better benchmark scores in some scenarios. Mullins and Beema are able to run within just the right power targets to get the job done quicker, while maintaining the battery life expected of today’s devices. Provided this works as promised, it’s hard to argue against what’s been accomplished here.

AMD-BEEMA-MULLINS-9.PNG

So what does this mean for the power consumption of these new SoCs? At times, the differences are quite dramatic with substantially lower requirements in standard everyday tasks.

With Intel moving quickly towards 14nm, AMD needs to find power savings through innovative software and features rather than manufacturing process optimizations. This means the items we described above are only the tip of a very large iceberg. The evolution will continue into 2015 with additional steps like integrated voltage regulation on-die (already a component of Haswell), frame power gating for display output tasks and a number of other advances. Will this be enough for AMD to finally achieve a significant number of design wins in what is quickly becoming a key segment? Only time will tell.
 

AMD & ARM Partner for Better Security



In a time when bugs like Heartbleed are gaining international attention, our data is stored in increasingly insecure cloud-based services and online shopping is the norm, security systems seem to be struggling to keep up. Obviously software-based solutions have more holes than Swiss cheese but where do we go from here? AMD (and by extension, Intel) has turned its focus to providing a true hardware-based security environment for local data security and access to cloud-based personal storage.

AMD-BEEMA-MULLINS-22.PNG

With Mullins and Beema, AMD is actually taking a multi-pronged approach to providing a next generation security ecosystem. This means providing the building blocks for a Trusted Execution Environment that boasts secure boot capabilities and cryptographic acceleration. Secure web transactions like banking and online shopping, anti malware and the possibility for biometric security across interoperable protocols have all been taken into account with this new architecture.

In order to achieve these goals, AMD deftly avoided creating a proprietary solution and instead turned towards ARM’s TrustZone which is an open standards-based security architecture. This allowed them to start off with a solid foundation that’s supported by a large number of partners and guarantees wide-ranging support from day one. It also creates a common interface across multiple security providers for smartphones, tablets, notebooks and other devices.

AMD-BEEMA-MULLINS-21.PNG

Living at the heart of AMD’s TrustZone implementation is a 32-bit ARM Cortex-A5 primary processor with a secondary cryptographic co-processor, both of which are integrated directly into the APU die package. The SoC has direct access to the system’s various resources and fully supports cryptographic acceleration. All of its operations are kept transparent to the end user since acceleration is accomplished without putting any load on the system’s other resources and with barely any drain on the battery.

AMD-BEEMA-MULLINS-20.PNG

Within this environment, operations are broken into standard and secure “zones”, the latter of which is handled by the ARM SoC and TrustZone. The first step of this approach is to create an execution environment which secures the system at boot and protects all data rather than just the device. More importantly, this setup’s transparent protection layer can still interact with Windows, though it walls off security-related tasks into an effective quarantine zone.
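To give a feel for what securing the system at boot involves, here is a heavily simplified digest check over a chain of boot stages. It is not how AMD’s PSP or TrustZone actually works: a real secure boot flow relies on signatures rooted in hardware-protected keys, whereas this sketch merely compares SHA-256 digests against a trusted list, and the stage names and images are invented.

```python
import hashlib
import hmac


def verify_boot_chain(stages, trusted_digests):
    """Return the name of the first stage whose SHA-256 digest is not trusted, or None."""
    for name, image in stages:
        digest = hashlib.sha256(image).hexdigest()
        expected = trusted_digests.get(name)
        # compare_digest avoids leaking where a mismatch occurs through timing.
        if expected is None or not hmac.compare_digest(digest, expected):
            return name
    return None


# Hypothetical boot images and the digests recorded for them when they were trusted.
stages = [("bootloader", b"stage-1 image"), ("kernel", b"stage-2 image")]
trusted = {name: hashlib.sha256(image).hexdigest() for name, image in stages}

print(verify_boot_chain(stages, trusted))
tampered = [stages[0], ("kernel", b"stage-2 image with malware")]
print(verify_boot_chain(tampered, trusted))
```

The untouched chain passes, while the modified kernel image is flagged before it would be handed control. The value of doing this on a separate security processor is that the check itself sits outside the reach of the operating system it is protecting.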

The setup is certainly an interesting one and with partners already supporting the standard, AMD seems to have begun a comprehensive approach towards protecting users from threats.
 

Performance Comparisons & Initial Thoughts



Other than a few announcements from OEM partners, there really isn’t much information about the performance of Mullins and Beema within actual systems. While we didn’t receive a sample, AMD does have some claims they wanted to put forward, many of which show each respective architecture competing well against their Intel counterparts. However, many of the following in-house results from AMD are likely achieved within the architecture’s first few minutes of Boost time when the chassis’ external temperature hasn’t hit a point where the APU will throttle downwards. Therefore, they represent a best case scenario.

AMD-BEEMA-MULLINS-30.PNG

In the low power range, Mullins improves upon Temash from nearly every conceivable standpoint with 4.5W parts easily able to match or exceed the previous generation’s achievements. All of this has been accomplished while consuming significantly less power, though AMD hasn’t really stated which Temash SKUs they are comparing here.

AMD-BEEMA-MULLINS-31.PNG

The Micro-6000 series is also supposed to go head to head against Intel’s Haswell Y and Bay Trail T while costing OEMs quite a bit less. The higher end i5-4200Y will likely put the screws to Mullins while the i3-4010Y comes very close from a number of standpoints but, according to AMD, their architecture will offer more versatility alongside native hardware-based security protocols. One interesting comparison is TDP; where AMD’s chips are specified at 4.5W, Intel’s hit 6W.

The final cross comparison is between AMD’s E1 Micro and the lowly Bay Trail T Atom which really can’t keep up. This is actually an optimal situation for Mullins but we also have to remember that the chart above compares so-called “mainstream” Intel offerings (other than the Z3770, that is) with low power APUs.

AMD-BEEMA-MULLINS-32.PNG

Within AMD’s own mainstream Beema lineup, things once again look quite promising with their newest APUs offering across-the-board improvements over Kabini. Considering the clock speeds Beema is running, this shouldn’t come as any surprise.

AMD-BEEMA-MULLINS-33.PNG

AMD’s last comparative chart is somewhat odd to say the least. On the left we have mainstream 15W APUs from the A6, A4 and E-series families lining up against sub-10W tablet-focused offerings from Intel. The Bay Trail M CPUs are rated at just 4.3W and they play in a completely different league. It’s like comparing apples to oranges and highlights why we were so hesitant about posting these results.


With Mullins and Beema, AMD seems to have launched a pair of architectures which address the TDP and performance concerns of Temash and Kabini. However, as we have seen time and again, talk is cheap. AMD desperately needs design wins, and not within outlying secondary SKUs from the likes of Lenovo, ASUS and MSI. They need killer notebooks, tablets and other devices which can highlight the APU’s strengths while minimizing its weaknesses. PS4 and Xbox One highlighted the Jaguar architecture’s benefits but now it’s up to AMD to sell the evolutionary Puma+ to partners and users alike in the hope that supporting devices will finally reach beyond the lowest common denominator budget segment they’ve been residing in for the last few years.
 
