Before Qualcomm could open this year’s Consumer Electronics Show with its keynote, NVIDIA had taken the stage at a resort down the street to pre-empt Qualcomm’s announcement of a new Snapdragon System on a Chip with a refresh of its SoC – the Tegra 4.
As benchmarks released on Sunday have demonstrated, the Tegra 4 is fast. Really fast. It blows the Apple A6-powered iPhone 5 out of the water, posting almost twice the score in performance tests.
Late Sunday NVIDIA released a series of whitepapers breaking down the architecture of the Tegra 4, giving readers an intimate technical view into the SoC.
Let’s take a look at what makes the Tegra 4 tick.
Two Tegras, One Architecture
Like the previous iterations of the Tegra family, NVIDIA chose to license a CPU design from ARM rather than build its own chip from the ground up. Tegra 4 was released in two variants: the Tegra 4 (Cortex-A15) and Tegra 4i (Cortex-A9 R4). The latter of the two has an integrated LTE baseband from Icera.
It should be noted that the Tegra 4’s graphics architecture looks very similar on paper to that of the NV40, one of NVIDIA’s older desktop GPUs, but because the two systems are used so differently the comparisons end there.
Tegra 4 has four 1.9 GHz CPU cores, compared to Tegra 3’s four 40nm Cortex-A9s clocked at 1.2 GHz. The fifth core, the ‘companion core’, runs somewhere between 700 and 800 MHz. Tegra 4’s GPU cores operate at a max clock speed of 672 MHz, up from the 520 MHz ceiling found in Tegra 3.
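To put those clock figures in perspective, here is a minimal Python sketch that works out the relative uplift from Tegra 3 to Tegra 4. It uses only the main-core CPU and GPU clocks quoted above; the names `CLOCKS_MHZ`, `percent_increase` and `clock_uplift` are made up for this illustration, and a higher clock on its own says nothing definite about real-world performance.

```python
# Clock speeds as quoted in the article, in MHz.
CLOCKS_MHZ = {
    "cpu": {"tegra3": 1200, "tegra4": 1900},
    "gpu": {"tegra3": 520, "tegra4": 672},
}


def percent_increase(old, new):
    """Return the relative increase from old to new, in percent."""
    return (new - old) / old * 100


def clock_uplift(clocks=CLOCKS_MHZ):
    """Map each part (cpu, gpu) to its clock uplift, rounded to 0.1%."""
    return {
        part: round(percent_increase(speeds["tegra3"], speeds["tegra4"]), 1)
        for part, speeds in clocks.items()
    }


if __name__ == "__main__":
    for part, pct in clock_uplift().items():
        print(f"{part.upper()} clock: +{pct}%")
```

On these numbers the CPU clock rises by a little under 60% and the GPU clock by a little under 30%.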
NVIDIA was able to reduce the Tegra 4’s power requirements by up to 40%, not by increasing the number of CPU cores in the chipset but by adding more (power-efficient) GPU cores.
Comparing the Tegra 4 chip to its predecessors, one can see the smart evolution of the architecture. With the Tegra 2, the GPU was laid out as eight cores split between four pixel cores and four vertex cores, grouped into one pixel unit and one vertex unit. The Tegra 3 chipset was a study in incrementalism for vertex processing: while NVIDIA increased the number of CPU cores, it kept Tegra 2’s single block of four vertex cores while doubling the pixel shader cores to eight across two pixel units. Jumping forward to Tegra 4, we see that it has six vertex units (five were added) and two more pixel units for a total of four, all while increasing the number of cores per pixel unit.
In total, the Tegra 4 has 72 cores split between 48 pixel shaders and 24 vertex units. Diagrams in the whitepapers provided by NVIDIA show that the Tegra 4 is capable of outputting four pixels per pipe.
Tegra 4’s LTE-capable little brother, the Tegra 4i, has 60 cores split between 48 pixel shaders and 12 vertex shaders. The 4i’s Cortex-A9 R4 CPU is clocked at 2.3 GHz. It should be noted that substantial die space on the Tegra 4i is taken up by the i500 modem.
The Tegra 4i does have some other noticeable downgrades from the Tegra 4, as putting an LTE baseband on the chip comes at a cost: the six vertex units become three and the four pixel pipes become two. In addition, the memory interface is a single 32-bit channel rather than a dual-channel design.
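The unit and core counts above can be cross-checked with a little arithmetic. The Python sketch below records the figures quoted in this article for both chips and derives the cores per unit from them. The per-unit values are computed here, not taken from NVIDIA’s whitepapers, and the `GpuLayout` class and its field names are invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuLayout:
    """GPU unit and core counts as quoted in the article."""
    name: str
    vertex_units: int
    vertex_cores: int
    pixel_units: int
    pixel_cores: int

    @property
    def total_cores(self):
        return self.vertex_cores + self.pixel_cores

    @property
    def cores_per_vertex_unit(self):
        # Derived value: assumes cores are spread evenly across units.
        return self.vertex_cores // self.vertex_units

    @property
    def cores_per_pixel_unit(self):
        # Derived value: assumes cores are spread evenly across units.
        return self.pixel_cores // self.pixel_units


TEGRA_4 = GpuLayout("Tegra 4", vertex_units=6, vertex_cores=24,
                    pixel_units=4, pixel_cores=48)
TEGRA_4I = GpuLayout("Tegra 4i", vertex_units=3, vertex_cores=12,
                     pixel_units=2, pixel_cores=48)

if __name__ == "__main__":
    for chip in (TEGRA_4, TEGRA_4I):
        print(f"{chip.name}: {chip.total_cores} cores, "
              f"{chip.cores_per_vertex_unit} per vertex unit, "
              f"{chip.cores_per_pixel_unit} per pixel unit")
```

If the cores are spread evenly, the figures imply that the 4i keeps the same pixel shader count as the Tegra 4 by packing twice as many cores into each of its two pixel pipes, while its vertex hardware is simply halved.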
DirectX and OpenGL
This is where some will be disappointed: the Tegra 4 isn’t quite OpenGL ES 3 compatible. It’s just a few features shy of making the leap from ES 2 to ES 3, namely FP32 precision and ETC2 RGB compression.
In DirectX land, the Tegra 4 rests at the DirectX 9_1 feature level.
To compensate for these lower feature levels, NVIDIA does have a number of tricks up its sleeve with anti-aliasing: it has full 2x and 4x MSAA support with colour and z compression.
King of the Smartphones?
By any metric, Tegra 3 was a highly successful SoC. However, Tegra 3’s success came only from tablet sales (largely the Nexus 7); the chip was virtually non-existent in the smartphone world. With the Tegra 4i, NVIDIA has a fighting chance of gaining a foothold in the smartphone game with a competitive (but admittedly older) ARM design, the Cortex-A9 R4, and an integrated LTE baseband.