NVIDIA GeForce GTX 480 Review

Author: Michael "SKYMTL" Hoenig
Date: March 25, 2010
Product Name: NVIDIA GeForce GTX 480

A Closer Look at the Raster & PolyMorph Engines

In the last few pages you may have noticed mention of the PolyMorph and Raster engines, which are used for highly parallel geometry processing. NVIDIA has effectively grouped all of the fixed function stages into these two engines, which is one of the main reasons drastically improved geometry rendering is being touted for GF100 cards. In previous generations these functions sat outside of the core processing stages (SMs); NVIDIA has now brought them inside the core stages to ensure proper load balancing. This should help considerably with tessellated scenes, which feature extremely high triangle counts.

We should also note here and now that the GTX 400 series’ “core” clock numbers refer to the speed at which these fixed function stages run.

Within each PolyMorph engine there are five stages, running from Vertex Fetch to Stream Output, which process data from the Streaming Multiprocessor the engine is associated with. The data then gets output to the Raster Engine. Unlike past architectures, which ran all of these stages in a single pipeline, the GF100 architecture performs these calculations in parallel across its PolyMorph engines. According to our conversations with NVIDIA, this approach vastly improves triangle, tessellation, and Stream Out performance across a wide variety of applications.

To further speed up operations, data passes from one of the 16 PolyMorph engines to another through the on-die cache structure, which increases communication speed.

Once the PolyMorph engine has finished processing the data, that data is handed off to the Raster Engine, whose three pipeline stages pass it from one to the next. The Raster Engines are set up to work in parallel across the GPU for quick processing.

Both the PolyMorph and Raster engines are distributed throughout the architecture, which increases parallelism, but they are distributed differently from one another. In total, there are 16 PolyMorph engines, one incorporated into each of the SMs throughout the core, while the four Raster Engines are placed at a rate of one per GPC. This setup makes for four Graphics Processing Clusters, each of which behaves much like a dedicated, individual GPU within the core architecture, allowing for highly parallel geometry rendering.
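
To make the layout easier to picture, here is a rough sketch of the engine distribution described above. It is only an illustrative model, not anything from NVIDIA: the class and function names are our own, and the figure of four SMs per GPC is an assumption derived from dividing the 16 SMs evenly across the four GPCs.

```python
from dataclasses import dataclass
from typing import List, Tuple

NUM_GPCS = 4      # four Graphics Processing Clusters, as described in the text
SMS_PER_GPC = 4   # assumption: 16 SMs split evenly across the 4 GPCs


@dataclass
class StreamingMultiprocessor:
    sm_id: int
    polymorph_engines: int = 1  # one PolyMorph engine per SM


@dataclass
class GraphicsProcessingCluster:
    gpc_id: int
    sms: List[StreamingMultiprocessor]
    raster_engines: int = 1     # one Raster Engine per GPC


def build_gf100() -> List[GraphicsProcessingCluster]:
    """Build a hypothetical model of the full GF100 layout."""
    return [
        GraphicsProcessingCluster(
            gpc_id=g,
            sms=[
                StreamingMultiprocessor(sm_id=g * SMS_PER_GPC + s)
                for s in range(SMS_PER_GPC)
            ],
        )
        for g in range(NUM_GPCS)
    ]


def count_engines(gpcs: List[GraphicsProcessingCluster]) -> Tuple[int, int]:
    """Return (total PolyMorph engines, total Raster Engines)."""
    polymorph = sum(sm.polymorph_engines for gpc in gpcs for sm in gpc.sms)
    raster = sum(gpc.raster_engines for gpc in gpcs)
    return polymorph, raster


if __name__ == "__main__":
    polymorph, raster = count_engines(build_gf100())
    print(f"PolyMorph engines: {polymorph}, Raster Engines: {raster}")
```

Counting the engines this way gives the totals quoted above: 16 PolyMorph engines and four Raster Engines, with each Raster Engine fed by the PolyMorph engines of the SMs in its own GPC.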

Now that we have looked at the finer details of this architecture, it’s time to see how it all translates into geometry and texture rendering. In the following pages we take a look at how the new architecture works to deliver optimal performance in a DX11 environment.
