The GF100’s Modular Architecture Scaling

When the GT200 series was released, very little information was presented about how the design could be scaled down to create lower-end cards for a wide variety of price brackets. Indeed, the GT200 proved extremely hard to scale due to its inherent design properties, which is why we saw the G92 series of cards stay around for much longer than was originally planned. NVIDIA was lambasted for their constant product renames but, considering the limitations of their architecture at the time, there really wasn’t much they could do.

Lessons were learned the hard way, and the GF100 features a highly modular design which can be scaled down from its high-end 512SP version to a wide range of smaller, less expensive derivatives. In this section we take a look at how these lower-end products can be designed.

The GPCs: Where it All Starts

Before we begin, it is necessary to take a closer look at one of the four GPCs that make up a fully-endowed GF100.

![]()

By now you should all remember that the Graphics Processing Cluster is the heart of the GF100. It encompasses a quartet of Streaming Multiprocessors and a dedicated Raster Engine. Each of the SMs consists of 32 CUDA cores, four texture units, dedicated cache and a PolyMorph Engine for fixed function calculations. This means each GPC houses 128 cores and 16 texture units. According to NVIDIA, they have the option to eliminate these GPCs as needed to create other products, but they are also able to do additional fine tuning as we outline below.

![]()

Within the GPC are four Streaming Multiprocessors, and these too can be eliminated one by one to decrease the die size and create products at lower price points. As you eliminate each SM, 32 cores and 4 texture units are removed as well.
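The per-SM arithmetic above can be sketched in a few lines of Python. This is only an illustration of the numbers quoted in this section (32 cores and 4 texture units per SM, four SMs per GPC, four GPCs per chip); the function name `shader_config` and the constants are our own and do not come from NVIDIA.

```python
# Figures quoted in the article for a full GF100.
CORES_PER_SM = 32
TEXTURE_UNITS_PER_SM = 4
SMS_PER_GPC = 4
GPCS_PER_CHIP = 4
MAX_SMS = SMS_PER_GPC * GPCS_PER_CHIP  # 16 SMs in the full 512-core part


def shader_config(enabled_sms):
    """Return (cuda_cores, texture_units) for a hypothetical GF100 derivative.

    `enabled_sms` is the number of SMs left active, from 1 to MAX_SMS.
    """
    if not 1 <= enabled_sms <= MAX_SMS:
        raise ValueError(f"enabled_sms must be between 1 and {MAX_SMS}")
    return enabled_sms * CORES_PER_SM, enabled_sms * TEXTURE_UNITS_PER_SM


if __name__ == "__main__":
    # Walk down from the full chip to a single SM.
    for sms in range(MAX_SMS, 0, -1):
        cores, tmus = shader_config(sms)
        print(f"{sms:2d} SMs: {cores:3d} cores, {tmus:2d} texture units")
```

For example, `shader_config(15)` gives `(480, 60)`, which is the full chip with a single SM removed.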
It is also worth mentioning that due to the load balancing architecture used in the GF100, it’s possible to eliminate multiple SMs from a single GPC without impacting the Raster Engine’s parallel communication with the other engines. So in theory, one GPC can have anywhere from one to four SMs while all the other GPCs keep their full complement, with no penalty beyond the loss of the removed units themselves.

So what does this mean for actual specifications of GF100 cards aside from the 512 core version? The way we look at this, additional products would theoretically be able to range from 480 core, 60 texture unit high-end cards to 32 core, 4 TMU budget-oriented products. This is assuming NVIDIA sticks to the 32 cores per SM model they currently have.

Since we want to be as realistic as possible here, we expect NVIDIA to keep some separation between product ranges and release GF100-based cards with two, four or more SMs disabled. This could translate into products with 448 cores + 56 texture units, 384 + 48, 320 + 40, etc. for a wide range of possible solutions. The current GTX 480 has a single SM disabled.

ROP, Framebuffer and Cache Scaling

You may have noticed that we haven’t discussed ROPs and memory scaling yet, and that’s because these items scale independently from the GPCs.

![]()

Focusing in on the ROP, Memory and Cache array, we can see that while placed relatively far apart on the block diagram, they are closely related and as such they must be scaled together. In its fullest form, the GF100 has 48 ROP units arranged in six groups of eight, and each of these groups is served by 128KB of L2 cache for a total of 768KB. In addition, every ROP group has a dedicated 64-bit GDDR5 memory controller. This all translates into a pretty straightforward solution: once you eliminate a ROP group, you also have to eliminate a memory controller and 128KB of L2 cache.
![]()

Scaling of these three items happens in a linear fashion, as you can see in the chart above, since in the GF100 architecture you can’t have ROPs without an associated amount of L2 cache and memory interface, and vice versa. One way or another, the architecture can scale all the way down to 8 ROPs with a 64-bit memory interface.

Meanwhile, the memory on lower-end versions could scale in a linear fashion as well, since a 64-bit interface is eliminated with every group of ROPs that is removed. So the possibility of a 1.28GB, 320-bit card, a 1GB, 256-bit product and so on does exist, and the first of these has already appeared in the GTX 470’s specifications.
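The linked scaling of ROPs, L2 cache and memory interface can be sketched the same way. The per-group figures (8 ROPs, 128KB of L2, one 64-bit controller, six groups in total) come from the text above. The 256MB of memory per controller is our own assumption, inferred from the 1.28GB/320-bit and 1GB/256-bit examples; a real product could pair a controller with a different amount of memory. The function name `memory_config` is hypothetical.

```python
# Per-group figures quoted in the article.
ROPS_PER_GROUP = 8
L2_KB_PER_GROUP = 128
BUS_BITS_PER_GROUP = 64
MAX_GROUPS = 6

# Assumption: 256 MB per 64-bit controller, inferred from the
# 1.28GB/320-bit and 1GB/256-bit examples in the article.
MEMORY_MB_PER_GROUP = 256


def memory_config(groups):
    """Return the back-end resources for `groups` active ROP groups (1 to 6)."""
    if not 1 <= groups <= MAX_GROUPS:
        raise ValueError(f"groups must be between 1 and {MAX_GROUPS}")
    return {
        "rops": groups * ROPS_PER_GROUP,
        "l2_kb": groups * L2_KB_PER_GROUP,
        "bus_bits": groups * BUS_BITS_PER_GROUP,
        "memory_mb": groups * MEMORY_MB_PER_GROUP,
    }


if __name__ == "__main__":
    # Everything shrinks in lockstep as ROP groups are removed.
    for groups in range(MAX_GROUPS, 0, -1):
        cfg = memory_config(groups)
        print(f"{cfg['rops']:2d} ROPs, {cfg['l2_kb']:3d}KB L2, "
              f"{cfg['bus_bits']:3d}-bit bus, {cfg['memory_mb']}MB")
```

With five groups active, this yields 40 ROPs, 640KB of L2, a 320-bit bus and 1280MB of memory, matching the 1.28GB, 320-bit configuration mentioned above.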