ASUS GeForce GTX 465 1GB Review

Author: Michael "SKYMTL" Hoenig
Date: May 30, 2010
Product Name: ASUS GeForce GTX 465 1GB Voltage Tweak Edition

Efficiency Through Caching

Dedicated L1 and L2 caches bring two benefits: they help with GPGPU computing, and they can hold draw call data on-chip so it does not have to be passed off to the graphics card's memory. This is supposed to drastically streamline rendering efficiency, especially in geometry-heavy scenes.

Above we have an enlarged section of the cache and memory layout within each SM. To put things into perspective, each Streaming Multiprocessor (SM) has 64 KB of configurable on-chip memory that can be laid out in one of two ways: as 48 KB of shared memory with 16 KB of L1 cache, or as 16 KB of shared memory with 48 KB of L1 cache. When used for graphics processing as opposed to GPGPU functions, however, the SM will make use of the 16 KB L1 cache configuration. This L1 cache is supposed to help with access to the L2 cache as well as streamlining functions like stack operations and global loads / stores.
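To make the two layouts concrete, here is a minimal Python sketch of the split described above. The configuration names and the `split_for` helper are our own illustrative inventions, not NVIDIA terminology or a driver API; only the 64 KB total, the two splits and the graphics-mode behaviour come from the text.

```python
from typing import Dict, Tuple

# Per-SM on-chip memory as described in the article: 64 KB in total.
SM_ONCHIP_KB = 64

# (shared_memory_kb, l1_cache_kb). The key names are hypothetical labels.
CONFIGS: Dict[str, Tuple[int, int]] = {
    "prefer_shared": (48, 16),
    "prefer_l1": (16, 48),
}


def split_for(workload: str) -> Tuple[int, int]:
    """Return (shared_kb, l1_kb) for a workload label.

    "graphics" uses the 16 KB L1 layout, as the article states.
    The two compute labels are assumptions made for illustration only.
    """
    if workload in ("graphics", "compute_shared_heavy"):
        return CONFIGS["prefer_shared"]
    if workload == "compute_l1_heavy":
        return CONFIGS["prefer_l1"]
    raise ValueError(f"unknown workload label: {workload}")


# Both layouts must account for the full 64 KB.
for shared_kb, l1_kb in CONFIGS.values():
    assert shared_kb + l1_kb == SM_ONCHIP_KB
```

Calling `split_for("graphics")` returns the 48 KB shared / 16 KB L1 layout, which is the configuration the SM uses when rendering.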

In addition, each texture unit now has its own high-efficiency cache, which helps with rendering speed.

Through the L2 cache architecture, NVIDIA is able to keep most of the data for rendering functions like tessellation, shading and rasterizing on-die instead of going out to the framebuffer (DRAM), which would slow down the process. Caching amplifies the GPU's effective bandwidth and alleviates the memory bottlenecks that normally occur when doing multiple reads and writes to the framebuffer. In total, the GF100 has 768 KB of L2 cache which is dynamically load balanced for peak efficiency.
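The idea of bandwidth amplification can be illustrated with a simple back-of-the-envelope model. This is a sketch under our own assumptions: the hit rates and traffic figures are made-up inputs, not measured GF100 numbers, and real caches behave in far less uniform ways.

```python
def dram_traffic_gb(requested_gb: float, l2_hit_rate: float) -> float:
    """Traffic that still reaches DRAM after the L2 serves its share.

    requested_gb: total data the core asks for (hypothetical figure).
    l2_hit_rate:  fraction of requests served from L2, between 0 and 1.
    """
    if not 0.0 <= l2_hit_rate <= 1.0:
        raise ValueError("l2_hit_rate must be between 0 and 1")
    return requested_gb * (1.0 - l2_hit_rate)


def amplification(l2_hit_rate: float) -> float:
    """Effective bandwidth multiplier seen by the core: 1 / (1 - hit rate)."""
    if not 0.0 <= l2_hit_rate <= 1.0:
        raise ValueError("l2_hit_rate must be between 0 and 1")
    if l2_hit_rate == 1.0:
        return float("inf")  # nothing reaches DRAM at all
    return 1.0 / (1.0 - l2_hit_rate)


# Hypothetical example: 100 GB requested with three quarters served by L2.
remaining = dram_traffic_gb(100.0, 0.75)
factor = amplification(0.75)
```

In this toy model, a 75% hit rate leaves a quarter of the requested traffic for DRAM, so the core effectively sees four times the raw memory bandwidth.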

It is also possible for the L1 and L2 caches to handle loads and stores to memory and to pass data from engine to engine so nothing moves off chip. Unfortunately, this approach has a cost: doing geometry processing in a parallel and scalable way without using DRAM bandwidth takes up significant die area.

Compared with the new GF100, the previous architecture comes up short when it comes to caching. The GT200 only used cache for textures and featured a read-only L2 cache structure, whereas the new GPU’s L2 is read/write and caches everything from vertex data to textures to ROP data and nearly everything in between.

By contrast, with their Radeon HD 5000-series, ATI dumps all of the data from the geometry shaders to memory and then pulls it back into the core for rasterization before output. This causes a drop in efficiency and therefore performance. Meanwhile, as we discussed before, NVIDIA is able to keep these functions on-die in the cache without introducing memory latency into the equation or hogging bandwidth.
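The difference between the two approaches can be sketched as a simple traffic count. This is an illustrative model only: the function names, the `spilled_fraction` parameter and the example size are our assumptions, and neither vendor publishes figures in this form.

```python
def offchip_roundtrip_mb(geometry_mb: float) -> float:
    """DRAM traffic when geometry output is written out and read back.

    Every megabyte crosses the memory bus twice: one write, one read.
    """
    return 2.0 * geometry_mb


def ondie_traffic_mb(geometry_mb: float, spilled_fraction: float = 0.0) -> float:
    """DRAM traffic when geometry output stays in on-die cache.

    spilled_fraction is a hypothetical knob for data that does not fit
    in cache and has to take the off-chip round trip anyway.
    """
    if not 0.0 <= spilled_fraction <= 1.0:
        raise ValueError("spilled_fraction must be between 0 and 1")
    return 2.0 * geometry_mb * spilled_fraction


# Hypothetical frame producing 8 MB of geometry shader output.
via_memory = offchip_roundtrip_mb(8.0)
kept_on_die = ondie_traffic_mb(8.0)
```

Under these assumptions the off-chip path costs twice the geometry output in memory traffic, while the on-die path costs nothing unless data spills out of the cache.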

So what does all of this mean for the end-user? Basically, it means vastly improved memory efficiency since less bandwidth is taken up by unnecessary reads and writes. This should benefit the GF100 in high resolution, high image quality (IQ) situations where lesser graphics cards’ framebuffers can easily become saturated.
