ASUS GeForce GTX 465 1GB Review

SKYMTL · May 30, 2010

Back when ATI was in the midst of launching their HD 5000 series, there was a lot of noise made about their ability to launch a top-to-bottom DX11 lineup in only a few months. It was an impressive feat which left NVIDIA looking flat-footed with cards that didn’t support the new API, and the discontinuation of the GTX 295, GTX 285, GTX 275 and GTX 260 216 basically handed ATI the high-end market without so much as a whimper. Without products to sell at higher margins, NVIDIA-exclusive board partners were left scrounging in the mainstream market with the GTS 250 and the lower-end GT 200 series cards. Some suffered enough that they had to move away from selling GPUs altogether while others defected to ATI’s side of the fence. For those that stayed on board, this bout of revisionist history is behind us and NVIDIA is quickly making up for lost time.

The GTX 400 series stormed onto the market in late March and has been clawing back market share ever since. For all intents and purposes, the GTX 480 and GTX 470 accomplished their tasks of offering up some competition for ATI while actually retailing for an acceptable price. However, what may look to be a good price for enthusiasts will likely be too expensive for many others. Even the $370 GTX 470 retails for more than what most are willing to pay for first-generation DX11 hardware, so what NVIDIA needed was a product that would cater to all the DX11 hold-outs without alienating them with a stratospheric price. Enter the GTX 465 1GB.

The GTX 465 is basically a cut-down version (no die shrinks yet, folks) of the GTX 480 / GTX 470, sporting 352 CUDA cores and 1GB of GDDR5 memory. Back when we reviewed the HD 5830, we mentioned that the ATI card’s performance was a bit disappointing because such a large performance gap was left between it and the HD 5850. Naturally, NVIDIA saw the exact same opening and is diving in head first with the GTX 465 leading the way. Priced at $279, it is placed right between the $240 HD 5830 and the $300 HD 5850 while slanting towards the higher-end card.

Even though the price should give you some idea of where this card should be performing, there are several other things which NVIDIA hopes will distinguish it from the competition. The most noteworthy of these is the GF100 architecture’s supposed strength in DX11 rendering, particularly when it comes to tessellation. Another is its ability to be used (when in SLI) for 3D Vision Surround. Granted, supporting drivers for this technology haven’t been released yet, and two GTX 465s may or may not have the performance necessary to properly display games across a trio of 3D monitors, but it is still an exciting addition to any sub-$300 graphics card.

In this review we will be looking at the ASUS GeForce GTX 465 1GB Voltage Tweak Edition. As is usual for ASUS, this card is clocked at the reference speeds but the voltage tweaking software that is included should allow us to push the architecture to its limits. Just remember that even though we consider this the “Voltage Tweak Edition”, all of the reference-based ASUS cards come with this software.

The GTX 465 does look to be an interesting product so without further ado, let’s see what this card is all about.

In-Depth GF 100 Architecture Analysis (Core Layout)

The first stop on the whirlwind tour of the GeForce GF100 is an in-depth look at what makes the GPU tick as we peer into the core layout and how NVIDIA has designed it.

Many people incorrectly believed that the Fermi architecture was primarily designed for GPU computing applications and very little thought was given to the graphics processing capabilities. This couldn’t be further from the truth since the computing and graphics capabilities were determined in parallel and the result is a brand new architecture tailor made to live in a DX11 environment. Basically, NVIDIA needed to apply what they had learned from past generations (G80 & GT200) to the GF100.

What you are looking at above is the heart and soul of any GF100 card: the core layout. While we will go into each section in a little more detail below, from the overall view we can see that the main functions are broken up into four distinct groups called Graphics Processing Clusters or GPCs which are then broken down again into individual Streaming Multiprocessors (SMs), raster engines and so on. To make matters simple, think of it this way: in its highest-end form, a GF100 will have four GPCs, each of which is equipped with four SMs for a total of 16 Streaming Multiprocessors. Within each of these SMs are 32 CUDA Cores (the shader processors of past generations) for a total of 512 cores. However, the current GTX 480 and GTX 470 cards make do with slightly fewer cores (480 and 448 respectively) while the GTX 465 cuts things down even more.
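The core-count arithmetic above can be sketched in a few lines of Python. This is only an illustration of the figures quoted in this section; the function name and constants are our own labels and do not come from any NVIDIA tool.

```python
# Illustrative arithmetic only. The names below are hypothetical labels for
# the figures quoted in the text: 4 GPCs, 4 SMs per GPC, 32 CUDA cores per SM.
GPCS = 4
SMS_PER_GPC = 4
CORES_PER_SM = 32


def cuda_cores(disabled_sms: int = 0) -> int:
    """Return the CUDA core count of a GF100 part with some SMs disabled."""
    total_sms = GPCS * SMS_PER_GPC
    if not 0 <= disabled_sms <= total_sms:
        raise ValueError("disabled_sms must be between 0 and the total SM count")
    return (total_sms - disabled_sms) * CORES_PER_SM


if __name__ == "__main__":
    # One disabled SM gives the GTX 480's count, two give the GTX 470's.
    for name, disabled in [("full GF100", 0), ("GTX 480", 1), ("GTX 470", 2)]:
        print(f"{name}: {cuda_cores(disabled)} cores")
```

Every disabled SM removes 32 cores in one step, which is why the product stack moves in increments of 32.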

On the periphery of the die is the GigaThread Engine along with the memory controllers. The GigaThread Engine performs the somewhat thankless duty of reading the CPU’s commands over the host interface and then fetching data from the system’s main memory bank. The data is then copied over onto the framebuffer of the graphics card itself before being passed along to the designated engine within the core. Meanwhile, in its fullest form the GF100 incorporates a total of six 64-bit GDDR5 memory controllers for a total of 384 bits. The massive amount of bandwidth created by a 384-bit GDDR5 memory interface provides extremely fast access to the card’s onboard memory and should alleviate the bottlenecks seen in past generations.

Each Streaming Multiprocessor holds 32 CUDA cores along with 16 load / store units which allow source and destination addresses to be calculated for 16 threads per clock. Above these we see the Warp Schedulers along with the associated dispatch units, which issue groups of 32 concurrent threads (called warps) to the cores.

Finally, closer to the bottom of the SM are the L1 / L2 cache, the PolyMorph Engine and the four texture units. In total, the maximum number of texture units in this architecture is 64, which may come as a surprise considering the older GT200 architecture supported up to 80 TMUs. However, NVIDIA has implemented a number of improvements to the way the architecture handles textures which we will go into in a later section. Suffice to say that the texture units are now integrated into the SM without having multiple SMs addressing a common texture cache.

Independent of the SM structure are six dedicated partitions of eight ROP units for a total of 48 ROPs as opposed to the 32 units from the GT200 architecture. Also different from the GT200 layout is that instead of backing up directly into the memory bus, the ROPs interface with the shared L2 cache which provides a quick interface for data storage.

Efficiency Through Caching

There are benefits to having dedicated L1 and L2 caches as this approach not only helps when it comes to GPGPU computing but also for storing draw calls so they are not passed off to the memory on the graphics card. This is supposed to drastically streamline rendering efficiency, especially in situations with a lot of higher-level geometry.

Above we have an enlarged section of the cache and memory layout within each SM. To put things into perspective, a Streaming Multiprocessor has 64KB of shared, programmable on-chip memory that can be configured in one of two ways. It can either be laid out as 48KB of shared memory with 16KB of L1 cache, or as 16KB of shared memory with 48KB of L1 cache. However, when used for graphics processing as opposed to GPGPU functions, the SM will make use of the 16KB L1 cache configuration. This L1 cache is supposed to help with access to the L2 cache as well as streamline functions like stack operations and global loads / stores.

In addition, each texture unit now has its own high efficiency cache as well which helps with rendering speed.

Through the L2 cache architecture NVIDIA is able to keep most of the rendering function data like tessellation, shading and rasterizing on-die instead of going to the framebuffer (DRAM) which would slow down the process. Caching effectively amplifies the available bandwidth and alleviates the memory bottlenecks which normally occur when doing multiple reads and writes to the framebuffer. In total, the GF100 has 768KB of L2 cache which is dynamically load balanced for peak efficiency.

It is also possible for the L1 and L2 cache to do loads and stores to memory and pass data from engine to engine so nothing moves off chip. Unfortunately, one of the issues with this approach is that significant die area is taken up by doing geometry processing in a parallel and scalable way while not using DRAM bandwidth.

When compared with the new GeForce GF100, the previous architecture is inferior in every way. The GT200 only used cache for textures and featured a read-only L2 cache structure whereas the new GPU’s L2 is rewritable and caches everything from vertex data to textures to ROP data and nearly everything in between.

By contrast, with their Radeon HD 5000-series, ATI dumps all of the data from the geometry shaders to the memory and then pulls it back into the core for rasterization before output. This causes a drop in efficiency and therefore performance. Meanwhile, as we discussed before, NVIDIA is able to keep all of their functions on-die in the cache without having to introduce memory latency into the equation and hogging bandwidth.

So what does all of this mean for the end-user? Basically, it means vastly improved memory efficiency since less bandwidth is being taken up by unnecessary read and write calls. This can and will benefit the GF100 in high resolution, high IQ situations where lesser graphics cards’ framebuffers can easily become saturated.

A Closer Look at the Raster & PolyMorph Engines

In the last few pages you may have noticed mention of the PolyMorph and Raster engines which are used for highly parallel geometry processing operations. NVIDIA has effectively grouped all of the fixed function stages into these two engines, which is one of the main reasons drastically improved geometry rendering is being touted for GF100 cards. In previous generations these functions sat outside of the core processing stages (SMs); NVIDIA has now brought them inside the core stages to ensure proper load balancing. This should be of significant help with tessellated scenes which feature extremely high triangle counts.

We should also note here and now that the GTX 400 series’ “core” clock numbers refer to the speed at which these fixed function stages run.

Within the PolyMorph engine there are five stages from Vertex Fetch to the Stream Output which each process data from the Streaming Multiprocessor they are associated with. The data then gets output to the Raster Engine. Contrary to past architectures which featured all of these stages in a single pipeline, the GF100 architecture does all of the calculations in a completely parallel fashion. According to our conversations with NVIDIA, their approach vastly improves triangle, tessellation, and Stream Out performance across a wide variety of applications.

In order to further speed up operations, data goes from one of 16 PolyMorph engines to another and uses the on-die cache structure for increased communication speed.

After the PolyMorph engine is done processing data, it is handed off to the Raster Engine’s three pipeline stages that pass off data from one to the next. These Raster Engines are set up to work in a completely parallel fashion across the GPU for quick processing.

Both the PolyMorph and Raster engines are distributed throughout the architecture which increases parallelism but are distributed in a different way from one another. In total, there are 16 PolyMorph engines which are incorporated into each of the SMs throughout the core while the four Raster Engines are placed at a rate of one per GPC. This setup makes for four Graphics Processing Clusters which are basically dedicated, individual GPUs within the core architecture allowing for highly parallel geometry rendering.

Image Quality Improvements

Even though additional geometry could end up adding to the overall look and “feel” of a given scene, methods like tessellation and HDR lighting still require accurate filtering and sampling to achieve high rendering fidelity. For that, you need custom anti-aliasing (AA) modes as well as vendor-specific anisotropic filtering (AF) and everything in between. As the power of GPUs rapidly outpaces the ability of DX9 and even DX10 games to feed them with information, a new focus has been turned to image quality adjustments. These adjustments do tend to impact upon framerates but with GPUs like the GTX 400 series there is much less of a chance that increasing IQ will result in a game becoming unplayable.

Quicker Jittered Sampling Techniques

Many of you are probably scratching your head and wondering what in the world jittered sampling is. Basically, it is a shadow sampling method that has been around since the DX9 days which allows for realistic, soft shadows to be mapped by the graphics hardware. Unfortunately, this method is extremely resource hungry so it hasn’t been used very often regardless of how good the shadows it produces may look.

In the picture above you can see what happens with shadows which don’t use this method of mapping. Basically, for a shadow to look good it shouldn’t have a hard, serrated edge.

Soft shadows are the way to go and while past generations of hardware were able to do jittered sampling, they just didn’t have the resources to do it efficiently. Their performance was adequate with one light source in a scene but when asked to produce soft shadows from multiple light sources (in a night scene for example), the framerate would take an unacceptably large hit. With the GF100, NVIDIA had the opportunity to vastly improve shadow rendering and they did just that.

To do quicker, more efficient jittered sampling, NVIDIA worked with Microsoft to implement hardware support for Gather4 in DX11. Instead of issuing four separate texture fetches, the hardware is now able to specify one coordinate with an offset and fetch four texels at once. This should significantly improve the shadow rendering efficiency of the hardware, and it is still able to work as a standard Gather4 instruction if need be.
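To make the idea a little more concrete, here is a rough CPU-side sketch in Python of what a gather-style fetch does for shadow sampling. It is purely illustrative: the real work happens inside the GPU’s texture units, the helper names (fetch_texel, gather4, shadow_fraction) are hypothetical, and the shadow test is a simplified stand-in for what a game’s shader would actually do.

```python
# CPU-side illustration only; real hardware does this inside the texture unit.
# fetch_texel, gather4 and shadow_fraction are hypothetical helper names.
from typing import List, Sequence, Tuple

DepthMap = List[List[float]]


def fetch_texel(depth_map: DepthMap, x: int, y: int) -> float:
    """Fetch a single texel, clamping coordinates to the map edges."""
    height, width = len(depth_map), len(depth_map[0])
    return depth_map[min(max(y, 0), height - 1)][min(max(x, 0), width - 1)]


def gather4(depth_map: DepthMap, x: int, y: int,
            offset: Tuple[int, int] = (0, 0)) -> Tuple[float, float, float, float]:
    """Return the 2x2 block of texels at (x, y) plus an offset in one call."""
    ox, oy = x + offset[0], y + offset[1]
    return (
        fetch_texel(depth_map, ox, oy),
        fetch_texel(depth_map, ox + 1, oy),
        fetch_texel(depth_map, ox, oy + 1),
        fetch_texel(depth_map, ox + 1, oy + 1),
    )


def shadow_fraction(depth_map: DepthMap, x: int, y: int, receiver_depth: float,
                    offsets: Sequence[Tuple[int, int]]) -> float:
    """Fraction of gathered samples that sit closer to the light than the receiver."""
    shadowed = 0
    total = 0
    for offset in offsets:
        # One gather per jitter offset instead of four individual fetches.
        for stored_depth in gather4(depth_map, x, y, offset):
            total += 1
            if stored_depth < receiver_depth:
                shadowed += 1
    return shadowed / total
```

A value between 0 and 1 from shadow_fraction is what produces a soft edge: pixels near the shadow boundary end up partially shadowed instead of flipping from fully lit to fully dark.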

With this feature turned on, NVIDIA expects a 200% improvement in shadow rendering performance when compared to the same scene being rendered with their hardware Gather4 turned off.

Image Quality Improvements (pg. 2)

32x CSAA Mode for Improved AA

In our opinion, the differences between the AA modes above 8x are minimal at best unless you are rendering thin items such as grass, a chain-link fence or a distant railing. With the efficiency of the DX11 API in addition to increased horsepower from cards like the GTX 400 series, it is now possible to use geometry to model vegetation and the like. However, developers will continue using the billboarding and alpha texturing methods from DX9 which allow for dynamic vegetation, but it will continue to look jagged and under-rendered. In such cases, anti-aliasing can be applied but high levels of AA are needed in order to properly render these items. This is why NVIDIA has implemented their new 32x Coverage Sample AA.

In order to accurately apply AA, three things are needed: coverage samples, color samples and levels of transparency. To put this into context, GT200 had 8 color samples and 8 coverage samples which means a total of 16 samples on edges. However, this allowed for only 9 levels of transparency, which led to edges that still looked jagged and lacked proper blending, so dithering was implemented to mask the banding.

The GF100 on the other hand features 24 coverage samples and 8 color samples for a total of 32 samples (hence the 32x CSAA moniker). This layout also offers 33 levels of transparency for much smoother blending of the anti-aliased edges into the background and increased performance as well.
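One way to see where the 9 and 33 figures come from: if N samples within a pixel can be switched on or off to represent transparency, there are N + 1 possible levels (anywhere from 0 to N samples covered). The short Python sketch below assumes that GT200 used only its 8 color samples for this purpose while GF100 uses all 32 of its samples. That reading is our interpretation of the numbers quoted here, and the function name is hypothetical.

```python
# Illustrative arithmetic only; transparency_levels is a hypothetical helper.
def transparency_levels(samples_used: int) -> int:
    """Distinct transparency levels when 0..samples_used samples can be covered."""
    if samples_used < 0:
        raise ValueError("samples_used cannot be negative")
    return samples_used + 1


if __name__ == "__main__":
    # Assumption: GT200 bases transparency on its 8 color samples only.
    print("GT200:", transparency_levels(8))
    # Assumption: GF100 uses its 24 coverage samples plus 8 color samples.
    print("GF100:", transparency_levels(24 + 8))
```

More levels mean smaller steps between fully opaque and fully transparent, which is what reduces the visible banding on alpha-textured objects.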

With increased efficiency comes decreased overhead when running complex AA routines and NVIDIA specifically designed the GF100 to cope with high IQ settings. Indeed, on average this new architecture only loses about 7% of its performance when going from 8x AA to 32x CSAA.

TMAA and CSAA: Hand in Hand

No matter how much AA you apply in DX9, there will still invariably be some issues with distant, thin objects that are less than a pixel wide due to the method older APIs use to render these. Transparency Multisample AA (TMAA) allows the DX9 API to convert shader code to effectively use alpha to coverage routines when rendering a scene. This, combined with CSAA, can greatly increase the overall image quality.

It may be hard to see in the image above but without TMAA, the railing in the distance would have its lines shimmer in and out of existence due to the fact that the DX9 API doesn’t have the tools necessary to properly process sub-single pixel items. It may not impact upon gaming but it is noticeable when moving through a level.

Since coverage samples are used as part of GF100’s TMAA evaluation, much smoother gradients are produced. TMAA will help in instances such as this railing and even with the vegetation examples we used in the last section.

Touching on NVIDIA Surround / 3D Vision Surround

During CES, NVIDIA unveiled their answer to ATI’s Eyefinity multi-display capability: 3D Vision Surround and NVIDIA Surround. These two “surround” technologies from NVIDIA share common ground but in some ways their prerequisites and capabilities are at two totally different ends of the spectrum. We should also mention straight away that both of these technologies will become available soon and will support bezel correction management from the outset.

NVIDIA Surround

Not to be confused with NVIDIA’s 3D Vision Surround, their standard Surround moniker allows for three displays to be fed concurrently via an SLI setup. Yes, you need an SLI system in order to run three displays at the same time but the good news is that NVIDIA Surround is compatible with older GTX 200-series cards as well as all current GTX 400 series parts including the GTX 465. This method can display information across three 2560 x 1600 screens and allows for a mixture of monitors to be used as long as they all support the same resolution.

SLI is needed because both the GT200 series and the GF100 / 400-series cards are only capable of having a pair of display outputs active at the same time. In addition, if you want to drive three monitors at reasonably high detail levels, you’ll need some serious horsepower and that’s exactly what a dual or triple card system gives you.

This does tend to leave out the people who may want to use three displays for professional applications but that’s where NVIDIA’s Quadro series comes into play.

3D Vision Surround

We all know by now that immersive gaming has been taken to new levels by both ATI, with their HD 5000-series’ ability to game on up to three monitors at once, and NVIDIA’s own 3D Vision which offers stereoscopic viewing within games. What has now happened is a combining of these two techniques under the 3D Vision Surround banner, which brings stereo 3D to surround gaming.

This is the mac-daddy of display technologies and it is compatible with SLI setups of GTX 400 cards and older GT200-series cards. The reasoning behind this is pretty straightforward: you need a massively powerful system for rendering and outputting what amounts to six high resolution 1920 x 1080 images (two to each of the three 120Hz monitors). Another thing you should be aware of is the fact that all three monitors MUST be of the same make and model in order to ensure uniformity.
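To put some rough numbers on that workload, the Python sketch below tallies the pixels involved. It assumes each 120Hz monitor alternates between left-eye and right-eye images, so every refresh shows one full 1920 x 1080 image. The function name is hypothetical and the result is a display-side figure, not a measured rendering load.

```python
# Illustrative arithmetic only; stereo_surround_load is a hypothetical helper.
from typing import Tuple


def stereo_surround_load(width: int, height: int, monitors: int,
                         refresh_hz: int) -> Tuple[int, int, int]:
    """Return (images per stereo frame, pixels per stereo frame, pixels per second)."""
    images_per_frame = monitors * 2  # one left-eye and one right-eye image per monitor
    pixels_per_frame = images_per_frame * width * height
    # Assumption: every refresh of every monitor displays one full image.
    pixels_per_second = monitors * refresh_hz * width * height
    return images_per_frame, pixels_per_frame, pixels_per_second


if __name__ == "__main__":
    images, per_frame, per_second = stereo_surround_load(1920, 1080, 3, 120)
    print(f"{images} images, {per_frame:,} pixels per stereo frame")
    print(f"{per_second:,} pixels displayed per second")
```

Each stereo frame works out to roughly 12.4 million pixels, about six times what a single 1920 x 1080 display asks for.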

All in all, we saw NVIDIA’s 3D Vision Surround in action and while it was extremely impressive to say the least, we can't give any more thoughts about it since more testing on our part must be done.

NVIDIA’s GTX 465; Specs, Background & Market Positioning

The premise behind the GTX 465 is actually twofold: to increase NVIDIA’s stable of marketable DX11 graphics cards and to ensure that any cores that don’t pass the binning necessary to make it into the GTX 470 or GTX 480 are still used in some way. If rumors are to be believed, NVIDIA pays their chip foundry (TSMC) by the wafer regardless of how many useable cores each of these pieces of silicon holds. This means there’s a need to make the most out of each wafer by using as many of the dies as possible. Since each wafer will always hold some dies that will have a good portion of their CUDA cores working, these are used for lower-end cards like the GTX 470 and GTX 465. Even the flagship GTX 480 was released with one Streaming Multiprocessor (32 CUDA cores) disabled due to yields of 512-core parts not being high enough for a widespread launch. Knowing this, let’s check out what the GTX 465 is saddled with.

When it comes to the core and graphics clock speeds, the GTX 465 is the spitting image of its bigger brother, the GTX 470. However, that’s where the similarities stop since the newest NVIDIA card makes do with 1GB of GDDR5 clocked at 3.206GHz QDR. Coupled with a 256-bit memory bus that is narrower than the 320-bit one gracing the 470, this leaves the GTX 465 able to muster a mere 102.6GB/s of memory bandwidth. To give you some perspective, the older GTX 275 with its GDDR3 interface was good for 127GB/s while ATI’s HD 5850 and HD 5830 have 128GB/s of bandwidth due to their use of GDDR5 clocked at 4GHz. The result of this bandwidth situation could be anything from low framerates in high resolution scenarios to lackluster performance in games requiring large amounts of quickly accessed texture memory. On the other hand, most of the gamers this card targets will likely never play with anything above a 24” monitor anyway.
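For those who want to check the bandwidth figures, peak memory bandwidth is simply the bus width in bytes multiplied by the effective data rate. The Python sketch below reproduces the numbers quoted above. The function name is hypothetical, and the 256-bit bus used for the HD 5850 / HD 5830 is our assumption since only their 4GHz memory speed is mentioned here.

```python
# Illustrative arithmetic only; memory_bandwidth_gbs is a hypothetical helper.
def memory_bandwidth_gbs(bus_width_bits: int, effective_rate_ghz: float) -> float:
    """Peak bandwidth in GB/s: bytes moved per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_rate_ghz


if __name__ == "__main__":
    # GTX 465: 256-bit bus and 3.206GHz effective rate, both quoted in the text.
    print("GTX 465:", round(memory_bandwidth_gbs(256, 3.206), 1), "GB/s")
    # HD 5850 / HD 5830: 4GHz is quoted in the text; the 256-bit bus is assumed.
    print("HD 5850:", round(memory_bandwidth_gbs(256, 4.0), 1), "GB/s")
```

With the same bus width on both sides of this comparison, the gap comes down entirely to memory clock speed.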

Graphical representation of the GTX 465 core architecture.

Back when we first previewed the architecture, we stated that NVIDIA would likely disable two Streaming Multiprocessors (64 CUDA cores and 8 TMUs) every time they wanted to create a new product in order to keep some performance differentiation between market segments. This hasn’t happened and the core layout of the GTX 465 illustrates how you can go from a full 512 core GPU to something altogether different. Instead of going with the predicted 384 CUDA cores and 48 TMUs for this new card, NVIDIA took out the proverbial trimmers and cut out an additional SM or 32 cores. The result is 11 SMs spread over three Graphics Processing Clusters for a total of 352 cores and 44 Texture Units. Luckily, the Raster Engine features dynamic load balancing so this odd number on the third GPC won’t be an issue.
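The difference between the predicted and actual configurations can be sketched as follows, using the 32 cores and four texture units per SM described earlier. The helper names are hypothetical, and the way SMs are assigned to GPCs below is simply our reading of the "odd number on the third GPC" remark, not a statement about which physical SMs NVIDIA disables.

```python
# Illustrative arithmetic only; sm_config and spread_over_gpcs are hypothetical.
from typing import List, Tuple

CORES_PER_SM = 32
TMUS_PER_SM = 4


def sm_config(active_sms: int) -> Tuple[int, int]:
    """Return (CUDA cores, texture units) for a given number of active SMs."""
    return active_sms * CORES_PER_SM, active_sms * TMUS_PER_SM


def spread_over_gpcs(active_sms: int, gpcs: int, sms_per_gpc: int = 4) -> List[int]:
    """Fill GPCs one at a time, leaving any shortfall on the last one."""
    if active_sms > gpcs * sms_per_gpc:
        raise ValueError("too many SMs for the given number of GPCs")
    layout = []
    remaining = active_sms
    for _ in range(gpcs):
        here = min(sms_per_gpc, remaining)
        layout.append(here)
        remaining -= here
    return layout


if __name__ == "__main__":
    print("predicted (12 SMs):", sm_config(12))
    print("actual (11 SMs):", sm_config(11))
    print("SMs per GPC:", spread_over_gpcs(11, 3))
```

Losing that one extra SM costs 32 cores and four texture units compared with the configuration we had predicted.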

Since the ROP, cache and memory controller array scales separately from the SMs and GPCs, NVIDIA was a bit less liberal in their cutting here. What we are left with are four 64-bit GDDR5 memory controllers for a 256-bit interface along with 32 ROPs and 512KB of L2 cache. The cache itself should partially alleviate some of the performance drop-off this card experiences because of its relatively low memory bandwidth.
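Since each memory controller is 64 bits wide and each ROP partition holds eight units, the back-end figures can be worked out from the number of active partitions. The Python sketch below does just that. The function name is hypothetical, and the five-partition entry for the GTX 470 is inferred from its 320-bit bus, not taken from a spec sheet.

```python
# Illustrative arithmetic only; back_end is a hypothetical helper name.
from typing import Dict

BUS_BITS_PER_CONTROLLER = 64
ROPS_PER_PARTITION = 8


def back_end(partitions: int) -> Dict[str, int]:
    """Return bus width and ROP count for a number of active partitions."""
    if partitions < 0:
        raise ValueError("partitions cannot be negative")
    return {
        "bus_width_bits": partitions * BUS_BITS_PER_CONTROLLER,
        "rops": partitions * ROPS_PER_PARTITION,
    }


if __name__ == "__main__":
    # Partition counts: 6 for the full GF100, 4 for the GTX 465.
    # The GTX 470's 5 partitions are inferred from its 320-bit bus.
    for name, partitions in [("full GF100", 6), ("GTX 470", 5), ("GTX 465", 4)]:
        print(name, back_end(partitions))
```

The takeaway is that the back end shrinks in 64-bit, eight-ROP steps that are independent of how many SMs remain active.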

Earlier, we mentioned that disabling cores can be beneficial when it comes to maximizing the returns from a single wafer. Unfortunately, this can also act as a double-edged sword due to a convoluted law of diminishing returns when it comes to die sizes, performance and power consumption. It is feasible to have the vast majority of cores on a given die disabled in order to make a lower-end product but what you are left with is a massive number of transistors that are still consuming power (albeit a fraction of what they would if all the cores were enabled) which makes such a proposition unappealing to consumers and board partners alike. This is one of the main reasons why the GTX 465 has a similar TDP to that of a GTX 470 while coming with significantly less horsepower. What’s the solution to this? A cut-down die like ATI has been using for their HD 5770 and lower-end cards but for the time being, NVIDIA hasn’t released any specifications about what this might look like.

Packaging and Accessories

It looks to us like ASUS has gone back to the usual dark knight and his steed for the box of their GTX 465. Along with this mascot comes the indication of ASUS’ Voltage Tweak software as well as a complete feature listing on the back of the box.

Within the exterior sleeve is a black box with gold trimmed writing that contains compartments for the graphics card, CDs and accessories. The card itself is placed in a bed of form-fitting high density foam and wrapped in an anti static bag for additional protection.

ASUS includes the bare necessities when it comes to accessories. You get a driver CD, an instruction manual, a single dual Molex to 6-pin adaptor, a DVI to VGA dongle and finally a horribly coloured DVI to HDMI adaptor. This adaptor is actually colour-coded for installation on the secondary DVI output on the back of the card.

A Closer Look at the ASUS GTX 465 1GB

Since the vast majority of NVIDIA’s partners will be using a reference GTX 470 PCB and heatsink for their GTX 465 cards, there really isn’t anything here we haven’t seen already. There’s the usual full-length fan shroud topped by a blower-style fan and a large ASUS logo strategically placed to get your attention. The sticker ASUS uses looks a lot like carbon fiber and goes a long way to making their GTX 465 look as understated as possible.

The side of the card shows us the two 6-pin PCI-E power connectors as well as small holes which house the clips needed for keeping the heatsink shroud in place. The backplate meanwhile shows us a pair of DVI connectors (one of which is yellow to indicate the location for the included DVI to HDMI adaptor) as well as a mini HDMI port.

When lined up with a reference GTX 470, trying to find the differences between the two cards’ PCBs is like playing a Where’s Waldo game of epic proportions. Honestly, there is no difference that we could find other than a few missing memory traces on the GTX 465. Even though it is a cooler-running card, the GTX 465 still makes use of the fan cut out in the PCB for additional cooling.

Length-wise the GTX 465 is identical to the GTX 470 at 9” which means it won’t cause any issues for those of you with more cramped cases.

SKYMTL

HardwareCanuck Review Editor
