February 18, 2009, 04:25 PM
frontier204
Hall Of Fame
F@H
 
Join Date: Nov 2008
Location: Ottawa, ON
Posts: 1,230


Conquered my UNSTABLE_MACHINE after ~4 months :)

Hi all,

I just wanted to share my troubleshooting story, in case someone else is having a problem with their ATI card in F@H. I had not been able to fold ATI GPU work units since late October because of the "nonzero force sum on GPU" error.
Yesterday, my weekly attempt to fix this error actually worked! The key seemed to be the following steps to revert from Catalyst 9.1 to Catalyst 8.12:
1. Run the ATI Catalyst uninstaller and remove everything (including the chipset drivers; I have an AMD rig). Restart.
2. Run Driver Sweeper on the GPU drivers, then run the CCleaner registry cleaner. Restart.
3. Install the ATI Catalyst 8.12 chipset drivers (southbridge, then AHCI). Restart.
4. Install the ATI Catalyst 8.12 GPU drivers (including the Northbridge filter driver). Restart.
5. Copy all the amdcal*.dll files from \syswow64\ to the FAH folder.
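If you end up redoing step 5 after every driver reinstall, a small script saves some clicking. This is only a rough Python sketch, not part of the FAH client or the ATI tools; the function name copy_cal_dlls and the folder paths are placeholders, so point them at your own SysWOW64 and FAH folders.

```python
import glob
import os
import shutil


def copy_cal_dlls(syswow64_dir: str, fah_dir: str, pattern: str = "amdcal*.dll") -> list[str]:
    """Copy every file matching `pattern` from syswow64_dir into fah_dir.

    Hypothetical helper (not an FAH or ATI tool). Returns the sorted list
    of file names that were copied.
    """
    copied = []
    for src in sorted(glob.glob(os.path.join(syswow64_dir, pattern))):
        if os.path.isfile(src):
            # copy2 also keeps the file timestamps, which helps tell
            # which Catalyst release a given DLL came from
            shutil.copy2(src, fah_dir)
            copied.append(os.path.basename(src))
    return copied
```

For example, copy_cal_dlls(r"C:\Windows\SysWOW64", r"E:\FAH") would copy amdcalcl.dll, amdcaldd.dll and amdcalrt.dll, assuming E:\FAH is where your client lives (mine runs off a USB key, so the drive letter will differ).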

The rest of my "magic formula" is the following:

Rig, OS, and other related software:
Phenom 9950, OC'd to 205 MHz x 14 = 2.87 GHz, VID 1.300 V
Asus M3A78-T motherboard, modded with Pentium II CPU fan attached to northbridge
4 GB OCZ Fatal1ty RAM OC'd to 810 at 5-4-4-15, 2.1V
Windows Vista Business x64, SP1
Catalyst 8.12 drivers, for chipset, AHCI, IDE, and GPU
Running FAH off USB key
FAHGPU and 3x BOINC
CCC (Catalyst Control Center) is running
CoolerMaster Elite 330 case modded with 4 fans: 1x 80mm intake, 1x 120mm intake, 1x 120mm exhaust, 1x 80mm exhaust
Seasonic M12 600W modular PSU

GPU:
Single Diamond Radeon 4850
- Stock single-slot fan at 60%; the label peeled off a few months ago from the heat, hehehe
- Replaced all thermal compound with Arctic Silver 5 and Zerotherm compound (Zerotherm for most chips, AS5 for the VRMs because it is thicker)
- It seems I can leave the fan at its normal setting (which holds GPU temps at ~80C) and still complete WUs

FAH Client settings:
Console client (6.23)
Disabled CPU affinity lock
-forceasm
Priority higher than normal
UAC on; not run in admin mode, but XP compatibility mode is on
FAHCore_11 is allowed through the Windows firewall (otherwise it would cause the NANs error)
Copied all CAL DLLs from SYSWOW64: amdcalcl.dll, amdcaldd.dll, amdcalrt.dll.

From what I've experienced, if you see the "nonzero force sum on GPU" error coming up frequently on your FAH rig, DO NOT immediately suspect hardware instability. It's more likely than not an incompatibility with drivers or DirectX-hogging programs that you are also running. As a side note, when I intentionally OC'd my GPU to become unstable, I actually got the "NANs detected on GPU" error.
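To turn that rule of thumb into something you can run against a client log, here is a rough Python sketch. The two search strings are the messages as quoted in this post (the real log lines may be worded slightly differently), and the function name plus the "at least as many force-sum errors as NaN errors means drivers" tie-break are assumptions for illustration, not anything the FAH client does itself.

```python
from collections import Counter

# Error messages as quoted in this post; the exact log wording is an assumption.
DRIVER_HINT = "nonzero force sum on GPU"
OC_HINT = "NANs detected on GPU"


def triage_fah_log(log_text: str) -> tuple[Counter, str]:
    """Count the two GPU error messages in a log and return a rough hint.

    Hypothetical helper: the tie-break between the two counts is a guess.
    """
    counts = Counter()
    for line in log_text.splitlines():
        if DRIVER_HINT in line:
            counts["force_sum"] += 1
        if OC_HINT in line:
            counts["nans"] += 1

    if counts["force_sum"] == 0 and counts["nans"] == 0:
        hint = "no GPU errors of either kind found"
    elif counts["force_sum"] >= counts["nans"]:
        # Frequent force-sum errors: look at drivers and other DirectX programs first
        hint = "suspect drivers or other DirectX programs before hardware"
    else:
        # Mostly NaN errors: in my testing this is what an unstable OC produced
        hint = "suspect an unstable GPU overclock"
    return counts, hint
```

Feed it the text of your log file; it only counts lines, so it won't tell you which driver is at fault, just which direction to start poking.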

Hope this helps anyone else who is trying to troubleshoot GPU folding. I've had 100% success since 10:00PM EST yesterday. As an aside, I really hope Stanford can stop throwing code into their FAH cores at random, so debugging FAH will be a well-defined process rather than "black magic".

I'm afraid to restart my computer because the UNSTABLE_MACHINE might come back, but now Windows Update is bugging me to restart.

EDIT: Revised my formula because it started EUEing (early unit end) again; maybe it just wanted me to poke the settings.
__________________
"The computer programmer says they should drive the car around the block and see if the tire fixes itself."

Last edited by frontier204; March 1, 2009 at 04:04 PM.