  #1 (permalink)  
Old March 30, 2017, 04:51 PM
HardwareCanuck Review Editor
 
Join Date: Feb 2007
Location: Montreal
Posts: 13,587
ECC Memory & AMD's Ryzen - A Deep Dive Comment Thread

The debate about ECC memory support and Ryzen has been raging for the last few weeks. In this article we detail what you can expect right now from ECC compatibility on AMD's Ryzen processors.

Read more here: ECC Memory & AMD's Ryzen - A Deep Dive
  #2 (permalink)  
Old March 30, 2017, 06:31 PM
Hall Of Fame
 
Join Date: Oct 2009
Location: Ontario
Posts: 1,014


L1 Tech's Wendell has tested and reported findings in 2 videos I've seen. I wanted more information from another source to corroborate or refute the findings, so I'm glad to see a written article from HWC that covers this thoroughly. Thanks!

Edit - To be clear, this review is far more comprehensive and conclusive, and the information is easy to find. I see Wendell already retweeted it almost instantly.

Last edited by Vittra; March 30, 2017 at 07:08 PM.
  #3 (permalink)  
Old March 30, 2017, 07:08 PM
Hall Of Fame
F@H
 
Join Date: May 2009
Location: KANATA
Posts: 2,403


Makes a build for something like video encoding look promising.
__________________
"EVGA hunted down the last dozen or so expats living in Karachi." SKY
  #4 (permalink)  
Old March 31, 2017, 12:22 AM
Allstar
 
Join Date: Nov 2010
Posts: 517


Informative and useful, thank you. Looking forward to seeing what AMD has in store for us with their HEDT processors/platform.
  #5 (permalink)  
Old March 31, 2017, 05:18 AM
Rookie
 
Join Date: Mar 2017
Posts: 4

Quote:
Originally Posted by HardwareCanucks
That is an uncorrected error (UE), otherwise known as a multi-bit error or a hard error. Multi-bit errors cannot be corrected by ECC memory. What is supposed to happen when they occur is that they should be detected, logged and the system should be immediately halted.
I disagree. An uncorrectable error does not mean that the system must be halted.

A machine check exception (MCE) will be raised and the operating system will be informed about the error. The operating system can then look up how the affected memory region is used, and take action based on this.

If it is kernel memory or dirty cache, then usually the system must be halted to prevent further corruption.
If it is unused memory, then nothing needs to happen.
If the memory belongs to a process, then that process can be terminated.
If it is non-dirty cache memory, then the cache page can be discarded.
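The four cases above can be sketched as a small lookup. This is only an illustration of the policy described in this post, not the kernel's actual code; the `PageUse` and `Action` names are made up for the example.

```python
from enum import Enum, auto


class PageUse(Enum):
    """Hypothetical labels for how the affected memory is being used."""
    KERNEL = auto()
    DIRTY_CACHE = auto()
    CLEAN_CACHE = auto()
    PROCESS = auto()
    UNUSED = auto()


class Action(Enum):
    """Hypothetical labels for the operating system's response."""
    HALT_SYSTEM = auto()
    KILL_PROCESS = auto()
    DISCARD_PAGE = auto()
    IGNORE = auto()


# One entry per case listed in the post.
_POLICY = {
    PageUse.KERNEL: Action.HALT_SYSTEM,       # kernel memory: halt to prevent further corruption
    PageUse.DIRTY_CACHE: Action.HALT_SYSTEM,  # dirty cache: the only copy of the data is damaged
    PageUse.UNUSED: Action.IGNORE,            # unused memory: nothing needs to happen
    PageUse.PROCESS: Action.KILL_PROCESS,     # process memory: terminate that process
    PageUse.CLEAN_CACHE: Action.DISCARD_PAGE, # non-dirty cache: drop the page, reread it later
}


def respond_to_uncorrectable_error(use: PageUse) -> Action:
    """Pick a response to an uncorrectable error based on how the page is used."""
    return _POLICY[use]
```

The point of the table is that only two of the five cases actually require taking the whole machine down.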
  #6 (permalink)  
Old March 31, 2017, 06:39 AM
Associate Review Editor
 
Join Date: Nov 2006
Location: Montreal
Posts: 1,085

Quote:
Originally Posted by chithanh
I disagree. An uncorrectable error does not mean that the system must be halted.
I can certainly check my documentation again, but multiple sources listed a kernel panic / system halt as the default (or at least the most basic) response to a UE. There may well be other alternatives, but considering how rare a UE is, and that it's mostly associated with hardware failure, a system halt was highlighted as the optimal response.
  #7 (permalink)  
Old March 31, 2017, 06:59 AM
Rookie
 
Join Date: Mar 2017
Posts: 4

At least for Linux, the default behaviour is to terminate the affected process with SIGBUS if the uncorrectable error happens in userspace, and panic if the error happens in kernel space.
See also linux/Documentation/x86/x86_64/boot-options.txt and linux/arch/x86/kernel/cpu/mcheck/mce.c

You can boot with mce=0 kernel parameter to always cause a kernel panic on uncorrectable errors.
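To check which tolerance level a machine was booted with, something like the following could be used. It is a minimal sketch, assuming the `mce=tolerancelevel[,monarchtimeout]` form described in boot-options.txt at the time of this thread (newer kernels may differ); the function name and the level descriptions in the table are my own paraphrase.

```python
from typing import Optional

# Tolerance levels as paraphrased from boot-options.txt of that era (assumption:
# the semantics may have changed or been removed in later kernel versions).
TOLERANCE_LEVELS = {
    0: "always panic on uncorrected errors",
    1: "panic or SIGBUS on uncorrected errors (default)",
    2: "SIGBUS or log uncorrected errors",
    3: "never panic or SIGBUS (testing only)",
}


def mce_tolerance_from_cmdline(cmdline: str) -> Optional[int]:
    """Return the numeric mce= tolerance level, or None if none was given.

    Non-numeric forms such as mce=off are skipped. If several numeric
    mce= options are present, the last one is returned.
    """
    level = None
    for token in cmdline.split():
        if token.startswith("mce="):
            # Drop the optional ",monarchtimeout" part.
            value = token[len("mce="):].split(",")[0]
            if value.isdigit():
                level = int(value)
    return level
```

On a running Linux system the input string can be read from /proc/cmdline. A result of None means no numeric level was passed, so the default described above applies.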
  #8 (permalink)  
Old March 31, 2017, 07:23 AM
Associate Review Editor
 
Join Date: Nov 2006
Location: Montreal
Posts: 1,085

Interesting, although even when I set the stress test to use all available memory, or caused enough instability to start corrupting Ubuntu, nothing happened aside from the UEs being detected. So based on my understanding of ECC, I drew the cautious conclusion that nothing was being done behind the scenes with respect to handling multi-bit errors.
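
For anyone who wants to watch the error counts during this kind of stress testing, here is a minimal sketch. It assumes the EDAC driver for the memory controller is loaded and exposes the usual `ce_count` and `ue_count` files under /sys/devices/system/edac/mc; the function name is made up for the example, and this is not necessarily how the errors were observed in the article.

```python
from pathlib import Path


def read_edac_counts(base="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce": int, "ue": int}} for each mc* directory.

    Returns an empty dict if the EDAC sysfs tree is not present.
    """
    counts = {}
    root = Path(base)
    if not root.is_dir():
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())  # corrected errors
            ue = int((mc / "ue_count").read_text().strip())  # uncorrected errors
        except (OSError, ValueError):
            # Skip controllers whose counters are missing or unreadable.
            continue
        counts[mc.name] = {"ce": ce, "ue": ue}
    return counts
```

Calling it before and after a test run and comparing the two results shows whether any new corrected or uncorrected errors were logged in between.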

Last edited by MAC; April 1, 2017 at 08:33 AM.
  #9 (permalink)  
Old March 31, 2017, 09:03 AM
Trial Member
 
Join Date: Mar 2017
Posts: 1

Nice article on using ECC w/Ryzen. While I wasn't expecting much in the way of actual support for ECC yet, I was initially curious about the existence of 16GB single rank chips such as this Kingston ECC module. Would be nice to see an evaluation done on those, from a system performance standpoint.
  #10 (permalink)  
Old March 31, 2017, 10:09 AM
Associate Review Editor
 
Join Date: Nov 2006
Location: Montreal
Posts: 1,085

Quote:
Originally Posted by heisryzen
Nice article on using ECC w/Ryzen. While I wasn't expecting much in the way of actual support for ECC yet, I was initially curious about the existence of 16GB single rank chips such as this Kingston ECC module. Would be nice to see an evaluation done on those, from a system performance standpoint.
Hey, regrettably that module is Registered ECC, so it won't work on this platform.