  #1 (permalink)  
Old May 10, 2012, 06:10 PM
aristhrottle's Avatar
Top Prospect
 
Join Date: Oct 2009
Location: Toronto
Posts: 163
"Margin of error" - why not just do a variance test?

A lot of tech reviewers and sites seem to have this ethereal margin of error that comes out now and then when explaining data, e.g. "Product 1 performed X, and Product 2 performed Y, but those results are similar and could fall within the margin of error".

I've always wondered: why not just do a variance test? You would already have the data to do it so it wouldn't be an extra step or anything. I don't know, maybe no one cares about this sort of stuff but me, but I think it would be useful.
__________________
"CFL receiver Sylvain Girard announced his retirement today. His party will be held at The Keg, right after he and some other players finish their shifts there." - Air Farce
  #2 (permalink)  
Old May 10, 2012, 06:34 PM
sswilson's Avatar
Moderator
F@H
 
Join Date: Dec 2006
Location: Moncton NB
Posts: 14,793


I understand where you're coming from, but given how much different hardware is used as reference, and the fact that drivers change as well as hardware, it'd be pretty intensive to provide an exact variance for every possible configuration.

In general, anybody who's run 3DMark knows that there is a slight change from one run to another even if the variables haven't changed. The same can be said for pretty well any test that's run, but getting a handle on each one would be almost impossible.
__________________
MSI Z87I Gaming AC / i7 4770K / 2X 8G Gskill 1866 Sniper / XFX XTR 750 / EVGA GTX 680 SC+ 2GB / Intel DC S3700 200G / Seagate Barracuda 1TB
Inwin 904 / Swiftech MCP655-b / Alphacool NexXxos XT45 120 Rad / 2X Scythe GT AP-15 / EK Supreme HF / Dell UltraSharp U2412M

Asrock AM1H-ITX / AM1 Athlon 5350 / 2X4G Gskill PC3-14900 / Intel 6235 Wi-Fi / 90W Targus Power Brick / 250G Samsung 840 Series SSD / Mini-Box M350 / 1X 22" Dell IPS / 1X 22" HP
  #3 (permalink)  
Old May 10, 2012, 07:37 PM
Hall Of Fame
 
Join Date: Oct 2007
Posts: 1,588

Pretty much the above. Hell, using Twitter on your computer while a test runs could affect benchmark scores. It's impossible to perform variance testing on something with way, way too many variables.
  #4 (permalink)  
Old May 10, 2012, 08:12 PM
AkG's Avatar
Hardware Canucks Reviewer
 
Join Date: Oct 2007
Posts: 4,392

Bingo.

But to expand on it a little bit: in a modern OS, even a stripped-down one, there are literally dozens of background processes running...any one of which can push numbers one way or the other just a bit. So it's not just applications you have to worry about but the actual OS as well!

Then you have variances in the parts themselves. Take SSDs. For all intents and purposes a Corsair Force GT and an OCZ Vertex 3 are the same: same NAND, same controller, same firmware. Sooo in a perfect world the results would be EXACTLY the same...but they are not. Sample A vs Sample B can have slight variances in NAND performance because they come from different batches...just like different batches of CPUs have slightly different performance capabilities. Add this to the minor background "noise" and even human error (in some types of testing) and you get a margin of error that even averaging results cannot overcome with 100% assurance.

This is why it's "our" job to EXPLAIN the results. It's counter-intuitive, but the explanation can sometimes be more important than the data in the chart....as you the reader have to trust that we know what we are doing (i.e. are "the experts"); otherwise....why bother reading the review in the first place, as ALL the results will be suspect...and yes, this is why I gave up reading certain reviewers on some sites looooong before becoming one myself....hell, it is that level of arrogance that allows me to BE a reviewer in the first place! :P
__________________
"If you ever start taking things too seriously, just remember that we are talking monkeys on an organic spaceship flying through the universe." -JR

“if your opponent has a conscience, then follow Gandhi. But if you enemy has no conscience, like Hitler, then follow Bonhoeffer.” - Dr. MLK jr
  #5 (permalink)  
Old May 10, 2012, 08:21 PM
ZZLEE's Avatar
Hall Of Fame
F@H
 
Join Date: May 2009
Location: KANATA
Posts: 2,144


"A society grows great when old men plant trees whose shade they know they shall never sit in." - Greek Proverb


Love that^

I have watched live overclocking on the web several times.

some parts rock some parts are and then theres human error
__________________
"EVGA hunted down the last dozen or so expats living in Karachi." SKY
  #6 (permalink)  
Old May 10, 2012, 09:26 PM
aristhrottle's Avatar
Top Prospect
 
Join Date: Oct 2009
Location: Toronto
Posts: 163

Before I go on I should clarify that this is in no way a dig at tech reviewers, because I definitely enjoy and appreciate your work (otherwise I wouldn't be providing my suggestions). My problem with the current application of margin of error is that it 1) is arbitrary, and 2) obfuscates (for lack of a better word) potential differences and non-differences - there's a boring explanation for this that I am saving unless requested.

The examples of unpredicted variables affecting scores are exactly what a variance test is for. For reference, I was thinking of a simple t-test (again, saving the boring explanation unless someone really wants it).
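
For anyone curious what that would look like in practice, here is a minimal sketch of Welch's t-test (the unequal-variance form of the t-test) using only the Python standard library. The scores are made up for illustration, the function name `welch_t` is hypothetical, and it assumes each product was benchmarked several times under the same conditions.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t statistic, approximate degrees of freedom) for two samples.

    Uses Welch's formulation, which does not assume equal variances.
    """
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    # statistics.variance is the sample variance (divides by n - 1)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    se_a, se_b = var_a / len(a), var_b / len(b)
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1)
    )
    return t, df


# Hypothetical benchmark scores, five runs per product (not real data)
product_1 = [100.0, 102.0, 98.0, 101.0, 99.0]
product_2 = [103.0, 105.0, 101.0, 104.0, 102.0]

t, df = welch_t(product_1, product_2)
print(f"t = {t:.3f}, df = {df:.1f}")
```

With these invented numbers the statistic works out to t = -3.0 with about 8 degrees of freedom. The two-tailed 5% critical value for 8 degrees of freedom is roughly 2.306, so this particular gap would not be written off as noise. With only two or three runs per product the critical value is much larger, so the same gap could easily fall inside the noise.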

AkG, you make a good point about variations within products. My immediate response would be to get more samples, but obviously that is not practical. Of course the question is then whether it is the reviewer's job to review the individual product (i.e. your Vertex 3 sample) or the line of products (i.e. all Vertex 3s). I'm guessing it is a bit of both, but more of the former, because if not then why even bother reviewing products that are the same on paper?
  #7 (permalink)  
Old May 10, 2012, 09:41 PM
AkG's Avatar
Hardware Canucks Reviewer
 
Join Date: Oct 2007
Posts: 4,392

I would love to get a sample size of 300 (though even this is too small and 3K would be better)...but that ain't happening. We can only review what we are given, but you can see trends develop IF you have the experience to know what to look for. BUT as with anything, we can only report on what we have and what we can prove. There is some leeway and thus (like close air support) margin of error covers a multitude of sins. ;)

BUT the small sample size is why I always say take everything ANYONE - including me - says with a grain of salt. Read as many reviews on a product as possible. We could get a good one, someone else a bad one, and a third an average one. By looking at them all you will get an idea of what the best-case, worst-case and most likely scenarios will be....and then you can make an informed decision.
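
As a rough sketch of that approach, the snippet below collects one score per review and reports the worst, best and typical (median) result. The numbers are invented, the function name `summarize_reviews` is hypothetical, and it assumes a metric where higher is better and where every review used a comparable test.

```python
import statistics


def summarize_reviews(scores):
    """Summarize one score per review as worst, best and typical (median)."""
    return {
        "worst": min(scores),
        "best": max(scores),
        "typical": statistics.median(scores),
    }


# Hypothetical sequential read results from five different reviews (not real data)
read_speeds = [512.0, 498.0, 505.0, 520.0, 501.0]

summary = summarize_reviews(read_speeds)
print(summary)
```

The median is used rather than the mean so that a single unusually good or bad sample does not drag the "most likely" figure around.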

And yes, "margin of error" can seem pretty vague, but it really does vary from test to test and from generation to generation....and it is impossible to state MoE == 1 second or .1mb/s or etc etc.... i.e. nothing in life is guaranteed...but we do the best we can.
  #8 (permalink)  
Old May 11, 2012, 09:54 AM
Hall Of Fame
 
Join Date: Oct 2007
Posts: 1,588

And even a sample size of 300 would have a margin of error for its margin of error.
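
To put a rough number on that: if the noise is assumed to be roughly normal, the standard error of a sample standard deviation s is approximately s / sqrt(2(n - 1)). The sketch below just evaluates that approximation for a few sample sizes. The function name `relative_se_of_sd` is hypothetical, and the normality assumption is exactly that, an assumption.

```python
import math


def relative_se_of_sd(n):
    """Approximate relative standard error of a sample standard deviation.

    Assumes roughly normal data; returns SE(s) / s = 1 / sqrt(2 * (n - 1)).
    """
    if n < 2:
        raise ValueError("need at least two samples to estimate a spread")
    return 1.0 / math.sqrt(2 * (n - 1))


# Sample sizes: a typical handful of runs, then the 300 and 3K mentioned above
for n in (3, 300, 3000):
    print(f"n = {n:>4}: spread estimate uncertain by about {relative_se_of_sd(n):.1%}")
```

Under that assumption the spread estimate is uncertain by about 50% with 3 samples, about 4% with 300, and a bit over 1% with 3,000, so the uncertainty never reaches zero but it does shrink with more samples.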

For example, and because AkG relates everything to an SSD these days, benchmark scores might be affected by a degraded SSD. If you have an SSD at 100% and you're using swap for whatever reason for a large background process, it's going to be able to handle that intruding background process faster than a degraded SSD, say at 30%. The benchmark might shift downward as a result, and the margin might be wider. Heck, the margin of error could expand or narrow based on the type of processor used: a nice, quick, efficient Intel vs a not-so-quick-in-the-pipeline AMD.

The point being that there are just way too many variables to consider, but if you think about it, the reason tech reviewers are able to give a decent description of performance is because of bottlenecks. Something somewhere is going to give its 100% best in a computer system, be it the tested device itself or another component. It's the portion of the bottlenecked device that can't give its 100% for some reason (like resource starvation) that gives us the majority of the variance in a benchmark. If that component was able to perform at 100%, 100% of the time, there would be a lot less variance in benchmark testing. A different setup is going to bottleneck in a different way, on both the hardware and software side.

AkG is right though: one of the best things to do is look at the best, worst and average cases available, kind of like a pseudo margin. It's unfortunately the best most reviews can practically offer. Also, I'm just kidding AkG, I just found it funny you did an SSD analogy, being the SSD guy. I had to take a stab at it. I did terribly in comparison :(