  #1 (permalink)  
Old September 5, 2011, 10:05 AM
MpG
Hall Of Fame
 
Join Date: Aug 2007
Location: Kitchener, ON
Posts: 3,143
Is this the dreaded SF BSOD?

Just built myself a new rig (Gene-Z, 2600k, GTX580, Mushkin DDR3-1600), and decided to splurge on a pair of Corsair Force 3 120GB drives for some raid0 fun. Unfortunately, about 3 times in the last two weeks, I've had the computer crash out of nowhere, ending in a BSOD.

As the computer resets and goes through the reboot process, Intel Storage Manager reports that the RAID 0 array made up of those drives has failed. To get things going again, I need to do a full power-off and restart. Once that happens, things are back in working order.

Are these the symptoms of the current mysterious sandforce issue? Or does it sound like a different issue? I'm not getting anything specific pertaining to my system via Google at this point.
__________________
i7 2600K | ASUS Maximus IV GENE-Z | 580GTX | Corsair DDR3-2133
  #2 (permalink)  
Old September 5, 2011, 10:15 AM
Top Prospect
 
Join Date: Sep 2009
Location: Canada
Posts: 240


Default

Well, you've added RAID into the mix, so it could be your RAID controller as well.
Or one drive could be bad and the other fine.

So unfortunately this just became a large pain to figure out.
Is there any data on there you care about, and can you get it off?

Then try going non-RAID: format the drives and use one for the OS and the other for whatever you want.

You might have to run just the one drive for a bit before having them both in.
  #3 (permalink)  
Old September 5, 2011, 10:28 AM
AkG
Hardware Canucks Reviewer
 
Join Date: Oct 2007
Posts: 4,319
Default

Without knowing the exact BSOD error code, there's no way of knowing if it's SF-related or not.
Honestly, running SF2281s in RAID 0 only starts to pay for itself when you do 3 or more. You are running them in a degraded state, thus getting almost none of their full potential.

If it was me, I'd break the array and run one as your C: and the other as your D: (with secondary apps installed on it).

Turn OFF hot plugging in the BIOS and see if the BSOD happens again. If it does, write the error code down. The "SF bug" is one of a couple of things. Most likely it's the partial sleep issue: the drives can enter a partial sleep state when they shouldn't (even though it's part of the new Intel PCH and one of its advertised features), then the OS tries to access the drive and it errors out because the drive is sleeping. A reboot solves it, as the system tells the drive to wake up on a hard reboot.
__________________
"If you ever start taking things too seriously, just remember that we are talking monkeys on an organic spaceship flying through the universe." -JR

“If your opponent has a conscience, then follow Gandhi. But if your enemy has no conscience, like Hitler, then follow Bonhoeffer.” - Dr. MLK Jr.
  #4 (permalink)  
Old September 5, 2011, 10:49 AM
Hall Of Fame
 
Join Date: Feb 2011
Location: Ontario
Posts: 1,382


Default

You haven't defragged have you?

Edit: I should probably say this before you try it: don't do it, it can seriously mess up your drives... make sure automatic defrag scheduling is off.
  #5 (permalink)  
Old September 5, 2011, 10:54 AM
Top Prospect
 
Join Date: Sep 2009
Location: Canada
Posts: 240


Default

just a side note : TRIM doesn't work in RAID

unless they've fixed that and i'm not up to date on that
  #6 (permalink)  
Old September 5, 2011, 11:00 AM
AkG
Hardware Canucks Reviewer
 
Join Date: Oct 2007
Posts: 4,319
Default

Quote:
Originally Posted by roh_ultima View Post
just a side note : TRIM doesn't work in RAID

unless they've fixed that and i'm not up to date on that
Yup, you are indeed correct. RAID means no TRIM. Thus they are almost guaranteed to be in a degraded state (though maybe not, as he didn't have the rig up and running long enough to REALLY hammer them, so maybe just partially degraded).
  #7 (permalink)  
Old September 5, 2011, 11:02 AM
MpG
Hall Of Fame
 
Join Date: Aug 2007
Location: Kitchener, ON
Posts: 3,143
Default

Interesting. I was actually told to enable hot-plugging, but maybe that was a mistake. I'll give that a try. Wish this problem was more common, so I could actually replicate it. I thought it might be the controller, but I've also got a pair of 1TB drives also in raid0, and they show up as just fine, which led me to think the problem was with the drives themselves. It's a boot/OS/game drive, and I have no experience with imaging drives to move them around, so was hoping to avoid that. Might not be an option, tho.

And yeah, no worries. Definitely no defragging happening on them.
  #8 (permalink)  
Old September 5, 2011, 11:06 AM
MpG
Hall Of Fame
 
Join Date: Aug 2007
Location: Kitchener, ON
Posts: 3,143
Default

Quote:
Originally Posted by AkG View Post
Yup, you are indeed correct. RAID means no TRIM. Thus they are almost guaranteed to be in a degraded state (though maybe not, as he didn't have the rig up and running long enough to REALLY hammer them, so maybe just partially degraded).
Yeah, that was kind of a calculated risk, I admit. Lots of free space, and it just serves as an evening gamer/browser, so it's not a heavy workload. I figure I'll check it out in a month, and see if the limited garbage collection has been able to keep up or not. If not, guess I'll suck it up and reconfigure.
  #9 (permalink)  
Old September 5, 2011, 11:23 AM
AkG
Hardware Canucks Reviewer
 
Join Date: Oct 2007
Posts: 4,319
Default

Quote:
Originally Posted by MpG View Post
Interesting. I was actually told to enable hot-plugging, but maybe that was a mistake. I'll give that a try. Wish this problem was more common, so I could actually replicate it. I thought it might be the controller, but I've also got a pair of 1TB drives also in raid0, and they show up as just fine, which led me to think the problem was with the drives themselves. It's a boot/OS/game drive, and I have no experience with imaging drives to move them around, so was hoping to avoid that. Might not be an option, tho.

And yeah, no worries. Definitely no defragging happening on them.
I posted this in a different thread, but will post it here too. It's from Tony over at OCZ and has plenty of tips on best practices when you run into issues.

Quote:
here you go..my last theory on this and probably my most honest.

1 Hotplugging is the root of all evil...P67, Z68 nightmare. It was so messed up initially you HAD to have it enabled to get any stability...now I feel it is the cause of our instability.

2 How OCZ initially flash the drives and then the platforms they go on once sold....we could be seeing an issue here BUT we feel we have a cure for this...read on

3 P67 and Z68 is far from a 100% working solution, look at the chipset errata, look at the changelogs on drivers...Intel are battling to get it working correctly

4 unstable overclocked systems, especially at IDLE. This is mainly an Intel CPU issue: people test on load, they NEVER test IDLE stability. I found my CPU will cause a BSOD at 1.6GHz with any voltage lower than 1V. If I leave everything on auto and SpeedStep down from an overclocked turbo mode, it was going below 1V...I now set voltages manually, with idle and load having the same vcore.

5 people flash drives and forget they need to reset the boards (cmos reset) The boards receive info from windows on how the drives are to be configured and run, if you change something on the drive you have to let windows see this fresh info...this means when we say you need to reset cmos...you need to reset cmos

so back to No1

Hotplugging = we now think BAD...why?

If you boot a board up from new, you immediately go to bios, set hotplugging enabled on the sata ports and connect one of our SSD's; I feel this may set up the board to behave like it has a USB drive connected. Now add to this you may have flashed FW to the drive with hotplugging enabled also...this may permanently set up the drive so it mimics a removable device.

What I feel happens is the following...this is my theory, no one else's (apart from me&er ) and it is why we have been testing with the following in mind.

A SF2281 controller (used on Vtx3, AGT3 etc) supports automatic partial to slumber. What I feel may be happening is the drive is transitioning from partial to slumber BUT, because hotplugging is live/enabled (possibly the drive was flashed with it enabled), when the drive hits slumber the host controller (P67 PCH) thinks in some circumstances that the drive has been disconnected...hence we get a BSOD with a disconnection error.

Remember this would only happen most of the time during an idle or low drive activity period.
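
To make the theory above easier to follow, here is a toy Python sketch of it. This is only an illustration of the hypothesis as described, not how any real AHCI driver or PCH works; `LinkState`, `host_view`, and `simulate_idle` are made-up names for this example.

```python
from enum import Enum


class LinkState(Enum):
    """SATA link power states named in the theory above."""
    ACTIVE = "active"
    PARTIAL = "partial"
    SLUMBER = "slumber"


def host_view(link_state, hotplug_enabled):
    """Return what the host believes about the drive.

    Assumption (the theory only): with hot-plugging enabled, a drive
    that reaches slumber can be mistaken for an unplugged drive.
    """
    if link_state is LinkState.SLUMBER and hotplug_enabled:
        return "disconnected"  # hypothesized misread that precedes the BSOD
    return "connected"


def simulate_idle(hotplug_enabled):
    """Walk a drive through an idle period: active -> partial -> slumber."""
    idle_sequence = (LinkState.ACTIVE, LinkState.PARTIAL, LinkState.SLUMBER)
    return [host_view(state, hotplug_enabled) for state in idle_sequence]
```

In this model the drive only ever "disappears" at the end of an idle period and only when hot-plugging is enabled, which matches the symptoms described: crashes out of nowhere at idle, and none under load.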

So...we have been testing the following way.

1 remove the drive
2 clear the cmos
3 set hotplugging to disabled, set up bios then f10 and exit,....now power off
4 attach the drive
5 reflash the FW to the drive with hotplugging disabled
6 once reflashed, power down the system, remove power to the drive for 1 minute, reconnect the drive and power on.
7 boot to windows... go to device manager, navigate to disk drives, navigate to the SSD drive, right click on it...uninstall
9 you will now be asked to reboot
10 boot back to windows, let the drive be installed again and reboot
11 boot back to windows, make sure power scheme is how you want it ( i run balanced with sleep and hybrid enabled and LSPM to moderate)
12 reboot then run WEI
13 reboot
14 test
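
If you want to track where you are in the list above, the steps can be written down as a small Python checklist. This is just a sketch: the step numbers are kept exactly as posted (there is no step 8 in the original), the wording is shortened, and `STEPS` and `next_step` are hypothetical names.

```python
# Step numbers follow the post as written; the original list has no step 8.
STEPS = [
    (1, "Remove the drive"),
    (2, "Clear the CMOS"),
    (3, "Disable hotplugging in the BIOS, F10 and exit, power off"),
    (4, "Attach the drive"),
    (5, "Reflash the drive FW with hotplugging disabled"),
    (6, "Power down, remove drive power for 1 minute, reconnect, power on"),
    (7, "In Device Manager, uninstall the SSD under disk drives"),
    (9, "Reboot when asked"),
    (10, "Boot to Windows, let the drive install again, reboot"),
    (11, "Boot to Windows and check the power scheme"),
    (12, "Reboot, then run WEI"),
    (13, "Reboot"),
    (14, "Test"),
]


def next_step(completed):
    """Return the first (number, text) not yet in `completed`, or None."""
    for number, text in STEPS:
        if number not in completed:
            return number, text
    return None
```

For example, `next_step({1, 2, 3, 4, 5})` returns step 6, the full power cycle of the drive after flashing.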


No6 is MOST important, just had this confirmed by SandForce, the drive right after being flashed HAS to see a full power cycle...i want you guys to be ultra thorough about this, if you can remove the drive or minimum take the power cable off the drive for at least 30 seconds to a minute.


On a new system I would initially clear the cmos on the new board, attach the drive, set hotplugging to disabled, then immediately reflash the drive's FW, then consider a clear cmos again with a drive reconnect following, and again set hotplugging to disabled...then install.

I'm not sure the second cmos clear is needed but it can do no harm to be thorough.

What HAS to be in place with this though is Orom 10.6 or newer and RST 10.6 or newer used (Intel platforms) anything earlier is risky (Orom 10.5 looks OK BUT we were warned you needed 10.6 to be sure)
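
A quick way to check a version string against that 10.6 minimum is to compare the numbers, not the text. The sketch below is Python and assumes the Orom/RST version is a dotted number string such as "10.6.0.1002" (that format is an assumption); how you read the version off your system is not covered, and `parse_version` and `meets_minimum` are made-up helper names.

```python
MIN_VERSION = (10, 6)  # minimum Orom / RST version named in the post


def parse_version(text):
    """Turn a dotted version string like '10.6.0.1002' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(text, minimum=MIN_VERSION):
    """True if the version is at least `minimum` (compared numerically)."""
    return parse_version(text)[:len(minimum)] >= minimum
```

Comparing tuples of integers avoids the trap of plain string comparison, where "10.10" would sort before "10.6".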

The point is many end users do not want to do all this, they want a FW flash to just cure it....ain't going to happen, I'm afraid. The board needs to see a drive flashed with hotplugging disabled from a fresh boot with a cleared cmos, so the freshly flashed drive passes its ACPI table down to the board and the board correctly runs and sets up the drive.

Now, I'm not saying this is a fix; I have worked with a few who had issues that have now gone away. Some had BSOD but now have slight stutter...not sure if this is related or could just be Orom 10.5 and the updated Orom will cure it...overall though it for sure is better.

My opinion is this though: P67/Z68 has some weird issues. When you look at the frequency of Orom and driver releases from Intel, you cannot help but think they are dealing with many problems.
I find from my personal testing AMD's SATA 3 host controller is more mature and, while a little slower, is a lot more stable. We still have guys who see issues on AMD systems also. I feel the bulk of these are not the same bug as seen on Intel systems...and we will work with those end users to get a working drive on their platform.

I have studied the ACPI table on the Intel 510. It does not support Auto Partial to Slumber, and the drive supports APM but has it set to disabled...now APM is old and may not be too relevant, BUT Auto Partial to Slumber is very new and is pretty much an Intel invention, and it strikes me as very odd they are not using it on their new drives and new PCH chipset. Now...I'm thinking some PCH actually can work with drives that are transitioning from wake to partial to slumber just fine...but I also feel some PCH have an issue with this, and hence some have issues.

If you want to try hotplugging off...it's not going to help you much if you just go to bios, set disabled and hope that's all that is needed....read back and follow the steps I outlined.

Again, I'm not saying this is the cure, but it's what we have been looking at away from the public forum.
You can read the rest here: Where we are with Vertex3/Agility3/Solid3 drives
  #10 (permalink)  
Old September 5, 2011, 12:06 PM
Top Prospect
 
Join Date: Sep 2009
Location: Canada
Posts: 240


Default

interesting. the hotswap actually improves performance on drives

12 reboot then run WEI <- what is WEI, i tried a search on it, was getting a much different result than expected, was about a company