Need help trying to diagnose BSOD (NTFS)

Flapjack

I've been struggling with this one for months. The system is a home theater PC, which also serves all the media/apps on my domain. I've made images of most of my DVDs and rips of all our CDs to store there, so they'll be available from anywhere on the network. I'm using a Rosewill 5-bay SATA enclosure and have dumped the crappy 100% software PCI card that came with it in favor of a RocketRAID 2314. The bay is populated with (5) 2TB drives in a RAID5 array (4TB available after parity space).

I upgraded everything in the system about a year ago and the BSODs started shortly after.

It appears to have something to do with NTFS.sys, but a Chkdsk /f /r revealed nothing out of the ordinary. I've also run MemTest on the system for over 24 hours without any errors.

My gut tells me it's something to do with the RAID, but I can't seem to figure it out. The array always says "healthy" and I've never received any errors. The reason I feel it's the RAID is that sometimes I get errors when copying large files. I don't actually see an error (the transfer completes as normal), but I do end up with missing bits and such. If I download a large file (such as an OS distribution) via torrent, uTorrent will report it as "100%", but extraction throws an error. If I force a re-check, the file will then show anywhere from 97 to 99.7% complete.
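One way to catch this kind of silent corruption is to hash the source and the copy and compare the digests. The following is a minimal sketch, not something from the original post: the function names are hypothetical, and it assumes both files can be read back in full. Note that reading a file immediately after writing it may be served from the OS cache rather than the disk, so a match right after the copy is weaker evidence than a match after a reboot.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large ISOs do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(source, copy):
    """Return True when both files hash to the same SHA-256 digest."""
    return sha256_of(source) == sha256_of(copy)
```

Running `copies_match` on the same pair of files several times can also be informative: if the answer changes between runs, the corruption is happening on read, not only on write.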

I'm stumped on this one, fellas (and gals). Any help here would be awesome.

Here's a copy/paste of the minidump read using WinDbg and the correct Win7 symbols:

Code:
Microsoft (R) Windows Debugger Version 6.12.0002.633 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.


Loading Dump File [C:\Users\*******\Desktop\011112-43259-01.dmp]
Mini Kernel Dump File: Only registers and stack trace are available

Symbol search path is: SRV*C:\Windows\Symbols*http://msdl.microsoft.com/download/symbols
Executable search path is: 
Windows 7 Kernel Version 7600 MP (2 procs) Free x64
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: 7600.16841.amd64fre.win7_gdr.110622-1503
Machine Name:
Kernel base = 0xfffff800`01e01000 PsLoadedModuleList = 0xfffff800`0203ee70
Debug session time: Wed Jan 11 11:47:56.045 2012 (UTC - 7:00)
System Uptime: 0 days 20:07:33.521
Loading Kernel Symbols
...............................................................
................................................................
...............................
Loading User Symbols
Loading unloaded module list
......
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 24, {1904fb, fffff88001fbf708, fffff88001fbef70, 20000}

Probably caused by : Ntfs.sys ( Ntfs!NtfsDeleteScb+108 )

Followup: MachineOwner
---------
 
Can you use !analyze -v?

Also, do a memory test, as this sounds like bad RAM.
 
I'm unclear as to how to use the analyze command.

I do have extra memory, so I can try that. That did get changed out with the new build, so that would explain it.... though I'm not sure how it could run MemTest for 24 hours without an issue.
 
A couple things point to bad memory - file copies failing, random corruption of downloads, this crash.

Using the analyze command is as simple as opening the dump file, setting up symbols, and typing !analyze -v in the command window, then pressing enter.
 
Duh. Didn't realize there was a command line box right underneath the results. Thanks.

Here is the result of the analyze command:

Code:
1: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

NTFS_FILE_SYSTEM (24)
    If you see NtfsExceptionFilter on the stack then the 2nd and 3rd
    parameters are the exception record and context record. Do a .cxr
    on the 3rd parameter and then kb to obtain a more informative stack
    trace.
Arguments:
Arg1: 00000000001904fb
Arg2: fffff88001fbf708
Arg3: fffff88001fbef70
Arg4: 0000000000020000

Debugging Details:
------------------


EXCEPTION_RECORD:  fffff88001fbf708 -- (.exr 0xfffff88001fbf708)
ExceptionAddress: 0000000000020000
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 0000000000000008
   Parameter[1]: 0000000000020000
Attempt to execute non-executable address 0000000000020000

CONTEXT:  fffff88001fbef70 -- (.cxr 0xfffff88001fbef70)
rax=00000000ffffffff rbx=0000000000000001 rcx=fffff8a00adcac88
rdx=fffffa80036bc040 rsi=fffff8a00adcac70 rdi=0000000000000000
rip=0000000000020000 rsp=fffff88001fbf948 rbp=0000000000000130
 r8=0000000000009d08  r9=00000000000000c0 r10=fffff80001e01000
r11=0000000000000310 r12=0000000000000705 r13=0000000000000000
r14=fffff8a00adcac88 r15=fffff8a00adcaed8
iopl=0         nv up ei ng nz na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00010286
00000000`00020000 3f              ???
Resetting default scope

CUSTOMER_CRASH_COUNT:  1

DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT

PROCESS_NAME:  System

CURRENT_IRQL:  0

ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

EXCEPTION_PARAMETER1:  0000000000000008

EXCEPTION_PARAMETER2:  0000000000020000

WRITE_ADDRESS: GetPointerFromAddress: unable to read from fffff800020a90e0
 0000000000020000 

FOLLOWUP_IP: 
Ntfs!NtfsDeleteScb+108
fffff880`012d7cd8 488b03          mov     rax,qword ptr [rbx]

FAULTING_IP: 
+108
00000000`00020000 3f              ???

FAILED_INSTRUCTION_ADDRESS: 
+108
00000000`00020000 3f              ???

BUGCHECK_STR:  0x24

LAST_CONTROL_TRANSFER:  from fffff8000215d5ae to 0000000000020000

STACK_TEXT:  
fffff880`01fbf948 fffff800`0215d5ae : 00000000`00000001 fffff880`012d82b7 fffff8a0`0adcada0 fffff880`0124e471 : 0x20000
fffff880`01fbf950 fffff880`012d7cd8 : fffff8a0`0adcac70 fffffa80`036bc040 fffff880`01fbfa28 00000000`00000706 : nt!FsRtlTeardownPerStreamContexts+0xe2
fffff880`01fbf9a0 fffff880`012d79d9 : 00000000`01010000 00000000`00000000 fffff800`02016500 00000000`00000001 : Ntfs!NtfsDeleteScb+0x108
fffff880`01fbf9e0 fffff880`0124da50 : fffff8a0`0adcab70 fffff8a0`0adcac70 fffff800`02016500 fffff880`01fbfb52 : Ntfs!NtfsRemoveScb+0x61
fffff880`01fbfa20 fffff880`012d53ec : fffff8a0`0adcab40 fffff800`020165a0 fffff880`01fbfb52 fffffa80`05bf02f0 : Ntfs!NtfsPrepareFcbForRemoval+0x50
fffff880`01fbfa50 fffff880`01256602 : fffffa80`05bf02f0 fffffa80`05bf02f0 fffff8a0`0adcab40 fffff880`01fbfc00 : Ntfs!NtfsTeardownStructures+0xdc
fffff880`01fbfad0 fffff880`012ec8f3 : fffffa80`0526b180 fffff800`020165a0 fffff8a0`6366744e 00000000`00000009 : Ntfs!NtfsDecrementCloseCounts+0xa2
fffff880`01fbfb10 fffff880`012c6c9f : fffffa80`05bf02f0 fffff8a0`0adcac70 fffff8a0`0adcab40 fffffa80`0526b180 : Ntfs!NtfsCommonClose+0x353
fffff880`01fbfbe0 fffff800`01e7e7e1 : 00000000`00000000 fffff880`012c6b00 fffffa80`036bc001 00000000`00000002 : Ntfs!NtfsFspClose+0x15f
fffff880`01fbfcb0 fffff800`021116fa : 00000000`00000000 fffffa80`036bc040 00000000`00000080 fffffa80`036b3040 : nt!ExpWorkerThread+0x111
fffff880`01fbfd40 fffff800`01e4fb46 : fffff880`009c6180 fffffa80`036bc040 fffff880`009d0f40 00000000`00000000 : nt!PspSystemThreadStartup+0x5a
fffff880`01fbfd80 00000000`00000000 : fffff880`01fc0000 fffff880`01fba000 fffff880`01fbf9f0 00000000`00000000 : nt!KiStartSystemThread+0x16


SYMBOL_STACK_INDEX:  2

SYMBOL_NAME:  Ntfs!NtfsDeleteScb+108

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: Ntfs

IMAGE_NAME:  Ntfs.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  4d79996d

STACK_COMMAND:  .cxr 0xfffff88001fbef70 ; kb

FAILURE_BUCKET_ID:  X64_0x24_BAD_IP_Ntfs!NtfsDeleteScb+108

BUCKET_ID:  X64_0x24_BAD_IP_Ntfs!NtfsDeleteScb+108

Followup: MachineOwner
---------
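
For anyone reading the dump above, here is a small sketch of how a few of its numbers decode. It relies on two documented conventions: for an access violation (c0000005), the first exception parameter is 0 for a read, 1 for a write and 8 for an execute (DEP) fault; and for bug check 0x24, the high 16 bits of Arg1 identify the NTFS source file and the low 16 bits the source line. The function names are made up for illustration.

```python
# Meaning of Parameter[0] for an access violation (c0000005).
ACCESS_TYPES = {0: "read", 1: "write", 8: "execute (DEP)"}


def describe_access_violation(param0, param1):
    """Turn the two exception parameters into a readable sentence."""
    kind = ACCESS_TYPES.get(param0, "unknown")
    return f"attempted {kind} at 0x{param1:016x}"


def split_ntfs_arg1(arg1):
    """Split bug check 0x24 Arg1 into (source file id, source line)."""
    return (arg1 >> 16) & 0xFFFF, arg1 & 0xFFFF


def set_bits(value):
    """List the bit positions that are set in an integer."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


print(describe_access_violation(0x8, 0x20000))  # attempted execute (DEP) at 0x0000000000020000
print(split_ntfs_arg1(0x1904FB))                # (25, 1275)
print(set_bits(0x20000))                        # [17]
```

The faulting address 0x20000 has exactly one bit set (bit 17). That is at least compatible with a single flipped bit in memory, but it is a hint, not proof of bad RAM.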
 
From the WRITE_ADDRESS area, it definitely looks like it's a memory issue. I recommend doing a full memory test outside of Windows.
 
Is there something better than MemTest I should be using? If not, did I possibly choose the wrong option? Any certain way I should run it? It's been a year, so I'll redownload it to make sure I'm getting the latest version.
 
Hi, Flapjack,

I'd use Memtest86+. You can get an ISO version or a USB version.

Hope this helps.

Chuklr
 
You mentioned that you upgraded everything in it? Did you do the motherboard/CPU too? If so, did that come with a reinstall of Windows, or did you use what was already on the HD?
 
I just wanted to check something: does your motherboard support running 4TB of hard drives?

I have not really messed with RAID, but I am wondering if you are hitting the 3TB+ limit that older motherboards can't get past unless you are running a UEFI BIOS, which allows more than 3TB of hard drive on a single boot partition. Just a thought on my part though.
 
That is what I've been using. I said MemTest just to shorten it. I'll make sure to use the latest version.

You mentioned that you upgraded everything in it? Did you do the motherboard/CPU too? If so, did that come with a reinstall of Windows, or did you use what was already on the HD?
I did upgrade everything in it. Clean install on a brand new hard drive (system volume).

I just wanted to check something: does your motherboard support running 4TB of hard drives?
The drives are not running off the motherboard. They're running off a Rosewill RSV-S5 connected to a RocketRAID 2314 PCI-E x4 card.

I have not really messed with RAID, but I am wondering if you are hitting the 3TB+ limit that older motherboards can't get past unless you are running a UEFI BIOS, which allows more than 3TB of hard drive on a single boot partition. Just a thought on my part though.
I'm not booting off a large partition. The motherboard is not old and supports UEFI anyways. I've had no problems addressing the 4TB partition, even on the old system. It works great, except for the random corruption I've been getting.
 
Try the built-in Windows memory checker, since you are running Windows 7.
 
I think you can assume your memory is fine if it passed that many tests.

I'm thinking the problem might be software related. A few things to try if you haven't already: run a virus scan, then a malware scan with a few programs (malwarebytes, antispyware,...)

Then try to update your chipset drivers and any drivers you have for the RocketRAID 2314 controller, along with its firmware.

I'm starting to wonder though if your problem isn't related to the RocketRAID 2314 controller itself. I was doing some searching online and found a lot of people with Mac OS X having issues with crashes and kernel panics.

How often do the crashes happen? Maybe try removing the card for a while and see if the crashes stop.
 
That has been my gut for a while, but I should probably do at least one more MemTest86+ run to verify. I wish I knew more about reading the kernel dumps. Others sounded pretty sure it was memory based on the WinDbg results.

I have an HP microserver on the way from a SlickDeal I got a few days ago. It'll be here tomorrow. I'll set up the card in there and see how it goes. The other option is to attach it to my FreeNAS box, but I'm a little worried about setting it up. If it doesn't work right outta the box, I'm not exactly a pro with Linux. I can get most things going, but it'll probably take me more time than most Linux gurus.
 
Sounds like that card is just shit from what I was reading; people were saying to scrap it and get a new one on a few of the sites I read. And it would seem that this one is like your old one and is doing RAID in software, not hardware.

The error that you are seeing is about a problem reading a memory address, yes, but you don't know why. Everything that is in memory has to come from somewhere. If the controller hands the memory bad data, then when the system tries to use it you get the error about being unable to read whatever address. That doesn't necessarily mean the memory itself is bad, just that the data at that location is bad. Which means one of several things: the stick is bad and is corrupting it (which the many test runs probably would have detected); the hard drive is bad and is corrupting the data before it is sent to the RAM; a file itself is corrupt and just has bad data at that location; a program has a bug and is trying to read memory that isn't there or doesn't belong to it; or your controller or motherboard is not copying the data correctly between various parts...

There are a lot of different things that can result in a memory address not being read.
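
When a known-good copy of a corrupted file is available (a re-checked torrent, for example), comparing the two can narrow down which of these causes is more likely. The sketch below is a heuristic under stated assumptions, not a diagnosis: isolated single-bit differences are often associated with RAM, while long runs of wrong data look more like a storage or controller problem. The function names are hypothetical, and the inputs are assumed to be equal-length byte strings already read from the two files.

```python
def diff_runs(good, bad):
    """Return (offset, length) for each contiguous run where the data differs."""
    runs = []
    start = None
    for offset, (a, b) in enumerate(zip(good, bad)):
        if a != b and start is None:
            start = offset                      # a differing run begins
        elif a == b and start is not None:
            runs.append((start, offset - start))  # the run just ended
            start = None
    if start is not None:                       # run reaches the end of the data
        runs.append((start, min(len(good), len(bad)) - start))
    return runs


def flipped_bits(good, bad, offset, length):
    """Count how many individual bits differ inside one run."""
    span_good = good[offset:offset + length]
    span_bad = bad[offset:offset + length]
    return sum(bin(a ^ b).count("1") for a, b in zip(span_good, span_bad))
```

A report with many one-byte runs of one flipped bit each would point in a different direction than a single run of several kilobytes.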
 
Hi, Flapjack,

Just in case you are not aware, Tawnos is one of the MS people, and he worked on Windows 7.

Chuklr
 
I have read lots and lots about sptd.sys being the cause of random BSODs, as well as BSOD loops where you cannot even get into the PC at all. The driver is used with Daemon Tools, Alcohol, and I'm pretty sure SlySoft Virtual Clone drive (which I myself use).

I use the program a lot, as most of my media library is in .iso format, as HDD storage is cheap enough to just image the DVDs down and store them in my RAID5 array. Just the thought of going back and converting those ISOs into .mkv gives me shudders.

So I could remove the software, which might take care of the random reboots. But I haven't heard of anyone experiencing issues with sptd.sys causing data corruption. Has anyone here?

**EDIT**
Virtual Clone Drive does not use SPTD. I did have another system that occasionally BSODs and reboots, so I removed SPTD from there. So now I'm back to the RAM in the HTPC with the data corruption issues, or possibly the RR2314 card itself.

I'm open to recommendations. I got the 2314 because it was the best card I could afford. I knew it wasn't 100% CPU independent, but it was pretty close.... and a helluva lot faster than the POS 100% FakeRAID card that came with the enclosure.

What would be a step up from the RR2314 that supports port replicating?
 
How often does the BSOD occur? I'm guessing you have a HD for the OS and programs and the RAID for data, correct? If so, what if you move a few files over to the OS drive, then remove the RAID for a while and see how the system responds without that card in there? If it makes it twice as long as what you normally get between crashes, I would say that would be a good indication that the card is the issue.
 
That is actually my next step. Because of my back surgery, I've had to take it easy. But I may be able to get my wife or kids to help.

After that, I'll try some different RAM. Even though it's passed MemTest86+, there may be something going on that isn't being picked up.

Lastly, I'll try running the WD diagnostics on the system drive and then possibly imaging it to a new drive to see how it goes. Who knows, maybe it's time for a clean install as well.
 
So, even when tested in the PC with the issues, the RAM passed MemTest86+ again with flying colors. I also put each stick of memory, one at a time, into a barebones HP N40L microserver and tested each one for 8 hours. No errors.

I decided to just switch out the old memory (PNY DDR 1333) and put some OCZ Gold DDR 1333 in. Glad I did!

I downloaded several 4GB-8GB ISOs via torrent and also via the MSDN site. Both methods showed zero data corruption.

Thanks for all the help. Hopefully this helps someone else also.
 
Boot Linux from a desktop CD. If everything works, then the problem is software (Windows). If the problem also exists in Linux, then the problem is hardware.
 
Just because it doesn't happen in Linux doesn't mean it's the fault of Windows. The BSODs/HDD corruption didn't happen often enough for booting off a Linux disc to be helpful. The system would run 24+ hours on MemTest86+. There's really no telling what was triggering the memory error, but I'm fairly certain torrents had something to do with it (though it happened occasionally without them running).
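
Because the failure was intermittent, a check that can run unattended for many rounds may be more useful than a one-off test. The following is only a sketch under assumptions: `write_and_verify` is a hypothetical helper, the directory is whichever volume is under suspicion, and the read-back may be served from the OS cache instead of the disk, so a clean run does not prove the storage path is healthy.

```python
import hashlib
import os
import tempfile


def write_and_verify(directory, size_bytes, rounds):
    """Write random data, read it back, and count hash mismatches."""
    failures = 0
    for _ in range(rounds):
        data = os.urandom(size_bytes)
        expected = hashlib.sha256(data).hexdigest()
        fd, path = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "wb") as handle:
                handle.write(data)
                handle.flush()
                os.fsync(handle.fileno())  # ask the OS to push the data to disk
            with open(path, "rb") as handle:
                actual = hashlib.sha256(handle.read()).hexdigest()
            if actual != expected:
                failures += 1
        finally:
            os.remove(path)  # clean up the scratch file either way
    return failures
```

Something like `write_and_verify("D:\\scratch", 512 * 1024 * 1024, 100)` left running overnight, with torrents active, would exercise roughly the conditions described in the thread; the path and sizes here are placeholders.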
 