File Server Performance Issues

First the server specs...

Dell PowerEdge T110 II
Intel Xeon E3-1270
8GB RAM
Windows SBS 2011 Essentials
2 x 1TB 7200 RPM SATA drives in RAID 1

Server Roles -

AD DS
DHCP
DNS

Now the environment...

25 Workstations
Gigabit Switch
Basic DD-WRT Router
Ubiquiti UniFi APs

How they use it, and the issues...

So this is a medical office; they track patient visits and miscellaneous metrics in Word, Excel, and PDF files, and bill insurance from these forms. The front office staff pulls these documents up for the providers in the exam rooms via VNC from the front desk. The largest folder they pull these documents from has 3,965 folders and 59k files.

They also have appointment software and practice management software running on the server.

What is happening is they open some of their Office documents on the server and then can't save them again; it gives them a permission error. They have to do a Save As with a different name and then delete the old file.

I have gone through various steps to try to fix the issue: disabled Windows Desktop Search (a reported cause of this), adjusted the RAID controller settings for speed (write-back cache, etc.), and installed a Gigabit switch.
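
For anyone curious, disabling Desktop Search just comes down to stopping and disabling the WSearch service. From an elevated prompt, something like:

sc stop WSearch
sc config WSearch start= disabled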

They are also having what appear to be general speed/performance issues.

In the Networking tab in Task Manager, they are getting spikes up to 35% utilization.

I am thinking the issue is a bottleneck due to the slower hard drives; I just need someone else to give me their opinion before we start purchasing new hardware.
 
Check this out: http://support.microsoft.com/kb/2589410/en-us

I've seen this before, but it ended up being related to the customer disabling opportunistic locking on their NAS device.
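
If you want to rule that out on the Windows side, the old SMB1-era oplocks switch lives under LanmanServer, and checking it is harmless. Note this value only governs SMB1 behavior; if the value is absent, the default is enabled:

reg query "HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters" /v EnableOplocks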

I would say that you definitely have a bottleneck in the disks, though. I can't imagine those two disks in RAID1 (so good reads, meh writes) would be optimal.

You'll need some perfmon data. Install PAL (http://pal.codeplex.com/) and hit the "Threshold File" tab. Click "Export to Perfmon template file." and save the XML to the server.

On the server, run:
Logman import -name SystemOverview -xml SystemOverview.xml
Logman start SystemOverview


Let that run during the business day, then run logman stop systemoverview.
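
You can confirm the collector is actually running with:

Logman query SystemOverview

And if nobody will be around at close of business, a one-shot scheduled task can stop it for you (the time is just an example, adjust to their closing hours):

schtasks /create /tn "Stop SystemOverview" /tr "logman stop SystemOverview" /sc once /st 18:00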

Once you have the counter log, open it up and look for Physical Disk. Under that, add:

Avg. Disk sec/Read
Avg. Disk sec/Write
Avg. Disk Queue Length
% Idle Time

Make sure to select the proper disk from instances at the bottom. Click OK and check out the sec/Read and Write. If you see crazy numbers, beyond 0.050 (50ms) sustained, you've got disk latency issues. 50ms on a RAID1 set means you're only pushing about 40 reads/sec and just 20 writes (assuming cache is saturated). Also check out the disk queue. Anything sustained over 2 is bad news as well.
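
If you want a quick live spot-check while you wait on the full log, typeperf can sample the same counters straight from the command line; this takes 60 samples at 5-second intervals (run "typeperf -qx PhysicalDisk" first if you'd rather target a specific instance instead of the wildcard):

typeperf "\PhysicalDisk(*)\Avg. Disk sec/Read" "\PhysicalDisk(*)\Avg. Disk sec/Write" "\PhysicalDisk(*)\Avg. Disk Queue Length" -si 5 -sc 60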

Also, be careful forcing write caching on disks that host NTDS.DIT for the DC. A hard power outage could result in corruption of the database if the disks don't honor forced unit access (FUA).

And sorry about the verbosity :) I work with customers a lot who require it, so it's habit now.
 
If your largest folder has nearly 4K nested folders and 59K files behind that, you might want to look into restructuring your folder system, especially if this is being accessed via network share.

There is a lot of metadata that needs to be pulled in before Explorer is ready to go. This is an interesting insight into what you might be experiencing - granted you don't have 1.8 million files, but as it is described, Explorer is pulling in data only by the hundreds at a time. http://technet.microsoft.com/en-us/magazine/hh395477.aspx
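
As a rough way to quantify what Explorer has to chew through, a bare recursive listing piped through find will count files and folders from the command line (the path is just a placeholder for your document share):

rem count files (directories excluded)
dir /s /b /a-d "Z:\Documents" | find /c /v ""
rem count folders
dir /s /b /ad "Z:\Documents" | find /c /v ""
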
How is CPU and memory consumption on the machine opening up the folder?

SBS 2011, IIRC, is based on Windows Server 2008 R2, so open up Server Manager, expand Roles, and expand File Services. On the right-hand side, find the Best Practices Analyzer section and click "Scan This Role". Report back anything flagged as non-compliant.

You also mention permission errors. You can assign custom permissions which do not grant append on a file but do allow delete.
If you can provide the output of the permissions on one of the files this has happened to, it might be helpful. To do that, open up command prompt and go to the location and type in "icacls filename.ext" and paste the output here.
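
For reference, output on a file with normal inherited permissions looks something like this (the path, domain, and group are placeholders):

Z:\Documents\visit.docx BUILTIN\Administrators:(I)(F)
                        NT AUTHORITY\SYSTEM:(I)(F)
                        CONTOSO\Office Staff:(I)(M)

Successfully processed 1 files; Failed processing 0 files

(I) means the entry is inherited; (F) is full control and (M) is modify.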
 
Check this out: http://support.microsoft.com/kb/2589410/en-us

I've seen this before, but it ended up being related to the customer disabling opportunistic locking on their NAS device.

Interesting... I will look into this one.

The exact error message they get is "Word cannot complete the save due to a file permission error"

I would say that you definitely have a bottleneck in the disks, though. I can't imagine those two disks in RAID1 (so good reads, meh writes) would be optimal.

You'll need some perfmon data. Install PAL (http://pal.codeplex.com/) and hit the "Threshold File" tab. Click "Export to Perfmon template file." and save the XML to the server.

On the server, run:
Logman import -name SystemOverview -xml SystemOverview.xml
Logman start SystemOverview


Let that run during the business day, then run logman stop systemoverview.

Once you have the counter log, open it up and look for Physical Disk. Under that, add:

Avg. Disk sec/Read
Avg. Disk sec/Write
Avg. Disk Queue Length
% Idle Time

Make sure to select the proper disk from instances at the bottom. Click OK and check out the sec/Read and Write. If you see crazy numbers, beyond 0.050 (50ms) sustained, you've got disk latency issues. 50ms on a RAID1 set means you're only pushing about 40 reads/sec and just 20 writes (assuming cache is saturated). Also check out the disk queue. Anything sustained over 2 is bad news as well.

Thanks, I didn't know about this utility and was looking for something that could give me some metrics. I have started it on their server, so we will see what it says tomorrow.

Also, be careful forcing write caching on disks that host NTDS.DIT for the DC. A hard power outage could result in corruption of the database if the disks don't honor forced unit access (FUA).

And sorry about the verbosity :) I work with customers a lot who require it, so it's habit now.

We just installed a new UPS and I configured the shutdown software, so the risk of that should be mitigated. I was trying everything to improve performance; I also disabled SMB signing, as I read it causes 10-15% overhead.
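
For anyone following along, you can confirm what the server is actually enforcing for signing with the standard LanmanServer registry values (1 = on, 0 = off):

reg query "HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters" /v RequireSecuritySignature
reg query "HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters" /v EnableSecuritySignature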

I appreciate the verbosity, it has been a big help!
 
First the server specs...
2 x 1TB 7200 RPM SATA drives in RAID 1

There's the root cause: a hardware issue.

The largest folder they pull these documents from has 3,965 folders and 59k files.

Here's the source of the problem. Your drives are choking trying to process all of this.
 
There's the root cause: a hardware issue.

Here's the source of the problem. Your drives are choking trying to process all of this.

I agree, but it's always nice to have hard data to show the customer/client to justify upgrading.
 
I agree, but it's always nice to have hard data to show the customer/client to justify upgrading.

I know, and having now read the replies after I replied, it looks like you all pointed out how to get that data :)
 
I didn't get a chance to stop this last night, so it ran all night. Here's the summary (values in seconds):

Counter               Instance   Avg      Min    Max
Avg. Disk sec/Read    0 C:       0.003    0      0.270
Avg. Disk sec/Write   0 C:       0.004    0      0.123
Avg. Disk sec/Read    1 Z:       0.009    0      0.359
Avg. Disk sec/Write   1 Z:       0.006    0      0.191

Z: is the data drive, C: is the OS drive.

It looks like it is definitely an issue with the hard drives.
 
The exact error message they get is "Word cannot complete the save due to a file permission error"

I know, but that is meaningless.

Can you please run the command I asked you about from the command prompt on one of the files that generates this error ("icacls filename.ext") and report back the output.
This will provide the full set of permissions associated with the file.

Also, provide the user account that was used when the file was not saveable.
 
I know, but that is meaningless.

Can you please run the command I asked you about from the command prompt on one of the files that generates this error ("icacls filename.ext") and report back the output.
This will provide the full set of permissions associated with the file.

Also, provide the user account that was used when the file was not saveable.

I can't, the error is sporadic and the users delete the file after they save a new copy of it.

I reset all of the file permissions on that entire folder.
 
Sporadic, as in: after having this problem on a file, let's call it 'a.docx', the end user saves it as 'a-copy.docx', deletes 'a.docx', renames 'a-copy.docx' to 'a.docx', and then never has this problem saving over that file again?

Also, what specifically do you mean by "reset permissions"? Did you break permission inheritance on these folders or files?

You need to instruct the users that if this happens again, they are NOT to delete the original file until you have had a chance to run that icacls command on that specific file.
 
Sporadic, as in: after having this problem on a file, let's call it 'a.docx', the end user saves it as 'a-copy.docx', deletes 'a.docx', renames 'a-copy.docx' to 'a.docx', and then never has this problem saving over that file again?

I can't say for sure.

Also, what specifically do you mean by "reset permissions"? Did you break permission inheritance on these folders or files?

Yes, I disabled inheritance from the parent folder and set the OU the users are in to have all permissions.

You need to instruct the users that if this happens again, they are NOT to delete the original file until you have had a chance to run that icacls command on that specific file.

I will do that. Does it look to you like the hard drives aren't performing fast enough?
 
Yes, I disabled inheritance from the parent folder and set the OU the users are in to have all permissions.

I'm hoping you meant group versus OU. You can't assign an OU to an ACL. There is a concept called "ghost groups", but that's a purely manual thing.

I will do that. Does it look to you like the hard drives aren't performing fast enough?

I don't know yet. Honestly, the error message generated by Office is generic, so yes, it could be the performance of the disks. It could also be a real permissions problem. Getting the actual permissions on a file that is having this problem will answer that question.
 
I can't, the error is sporadic and the users delete the file after they save a new copy of it.

Then pick a random file.

I reset all of the file permissions on that entire folder.

There are two sets of permissions at work here. First there are the file (NTFS) permissions, but there are also share-level permissions. For the latter, you would typically set up the share with the Server\Administrators group having Full Control and the Server\Users group having Change access.
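
A quick way to dump a share's permissions is from the command line (ShareName is a placeholder):

net share ShareName

The Permission entries in the output are the share-level ACL, separate from anything icacls shows you.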
 
There are two sets of permissions at work here. First there are the file (NTFS) permissions, but there are also share-level permissions. For the latter, you would typically set up the share with the Server\Administrators group having Full Control and the Server\Users group having Change access.
While it was a risk (you know, assumptions), I'm thinking that it is not a share permission issue. Share permissions are very simplistic: Read, Change, or Full Control.

Being able to delete and create a new file requires exactly the same minimum share permission, Change.
 
While it was a risk (you know, assumptions), I'm thinking that it is not a share permission issue. Share permissions are very simplistic: Read, Change, or Full Control.

Most likely, but it's worth checking, especially if Deny rights are in the mix.
 