For the most par the FileSystemWatcher does a reasonably good job. That is until you start adding lots of files at once to the directories etc that are being 'watched'. That's when all hell breaks loose and you get lovely error messages like:
System.IO.InternalBufferOverflowException: Too many changes at once in directorySo, there are a few things that you can do to address these issues, the first is to make the Buffer bigger by setting the FileSystemWatcher.InternalBufferSize property to something larger than its default 8192, with the caveat that it comes from non-paged memory, so it is never swapped when the process is running and it should be in multiples of the page size of where ever you're running it.
Setting the buffer did help us eliminate some of the problems, but as I tested more and more aggressively, trying to check memory usage of our new feature and its robustness, it was still quite easy to replicate the buffer overflow. The next plan is to catch only the file changed events and then trigger our program to go and check all the directories and do its thing. Supposedly only File changed events need to be used as when a file is created it is done so in steps, first created and then the contents added, thus it triggers two events, Added and then modified. I'm not sure how well this'll work when deleting as that seems like it should only take a single operation.
While the buffer overflow is still a problem it is not insurmountable and once the program gets passed the test bench is unlikely to cause a problem. What is more of a problem is the BSOD that I started getting when I started testing the memory usage and processor impact of having many, many folders in our monitored directory. After four BSODs in controlled circumstances I was pretty sure it was related to the FileSystemWatcher when I was adding large numbers of folders to the monitored directory (with an awesomely hacky batch script, for loop + xcopy = win).
So, after spending the appropriate amount of time googling and wandering around Stackoverflow I determined that I had to dig out the minidumps (C:\Windows\Minidump) and have a look around. Right, so I naively double click on it only for Visual Studio 2010 to spit out this useless message.
Ahh, so how about I actually setup the windows debugging tools, which you can grab from here. So I installed it and then tried to run it, what I need the Symbols? Hmmm, so I pointed at the windows symbol server and run the "!analyze -v" command and get another useful error message.
Debugging Details:
------------------
***** Kernel symbols are WRONG. Please fix symbols to do analysis.Right, so I need symbols for Windows Server 2008R2 specifically, so I find my way to the download and start downloading them, it just happens to be that this 400Mb download is the slowest that I've ever had from a MS server. Actually thats when I started writing this blog post, waiting for the symbols to download. So I install the symbols and run it again.
I get the same error! WTF? I decide I better read into the details a bit more.
Unable to load image \SystemRoot\system32\DRIVERS\eamonm.sys, Win32 error 0n2
*** WARNING: Unable to verify timestamp for eamonm.sys
*** ERROR: Module load completed but symbols could not be loaded for eamonm.sysIt was just telling me it couldn't find the symbols for 'eamonm.sys', when I actually look into the output a bit more it starts to make sense it is eamonm.sys that is causing me grief. This it turns out is one of the drivers for ESET's Nod32 Antivirus software, so apparently it is the antivirus software that is causing me problems. To confirm this I disabled the real time file system protection in Nod32.
Then I bludgeoned the directory with lots and lots of changes and then lots and lots more changes and it all looked good. Then I remembered that I needed to have my program running as well. Whoops! Better clear out the directory and do it again. Damn it looks like I've just been hit with the curse of the SSD, really slow delete times, been meaning to look into that as well.....
Ok, so program running, real time protection disabled and.... the problem is gone. Now to make sure excluding just that directory also fixes it. Well that and making sure no one runs the app I'm working on and adds 200 files to the folder....
For all of you that care (and if you're still reading when you get to here) here is the output from the analysis of the minidump.
MODULE_NAME: eamonm
FAULTING_MODULE: fffff80001649000 nt
DEBUG_FLR_IMAGE_TIMESTAMP: 4c18e225
READ_ADDRESS: 0000000000001f68
CURRENT_IRQL: 0
FAULTING_IP:
eamonm+833b
fffff880`0482e33b 493908 cmp qword ptr [r8],rcx
CUSTOMER_CRASH_COUNT: 1
DEFAULT_BUCKET_ID: DRIVER_FAULT_SERVER_MINIDUMP
BUGCHECK_STR: 0xD1
LAST_CONTROL_TRANSFER: from fffff800016b8b69 to fffff800016b9600
STACK_TEXT:
fffff880`06ff1118 fffff800`016b8b69 : 00000000`0000000a 00000000`00001f68 00000000`00000002 00000000`00000000 : nt+0x70600
fffff880`06ff1120 00000000`0000000a : 00000000`00001f68 00000000`00000002 00000000`00000000 fffff880`0482e33b : nt+0x6fb69
fffff880`06ff1128 00000000`00001f68 : 00000000`00000002 00000000`00000000 fffff880`0482e33b fffff880`048429ee : 0xa
fffff880`06ff1130 00000000`00000002 : 00000000`00000000 fffff880`0482e33b fffff880`048429ee 00000000`00000000 : 0x1f68
fffff880`06ff1138 00000000`00000000 : fffff880`0482e33b fffff880`048429ee 00000000`00000000 00000000`00000000 : 0x2
STACK_COMMAND: .bugcheck ; kb
FOLLOWUP_IP:
eamonm+833b
fffff880`0482e33b 493908 cmp qword ptr [r8],rcx
SYMBOL_NAME: eamonm+833b
FOLLOWUP_NAME: MachineOwner
IMAGE_NAME: eamonm.sys
Worth it!!!??
ReplyDeleteNo, its really not. Need a better file watcher, its like trying to blow up a dam with a hammer.
ReplyDelete