Archive for the 'Debugging' Category



Debugging a Debugger Deadlock

While I still use VisualStudio 2005 Team System for most of my development, I want to make sure that cfix works properly with VisualStudio 2008 as well.

To test that, I recently started a Windows 2003 Server VM, installed VCExpress 2008 and cfix and attempted to run an example project in the VC debugger. As long as no assertions fired, everything seemed fine. I then altered the example’s source code so that one of the assertion would fail, ran it in the debugger — and waited. Nothing happened.

When an assertion fires, cfix 1.1 attempts to capture the stack trace to make debugging easier. For this to work, the respective debugging symbols must be loaded, and — if not yet available — be downloaded from the symbol server. It is thus not uncommon that it takes a couple of seconds before the failure message appears and the debugger breaks in when an assertion fails. But after about a minute, still nothing happened — a deadlock seems to have occured.

To make the situation worse, I then noticed that while cfix was hanging, Internet Explorer would not properly start any more. After creating its main window, it would hang as well.

So I rebooted the machine and hooked up a kernel debugger. Luckily, the scenario was easy to reproduce and I could take a closer look…

Debugging the deadlock

First, let’s find the cfix32 process:

kd> !process 0 0 cfix32.exe
PROCESS 8254e490  SessionId: 0  Cid: 06b0    Peb: 7ffde000  ParentCid: 0658
    DirBase: 06e90000  ObjectTable: e170ee58  HandleCount:  69.
    Image: cfix32.exe

Now that we have the VA, we can switch to the process and load its user mode symbols:

kd> .process /r /p 8254e490  
Implicit process is now 8254e490
.cache forcedecodeuser done
Loading User Symbols
...............................

Looking at this process’ threads, it is easy to see that one stack looks suspicious:

kd> !process 8254e490  
PROCESS 8254e490  SessionId: 0  Cid: 06b0    
	Peb: 7ffde000  ParentCid: 0658
    DirBase: 06e90000  ObjectTable: e170ee58  HandleCount:  69.
    Image: cfix32.exe
    VadRoot 82455628 Vads 76 Clone 0 Private 491. 
    Modified 8. Locked 0.
    DeviceMap e148f3c0

    [...]

THREAD 826276f8  Cid 06b0.06b4  Teb: 7ffdd000 Win32Thread: 
  e1734a38 WAIT: (Unknown) KernelMode Non-Alertable
	f4422b44  SynchronizationEvent
Not impersonating
DeviceMap                 e148f3c0
Owning Process            8254e490       Image:         cfix32.exe
Wait Start TickCount      13474          Ticks: 9138 (0:00:01:31.511)
Context Switch Count      219                 LargeStack
UserTime                  00:00:00.010
KernelTime                00:00:00.010
Win32 Start Address cfix32!wmainCRTStartup (0x010048cf)
Start Address kernel32!BaseProcessStartThunk (0x77e617f8)
Stack Init f4423000 Current f4422aa0 Base f4423000 Limit f441f000 Call 0
Priority 10 BasePriority 8 PriorityDecrement 0

ChildEBP RetAddr  
f4422ab8 808202b6 nt!KiSwapContext+0x25
f4422ad0 8081fb6e nt!KiSwapThread+0x83 
f4422b14 809bae63 nt!KeWaitForSingleObject+0x2e0 
f4422bf4 809bc06d nt!DbgkpQueueMessage+0x178 
f4422c18 8096ba9a nt!DbgkpSendApiMessage+0x45 
f4422cc8 80909942 nt!DbgkMapViewOfSection+0xcf 
f4422d34 8082350b nt!NtMapViewOfSection+0x269 
f4422d34 7c8285ec nt!KiFastCallEntry+0xf8 
00069b94 7c82728b ntdll!KiFastSystemCallRet 
00069b98 7c831e05 ntdll!NtMapViewOfSection+0xc
00069bdc 7c831fd6 ntdll!LdrpMapViewOfDllSection+0x64 
00069ccc 7c833027 ntdll!LdrpMapDll+0x390 
[...]

The thread is blocked — it is waiting on en event. Events do not have an owner so we have to do a little more to find out, what it is waiting for. Unfortunately, !thread crops the trace, so let us first get the full one:

kd> .thread 826276f8  
kd> kn100
 # ChildEBP RetAddr  
00 f4422ab8 808202b6 nt!KiSwapContext+0x25
01 f4422ad0 8081fb6e nt!KiSwapThread+0x83
02 f4422b14 809bae63 nt!KeWaitForSingleObject+0x2e0
03 f4422bf4 809bc06d nt!DbgkpQueueMessage+0x178
04 f4422c18 8096ba9a nt!DbgkpSendApiMessage+0x45
05 f4422cc8 80909942 nt!DbgkMapViewOfSection+0xcf
06 f4422d34 8082350b nt!NtMapViewOfSection+0x269
07 f4422d34 7c8285ec nt!KiFastCallEntry+0xf8
08 00069b94 7c82728b ntdll!KiFastSystemCallRet
09 00069b98 7c831e05 ntdll!NtMapViewOfSection+0xc
0a 00069bdc 7c831fd6 ntdll!LdrpMapViewOfDllSection+0x64
0b 00069ccc 7c833027 ntdll!LdrpMapDll+0x390
0c 00069f30 7c8330f5 ntdll!LdrpLoadImportModule+0x17c
0d 00069f70 7c8330a4 ntdll!LdrpHandleOneNewFormatImportDescriptor+0x4d
0e 00069f8c 7c833248 ntdll!LdrpHandleNewFormatImportDescriptors+0x1d
0f 0006a014 7c833049 ntdll!LdrpWalkImportDescriptor+0x195
10 0006a264 7c8330f5 ntdll!LdrpLoadImportModule+0x1cb
11 0006a2a4 7c8330a4 ntdll!LdrpHandleOneNewFormatImportDescriptor+0x4d
12 0006a2c0 7c833248 ntdll!LdrpHandleNewFormatImportDescriptors+0x1d
13 0006a348 7c83427d ntdll!LdrpWalkImportDescriptor+0x195
14 0006a5e0 7c834065 ntdll!LdrpLoadDll+0x241
15 0006a85c 77e41bf3 ntdll!LdrLoadDll+0x198
16 0006a8c4 77e41dbd kernel32!LoadLibraryExW+0x1b2
17 0006a8d8 77e41df3 kernel32!LoadLibraryExA+0x1f
18 0006a8f8 46a7870c kernel32!LoadLibraryA+0xb5
19 0006a954 46a93b3e WININET!__delayLoadHelper2+0xfc
1a 0006a994 46a93950 WININET!_tailMerge_RASAPI32_dll+0xd
1b 0006a9a8 46a93a4e WININET!DoConnectoidsExist+0x2b
1c 0006a9d4 46a93abc WININET!GetRasConnections+0x34
1d 0006a9f0 46a8c559 WININET!IsDialUpConnection+0xa9
1e 0006aa0c 46a97a44 WININET!FixProxySettingsForCurrentConnection+0x31
1f 0006b5e4 46aa3774 WININET!InternetQueryOptionA+0xa47
20 0006b748 01d12dc6 WININET!InternetQueryOptionW+0x1fa
21 0006b98c 01d12583 symsrv!StoreWinInet::dumpproxyinfo+0x46
22 0006be04 01d1290a symsrv!StoreWinInet::connect+0x273
23 0006c040 01d05ae7 symsrv!StoreWinInet::find+0x3a
24 0006c134 01d06c47 symsrv!cascade+0x87
25 0006c684 01d06a57 symsrv!SymbolServerByIndexW+0x127
26 0006c8b4 0302e30e symsrv!SymbolServerW+0x77
27 0006ccf4 03018eed dbghelp!symsrvGetFile+0x12e
28 0006d9dc 03019f57 dbghelp!diaLocatePdb+0x33d
29 0006dc58 03041ade dbghelp!diaGetPdb+0x207
2a 0006de7c 0303ff15 dbghelp!GetDebugData+0x2be
2b 0006e324 03040516 dbghelp!modload+0x305
2c 0006e7a4 0304068e dbghelp!LoadModule+0x3f6
2d 0006e9e8 03044eaf dbghelp!GetModule+0x4e
2e 0006ea30 03044bda dbghelp!NTGetProcessModules+0x16f
2f 0006eae8 03032e80 dbghelp!GetProcessModules+0x4a
30 0006ed70 60f03f7a dbghelp!SymInitializeW+0x320
31 0006f17c 60f032b0 cfix!CfixpCaptureStackTrace+0x117 
32 0006f598 100419b5 cfix!CfixPeReportFailedAssertion+0xc5 
WARNING: Stack unwind information not available. 
Following frames may be wrong.
33 0006f7c0 10040f40 VsSample!__CfixFixturePeSimpleAdderTest+0x5ab1
34 0006f8b4 10040db6 VsSample!__CfixFixturePeSimpleAdderTest+0x503c
35 0006fa68 10040a8a VsSample!__CfixFixturePeSimpleAdderTest+0x4eb2
36 0006fb48 60f02b64 VsSample!__CfixFixturePeSimpleAdderTest+0x4b86
37 0006fb84 60f02be6 cfix!CfixsRunTestRoutine+0x33 
38 0006fb94 60f038e9 cfix!CfixsRunTestCaseMethod+0x27 
39 0006fbac 60f03a06 cfix!CfixsRunTestCase+0x25 
3a 0006fbcc 60f03ce5 cfix!CfixsRunTsexecActionMethod+0xfb 
3b 0006fbf0 0100e135 cfix!CfixsRunSequenceAction+0x122 
3c 0006fc2c 0100d5c2 cfix32!CfixrunpRunFixtures+0x90 
3d 0006fc40 0100d85c cfix32!CfixrunsMainWorker+0x3f 
3e 0006fe7c 010046b8 cfix32!CfixrunMain+0x1b9 
3f 0006fee0 0100485e cfix32!wmain+0x80 
40 0006ffc0 77e6f23b cfix32!_wmainCRTStartup+0x12b 
41 0006fff0 00000000 kernel32!BaseProcessStart+0x23

Whoa, what a trace! cfix!CfixpCaptureStackTrace tries to assemble a stack trace, for which it has to initialize dbghelp.dll first. dbghelp!SymInitializeW seeks help of symsrv.dll, which in turn tries to connect to the Microsoft symbol server. Before it can so, it obviously attempts to get its proxy settings straight, which in turn leads to some DLL (rasapi.dll, in case you wonder) being loaded. The loader then calls into the debugging subsystem (nt!Dbgk*). It may be assumed that the loader is notifying the debugger about the DLL having been loaded.

Turining our attention to Internet Explorer, we look at iexplore.exe’s threads:

kd> !process 0 0 iexplore.exe
PROCESS 824ec3b0  SessionId: 0  Cid: 07f0    Peb: 7ffdb000  ParentCid: 054c
    DirBase: 0307c000  ObjectTable: e15186b8  HandleCount: 225.
    Image: iexplore.exe
    
kd> .process /r /p 824ec3b0  
Implicit process is now 824ec3b0
.cache forcedecodeuser done
Loading User Symbols
...............................................

Now, iexplore has lots of threads, but skimming over them, one looked interesting:

kd> !process 824ec3b0  
PROCESS 824ec3b0  SessionId: 0  Cid: 07f0    
	Peb: 7ffdb000  ParentCid: 054c
    DirBase: 0307c000  ObjectTable: e15186b8  HandleCount: 225.
    Image: iexplore.exe
    VadRoot 824991c8 Vads 168 Clone 0 Private 643. 
    Modified 44. Locked 0.
    DeviceMap e148f3c0

    [...]
    
THREAD 82431980  Cid 07f0.00b8  Teb: 7ffd5000 Win32Thread: 00000000 
  WAIT: (Unknown) UserMode Non-Alertable
	82631eb0  Mutant - owning thread 826276f8
Not impersonating
DeviceMap                 e148f3c0
Owning Process            824ec3b0       Image:         iexplore.exe
Wait Start TickCount      21924          Ticks: 688 (0:00:00:06.889)
Context Switch Count      1             
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address ntdll!RtlpWorkerThread (0x7c839efb)
Start Address kernel32!BaseThreadStartThunk (0x77e617ec)
Stack Init f40a3000 Current f40a2c78 Base f40a3000 Limit f40a0000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0

ChildEBP RetAddr  
f40a2c90 808202b6 nt!KiSwapContext+0x25 
f40a2ca8 8081fb6e nt!KiSwapThread+0x83 
f40a2cec 8090e64e nt!KeWaitForSingleObject+0x2e0 
f40a2d50 8082350b nt!NtWaitForSingleObject+0x9a 
f40a2d50 7c8285ec nt!KiFastCallEntry+0xf8 
01e5fdd0 7c827d0b ntdll!KiFastSystemCallRet 
01e5fdd4 77e61d1e ntdll!NtWaitForSingleObject+0xc 
01e5fe44 77e61c8d kernel32!WaitForSingleObjectEx+0xac 
01e5fe58 46a8c54d kernel32!WaitForSingleObject+0x12 
01e5fe74 46a7eeca WININET!FixProxySettingsForCurrentConnection+0x25 
01e5fe8c 46a7ee3f WININET!CFsm_HttpSendRequest::RunSM+0x61 
01e5fea4 46a7efa3 WININET!CFsm::Run+0x39 
01e5fed4 77da5938 WININET!CFsm::RunWorkItem+0x79 
01e5feec 7c83a827 SHLWAPI!ExecuteWorkItem+0x1d 
01e5ff44 7c83aa0b ntdll!RtlpWorkerCallout+0x71 
01e5ff64 7c83aa82 ntdll!RtlpExecuteWorkerRequest+0x4f 
01e5ff78 7c839f60 ntdll!RtlpApcCallout+0x11 
01e5ffb8 77e64829 ntdll!RtlpWorkerThread+0x61 
01e5ffec 00000000 kernel32!BaseThreadStart+0x34 

Now we are getting somewhere. We have seen FixProxySettingsForCurrentConnection in cfix’s trace already — but in this case, it is waiting on something. Let’s see…


kd> !object 82631eb0  
Object: 82631eb0  Type: (827a5550) Mutant
    ObjectHeader: 82631e98 (old version)
    HandleCount: 3  PointerCount: 6
    Directory Object: e1496420  Name: WininetProxyRegistryMutex

And 826276f8, that’s the cfix32 thread we have already assessed. Obviously, iexplore waits for cfix to release the WininetProxyRegistryMutex, and cfix waits on someone else.

Turning over to VC, we can find a stack that also contains a call to FixProxySettingsForCurrentConnection on its stack. Again, blocking on WininetProxyRegistryMutex.

kd> k100
ChildEBP RetAddr  
f4492c90 808202b6 nt!KiSwapContext+0x25
f4492ca8 8081fb6e nt!KiSwapThread+0x83
f4492cec 8090e64e nt!KeWaitForSingleObject+0x2e0
f4492d50 8082350b nt!NtWaitForSingleObject+0x9a
f4492d50 7c8285ec nt!KiFastCallEntry+0xf8
065b95c8 7c827d0b ntdll!KiFastSystemCallRet
065b95cc 77e61d1e ntdll!NtWaitForSingleObject+0xc
065b963c 77e61c8d kernel32!WaitForSingleObjectEx+0xac
065b9650 46a8c54d kernel32!WaitForSingleObject+0x12
065b966c 46a7eeca WININET!FixProxySettingsForCurrentConnection+0x25
065b9684 46a7ee3f WININET!CFsm_HttpSendRequest::RunSM+0x61
065b969c 46a7fefa WININET!CFsm::Run+0x39
065b96b4 46ab0a67 WININET!DoFsm+0x25
065b96dc 46aa1092 WININET!HttpWrapSendRequest+0x148
065b9714 06b231da WININET!HttpSendRequestW+0x5e
065b973c 06b22ea8 SYMSRV!StoreWinInet::request+0x2a
065b9770 06b226cc SYMSRV!StoreWinInet::fileinfo+0x18
065b9780 06b22741 SYMSRV!StoreWinInet::get+0x7c
065b9fc4 06b229a3 SYMSRV!StoreWinInet::open+0x41
065ba204 06b15ae7 SYMSRV!StoreWinInet::find+0xd3
065ba2f8 06b16c47 SYMSRV!cascade+0x87
065ba848 06b16a57 SYMSRV!SymbolServerByIndexW+0x127
065baa78 51412896 SYMSRV!SymbolServerW+0x77
065bb8cc 51413383 mspdb80!LOCATOR::SYMSRV::SymbolServer+0x190
065bbf10 514136f8 mspdb80!LOCATOR::FLocatePdbSymsrv+0x75
065bbf38 514139ce mspdb80!LOCATOR::FLocatePdbPathHelper+0x179
065bc96c 51413cbe mspdb80!LOCATOR::FLocatePdbPath+0x105
065bccb4 51414371 mspdb80!LOCATOR::FLocatePdb+0x1ad
065bd9a8 458cc1e8 mspdb80!PDBCommon::OpenValidate5+0xab
065bd9ec 45959d4c msenc90!enc::EncImageEdit::
                  `scalar deleting destructor'+0x4d
065bda34 45958e62 NatDbgDE!OLPDBOpen+0x93
065be6c0 45958f0a NatDbgDE!OLStart+0x107
065be6fc 45958fae NatDbgDE!LoadOmfForReal+0x23
065be714 45959019 NatDbgDE!LoadSymbols+0x43
065be72c 459590d9 NatDbgDE!OLLoadOmf+0x55
065be75c 45959154 NatDbgDE!SHLoadDll+0xd5
065be7ac 45959247 NatDbgDE!CSymbolHandlerX::SHLoadDll+0x5a
065be844 4595937c NatDbgDE!CModule::Load+0x1a1
065be8ac 4594d002 NatDbgDE!CNativeProcess::NotifyModLoad+0xc8
065be9ec 4594cf6d NatDbgDE!EngineCallback+0xb3
065bea18 45958d3f NatDbgDE!EMCallBackDB+0x4c
065bf050 4594d0dc NatDbgDE!LoadFixups+0x218
065bf0ac 4594d289 NatDbgDE!DebugPacket+0x213
065bfdb4 4594cf39 NatDbgDE!EMFunc+0x40f
065bfddc 4594d73d NatDbgDE!TLCallBack+0x1e
065bfdf4 4594d711 NatDbgDE!TLClientLib::Local_TLFunc+0xc8
065bfe3c 4594d85c NatDbgDE!DMSendPacket+0x121
065bfee8 45959b1d NatDbgDE!NotifyEM+0x3ae
065bff0c 4594d663 NatDbgDE!ProcessLoadDLLEvent+0x47
065bff44 4594d686 NatDbgDE!ProcessDebugEvent+0x30d
065bffb8 77e64829 NatDbgDE!DmPollLoop+0x3c
065bffec 00000000 kernel32!BaseThreadStart+0x34

But — looking closely, it becomes obvious that this thread must be the one handling debug events, and in fact, the call to ProcessLoadDLLEvent is a strong indication for that this thread is currently handling a DLL load event. And now we have closed the loop — this thread must be handling the DLL load event for rasapi.dll, the DLL which cfix was about to load. And to do this, VC attempts to acquire the WininetProxyRegistryMutex, which is owned by the original cfix thread. Deadlock.

What is interesting about this situation is that neither party — cfix, iexplore or VCExpress, and also none of the modules clearly is the culprit and behaving wrong. It is more like a combination of special circumstances that bring up the deadlock as discussed.

It is also notable that I am not using any particular proxy settings on this machine and automatic proxy configuration has been turned off.

So far, I have not experienced the same problem with VS 2003 and VS 2005 — I thus assume that only VS 2008 is affected by this.

Although I am pretty sure that cfix is not really at fault here, I have to adapt it to avoid this deadlock in the future. Until an updated version is available, you can use this workaround.

cfix 1.1 may experience hangs when used in the VisualStudio 2008 debugger

When used in conjunction with the VisualStudio 2008 debugger, cfix may hang indefinitely as soon as an assertion fails. The reason for this behavior is a Symbol Server-caused deadlock between the debugger and cfix. I am going to discuss the details of this deadlock in a separate post.

Until a new version of cfix is available, you can work around this problem as follows: Go to the cfix installation directory and rename or delete symsrv.dll. This will stop cfix from contacting the symbol server. While cfix now will not be able to capture proper stack traces any more, this will prevent the mentioned deadlock situation to arise.

The problem only applies to cfix 1.1 and VS 2008. The problem does not occur when debugging with VS 2003, VS 2005, or cdb/ntsd/WinDBG.

Thread 0:0 is special

Thread IDs uniquely identify a thread — this certainly holds for user mode threads and should also hold for kernel mode threads. But there is one kind of thread where the ID does not uniquely identify a KTHREAD — the Idle thread.

On a uniprocessor system, there is only one Idle thread and this idle thread will have the thread ID 0 (in process 0). On a multiprocessor system, however, Windows creates one Idle thread per CPU. That makes sense — however, what may be surprising at first is that although all Idle threads have their own KTHREAD structure, all share the same thread ID 0 (CID 0:0). That is, each multiprocessor system will have multiple threads with ID 0, which in turn means that CID 0:0 does not uniquely identify a single thread.

This is easily verified using the kernel debugger on a multiprocessor system (a quad core in this case):

0: kd> !running

System Processors f (affinity mask)
  Idle Processors f
All processors idle.

0: kd> !pcr 0
KPCR for Processor 0 at ffdff000:
[...]

	      CurrentThread: 8089d8c0
	         NextThread: 00000000
	         IdleThread: 8089d8c0


0: kd> !thread 8089d8c0
THREAD 8089d8c0  Cid 0000.0000  Teb: 00000000 Win32Thread: 00000000 RUNNING on processor 0
[...]

0: kd> !pcr 1
KPCR for Processor 1 at f7727000:
[...]

	      CurrentThread: f772a090
	         NextThread: 00000000
	         IdleThread: f772a090
	         
0: kd> !thread f772a090
THREAD f772a090  Cid 0000.0000  Teb: 00000000 Win32Thread: 00000000 RUNNING on processor 1
[...]

...and so on.

Now, the idle threads are usually not of major intest. Still, it is possible that these threads can become relevant — such as when doing certain kinds of analysis such as tracing interrupts. If such analysis groups events by the CID of the thread they were captured on (rather than the VA of the KTHREAD structure), the results for CID 0:0 will be wrong.

Not a big thing. Still, it took me a while to figure out that this was indeed the reason for some of my traces containing timestamps in strange order: The traces had been created on a quadcode machine and I did make the mistake to correlate the events by their CID during later analsis. As a result, the traces of the four idle threads were all mixed up…

Reaching beyond the top of the stack — illegal or just bad style?

The stack pointer, esp on i386, denotes the top of the stack. All memory below the stack pointer (i.e. higher addresses) is occupied by parameters, variables and return addresses; memory above the stack pointer must be assumed to contain garbage.

When programming in assembly, it is equally easy to use memory below and above the stack pointer. Reading from or writing to addresses beyond the top of the stack is unusual and under normal circumstances, there is little reason to do so. There are, however, situations — rare situations — where it may tempting to temporarily use memory beyond the top of the stack.

That said, the question is whether it is really just a convention and good style not to grab beyond the stack of the stack or whether there are actually reasons why doing so could lead to problems.

When trying to answer this question, one first has to make a distinction between user mode and kernel mode. In user mode Windows, I am unable to come up with a single reason of why usage of memory beyond the top of the stack could lead to problems. So in this case, it is probably merely bad style.

However, things are different in kernel mode.

In one particular routine I recently wrote, I encountered a situation where temporarily violating the rule of not reaching beyond the top of the stack came in handy. The routine worked fine for quite a while. In certain situations, however, it suddenly started to fail due to memory corruption. Interestingly enough, the routine did not fail always, but still rather frequently.

Having identified the specific routine as being the cuplrit, I started single stepping the code. Everything was fine until I reached the point where the memory above the stack pointer was used. The window span only a single instruction. Yet, as soon as I had stepped over the two instructions, the system crashed. I tried it multiple times, and it was prefectly reproduceable when being single-stepped.

So I took a look at the stack contents after every single step I took. To my surprise, as soon as I reached the critical window, the contents of the memory location just beyond the current stack pointer suddenly became messed. Very weird.

After having been scratching my head for a while, that suddenly started to made sense: I was not the only one using the stack — in between the two instructions, an interrupt must have occured and been dispatched. As my thread happened to be the one currently running, it was my stack that has been used for dispatching it. This also explains why it did not happened always unless I was single-stepping the respective code.

When an interrupt occurs and no privilege-level change has to be performed, the CPU will push the EFLAGS, CS and EIP registers on the stack. That is, the stack of whatever kernel thread happens to be the one currently running on this CPU is reused and the memory locations beyond the stack pointer will be overwritten by these three values. So what I initially interpreted as garbage, actually were the contents of EFLAGS, CS and EIP.

On Windows NT, unlike some other operating systems (FreeBSD, IIRC), handling the interrupt, which involves runing the interrupt service routine (ISR) occurs on the same stack as well. The following stack trace, taken elsewhere, shows an ISR being executed on the stack of the interrupted thread:

f6bdab4c f99bf153 i8042prt!I8xQueueCurrentMouseInput+0x67
f6bdab78 80884289 i8042prt!I8042MouseInterruptService+0xa58
f6bdab78 f6dd501a nt!KiInterruptDispatch+0x49
f6bdac44 f6dd435f driver!Quux+0x11a 
f6bdac58 f6dd61db driver!Foobar+0x6f 
...

Morale of the story: Using memory beyond the current stack pointer is not only bad practice, it is actually illegal when done in kernel mode.

Explicit Symbol Loading with dbghelp

When working with symbols, the default case is that you either analyze the current process, a concurrently running process or maybe even the kernel. dbghelp provides special support for these use cases and getting the right symbols to load is usually easy — having access to the process being analyzed, dbghelp can obtain the necessary module information by itself and will come up with the matching symbols.

Things are not quite as easy when analyzing symbols for a process (or kernel) that is not running any more or executes on a different machine. In this case, manual intervention is necessary — without your help, dbghelp will not be able to decide which PDB to load.

An example for a case where this applies is writing a binary tracefile. To keep the trace file small and to make writing entries as efficient as possible, it makes sense to write raw instruction pointer and function pointer addresses to the file. This way, resolving symbols and finding the names of the affected routines is delayed until the file is actually read.

However, when the trace file is read, the traced process (or kernel) may not be alive any more, so dbghelp’s automatisms cannot be used. Moreover, the version of the OS and the individual modules involved may not be the same on the machine where the trace is obtained and the machine where the trace file is read. It is thus crucial that the trace file contains all neccessary information required for the reading application to find and load the matching symbols.

Capturing the debug information

That sounds easy enough. But once again, the documentation on this topic is flaky enough to make this a non-trivial task. This thread on the OSR WinDBG list revealed some important insights, yet the summary is not entirely correct. So I will try to explain it here. Note that the discussion equally applies to user and kernel mode.

Two basic pieces of information need to be captured and stored in the trace file: The load address and the debug data of each module involved. The load address is required in order to later be able to convert from virtual addresses (VAs) to relative virtual addresses (RVAs). The debug data is neccessary for finding the PDB exactly matching the respective module.

The debug data is stored in the PE image and can be obtained via the debug directory. The following diagram illustrates the basic layout of the affected parts of a PE image:

Via the _IMAGE_DATA_DIRECTORY structure, which is at index IMAGE_DIRECTORY_ENTRY_DEBUG of the DataDirectory, we can obtain the RVA of the debug directory — as always, RVAs are relative to the load address. The debug directory is a sequence of IMAGE_DEBUG_DIRECTORY structures. Although the CodeView entry should be the one of primary interest, expect more than one entry in this sequence. The entry count can be calculated by dividing _IMAGE_DATA_DIRECTORY::Size by sizeof( IMAGE_DEBUG_DIRECTORY ).

The IMAGE_DEBUG_DIRECTORY structures do not contain all necessary information by themselves — rather, they refer to a blob of debug data located elsewhere in the image. IMAGE_DEBUG_DIRECTORY::AddressOfRawData contains the RVA (again, relative to the load address) to this blob. Note that the blobs are not necessarily laid out in the same order as the corresponding IMAGE_DEBUG_DIRECTORY structures are. (The role of the PointerToRawData member in this context remains unclear.)

Summing up, we need to write to the trace file:

  • The module load address
  • all IMAGE_DEBUG_DIRECTORY entries
  • all debug data blobs referred to

In order to maintain the relation between the IMAGE_DEBUG_DIRECTORY entries and their corresponding blobs, the AddressOfRawData and PointerToRawData members have to be updated before being written to the file. In order to make loading symbols a bit easier, it makes sense to save the entire sequence of IMAGE_DEBUG_DIRECTORY structs (with updated RVAs) contigously to the file, immediately followed by the debug data blobs.

Loading symbols

Having all neccessary information in the trace file, we can now load the symbols. The key is to use SymLoadModuleEx rather than SymLoadModule64 and passing it a MODLOAD_DATA struct:

  • MODLOAD_DATA::data receives a pointer to the sequence of IMAGE_DEBUG_DIRECTORY structs. For each struct, the AddressOfRawData member should be set to 0 and PointerToRawData should receive the RVA of the corresponding debug data blob relative to the beginning of the respective IMAGE_DEBUG_DIRECTORY struct.

    As indicated before, it therefore makes sense to write the IMAGE_DEBUG_DIRECTORY structs and the blobs consecutievly to the file and — at the time of writing — adjust the AddressOfRawData and PointerToRawData members. This way, you can just assign the entire memory chunk to MODLOAD_DATA::data.

  • MODLOAD_DATA::size receives the size of the entire memory chunk, including both the sequence of IMAGE_DEBUG_DIRECTORY structs and all blobs.

There you go, dbghelp should now be able to identify the proper PDB and load the symbols.

Fun with low level SEH

Most code that uses Structured Exception Handling does this with the help of the compiler, e.g. by using __try/__except/__finally. Still, it is possible to do everything by hand, i.e. to provide your own exception handlers and set up the exception registration records manually. However, as this entire topic is not documented very well, doing so opens room for all kind of surprises…

Although more than 10 years old, the best article on this topic still seems to be Matt Pirtrek’s A Crash Course on the Depths of Win32™ Structured Exception Handling, which I assume you have read. However, note that this article as well as this post refer to i386 only, albeit both to user and kernel mode.

Exception Registration Record Validation

On the i386, SEH uses a linked list of exception registration records. The first record is pointed to by the first member of the TIB. In user mode, the TIB is part of the TEB, in kernel mode it is part of the KPCR — in any case, it is at fs:[0]. Each record, besides containing a pointer to the next lower record, stores a pointer to an exception handler routine.

Installing an exception registration record is thus straightforward and merely requires adjusting the TIB pointer and having the new record point to next lower record. So I set up my custom exception registration record, registered it properly, verified that all pointers are correct and tried using it. However, I was unpleasently surprised that exeption handling totally failed as soon as my exception registration got involved. !exchain reported an “Invalid exception stack”, although checking the pointers manually again seemed to show that the chain of exception registration records was fine and my record seemed ok.

Digging a little deeper I found the reason for that — and in fact I cannot remember ever having heard or read about this requirement before: Windows requires all EXCEPTION_REGISTRATION_RECORDs to be located on the stack. Both RtlDispatchException and RtlUnwind check the location of each EXCEPTION_REGISTRATION_RECORD against the stack limits and abort exception handling as soon as a record is found to be not stack-located. Aborting exception handling in this case means that RaiseException/ExRaiseStatus will just return and execution will be resumed at the caller site as if nothing happened.

This requirement is fair enough, actually, but in my case it totally wrecked my design. I did not have the 8 spare bytes to store this record and thus therefore put the record on some dedicated place on the heap. Urgh. Anyway…

As an interesting side note, Windows Server 2003 performs this stack check against both limits — minimum and maximum address of the stack. Vista, however, only checks against the maximum address (i.e. bottom of the stack) and does not care whether the minimum address (i.e. top of stack) has been exceeded.

Moreover, there is another restriction on exception records that only applies to user mode: The handler routine pointed to by the exception record is verified to not point into the stack. This is obviously another security measure to avoid SEH records to point into some overflown buffers.

SafeSEH

It is worth pointing out that all these checks are unrelated to SafeSEH and are performed regardless of whether your module is SafeSEH compatible or not. Not before these checks have all passed, the exception handler has to undergo the SafeSEH validation: The image base is calculated, the table listing the trusted SEH handlers is looked up and it is checked whether the handler routine pointed to by the current exception record is located in this table.

SafeSEH Handler Registration

Using SafeSEH is a good thing and I link all my modules with /SafeSEH. So when you use low level SEH, i.e. without using the __try/__except compiler support, the obvious question is how to get your SEH handler to be recognized as a trusted handler and be included in the SafeSEH table. After all, the compiler will not be able to recognize that the routine you have just written will in fact be used as an exception handler. The C compiler does not seem to offer support for that — luckily however, ml does by providing the .SAFESEH directive.

If you like writing your exception handler in assmbler, this is all you need. If, however, you prefer C, this is somewhat unsatisfying. The documentation of .SAFESEH states that it can be used with an extrn proc, but that does not seem to work. My solution was thus to write the actual routine in C and write a little thunk in assembler, which I was then able to register using the .SAFESEH directive:

.586               
.model flat, stdcall
option casemap :none

extrn RealExceptionHandlerWrittenInC@16

...

ExceptionHandlerThunk proto
.SAFESEH ExceptionHandlerThunk

...

.code

ExceptionHandlerThunk proc
	jmp RealExceptionHandlerWrittenInC@16
ExceptionHandlerThunk endp

Stupid things you should not do

Finally, there is another little quirk that bit me: Do not use EXCEPTION_CONTINUE_SEARCH where ExceptionContinueSearch would have been appropriate. The EXCEPTION_* constants are for use by exception filters as used for __except statements, whereas the Exception* values have to be used for low level exception handlers. Should be obvious, right? :)

Having chosen the wrong group of constants, I returned EXCEPTION_CONTINUE_SEARCH from my exception handler to indicate that the handler is unable to handle certain exceptions. However, as it turns out, EXCEPTION_CONTINUE_SEARCH has the value 0 and is thus interpreted as ExceptionContinueExecution. Now, returning ExceptionContinueExecution when being requested to handle an exception raised by ExRaiseStatus is obviously a bad idea and in this case led to a STATUS_NONCONTINUABLE_EXCEPTION. After a few of those had stacked up (in kernel mode), VirtualPC crashed with an unrecoverable CPU error. Nice :)

The case of the mysterious JVM x64 crashes

To date, I did all my Java development uding 32 bit JVMs. After all, as long as you do not have extreme memory requirements, the 64 bit JVM should not buy you much. Today I installed the Java 6 Update 6 x64 JDK on my Vista x64 machine and tried to run some of my JUnit tests on this VM.

With little success:

#
# An unexpected error has been detected by Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00000000772b219b, pid=4188, tid=2272
#
# Java VM: Java HotSpot(TM) 64-Bit Server VM (10.0-b22 mixed mode windows-amd64)
# Problematic frame:
# C  [ntdll.dll+0x5219b]
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#

The project uses a native library via JNI, so of course I immediately suspected this to be the problem. So I placed a Java breakpoint on the respective System.loadLibrary call with the intent of attaching WinDBG as soon as this breakpoint is hit. In WinDBG, I could then break on the exception and see what the problem is.

But to my surprise, the Java breakpoint was not hit — the VM crashed immediately and I received the same output about an unexpected error having occured. That seemed strange to me — maybe it was not the fault of the JNI library after all? So I created a simple Hello World application and ran it — that worked. Then I created this innocent JUnit test:

public class JTest
{
  @org.junit.Test
  public void testname() throws Exception
  {    
  }
}

This one failed again, yielding the same error message as above. Well, at least that gave me evidence that not my JNI library but the JVM was the culprit of the crash — but still, the situation seemed weird. Running the test again under WinDBG, I could see that the AV occured during a heap free operation (As a side node, it is annoying that Sun does not supply symbols for its binaries):

00000000`0404ca88 00000000`7727e7e2 ntdll!RtlCaptureContext+0x8c
00000000`0404ca98 00000000`7727e72b ntdll!RtlpWalkFrameChain+0x52
00000000`0404d018 00000000`773352f2 ntdll!RtlCaptureStackBackTrace+0x4b
00000000`0404d048 00000000`772e1d35 ntdll!RtlpStackTraceDatabaseLogPrefix+0x42
00000000`0404d178 00000000`7715d9fa ntdll! ?? ::FNODOBFM::`string'+0xa93f
00000000`0404d1f8 000007fe`fef0175c kernel32!HeapFree+0xa
*** WARNING: Unable to verify checksum for C:\Program Files\Java\jdk1.6.0_06\jre\bin\server\jvm.dll
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for C:\Program Files\Java\jdk1.6.0_06\jre\bin\server\jvm.dll - 
00000000`0404d228 00000000`08101c09 msvcrt!free+0x1c
00000000`0404d258 00000000`081026cc jvm!JVM_EnqueueOperation+0x8c139
00000000`0404d288 00000000`040b4937 jvm!JVM_EnqueueOperation+0x8cbfc
00000000`0404d318 00000000`0404d338 0x40b4937

Well, could be a heap corruption — but it is interesting that crash did not occur during block coalescence or similar operations but during stack trace capturing. As a matter of fact, I always run my machine with user mode stack trace database creation enabled for debugging purposes. So I disabled the stack trace database in gflags, rebooted the machine and — voilà, the crash disappeared!

Wow. I think this is worth being filed as a bug.

Trace and Watch Data — How does it work

One of the builtin WinDBG commands is wt (Trace and Watch Data), which can be used to trace the execution flow of a function. Given source code like the following:

void foo()
{
}

void bar()
{
}

int main()
{
  // Some random code...
  int a = 1, b = 2;
  
  // Call a child function.
  foo();
  
  // More useless code...
  a+=b;
  if ( a == b) a = b;
  
  // Call another child function.
  bar();  
  
  return 0;
}

wt will produce the following output:

0:000> wt
Tracing test!main to return address 00401291
    6     0 [  0] test!main
    1     0 [  1]   test!ILT+5(_foo)
    4     0 [  1]   test!foo
   13     5 [  0] test!main
    1     0 [  1]   test!ILT+0(_bar)
    4     0 [  1]   test!bar
   17    10 [  0] test!main

27 instructions were executed in 26 events 
                                  (0 from other threads)

Function Name         Invocations MinInst MaxInst AvgInst
test!ILT+0(_bar)                1       1       1       1
test!ILT+5(_foo)                1       1       1       1
test!bar                        1       4       4       4
test!foo                        1       4       4       4
test!main                       1      17      17      17

0 system calls were executed

Although helpful, tracing a larger function calling a multitude of other functions slows down the debuggee significantly. An interesting question is thus how wt is implemented. Three possible implementation strategies come to mind:

  1. Use single-stepping. After each instruction executed, a debug trap is raised and the debugger is delivered a single-step debugging event. Though all non-branching instructions are probably irrelevant to wt, by intercepting each call and ret instruction, the debugger is able to trace function entry and exit.
  2. Explicitly set breakpoints. The debugger disassembles the function to be traced and places an ordinary breakpoint on each call instruction as well on as the return address of the function. Whenever one of the call-breakpoints fires, the debugger instruments the target function in the same way (i.e. place breakpoints on each call instruction as well as the return address) and continues execution (without single-stepping). By intercepting all function calls and returns, the debugger is able to deduce the call tree. This approach would be similar to UMSS.
  3. Use Last Branch Recording. This is a rather new additon to the IA-32 instruction set that allows setting breakpoints on taken branches, interrupts, and exceptions, and to single-step from one branch to the next.

In order to find out, we have to debug the debugger to observe how it debugs the target. We thus start WinDBG, choose our test application as target and let it break on main. We then start another WinDBG instance and attach it to the first WinDBG instance. In order to find out which debugging events are consumed by the first instance, we use the second debugger to trace function calls made by the first debugger.

All usermode debuggers eventually end up calling ntdll!NtWaitForDebugEvent in a loop — so to find out which debugging events are consumed, all we need to do is trace all calls to this function. While being an undocumented native function, there is an excellent summary on the inner workings of user mode debugging which also covers ntdll!NtWaitForDebugEvent. Given this information, all we need to do to check whether strategy #1 or strategy #2 has been implemented (I assume #3 may safely be neglected) is to put together a little breakpoint command like the following (line breaks added for clarity):

bp ntdll!NtWaitForDebugEvent "
   r @$t1=poi(esp+10); 
   g @$ra; 
   .if (poi(@$t1)==8) {.echo \"SingleStep\n\" } 
   .else {.printf \"Excp %p\\n\", poi(@$t1+c)};
   g "

When entering ntdll!NtWaitForDebugEvent, we store the address of the fourth parameter (which receives a PDBGUI_WAIT_STATE_CHANGE structure) in $t1 and step out of the function. Then we reach into the structure whose address is stored in $t1 and check if the first field marks the event of being of type DbgSingleStepStateChange (0×8) and output an appropriate message. If we receive about 30 single-step events, strategy #1 has probably been chosen. For #2 we would expect to receive 5 breakpoint events.

Back to the first debugger, we now opt to trace the main function by running wt. This yields the output shown above. Switching to the second debugger again, we now see the following output:

SingleStep
SingleStep
SingleStep

[...about 20 more...]

SingleStep
SingleStep
SingleStep
SingleStep
SingleStep

Quite obviously, wt implements strategy #1 — it does single stepping. Although this does not really come as a surprise, it is still unfortunate as it is most likely the slowest approach of tracing calls. And as anybody who has ever used wt can probably confirm, wt is really slow.

As an interesting side note, as of Linux kernel 2.6.25, ptrace on x86 has been enhanced to facilitate Last Branch Recording on CPUs that support it.

Walking the stack of the current thread

StackWalk64 provides a convenient and platform-independent way to walk the stack of a thread. Although useful and powerful indeed, it is definitively one of the less trivial to use functions. The additional fact that the MSDN documentation for StackWalk64 is rather light does not make things easier. There is, however, a decent article on CodeProject covering the usage of StackWalk64.

Although you can walk the stack of any thread using StackWalk64, this post only deals with walking the current thread’s stack. In order to walk the stack of the current thread, you first have to obtain the CONTEXT of the current thread. The naive way to obtain this context would be to call GetThreadContext( GetCurrentThread() ) — however, as the documentation for GetThreadContext clearly states, the result of this function is undefined if used on the current thread.

Obtaining a CONTEXT

64 Bit windows provides the function RtlCaptureContext for this purpose, so obtaining the context is trivial. 32 Bit Windows, however, does not implement this function, yet it is not hard to come up with a surregate.

One way to obtain the context is to set up a SEH frame, deliberately raise an exception and obtain the context in the exception filter using GetExceptionInformation. Although that works, it is probably the slowest way to do it. An easier and quicker solution is to use inline assembly (we only have to support 32 bit as we have RtlCaptureContext for x64/IA64, so using assembly is ok). Using appropriate instructions, we can manually populate the x86 CONTEXT structure. However, as eip cannot be accessed using a mov instruction, we have to use a little trick to obtain the value of the instruction pointer — use an (otherwise useless) label:

CONTEXT Context;
ZeroMemory( &Context, sizeof( CONTEXT ) );
Context.ContextFlags = CONTEXT_CONTROL;

__asm
{
Label:
  mov [Context.Ebp], ebp;
  mov [Context.Esp], esp;
  mov eax, [Label];
  mov [Context.Eip], eax;
}

Preparing to call StackWalk64

Having obtained the context, we have to initialize the STACKFRAME64 structure for the first call. This is trivial, but platform-specific.

In order to enable StackWalk64 to use symbol information, dbghelp has to be initialized by calling SymInitialize, passing the current process handle as parameter. Unless you plan to manually call SymLoadModule64 for all affected modules, it is crucial to pass TRUE for the fInvadeProcess argument. If you pass FALSE or forget to call SymInitialize altogether (After all, the documentation of StackWalk64 does not state that calling SymInitialize is required), you will notice that StackWalk64 works on x86 builds, but fails (with last error set to 0x1E7) on x64. Also, as dbghelp is not threadsafe, it is usually a good idea to protect any calls to dbghelp by a critical section

Calling StackWalk64

After these provisions have been made, calling StackWalk64 is a snap. The only unfortnate thing is that you cannot really distinguish between StackWalk64 returning FALSE because of a failure or because of the bottom of the stack having been reached as StackWalk64 does not reliably set the last Win32 error.

The following code puts it all together. Note that only a the minimal amount of stack trace information is gathered, i.e. only the return addresses are saved in the array.

typedef struct _STACKTRACE
{
  //
  // Number of frames in Frames array.
  //
  UINT FrameCount;

  //
  // PC-Addresses of frames. Index 0 contains the topmost frame.
  //
  ULONGLONG Frames[ ANYSIZE_ARRAY ];
} STACKTRACE, *PSTACKTRACE;

/*++
  Routine Description:
    Capture the stack trace based on the given CONTEXT.

  Parameters:
    Context    Thread context. If NULL, the current context is
               used.
    Event      Event structure to populate.
    MaxFrames  Max number of frames the structure can hold.
--*/
#ifdef _M_IX86
  //
  // Disable global optimization and ignore /GS waning caused by 
  // inline assembly.
  //
  #pragma optimize( "g", off )
  #pragma warning( push )
  #pragma warning( disable : 4748 )
#endif
HRESULT CaptureStackTrace(
  __in_opt CONST PCONTEXT InitialContext,
  __in PSTACKTRACE StackTrace,
  __in UINT MaxFrames 
  )
{
  DWORD MachineType;
  CONTEXT Context;
  HRESULT Hr;
  STACKFRAME64 StackFrame;

  ASSERT( StackTrace );
  ASSERT( MaxFrames > 0 );

  if ( InitialContext == NULL )
  {
    //
    // Use current context.
    //
    // N.B. GetThreadContext cannot be used on the current thread.
    // Capture own context - on i386, there is no API for that.
    //
#ifdef _M_IX86
    ZeroMemory( &Context, sizeof( CONTEXT ) );

    Context.ContextFlags = CONTEXT_CONTROL;
    
    //
    // Those three registers are enough.
    //
    __asm
    {
    Label:
      mov [Context.Ebp], ebp;
      mov [Context.Esp], esp;
      mov eax, [Label];
      mov [Context.Eip], eax;
    }
#else
    RtlCaptureContext( &Context );
#endif  
  }
  else
  {
    CopyMemory( &Context, InitialContext, sizeof( CONTEXT ) ); 
  }

  //
  // Set up stack frame.
  //
  ZeroMemory( &StackFrame, sizeof( STACKFRAME64 ) );
#ifdef _M_IX86
  MachineType                 = IMAGE_FILE_MACHINE_I386;
  StackFrame.AddrPC.Offset    = Context.Eip;
  StackFrame.AddrPC.Mode      = AddrModeFlat;
  StackFrame.AddrFrame.Offset = Context.Ebp;
  StackFrame.AddrFrame.Mode   = AddrModeFlat;
  StackFrame.AddrStack.Offset = Context.Esp;
  StackFrame.AddrStack.Mode   = AddrModeFlat;
#elif _M_X64
  MachineType                 = IMAGE_FILE_MACHINE_AMD64;
  StackFrame.AddrPC.Offset    = Context.Rip;
  StackFrame.AddrPC.Mode      = AddrModeFlat;
  StackFrame.AddrFrame.Offset = Context.Rsp;
  StackFrame.AddrFrame.Mode   = AddrModeFlat;
  StackFrame.AddrStack.Offset = Context.Rsp;
  StackFrame.AddrStack.Mode   = AddrModeFlat;
#elif _M_IA64
  MachineType                 = IMAGE_FILE_MACHINE_IA64;
  StackFrame.AddrPC.Offset    = Context.StIIP;
  StackFrame.AddrPC.Mode      = AddrModeFlat;
  StackFrame.AddrFrame.Offset = Context.IntSp;
  StackFrame.AddrFrame.Mode   = AddrModeFlat;
  StackFrame.AddrBStore.Offset= Context.RsBSP;
  StackFrame.AddrBStore.Mode  = AddrModeFlat;
  StackFrame.AddrStack.Offset = Context.IntSp;
  StackFrame.AddrStack.Mode   = AddrModeFlat;
#else
  #error "Unsupported platform"
#endif

  StackTrace->FrameCount = 0;

  //
  // Dbghelp is is singlethreaded, so acquire a lock.
  //
  // Note that the code assumes that 
  // SymInitialize( GetCurrentProcess(), NULL, TRUE ) has 
  // already been called.
  //
  EnterCriticalSection( &DbgHelpLock );

  while ( StackTrace->FrameCount < MaxFrames )
  {
    if ( ! StackWalk64(
      MachineType,
      GetCurrentProcess(),
      GetCurrentThread(),
      &StackFrame,
      MachineType == IMAGE_FILE_MACHINE_I386 
        ? NULL
        : &Context,
      NULL,
      SymFunctionTableAccess64,
      SymGetModuleBase64,
      NULL ) )
    {
      //
      // Maybe it failed, maybe we have finished walking the stack.
      //
      break;
    }

    if ( StackFrame.AddrPC.Offset != 0 )
    {
      //
      // Valid frame.
      //
      StackTrace->Frames[ StackTrace->FrameCount++ ] = 
        StackFrame.AddrPC.Offset;
    }
    else
    {
      //
      // Base reached.
      //
      break;
    }
  }
  
  LeaveCriticalSection( &DbgHelpLock );

  return S_OK;
}
#ifdef _M_IX86
  #pragma warning( pop )
  #pragma optimize( "g", on )
#endif

When HEAP_NO_SERIALIZE heap operations are not quite ‘not serialized’

While the RTL Heap performs all heap operation in an interlocked manner by default, it can be requested not to serialize operations by passing the HEAP_NO_SERIALIZE flag to HeapCreate or HeapAlloc. In this case, the caller is in charge to provide proper synchronization.

As it turns out, things change when Application Verifier heap checks are enabled for this process. To check heap operations, Application Verifier intercepts certain operations. Presumably to protect its own internal data structures, at some points, Avrf enters a critical section, as shown in this stack trace:

0:009> k
ChildEBP RetAddr  
0840f540 77d9cfad ntdll!ZwWaitForSingleObject+0x15
0840f5a4 77d9cf78 ntdll!RtlpWaitOnCriticalSection+0x154
0840f5cc 6db25f92 ntdll!RtlEnterCriticalSection+0x152
0840f5f0 6db29d99 vfbasics!AVrfpFreeMemLockChecks+0x42 
0840f60c 6db2bae8 vfbasics!AVrfpFreeMemNotify+0x39 
0840f668 77731d27 vfbasics!AVrfpRtlFreeHeap+0x148 
0840f67c 6d242757 kernel32!HeapFree+0x14
...

Entering a critical section is no big thing — nontheless, it effectvely means that heap operations, whether they are conducted with HEAP_NO_SERIALIZE flag set or not, must be considered to be at least partially serialized by Application Verifier.

Although this somewhat contradicts the idea of the HEAP_NO_SERIALIZE flag, this should not be a problem in most cases — especially when HEAP_NO_SERIALIZE has been passed for performance reasons only. However, if HEAP_NO_SERIALIZE has been used for correctness reasons, this might indeed bite you. In my specific case, I used HEAP_NO_SERIALIZE and did the synchronization myself in order to break a deadlock situation. And while the application ran fine (and deadlock-free) without Avrf enabled, it began deadlocking randomly as soon as I enabled Avrf heap checks — for the very reason just described. This is especially annoying as I am now left with two less-than-optimal choices: Leave Avrf enabled for my test suite and experience random deadlocks or forgo Avrf’s helpful heap checks in order not to deadlock. Grrr, I need more choices!

« Previous PageNext Page »


Categories

Try Visual Assert, the unit testing add-in for Visual Studio (R)


NTrace: Function Boundary Tracing for Windows on IA-32

About me

Johannes Passing, M.Sc., living in Berlin, Germany.

Besides his consulting work, Johannes mainly focusses on Win32, COM, and NT kernel mode development, along with Java and .Net. He also is the author of cfix, a C/C++ unit testing framework for Win32 and NT kernel mode, Visual Assert, a Visual Studio Unit Testing-AddIn, and NTrace, a dynamic function boundary tracing toolkit for Windows NT/x86 kernel/user mode code.

Contact Johannes: jpassing (at) acm org

Johannes' GPG fingerprint is BBB1 1769 B82D CD07 D90A 57E8 9FE1 D441 F7A0 1BB1.

LinkedIn LinkedIn Profile
Xing Xing Profile
Twitter Follow me on Twitter (new)

Follow

Get every new post delivered to your Inbox.